  1. May 2025
    1. Author response:

      We sincerely thank all reviewers for their thoughtful, detailed, and supportive evaluations of our manuscript. We are very pleased that the reviewers appreciated the integrative approach of our study, the quality of the imaging and analyses, and the insights provided into the parallel evolution of biomineralization mechanisms in sponges and corals.

      We are carefully considering all the suggestions made, including those regarding the improvement of figure clarity and the clarification of certain image interpretations. These comments are extremely valuable, and we are preparing a detailed point-by-point reply to accompany our revised manuscript.

      It was also brought to our attention that the links to the Zenodo repository were incorrect. We apologize for this oversight and any inconvenience it may have caused and will update the links in our revised manuscript. In the meantime, the correct Zenodo repositories can be accessed using the following links:

      https://zenodo.org/records/14755899

      https://zenodo.org/records/13847772

      We again thank the reviewers for their constructive feedback, which will help us to further strengthen the manuscript.

    1. Author response:

      We thank the editors and reviewers for their thoughtful and constructive evaluation of our manuscript, “Krüppel Regulates Cell Cycle Exit and Limits Adult Neurogenesis of Mushroom Body Neural Progenitors in Drosophila.” We are pleased that all reviewers recognised the novelty and significance of identifying Krüppel (Kr) as a key transcription factor promoting timely termination of mushroom body neuroblast (MBNB) proliferation, and the potential antagonistic function of Kr-h1.

      We appreciate the helpful suggestions aimed at improving the mechanistic clarity and presentation of our findings. Below, we outline how we plan to address the major points raised in the full revision.

      (1) Characterisation of the KrIf-1 allele and Kr expression

      We agree that clarifying the nature of the KrIf-1 allele is important. In response to this concern, we will examine Kr expression in KrIf-1 mutant larval, pupal, and adult brains using immunostaining and available reporter lines. These experiments will help determine whether the observed neuroblast retention phenotype correlates with altered Kr expression in MBNBs.

      (2) Regulatory relationships between Kr, Kr-h1, Imp, Syp, Chinmo, and E93

      We are currently performing additional experiments to clarify the interactions among these temporal factors. For instance, we are testing whether Kr-h1 overexpression alters the expression of Imp, Syp, and E93. We have obtained a published E93 antibody from Dr Chris Doe (Syed et al., 2017) and will include E93 expression analysis in our revised manuscript.

      While Chinmo is of interest, its expression is well established to be regulated downstream of Imp/Syp via mRNA stability (Liu et al., 2015; Ren et al., 2017). Given that we currently lack reliable tools to assess Chinmo levels, we will focus primarily on Imp, Syp, and E93 as readouts for Kr/Kr-h1 function. If we succeed in obtaining Chinmo antibodies or reporter lines in time, we will include corresponding data.

      (3) Expression of Kr-h1 in MBNBs

      We fully agree that direct evidence for Kr-h1 expression in MBNBs is important. To address this, we have obtained the Kr-h1::GFP BAC transgenic line (BDSC #96786) and are currently using it to assess Kr-h1 expression in MBNBs. We also tested an anti–Kr-h1 antibody previously reported by Kang et al. (2017), developed in the context of fat body studies, but it did not yield clear signals in larval MBNBs. However, previous work by Shi et al. (2007) clearly demonstrated Kr-h1 expression in the developing MB, including MBNBs, using a custom antibody developed by their lab. We also contacted the Lee lab to request this antibody, but unfortunately, it is no longer available. We will include the results obtained using the GFP BAC line in the revised manuscript and, if needed, pursue RNA in situ hybridisation to further validate Kr-h1 expression in MBNBs.

      (4) Temporal Kr knockdown and MARCM analysis

      We appreciate the suggestion to validate our RNAi-based temporal knockdown results using MARCM. We plan to perform MBNB-specific MARCM analysis following the strategy described by Rossi et al. (2020). However, this approach requires additional time due to the logistics of acquiring the necessary fly stocks, generating appropriate genetic combinations, and conducting clonal analyses. While we will make every effort to include these data, we note that RNAi-based knockdown offers the advantage of temporal reversibility and has been essential for assessing stage-specific requirements in our current study.

      (5) Details of the targeted genetic screen

      Kr was initially identified as part of a broader, ongoing effort to screen for candidate transcription factors and cell cycle regulators involved in neuroblast cell cycle exit and/or quiescence. As this screen is still preliminary and incomplete, we prefer not to include the full dataset at this stage. Instead, we will revise the manuscript to clarify that Kr was prioritised for further investigation based on the striking MBNB-specific phenotype observed upon RNAi-mediated knockdown and in the KrIf-1 mutant, rather than through a completed screening process.

      (6) Clarifying the model (Figure 6D) and interactions

      We will revise the proposed model to distinguish between experimentally supported interactions and speculative ones. As noted above, we will primarily focus on the Imp/Syp and E93 axis in relation to Kr and Kr-h1 activity. Chinmo will be omitted from the model unless further data become available to support its inclusion.

      (7) Clarifications on figures and data presentation

      We appreciate the feedback on figure clarity. We will revise figures such as 1B, 2C, and 3A to improve legibility and presentation. We will also correct typographical errors and figure references, and clarify the activity patterns of the GAL4 drivers. Specifically, while UASmCD8::GFP expression driven by OK107-GAL4 is markedly weaker in MBNBs than in their neuronal progeny (as seen, for example, in Figure S3C), the driver remains active and functionally relevant in MBNBs. We believe the weak expression in MBNBs likely explains the absence of a NB retention phenotype in OK107>KrIR adult brains (see main text, Lines 374–376). As suggested by the reviewer, we will clarify this point earlier in the manuscript and can include additional data showing OK107>GFP expression patterns in pupal MB lineages as supplementary material.

      (8) Analysis of public datasets

      We will include results from our analysis of publicly available datasets such as FlyAtlas2, modENCODE, and a time-course RNA-seq dataset specific to MBNBs (Liu et al., 2015). While the spatial resolution of FlyAtlas2 and modENCODE is limited, the MBNB dataset provides valuable temporal information up to 36 h after puparium formation (APF). From this dataset, we observe that Kr expression remains consistently low throughout development, with only a modest increase at 84 h ALH (mean TPM ~11) and 36 h APF (~7), suggesting it does not undergo strong transcriptional regulation in MBNBs. In contrast, Kr-h1 is highly expressed during early larval stages (24–84 h ALH; mean TPM ~55–60) and shows a marked suppression by 36 h APF (mean TPM ~2), consistent with its proposed role in promoting MBNB proliferation. Importantly, Eip93F (E93) exhibits a reciprocal pattern to Kr-h1—with minimal expression until 84 h ALH (mean TPM ~24), followed by a substantial induction at 36 h APF (mean TPM ~104), aligning with its known role in triggering neuroblast termination. These temporal expression dynamics support our model that Kr-h1 and E93 function in opposition during the transition from proliferative to terminating neuroblast states. We will summarise these findings in the revised manuscript, along with appropriate discussion of dataset limitations.
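      The reciprocal trend described above can be made concrete with a short calculation. The sketch below uses only the approximate mean TPM values quoted in the text and computes log2 fold changes between the larval and pupal time points. The variable and function names are illustrative, no pseudocount is applied, and the output is a rough summary rather than a substitute for a proper differential expression analysis.

```python
import math

# Approximate mean TPM values quoted in the text (MBNB time-course dataset).
kr_h1_larval_range = (55.0, 60.0)  # Kr-h1, 24-84 h ALH (quoted as ~55-60)
kr_h1_36h_apf = 2.0                # Kr-h1, 36 h APF
e93_84h_alh = 24.0                 # E93, 84 h ALH
e93_36h_apf = 104.0                # E93, 36 h APF


def log2_fold_change(later: float, earlier: float) -> float:
    """Return log2(later / earlier); positive means induction."""
    return math.log2(later / earlier)


# E93: induction between 84 h ALH and 36 h APF.
e93_lfc = log2_fold_change(e93_36h_apf, e93_84h_alh)

# Kr-h1: suppression, evaluated at both ends of the quoted larval range.
kr_h1_lfc = tuple(log2_fold_change(kr_h1_36h_apf, v) for v in kr_h1_larval_range)

print(f"E93 log2 fold change: {e93_lfc:.2f}")
print(f"Kr-h1 log2 fold change: {kr_h1_lfc[0]:.2f} to {kr_h1_lfc[1]:.2f}")
```

      Opposite signs for the two genes correspond to the reciprocal pattern described in the text.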

      We hope this provisional response conveys our strong commitment to thoroughly addressing the reviewers’ concerns and improving the manuscript. We are currently carrying out additional experiments and will submit a revised version with new data and enhanced clarity in due course.

      References:

      Kang et al., 2017. Sci Rep. 7(1):16369. doi: 10.1038/s41598-017-16638-1.

      Shi et al., 2007. Dev Neurobiol. 67(11):1614–1626. doi: 10.1002/dneu.20537.

      Rossi et al., 2020. eLife. 9:e58880. doi: 10.7554/eLife.58880.

      Liu et al., 2015. Science. 350(6258):317–320. doi: 10.1126/science.aad1886.

      Ren et al., 2017. Curr Biol. 27(9):1303–1313. doi: 10.1016/j.cub.2017.03.018.

      Syed et al., 2017. eLife. 6:e26287. doi: 10.7554/eLife.26287.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Garcia et al. describes how the expression of a respiratory chain alternative oxidase (AOX) from the tunicate Ciona intestinalis, capable of transferring electrons directly from reduced coenzyme Q (CoQ) to oxygen, is able to induce an increase in the mass of Drosophila melanogaster larvae and an accelerated development, especially when the larvae are kept at low temperatures. In order to explain this phenomenon, the paper addresses the modifications in the activity and levels of the 'canonical' electron transfer system (ETS), i.e., complexes I-IV and of the ATP synthase. In addition, the abundance of different metabolites as well as the NAD+/NADH ratios are measured, finding significant differences between the larvae.

      Strengths:

      The observations of differences in growth, body mass and food intake in the wt D. melanogaster larvae vs. those expressing the AOX transgene are solid. The evidence that mild uncoupling of the ETS might accelerate development of the fly larvae is convincing.

      We appreciate the reviewer’s attention to our results and hope we can improve the manuscript to address all criticism appropriately.

      Weaknesses:

      Some of the observations, especially those concerning the origin of the metabolic remodelling in AOX-expressing larvae, are left unexplained, and the argumentation is somewhat speculative. What the authors mean by "reconfiguration" of the mitochondrial electron transfer system is not clear. If this implies that there is an actual change in ETS function and/or structural organisation in the presence of AOX, this conclusion is not supported by the experimental data. In addition, the influence of AOX activity in the mitochondrial ETS system is tested in vitro in the presence of saturating concentrations of substrates. The real degree to which AOX activity is actually influencing ETS activity in vivo remains unknown.

      Indeed, the term “reconfiguration” may seem a little too strong. However, we do have preliminary structural data on larval mitochondria indicating that the term is adequate in this context. We plan to work on obtaining concrete data to sustain our claims that AOX imparts significant functional and structural remodeling of the organelle, which would be consistent with our respirometry and BN-PAGE data. If the data turns out not to be robust enough, we will consider replacing the term with one that better reflects our findings.

      We also realize that the in vivo data we are presenting (body mass, mobility, food intake) are indirect measurements of metabolism and that a more direct approach is necessary to assess the real degree to which AOX influences ETS activity in vivo. To address this issue, we plan to expand our pharmacological treatments during larval development and to measure whole-larva oxygen consumption.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents intriguing findings about the role of alternative oxidase (AOX) from the tunicate Ciona intestinalis in accelerating growth and development when expressed in Drosophila melanogaster.

      Strengths:

      The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures. The present study provides valuable insights into AOX's role in Drosophila development. The paper attempts to explore a unique mechanism by which AOX influences Drosophila development, providing insights into mitochondrial respiration and its physiological effects. This is relevant for understanding mitochondrial dysfunction and potential therapeutic applications. The study employs a variety of approaches, including calorimetry, infrared thermography, and genetic analyses, to investigate AOX's impact on metabolism and development.

      We sincerely thank the reviewer for recognizing the strengths and acknowledging the novelty of our study.

      Weaknesses:

      There are a number of methodological limitations and substantial gaps in the interpretation of the data presented, which reduces the strength of its conclusions. For instance, there is a misunderstanding of the non-proton motive nature of the AOX - it does not uncouple respiration, merely decouple it as it neither contributes to nor dissipates the proton motive force, in contrast to chemical uncouplers or proton uncouplers such as UCPs. The authors need to reassess their data in light of the above.

      The reviewer is absolutely right about the non-proton motive nature of AOX. We will reassess our data considering that AOX decouples respiration and, if necessary and possible, we will add new experiments to address the methodological limitations raised by the reviewer.

    1. Author response:

      We appreciate the reviewers' positive feedback on our paper. We especially thank them for their evaluation of the genetic analysis, which required a significant amount of time. We acknowledge that several aspects of our interpretation and description of the results need correction, as noted by both reviewers. Additionally, we recognize the importance of providing a more comprehensive overview of previous findings, including those conducted in mice, in the manuscript. In the revised version, we will thoroughly address the reviewers' concerns.

      Both reviewers emphasized the need for further validation to ascertain whether the specific requirement of Hox genes in the Hoxba and Hoxbb clusters for pectoral fin bud formation is due to their expression patterns or the functional roles of Hox proteins. This consideration has been on our agenda for some time; however, our submitted paper does not sufficiently address this aspect. In the revised manuscript, we will conduct a comprehensive analysis of the expression patterns of Hox genes in zebrafish to draw informed conclusions on this matter.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate how the viscoelasticity of the fingertip skin can affect the firing of mechanoreceptive afferents and they find a clear effect of recent physical skin state (memory), which is different between afferents. The manuscript is extremely well-written and well-presented. It uses a large dataset of low threshold mechanoreceptive afferents in the fingertip, where it is particularly noteworthy that the SA-2s have been thoroughly analyzed and play an important role here. They point out in the introduction the importance of the non-linear dynamics of the event when an external stimulus contacts the skin, to the point at which this information is picked up by receptors. Although clearly correlated, these are different processes, and it has been very well-explained throughout. I have some comments and ideas that the authors could think about that could further improve their already very interesting paper. Overall, the authors have more than achieved their aims, where their results very much support the conclusions and provoke many further questions. This impact of the previous dynamics of the skin affecting the current state can be explored further in so many ways and may help us to better understand skin aging and the effects of anatomical changes of the skin.

      At the beginning of the Results, it states that FA-2s were not considered as stimuli did not contain mechanical events with frequency components high enough to reliably excite them. Was this really the case, did the authors test any of the FA-2s from the larger dataset? If FA-2s were not at all activated, this is also relevant information for the brain to signal that it is not a relevant Pacinian stimulus (as they respond to everything). Further, afferent receptive fields that were more distant to the stimulus were included, which likely fired very little, like the FA-2s, so why not consider them even if their contribution was low?

      Thank you for bringing this up. We have now clarified in the text that, while FA-2s did respond at a low rate during the experiment, their responses were not reliably driven by the force stimuli. In the Methods section we have included the following text:

      “Initially, 10 FA-2 neurons were also included in the analysis. But their responsiveness during the experiment was remarkably low, and unlike the other neuron types, their responses were rarely affected by force stimuli. Specifically, only one of the observed FA-2 neurons responded during the force protraction phases. Due to the lack of clear stimulus-driven responses, FA-2 neurons were subsequently excluded from further analysis.”

      One question that I wondered throughout was whether you have looked at further past history in stimulation, i.e. not just the preceding stimulus, but 2 or 3 stimuli back? It would be interesting to know if there is any ongoing change that can be related back further. I do not think you would see anything as such here, but it would be interesting to test and/or explore in future work (e.g. especially with sticky, forceful, or sharp indentation touch). However, even here, it could be that certain directions gave more effects.

      This is a very interesting question! A discernible effect from the previous stimulus could persist at the end of the current stimulation (see Figure 4C), potentially influencing the next one—a 2-stimuli-back effect. Unfortunately, our experimental design did not allow for rigorous testing of this effect. While all possible pairs of stimulus directions were included in immediately consecutive trials, this was not the case for pairs separated by additional trials. Hence, the combination of a likely weak effect and limited variation in history precluded a thorough analysis of a 2-stimuli-back effect. Future work should delve into the time course of the viscoelastic effect in greater detail.

      Did the authors analyze or take into account the difference between receptive field locations? For example, did afferents more on the sides have lower responses and a lesser effect of history?

      An investigation into the potential impact of the relationship between the receptive field location on the fingertip skin and the primary contact site of the stimulus surface revealed no discernible influence for SA-1 and SA-2 neurons. In contrast, FA-1 neurons, particularly those predominantly sensitive to the previous stimulation or displaying mixed sensitivity, exhibited a tendency to terminate near the primary stimulation site. We have added these observations to the text:

      “We found no straightforward relationship between a neuron's sensitivity to current and previous stimulation and its termination site in fingertip skin. Specifically, there was no statistically significant effect of the distance between a neuron's receptive field center and the primary contact site of the stimulus surface on whether neurons signaled current, prior, or mixed information for SA-1 (Kruskal-Wallis test H(2)=3.86, p= 0.15) or SA-2 neurons (H(2)=0.75, p=0.69). However, a significant difference emerged for FA-1 neurons (H(2)=8.66, p=0.01), indicating that neurons terminating closer to the stimulation site on the flat part of the fingertip were more likely to signal past or mixed information.”
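      As a rough cross-check of the quoted statistics, the reported p-values can be recovered from the H statistics, assuming they were obtained from the usual chi-square approximation of the Kruskal-Wallis test. With 2 degrees of freedom, the chi-square survival function reduces to exp(-H/2). The function name below is illustrative and does not come from the manuscript.

```python
import math


def kruskal_p_df2(h_stat: float) -> float:
    """Asymptotic Kruskal-Wallis p-value for 2 degrees of freedom.

    Assumes the chi-square approximation; for df = 2 the survival
    function of the chi-square distribution is exp(-x / 2).
    """
    return math.exp(-h_stat / 2.0)


# (H statistic, reported p-value) as quoted in the text above.
reported = {
    "SA-1": (3.86, 0.15),
    "SA-2": (0.75, 0.69),
    "FA-1": (8.66, 0.01),
}

p_values = {name: kruskal_p_df2(h) for name, (h, _) in reported.items()}

for name, (h, p_reported) in reported.items():
    print(f"{name}: H(2)={h:.2f}, p={p_values[name]:.3f} (reported {p_reported})")
```

      Under this assumption, the recomputed values agree with the reported p-values when rounded to two decimal places.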

      Was there anything different in the firing patterns between the spontaneous and non-spontaneously active SA-2s? For example, did the non-spontaneous show more dynamic responses?

      The firing patterns of both spontaneously and non-spontaneously active SA-2 neurons shared similarities in terms of adaptation and range of firing rate modulation in response to force stimuli, i.e., ‘dynamic response’. The distinction lay in the pattern of modulation of the firing rate associated with stimulus presentations. For spontaneously active SA-2 neurons, this modulation occurred around a significant background discharge, implying that a force stimulus could either decrease or increase the firing rate, depending on how it deformed the fingertip. This characteristic is well illustrated by the firing pattern of the neuron depicted in the lower panels of Figure 3D. Conversely, in non-spontaneously active SA-2 neurons, a force stimulus could only induce an increase in the firing rate or no change. Although the neuron depicted in the upper panels of Figure 3D exhibited some background activity, it serves to exemplify this characteristic. In the text, we have elucidated the dynamics of the SA-2 neuron response by highlighting that force stimulation can either decrease or increase the firing rate in neurons with spontaneous activity through the following addition/change:

      “This increased variability was most evident during the force protraction phase where most neurons exhibited the most intense responses. Increased variability was also observed in instances where the dynamic response to force stimulation involved a decrease in the firing rate (lower panels of Figure 3D). This phenomenon was observed in SA-2 neurons that maintained an ongoing discharge during intertrial periods (cf. Fig. 2A). In these cases, the response to a force stimulus constituted a modulation of the firing rate around the background discharge, signifying that a force stimulus could either decrease or increase the firing rate depending on the prevailing stimulus direction.”

      Were the spontaneously active SA-2 afferents firing all the time or did they have periods of rest - and did this relate to recent stimulation? Were the spontaneously active SA-2s located in a certain part of the finger (e.g. nail) or were they randomly spread throughout the fingertip? Any distribution differences could indicate a more complicated role in skin sensing.

      SA-2 neurons, in general, are well-known for undergoing significant post-stimulation depression (e.g., Knibestöl and Vallbo, 1970; Chambers et al., 1972; Burgess and Perl, 1973). In our force stimulations, this post-excitatory depression manifested as a reduced or absent response during the latter part of the stimulus retraction period for stimuli in directions that markedly excited the neuron. The excitability recovered when the fingertip relaxed during the subsequent intertrial period, and for "spontaneously active" neurons, the firing resumed (see examples in Figure 7A). Furthermore, some “spontaneously active” neurons could be silenced or exhibit a near-silent period during force stimulation for certain force directions, while the spontaneous firing returned during the upcoming intertrial period when the fingertip shape recovered (for example, see responses to stimulation in the proximal and especially ulnar directions in the top panel in Figure 7A).

      Regarding the location of the receptive field centres of spontaneously active and non-spontaneously active SA-2 neurons on the fingertip we did not observe any obvious spatial segregation. To illustrate this, we have revised Figure 1A by color-marking SA-2 neurons that exhibited ongoing activity in intertrial periods, and the figure caption has been modified accordingly:

      “Figure 1. Experimental setup. A. Receptive field center locations shown on a standardized fingertip for all first-order tactile neurons included in the study, categorized by neuron type. Purple symbols denote spontaneously active SA-2 neurons exhibiting ongoing activity without external stimulation.”

      Did the authors look to see if the spontaneous firing in SA-2s between trials could predict the extent to which the type 1 afferents encode the proceeding stimulus? Basically, does the SA-2 state relate to how the type 1 units fire?

      We found no clear indications that the responses of FA-1 and SA-1 could be readily anticipated based on the firing patterns of SA-2 neurons.

      In the discussion, it is stated that "the viscoelastic memory of the preceding loading would have modulated the pattern of strain changes in the fingertip differently depending on where their receptor organs are situated in the fingertip". Can the authors expand on this or make any predictions about the size of the memory effect and the distance from the point of stimulation?

      We have explored this topic further in the text, referring to recent studies modeling essential aspects of fingertip mechanics. However, in our view, current models cannot yet make the specific predictions sought by the reviewer. To do so, they would need to incorporate a detailed understanding of the intricate networks of collagen fibers anchoring the pulp tissue at the distal phalangeal bone and the nail. They would also need to consider potential inherent directional preferences of the receptor organs, attributed to their microanatomy. The text modifications are as follows:

      “In addition to the receptor organ locations, the variation in sensitivity among neurons to fingertip deformations in response to both previous and current loadings would stem from the fingertip’s geometry and its complex composite material properties. Possible inherent directional preferences of the receptor organs, attributed to their microanatomy, could also be significant. However, mechanical anisotropy, particularly within the viscoelastic subcutaneous tissue of the fingertip induced by intricately oriented collagen fiber strands forming fat columns in the pulp (Hauck et al., 2004), is likely to play a crucial role. This anisotropy would shape the dynamic pattern of strain changes at neurons' receptor sites, intricately influencing a neuron's sensitivity not only to current but also to preceding loadings. Indeed, recent modeling efforts suggest that such mechanical anisotropy strongly influences the spatiotemporal distribution of stresses and strains across the fingertip (Duprez et al., 2024).”

      Relatedly, we have included additional text to provide a more comprehensive explanation of the “bulk deformation” of the fingertip that occurs during the loadings:

      “As pressure increases in the pulp, the pulp tissue bulges at the end and sides of the fingertip. Simultaneously, the tangential force component amplifies the bulging in the direction of the force while stretching the skin on the opposite side.”

      In the discussion, it would be good if the authors could briefly comment more on the diversity of the mechanoreceptive afferent firing and why this may be useful to the system.

      The diversity in responses among neurons is instrumental in enhancing the information transmitted to the brain by averting redundancy in information acquisition. This diversity thereby contributes to an overall increase in information. We've included a brief statement, along with several references, underscoring this concept:

      "The resulting diversity in the sensitivities of neurons might enhance the overall information collected and relayed to the brain by the neuronal population, facilitating the discrimination between tactile stimuli or mechanical states of the fingertip (see Rongala et al., 2024; Corniani et al., 2022; Tummala et al., 2023, for more extensive explorations of this idea)."

      Also, the authors could briefly discuss why this memory (or recency) effect occurs - is it useful, does it serve a purpose, or is it just a by-product of our skin structure? There are examples of memory in the other senses where comparisons could be drawn. Is it like stimulus adaptation effects in the other senses (e.g. aftereffects of visual motion)?

      We have expanded the concluding paragraph of the discussion, specifically delving into the question of whether the mechanical memory effect serves a deliberate purpose or is simply an incidental byproduct of our skin structure:

      “In any case, the viscoelastic deformability of the fingertips plays a pivotal role in supporting the diverse functions of the fingers. For example, it allows for cushioned contact with objects featuring hard surfaces and allows the skin to conform to object shapes, enabling the extraction of tactile information about objects' 3D shapes and fine surface properties. Moreover, deformability is essential for the effective grasping and manipulation of objects. This is achieved, among other benefits, by expanding the contact surface, thereby reducing local pressure on the skin under stronger forces and enabling tactile signaling of friction conditions within the contact surface for control of grasp stability. Throughout, continuous acquisition of information about various aspects of the current state of the fingertip and its skin by tactile neurons is essential for the functional interaction between the brain and the fingers. In light of this, the viscoelastic memory effect on tactile signaling of fingertip forces can be perceived as a by-product of an overall optimization process within prevailing biological constraints.”

      One point that would be nice to add to the discussion is the implications of the work for skin sensing. What would you predict for the time constant of relaxation of fingertip skin, how long could these skin memory effects last? Two main points to address here may be how the hydration of the skin and anatomical skin changes related to aging affect the results. If the skin is less viscoelastic, what would be the implications for the firing of mechanoreceptors?

      It is likely that the time constant depends to some extent on mechanical factors of the skin, which will likely change due to age or environmental factors. However, while these questions are intriguing, they fall outside the scope of the current study and we are not aware of studies that have addressed these issues directly in experiments either.

      How long does it take for the effect to end? Again, this will likely depend on the skin's viscoelasticity. However, could the authors use it in a psychophysical paradigm to predict whether participants would be more or less sensitive to future stimuli? In this way, it would be possible to test whether the direction modifies touch perception.

      Time constants for tissue viscoelasticity have been estimated to extend up to several seconds (see citations in the introduction). While direct perceptual effects could indeed be explored through psychophysical experimental paradigms, we are currently unaware of any studies specifically addressing the type of effect described in this study. In addition to the statement that, concerning manipulation and haptic tasks, "to our knowledge, a possible influence of fingertip viscoelasticity on task performance has not been systematically investigated," we have now also addressed tactile psychophysical tasks conducted during passive touch with the following sentence in the text:

      “Similarly, there is a lack of systematic investigation of potential effects of fingertip viscoelasticity on performance in tactile psychophysical tasks conducted during passive touch.”

      Reviewer #2 (Public Review):

      Summary:

      The authors sought to identify the impact skin viscoelasticity has on neural signalling of contact forces that are representative of those experienced during normal tactile behaviour. The evidence presented in the analyses indicates there is a clear effect of viscoelasticity on the imposed skin movements from a force-controlled stimulus. Both skin mechanics and evoked afferent firing were affected based on prior stimulation, which has not previously been thoroughly explored. This study outlines that viscoelastic effects have an important impact on encoding in the tactile system, which should be considered in the design and interpretation of future studies. Viscoelasticity was shown to affect the mechanical skin deflections and stresses/strains imposed by previous and current interaction force, and also the resultant neuronal signalling. The result of this was an impaired coding of contact forces based on previous stimulation. The authors may be able to strengthen their findings, by using the existing data to further explore the link between skin mechanics and neural signalling, giving a clearer picture than demonstrating shared variability. This is not a critical addition, but I believe would strengthen the work and make it more generally applicable.

      Strengths:

      - Elegant design of the study. Direct measurements have been made from the tactile sensory neurons to give detailed information on touch encoding. Experiments have been well designed and the forces/displacements have been thoroughly controlled and measured to give accurate measurements of global skin mechanics during a set of controlled mechanical stimuli.

      - Analytical techniques used. Analysis of fundamental information coding and information representation in the sensory afferents reveals dynamic coding properties to develop putative models of the neural representation of force. This advanced analysis method has been applied to a large dataset to study neural encoding of force, the temporal dynamics of this, and the variability in this.

      Weaknesses:

      - Lack of exploration of the variation in neural responses. Although there is a viscoelastic effect that produces variability in the stimulus effects based on prior stimulation, it is a shame that the variability in neural firing and force-induced skin displacements have been presented, and are similarly variable, but there has been no investigation of a link between the two. I believe with these data the authors can go beyond demonstrating shared variability. The force per se is clearly not faithfully represented in the neural signal, being masked by stimulation history, and it is of interest if the underlying resultant contact mechanics are.

      Thank you for this suggestion. We have added a new section investigating the link between skin deformation and neural firing in more depth via a simple neural model. Please see our answer below in the ‘Recommendations’ section for further details.

      Validity of conclusions:

      The authors have succeeded in demonstrating that skin viscoelasticity has an impact on skin contact mechanics with a given force and that this impacts the resultant neural coding of force. Their study has been well-designed and the results support their conclusions. The importance and scope of the work are adequately outlined for readers to interpret the results and significance.

      Impact:

      This study will have important implications for future studies performing tactile stimulation and evaluating tactile feedback during motor control tasks. In detailed studies of tactile function, it illustrates the necessity to measure skin contact dynamics to properly understand the effects of a force stimulus on the skin and mechanoreceptors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (Very) minor comments

      - The authors say at the beginning of the Results that, "The fourth type of tactile neurons in the human glabrous skin, fast adapting type II neurons...". Although generally written that there are four types of afferent in the glabrous skin, it would be better to state that these are low-threshold A-beta myelinated mechanoreceptive afferents, at least one time, as there are other types of afferent in the glabrous skin that respond to mechanical stimulation (e.g. low and high threshold C-fibers).

      This is now clarified at the start of the Results section:

      “We recorded action potentials in the median nerve of individual low-threshold A-beta myelinated first-order human tactile neurons innervating the glabrous skin of the fingertip…”

      - Fig. 3: Could you add '(N)' as the unit of force in Fig. 3A for Fx, Fy, and Fz? Also, please change 'Data was recorded' to 'Data were recorded' in the legend.

      Fixed.

      - At the beginning of the Methods, you say that your study conforms to the Declaration of Helsinki, which actually requires pre-registration in a database. If you did not pre-register your study, please can you add '... in accordance with the Declaration of Helsinki, apart from pre-registration in a database'.

      Thanks for making us aware of this. We have added the suggested qualifier to the ethics statement.

      Reviewer #2 (Recommendations For The Authors):

      The neural representation/encoding of the actual displacement vectors would be a useful addition to the analyses. These vectors have been demonstrated to systematically change with the condition in the irregular series (Figure 2E) and will thus significantly act on the dynamics of induced mechanical changes in the skin with a given interaction force. Thus, it could be examined how the neurons code the magnitude of displacements as well as their direction. An evaluation of the extent to which the imposed displacement magnitudes are encoded in the neural responses would be a useful addition in explaining the signalling of the force events and how the central nervous system decodes these. Evaluating an alternative displacement encoding for comparison to pure force encoding may reveal more about how contact events are represented in the tactile system, which must decode these variable afferent signals to reconstruct a percept of the interaction. It could then be explored how the central nervous system may then scale the dynamic afferent responses based on the background viscoelastic state likely to be present in the SA-II afferent signals (Figure 7) for a context in which to evaluate the dynamic contact forces. This may of course be a complex relationship for the type-I afferents, where the underlying mechanical events evoking the firing (microslips not represented in global forces) have not been measured here. Such a model could be more widely applicable, as the skin viscoelasticity and displacement magnitudes are a straightforward measurement metric and could perhaps be used as a better proxy for neural signalling. This would allow the investigation of a wider variety of forces, and the study of the timing of the viscoelastic effect, both of which have been fixed here. 
      This would give the work a broader impact: rather than just highlighting that this effect produces variability, it could reveal if this mechanical feature is structured in the neural representation. The categorical encoding/decoding tested here is specific to the stimuli used (magnitudes, intervals), but there is the possibility that this may be more generally applicable (within the bounds of forces/speeds) if the underlying basis of the variability in the signalling produced by the viscoelasticity is identified. Since the time course of the viscoelasticity has not been measured here (fixed forces and intervals), further study is required to fully understand the implications this has for a wider variety of situations.

      We agree that a better understanding of how the mechanical deformations are reflected in the resulting spike trains would be valuable. While ultimately a full understanding will need precise measurements of skin deformation across the whole fingertip to account for mechanical propagation to mechanoreceptor locations, relating the deformations at the contact location with neural firing patterns directly can provide useful hints into which aspects of deformation are encoded and how. To this end, we ran a new analysis that aimed to predict the time-varying neural responses directly from the recorded mechanical movements of the contactor.

      Below we have reproduced the new results and methods text along with the additional figures for this analysis. Note that we have also added text in the Discussion to interpret these findings in the context of our other results.

      New section in Results titled Predicting neural responses from contactor movements: “The similarity in the history-dependent variation in neural firing and fingertip deformation at a given force stimulus suggests that neuronal firing is determined by how the fingertip deforms rather than the applied force itself. However, this similarity does not clarify the relationship between fingertip deformation dynamics and neural signaling. To investigate further, we fit cross-validated multiple linear regression models to evaluate how well distinct aspects of contactor movement could predict the time-varying firing rates of individual neurons during the protraction phases of the irregular sequence. The models used predictors based on (1) the three-dimensional position of the contactor, (2) its three-dimensional velocity, (3) a combination of position and velocity signals, and, finally, (4) position and velocity signals along with all possible two-way interactions between them, capturing potentially complex relationships between fingertip deformations and neural signaling.

      Comparing the variance explained (R<sup>2</sup>) by each regression model for each neuron type revealed clear differences between the models (Figure 5A). A two-way mixed design ANOVA, with regression model as within-group effects and neuron type as a between-group effect revealed a main effect of model on variance explained (F(3,462) = 815.5, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.84). Model prediction accuracy overall increased with the number of predictors, with the two-way interaction model outperforming all others (p < 0.001 for all comparisons, Tukey’s HSD). Additionally, a significant main effect of neuron type (F(2,154) = 29.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.28) and a significant interaction between regression model and neuron type were observed (F(6,462) = 50.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.40).

      For neuron type, model predictions were most accurate for SA-2 neurons, followed by SA-1 neurons, with FA-1 neurons showing the lowest accuracy (p < 0.003 for all comparisons, Tukey’s HSD). The interaction between model and neuron type revealed distinct patterns. For SA-1 and SA-2 neurons, position-only and velocity-only models had similar prediction accuracy (p ≥ 0.996, Tukey’s HSD) with no significant differences between these neuron types (p ≥ 0.552, Tukey’s HSD). FA-1 neurons performed poorly with the position-only model but showed higher accuracy with the velocity-only model (p < 0.001, Tukey’s HSD) and better than SA-1 neurons (p = 0.006, Tukey’s HSD). Models combining position and velocity predictors (without interactions) surpassed both position-only and velocity-only models for SA-1 and SA-2 neurons (p < 0.001, Tukey’s HSD). Overall, the differences between neuron types broadly match their tuning to static and dynamic stimulus properties.

      The two-way interaction model, accounting for most variance in neural responses, produced mean R<sup>2</sup> values of 0.75 for FA-1, 0.88 for SA-1, and 0.91 for SA-2 neurons (Figure 5A). To evaluate the contribution of the different predictors, we ranked them using the permutation feature importance method, focusing on the six most important ones. Regression analyses using only these variables explained almost all of the variance explained by the full model, with a median R<sup>2</sup> reduction of just 0.055 across all neurons. Across all neuron types, at least half included all three velocity components (dPx, dPy, dPz) among the top six, with FA-1 neurons showing the highest prevalence (Figure 5B). Interactions between normal position (Pz) and each velocity component were also frequently observed, while interactions involving tangential position and velocity components were less common. Interactions among velocity components were relatively well represented, followed by interactions limited to position components. Position signals were generally less represented, except for normal position (Pz) in slowly adapting neurons, where it appeared in 50% of SA-1 and 68% of SA-2 neurons. Despite these broad trends, important predictors varied widely across ranks even within a given neuron class (see Figure 5-figure supplement 1), and even the most frequent variables appeared in only a subset of cases, suggesting broad variability in sensitivity across neurons.”
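      The effect sizes quoted in the ANOVA above can be cross-checked from the reported F ratios and degrees of freedom, since partial eta squared equals F·df_effect / (F·df_effect + df_error). The short Python sketch below is an illustrative consistency check only and is not part of the authors' analysis; the function name and the dictionary layout are hypothetical, and the neuron counts are those given in the methods paragraph of this response.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared), copied from the quoted Results text.
reported = {
    "model": (815.5, 3, 462, 0.84),
    "neuron type": (29.8, 2, 154, 0.28),
    "model x neuron type": (50.8, 6, 462, 0.40),
}

for effect, (f_value, df1, df2, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{effect:20s} computed={eta:.3f} reported={eta_reported:.2f}")

# Error degrees of freedom implied by the sample sizes in the methods:
# 68 SA-1 + 38 SA-2 + 51 FA-1 neurons, three neuron types, four regression models.
n_neurons, n_groups, n_models = 68 + 38 + 51, 3, 4
df_between_error = n_neurons - n_groups                 # between-group error term
df_within_error = (n_models - 1) * df_between_error     # within-group error term
print(df_between_error, df_within_error)
```

      Under these assumptions the recomputed values agree with the reported effect sizes to two decimals, and the implied error degrees of freedom match those in the reported F statistics.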

      New methods paragraph titled Predicting time-varying firing rates from skin deformations:

      “This analysis was conducted in Python (v3.13) with pandas for data handling, numpy for numerical operations, and scikit-learn for model fitting and evaluation.

      To assess how well individual neurons' time-varying firing rates could be predicted from simultaneous contactor movements, we fitted multiple linear regression models (see Khamis et al., 2015, for a similar approach). This analysis focused on the force protraction phase of the irregular sequence, where neurons were most responsive and sensitive to stimulation history. Data from 100 ms before to 100 ms after the protraction phase (between -0.100 s and 0.225 s relative to protraction onset) were included for each trial. Neurons were included if they fired at least two action potentials during the force protraction phase and the following 100 ms in at least five of the 25 trials. This ensured sufficient variability in firing rates for meaningful regression analysis, resulting in 68 SA-1, 38 SA-2, and 51 FA-1 neurons being included.

      Contactor position signals digitized at 400 Hz were linearly interpolated to 1000 Hz. Instantaneous firing rates, derived from action potentials sampled at 12.8 kHz, were resampled at 1000 Hz to align with position signals. A Gaussian filter (σ = 10 ms, cutoff ~16 Hz) was applied to the firing rate as well as to the position signals before differentiation. To account for axonal conduction (8–15 ms) and sensory transduction delays (1–5 ms), firing rates were advanced by 15 ms to align approximately with independent variables.

      Regressions were performed using scikit-learn's Ridge and RidgeCV regressors, which apply L2 regularization to mitigate overfitting. Hyperparameter tuning for the regularization parameter (alpha) was performed using GridSearchCV with a predefined range (0.001–1000.0), incorporating five-fold cross-validation to select the best value. To minimize overfitting risks, model performance was further validated with independent five-fold cross-validation (KFold), and R<sup>2</sup> scores were computed using cross_val_score.

      We constructed four linear regression models with increasing complexity: (1) Position-only, using three-dimensional contactor positions (Px, Py, Pz); (2) Velocity-only, using three-dimensional velocities (dPx, dPy, dPz); (3) Combined, including all position and velocity signals (6 predictors); and (4) Interaction, including all signals and their two-way interactions (21 predictors). All features were standardized using StandardScaler to improve regularization and model convergence. PolynomialFeatures generated second-order interaction terms for the interaction model. Feature importance was evaluated with permutation_importance, and simpler models were built using the most important features. These models were validated through cross-validation to assess retained explanatory power.”

      Minor:

      - It would be useful to add a brief description of the material aspects of the contactor tip to the methods (as per Birznieks 2001).

      We have added the following statement:

      “To ensure that friction between the contactor and the skin was sufficiently high to prevent slips, the surface was coated with silicon carbide grains (50–100 μm), approximating the finish of smooth sandpaper.”

      - The axes labelling on Figure 3A and legend description is ambiguous, probably placing the Px, Py, and Pz labels on the far left axes and the Fx, Fy, and Fz on the right side of the far right axes would make this clearer.

      Label placement has been improved along with some other minor fixes.

      - For the quasi-static phase analysis, the phrase "absence of loading" used in reference to the interstimulus period and SA-II afferents does not seem to be a correct description. The finger is still loaded (at least in the normal direction), with a magnitude of imposed displacement that counteracts the viscoelastic force exerted by the skin mechanics of the fingertip. Although there is a zero net-force load, a mechanical stimulus is still being actively applied to the skin.

      We have changed the wording throughout the text and now consistently refer either to the “interstimulus period” directly or to an “absence of externally applied stimulation” to avoid confusion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The revised paper by Kim et al. reports that two disease mutations in proBMP4, S91C and E93G, disrupt the FAM20C phosphorylation site at Ser91, blocking the activation of proBMP4 homodimers, while still allowing BMP4/7 heterodimers to function. Analysis of DMZ explants from Xenopus embryos expressing the proBMP4 S91C or E93G mutants showed reduced expression of pSmad1 and tbxt1. The expert amphibian tissue transplant studies were expanded to in vivo studies in Bmp4S91C/+ and Bmp4E93G/+ mice, highlighting the impact of these mutations on embryonic development, particularly in female mice, consistent with patient studies. Additionally, studies in mouse embryonic fibroblasts (MEFs) demonstrated that the mutations did not affect proBMP4 glycosylation or ER-to-Golgi transport but appeared to inhibit the furin-dependent cleavage of proBMP4 to BMP4. Based on these findings and AI modeling using AlphaFold of proBMP4, the authors speculate that pSer91 influences access of furin to its cleavage site at Arg289AlaLysArg292 in a new "Ideas and Speculation" section. Overall, the authors addressed the reviewers' comments, improving the presentation.

      Strengths:

      The strengths of this work continue to lie in the elegant Xenopus and mouse studies that elucidate the impact of the S91C and E93G disease mutations on BMP signaling and embryonic development. Including an "Ideas and Speculation" subsection for mechanistic ideas reduces some shortcomings regarding the analysis of the underlying mechanisms.

      Weaknesses:

      (1)  (Minor) In Figure S1 and lines 165-174 and 179-180, the authors should consider that, unlike the wild-type protein (Ser), which can be reversibly phosphorylated or dephosphorylated, phosphomimic mutations are locked into mimicking either the phosphorylated state (Asp) or the non-phosphorylated state (Ala). Consequently, if the S91D mutant exhibits lower activity than WT, it could imply that S91D interferes with other regulatory constraints, as the authors suggest. However, it may also be inhibiting activation. Therefore, caution is warranted when comparing S91D with S91C to conclude that Ser91 phosphorylation increases BMP4 activity. While additional experiments are not necessary, further consideration is essential.

      (Minor) In lines 394-399, the authors cleverly speculate that pS91 interacts with Arg289-the essential P4 arginine for furin processing. If so, this interaction could hinder the cleavage of proBMP4, as indicated by the results in Figure S1. The discussion would benefit from considering that, contrary to their favored model, dephosphorylation at Ser91 might actually facilitate cleavage.

      We have added a paragraph raising this possibility but explaining why it is unlikely and inconsistent with our in vivo data. The S91D construct was a simple control that was tested in ectopic expression assays and not in vivo.  We can make no conclusions about whether this construct resembles the phosphorylated state or whether it hinders or facilitates cleavage in vivo. The conclusion that dephosphorylation promotes BMP4 cleavage or activity is not compatible with the finding that two mutations associated with birth defects in humans (p.S91C or p.E93G) that are predicted to prevent FAM20C-mediated phosphorylation of the BMP4 prodomain lead to impaired proteolytic maturation of endogenous BMP4 and reduced BMP activity in vivo. 

      (2)  In Figure 4, panels A, E, and I, the proBMP bands in the mouse embryonic lysates and MEFs expressing the mutations show a clear size shift. Are these shifts a cause or a consequence of the lack of cleavage? Regardless, the size shifts should be explicitly noted.

      These intriguing shifts were observed in some but not all biological replicates.  When present, the shifts were not reversed by treatment with phosphatases or deglycosylases, and the shifts were never observed in epitope tagged wild type controls.  We have added a paragraph noting the shifts and our tests of whether they might be due to glycosylation, phosphorylation or epitope tags. 

      (3)  (Minor) In line 314, the authors should consider modifying the wording to: "is required for modulating proprotein convertase..."

      The original wording (“Collectively, our findings are consistent with a model in which FAM20C-mediated phosphorylation of the BMP4 prodomain is not required for folding or exit of the precursor protein from the ER, but is required for proprotein convertase recognition and/or for trafficking to post-TGN compartment(s) where BMP4 is cleaved”) more accurately reflects the model that is supported by our findings. Stating that “phosphorylation ……is required to modulate proprotein convertase recognition and/or trafficking” is vague and leaves open the possibility that it modulates in either direction, which our data do not support as described in point 1 above.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study investigates the role of microtubules in regulating insulin secretion from pancreatic islet beta cells. This is of great importance considering that controlled secretion of insulin is essential to prevent diabetes. Previously, it has been shown that KIF5B plays an essential role in insulin secretion by transporting insulin granules to the plasma membrane. High glucose activates KIF5B to increase insulin secretion resulting in the cellular uptake of glucose. In order to prevent hypoglycemia, insulin secretion needs to be tightly controlled. Notably, it is known that KIF5B plays a role in microtubule sliding. This is important, as the authors described previously that beta cells establish a peripheral sub-membrane microtubule array, which is critical for the withdrawal of excessive insulin granules from the secretion sites. At high glucose, the sub-membrane microtubule array is destabilized to allow for robust insulin secretion. Here the authors aim to answer the question of how the peripheral array is formed. Based on the previously published data the authors hypothesize that KIF5B organizes the sub-membrane microtubule array via microtubule sliding. 

      General comment: 

      This manuscript provides data that indicate that KIF5B, like in many other cells, mediates microtubule sliding in beta cells. This study is limited to in vitro assays and one cell line. Furthermore, the authors provide no link to insulin secretion and glucose uptake and the overall effects described are moderate. Finally, the overall effect of microtubule sliding upon glucose stimulation is surprisingly low considering the tight regulation of insulin secretion. Moreover, the authors state "the amount of MT polymer on every glucose stimulation changes only slightly, often undetectable…. In fact, we observe a prominent effect of peripheral MT loss only after a long-term kinesin depletion (three-four days)". This challenges the view that a KIF5B-dependent mechanism regulating microtubule sliding plays a major role in controlling insulin secretion. 

      (1) Our initial study was indeed done in a cell line, which is a normal approach to addressing molecular mechanisms of a phenomenon in a challenging cell model: primary pancreatic beta cells are prone to rapidly dedifferentiate outside of the organism and are hard to genetically modify. To address this reviewer’s comment, in the revised manuscript we now confirm the phenotype in beta cells within intact pancreatic islets from a KIF5B KO mouse model (New Figure 2 – Supplemental Figure 1).

      (2) We agree that testing the effect of microtubule sliding on insulin secretion is an important question. Unfortunately, the experimental design needed to accomplish this task is not straightforward. Importantly, besides microtubule sliding, KIF5B is heavily engaged in insulin granule transport, and GSIS deficiency upon KIF5B inactivation is well documented (e.g. Varadi et al 2002). In this study, we choose not to repeat this GSIS assay because of ample existing data. However, this reported GSIS deficiency could result from a combination of a lack of insulin granule delivery to the periphery (previous data) and the depletion of insulin granules from the periphery due to the loss of the submembrane MT bundle (this study and Bracey et al 2020). In order to exclusively test the role of MT sliding in secretion, a significant investment in mutant tool development would be needed. Ideally, a new mutant mouse model in which insulin granule transport is allowed but MT sliding is blocked must be developed to specifically address this question. To conclude, answering this question will be the subject of another, follow-up study. 

      (3) We respectfully disagree with the reviewer’s opinion that the effect of MT sliding in beta cells is moderate. As MT networks go, even a slight change in MT configuration often has dramatic consequences. For example, in mitotic spindles, a tiny overgrowth of microtubule ends during metaphase, which causes them to attach to both kinetochores rather than just one, is very significant for the efficiency of chromosome segregation, causing aneuploidy and cancer. The changes in beta-cell MT networks that we are reporting are much stronger: the effect on the peripheral MT network accumulated over three days of KIF5B depletion is dramatic (Fig 2 B, C). Short-term gross MT network configurations after a single glucose stimulation are harder to detect, but MTs at the cell periphery are, in fact, destabilized and fragmented, as we and others have previously reported (Ho et al 2020, Mueller et al 2021). Preventing this MT rearrangement completely blocks GSIS (Zhu et al 2015, Ho et al 2020). 

      One of the most fascinating features of insulin secretion regulation is that the amount of generated insulin granules significantly exceeds the normal physiological needs for insulin secretion (~100 times more than needed). At the same time, even slightly facilitated glucose depletion can be devastating. Accordingly, the excessive insulin content of a beta cell resulted in the development of multiple levels of control, preventing excessive secretion. Our previous data suggest that the peripheral MT array provides one of those mechanisms. This study indicates that microtubule sliding is necessary to form the proper peripheral network in the long term. Short-term glucose-induced changes in the peripheral MT array likely need to be subtle to prevent over-secretion. Thus, we are not surprised that a dramatic effect of sliding inhibition is only detectable by our approaches after the changes in the MT network accumulate over time. In the revised paper, we now discuss the potential impact of peripheral MT sliding on positive and negative regulation of secretion and add a schematic model illustrating these processes.

      Specific comments: 

      (1) Notably, the authors have previously reported that high glucose-induced remodeling of microtubule networks facilitates robust glucose-stimulated insulin secretion. This remodeling involves the disassembly of old microtubules and the nucleation of new microtubules. Using real-time imaging of photoconverted microtubules, they report that high levels of glucose induce rapid microtubule disassembly preferentially in the periphery of individual β-cells, and this process is mediated by the phosphorylation of microtubule-associated protein tau. Here, they state that the sub-membrane microtubule array is destabilized via microtubule sliding. What is the relevance of the different processes? 

      In this comment, the summary of our previous conclusions is correct, but the conclusion of this current study is re-stated incorrectly. Indeed, we have previously shown that in high glucose, MTs are destabilized at the cell periphery and nucleated in the cell interior. However, this current paper does not state that “the sub-membrane microtubule array is destabilized via microtubule sliding”. To answer this reviewer’s question, our data support a model where, during glucose stimulation, MT sliding within the peripheral bundle might move fragments of MTs severed by other mechanisms. Importantly, we propose that MT sliding restores the partially destabilized peripheral bundle by delivery of MTs that are nucleated at the cell interior and incorporating them into that bundle. In our overall model, three processes (destabilization, nucleation, and sliding to restore the bundle) are coordinated to maintain beta cell fitness on each GSIS cycle.

      (2) On one hand the authors describe how KIF5B depletion prevents sliding and the transport of microtubules to the plasma membrane to form the sub-membrane microtubule array. This indicates KIF5B is required to form this structure. On the other hand, they describe that at high glucose concentration, KIF5B promotes microtubule sliding to destabilize the sub-membrane microtubule array to allow robust insulin secretion. This appears contradictory. 

      We never intended to give the impression that MT sliding destabilizes the sub-membrane bundle. We apologize if our wording caused this misunderstanding of our model. We propose that while the bundle is destabilized downstream of glucose signaling (e.g. due to tau phosphorylation, please see Ho et al Diabetes 2020), MT sliding remodels the bundle and thereafter rebuilds it to prevent over-secretion. In the revised manuscript, we have double-checked the whole text to make sure that such a misunderstanding is avoided. 

      (3) Previously, it has been shown that KIF5B induces tubulin incorporation along the microtubule shaft in a concentration-dependent manner. Moreover, running KIF5B increases microtubule rescue frequency and unlimited growth of microtubules. Notably, KIF5B regulates microtubule network mass and organization in cells (PMID: 34883065). Consequently, it appears possible that the here observed phenomena of changes in the microtubule network might be due to alterations in these processes. 

      We thank the reviewer for proposing this alternative explanation for the observed change in microtubule networks after KIF5B depletion. We have now directly tested this possibility. Namely, we have re-expressed the kinesin-1 motor domain in MIN6 cells depleted of KIF5B. This motor domain construct by itself is not capable of driving microtubule sliding because it lacks the tail domain. At the same time, it is known to move very efficiently along microtubules and should provide the effects reported in the article cited by the reviewer. We found that the re-expression of the kinesin motor domain does not rescue microtubule network defects in beta cells (see new Figure 2 – Supplemental Figure 2). Thus, we conclude that the effects of kinesin depletion on the microtubule network in beta cells are due to the lack of microtubule sliding, as reported here.

      (4) The authors provide data that indicate that microtubule sliding is enhanced upon glucose stimulation. They conclude that these data indicate that microtubule sliding is an integral part of glucose-triggered microtubule remodeling. Yet, the authors fail to provide any evidence that this process plays a role in insulin secretion or glucose uptake. 

      We would like to point out that we do not “fail” but rather choose not to overload our study by repeating insulin secretion assays in KIF5B-inactivated cells because this would not have been very informative. It has been found previously that kinesin-1 inactivation or knockout significantly attenuates insulin secretion because kinesin-1 is actively transporting insulin granules and kinesin-1 activity is enhanced under high glucose conditions (e.g. Varadi et al 2002, Cui et al., 2011, Donelan et al, 2002). That said, our current finding is very much in line with these previous data. When kinesin is depleted, two things would be happening at the same time: in the absence of the sub-membrane microtubule bundle, pre-existing insulin granules would be over-secreted, and new insulin would not be delivered to the periphery, both decreasing GSIS. Unfortunately, we do not yet have tools that would allow us to dissect which part of the insulin secretion defect is due to prior over-secretion (the consequence of deficient MT sliding) and which part is due to the lack of new granule delivery. We plan to develop such tools in the future and elaborate on them in a follow-up study. Here, our goal is to understand microtubule organization principles in beta cells, and we choose not to extend the scope of the current study to metabolic assays.

      (5) The authors speculate that the sub-membrane microtubule array prevents the over-secretion of insulin. Would one not expect in this case a change in the distribution of insulin granules at the plasma membrane when this array is affected? Or after glucose stimulation? Notably, it has been reported that "the defects of β-cell function in KIF5B mutant mice were not coupled with observable changes in islet morphology, islet cell composition, or β-cell size" and "the subcellular localization of insulin vesicles was found to not be affected significantly by the decreased Kif5b level. The cytoplasm of both wild-type and mutant β-cells was filled with insulin vesicles. Insulin vesicle numbers per square μm were determined by counting all insulin vesicles in randomly photographed β-cells. More insulin granules were found in Kif5b knockout β-cells compared with control cells. This phenomenon is consistent with the observation that insulin secretion by β-cells is affected" whereby "Insulin vesicles (arrowheads) were distributed evenly in both mutant and control cells" (PMID: 20870970).  

      Quantitative analyses in the study cited by the reviewer do not include assays that would be relevant to our study. Particularly, in that study neither the amount of insulin granules at the cell periphery nor the ratio between the number of granules at the periphery and the beta cell interior has been analyzed. In addition, in our preliminary observations not shown here, insulin content in beta cells in KIF5B KO mice is highly heterogeneous, with a subpopulation of cells severely depleted of insulin. This opens a new avenue of investigation into beta cell heterogeneity, which is out of the scope of this current study. Thus, we chose to restrict this current study to microtubule organization data.   

      (6) Does the sub-membrane microtubule array exist in primary beta cells (in vitro and/or in vivo) and how it is affected in KIF5B knockout mice?  

      Yes, it does exist. In fact, we have first reported it in mouse islets (Bracey et al 2020, Ho et al 2020). Now, we report that the sub-membrane bundle is defective, and microtubules are misaligned in KIF5B KO mice (new Figure 2 – Supplemental Figure 1).

      Reviewer #2 (Public Review): 

      In this article, Bracey et al. provide insights into the factors contributing to the distinct arrangement observed in sub-membrane microtubules (MTs) within mouse β-cells of the pancreas. Specifically, they propose that in clonal mouse pancreatic β-cells (MIN6), the motor protein KIF5B plays a role in sliding existing MTs towards the cell periphery and aligning them with each other along the plasma membrane. Furthermore, similar to other physiological features of β-cells, this process of MTs sliding is enhanced by a high glucose stimulus. Because a precise alignment of MTs beneath the cell membrane in β-cells is crucial for the regulated secretion of pancreatic enzymes and hormones, KIF5B assumes a significant role in pancreatic activity, both in healthy conditions and during diseases. 

      The authors provide evidence in support of their model by demonstrating that the levels of KIF5B mRNA in MIN6 cells are higher compared to other known KIFs. They further show that when KIF5B is genetically silenced using two different shRNAs, the MT sliding becomes less efficient. Additionally, silencing of KIF5A in the same cells leads to a general reorganization of MTs throughout the cell. Specifically, while control cells exhibit a convoluted and non-radial arrangement of MTs near the cell membrane, KIF5B-depleted cells display a sparse and less dense sub-membrane array of MTs. Based on these findings, the Authors conclude that the loss of KIF5B strongly affects the localization of MTs to the periphery of the cell. Using a dominant-negative approach, the authors also demonstrate that KIF5B facilitates the sliding of MTs by binding to cargo MTs through the kinesin-1 tail binding domain. Additionally, they present evidence suggesting that KIF5B-mediated MT sliding is dependent on glucose, similar to the activity levels of kinesin-1, which increase in the presence of glucose. Notably, when the glucose concentrations in the culturing media of MIN6 cells are reduced from 20 mM to 5 mM, a significant decrease in MT sliding is observed. 

      Strengths:

      This study unveils a previously unexplained mechanism that regulates the specific rearrangement of MTs beneath the cell membrane in pancreatic β-cells. The findings of this research have implications and are of significant interest because the precise regulation of the MT array at the secretion zone plays a critical role in controlling pancreatic function in both healthy and diseased states. In general, the author's conclusions are substantiated by the provided data, and the study demonstrates the utilization of state-of-the-art methodologies including quantification techniques, and elegant dominant-negative experiments. 

      Weaknesses:

      A few relatively minor issues are present and related to data interpretation and the conclusions drawn in the study. Namely, some inconsistencies between what appears to be the overall and sub-membrane MT array in scramble vs. KIF5B-depleted cells, the lack of details about the sub-cellular localization of KIF5B in these cells and the physiological significance of the effect of glucose levels in beta-cells of the pancreas. 

      We thank the reviewer for this insightful review. In the revised version, we provided re-worded and extended interpretations and conclusions to prevent any issues or misunderstandings. We trust that while some noted apparent inconsistencies may reflect the intrinsic heterogeneity of the beta cell population, all data presented here indicate the same trend in phenotypes. In the revised version, we have provided additional cell views and, in places, alternative representative images and videos, to clear up any apparent inconsistencies. We also would like to point out that we in fact reported KIF5B localization: not surprisingly, KIF5B predominantly localized to insulin granules and the punctate staining fills the whole cytoplasm (Figure 2A, bottom panel). However, as pointed out in detail in our response to reviewer 1, we choose to leave an extensive study of the physiological and metabolic consequences of the reported microtubule network dynamics to a follow-up study.

      Reviewer #3 (Public Review): 

      Prior work from the Kaverina lab and others had determined that beta-cells build a microtubule network that differs from the canonical radial organization typical in most mammalian cell types and that this organization facilitates the regulated secretion of insulin-containing secretory granules (IGs). In this manuscript, the authors tested the hypothesis that kinesin-driven microtubule sliding is an underlying mechanism that establishes a sub-membranous microtubule array that regulates IG secretion. They employed knock-down and dominant-negative strategies to convincingly show microtubule sliding does, in fact, drive the assembly of the sub-membranous microtubule band. They also used live cell imaging assays to demonstrate that kinesin-mediated microtubule sliding in beta-cells is triggered by extracellular high glucose. Overall, this is an interesting and important study that relates microtubule dynamics to an important physiological process. The experiments were rigorous and well-controlled. 

      We truly appreciate this reviewer’s opinion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figures: 

      (1) Figure 1: 

      a) Why can one not see here, and in most following images, the peripheral sub-membrane microtubule array? One can also not see an accumulation of microtubules in the cell interior. 

      The microtubule pattern in beta cells is variable, and the sub-membrane array is seen in the whole population to a variable extent (see directionality histogram in Figure 2E for statistics). In fact, an array of peripheral MTs parallel to the cell border is present in the example shown in Figure 1 and in all following control images. To make it clearer, we now show the pre-bleach images in Figure 1 D-F at a lower magnification, so that the differences in MT density at the cell periphery and cell center are more clearly seen: MTs are lacking at the periphery in KIF5B-depleted cells but not in control cells.

      b) 5 min appears to be a long time and enough time to polymerize a significant number of new microtubules. 

      We interpret this comment as the reviewer’s concern that in FRAP assays, fluorescently-labeled MTs moving into the bleached area might be newly polymerizing MTs rather than pre-existing MTs relocated into that area. However, this is not the case because newly polymerized MTs contain predominantly quenched “dark” tubulin molecules and only a small percent of fluorescent tubulin. These dim MTs are not included in MT sliding assay analysis, where a threshold for bright MTs is introduced. We have now added more details on the quantification of these data to the Materials and Methods section.
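
      As an illustration only, the idea of excluding dim, newly polymerized MTs with a brightness threshold can be sketched as below. This is not the analysis code used in the study: the function name, the choice to express the threshold as a fraction of pre-bleach MT intensity, and the default value of 0.5 are all hypothetical assumptions, since the response does not state how the threshold was set.

```python
def keep_bright_segments(segment_intensities, prebleach_intensity, min_fraction=0.5):
    """Return only MT segment intensities that count as "bright".

    Assumption (hypothetical): a segment is "bright" when its intensity is at
    least `min_fraction` of the mean pre-bleach MT intensity. Dim segments,
    such as MTs assembled mostly from bleached tubulin, are dropped.
    """
    if prebleach_intensity <= 0:
        raise ValueError("prebleach_intensity must be positive")
    threshold = min_fraction * prebleach_intensity
    return [value for value in segment_intensities if value >= threshold]


# Made-up intensities (arbitrary units) for illustration, not data from the study.
bright = keep_bright_segments([10, 60, 95, 30], prebleach_intensity=100)
print(bright)
```

      Only segments that pass the filter would then be scored for displacement into the bleached region.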

      c) The overall effects appear minor. It is unclear how Fig. 1-Suppl-Fig.1, where no significant difference is shown, is translated into Figure 1 J and K showing a significant difference. 

      With all due respect, we do not agree that the effect is minor. Please see our response to the Public Review where we discuss the major consequences of MT defects in detail. 

      To answer this specific comment, we show that there are significant differences in the number of rapidly moving MTs (5-sec displacement over 0.3 µm) and in the amount of stationary MTs (5-sec displacement below 0.15 µm). There is no significant difference in the amount of slightly displaced MTs (displacements between 0.15 and 0.3 µm; the central part of the histogram). This might indicate that these slight displacements do not depend on the kinesin-1 motor but rather are caused by experimental noise, pushing by moving organelles, and/or myosin-dependent forces in the cell. In the revised manuscript, this quantification is detailed more clearly in Methods and included in the Figure legends.
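
      The three displacement classes described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis pipeline: the function names are hypothetical, and assigning values of exactly 0.15 µm or 0.3 µm to the "slight" class is an assumption, because the response does not say how boundary values were treated.

```python
from collections import Counter

# Thresholds quoted in the response: displacement over a 5-second window, in µm.
STATIONARY_BELOW_UM = 0.15
MOVING_ABOVE_UM = 0.3


def classify_displacement(displacement_um):
    """Assign one 5-second displacement to a class."""
    if displacement_um < STATIONARY_BELOW_UM:
        return "stationary"
    if displacement_um > MOVING_ABOVE_UM:
        return "moving"
    # Assumption: boundary values (exactly 0.15 or 0.3 µm) fall in this class.
    return "slight"


def class_fractions(displacements_um):
    """Return the fraction of displacements in each class."""
    if not displacements_um:
        raise ValueError("need at least one displacement")
    counts = Counter(classify_displacement(d) for d in displacements_um)
    total = len(displacements_um)
    return {name: counts.get(name, 0) / total
            for name in ("stationary", "slight", "moving")}


# Made-up displacements for illustration only, not data from the study.
example = [0.05, 0.10, 0.20, 0.25, 0.40, 0.80]
print(class_fractions(example))
```

      Comparing the "moving" and "stationary" fractions between control and depleted cells corresponds to the comparison the response describes, while the "slight" class is the central part of the histogram.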

      d) The authors utilize single molecule tracking to further strengthen their conclusion that KIF5B promotes microtubule sliding. The observed effects are weaker than the data obtained from photobleaching experiments. The videos clearly show that there is still significant movement also in KIF5B-depleted cells. If K560RigorE236A binds irreversibly to a microtubule and this microtubule is growing (not only by the addition of tubulin dimers to the plus end; see PMID: 34883065) wouldn't that also result in movement of the tagged K560RigorE236A? As KIF5B is also required in the transport of insulin granules, it should also label "interior microtubules". And in Video 2 it appears that pretty much all "labeled" microtubules are moving. 

      K560RigorE236A forms fiducial marks along the whole MT lattice, as previously shown in (Tanenbaum et al., 2014). When it is bound to the MT lattice, K560RigorE236A moves with the whole MT if it is being relocated. The mechanism described in (PMID: 34883065) appears to be absent or minor in beta cells (see Figure 2- Supplemental Figure 2); thus, even if this mechanism could displace already polymerized MTs, this is not happening in this cell type.

      The reviewer is correct: K560RigorE236A does mark all MTs throughout a beta cell. All MTs are moving slightly in a living cell because they are pushed around by moving organelles, actin contractility, etc. MTs may also be slid by other MT-dependent motors (dynein against the membrane and such). So, it is not surprising that the MT network is “breezing,” and kinesin-dependent sliding is only a part of MT movement. What we show here is that the KIF5B-dependent MT sliding is responsible for a relatively “long-distance” relocation of MTs manifested in long, directional displacement of fiducial marks. This does not exclude other movements. This makes extraction of kinesin-dependent MT movements somewhat challenging, which is why we needed to do those extensive analyses.

      e) Figure 1 G to K is misleading, at least in the context of the provided videos. There are several microtubules that move extensively in shRNA#2-treated cells and overall there appears more movement in this cell as in the control cell. Figure 1I is clearly not representative of the movement shown in Video 2. 

      We apologize if our selection of representative movies/figures for this experiment was imperfect. Indeed, in all depleted cells, SunTag puncta still move to a certain extent, either due to incomplete depletion or to alternative intracellular forces dislocating microtubules. However, there is a clear difference in the fraction of persistently moving puncta (please see Figure 1K and histogram in Figure 1 - Supplemental Figure 1B). Unfortunately, when the number of SunTag puncta per cell is variable, it sometimes prevents a good visual perception of the actual distribution of moving versus stationary microtubules. We now show an alternative representative movie for Figure 1I and the corresponding Video 2, with the goal of comparing cells with more consistent numbers of SunTag puncta.

      (2) Figure 2A. 

      a) This is the only image that clearly shows the existence of a sub-membrane microtubule array and the concentration of microtubules in the cell interior. The differences are unclear between the experimental setups including the length of cultivation and knockdown of KIF5B or expression of mutants. 

      We now provide a more detailed description of each image acquisition and processing in Materials and Methods. In brief, while the morphology of MT patterns is intrinsically variable in beta cells, all control cells have populated peripheral MTs that exhibit a more parallel configuration as compared to depletions and mutants.

      b) The authors state "While control cells had convoluted non-radial MTs with a prominent sub-membrane array, typical for beta cells (Fig. 2A), KIF5B-depleted cells featured extra-dense MTs in the cell center and sparse reseeding MTs at the periphery (Fig. 2B, C)". Could that not be explained with the observation that "Kinesin-1 controls microtubule length" (PMID: 34883065)? 

      Thank you for this interesting alternative idea. It does not appear to be the case for beta cells.

      Please see Figure 2-Supplemental Figure 2  and our response to Public Review Comment #3.

      Also, our apologies for the typo in the original manuscript: this is “receding”, not “reseeding”.

      (3) Figure 3: 

      a) This is an elegant way to determine whether KIF5B is involved in microtubule sliding independent of the fact that the effect appears very small. 

      Thank you!            

      b) The assay depends on ectopic expression of a dominant negative mutant. It appears important to show that KIFDNwt is high enough expressed to indeed block the binding of endogenous KIF5B. The authors need to provide a control for this. Furthermore, authors need to provide evidence that other functions of KIF5B are not impaired such as transport of insulin granules and tubulin incorporation or microtubule stability and length.

      Expression of cargo-binding motor domains routinely causes a dominant-negative effect on their cargo transport. This exact construct has been used for the purpose of dominant-negative action previously (Ravindran et al., 2017). It does prevent the membrane cargo binding of KIF5B (Ravindran et al., 2017), thus the transport of insulin granules is also impaired in overexpressing cells. Confirming this fact would not influence our study conclusions, so we chose not to repeat these assays for the sake of time.

      c) N-numbers should be similar. The data for KIFDNmut are difficult to interpret with possibly 2 experiments showing little to no displacement and 3 showing displacement. 

      In the revised manuscript, additional data have been added to increase N-numbers.

      (4) Figure 4 and supplements: The morphology of the KIFDNwt cells is greatly affected and this makes it difficult to say whether the effect on microtubules at the cell periphery is a direct or indirect effect. 

      Yes, these cells often have a less spread appearance, obscuring visual perception of MT distribution. We have now replaced the image of the KIFDNwt cell (Figure 4, Supplemental Figure 1 A) with a more visually representative example.

      Things to do: 

      (1) Notably, the authors have previously reported that high glucose-induced remodeling of microtubule networks facilitates robust glucose-stimulated insulin secretion. This remodeling involves the disassembly of old microtubules and the nucleation of new microtubules. Here, they state that the sub-membrane microtubule array is destabilized via microtubule sliding. What is the relevance of the different processes? Please discuss these in the manuscript. 

      Thank you, we have now extended our discussion of these points and our prior findings. We have also added a schematic model figure for clarity (Figure 7).  

      (2) 5 min appears to be a long time and enough time to polymerize a significant number of new microtubules. Do the authors have any information about the speed of MT formation in MIN6 cells? Can the authors repeat this experiment by preventing MT polymerization? Or repeat the experiment with EB1/EB3 reporter to visualize microtubule growth in the same experimental setting? 

      While some MT polymerization will happen in this timeframe, newly polymerized MTs contain predominantly quenched “dark” tubulin molecules and only a small percent of fluorescent tubulin. These dim MTs are not included in MT sliding assay analysis, where a threshold for bright MTs is introduced. We apologize for initially omitting certain details from the FRAP assay analysis. Now these details have been added.   

      Are the microtubules shown on the cell surface (TIRF microscopy) or do we see here all microtubules? 

      Please see Materials and Methods for microscopy methods and image processing for each figure. Specifically, FRAP assays show a maximum intensity projection of spinning disk confocal stacks over 2.4µm in height (approximately the ventral half of a cell).
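
      For readers unfamiliar with the term, a maximum intensity projection keeps, for every pixel position, the brightest value found across the planes of a z-stack. The sketch below illustrates the operation on a tiny made-up stack. It is not the authors' image-processing code, the function name is hypothetical, and selecting which planes cover the 2.4 µm range is left to the caller because the z-step is not stated in the response.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack into a single 2D image.

    `stack` is a list of planes; each plane is a list of rows of pixel values,
    and all planes are assumed to have the same dimensions.
    """
    if not stack:
        raise ValueError("stack must contain at least one plane")
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [max(plane[y][x] for plane in stack) for x in range(width)]
        for y in range(height)
    ]


# Made-up 3-plane, 2x2-pixel stack for illustration only.
tiny_stack = [
    [[1, 5], [0, 2]],
    [[3, 4], [7, 1]],
    [[2, 9], [6, 0]],
]
projection = max_intensity_projection(tiny_stack)
print(projection)
```

      Because every plane in the selected range contributes, MTs at any height within the projected volume appear in the final image, which is why the FRAP images are not restricted to the ventral cell surface as a TIRF image would be.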

      (3) Previously, it has been shown that KIF5B induces tubulin incorporation along the microtubule shaft in a concentration-dependent manner. Moreover, running KIF5B increases microtubule rescue frequency and unlimited growth of microtubules. Notably, KIF5B regulates microtubule network mass and organization in cells (PMID: 34883065). Consequently, it appears possible that the here observed phenomena of changes in the microtubule network might be due to alterations in these processes. Authors need to exclude these possibilities and discuss them. 

      Thank you for this interesting alternative idea. It does not appear to be the case for beta cells. Please see Figure 2-Supplemental Figure 2  and our response to Public Review Comment #3.

      (4) It is important that the authors describe in the text and possibly in the figure legends the differences between the experimental set-ups including the length of cultivation and knock down of KIF5B or expression of mutants. 

      Thank you, please see these details in the text (Materials and Methods section).

      (5) Figure 5: Does KIF5B depletion rescue the kinesore-induced defects 

      Thank you for suggesting this control. We have now conducted corresponding experiments. The answer is yes, it does. Kinesore does not induce detectable changes in MT patterns in KIF5B-depleted cells (new Figure 5-Supplemental Figure 2).

      (6) Can the authors block kinesin-1 resulting in microtubule accumulation in the cell center and then release the block, and best inhibiting microtubule formation, to see whether the microtubules accumulated in the cell center will be transported to the periphery? 

      This proposed experiment would have been a nice illustration for the study; however, it has proven to be too challenging. Unfortunately, we have to leave it for future studies. However, the experiments already included in the paper are sufficient to support our conclusions.

      Minor comments: 

      (1) The English needs to be improved. Often it is unclear what the authors try to convey. The manuscript is difficult to read and contains several overstatements.

      The revised manuscript has been through several rounds of proof-reading for clarity.

      (2) It is important to describe in more detail in the introduction what is known about KIF5B in beta cells. Previously, it has been demonstrated that silencing, or inactivation by a dominant negative form of KIF5B, blocks the sustained phase of glucose-stimulated insulin secretion (PMID: 9112396, PMID: 12356920, PMID: 20870970). 

      Yes, this is of course very important, and these studies were cited in the original manuscript. Now, we have expanded the discussion on the matter.

      (3) Figure 1B and Fig. 1 Suppl Fig.1: Please provide band sizes and provide information on the size of KIF5B. 

      We have replaced Fig. 1B and Suppl Fig 1A with quantitative analysis of KIF5B depletion, now found in new Fig. 1B and Suppl Fig. 1A-C.

      (4) It is important to state the used glucose concentrations in Figure 1D (based on the methods section it is probably 25 mM glucose) and all subsequent experiments. Is this correct and comparable to Figure 6A or B? For the non-specialized reader, more information should be provided on why initial glucose starvation is performed.  

      Cell culture models of pancreatic beta cells are routinely maintained at glucose levels that are considered “high”, or stimulatory for secretion. This is needed to prevent the loss of the cells’ capacity to respond to glucose stimulation over generations. In order to test GSIS, cells need to be equilibrated at low (fasting, standardly 2.8 mM) glucose levels for several hours, so that they are capable of secreting insulin upon glucose addition. 25 mM glucose is normally used to stimulate GSIS in cell culture models of beta cells, like MIN6. This is a higher concentration than what is needed to stimulate primary beta cells in islets.

      Reviewer #2 (Recommendations For The Authors): 

      I have the following specific questions that pertain to data interpretation and the conclusions drawn.

      (1) The morphology of the overall MT array before the bleach treatment in both control cells and KIF5B-KD cells depicted in Figure 1D-F and Figure 2A-C appears to be distinct. In Figure 1, it seems that the absence of KIF5B results in a general augmentation of MT mass, whereas the arrangement presented in Figure 2 indicates the contrary. Even in the sub-membrane areas, this phenomenon appears to hold true. However, the images used in this study, which depict entire cells or a significant portion of cells, may not be ideal for visualizing the sub-membrane regions.

      It would be beneficial if the author could offer some explanations for this apparent inconsistency. 

      While the beta cell population is intrinsically heterogeneous, all data presented here indicate the same trend in phenotypes. Possibly, some apparent inconsistency between Figures 1 and 2 arose because in the original manuscript we did not show the pre-bleach whole-cell overview in Figure 1. In the revised version, we now show the whole cells for pre-bleach so that MT organization at the cell periphery can be assessed. Please note that in the control cell, MTs are more or less equally distributed over the cell, while in KIF5B depletions the cell periphery is significantly less populated than the cell center. Furthermore, we did not detect MT mass augmentation or increase in KIF5B depletions. One possible explanation for the reviewer’s impression from Figure 2 is that Figure 2 F-H shows thresholded images where the threshold was adjusted to highlight peripheral MTs in each cell. Please note that this is not the same threshold for each cell (see Figure 2 - Supplemental Figure 2 and 3). Thus, KIF5B-depleted cells that have fewer MTs at the periphery appear brighter in these thresholded images. For the true comparison of MT intensity, please see Figure 2 A-C (grayscale image, not the threshold).

      (2) It would be helpful if the author could provide a visual representation or comment on the sub-cellular localization of KIF5B in MIN6 cells. Is it predominantly localized in the submembrane region, or is it more evenly distributed throughout the cytoplasm? 

      Please see Fig 2A, lower panel. KIF5B is seen across the cell as a punctate staining, in agreement with previous findings that it mostly localizes to IGs.

      (3) The alteration in microtubule (MT) organization and sliding in the absence of KIF5B seems to initiate in proximity to the apparent microtubule organizing center (MTOC) depicted in Figure 2A, and then "simply" extends towards the sub-membrane region. Although the authors acknowledge it, it would be advantageous for the readers to have a clearer indication that the sub-membrane microtubule (MT) reorganization in the absence of KIF5B is a result of a broader MT reorganization rather than a specific occurrence restricted to the sub-membrane regions. 

      Thank you for this comment. We have now extended our discussion to state our conclusions and interpretations of this point more clearly. We have also added a schematic Figure 7 as an illustration.

      (4) Regarding the "glucose experiments," it is common to add 20-25 mM glucose to culture media, but physiological concentrations of glucose typically hover around 5 mM. Therefore, it is somewhat unclear what the implications are when investigating the impact of KIF5B depletion on MT sliding at 2.8 mM of glucose. It would be helpful if the authors could provide some commentary on this matter, particularly in relation to physiological and pathological conditions. 

      2.8 mM glucose is a standard low glucose condition used to model glucose deprivation/fasting. For functional primary beta cells within pancreatic islets, GSIS can be triggered by glucose stimulation as low as 8-12 mM glucose. However, for glucose stimulation of cultured beta cells such as MIN6 used in this paper, 20-25 mM glucose is standardly used because these cell lines have a higher threshold of stimulation compared to primary beta cells and whole islets.

      (5) In supplementary Figure 1A, it would be helpful if the lanes in the WB were marked indicating what is what. In my observation, it appears that Supplementary Figure 1A, particularly lanes #2, 3, and 4, display the GAPDH protein (MW 36 kDa) (or is it alpha-tubulin, as mentioned in the Material and Methods section and indicated in lane #409?) relative to Figure 1A. I am curious about KIF5B (MW 108 kDa). Is it represented by the upper band? Did the author probe the same membrane simultaneously with two different primary antibodies? This should be clarified, and the author should indicate the molecular weight of the ladder. 

      Indeed, in the original WB two antibodies were used together, owing to the challenge of collecting a sufficient number of shRNA-expressing beta cells. This caused confusion and improper interpretation of the loading control. We thank the reviewer for catching this. We have now replaced old Fig. 1B and Suppl. Fig. 1A with quantitative analysis of KIF5B depletion based on single-cell immunofluorescent staining. It is now found in new Fig. 1B and Suppl Fig. 1A-C.

      Reviewer #3 (Recommendations For The Authors): 

      In all of the figures that present microtubule orientations (e.g. Figure 2E) the error bars obscure the vertical bins making them difficult to read or interpret. If they were rendered at a larger scale, it would be easier to read and interpret these results. 

      Thank you for pointing this out. We now show these histograms with a different format of error bars and without outliers that obscure the view. A variant with outliers is now shown in the supplement.

      Some of the callouts to the videos in the paper are inaccurate. Perhaps the authors reordered sections of the paper but failed to correctly renumber the video citations? 

      Thank you for this comment, we have corrected all callouts now.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This short report shows that the transcription factor gene mirror is specifically expressed in the posterior region of the butterfly wing imaginal disk, and uses CRISPR mosaic knock-outs to show it is necessary to specify the morphological features (scales, veins, and surface) of this area.

      Strengths:

      The data and figures support the conclusions. The article is swiftly written and makes an interesting evolutionary comparison to the function of this gene in Drosophila. Based on the data presented, it can now be established that mirror likely has a similar selector function for posterior-wing identity in a plethora of insects.

      We thank the reviewer for their feedback.

      Weaknesses:

      This first version has minor terminological issues regarding the use of the terms "domains" and "compartment".

      We acknowledge that the terminologies “domains” and “compartments” might lead to confusion. To avoid confusion we have removed the term “compartment” from the manuscript.

      Reviewer #2 (Public Review):

      This is a short and unpretentious paper. It is an interesting area and therefore, although much of this area of research was pioneered in flies, extending basic findings to butterflies would be worthwhile. Indeed, there is an intriguing observation but it is technically flawed and these flaws are serious.

      The authors show that mirror is expressed at the back of the wing in butterflies (as in flies). They present some evidence that is required for the proper development of the back of the wing in butterflies (a region dubbed the vannus by the ancient guru Snodgrass). But there are problems with that evidence. First, concerning the method, using CRISP they treat embryos and the expectation is that the mirror gene will be damaged in groups of cell lineages, giving a mosaic animal in which some lines of cells are normal for mirror and others are not. We do not know where the clones or patches of cells that are defective for mirror are because they are not marked. Also, we do not know what part of the wing is wild type and what part is mutant for mirror. When the mirror mutant cells colonise the back of the wing and that butterfly survives (many butterflies fail to develop), the back of the wing is altered in some selected butterflies. This raises a second problem: we do not know whether the rear of the wing is missing or transformed. From the images, the appearance of the back of the wing is clearly different from the wild type, but is that due to transformation or not? And then I believe we need to know specifically what the difference is between the rear of the wing and the main part. What we see is a silvery look at the back that is not present in the main part, is it the structure of the scales? We are not told.

      Thank you for this feedback. We appreciate that many readers may not be accustomed to looking at mosaic knockouts. As discussed in a previous review article (Zhang & Reed 2017), we rely on a combination of contralateral asymmetry and replicates to infer mutant phenotypes. For many genes (e.g. pigmentation enzymes) mutant clones are obvious, but for other types of genes (e.g. ligands) clone boundaries are sometimes not directly diagnosable. It is simply a limitation of our study system. Nonetheless, you can see for yourself that “the back of the wing is altered in some butterflies” – the effects of deleting mirror are clear and repeatable.

      In terms of interpreting mutant phenotypes, we agree that the paper would benefit from a better description of the specific effects. Therefore, we have included an improved, more systematic description of phenotypes, along with better-annotated figures showing changes in wing shape and venation, scale coloration, and color pattern transformation (e.g. posterior elongation of the orange marginal stripes).

      There are other problems. Mirror is only part of a group of genes in flies and in flies both iroquois and mirror are needed to make the back of the wing, the alula (Kehl et al). What is known about iro expression in butterflies?

      In Drosophila, mirror, araucan, and caupolican comprise the so-called Iroquois Complex of genes. As denoted in Figure S4 and in Kerner et al (doi: https://doi.org/10.1186/1471-2148-9-74) the divergence of araucan and caupolican into two separate paralogs is restricted to Drosophila. As in most insects, butterflies have only two Iroquois Complex genes: araucan and mirror. We tested the role of araucan in Junonia coenia as shown in our pre-print: https://doi.org/10.1101/2023.11.21.568172. Its expression appears to be restricted to early pupal wings where it is transcribed in all scale-forming cells. Mosaic araucan KOs resulted in a change in scale iridescent coloration associated with changes in the laminar thickness of scale cells.

      In flies, mirror regulates a late and local expression of dpp that seems to be responsible for making the alula. What happens in butterflies? Would a study of the expression of Dpp in wildtype and mirror compromised wings be useful?

      We thank the reviewer for the proposal and agree that a future study comparing Dpp in wild-type versus mirror KO butterflies would be useful to clarify the mechanism of Dpp signalling in wing development. It is not clear, however, that the results of a Dpp experiment would change the conclusions of our current study; therefore, we decided not to undertake these additional experiments for our revision.

      Thus, I find the paper to be disappointing for a general journal as it does little more than claim what was discovered in Drosophila is at least partly true in butterflies. 

      We respect that the reviewer does not have a strong interest in the comparative aspects of this study. Fair enough. This report is primarily aimed at biologists interested in the evolutionary history of insect wings.

      Also, it fails to explain what the authors mean by "wing domains" and "domain specification". They are not alone, butterfly workers, in general, appear vague about these concepts, their vagueness allowing too much loose thinking.

      A domain is “a region distinctively marked by some physical feature”. This term is used extensively in the developmental biology literature (e.g. “expression domain”, “embryonic domain”, “tissue domain”, “domain specification”) and is found throughout popular textbooks (e.g. Alberts et al. “The Cell”, Gilbert “Developmental Biology”). We prefer the term “domain” because of its association in the Drosophila literature with transcription factors that define fields of cells. We specifically avoided using the term “compartment” because of its association with cell lineage, which we have not tested. 

      Since these matters are at the heart of the purpose and meaning of the work reported here, we readers need a paper containing more critical thought and information. I would like to have a better and more logical introduction and discussion.

      We would like the very same thing, of course, and we hope the reviewer finds our revised manuscript to be more satisfying to read.

      The authors do define what they mean by the vannus of the wing. In flies the definition of compartments is clear and abundantly demonstrated, with gene expression and requirement being limited precisely to sets of cells that display lineage boundaries. It is true that domains of gene expression in flies, for example of the iroquois complex, which includes mirror, can only be related to patterns with difficulty. Some recap of what is known plus the opinion of the authors on how they interpret papers on possible lineage domains in butterflies might also be useful, as the reader is no wiser about what the authors might mean at the end of it!

      We thank the reviewer for this suggestion. However, our experiments have little to contribute to the topic of cell lineage compartmentalization. We have therefore opted to avoid speculating on this topic to prevent confusion and to keep the manuscript focused on our experimental results.

      The references are sometimes inappropriate. The discovery of the AP compartments should not be referred to Guillen et al 1995, but to Morata and Lawrence 1975. Proofreading is required.

      We thank the reviewer for suggesting this important reference. We have included it in our revision.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Chatterjee et al. examines the role of the mirror locus in patterning butterfly wings. The authors examine the pattern of mirror expression in the common buckeye butterfly, Junonia coenia, and then employ CRISPR mutagenesis to generate mosaic butterflies carrying clones of mirror mutant cells. They find that mirror is expressed in a well-defined posterior sector of final-instar wing discs from both hindwings and forewings and that CRISPR-injected larvae display a loss of adult wing structures presumably derived from the mirror expressing region of hindwing primordium (the case for forewings is a bit less clear since the mirror domain is narrower than in the hindwing, but there also do seem to be some anomalies in posterior regions of forewings in adults derived from CRISPR injected larvae). The authors conclude that the wings of these butterflies have at least three different fundamental wing compartments, the mirror domain, a posterior domain defined by engrailed expression, and an anterior domain expressing neither mirror nor engrailed. They speculate that this most posterior compartment has been reduced to a rudiment in Drosophila and thus has not been adequately recognized as such a primary regional specialization.

      Critique:

      This is a very straightforward study and the experimental results presented support the key claims that mirror is expressed in a restricted posterior section of the wing primordium and that mosaic wings from CRISPR-injected larvae display loss of adult wing structures presumably derived from cells expressing mirror (or at least nearby). The major issue I have with this paper is the strong interpretation of these findings that lead the authors to conclude that mirror is acting as a high-level gene akin to engrailed in defining a separate extreme posterior wing compartment. To place this claim in context, it is important in my view to consider what is known about engrailed, for which there is ample evidence to support the claim that this gene does play a very ancestral and conserved function in defining posterior compartments of all body segments (including the wing) across arthropods.

      (1) Engrailed is expressed in a broad posterior domain with a sharp anterior border in all segments of virtually all arthropods examined (broad use of a very good panspecies anti-En antibody makes this case very strong).

      (2) In Drosophila, marked clones of wing cells (generated during larval stages) strictly obey a straight anterior-posterior border indicating that cells in these two domains do not normally intermix, thus, supporting the claim that a clear A/P lineage compartment exists.

      In my opinion, mirror does not seem to be in the same category of regulator as engrailed for the following reasons:

      (1) There is no evidence that I am aware of, either from the current experiments, or others that the mirror expression domain corresponds to a clonal lineage compartment. It is also unclear from the data shown in this study whether engrailed is co-expressed with mirror in the posterior-most cells of J. coenia wing discs. If so, it does not seem justified to infer that mirror acts as an independent determinant of the region of the wing where it is expressed.

      (2) Mirror is not only expressed in a posterior region of the wing in flies but also in the ventral region of the eye. In Drosophila, mirror mutants not only lack the alula (derived approximately from cells where mirror is expressed), but also lack tissue derived from the ventral region of the eye disc (although this ventral tissue loss phenotype may extend beyond the cells expressing mirror).

      In summary, it seems most reasonable to me to think of mirror as a transcription factor that provides important development information for a diverse set of cells in which it can be expressed (posterior wing cells and ventral eye cells) but not that it acts as a high-level regulator as engrailed.

      Recommendation:

      While the data provided in this succinct study are solid and interesting, it is not clear to me that these findings support the major claim that mirror defines an extreme posterior compartment akin to that specified by engrailed. Minimally, the authors should address the points outlined above in their discussion section and greatly tone down their conclusion regarding mirror being a conserved selector-like gene dedicated to establishing posterior-most fates of the wing. They also should cite and discuss the original study in Drosophila describing the mirror expression pattern in the embryo and eye and the corresponding eye phenotype of mirror mutants: McNeill et al., Genes & Dev. 1997. 11: 1073-1082; doi:10.1101/gad.11.8.1073.

      We thank the reviewer for their summary, critique, and recommendations. We agree with everything the reviewer says. Honestly, however, we were surprised by these comments because we took great care in the paper to never refer to mirror as a compartmentalization gene or claim it has a function in cell lineage compartmentalization like engrailed. As pointed out, we lack clonal analyses to test for compartmentalization. This is why we used the term “domain” instead of “compartment” in the title and throughout the manuscript. Nevertheless, we have recrafted the discussion in the manuscript, including completely removing the term “compartment”, to better avoid implications that mirror plays a role in cell lineage compartmentalization. 

      We also thank the reviewer for recommending the paper about the role of mirror in eye development. For the sake of keeping the paper focused, however, we decided not to broach the topic of mirror functions outside the context of wing development.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have minor comments for improvement.

      The abstract and introductions are terminologically problematic when they refer to the concept of compartment and compartment boundaries. Allegedly this confusion has previously propagated in several articles related to butterfly wing development, which keeps alienating this literature from being taken seriously by fly specialists, for example. So it is important to use the right terms. I will try to explain point by point here, but I would appreciate it if the authors could undertake a significant rewrite taking these comments into account. The authors use the terms compartment and compartment boundary. This has a very specific use in developmental genetics: mitotic clones never cross a boundary (or compartment). I think the authors can keep referring to the equivalent of the A-P boundary, which is situated somewhere between M1-M2 based on unpublished data from the Patel Lab, and is not very well defined (Engrailed expression moves a little bit during development in this area). Domain is a looser term and can be used more liberally to describe genetically defined regions.

      - "Classical morphological work subdivides insect wings into several distinct domains along the antero-posterior (AP) axis, each of which can evolve relatively independently." Yes. This concept of domain and individuation seems important. You could make a proposed link to selector genes here.

      - "There has been little molecular evidence, however, for AP subdivision beyond a single compartment boundary described from Drosophila melanogaster." Incorrect, and this conflates "domain" and "compartment".

      Flies have wing AP domains too, that pattern their veins (see the cited Banerjee et al). 

      - "Our results confirm that insect wings can have more than one posterior developmental domain, and support models of how selector genes may facilitate evolutionarily individuation of distinct AP domains in insect wings". Yes, and I like the second part of the sentence. Still, I would recommend simply deleting "confirm that insect wings can have more than one posterior developmental domain, and" because this is neglecting previous work on AP genetic regionalization in both flies (vein literature) and butterflies (e.g. McKenna and Nijhout, Banerjee et al).

      - "Analyses of wing pattern diversity across butterflies, considering both natural variation and genetic mutants, suggest that wings can be subdivided into at least five AP domains, bounded by the M1, M3, Cu2, and 2A veins respectively, within each of which there are strong correlations in color pattern variation and wing morphology (Figure 1A)". Yes, and I would recommend emphasizing they correspond to well-defined gene expression domains as mentioned in Banerjee et al, or McKenna and Nijhout.

      - "The anterior-most of these domains, bordered by the M1 vein, appears to correspond to an AP compartment boundary originally described by cell lineage tracing in Drosophila melanogaster, and later supported in butterfly wings by expression of the Engrailed transcription factor. Interestingly, however, D. melanogaster work has yet to reveal clear evidence for additional AP domain boundaries in the wing." This is confusing, because the first sentence is about compartments while the second is about AP domains. I also think the claim that Dmel has no other known AP domains is dubious because Spalt is highly regionalized in flies.

      - "Previous authors have proposed the existence of such individuated domains, and speculated that they may be specified by selector genes.5,10 Our data provide experimental support for this model, and now motivate us to identify factors that specify other domain boundaries between the M1 and A2 veins." Yes, I completely agree with this way to emphasize the selector effect, and to link it to the concept of "individuated domain"

      We cannot thank the reviewer enough for the time and thought they devoted to giving helpful suggestions to improve our manuscript. We have applied all of the above recommendations to the revision.

      Fig. S1: the field needs to move away from Red/Green microscopy images, for accessibility reasons.

      The easiest fix here would be to change the red channels to magenta.

      Green/Magenta provides excellent contrast and accessibility in general in 2-channel images.

      We thank the reviewer for this suggestion. We have improved the color accessibility of Fig. S1.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study addresses how faces and bodies are integrated in two STS face areas revealed by fMRI in the primate brain. It builds upon recordings and analysis of the responses of large populations of neurons to three sets of images, that vary face and body positions. These sets allowed the authors to thoroughly investigate invariance to position on the screen (MC HC), to pose (P1 P2), to rotation (0 45 90 135 180 225 270 315), to inversion, to possible and impossible postures (all vs straight), to the presentation of head and body together or in isolation. By analyzing neuronal responses, they found that different neurons showed preferences for body orientation, head orientation, or the interaction between the two. By using a linear support vector machine classifier, they show that the neuronal population can decode head-body angle presented across orientations, in the anterior aSTS patch (but not middle mSTS patch), except for mirror orientation.

      Strengths:

      These results extend prior work on the role of Anterior STS fundus face area in face-body integration and its invariance to mirror symmetry, with a rigorous set of stimuli revealing the workings of these neuronal populations in processing individuals as a whole, in an important series of carefully designed conditions.

      Minor issues and questions that could be addressed by the authors:

      (1) Methods. While monkeys certainly infer/recognize that individual pictures refer to the same pose with varying orientations based on prior studies (Wang et al.), I am wondering whether in this study monkeys saw a full rotation of each of the monkey poses as a video before seeing the individual pictures of the different orientations, during recordings.

      The monkeys had not been exposed to videos of a rotating monkey pose before the recordings. However, they were reared and housed with other monkeys, providing them with ample experience of monkey poses from different viewpoints.

      (2) Experiment 1. The authors mention that neurons are preselected as face-selective, body-selective, or both-selective. Do the Monkey Sum Index and ANOVA main effects change per Neuron type?

      We have performed a new analysis to assess whether the Monkey Sum Index is related to the response strength for the face versus the body as measured in the Selectivity Test of Experiment 1. To do this we selected face- and body-category selective neurons, as well as neurons responding selectively to both faces and bodies. First, we selected those neurons that responded significantly to either faces, bodies, or the two control object categories, using a split-plot ANOVA for these 40 stimuli. From those neurons, we selected face-selective ones having at least a twofold larger mean net response to faces compared to bodies (faces > 2 * bodies) and the control objects for faces (faces > 2 * objects). Similarly, a body-selective neuron was defined by a twofold larger mean net response to bodies compared to faces and the control objects for bodies. A body-and-face selective neuron was defined as having a twofold larger net response to the faces compared to their control objects, and to bodies compared to their control objects, with the ratio between mean response to bodies and faces being less than twofold. Then, we compared the distribution of the Monkey Sum Index (MSI) for each region (aSTS; mSTS), pose (P1, P2), and centering (head- (HC) or monkey-centered (MC)) condition. Too few body-and-face selective neurons were present in each combination of region, pose, and centering (a maximum of 7) to allow a comparison of their MSI distribution with the other neuron types. The Figure below shows the distribution of the MSI for the different orientation-neuron combinations for the body- and face-selective neurons (same format as in Figure 3a, main text). The number of body-selective neurons, according to the employed criteria, varied from 21 to 29, whereas the number of face-selective neurons ranged from 14 to 24 (pooled across monkeys).
The data of the two subjects are shown in different colors and the number of cases for each subject is indicated (n1: number of cases for M1; n2: number of cases for M2). The arrows indicate the medians for the data pooled across the monkey subjects. For the MC condition, the MSI tended to be more negative (i.e. relatively less response to the monkey compared to the sum of the body and face responses) for the face compared to the body cells, but this was significant only for mSTS and P1 (p = 0.043; Wilcoxon rank sum test; tested after averaging the indices per neuron to avoid dependence of indices within a neuron). No consistent or significant tendencies were observed for the HC stimuli. This absence of a consistent relationship between MSI and face- versus body-selectivity is in line with the absence of a correlation between the MSI and face- versus body-selectivity using natural images of monkeys in a previous study (Zafirova Y, Bognár A, Vogels R. Configuration-sensitive face-body interactions in primate visual cortex. Prog Neurobiol. 2024 Jan;232:102545).
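
      As an illustration only, the twofold selection criteria described above can be sketched as follows. The function name, argument names, and the "unclassified" label are hypothetical; the sketch assumes that the mean net responses are positive and that the preceding split-plot ANOVA responsiveness test has already been passed.

```python
def classify_neuron(face, body, face_ctrl, body_ctrl, ratio=2.0):
    """Label a neuron from its mean net responses (hypothetical inputs).

    face, body   : mean net response to faces and to bodies
    face_ctrl    : mean net response to the control objects for faces
    body_ctrl    : mean net response to the control objects for bodies
    ratio        : selectivity factor (twofold in the text)
    """
    # Face-selective: faces > 2 * bodies and faces > 2 * control objects for faces.
    if face > ratio * body and face > ratio * face_ctrl:
        return "face"
    # Body-selective: bodies > 2 * faces and bodies > 2 * control objects for bodies.
    if body > ratio * face and body > ratio * body_ctrl:
        return "body"
    # Body-and-face selective: each category beats its own control twofold,
    # and the ratio between face and body responses is less than twofold.
    if (face > ratio * face_ctrl and body > ratio * body_ctrl
            and max(face, body) < ratio * min(face, body)):
        return "body-and-face"
    return "unclassified"


# Toy values, not data from the study.
print(classify_neuron(10.0, 2.0, 1.0, 1.0))
print(classify_neuron(10.0, 8.0, 2.0, 2.0))
```

      The three criteria are mutually exclusive by construction, so the order of the checks does not affect the label.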

      We did not perform a similar analysis for the main effects of the two-way ANOVA because the very large majority of neurons showed a significant effect of body orientation and thus no meaningful difference between the two neuron types can be expected.

      Author response image 1.

      (3) I might have missed this information, but the correlation between P1 and P2 seems to not be tested although they carry similar behavioral relevance in terms of where attention is allocated and where the body is facing for each given head-body orientation.

      Indeed, we did not compute this correlation between the responses to the sitting (P1) and standing (P2) pose avatar images. However, as pointed out by the reviewer, one might expect such correlations because of the same head orientations and body-facing directions. Thus, we computed the correlation between the 64 head-body orientation conditions of P1 and P2 for those neurons that were tested with both poses and showed a response for both poses (Split-plot ANOVA). This was performed for the Head-Centered and Monkey-Centered tests of Experiment 1 for each monkey and region. Note that not all neurons were tested with both poses (because of failure to maintain isolation of the single unit in both tests or the monkey stopped working) and not all neurons that were recorded in both tests showed a significant response for both poses, which is not unexpected since these neurons can be pose selective. The distribution of the Pearson correlation coefficients of the neurons with a significant response in both tests is shown in Figure S1. The median correlation coefficient was significantly larger than zero for each region, monkey, and centering condition (Wilcoxon tests of whether the median differed from zero; p1: p-value for M1; p2: p-value for M2; both shown in the Figure), indicating that the effect of head and/or body orientation generalizes across poses. We have noted this now in the Results (page 12) and added the Figure (New Figure S1) in the Suppl. Material.
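
      The per-neuron correlation described above can be sketched as follows, as a minimal illustration on synthetic data. All names and the noise model are assumptions, and the Wilcoxon test on the resulting coefficients is not reproduced here; only the Pearson correlation across the 64 conditions and its median across neurons are shown.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pose_correlations(p1_tuning, p2_tuning):
    """One coefficient per neuron; each entry holds 64 mean responses
    (8 head x 8 body orientations) for pose P1 or P2."""
    return [pearson(a, b) for a, b in zip(p1_tuning, p2_tuning)]


# Synthetic demo: 20 hypothetical neurons whose tuning is shared across
# poses, plus independent noise for each pose.
rng = random.Random(0)
p1_tuning, p2_tuning = [], []
for _ in range(20):
    shared = [rng.gauss(0.0, 1.0) for _ in range(64)]
    p1_tuning.append([s + rng.gauss(0.0, 0.5) for s in shared])
    p2_tuning.append([s + rng.gauss(0.0, 0.5) for s in shared])

r_values = pose_correlations(p1_tuning, p2_tuning)
median_r = statistics.median(r_values)
print(f"median r across neurons: {median_r:.2f}")
```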

      (4) Is the invariance for position HC-MC larger in aSTS neurons compared to mSTS neurons, as could be expected from their larger receptive fields?

      Yes, the position tolerance of the interaction of body and head orientation was significantly larger for aSTS compared to mSTS neurons, as we described on pages 11 and 12 of the Results. This is in line with larger receptive fields in aSTS than in mSTS. However, we did not plot receptive fields in the present study.

      (5) L492 "The body-inversion effect likely results from greater exposure to upright than inverted bodies during development". Monkeys display more hanging upside-down behavior than humans, however, does the head appear more tilted in these natural configurations?

      Indeed, infant monkeys do spend some time hanging upside down from their mother's belly. While we lack quantitative data on this behavior, casual observations suggest that even young monkeys spend more time upright. The tilt of the head while hanging upside down can vary, just as it does in standing or sitting monkeys (as when they search for food or orient to other individuals). To our knowledge, no quantitative data exist on the frequency of head tilts in upright versus upside-down monkeys. Therefore, we refrain from further speculation on this interesting point, which warrants more attention.

      (6) Methods in Experiment 1. SVM. How many neurons are sufficient to decode the orientation?

      The number of neurons that are needed to decode the head-body orientation angle depends on which neurons are included, as we show in a novel analysis of the data of Experiment 1. We employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that fewer than half contributed positively to decoding. We also show the ten “best” neurons for each centering condition and pose. These have a variety of tuning patterns for head and body orientation, suggesting that the decoding of head-body orientation angle depends on a population code. Notably, the best-ranked (rank N) neuron alone achieved above-chance accuracy. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (new Figure S3).
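
      The two steps of the neuron-dropping procedure (leave-one-neuron-out ranking, then incremental decoding from the worst- to the best-ranked neuron) can be sketched as below. This is a simplified illustration, not the analysis code: a nearest-centroid classifier stands in for the linear SVM, a single train/test split stands in for the 2000 pseudo-population resamplings, and all function names and the synthetic data are hypothetical.

```python
import random


def centroid_accuracy(train, test, neurons):
    """Nearest-centroid decoding accuracy using only the listed neurons.
    train/test: lists of (response_vector, label) pairs."""
    labels = sorted({lab for _, lab in train})
    centroids = {}
    for lab in labels:
        rows = [x for x, l in train if l == lab]
        centroids[lab] = [sum(r[j] for r in rows) / len(rows) for j in neurons]
    correct = 0
    for x, lab in test:
        pred = min(labels, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(neurons, centroids[c])))
        correct += pred == lab
    return correct / len(test)


def neuron_dropping(train, test, n_neurons):
    everyone = list(range(n_neurons))
    # Step 1: decode N times, each time leaving out a different neuron.
    left_out_acc = {k: centroid_accuracy(train, test,
                                         [j for j in everyone if j != k])
                    for k in everyone}
    # Highest accuracy without a neuron means that neuron is the 'worst' (rank 1).
    order = sorted(everyone, key=lambda k: -left_out_acc[k])
    # Step 2: decode with 1..N neurons, adding them from worst to best.
    curve = [centroid_accuracy(train, test, order[:m])
             for m in range(1, n_neurons + 1)]
    return order, curve


def make_trials(rng, n_trials, n_neurons, informative):
    """Synthetic trials: informative neurons shift their mean with the label."""
    trials = []
    for t in range(n_trials):
        lab = t % 2
        x = [rng.gauss(3.0 * lab if j in informative else 0.0, 1.0)
             for j in range(n_neurons)]
        trials.append((x, lab))
    return trials


rng = random.Random(1)
N = 6
train = make_trials(rng, 80, N, informative={0, 1})  # e.g. MC trials
test = make_trials(rng, 40, N, informative={0, 1})   # e.g. HC trials
order, curve = neuron_dropping(train, test, N)
print("rank order (worst to best):", order)
print("accuracy vs. number of neurons:", curve)
```

      Plotting `curve` against the number of included neurons gives the kind of accuracy curve described in the text; the point at which it rises above chance indicates how many of the lowest-ranked neurons contribute little to decoding.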

      (7) Figure 3D 3E. Could the authors please indicate for each of these neurons whether they show a main effect of face, body, or interaction, as well as their median corrected correlation to get a flavor of these numbers for these examples?

      We have indicated these now in Figure 3.

      (8) Methods and Figure 1A. It could be informative to specify whether the recordings were carried out in the lateral part of the STS or in the fundus of the STS, both for aSTS and mSTS, for comparison to other studies that are using these distinctions (AF, AL, MF, ML).

      In experiment 1, the recording locations were not as medial as the fundus. For experiments 2 and 3, the ventral part of the fundus was included, as described in the Methods. We have added this to the Methods now (page 31).

      Wang, G., Obama, S., Yamashita, W. et al. Prior experience of rotation is not required for recognizing objects seen from different angles. Nat Neurosci 8, 1768-1775 (2005). https://doi-org.insb.bib.cnrs.fr/10.1038/nn1600

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the neuronal encoding of the relationship between head and body orientations in the brain. Specifically, the authors focus on the angular relationship between the head and body by employing virtual avatars. Neuronal responses were recorded electrophysiologically from two fMRI-defined areas in the superior temporal sulcus and analyzed using decoding methods. They found that: (1) anterior STS neurons encode head-body angle configurations; (2) these neurons distinguish aligned and opposite head-body configurations effectively, whereas mirror-symmetric configurations are more difficult to differentiate; and (3) an upside-down inversion diminishes the encoding of head-body angles. These findings advance our understanding of how visual perception of individuals is mediated, providing a fundamental clue as to how the primate brain processes the relationship between head and body - a process that is crucial for social communication.

      Strengths:

      The paper is clearly written, and the experimental design is thoughtfully constructed and detailed. The use of electrophysiological recordings from fMRI-defined areas elucidated the mechanism of head-body angle encoding at the level of local neuronal populations. Multiple experiments, control conditions, and detailed analyses thoroughly examined various factors that could affect the decoding results. The decoding methods effectively and consistently revealed the encoding of head-body angles in the anterior STS neurons. Consequently, this study offers valuable insights into the neuronal mechanisms underlying our capacity to integrate head and body cues for social cognition-a topic that is likely to captivate readers in this field.

      Weaknesses:

      I did not identify any major weaknesses in this paper; I only have a few minor comments and suggestions to enhance clarity and further strengthen the manuscript, as detailed in the Private Recommendations section.

      Reviewer #3 (Public review):

      Summary:

      Zafirova et al. investigated the interaction of head and body orientation in the macaque superior temporal sulcus (STS). Combining fMRI and electrophysiology, they recorded responses of visual neurons to a monkey avatar with varying head and body orientations. They found that STS neurons integrate head and body information in a nonlinear way, showing selectivity for specific combinations of head-body orientations. Head-body configuration angles can be reliably decoded, particularly for neurons in the anterior STS. Furthermore, body inversion resulted in reduced decoding of head-body configuration angles. Compared to previous work that examined face or body alone, this study demonstrates how head and body information are integrated to compute a socially meaningful signal.

      Strengths:

      This work presents an elegant design of visual stimuli, with a monkey avatar of varying head and body orientations, making the analysis and interpretation straightforward. Together with several control experiments, the authors systematically investigated different aspects of head-body integration in the macaque STS. The results and analyses of the paper are mostly convincing.

      Weaknesses:

      (1) Using ANOVA, the authors demonstrate the existence of nonlinear interactions between head and body orientations. While this is a conventional way of identifying nonlinear interactions, it does not specify the exact type of the interaction. Although the computation of the head-body configuration angle requires some nonlinearity, it's unclear whether these interactions actually contribute. Figure 3 shows some example neurons, but a more detailed analysis is needed to reveal the diversity of the interactions. One suggestion would be to examine the relationship between the presence of an interaction and the neural encoding of the configuration angle.

      This is an excellent suggestion. To do this, one needs to identify the neurons that contribute to the decoding of head-body orientation angles. For that, we employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721.) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that fewer than half contributed positively to decoding (see Figure S3). We examined the tuning for head and body orientation of the 10 “best” neurons (Figure S3). For half or more of those the two-way ANOVA showed a significant interaction. These are indicated by the red color in the Figure. They showed a variety of tuning patterns for head and body orientation, suggesting that the decoding of the head-body orientation angle results from a combination of neurons with different tuning profiles. Based on a suggestion from reviewer 2, we performed, for each neuron of Experiment 1, a one-way ANOVA with head-body orientation angle as the factor. To do that, we combined all 64 trials that had the same head-body orientation angle.
The percentage of neurons (required to be responsive in the tested condition) for which this one-way ANOVA was significant was low but larger than the expected 5% (Type 1 error), with a median of 16.5% (range: 3 to 23%) in aSTS and 8% for mSTS (range: 0-19%). However, a higher percentage of the 10 best neurons for each pose (indicated by the star) showed a significant one-way ANOVA for angle (for P1, MC: 50% (95% confidence interval (CI): 19%–81%); P1, HC: 70% (CI: 35%–93%); P2, MC: 70% (CI: 35%–93%); P2, HC: 50% (CI: 19%–81%)). These percentages were significantly higher than expected for a random sample from the population of neurons for each pose-centering combination (expected percentages listed in the same order as above: 16%, 13%, 16%, and 10%; all outside CI). Thus, for at least half of the “best” neurons, the response differed significantly among the head-body orientation angles at the single neuron level. Nonetheless, the tuning profiles were diverse, suggesting a population code for head-body orientation angle. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (Figure S3).

      (2) Figure 4 of the paper shows a better decoding of the configuration angle in the anterior STS than in the middle STS. This is an interesting result, suggesting a transformation in the neural representation between these two areas. However, some control analyses are needed to further elucidate the nature of this transformation. For example, what about the decoding of head and body orientations - does absolute orientation information decrease along the hierarchy, accompanying the increase in configuration information?

      We have now performed two additional analyses, one in which we decoded the orientation of the head and another in which we decoded the orientation of the body. We employed the responses to the avatar of experiment 1, using the same sample of neurons for which we decoded the head-body orientation angle. To decode the head orientation, the trials with identical head orientation, irrespective of their body orientation, were given the same label. For this, we employed only responses in the head-centered condition. To decode the body orientation, the trials with identical body orientation, irrespective of their head orientation, had the same label, and we employed only responses in the body-centered condition. The decoding was performed separately for each pose (P1 and P2) and region. We decoded either the responses of 20 neurons (10 randomly sampled from each monkey for each of the 1000 resamplings), 40 neurons (20 randomly sampled per monkey), or 60 neurons (30 neurons per monkey) since the sample of 60 neurons yielded close to ceiling performance for the body orientation decoding. For each pose, the body orientation decoding was worse for aSTS than for mSTS, although this difference reached significance only for P1 and for the 40 neurons sample of P2 (p < 0.025; two-tailed test; same procedure as employed for testing the significance of the decoding of whole-body orientation for upright versus inverted avatars (Experiment 3)). Face orientation decoding was significantly worse for aSTS compared to mSTS. These results are in line with the previously reported decreased decoding of face orientation in the anterior compared to mid-STS face patches (Meyers EM, Borzello M, Freiwald WA, Tsao D. Intelligent information loss: the coding of facial identity, head pose, and non-face information in the macaque face patch system. J Neurosci. 2015 May 6;35(18):7069-81), and decreased decoding of body orientation in anterior compared to mid-STS body patches (Kumar S, Popivanov ID, Vogels R. Transformation of Visual Representations Across Ventral Stream Body-selective Patches. Cereb Cortex. 2019 Jan 1;29(1):215-229). As mentioned by the reviewer, this contrasts with the decoding of the head-body orientation angle, which increases when moving more anteriorly. We now mention this finding in the Discussion (page 27) and present the new Figure S10 in the Suppl. Material.

      (3) While this work has characterized the neural integration of head and body information in detail, it's unclear how the neural representation relates to the animal's perception. Behavioural experiments using the same set of stimuli could help address this question, but I agree that these additional experiments may be beyond the scope of the current paper. I think the authors should at least discuss the potential outcomes of such experiments, which can be tested in future studies.

      Unfortunately, we do not have behavioral data. One prediction would be that the discrimination of head-body orientation angle, irrespective of the viewpoint of the avatar, would be more accurate for zero versus straight angles compared to the right versus left angles. We have added this to the Discussion (page 28).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) P22 L373. It should read Figure S5C instead of S4C.

      Thanks; corrected.

      (2) Figure 7B. All inverted decoding accuracies, although significantly lower than upright decoding accuracies, appear significantly above baseline. Should the title be amended accordingly?

      Thanks for pointing this out. To avoid future misunderstanding we have changed the title to:

      “Integration of head and body orientations in the macaque superior temporal sulcus is stronger for upright bodies”

      (3) Discussion L432-33. "with some neurons being tuned to a particular orientation of both the head and the body". Wouldn't that be visible as a diagonal profile on the normalized net responses in Fig 3D? Or can the Anova evidence such a tuning?

      We meant to say that some neurons were tuned to a particular combination of head and body orientation, like the third aSTS example neuron shown in Figure 3D. We have corrected the sentence.

      Reviewer #2 (Recommendations for the authors):

      Major comment:

      This paper effectively demonstrates that the angular relationship between the head and body can be decoded from population responses in the anterior STS. In other words, these neurons encode information about the head-body angle. However, how exactly do these neurons encode this information? Given that the study employed electrophysiological recordings from a local population of neurons, it might be possible to provide additional data on the response patterns of individual neurons to shed light on the underlying encoding mechanisms.

      Although the paper already presents example response patterns (Figures 3D, E) and shows that STS neurons encode interactions between head and body orientations (Figure 3B), it remains unclear whether the angle difference between the head and body has a systematic effect on neuronal responses. For instance, a description of whether some neurons preferentially encode specific head-body angle differences (e.g., a "45-degree angle neuron"), or additional population analyses such as a one-way ANOVA with angle difference as the main effect (or two-way ANOVA with angle difference as one of the main effect), would be very informative. Such data could offer valuable insights into how individual neurons contribute to the encoding of head-body angle differences-a detail that may also be reflected in the decoding results. Alternatively, it is possible that the encoding of head-body angle is inherently complex and only discernible via decoding methods applied to population activity. Either scenario would provide interesting and useful information to the field.

      We have performed two additional analyses which are relevant to this comment. First, we attempted to relate the tuning for body and head orientation with the decoding of the head-body orientation angle. To do this, one needs to identify the neurons that contribute to the decoding of head-body orientation angles. For that, we employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721.) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that less than half contributed positively to decoding (see Figure S3). We examined the tuning for head and body orientation of the 10 “best” neurons (Figure S3). For half or more of those the two-way ANOVA showed a significant interaction. These are indicated by the red color in the Figure. They showed a variety of tuning patterns for head and body orientation, suggesting that the decoding of the head-body orientation angle results from a combination of neurons with different tuning profiles.
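The ranking and inclusion steps of the neuron-dropping analysis described above can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the cross-validated linear SVM, there is no resampling of pseudo-population vectors, and the function names (`accuracy`, `rank_neurons`, `inclusion_curve`) and the toy responses are hypothetical. The separate `train` and `test` sets mirror training on one centering and testing on the other.

```python
from statistics import mean

def accuracy(train, test, neurons):
    """Nearest-centroid accuracy using only the listed neuron indices.
    (Stand-in for the linear SVM used in the actual analysis.)"""
    centroids = {
        label: [mean(vec[n] for vec in vectors) for n in neurons]
        for label, vectors in train.items()
    }
    hits = total = 0
    for label, vectors in test.items():
        for vec in vectors:
            x = [vec[n] for n in neurons]
            predicted = min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
            )
            hits += predicted == label
            total += 1
    return hits / total

def rank_neurons(train, test, n_neurons):
    """Leave one neuron out at a time; the neuron whose removal leaves the
    highest accuracy is the 'worst' (rank 1), the lowest is the 'best' (rank N)."""
    acc_without = {
        dropped: accuracy(train, test, [n for n in range(n_neurons) if n != dropped])
        for dropped in range(n_neurons)
    }
    return sorted(range(n_neurons), key=lambda n: acc_without[n], reverse=True)

def inclusion_curve(train, test, ranking):
    """Accuracy when including the first k neurons of the ranking, k = 1..N."""
    return [accuracy(train, test, ranking[:k]) for k in range(1, len(ranking) + 1)]

# Hypothetical toy data: neuron 0 separates the two angle classes,
# neurons 1 and 2 respond identically to both.
train = {
    "zero": [[0.0, t % 2, 1.0] for t in range(8)],
    "straight": [[10.0, t % 2, 1.0] for t in range(8)],
}
test = {
    "zero": [[1.0, t % 2, 1.0] for t in range(8)],
    "straight": [[9.0, t % 2, 1.0] for t in range(8)],
}
ranking = rank_neurons(train, test, n_neurons=3)
curve = inclusion_curve(train, test, ranking)
```

In this toy case the informative neuron ends up last in the ranking, so the inclusion curve stays at chance until it is added.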

      Second, we have followed the suggestion of the reviewer to perform, for each neuron of experiment 1, a one-way ANOVA with head-body orientation angle as the factor. To do that, we combined all 64 trials that had the same head-body orientation angle. The percentage of neurons (required to be responsive in the tested condition) for which this one-way ANOVA was significant is shown in the Tables below for each region, separately for each pose (P1, P2), centering condition (MC = monkey-centered; HC = head-centered) and monkey subject (M1, M2). The percentages were low but larger than the expected 5% (Type 1 error), with a median of 16.5% (range: 3 to 23%) in aSTS and 8% for mSTS (range: 0-19%).

      Author response table 1.

      Interestingly, a higher percentage of the 10 best neurons for each pose (indicated by the star in the Figure above) showed a significant one-way ANOVA for angle (for P1, MC: 50% (95% confidence interval (CI): 19%–81%); P1, HC: 70% (CI: 35%–93%); P2, MC: 70% (CI: 35%–93%); P2, HC: 50% (CI: 19%–81%)). These percentages were significantly higher than expected for a random sample from the population of neurons for each pose-centering combination (expected percentages listed in the same order as above: 16%, 13%, 16%, and 10%; all outside CI). Thus, for at least half of the “best” neurons, the response differed significantly among the head-body orientation angles at the single neuron level. Nonetheless, the tuning profiles were quite diverse, suggesting population coding of head-body orientation angle. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (Figure S3).
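The response does not state how the confidence intervals were computed. As a hedged illustration, the reported values for 5 of 10 and 7 of 10 neurons are consistent with an exact binomial (Clopper-Pearson) interval, which the sketch below computes by bisection using only the standard library. The choice of method and the function names are assumptions, not the authors' stated procedure.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(f, target, iters=60):
    """Bisection for f(p) = target on [0, 1], with f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes out of n."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

ci_5_of_10 = clopper_pearson(5, 10)
ci_7_of_10 = clopper_pearson(7, 10)
```

Under this assumption, the expected population percentages (16%, 13%, 16%, and 10%) all fall below the corresponding lower bounds, matching the statement that they lie outside the CI.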

      Minor comments:

      (1) Figure 4A, Fourth Row Example (Zero Angle vs. Straight Angle, Bottom of the P2 Examples): The order of the example stimuli might be incorrect- the 0° head with 180° body stimulus (leftmost) might be swapped with the 180° head with 0° body stimulus (5th from the left). While this ordering may be acceptable, please double-check whether it reflects the authors' intended arrangement.

      We have changed the order of the two stimuli in Figure 4A, following the suggestion of the reviewer.

      (2) Page 12, Lines 192-194: The text states, "Interestingly, some neurons (e.g. Figure 3D) were tuned to a particular combination of a head and body irrespective of centering." However, Figure 3D displays data for a total of 10 neurons. Could you please specify which of these neurons are being referred to in this context?

      The wording was not optimal. We meant to say that some neurons were tuned to a particular combination of head and body orientation, like the third aSTS example neuron of Figure 3D. We have rephrased the sentence and clarified which example neuron we referred to.

      (3) Page 28, Lines 470-471: The text states, "We observed no difference in response strength between anatomically possible and impossible configurations." Please clarify which data were compared for response strength, as I could not locate the corresponding analyses.

      The anatomically possible and impossible configurations differ in the head-body orientation angle. However, as we reported before in the Results, there was no effect of head-body orientation angle on mean response strength across poses (Friedman ANOVA; all p-values for both poses and centerings > 0.1). We have clarified this now in the Discussion (page 28).

      (4) Pages 40-43, Decoding Analyses: In experiments 2 and 3, were the decoding analyses performed on simultaneously recorded neurons? If so, such analyses might leverage trial-by-trial correlations and thus avoid confounds from trial-to-trial variability. In contrast, experiment 1, which used single-shank electrodes, would lack this temporal information. Please clarify how trial numbers were assigned to neurons in each experiment and how this assignment may have influenced the decoding performance.

      For the decoding analyses of experiments 2 and 3, we combined data from different daily penetrations, with only units from the same penetration being recorded simultaneously. In the decoding analyses of each experiment, the trials were assigned randomly to the pseudo-population vectors, shuffling on each resampling the trial order per neuron. This shuffling abolishes noise correlations in the analysis of each experiment.
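A minimal sketch of the pseudo-population construction described above, for a single stimulus condition. Each neuron's trial order is shuffled independently before trials are combined across neurons, which is what removes any trial-by-trial (noise) correlation between units. The function name and the spike counts are hypothetical.

```python
import random

def pseudo_population_vectors(trials_per_neuron, n_vectors, rng):
    """Build pseudo-population vectors for ONE stimulus condition.

    trials_per_neuron: one list of single-trial responses per neuron.
    The trial order of every neuron is shuffled independently, so trial i
    of one neuron is paired with an arbitrary trial of every other neuron.
    """
    shuffled = [rng.sample(trials, len(trials)) for trials in trials_per_neuron]
    return [[neuron[i] for neuron in shuffled] for i in range(n_vectors)]

rng = random.Random(0)
# Hypothetical spike counts: 3 neurons x 4 trials of the same condition.
trials = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]
vectors = pseudo_population_vectors(trials, n_vectors=4, rng=rng)
```

Calling the function again on each resampling yields a new random pairing of trials across neurons.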

      (5) Page 41, Lines 792-802: The authors state that "To assess the significance of the differences in classification scores between pairs of angles ... we computed the difference in classification score between the two pairs for each resampling and the percentile of 0 difference corresponded to the p-value." In a two-sided test under the null hypothesis of no difference between the distributions, the conventional approach would be to compute the p-value as the proportion of resampled differences that are as extreme or more extreme than the observed difference. Since a zero difference might be relatively rare, relying solely on its percentile could potentially misrepresent the tail probabilities relevant to a two-sided test. Could you clarify how their method addresses this issue?

      This test is based on the computation of the distribution of the difference between classification accuracies across resamplings. This is similar to the computation of the confidence interval of a difference. Thus, we assess whether the theoretical zero value (= no difference; = null hypothesis) is outside the 2.5 and 97.5 percentile interval of the computed distribution of the empirically observed differences. We have now clarified in the Methods (page 41) that for a two-tailed test the computed p-value (the percentile of the zero value) should be smaller than 0.025.
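The logic of this test can be sketched as follows, assuming one accuracy difference per resampling. The function names and the example differences are hypothetical; the decision rule follows the description above, with zero having to fall outside the central 95% of the resampled differences.

```python
def percentile_of_zero(differences):
    """Fraction of resampled differences that are at or below zero."""
    return sum(d <= 0 for d in differences) / len(differences)

def two_tailed_significant(differences, alpha=0.05):
    """True if zero lies outside the central (1 - alpha) percentile interval."""
    p = percentile_of_zero(differences)
    p = min(p, 1 - p)  # use whichever tail zero falls in
    return p < alpha / 2

# Hypothetical differences in classification accuracy across 1000 resamplings.
mostly_positive = [0.1] * 990 + [-0.1] * 10   # zero at the 1st percentile
less_clear = [0.1] * 900 + [-0.1] * 100       # zero at the 10th percentile
```

With these made-up values, only the first distribution would pass the 0.025 criterion.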

      (6) Page 43, Lines 829-834: The manuscript explains: "The mean of 10 classification accuracies (i.e., of 10 resamplings) was employed to obtain a distribution (n=100) of the differences in classification accuracy ... The reported standard deviations of the classification accuracies are computed using also the means of 10 resamplings." I am unfamiliar with this type of analysis and am unclear about the rationale for calculating distributions and standard deviations based on the means of 10 resamplings rather than using the original distribution of classification accuracies. This resampling procedure appears to yield a narrower distribution and smaller standard deviations than the original data. Could you please justify this approach?

      The logic of the analysis is to reduce the noise in the data, by averaging across 10 randomly selected resamplings, but still keeping a sufficient number of data (100 values) for a test.
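As an illustration of this averaging step, the sketch below groups resampled accuracies into random, non-overlapping blocks of 10 and keeps the block means. The total of 1000 resamplings, the non-overlapping grouping, and the simulated accuracies are assumptions made for the example.

```python
import random
from statistics import mean, stdev

def block_means(accuracies, block_size, rng):
    """Average randomly grouped, non-overlapping blocks of resampled accuracies."""
    shuffled = rng.sample(accuracies, len(accuracies))
    return [
        mean(shuffled[i:i + block_size])
        for i in range(0, len(shuffled), block_size)
    ]

rng = random.Random(1)
# Hypothetical classification accuracies from 1000 resamplings.
raw = [rng.uniform(0.6, 0.9) for _ in range(1000)]
averaged = block_means(raw, block_size=10, rng=rng)  # 100 values
```

The block means have the same overall mean as the raw accuracies but a smaller spread, which is the narrowing the reviewer refers to.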

      Reviewer #3 (Recommendations for the authors):

      (1) Some sentences are too long and difficult to parse. For example, in line 177: "the correlations between the responses to the 64 head-body orientation conditions of the two centerings for the neuron and pose combinations showing significant head-body interactions for the two centerings were similar to those observed for the whole population."

      We have modified this sentence: For neuron and pose combinations with significant head-body interactions in both centerings, the correlations between responses to the 64 head-body orientation conditions were similar to those observed in the whole population.

      (2) The authors argue in line 485: "in our study, a search bias cannot explain the body-inversion effect since we selected responsive units using both upright and inverted images." However, the body-selective patches were localized using upright images, correct?

      The monkey-selective patches were indeed localized using upright images. However, we recorded in experiment 3 (and 2) also outside the localized patches (as we noted before in the Methods: “In experiments 2 and 3 we recorded from a wider region, which overlapped with the two monkey patches and the recording locations of experiment 1”). Furthermore, the preference for upright monkey images is not an all-or-nothing phenomenon: most units still responded to inverted monkeys. Also, we believe it is likely that the mean responses to the inverted bodies in the monkey patches, defined by upright bodies versus objects, would be larger than those to objects, and we would be surprised to learn that there is a patch selective for inverted bodies that we would have missed with our localizer.

      (3) Typo: line 447, "this independent"->"is independent"?

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kv2 subfamily potassium channels contribute to delayed rectifier currents in virtually all mammalian neurons and are encoded by two distinct types of subunits: Kv2 alpha subunits that have the capacity to form homomeric channels (Kv2.1 and Kv2.2), and KvS or silent subunits (Kv5, 6, 8, 9) that can assemble with Kv2.1 or Kv2.2 to form heteromeric channels with novel biophysical properties. Many neurons express both types of subunits and therefore have the capacity to make both homomeric Kv2 channels and heteromeric Kv2/KvS channels. Determining the contributions of each of these channel types to native potassium currents has been very difficult because the differences in biophysical properties are modest and there are no Kv2/KvS-specific pharmacological tools. The authors set out to design a strategy to separate Kv2 and Kv2/KvS currents in native neurons based on their observation that Kv2/KvS channels have little sensitivity to the Kv2 pore blocker RY785 but are blocked by the Kv2 VSD blocker GxTx. They clearly demonstrate that Kv2/KvS currents can be differentiated from Kv2 currents in native neurons using a two-step strategy to first selectively block Kv2 with RY785, and then block both with GxTx. The manuscript is beautifully written; takes a very complex problem and strategy and breaks it down so both channel experts and the broad neuroscience community can understand it.

      Strengths:

      The compounds the authors use are highly selective and unlikely to have significant confounding cross-reactivity to other channel types. The authors provide strong evidence that all Kv2/KvS channels are resistant to RY785. This is a strength of the strategy - it can likely identify Kv2/KvS channels containing any of the 10 mammalian KvS subunits and thus be used as a general reagent on all types of neurons. The limitation then of course is that it can't differentiate the subtypes, but at this stage, the field really just needs to know how much Kv2/KvS channels contribute to native currents and this strategy provides a sound way to do so.

      Weaknesses:

      The authors are very clear about the limitations of their strategy, the most important of which is that they can't differentiate different subunit combinations of Kv2/KvS heteromers. This study is meant to be a start to understanding the roles of Kv2/KvS channels in vivo. As such, this is a minor weakness, far outweighed by the potential of the strategy to move the field through a roadblock that has existed since its inception.

      The study accomplishes exactly what it set out to do: provide a means to determine the relative contributions of homomeric Kv2 and heteromeric Kv2/KvS channels to native delayed rectifier K+ currents in neurons. It also does a fabulous job laying out the case for why this is important to do.

      Reviewer #2 (Public Review):

      Summary:

      Silent Kv subunits and the channels containing these Kv subunits (Kv2/KvS heteromers) are in the process of discovery. It is believed that these channels fine-tune the voltage-activated K+ currents that repolarize the membrane potential during action potentials, with a direct effect on cell excitability, mostly by determining action potentials firing frequency.

      Strengths:

      What makes silent Kv subunits even more important is that, by being expressed in specific tissues and cell types, different silent Kv subunits may have the ability to fine-tune the delayed rectifying voltage-activated K+ currents that are one of the currents that crucially determine cell excitability in these cells. The present manuscript introduces a pharmacological method to dissect the voltage-activated K+ currents mediated by Kv2/KvS heteromers as a means of starting to unveil their importance, together with Kv2-only channels, to the cells where they are expressed.

      Weaknesses:

      While the method is effective in quantifying these currents in any isolated cell under an electric voltage clamp, it is ineffective as a modulating maneuver to perhaps address these currents in an in vivo experimental setting. This is an important point but is not a claim made by the authors.

      We agree. We have now stated in the introduction that this study does not address the roles of Kv2/KvS currents in an in vivo setting.

      Manuscript revisions:

      While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.  

      There are other caveats with the methods and data:

      (i) The need for a 'cocktail' of blockers to supposedly isolate Kv2 homomers and Kv2/KvS heteromers' currents from others may introduce errors in the quantification of Kv2/KvS heteromer-mediated K+ currents, owing to possible off-target effects of the blockers.

      We now point out that it is possible that off-target effects of blockers may introduce errors, include references that identify the selectivity of the blockers used in the cocktail, and specifically note that 4-aminopyridine in the cocktail is expected to block 2% of Kv2 homomers yet have a lesser impact on Kv2/KvS heteromers. Additionally, to test whether the KvS isolation strategy requires the cocktail in neurons, we performed new experiments on a different subclass of nociceptors without the blocker cocktail and identified a substantial KvS-like component (new Fig 7 Supplement 3).

      Manuscript revisions:

      “After whole-cell voltage clamp was established, non-Kv2/KvS conductances were suppressed by changing to an external solution containing a cocktail of inhibitors: 100 nM alpha-dendrotoxin (Alomone) to block Kv1 (Harvey and Robertson, 2004), 3 μM AmmTX3 (Alomone) to block Kv4 (Maffie et al., 2013; Pathak et al., 2016), 100 μM 4-aminopyridine to block Kv3 (Coetzee et al., 1999; Gutman et al., 2005), 1 μM TTX to block TTX-sensitive Nav channels, and 10 μM A803467 (Tocris) to block Nav1.8 (Jarvis et al., 2007). It is possible that off-target effects of blockers may introduce errors in the quantification of Kv2/KvS heteromer-mediated K<sup>+</sup> currents. For example, 4-aminopyridine is expected to block a small fraction, 2%, of Kv2 homomers and have a lesser impact on Kv2/KvS heteromers (Post et al., 1996; Thorneloe and Nelson, 2003; Stas et al., 2015), which could result in a slight overestimation of the ratio of Kv2/KvS heteromers to Kv2 homomers.”

      “We also tested the other major mouse C-fiber nociceptor population, peptidergic nociceptors, to determine if this subpopulation also has conductances resistant to RY785 yet sensitive to GxTX. We voltage clamped DRG neurons from a CGRP<sup>GFP</sup> mouse line that expresses GFP in peptidergic nociceptors (Gong et al., 2003). Deep sequencing has identified mRNA transcripts for Kv6.2, Kv6.3, Kv8.1 and Kv9.3 present in GFP+ neurons, an overlapping but distinct set of KvS subunits from the Mrgprd<sup>GFP</sup> non-peptidergic population (Zheng et al., 2019). In GFP+ neurons from CGRP<sup>GFP</sup> mice, we found that a fraction of outward current was inhibited by 1 µM RY785 and additional current inhibited by 100 nM GxTX (Fig 7 Supplement 3 A-C). In these experiments, 58 ± 2% (mean ± SEM) was KvS-like (Fig 7 Supplement 3 D), identifying that KvS-like conductances are present in these peptidergic nociceptors. For CGRP<sup>GFP</sup> neurons we did not include the Kv1, Kv3, Kv4, Nav and Cav channel inhibitor cocktail used for other neuron experiments, indicating that the cocktail of inhibitors is not required to identify KvS-like conductances.”
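A small sketch of how a KvS-like fraction could be computed from the two-step pharmacology, assuming the Kv2-like component is the current removed by RY785 and the KvS-like component is the additional current removed by GxTX. The function name and the current amplitudes are hypothetical and are not taken from the study.

```python
def kvs_like_fraction(i_control, i_after_ry785, i_after_gxtx):
    """Fraction of the RY785- plus GxTX-sensitive current that is KvS-like.

    Assumed definitions (not stated explicitly in the response):
      Kv2-like = current removed by RY785
      KvS-like = additional current removed by GxTX
    """
    kv2_like = i_control - i_after_ry785
    kvs_like = i_after_ry785 - i_after_gxtx
    return kvs_like / (kv2_like + kvs_like)

# Hypothetical outward current amplitudes in pA.
fraction = kvs_like_fraction(i_control=1000, i_after_ry785=600, i_after_gxtx=200)
```

With these made-up amplitudes, half of the sensitive current would be classed as KvS-like.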

      (ii) During the electrophysiology experiments, the authors use a holding potential that is not as negative as it is needed for the recording of the full population of the Kv2/KvS channels. Depolarized holding potentials lead to a certain level of inactivation of the channels, that vary according to the KvS involved/present in that specific population of channels. As a reminder, some KvS promote inactivation and others prevent inactivation. Therefore, the data must be interpreted as such.

      We agree. We now point out that the physiological holding potentials used are insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. We also note that the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.

      Manuscript revisions:

      “Neurons were held at a membrane potential of –74 mV to mimic a physiological resting potential. KvS subunits can profoundly shift the voltage-inactivation relation (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and this potential is likely insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. Also, the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”

      (iii) The analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. Also, in dealing with a heterogenous population of Kv2/KvS heteromers, heterogenous K+ conductance deactivation kinetics is a must. Indeed, different KvS may significantly relate to different deactivation kinetics as well.

      We now discuss that the bi-exponential fit of tail currents is likely inadequate to capture the deactivation kinetics of all underlying components of a heterogenous population of Kv2/KvS heteromers.

      Manuscript revisions:

      “We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX, consistent with the presence of Kv2/KvS heteromers.”

      (iv) Silent Kv subunits may be retained in the ER, in heterologous systems like CHO cells. This aspect may underestimate their expression in these systems. Nevertheless, the authors show similar data in CHO cells and in primary neurons.

      We agree. We now note that in heterologous systems, including CHO cells, transfection of KvS subunits can result in KvS subunits that are retained intracellularly.

      Manuscript revisions:

      “While a fraction of KvS subunits appear to be retained intracellularly, immunofluorescence for Kv5.1, Kv9.3 and Kv2.1 also appeared localized to the perimeter of transfected Kv2.1-CHO cells (Figure 1 Supplement).”

      (v) The hallmark of silent Kv subunits is their effect on the time inactivation of K+ currents. As such, data should be shown throughout, preferably, from this perspective, but it was only done so in Figure 4G.

      Indeed, effects on inactivation are a hallmark of KvS subunits. However, quantifying inactivation of Kv2/KvS channels requires steps to positive voltages for approximately 10 seconds. In neurons, steps this long usually resulted in irreversible changes in leak currents/input resistance that degraded the accuracy of RY785/GxTX subtraction currents. Consequently, we did not acquire inactivation data in neurons, and we now explain in the manuscript why such data was not obtained.

      Manuscript revisions:

      “While changes in inactivation are prominent with KvS subunits, we did not investigate inactivation in neurons because the lengthy depolarizations required often resulted in irreversible leak current increases that degraded the accuracy of RY785/GxTX subtraction current quantification.”

      (vi) Functional characterization of currents only, as suggested by the authors as a bona fide of Kv2 and Kv2/KvS currents, should not be solely trusted to classify the currents and their channel mediators.

      We agree, and now state explicitly that functional characterization alone cannot be trusted to classify the channel mediators of conductances, and we try to be clear about this throughout the manuscript by using soft terms such as "KvS-like" when identity is uncertain.

      Manuscript revisions:

      “As functional characterization alone cannot be trusted to classify the channel mediators of conductances, we define conductances consistent with Kv2/KvS heteromers as 'KvS-like' and conductances consistent with Kv2 homomers as 'Kv2-like'.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There is not a lot to do here - this was a real pleasure to read and very easy to understand, as written. Here are a few minor things to consider:

      (1) The naming of the KvS subunits has always been confusing - it is not clear that Kv5,6,8,9 are members of the Kv2 subfamily from the names. KvS does a good job of differentiating them by assembly phenotype and has been used a lot in the literature, but it doesn't solve the misconception of what subfamily they belong to. This might not matter so much for mammals, where all KvS channels are in the Kv2 subfamily, but it makes it impossible to extend the naming system to other animals where subunits requiring heteromeric assembly are common in most subfamilies. How about trying the name Kv2S? It would have continuity with KvS in the reader's mind, make it clear that they are Kv2 subfamily, and make a naming system that could be extended beyond vertebrates. This is not a problem the authors created - just a completely optional suggestion on how to solve it if so inclined.

      We agree that naming conventions for these subunits are problematic, and agonized quite a bit about nomenclature. In the end we chose to stick with the precedent of KvS.

      (2) Another naming issue they should definitely change is the use of "subfamily" for the different KvS subtypes (Kv5, Kv6, Kv8, and Kv9). This really creates confusion with the higher-order subfamilies that have a very clear functional definition: a subfamily of Kv genes is a group of related genes that have assembly compatibility. Those are Kv1, Kv2, Kv3 and Kv4. KvS genes are assembly compatible with Kv2, evolutionarily derived from the Kv2 lineage, and thus clearly a part of the Kv2 subfamily. Using a subfamily for the next lower level of the naming hierarchy confuses this. The authors should use different terms like sub-type or class or subgroups for the divisions within KvS.

      Thank you. We have standardized to Kv2/KvS as a subfamily; Kv5, Kv6, Kv8, and Kv9 as subtypes; and individual proteins, e.g. Kv8.1, as subunits.

      (3) When you discuss whether the KvS subunit directly disrupts Ry785 binding in the pore or works allosterically and you said you know which KvS residues point into the pore from models, I thought that maybe you could tell from a sequence alignment whether the KvS channels you didn't test look the same in the conduction pathway as the ones you did test. If so, you could mention that if the binding site is the pore, they should all be resistant. Alternatively, if one you didn't test looks fundamentally more similar to the Kv2s in this region, then maybe it could be fingered as a possible exception that needs to be tested later.

      Great ideas. We now assess sequence KvS variability near the proposed RY785 binding site in all KvS subunits. We generated structural models of RY785 docking to Kv2.1 and Kv2.1/Kv8.1 and found that residues near RY785 are different in all KvS subunits.

      Manuscript revisions:

      “We analyzed computational structural models of RY785 docked to a Kv2.1 homomer and a 3:1 Kv2.1:Kv8.1 heteromer (Fig 9) to gain structural insight into how KvS subunits might interfere with RY785 binding. We used Rosetta to dock RY785 to a cryo-EM structure of a Kv2.1 homomer in an apparently open state (Fernández-Mariño et al., 2023). The top-scoring docking pose has RY785 positioned below the selectivity filter and off-axis of the pore (Fig 9 A), similar to a stable pose observed in molecular dynamic simulations (Zhang et al., 2024). In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open.”

      (4) Future suggestion or tip - not for this paper. Your data shows your isolation strategy works really well on Kv6 channels, and these are also the Kv2/KvS channels that have the most pronounced biophysical changes. Working on neurons that have a prominent Kv2/Kv6 component would really show how well the strategy outlined here works to describe the physiology of native neurons. The highest KvS expression I have seen in public data in a well-studied cell type is Kv6.4 in spinal motor neurons.

      Wonderful tip, thank you. We are indeed very interested in Kv6.4 in spinal motor neurons.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript makes a good contribution to the identification of Kv2/KvS channels in primary cells. The pharmacological method proposed by the authors to dissect the currents in an experimental setting seems proper. Although meritorious in themselves, the findings are heavily phenomenological in the opinion of this reviewer. The manuscript should be improved with some level of mechanistic data and/or the demonstration of different levels of expression in different cell types.

      Thank you for the suggestions. This manuscript now demonstrates strikingly higher levels of the KvS-like component of Kv2 currents in somatosensory (DRG nonpeptidergic and peptidergic nociceptor) versus autonomic (SCG) neuron types. The mechanistic question of what electrophysiological properties the KvS subunits are providing to the neuronal circuit is an exciting one that we are pursuing separately.

      Manuscript revisions:

      “While we found only RY785-sensitive Kv2-like conductances in SCG neurons, Kv2/KvS heteromer-like conductances were dominant in DRG neurons.”

      At present, the manuscript says that the combination of RY785 and guangxitoxin-1E can be used to define Kv2/KvS-mediated K+ currents. Importantly, this method cannot be used in a way that one can functionally determine the function of Kv2/KvS channels, since it depends on the pre-blocking of Kv2-mediated K+ currents prior. In the opinion of this reviewer, this fact decreases the attention of a potential reader.

      Indeed, our study is focused on revealing KvS heteromers by voltage clamp, and we now clarify in the introduction that we do not determine the function of Kv2/KvS channels in this study, so as not to lead the reader to expect studies of neuronal signaling.

      However, the selective pharmacology we identify suggests RY785 application could reveal the function of Kv2 homomers, and for RY785-insensitive signaling, GxTX application could reveal the function of Kv2/KvS heteromers. We now mention these possible applications in the Discussion.

      Manuscript revisions:

      “While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.”

      Please find below suggestions for improving the manuscript:

      (1) The term "Kv2/KvS heteromers" should be used throughout instead of variations such as "Kv2/KvS channels", "Kv2/KvS" and others. Standardization of the term to refer to heteromers would make the manuscript easier to read.

      Thank you. We have standardized terms to consistently refer to Kv2/KvS heteromers.

      (2) Confusing terms like KvS conductances, KvS-like conductances, KvS-like (RY785-resistant, GxTX-sensitive) currents, and KvS channels should be avoided because they disregard the current belief that KvS cannot form functional homomeric channels. The term KvS-containing channels, and Kv2/KvS channels, seem more accurate. Uniformization in this regard will also make the manuscript more easily readable.

      Thank you. We have standardized terms to Kv2/KvS heteromers and KvS-containing channels when channel subunits are known, and use the terms KvS-like and Kv2-like for functionally identified endogenous conductances with unknown channel subunits.

      (3) Referring to KvS as a regulatory subunit is inaccurate. It is clear that KvS is part of, and it makes up the alpha pore. KvS therefore is a part of the conductive pathway and not a regulatory (suggesting accessory) subunit. KvS take part in selectivity filter (fully conserved), but they also make up an important part of the conducting pathway with non-conserved amino acid residues.

      We felt it important to include the descriptor “regulatory” to connect our nomenclature with prior use of the descriptor in the literature, and now only use the term at the start of the introduction.

      Manuscript revisions:

      “A potential source of molecular diversity for Kv2 channels are a group of Kv2-related proteins which have been referred to as regulatory, silent, or KvS subunits.”

      (4) The use of a cocktail of channel inhibitors may affect the quantification of Kv2/KvS heteromers-mediated K+ currents because they may interact with RY785 and/or GxTx or they may even interact with the sites for these two drugs on Kv2-containing channels.

      This is an interesting point worth considering, thank you. We now alert readers to this possibility in the discussion when considering the limitations of our approach.

      Manuscript revisions:

      “Also, the cocktail of inhibitors used in most neuron experiments here could potentially alter RY785 or GxTX action against KvS/Kv2 channels.”

      (5) The graphical representation of fractional blocking and other parameters (e.g., Fig 1D), is hard to read in these slim plots. In my opinion, tall bars would be more meaningfully visualized.

      Thank you for pointing out that the graphs were hard to read. We have made them easier to read and added tall bars.

      (6) Vehicle control for IHC and electrophysiology. Please state what is the vehicle used in the electrophysiology experiments.

      Thank you. The composition of vehicle has now been stated in the methods.

      Manuscript revisions:

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      “Sections were incubated in vehicle solution (4% milk, 0.2% triton diluted in PB) for 1 hr at RT.”

      (7) The reference Trapani & Korn, 2003 (?) is not included in the list. This reference is important since it sets what are the Kv2.1-CHO cells. In this regard it is also important to mention, even better to address, the expressing qualities of this system in the face of a co-expression with a plasmid-based expression of silent Kv subunits. Are these two ways of expressing Kv subunits, meant to come together (or not) in heteromers, balanced? This question is critical here. Still, in regard to Kv2.1-CHO cells, it was not clear in the manuscript if the term "transfection" refers only to the plasmids used to temporarily induce the expression of silent Kv subunits and potentially Kv channels accessory subunits.

      We now include the Trapani & Korn, 2003 reference (thank you for pointing out this accidental omission), and better explain expression methods. The benefit of the inducible Kv2.1 expression is control of Kv conductance densities which can otherwise become so large as to be refractory to voltage clamp. The beauty of the expression system is that cells recently transfected with KvS subunits can be induced to express just enough Kv2.1 to get a substantial but not clamp-overwhelming RY785-resistant Kv2/KvS conductance. We also discuss that our expression methods are distinct from past studies. We stop short of comparing the expression systems, as this is beyond the scope of what we set out to study.

      Manuscript revisions: See next response

      (8) Kv2.1-CHO cells transfection procedures, induction, and validation are unclear. This validation is important here.

      We have clarified transfection procedures, induction, and validation in the methods section.

      Manuscript revisions:

      “The CHO-K1 cell line transfected with a tetracycline-inducible rat Kv2.1 construct (Kv2.1-CHO) (Trapani and Korn, 2003) was cultured as described previously (Tilley et al., 2014).”

      “Transfections were achieved with Lipofectamine 3000 (Life Technologies, L3000001). 1 μl Lipofectamine was diluted, mixed, and incubated in 25 μl of Opti-MEM (Gibco, 31985062).”

      “Concurrently, 0.5 μg of KvS or AMIGO1 or Navβ2, 0.5 μg of pEGFP, 2 μl of P3000 reagent and 25 μl of Opti-MEM were mixed. DNA and Lipofectamine 3000 mixtures were mixed and incubated at room temperature for 15 min. This transfection cocktail was added to 1 ml of culture media in a 24 well cell culture dish containing Kv2.1-CHO cells and incubated at 37 °C in 5% CO2 for 6 h before the media was replaced. Immediately after media was replaced, Kv2.1 expression was induced in Kv2.1-CHO cells with 1 μg/ml minocycline (Enzo Life Sciences, ALX380-109-M050), prepared in 70% ethanol at 2 mg/ml. Voltage clamp recordings were performed 12-24 hours later. We note that the expression method of Kv2/KvS heteromers used here is distinct from previous studies which show that the KvS:Kv2 mRNA ratio can affect the expression of functional Kv2/KvS heteromers (Salinas et al., 1997b; Pisupati et al., 2018). We validated the functional Kv2/KvS heteromer expression using voltage clamp to establish distinct channel kinetics and the presence of RY785-resistant conductance in KvS-transfected cells and using immunohistochemistry to label apparent surface localization of KvS subunits (Figure 4, Figure 1 Supplement, Figure 1 and Figure 5).”

      (9) It is important for readers to add some context to Kv2.1/Kv8.1 channels (and other Kv2/KvS heteromers) used to test the combination of RY785 and GxTx. In my opinion, this enriches the discussion.

      Good idea. We have added context about each of the KvS subunits we test.

      Manuscript revisions:

      “To test the pharmacological response of KvS we began with Kv8.1, a subunit that creates heteromers with biophysical properties distinct from Kv2 homomers (Salinas et al., 1997a), and modulates motor neuron vulnerability to cell death (Huang et al., 2024).”

      “Each of these KvS subunits creates Kv2/KvS heteromers that have distinct biophysical properties (Kramer et al., 1998; Kerschensteiner and Stocker, 1999; Bocksteins et al., 2012). Kv5.1/Kv2.1 heteromers play an important role in controlling the excitability of mouse urinary bladder smooth muscle (Malysz and Petkov, 2020), mutations in Kv6.4 have been shown to influence human labor pain (Lee et al., 2020b), and deficiency of Kv9.3 disrupts parvalbumin interneuron physiology in mouse prefrontal cortex (Miyamae et al., 2021).”

      (10) In general, the membrane potential used to activate Kv2 only channels and Kv2/KvS channels is too close to the activation V1/2. In case the comparing curves are displaced in their relative voltage dependence and voltage sensitivity, using that range of membrane potential may introduce a crucial error in the estimation of the conductance's relative amplitudes.

      We now note that the relative conductances of Kv2-only vs Kv2/KvS channels are expected to vary with voltage protocol, as KvS inclusion results in channels with altered voltage responses.

      Manuscript revisions:

      “…the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”
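
      The dependence on voltage protocol can be illustrated with a minimal sketch, assuming each conductance follows a simple Boltzmann activation curve. The half-activation voltages, slope factors, and equal maximal conductances below are hypothetical values chosen only for illustration; they are not measurements from the manuscript.

```python
import math


def boltzmann(v_mv, v_half_mv, slope_mv):
    """Fraction of maximal conductance activated at voltage v_mv."""
    return 1.0 / (1.0 + math.exp(-(v_mv - v_half_mv) / slope_mv))


# Hypothetical parameters, for illustration only.
KV2_LIKE = {"g_max": 1.0, "v_half_mv": -10.0, "slope_mv": 8.0}
KVS_LIKE = {"g_max": 1.0, "v_half_mv": -25.0, "slope_mv": 12.0}


def activated(params, v_mv):
    """Conductance activated at v_mv for one channel population."""
    return params["g_max"] * boltzmann(v_mv, params["v_half_mv"], params["slope_mv"])


def kvs_like_fraction(v_mv):
    """KvS-like share of the summed conductance at a given test voltage."""
    g_kv2 = activated(KV2_LIKE, v_mv)
    g_kvs = activated(KVS_LIKE, v_mv)
    return g_kvs / (g_kv2 + g_kvs)


for v in (-20.0, 0.0, 40.0):
    print(f"{v:+.0f} mV: KvS-like fraction = {kvs_like_fraction(v):.2f}")
```

      In this toy model the two populations have equal maximal conductance, so the apparent KvS-like fraction approaches one half only at strongly depolarized test voltages. Near the half-activation voltages it depends heavily on which voltage is chosen.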

      (11) The use of tail currents to estimate conductance is problematic if i) lack of current inactivation is not assured, and ii) if the different currents, with possible different deactivation kinetics at the used membrane potential (e.g., mV), are not assured. Why was the activation peak used at times, and at different elapsed times the tail currents were used instead? These aspects of conductance's amplitude estimation methods should be well defined.

      In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We have clarified this analysis in the methods section.

      Manuscript revisions:

      “In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. In neurons, voltage-gated currents that were sometimes unstable remained in the toxin cocktail + RY785 and GxTX. To minimize complications from these currents, we restricted analysis of RY785 and GxTX subtraction experiments to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (12) Were the experiments including different conditions such as control, RY, and RY+GxTx done pair-wised? This could potentially better the statistics and strengthen the data and the conclusions drawn from them.

      The control, RY, and RY+GxTX in neurons were done pairwise and the statistical tests performed for these experiments were pairwise tests. We have clarified this in the figure legends.

      Manuscript revisions:

      “Wilcoxon rank tests were paired, except the comparison of RY785 to vehicle which was unpaired.”
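
      The distinction between the paired and unpaired tests can be sketched as follows. This is an illustrative pure-Python implementation of exact two-sided Wilcoxon signed-rank (paired) and rank-sum (unpaired) tests for small samples. It is not the software the authors used, which is not stated here, and the amplitudes are hypothetical.

```python
from itertools import combinations, product


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def two_sided(observed, null_sums):
    """Two-sided exact p-value from an enumerated null distribution."""
    lower = sum(s <= observed for s in null_sums) / len(null_sums)
    upper = sum(s >= observed for s in null_sums) / len(null_sums)
    return min(1.0, 2 * min(lower, upper))


def signed_rank_p(before, after):
    """Paired test: rank within-cell differences, enumerate all sign flips."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    r = rank([abs(d) for d in diffs])
    w_plus = sum(ri for ri, d in zip(r, diffs) if d > 0)
    null = [sum(ri for ri, s in zip(r, signs) if s)
            for signs in product((0, 1), repeat=len(r))]
    return two_sided(w_plus, null)


def rank_sum_p(x, y):
    """Unpaired test: rank pooled values, enumerate all group assignments."""
    r = rank(list(x) + list(y))
    w = sum(r[:len(x)])
    null = [sum(r[i] for i in idx)
            for idx in combinations(range(len(r)), len(x))]
    return two_sided(w, null)


# Hypothetical tail-current amplitudes (arbitrary units).
control = [10, 12, 9, 14, 11, 13]
plus_ry = [6, 9, 5, 11, 6, 7]
print("paired p   =", signed_rank_p(control, plus_ry))
print("unpaired p =", rank_sum_p(control, plus_ry))
```

      The paired test uses only the within-cell differences, so it suits sequential drug applications on the same neuron. The unpaired test compares two independent groups of cells, as in the RY785 versus vehicle comparison.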

      (13) The holding potential of the experiments, mostly -89 mV, may be biasing the estimation of Kv2 only channels vs. Kv2/KvS channels conductances. Figure 4I exemplifies this concern.

      We agree. Figure 4I reveals that a holding potential of -89 mV vs -129 mV reduces conductance of Kv2.1/Kv8.1 heteromers vs Kv2.1 homomers in CHO cells by ~20%. We have now alerted readers that the ratio of Kv2 only channels vs. Kv2/KvS conductances can vary with holding voltage.

      Manuscript revisions:

      “Under these conditions, 58 ± 3 % (mean ± SEM) of the delayed rectifier conductance was resistant to RY785 yet sensitive to GxTX (KvS-like) (Fig 7 F). We note that the ratio of KvS- to Kv2-like conductances is expected to vary with holding potential, as KvS subunits can change the degree and voltage-dependence of steady state inactivation (e.g. Fig 4I).”
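
      A percentage of this kind can be derived from sequential subtraction, as in the following minimal sketch. It assumes the Kv2-like component is the current removed by RY785 and the KvS-like component is the current that survives RY785 but is removed by GxTX. Using the sum of the two components as the denominator is an assumption of this sketch, and the amplitudes are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def split_conductance(control, after_ry, after_ry_gxtx):
    """Return (Kv2-like, KvS-like) fractions of the drug-sensitive current.

    Kv2-like: the part removed by RY785.
    KvS-like: the part resistant to RY785 but removed by GxTX.
    The denominator (sum of both parts) is an assumption of this sketch.
    """
    kv2_like = control - after_ry
    kvs_like = after_ry - after_ry_gxtx
    total = kv2_like + kvs_like
    return kv2_like / total, kvs_like / total


def mean_sem(values):
    """Return the mean and the standard error of the mean."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical amplitudes per cell: (control, +RY785, +RY785 +GxTX).
cells = [(10.0, 6.0, 2.0), (8.0, 5.0, 1.0), (12.0, 8.0, 3.0)]
kvs_fractions = [split_conductance(*cell)[1] for cell in cells]
avg, sem = mean_sem(kvs_fractions)
print(f"KvS-like fraction: {100 * avg:.0f} +/- {100 * sem:.0f} % (mean +/- SEM)")
```

      Because each fraction is computed per cell before averaging, the SEM reflects cell-to-cell variability rather than noise within a single recording.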

      (14) It is possible that Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are the same, by mistake, since their noise pattern looks too similar.

      Indeed the noise pattern of the Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are related, as they have inputs from the same trace, with Figure 6C ("Kv2-like" trace) being a subtraction of Figure 6A (+RY trace) from Figure 6A (control trace).

      (15) For example, in Figure 7A, what is the identity of the current remaining after the RY+GxTx application? In Figure 7B, a supposed outlier in the group of data referring to "veh" in the right panel is what possibly is making this group different from +RY in the left panel (p=0.02, Wilcoxon rank test). I would recommend parametric tests only since the data is essentially quantitative.

      In Figure 7A, we do not know the identity of the current remaining after the RY+GxTX application. The kinetics of the residual current appeared distinct from the Kv2/KvS-like currents blocked by RY or GxTX, but we did not analyze these.

      The trace in Figure 7B was indeed the positive outlier in the "veh" group in the right panel, and it contributes to the p-value, but we saw no reason to exclude it. We have now replaced the representative trace in 7B with a non-outlier trace. We respectfully disagree with the suggestion to use parametric statistical tests, as we do not know the distribution underlying the variance of our data.

      Manuscript revisions:

      “Subsequent application of 100 nM GxTX decreased tail currents by 68 ± 5% (mean ± SEM) of their original amplitude before RY785. We do not know the identity of the outward current that remains in the cocktail of inhibitors + RY785 + GxTX.”

      (16) Please state the importance of using nonpeptidergic neurons to study silent Kv5.1 and Kv9.1 subunits. RNA data may not necessarily work to probe function or protein abundance, which is crucial in heteromeric complexes.

      We have now more thoroughly explained our rationale for choosing the nonpeptidergic neurons.

      RNA is not predictive of protein abundance, and we have not yet been successful in measuring KvS protein abundance in these neurons, so we've probed KvS abundance by assessing RY785 resistance.

      Manuscript revisions:

      “Mouse dorsal root ganglion (DRG) somatosensory neurons express Kv2 proteins (Stewart et al., 2024), have GxTX-sensitive conductances (Zheng et al., 2019), and express a variety of KvS transcripts (Bocksteins et al., 2009; Zheng et al., 2019), yet transcript abundance does not necessarily correlate with functional protein abundance. To record from a consistent subpopulation of mouse somatosensory neurons which has been shown to contain GxTX-sensitive currents and have abundant expression of KvS mRNA transcripts (Zheng et al., 2019), we used a Mrgprd<sup>GFP</sup> transgenic mouse line which expresses GFP in nonpeptidergic nociceptors (Zylka et al., 2005; Zheng et al., 2019). Deep sequencing identified that mRNA transcripts for Kv5.1, Kv6.2, Kv6.3, and Kv9.1 are present in GFP+ neurons of this mouse line (Zheng et al., 2019) and we confirmed the presence of Kv5.1 and Kv9.1 transcripts in GFP+ neurons from Mrgprd<sup>GFP</sup> mice using RNAscope (Fig 7 Supplement 1).”

      (17) In Figure 8B, were +RY data different from veh data? The figure shows no Wilcoxon (nonparametric) comparison and this is important to be stated. What conductance(s) is the vehicle solution blocking or promoting? What is RY dissolved in, DMSO? What is the DMSO final concentration?

      We now state that in Figure 8B, +RY amplitudes were not statistically different from veh data in this limited data set. However, the RY-subtraction currents always had Kv2-like biophysical properties, whereas vehicle-subtraction currents had variable properties precluding biophysical analysis for Fig 8D.

      In Figure 8B, we do not know what conductance(s) the vehicle solution is affecting. We think the changes observed are likely merely time dependent or due to the solution exchange itself. RY stock is in DMSO. All recording solutions have 0.1% DMSO final concentration; this is now noted in the methods.

      Manuscript revisions:

      “Unlike mouse neurons, we did not detect a significant difference in tail currents of RY785 versus vehicle controls. However, RY785-subtracted currents always had Kv2-like biophysical properties whereas vehicle-subtraction currents had variable properties that precluded the same biophysical analysis. Overall, these results show that human DRG neurons can produce endogenous voltage-gated currents with pharmacology and gating consistent with Kv2/KvS heteromeric channels.”

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      (18) METHODS. The electrophysiology approach should be unified in all aspects as applicable and possible.

      We have unified the mouse dorsal root ganglion and mouse superior cervical ganglion methods sections. We have kept CHO cells and mouse/human neurons section separate because the methods were substantially different.

      (19) DISCUSSION. The discussion section spends half of its space trying to elaborate on possible blocking/inhibiting/modulating mechanisms for RY785. The present manuscript shows no data, at least not that I have noticed, that would evoke such discussion.

      We have shortened this section, and enhanced the discussion with structural models (new Fig 9) and our functional data indicating perturbed RY785 interaction with Kv2.1/8.1.

      Manuscript revisions:

      “In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open. However, the RY785 resistance of Kv2/KvS heteromers may primarily arise from perturbed interactions with the constricted central cavity of closed channels. In homomeric Kv2.1, RY785 becomes trapped in closed channels and prevents their voltage sensors from fully activating, indicating that RY785 must interact differently with closed channels (Marquis and Sack, 2022). Here we found that Kv2.1/Kv8.1 current rapidly recovers following washout of RY785, suggesting that Kv2.1/Kv8.1 heteromers do not readily trap RY785 (Figure 2 Supplement). Overall, the structural modeling suggests that KvS subunits sterically interfere with RY785 binding to the central cavity, while functional data suggest KvS subunits disrupt RY785 trapping in closed states.”

      (20) DISCUSSION. Topics like ER retention and release upon certain conditions would be a better enrichment for the manuscript in my opinion.

      ER retention of KvS subunits is indeed an important topic! However, we have opted not to delve into it here.

      (21) DISCUSSION. Speculation about the binding site for RY on Kv2/KvS channels is also not touched by the data shown in the manuscript.

      We have shortened this section of discussion, and now present this with structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1: Kv8.1 heteromer (new Fig 9) to ground speculations. See manuscript changes noted in response to comment (19) above.

      (22) DISCUSSION. An important reference is missing in regard to stoichiometry: Bocksteins et al., 2017. This work is the only one using a non-optical technique to add knowledge to that question.

      Good point, and an excellent study we didn’t realize we’d not included before. We now include Bocksteins et al. 2017 as a reference in the Introduction.

      (23) In my opinion, allosterism and orthosterism are concepts not yet useful for the discussion of RY binding sites without even a general piece of data.

      We now include structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1: Kv8.1 heteromer (new Fig 9) to ground blocking speculations. See manuscript changes noted in response to comment (19).

      (24) The term "homogeneously susceptible" associated with a Hill slope close to 1 needs to be more elaborated.

      Thank you, we have elaborated.

      Manuscript revisions:

      “Also, the degree of resistance to RY785 may vary if Kv2:KvS subunit stoichiometry varies. With high doses of RY785, we found that the concentration-response characteristics of Kv2.1/Kv8.1 in CHO cells revealed hallmarks of a homogenous channel population with a Hill slope close to 1 (Fig 2B). However, other KvS subunits might assemble in multiple stoichiometries and result in pharmacologically-distinct heteromer populations.”
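
      Why a Hill slope near 1 is read as a hallmark of a homogeneous population can be sketched numerically. The example below assumes each channel population is inhibited with a Hill slope of 1 and compares a single population against a 50:50 mixture of two populations with different IC50 values. All IC50 values and weights are hypothetical and serve only as illustration.

```python
import math


def fraction_remaining(conc, ic50, hill_slope=1.0):
    """Hill equation: fraction of current remaining at a given concentration."""
    return 1.0 / (1.0 + (conc / ic50) ** hill_slope)


def mixture_remaining(conc, ic50s, weights):
    """Weighted sum of slope-1 populations with different IC50 values."""
    return sum(w * fraction_remaining(conc, ic50) for ic50, w in zip(ic50s, weights))


def conc_for_block(block, remaining_fn, lo=1e-6, hi=1e6):
    """Bisect on a log scale for the concentration giving the requested block."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if 1.0 - remaining_fn(mid) < block:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def span_10_to_90(remaining_fn):
    """Fold-change in concentration needed to go from 10% to 90% block."""
    return conc_for_block(0.9, remaining_fn) / conc_for_block(0.1, remaining_fn)


def single(conc):
    # One homogeneous population (hypothetical IC50 of 1, arbitrary units).
    return fraction_remaining(conc, ic50=1.0)


def mixed(conc):
    # Two equally weighted populations with hypothetical IC50s of 0.1 and 10.
    return mixture_remaining(conc, ic50s=(0.1, 10.0), weights=(0.5, 0.5))


print("single population span:", round(span_10_to_90(single), 1))
print("mixed population span: ", round(span_10_to_90(mixed), 1))
```

      A single slope-1 population needs an 81-fold concentration change to go from 10% to 90% block. A mixture of populations with different IC50 values needs a wider range, which would appear as an apparent Hill slope below 1 when fit with a single Hill equation.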

      (25) Stating the KvS are resistant to RY785 is not proper in my opinion. This opinion relates to the fact that the RY binding site in the channels is certainly not restricted to a binding site residing only on the Kv subunit.

      Good point. We have now changed phrasing to convey that KvS subunits are a component of a heteromer that imbues RY785 resistance.

      Manuscript revisions:

      “These results show that voltage-gated outward currents in cells transfected with members from each KvS subtype have decreased sensitivity to RY785 but remain sensitive to GxTX. While we did not test every KvS subunit, the ubiquitous resistance suggests that all KvS subunits may provide resistance to 1 μM RY785 yet remain sensitive to GxTX, and that RY785 resistance is a hallmark of KvS-containing channels.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of the melanocortin system in puberty onset. They conclude that POMC neurons within the arcuate nucleus of the hypothalamus provide important but differing input to kisspeptin neurons in the arcuate or rostral hypothalamus.

      Strengths:

      Innovative and novel

      Technically sound

      Well-designed

      Thorough

      Weaknesses:

      There were no major weaknesses identified.

      Reviewer #2 (Public review):

      Summary:

      This interesting manuscript describes a study investigating the role of MC4R signalling on kisspeptin neurons. The initial question is a good one. Infertility associated with MC4 mutations in humans has typically been ascribed to the consequent obesity and impaired metabolic regulation. Whether there is a direct role for MC4 in regulating the HPG axis has not been thoroughly examined. Here, the researchers have assembled an elegant combination of targetted loss of function and gain of function in vivo experiments, specifically targetting MC4 expression in kisspeptin neurons. This excellent experimental design should provide compelling evidence for whether melanocortin signalling directly affects arcuate kisspeptin neurons to support normal reproductive function. There were definite effects on reproductive function (irregular estrous cycle, reduced magnitude of LH surge induced by exogenous estradiol). However, the magnitude of these responses and the overall effect on fertility were relatively minor. The mice lacking MC4R in kisspeptin neurons remained fertile despite these irregularities. The second part of the manuscript describes a series of electrophysiological studies evaluating the pharmacological effects of melanocortin signalling in kisspeptin cells in ex-vivo brain slices. These studies characterised interesting differential actions of melanocortins in two different populations of kisspeptin neurons. Collectively, the study provides some novel insights into how direct actions of melanocortin signalling via the MC4 receptor in kisspeptin neurons contribute to the metabolic regulation of the reproductive system. Importantly, however, it is clear that other mechanisms are also at play.

      Strengths:

      The loss of function/gain of function experiments provides a conceptually simple but hugely informative experimental design. This is the key strength of the current paper - especially the knock-in study that showed improved reproductive function even in the presence of ongoing obesity. This is a very convincing result that documents that reproductive deficits in MC4R knockout animals (and humans with deleterious MC4R gene variants) can be ascribed to impaired signalling in the hypothalamic kisspeptin neurons and not necessarily caused as a consequence of obesity. As concluded by the authors: "reproductive impairments observed in MC4R deficient mice, which replicate many of the conditions described in humans, are largely mediated by the direct action of melanocortins via MC4R on Kiss1 neurons and not to their obese phenotype." This is important, as it might change how such fertility problems are treated.

      I would like to see the validation experiments for the genetic manipulation studies given greater prominence in the manuscript because they are critical to interpretation. Presently, only single unquantified images are shown, and a much more comprehensive analysis should be provided.

      Weaknesses:

      (1) Given that mice lacking MC4R in kisspeptin neurons remained fertile despite some reproductive irregularities, this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system. This is now appropriately covered in the discussion.

      (2) The mechanistic studies evaluating melanocortin signalling in kisspeptin neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter how they respond to hormones and neuropeptides. Eliminating this variable makes interpretation difficult, but the authors have justified this as a reductionist approach to evaluate estradiol actions specifically. However, this does not reflect the actual complexity of reproductive function.

      For example, the authors focus on a reduced LH response to exogenous estradiol in ovariectomised mice as evidence that there might be a sub-optimal preovulatory LH surge. However, the preovulatory LH surge (in intact animals) was not measured.

      They have not assessed why some follicles ovulated, but most did not. They have focused on the possibility that the ovulation signal (LH surge) was insufficient rather than asking why some follicles responded and others did not. This suggests some issue with follicular development, likely due to changes in gonadotropin secretion during the cycle and not simply due to an insufficient LH surge.

      Reviewer #3 (Public review):

      The manuscript by Talbi R et al. generated transgenic mice to assess the reproduction function of MC4R in Kiss1 neurons in vivo and used electrophysiology to test how MC4R activation regulated Kiss1 neuronal firing in ARH and AVPV/PeN. This timely study is highly significant in neuroendocrinology research for the following reasons.

      (1) The authors' findings are significant in the field of reproductive research. Despite the known presence of MC4R signaling in Kiss1 neurons, the exact mechanisms of how MC4R signaling regulates different Kiss1 neuronal populations in the context of sex hormone fluctuations are not entirely understood. The authors reported that knocking out Mc4r from Kiss1 neurons replicates the reproductive impairment of MC4RKO mice, and Mc4r expression in Kiss1 neurons in the MC4R null background partially restored the reproductive impairment. MC4R activation excites Kiss1 ARH neurons and inhibits Kiss1 AVPV/PeN neurons (except for elevated estradiol).

      (2) Reproductive dysfunction is one of the comorbidities of obesity. MC4R loss-of-function mutations cause an obese phenotype and impaired reproduction. However, it is hard to determine the causality. The authors carefully measured the body weight of the different mouse models (Figure 1C, Figure 2A, Figure 3B). For example, the Kiss1-MC4RKO females showed no body weight difference at puberty onset. This clearly demonstrated that MC4R signaling has a direct function in reproduction that is not a consequence of excessive adiposity.

      (3) Gene expression findings in the "KNDy" system align with the reproduction phenotype.

      (4) The electrophysiology results reported in this manuscript are innovative and provide more details of MC4R activation and Kiss1 neuronal activation.

      Overall, the authors have presented sufficient background in a clear, logical, and organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      Comments on revisions:

      The authors have addressed my comments.

      Recommendations for the authors:

      The reviewers noted that they received comments in response to their concerns, and some improvements have been made to the manuscript. However, as described below, in some cases, a rebuttal was provided, but changes were not made to the manuscript. It is suggested that these issues be addressed to improve the quality of the manuscript.

      We thank the reviewers and editor for the assessment of the manuscript and recommendations for its improvement. We have addressed the remaining comments from reviewer #2 below, and hope that they find our revisions satisfactory.

      Reviewer #2 (Recommendations for the authors):

      The manuscript convincingly shows that MC4R in kisspeptin-producing cells can influence reproductive function. This suggests that fertility problems associated with melanocortin mutations are likely due to direct effects on the reproductive systems rather than simply being side effects of the resultant obesity.

      We are pleased that this reviewer finds the data convincing and thank them for the careful review of the manuscript, which has helped to improve its published version.

      The authors have responded to the reviewer's comments and made several improvements to the manuscript.

      The authors are correct in pointing out that the POMC-Cre animals should be fine for studies involving the administration of AAVs to adult animals. I have misinterpreted how these mice were being used, and this concern is fully addressed.

      Unfortunately, in some cases, the authors rebutted the reviewer's comments but did not change the manuscript. I suggest addressing several issues in the manuscript (after all, it is not the reviewer's opinion that counts; this process is about improving the manuscript).

      (1) Validation of the KO is insufficiently reported. From the methods, it appears that this was done thoroughly, but currently, only a single image of the arcuate nucleus is shown, and no image of the AVPV is shown. There is no quantitative information provided. The authors can keep these data as supplementary material, but they should be comprehensive and convincing, as so much depends on the degree of knockout in this model. One cannot assume complete KO based simply on the relevant genetics, as there are examples in this system where different Cre lines produce different outcomes with various floxed genes in the two major populations of kisspeptin neurons. This figure should show the quantitation of the RNAscope analysis from each of the two regions regarding the percentage of kisspeptin cells showing expression of MC4R mRNA. In addition, the lack of MC4 labelling in the arcuate nucleus, outside of kisspeptin neurons, is a concern. One would expect to see AgRP or POMC cells at this level, but are they still showing expression of MC4? A single image is insufficient to be convinced of the model's efficacy.

      We appreciate the reviewer’s concerns regarding the validation of the MC4RKO model. Below, we provide clarification and additional justification for our approach.

      (1) Quantification of MC4R in the Arcuate Nucleus (ARC): As noted by the reviewer, we were unable to detect sufficient MC4R signal in the ARC of KO mice to perform meaningful quantification. This is consistent with the expected outcome of a successful MC4R deletion. Given the low endogenous expression levels of MC4R in this region, even in control animals, and the technical limitations of RNAscope in detecting very low-abundance transcripts, especially for receptors, the absence of MC4R signal in the ARC of KO mice strongly supports effective deletion. Moreover, the MC4R loxP mouse has been published and validated by many labs, including Brad Lowell’s lab, which has done extensive work using these mice for selective deletion of Mc4r from various neuronal populations such as Sim1 and Vglut2 neurons (Shah et al., 2014, de Souza Cordeiro et al., 2020). To further strengthen our validation, we provide additional images from another animal (Fig_S1) to illustrate the consistency of the MC4R KO in the ARC. These will be included as supplementary material, as suggested.

      Regarding AgRP and POMC neurons, MC4R is not highly expressed in these neurons (as per previous literature, e.g., Garfield et al., Nat Neurosci. 2015; Padilla SL et al, Endocrinology 2012; Henry et al, Nature, 2015). Instead, MC4R is predominantly found in downstream neurons in the paraventricular nucleus (PVN) and other hypothalamic regions (where it remains intact in our KO mice, as shown in our validation figure). Thus, the absence of MC4R labeling in AgRP or POMC cells in our images aligns with known expression patterns and does not contradict the validity of our model.

      (2) MC4R Expression in the AVPV and OVX Effect on Kiss1 Expression: We acknowledge the reviewer’s request for MC4R expression analysis in the anteroventral periventricular nucleus (AVPV). However, due to the timing of tissue collection after ovariectomy (OVX), Kiss1 expression in the AVPV is significantly suppressed, making it technically unfeasible to perform co-staining of MC4R with Kiss1 in this region. This is a well-documented effect of estrogen depletion following OVX (Smith et al., 2005; Lehman et al., 2010). While we acknowledge that an ideal validation would include AVPV co-labeling, the experimental constraints related to OVX preclude this analysis in our dataset.

      Given these considerations and validations, we are confident that the KO is effective and specific.

      (2) Line 88: "... however, conflicting reports exist". Expand on this sentence to describe what these conflicting reports show. The authors responded to my comment but made no changes to the introduction. As a reader, I dislike being told there are conflicting reports, but then I have to go and look up the reference to see what that actual point of conflict is.

      By conflicting reports we meant that other studies have shown no association between MC4R and reproductive disorders; this has now been included in the revised manuscript (Line 89).

      (3) Could the authors explain how a decrease in AgRP would be interpreted as a "decrease in hypothalamic melanocortin tone" in line 142 and line 364? These overly simplistic interpretations of qPCR data detract from the overall quality of the paper.

      The reference to a decrease in melanocortin tone referred to the decrease in the expression of melanocortin receptor signaling; this has been clarified in the revised manuscript (lines 142 and 360).

      (4) Please show the individual cycle patterns for all animals, as in Figure 2B. This can be a supplemental figure, but the current bar charts are not informative.

      We respectfully disagree that the bar charts are not informative, as they include the critical statistical analysis. We have now included all individual estrous cycle data in a new, separate supplemental figure (Sup. Figure 3). Therefore, we have excluded the representative cycles from the main figures, as they are now in the new supplemental figure. We have changed the order of the figures in the text accordingly.

      (5) In their rebuttal, the authors state: "Mice lack true follicular and luteal phases, and therefore, it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult." I disagree, but the authors can take this position if they wish. However, they should not report the responses to exogenous estradiol in an ovariectomised mouse as a "preovulatory LH surge" (line 380). An ovariectomised mouse cannot ovulate, and the estrogen-induced LH surge is significantly different in magnitude and timing from the endogenous preovulatory LH surge (likely due to the actions of progesterone). One goal of these studies is to understand why the ovulation rate appears to be low in the MC4-KO animals. Hence, evaluating whether the preovulatory LH surge is typical is important. This has not been done. The authors have shown that the response to exogenous estradiol is sub-normal. Such an effect might lead to a reduced preovulatory LH surge, but this has not been measured.

      We appreciate this reviewer’s concern about the nature of the preovulatory LH surge. We have clarified this in the revised manuscript and described it as “an induced LH surge” throughout the text (Lines 163, 533, 6560).

      (6) I believe that the ovulation process should be considered "all or none," and I do not quite understand the rebuttal discussion. The authors describe that "numerous follicles mature at the same time....". That is not disputed. My point was that each mature follicle will receive the identical endocrine ovulatory signal (correct? Or do the authors believe something different?). If it were sufficient for one follicle to ovulate, then all of those mature follicles (the number of which will be variable between animals and between cycles) would be expected to undergo ovulation. The fact that they do not raises several possibilities. One that the authors favor is that an insufficient ovulatory signal might approach a threshold where some follicles ovulate and others do not. This possibility is supported by the apparent increase in cystic follicles, which might be preovulatory follicles that did not complete the ovulation process. Such variation might be stochastic, within normal variation for sensitivity to LH. However, it is also possible that the follicles have not matured at the same rate, perhaps influenced by abnormal secretion of LH or FSH during earlier phases of the cycle, and hence are not in the appropriate condition to respond to the ovulation signal when it arrives. Some may even have matured prematurely due to the elevated gonadotropins reported in this study. Given the data and the partial fertility, the most likely explanation is that the genetic manipulation has resulted in fewer follicles being available for ovulation due to changes in follicular development rather than a deficit of the ovulation signal, although the latter mechanism might also contribute. A third possibility is that genetic manipulation has directly affected the ovary. The authors did not answer whether Kiss1 and MC4 are co-expressed in the ovary. I think the authors might want to rule this out by showing no change in MC4R expression in the ovary.

      We thank the reviewer for this thoughtful comment and agree that these are possible outcomes. We have now acknowledged them in the Discussion.

      To answer the reviewer’s question, we have not investigated the co-expression of Kiss1 and Mc4r in the ovary. While MC4R has indeed been documented in the ovary (Chen et al. Reproduction, 2017), the changes in gonadotropin release and supporting in vitro data included in this manuscript clearly document a central effect; however, an additional effect at the level of the ovary cannot be completely ruled out. This has now been added to the discussion (Line 378-387).

      (7) Lines 390, 454 " impaired LH pulse" What was the evidence for impaired LH pulse (see figure 2D)?

      Thank you for pointing this out. This comment referred to augmented LH release. This has been corrected in the revised manuscript (Line 394).

      The paper's strengths remain, as outlined in my original review. The authors have addressed what I perceived to be weaknesses, predominantly by changing the tone of discussion and interpretation of the data. This is appropriate. I consider the focus on the LH surge as the primary mechanism too narrow, and the authors should be considering how other changes during the cycle might influence ovarian function.

      We sincerely appreciate the reviewer’s thoughtful evaluation of our manuscript and their constructive feedback. We are pleased that our revisions have addressed the perceived weaknesses and that the adjustments to the discussion and interpretation were deemed appropriate.

      We acknowledge the reviewer’s perspective on broadening the discussion beyond the LH surge to consider additional cycle-dependent influences on ovarian function. While our current study focuses on this specific mechanism, we recognize that ovarian function is influenced by multiple physiological changes throughout the cycle. We have refined our discussion to reflect this broader context and appreciate the suggestion to consider these additional factors in future studies.

      We have addressed all of the reviewer’s comments to the best of our ability and hope they find the revised manuscript satisfactory.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors track the motion of multiple consortia of Multicellular Magnetotactic Bacteria moving through an artificial network of pores and report a discovery of a simple strategy for such consortia to move fast through the network: an optimum drift speed is attained for consortia that swim a distance comparable to the pore size in the time it takes to align with an external magnetic field. The authors rationalize their observations using dimensional analysis and numerical simulations. Finally, they argue that the proposed strategy could generalize to other species by demonstrating the positive correlation between the swimming speed and alignment time based on parameters derived from literature.

      Strengths:

      The underlying dimensional analysis and model convincingly rationalize the experimental observation of an optimal drift velocity: the optimum balances the competition between the trapping in pores at large magnetic fields and random pore exploration for weak magnetic fields.

      Weaknesses:

      The convex pore geometry studied here creates convex traps for cells, which I expect enhances their trapping. The more natural concave geometries, resulting from random packing of spheres, would create no such traps. In this case, whether a non-monotonic dependence of the drift velocity on the Scattering number would persist is unclear.

      We agree that convex walls increase the time that consortia remain trapped in pores at high magnetic fields. Since the non-monotonic behavior of the drift velocity with the Scattering number arises largely due to these long trapping times, we agree that experiments using concave pores are likely to show a peak drift velocity that is diminished or erased.

      However, we disagree that a random packing of spheres or similar particles provides an appropriate model for natural sediment, which is not composed exclusively of hard particles in a pure fluid. Pore geometry is also influenced by clogging. Biofilms growing within a network of convex pillars in two-dimensional microfluidic devices have been observed to connect neighboring pillars, thereby forming convex pores. Similar pore structures appear in simulations of biofilm growth between spherical particles in three dimensions. Moreover, the salt marsh sediment in which MMB live is more complex than simple sand grains, as cohesive organic particles are abundant. Experiments in microfluidic channels show that cohesive particles clog narrow passageways and form pores similar to those analyzed here. Thus, we expect convex pores to be present and even common in natural sediment where clogging plays a role.

      The concentration of convex pores in the experiments presented here is almost certainly much higher than in nature. Nonetheless, since magnetotactic bacteria continuously swim through the pore space, they are likely to regularly encounter such convexities. Efficient navigation of the pore space thus requires that magnetotactic bacteria be able to escape these traps. In the original version of this manuscript, this reasoning was reduced to only one or two sentences. That was a mistake, and we thank the reviewer for prompting us to expand on this point. As the reviewer notes, this reasoning is central to the analysis and should have been featured more prominently. In the final version, we will devote considerable space to this hypothesis and provide references to support the claims made above.

      The reviewer suggests that the generality of this work depends on our finding a “positive correlation between the swimming speed and alignment [rate] based on parameters derived from literature.” We wish to emphasize that, in addition to predicting this correlation, our theory also predicts the function that describes it. The black line in Figure 3 is not fitted to the parameters found in the literature review; it is a pure prediction.

      Reviewer #2 (Public review):

      The authors have made microfluidic arrays of pores and obstacles with a complex shape and studied the swimming of multicellular magnetotactic bacteria through this system. They provide a comprehensive discussion of the relevant parameters of this system and identify one dimensionless parameter, which they call the scattering number and which depends on the swimming speed and magnetic moment of the bacteria as well as the magnetic field and the size of the pores, as the most relevant. They measure the effective speed through the array of pores and obstacles as a function of that parameter, both in their microfluidic experiments and in simulations, and find an optimal scattering number, which they estimate to reflect the parameters of the studied multicellular bacteria in their natural environment. They finally use this knowledge to compare different species to test the generality of this idea.

      Strengths:

      This is a beautiful experimental approach and the observation of an optimal scattering number (likely reflecting an optimal magnetic moment) is very convincing. The results here improve on similar previous work in two respects: On the one hand, the tracking of bacteria does not have the limitations of previous work, and on the other hand, the effective motility is quantified. Both features are enabled by choices of the experimental system: the use of the multicellular bacteria, which are larger than the usual single-celled magnetotactic bacteria, and the design of the obstacle array, which allows the quantification of transition rates due to the regular organization as well as the controlled release of bacteria into this array through a clever mechanism.

      Weaknesses:

      Some of the reported results are not as new as the authors suggest; specifically, trapping by obstacles and the detrimental effect of a strong magnetic field have been reported before, as has the hypothesis that the magnetic moment may be optimized for swimming in a sediment environment where there is a competition of directed swimming and trapping. Other than that, some of the key experimental choices on which the strength of the approach is based also come at a price and impose some limitations, namely the use of a non-culturable organism and the regular, somewhat unrealistic artificial obstacle array.

      In the “Recommendations for the Authors,” this reviewer drew our attention to a manuscript that absolutely should have been prominently cited. As the reviewer notes, our manuscript meaningfully expands upon this work. We are pleased to learn that the phenomena discussed here are more general than we initially understood. It was an oversight not to have found this paper earlier. The final version will better contextualize our work and give due credit to the authors. We sincerely appreciate the reviewer for bringing this work to our attention.

      We disagree that the use of non-culturable organisms and our unrealistic array should be considered serious weaknesses. While any methodological choice comes with trade-offs, we believe these choices best advance our aims. First, the goal of our research, both within and beyond this manuscript, is to understand the phenotypes of magnetotactic bacteria in nature. While using pure cultures enables many useful techniques, phenotypic traits may drift as strains undergo domestication. We therefore prioritize studying environmental enrichments.

      Clearly, an array of obstacles does not fully represent natural heterogeneity. However, using regular pore shapes allows us to average over enough consortium-wall collisions to enable a parameter-free comparison between theory and experiment. Conducting an analysis like this with randomly arranged obstacles would require averaging over an ensemble of random environments, which is practically challenging given the experimental constraints. Since we find good agreement between theory and experiment in simple geometries, we are now in a position to justify extending our theory to more realistic geometries. Additionally, we note that a microfluidic device composed of a random arrangement of obstacles would also be a poor representation of environmental heterogeneity, as pore shape and network topology differ between two and three dimensions.

      Recommendations for the Authors: 

      Reviewer #1 (Recommendations for the authors):

      My main suggestion is for the authors to describe the limitations of their approach in the case of concave pores.

      As we noted in our public comments, this was a very useful comment to hear from you and one that has been repeated as we have spoken about these results to colleagues. Convexities here represent an experimentally simple way to force bacteria to backtrack through the maze, as they must through natural sediment. We have greatly expanded this discussion to clarify this reasoning (lines 84–105). We provide references to three types of physical processes that may lead to such traps. First, as in figure 1 of Kurz et al, biofilm (white) can fill the spaces between convex pillars to create convexities. Additionally, clogging by cohesive particles can make narrow passageways between convex particles impassable. An example of clogging is shown in figure 6 of Dressaire & Sauret 2017. Finally, air bubbles trapped in the sediment can create pore-scale dead ends that require bacteria to backtrack. The full references are provided in the main text.

      Small points:

      (1) How many trajectories were used to produce Figures 2 b and c?

      We have modified the caption to note that these data represent the measured transition rates of a total of 938 consortia at various Scattering numbers. Each consortium may pass between pores many times.

      (2) Can the authors describe in more detail how Equation (3) is derived? Why doesn’t it depend on the gap size between the pores?

      We have provided a derivation of this equation in Appendix 2 of the new version. This derivation shows that the drift velocity U<sub>drift</sub> is proportional to the pore diameter and difference between the transition rates.

      The proportionality constant α depends on how the pores are connected together in space. In the original version, we wanted to highlight the role of the asymmetry of the transition rates, so we imagined a one-dimensional network of pores without gaps. In this case, α = 1. This reasoning was poorly explained in the previous version and we thank the reviewer for pointing this deficiency out. In the new version, we include the gap size and use the layout of pores in a square lattice with gaps, which is shown in figure 1. The proportionality constant for a square lattice in the absence of gaps would be 1/√2. The limitations of photolithography require some gap, which increases the proportionality constant to α = 0.8344.

      We have updated the text, equation (3), and the figures to account for the finite gap sizes.
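
      As a rough numerical illustration of the relation described in this response, the sketch below computes a drift velocity from the pore diameter and the two transition rates, assuming the form U_drift = α d (k₊ − k₋). The function name, the argument names, the sign convention (k₊ along the field, k₋ against it), and the example values are hypothetical and are not taken from the manuscript; only the constants α = 0.8344 and α = 1 come from the text above.

```python
def drift_velocity(k_plus, k_minus, pore_diameter, alpha=0.8344):
    """Return the drift velocity, assuming U_drift = alpha * d * (k_plus - k_minus).

    k_plus, k_minus: transition rates along and against the field (1/time);
                     this sign convention is an assumption of the sketch.
    pore_diameter:   pore diameter d (length).
    alpha:           geometric constant; 0.8344 is the value quoted in the text
                     for the square lattice with gaps, 1 for a gapless 1D chain.
    """
    return alpha * pore_diameter * (k_plus - k_minus)


# Hypothetical example values, chosen only for illustration.
u_lattice = drift_velocity(k_plus=0.5, k_minus=0.1, pore_diameter=100.0)
u_chain = drift_velocity(k_plus=0.5, k_minus=0.1, pore_diameter=100.0, alpha=1.0)
print(u_lattice, u_chain)
```

      With equal transition rates the function returns zero, which matches the point made above that the drift arises from the asymmetry of the rates.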

      (3) I found the second part of the abstract, related to the comparison between diverse bacteria, to be slightly misleading. Upon first reading, my expectation was that the authors carried out experiments with different species.

      We have modified the abstract to make clear that we rely on values taken from a literature review.

      (4) More information is needed on how many trajectories were used to produce the probability densities in Figures 1b-d. How were the densities computed?

      The probability distributions give the probability that a pixel in a pore is covered by a consortium. They reflect between 1.2 and 7 million measurements (depending on the panel) of the instantaneous positions of consortia. We have added a section (Lines 453–469) to Materials and Methods that describes exactly how these distributions were calculated.
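
      One plausible reading of this description can be sketched as follows: stack per-frame boolean masks that mark the pixels covered by a consortium, then average over frames to obtain a per-pixel coverage probability. This is only an illustrative sketch; the actual procedure is the one given in the Materials and Methods, and the function name, the mask representation, and the tiny synthetic input are assumptions made here.

```python
import numpy as np


def occupancy_probability(masks):
    """Per-pixel fraction of frames in which the pixel is covered by a consortium.

    masks: array-like of shape (n_frames, height, width); nonzero or True
           where a pixel is covered in that frame (hypothetical input format).
    """
    masks = np.asarray(masks, dtype=bool)
    if masks.ndim != 3:
        raise ValueError("expected shape (n_frames, height, width)")
    # The mean of a boolean stack along the frame axis is the coverage frequency.
    return masks.mean(axis=0)


# Tiny synthetic example: 4 frames of a 2x2 image (not real data).
frames = np.array([
    [[1, 0], [0, 0]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 0]],
    [[0, 0], [0, 0]],
])
prob = occupancy_probability(frames)
print(prob)
```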

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned under Weaknesses in the Public review, some results are less new than claimed here. The existence of an optimal magnetic moment has been shown by Codutti et al. (eLife 13:RP98001) in very similar experiments, where it was also proposed that this may be an evolutionary adaptation to the sediment habitat. The paper here provides additional evidence for this, and with better tracking and quantification, but previous work should be discussed. Likewise, the work by Dekharghani et al. that is mentioned rather suddenly in the Results section appears to be a crucial previous state of the art and could already be mentioned in the introduction.

      We thank the reviewer for bringing this paper, which came out as we were writing this manuscript, to our attention. The hypothesis that there is an optimal phenotype that balances magnetotaxis with obstacle avoidance—and that natural selection could guide organisms to this optimum—goes back to at least 2022. It seems that Codutti et al independently came up with this same hypothesis and provided the first test.

      We have substantively rewritten the introduction (Lines 46–58) to better contextualize our work and give due attention to Dekharghani et al.

      (2) The first paragraph of Results also contains background information and could be moved into the introduction.

      As part of the rewrite to better contextualize our work, we moved the first two paragraphs of results to the introduction.

      (3) I found Figure 1 a bit confusing and it took me some time to understand the geometry. I think the black obstacles are very dominant to the viewer’s eye and draw attention away from the essentially circular shape of the pores. Likewise, I am not sure that cutting the neighboring pores off in a circular fashion in Figures 1b-d was the best choice. The authors should think about whether the presentation can be improved. Likewise, when describing the direction of the field in the text, I would suggest adding that it is along the horizontal direction in Figure 1.

      We have modified the figure and the text as the reviewer suggests.

      (4) That collisions with a pore wall are an important mechanism of changing direction is clear and it is nice to see the paper demonstrate that this mechanism is dominant over rotational diffusion. However, this may not be universal, as (i) rotational diffusion is more important for smaller cells and (ii) interaction with walls can result in all kinds of different behaviors than complete randomization (e.g. swimming along the walls as shown in microfluidic chambers, Ostapenko et al. Phys Rev Lett 2018, Codutti et al. eLife 2022, or reversals, Kuhn et al PNAS 2017). Here, it appears that complete randomization of the direction is an assumption, but this could be tested/quantified by analyzing the trajectories.

      This is an excellent point. We have modified the text to describe qualitatively how these tendencies would shift the Critical Scattering number. We also note in the text that there is evidence of these differences in Fig 3. The Desulfobacterota are shifted upwards in Fig 3 relative to the α-proteobacteria. This shift indicates that Desulfobacterota tend to live at slightly greater scattering numbers of 0.9±0.3 than the α-proteobacteria, which live at scattering number 0.37 ± 0.03. It is likely that this difference reflects taxonomic differences in rotational diffusion and cell-wall interactions.

      It is true that total randomization of the direction is an assumption, and it is stated as such in line 189. We performed all of the numerics to find the solid curves in Fig 2 before we got any experimental data and so, at the time, total randomization seemed like a fair choice. Looking at Fig 2b, it is clear that these numerics systematically overestimate k<sub>−</sub>. We believe that this error is due to the assumption of total randomization.

      As this effect is small and does not change any of the conclusions of the paper and Codutti et al were able to publish their paper in the time that we were writing ours, we feel some urgency to move forward.

      (5) From the manuscript it is not fully clear to what extent experiments and simulations are or can be quantitatively compared. For example: is the curve (“fit”) in Figure 2c based on the simulations? Is there an explicit expression or is this just a spline or something like that? Why does Figure 5 (simulation) show the velocity as a function of Sc<sup>−1</sup>and Figure 2 (experiment) as a function of Sc? It looks to me as if a quantitative comparison could be achieved.

      The original version of Figure 2 shows a quantitative comparison between theory and experiment with no fit parameters. The data points are the result of experiments in which consortia are tracked as they move between connected pores. The solid line is found by interpolating a smooth curve through the data from simulations. As we make clear in the new version (Lines 537–551), this blue curve is the most probable smooth curve that explains the simulations.

      We have added the simulations to figure 2 so that a single panel includes the data, the simulations, and the smooth curve. To further make clear that this comparison is quantitative and parameter free, we have added a panel to Figure 2. This panel directly compares the prediction to observation and is independent of the blue curve.

      As was noted (deep within the methods section) in the original version, our numerics can exactly simulate Sc = ∞. Consequently, it was reasonable to simulate parameters that are uniformly spaced in Sc<sup>−1</sup>.

      (6) While I like the idea behind Figure 3, the data shown here is not as convincing as suggested. If one looks at the data without the black line, I think one gets a weaker dependence. The correlation between U<sub>0</sub> and γ<sub>geo</sub> is likely not as strong as it seems. Calculating a correlation coefficient might be helpful here. In any case, the assumptions going into this figure should be discussed more explicitly and the results should in my opinion be phrased more cautiously (I tend to believe what the authors claim, but I don’t think the evidence for this point is very strong).

      We appreciate the reviewer’s skepticism. However, we believe that the data are stronger than one might understand from the previous text. We have rewritten the text (Lines 219–291) and included new analysis, figures, and explanation to make three points clear.

      (a) It is surprising that speed, magnetic moment, and mobility all vary tremendously (between one and three orders of magnitude) across taxa and environment; however, their dimensionless combination Sc is narrowly distributed. We have added a panel to Fig. 3 to show the measured Scattering numbers.

      It is notable that there are no adjusted parameters in the calculation of the Scattering numbers: it is a simple dimensionless combination of phenotypic and environmental parameters. All but one of these parameters (the pore size) are measured either by us or by other authors. The pore radius is likely narrowly distributed. We measure it at our field site and, when it is not reported, we use a value typical of the geological and fluvial environment. Just as the size of sand grains does not vary greatly between the beaches of Australia, Africa, and California, it is a good assumption that the pore spaces that host these magnetotactic bacteria do not vary tremendously in size.

      (b) In the new version we compare the Scattering number statistics to a parameter-free null model of phenotypic diversity. We argue in the text that it is appropriate to bootstrap over the phenotypic diversity of species. This null model provides the correct method to calculate p-values as the variability in the Scattering numbers is neither identically distributed nor normally distributed.

      We use this null model to show that—given the measured phenotypic diversity across species—the probability that fifteen random species would fall within the measured range of Scattering numbers that is consistent with optimal navigation is ∼ 10<sup>−6</sup>. This result is strong evidence that the phenotypic variables exhibit the correlations that are predicted by our analysis.

      (c) The correlation between U<sub>0</sub>/r and γ<sub>geo</sub> is reasonably strong. I think that our choice of axes in Fig 3, which were chosen to fit the legend, makes the data look flatter than they actually are. Here are the same data plotted without the line and with tighter axes:

      Author response image 1.

      With the exception of the very first point and the very last point, the data appear to our eyes to be pretty correlated. This impression is borne out by a calculation of the correlation coefficient, which gives 0.77. The p-value is 4 × 10<sup>−4</sup>. We have included these values in the main text to clarify that this correlation is both statistically significant and of primary importance.
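      As an illustration of the kind of calculation described in point (c), the following minimal sketch computes a Pearson correlation coefficient and estimates a one-sided p-value by permutation. The permutation test is an assumption made here for illustration; the response does not state how its p-value was obtained. The arrays `gamma_geo` and `u0_over_r` hold hypothetical placeholder values, not the measurements plotted in Fig 3.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=10000, seed=0):
    """One-sided p-value: share of shuffles whose r is at least the observed r."""
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical placeholder values, NOT the data from Fig 3.
gamma_geo = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
u0_over_r = [0.9, 0.7, 1.8, 1.6, 2.9, 2.4, 3.8, 3.1]

r = pearson_r(gamma_geo, u0_over_r)
p = permutation_p_value(gamma_geo, u0_over_r)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

      A permutation test makes no normality assumption, which may suit data of this kind; a parametric t-test on r would be the more conventional alternative.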

      (7) There is a comment at the end of the discussion that the evolutionary hypothesis could be tested by transferring the magnetotaxis genes to nonmagnetotactic organisms. This would indeed be highly desirable, but this is very difficult as indicated by the successful efforts in that direction (which often are only moderately magnetic/magnetotactic), see Kolinko et al Nature Nanotech 2014, Dziuba et al Nature Nanotech 2024.

      Thank you for highlighting these references, which we have included. We agree that these experiments will be challenging. Our results make a prediction about the evolution of these strains, so it seems worth mentioning this fact. We feel that this manuscript is not the correct space for a detailed description of challenges that we will encounter should we pursue this direction of study.

      (8) A section on how the bacterial samples were obtained could be added in Methods.

      We have done so.

      Additional Changes

      (1) In the original version, we feared that the consortia in the microfluidic device are poorly representative of the natural population. Consequently, we used the values from previous experiments, which we performed using consortia taken from the same pond. Since submitting this manuscript we have undertaken new experiments that allowed us to measure the Scattering number of individual consortia. It turns out the effect is smaller than we worried. We have included these measurements in the new version. We find that even as the most common phenotypes vary over the course of time, the Scattering number remains constant. This result is additional evidence that there is strong selective pressure to optimally navigate.

      As a result of these additions, we have added an author, Julia Hernandez, who contributed to these experiments and analysis.

      (2) We have expanded the table of phenotypic variables in Appendix 1 to make it easier for other researchers to reproduce our analysis.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript reports the investigation of PriC activity during DNA replication initiation in Escherichia coli. It is reported that PriC is necessary for the growth and control of DNA replication initiation under diverse conditions where helicase loading is perturbed at the chromosome origin oriC. A model is proposed where PriC loads helicase onto ssDNA at the open complex formed by DnaA at oriC. Reconstituted helicase loading assays in vitro support the model. The manuscript is well-written and has a logical narrative.

      Thank you for understanding this study.

      Major Questions/Comments:

      An important observation here is that a ΔpriC mutant alone displays under-replication, suggesting that this helicase loading pathway is physiologically relevant. Has this PriC phenotype been reported previously? If not, would it be possible to confirm this result using an independent experimental approach (e.g. marker frequency analysis or fluorescent reporter-operator systems)?

      We thank Reviewer 1 for this comment. This study provides the first direct evidence for PriC’s role in initiation of chromosome replication. Given the change of the oriC copy number of ∆priC cells in non-stressed conditions is only slight, resolution of the suggested methods is clearly not high enough to distinguish the differences in the oriC copy number between priC<sup>+</sup> (WT) and ∆priC cells. Thus, to corroborate the ∆priC phenotype, we additionally analyzed using flow cytometry priC<sup>+</sup> and ∆priC cells growing under various nutrition and thermal conditions.

      As shown in Figure 2-figure supplement 1 of the revised version, the fraction of cells with non-2<sup>n</sup> oriC copies was slightly higher in ∆priC cells compared to priC<sup>+</sup> cells. Furthermore, when grown in M9 minimal medium at 37˚C, ∆priC mutant cells exhibited slightly reduced ori/mass values. These results support the idea that inhibition of replication initiation occurs at low frequency even in the WT dnaA and dnaC background, and that PriC function is necessary to ensure normal replication initiation. Related descriptions have been revised accordingly.
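      To make the "non-2<sup>n</sup>" measure concrete, the sketch below computes the fraction of cells whose oriC copy number is not a power of two from a table of per-cell counts. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the manuscript's flow cytometry data or analysis pipeline.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (the 2^n copy numbers)."""
    return n > 0 and (n & (n - 1)) == 0


def non_power_of_two_fraction(cell_counts):
    """cell_counts maps oriC copies per cell -> number of cells observed."""
    total = sum(cell_counts.values())
    irregular = sum(
        count for copies, count in cell_counts.items()
        if not is_power_of_two(copies)
    )
    return irregular / total


# Hypothetical counts, NOT measurements from the manuscript.
example_counts = {2: 50, 3: 10, 4: 40}
print(non_power_of_two_fraction(example_counts))
```

      Comparing this fraction between two strains grown under the same conditions is one simple way to express the kind of difference the response describes.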

      Is PriA necessary for the observed PriC activity at oriC? Is there evidence that PriC functions independently of PriA in vivo?

      As described in Introduction of the original manuscript, PriA is a 3’-to-5’ helicase which specifically binds to the forked DNA with the 3’-end of the nascent DNA strand. Thus, structural specificity of target DNA is essentially different between PriA and PriC. Consistent with this, our in vitro data indicate that PriC alone is sufficient to rescue the abortive helicase loading at oriC (Figure 7), indicating that PriA is principally unnecessary for PriC activity at oriC. Consistently, as described in Introduction, PriC can interact with ssDNA to reload DnaB (Figure 1E). Nevertheless, a possibility that PriA might participate in the PriC-dependent DnaB loading rescue at oriC in vivo can not be completely excluded. However, elucidation of this possibility is clearly beyond the scope of the present study and should be analyzed in the future. An additional explanation has been included in Discussion of the revised version.

      Is PriC helicase loading activity in vivo at the origin direct (the genetic analysis leaves other possibilities tenable)? Could PriC enrichment at oriC be detected using chromatin immunoprecipitation?

      These are advanced questions about genomic dynamics of PriC. Given that PriC facilitates DnaB reloading at stalled replication forks (Figure 1E) (Heller and Marians, Mol Cell., 2005; Wessel et al., J Biol Chem, 2013; Wessel et al., J Biol Chem, 2016), PriC might interact with the whole genome and its localization might not necessarily exhibit a preference for oriC in growing cells. Analysis of these questions is interesting but beyond the scope of the present study and should be addressed in a future study.

      Reviewer #2 (Public review):

      This is a great paper. Yoshida et al. convincingly show that DnaA does not exclusively do loading of the replicative helicase at the E. coli oriC, but that PriC can also perform this function. Importantly, PriC seems to contribute to helicase loading even in wt cells albeit to a much lesser degree than DnaA. On the other hand, PriC takes a larger role in helicase loading during aberrant initiation, i.e. when the origin sequence is truncated or when the properties of initiation proteins are suboptimal. Here highlighted by mutations in dnaA or dnaC.

      This is a major finding because it clearly demonstrates that the two roles of DnaA in the initiation process can be separated into initially forming an open complex at the DUE region by binding/nucleation onto DnaA-boxes and second by loading of the helicase. Whereas these two functions are normally assumed to be coupled, the present data clearly show that they can be separated and that PriC can perform at least part of the helicase loading provided that an area of duplex opening is formed by DnaA. This puts into question the interpretation of a large body of previous work on mutagenesis of oriC and dnaA to find a minimal oriC/DnaA complex in many bacteria. In other words, mutants in which oriC is truncated/mutated may support the initiation of replication and cell viability only in the presence of PriC. Such mutants are capable of generating single-strand openings but may fail to load the helicase in the absence of PriC. Similarly, dnaA mutants may generate an aberrant complex on oriC that trigger strand opening but are incapable of loading DnaB unless PriC is present.

      We would like to thank Reviewer #2 for the very positive comments about our work.

      In the present work, the sequence of experiments presented is logical and the manuscript is clearly written and easy to follow. The very last part regarding PriC in cSDR replication does not add much to the story and may be omitted.

      Given that the role of PriC in stimulating cSDR was unclear, we believe that our finding that PriC has little or no role in cSDR, despite being a negative result, is valuable for the general readership of eLife. To further assess the impact of PriC on cSDR and as recommended by Referee #1, we carried out the chromosome loci copy-number analysis by whole-genome sequencing. As shown in Figure 8-supplement 1 of the revised version, the results support our conclusion from the original version.

      Reviewer #3 (Public review):

      Summary:

      At the abandoned replication fork, loading of DnaB helicase requires assistance from PriABC, repA, and other protein partners, but it does not require replication initiator protein, DnaA. In contrast, nucleotide-dependent DnaA binding at the specific functional elements is fundamental for helicase loading, leading to the DUE region's opening. However, the authors questioned in this study that in case of impeding replication at the bacterial chromosomal origins, oriC, a strategy similar to an abandoned replication fork for loading DnaB via bypassing the DnaA interaction step could be functional. The study by Yoshida et al. suggests that PriC could promote DnaB helicase loading on the chromosomal oriC ssDNA without interacting with the DnaA protein. However, the conclusions drawn from the primarily qualitative data presented in the study could be slightly overwhelming and need supportive evidence.

      Thank you for your understanding and careful comments.

      Strengths:

      Understanding the mechanism of how DNA replication restarts via reloading the replisomes onto abandoned DNA replication forks is crucial. Notably, this knowledge becomes crucial to understanding how bacterial cells maintain DNA replication from a stalled replication fork when challenging or non-permissive conditions prevail. This critical study combines experiments to address a fundamental question of how DnaB helicase loading could occur when replication initiation impedes at the chromosomal origin, leading to replication restart.

      Thank you for your understanding.

      Weaknesses:

      The term colony formation used for a spotting assay could be misleading for apparent reasons. Both assess cell viability and growth; while colony formation is quantitative, spotting is qualitative. Particularly in this study, where differences appear minor but draw significant conclusions, the colony formation assays representing growth versus moderate or severe inhibition are a more precise measure of viability.

      We used serial dilutions of the cell culture for the spotting assay and thus this assay should be referred to as semi-quantitative rather than simply qualitative. For more quantitative assessment of viability, we analyzed the growth rates of cells and the chromosome replication activity using flow cytometry.

      Figure 2

      The reduced number of two oriC copies per cell in the dnaA46priC-deficient strain was considered moderate inhibition. When combined with the data suggested by the dnaAC2priC-deficient strain containing two origins in cells with or without PriC (indicating no inhibition)-the conclusion was drawn that PriC rescue blocked replication via assisting DnaC-dependent DnaB loading step at oriC ssDNA.

      The results provided by Saifi B, Ferat JL. PLoS One. 2012;7(3):e33613 suggests the idea that in an asynchronous DnaA46 ts culture, the rate by which dividing cells start accumulating arrested replication forks might differ (indicated by the two subpopulations, one with single oriC and the other with two oriC). DnaA46 protein has significantly reduced ATP binding at 42C, and growing the strain at 42C for 40-80 minutes before releasing them at 30 C for 5 minutes has the probability that the two subpopulations may have differences in the active ATP-DnaA. The above could be why only 50% of cells contain two oriC. Releasing cells for more time before adding rifampicin and cephalexin could increase the number of cells with two oriCs. In contrast, DnaC2 cells have inactive helicase loader at 42 C but intact DnaA-ATP population (WT-DnaA at 42 or 30 C should not differ in ATP-binding). Once released at 30 C, the reduced but active DnaC population could assist in loading DnaB to DnaA, engaged in normal replication initiation, and thus should appear with two oriC in a PriC-independent manner.

      This is a question about dnaA46 ΔpriC mutant cells. Inhibition of the replication forks causes inhibition of the RIDA (DNA-clamp complex-dependent DnaA-ATP hydrolysis) system, resulting in an increase in ATP-DnaA molecules (Kurokawa et al. (1999) EMBO J.). Thus, if ΔpriC inhibits the replication forks significantly, the ATP-DnaA level should increase and initiation should be stimulated. However, the results of Figure 2BC are opposite, indicating inhibition of initiation by ΔpriC. Thus, we infer that the inhibition of initiation in the ΔpriC cells is not related to possible changes in the ATP-DnaA level. Even if the ATP-DnaA levels are different in subpopulations in dnaA46 cells, the ΔpriC mutation should not affect the ATP-DnaA levels significantly. Thus, we infer that even in dnaA46 ΔpriC mutant cells, the ΔpriC mutation directly affects initiation mechanisms, rather than indirectly through the ATP-DnaA levels.

      Broadly, the evidence provided by the authors may support the primary hypothesis. Still, it could call for an alternative hypothesis: PriC involvement in stabilizing the DnaA-DnaB complex (this possibility could exist here). To prove that the conclusions made from the set of experiments in Figures 2 and 3, which laid the foundations for supporting the primary hypothesis, require insights using on/off rates of DnaB loading onto DnaA and the stability of the complexes in the presence or absence of PriC, I have a few other reasons to consider the latter arguments.

      This is a very careful consideration. However, we infer that stabilization of the DnaA-DnaB interaction by PriC, even if present, does not always result in stimulation of DnaB loading to oriC. Given that interactions between DnaA and DnaB during DnaB loading to oriC are highly dynamic and complicated with multiple steps, stabilization of the DnaA-DnaB interaction by PriC, even if it occurs, has a considerable risk of inhibiting the DnaB loading by constructing abortive complexes. In addition, DnaA-DiaA binding is very tight and stable (Keyamura et al., 2007, 2009). Even if WT DnaA and WT DnaB are present, PriC can rescue the initiation defects of oriC mutants. Based on these facts and the known characteristics of PriC as explained in Introduction, it is more reasonable to infer that PriC provides a bypass of DnaB loading even at oriC, as proposed for the mechanism at the stalled replication fork. However, we cannot completely rule out the indicated possibility and these explanations are included in the revised version.

      Figure 3

      One should consider the fact that dnaA46 is present in these cells. Overexpressing pdnaAFH could produce mixed multimers containing subunits of DnaA46 (reduced ATP binding) and DnaAFH (reduced DnaB binding). Both have intact DnaA-DnaA oligomerization ability. The cooperativity between the two functions by a subpopulation of two DnaA variants may compensate for the individual deficiencies, making a population of an active protein, which in the presence of PriC could lead to the promotion of the stable DnaA: DnaBC complexes, able to initiate replication. In the light of results presented in Hayashi et al. and J Biol Chem. 2020 Aug 7;295(32):11131-11143, where mutant DnaBL160A identified was shown to be impaired in DnaA binding but contained an active helicase function and still inhibited for growth; how one could explain the hypothesis presented in this manuscript. If PriC-assisted helicase loading could bypass DnaA interaction, then how growth inhibition in a strain carrying DnaBL160A should be described. However, seeing the results in light of the alternative possibility that PriC assists in stabilizing the DnaA: DnaBC complex is more compatible with the previously published data.

      Unfortunately, this comment contains a crucial misunderstanding about the growth of cells bearing DnaB L160A. Hayashi et al. reported that the dnaB(Ts) cells bearing the dnaB L160A allele grew slowly and formed colonies even at 42°C. This feature is similar to the growth of dnaA46 cells bearing the dnaA F46A H136A allele (Figure 2). Thus, the results of dnaB L160A cells are consistent with our model and support the idea that PriC partially rescues the growth inhibition of cells bearing the DnaB L160A allele by bypassing the strict requirement for the DnaA-DnaB interaction. Nevertheless, we have to be careful about the possibility that DnaB L160A could affect interaction with PriC, which we are going to investigate for a future paper.

      As suggested, if mixed complexes of DnaA46 and DnaA F46A H136A proteins are formed, those might retain partial activities in oriC unwinding and DnaB interaction although those cells are inviable at 42°C without PriC. It is noteworthy that in the specific oriC mutants which are impaired in DnaB loading (e.g., Left-oriC), PriC effectively rescues the initiation and cell growth. In these cells, both DnaA and DnaB are intact. Thus, the idea that only mutant DnaA (or DnaB) protein is stimulated specifically via PriC interaction is invalid. Even in cells bearing wild-type oriC, DnaA and DnaB, a contribution of PriC to initiation is detected.

      In addition, as described in the above response, given that interactions between DnaA and DnaB during DnaB loading to oriC are very dynamic and complicated with multiple steps, stabilization of the DnaA-DnaB interaction by PriC, even if present, would not simply result in stimulation of DnaB loading to oriC; rather, we think it would probably inhibit the DnaB loading by constructing abortive complexes. Based on the known characteristics of PriC as explained in Introduction, it is more reasonable to infer that PriC provides a bypass of DnaB loading even at oriC, as proposed for the mechanism at the stalled replication fork.

      However, we cannot completely rule out the indicated possibility and this explanation has been described in the revised version as noted in response to the above question.

      Figure 4

      Overexpression of DiaA could contribute to removing a higher number of DnaA populations. This could be more aggravated in the absence of PriC (DiaA could titrate out more DnaA)-the complex formed between DnaA: DnaBC is not stable, therefore reduced DUE opening and replication initiation leading to growth inhibition (Fig. 4A ∆priC-pNA135). Figure 7C: Again, in the absence of PriC, the reduced stability of DnaA: DnaBC complex leaves more DnaA to titrate out by DiaA, and thus less Form I*. However, adding PriC stabilizes the DnaA: DnaBC hetero-complexes, with reduced DnaA titration by DiaA, producing additional Form I*. Adding a panel with DnaBL160A that does not interact with DnaA but contains helicase activity could be helpful. Would the inclusion of PriC increase the ability of mutant helicase to produce additional Form I*?

      Unfortunately, the proposed idea is biased disregarding the fact that DiaA effectively stimulates assembling processes of DnaA molecules at oriC. As oriC contains multiple DnaA boxes and multiple DnaA molecules are recruited there, DiaA will efficiently facilitate assembling of DnaA molecules on oriC. Even DnaA molecules of DnaA-DiaA complexes can efficiently bind to oriC. This is consistent with in vitro experiments showing that higher levels of DiaA stimulate assembly of DnaA molecules and oriC unwinding (i.e., DUE opening) but even excessive levels of DiaA do not inhibit those reactions (Keyamura et al., J. Biol. Chem. (2009) 284, 25038-25050). However, as shown in Figure 9, DiaA tightly binds to the specific site of DnaA which is the same as the DnaB L160-binding site, which causes inhibition of DnaA-DnaB binding (ibid). These are consistent with in vivo experiments, and concordantly consistent with the idea that the excessive DiaA level inhibits interaction and loading of DnaB by the DnaA-oriC complexes, but not oriC unwinding (i.e., DUE opening) in vivo. Also, as mentioned above, we do not consider that stabilization of DnaA-DnaBC complex simply results in stimulation of DnaB loading to oriC. Based on the known characteristics of PriC, it is more reasonable to infer that PriC provides a bypass of DnaB loading even at oriC, as proposed for the mechanism at the stalled replication fork (Figure 1E), as described in the above response.

      As for DnaB L160A, as mentioned above, we are currently investigating interaction modes between DnaB and PriC. While investigating DnaB L160A could further support our model, we believe its contribution to the present manuscript would be incremental. In addition, there is a possibility that DnaB L160A could affect interaction with PriC. Thus, analysis of DnaB mutants in this PriC rescue mechanism should be addressed in a future study.

      Figure 5

      The interpretation is that colony formation of the Left-oriC ∆priC double mutant was markedly compromised at 37˚C (Figure 5B), and the growth defects of the Left-oriC mutant at 25°C and 30°C were aggravated. However, prima facie, the relative differences in the growth of cells containing and lacking PriC are similar. Quantitative colony-forming data is required to claim these results. Otherwise, it is slightly confusing.

      The indicated concern arose from a typing error on our part, in which ∆priC was omitted. In the revised manuscript, we have amended the text as follows: the cell growth of the Left-oriC ∆priC double mutant was markedly compromised at 37˚C and moderately reduced at 25°C and 30°C (Figure 5B).

      A minor suggestion is to include cells expressing PriC using plasmid DNA to show that adding PriC should reverse the growth defect of dnaA46 and dnaC2 strains at non-permissive temperatures. The same should be added at other appropriate places.

      Even in the presence of PriC, unwinding of oriC and DnaB helicase loading to the unwound oriC require DnaA and DnaC activities as indicated by previous studies (see for a review, Windgassen et al., (2018) Nucleic Acids Res. 46, 504-519). Thus, dnaA46 cells and dnaC2 cells bearing pBR322-priC cannot grow at 42°C and 37°C (as follows). These are reasonable results. However, at semi-permissive temperatures (37°C for dnaA46 and 35°C for dnaC2), slight stimulation of the cell growth by pBR322-priC might be barely observed (Figure 2-supplement 1 of the revised version). These results suggest that the intrinsic level of PriC is functionally nearly sufficient. This explanation has been included in the revised version.

      Author response image 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 38. "in assembly of the replisome".

      Corrected.

      Line 137. "specifically" rather than specificity.

      Corrected.

      Line 139. "at" rather than by.

      Corrected.

      The DnaA46 protein variant contains two amino acid substitutions (A184V and H252Y) within the AAA+ motif. H136 appears to reside adjacent to A184 in structure. Is A184V mutation causative?

      The DnaA H136A and A184V alleles are responsible for different defects. Indeed, the DnaA A184V variant is thermolabile and defective in ATP binding whereas the H136A variant retains ATP binding but impairs DnaB loading (Carr and Kaguni, Mol. Microbiol., 1996; Sakiyama et al., Front. Microbiol., 2018). These observations strongly support the view that the phenotype of the DnaA H136A allele is independent of that of the DnaA A184V allele.

      Figure 2A. Regarding the dnaA46 allele grown at 37°C.

      Individual colonies cannot be resolved. Is an image from a later time-point available?

      We have replaced the original image with one from another replicate that provides better resolution. Please see Figure 2A in the revised version.

      Figure 2C. Quantification of the number of cells with more than one chromosome equivalent in the dnaC2 ΔpriC strain. The plot from flow cytometry appears to show >20% of cells with only 1 genome. Are these numbers correct?

      Thank you for this careful comment. We quantified the peaks more strictly, but the percentages were not largely changed. To improve resolution of the DNA profiles, we have changed the range of the x-axis in panels B and C of Figure 2 in the revised version.

      Figure 3. Are both F46A and H136A mutations in the plasmid-encoded dnaA necessary?

      Yes. The related explanation is included in the Discussion section (the third paragraph) of the original manuscript. As described there, dnaA46 cells expressing the DnaA H136A single mutant exhibited severe defects in cell growth even in the presence of PriC (Sakiyama et al., 2018). The His136 residue is located within the weak, secondary DnaB interaction region in DnaA, and is crucial for DnaB loading onto oriC ssDNA. Given domain I in DnaA H136A can stably tether DnaB-DnaC complexes to DnaA complexes on oriC (Sakiyama et al., 2018), we infer that oriC-DnaA complexes including DnaA H136A stably bind DnaB via DnaA domain I as an abortive complex, which inhibits functional interaction between PriC and DnaB as well as DnaB loading to oriC DNA.

      As for the DnaA F46A mutant, our previous studies show that DnaA F46A has a limited residual activity in vivo (unlike in vitro), and allows slow growth of cells. As the stable DnaA-DnaB binding is partially impaired in vivo in DnaA F46A, this feature is consistent with the above ideas. Thus, both F46A and H136A mutations are required for more severe inhibition of DnaB loading. This is additionally described in the revised Discussion.

      Figure 3. Is the DnaA variant carrying F46A and H136A substitutions stably expressed in vivo?

      We have performed western blotting, demonstrating that the DnaA variant carrying F46A and H136A substitutions is stable in vivo. In the revised version, we have added new data to Figure 3-figure supplement 1 and relevant description to the main text as follows:

      Western blotting demonstrated that the expression levels were comparable between WT DnaA and DnaA F46A H136A double mutant (Figure 3-figure supplement 1).

      Figure 5A. Should the dashed line extending down from I2 reach the R4Tma construct?

      We have amended the indicated line appropriately.

      Figure 6C. It was surprising that the strain combining the subATL mutant with ΔpriC displayed a pronounced under-initiation profile by flow cytometry, and yet there was no growth defect observed (see Figure 6B). This seems to contrast with results using the R4Tma origin, where the ΔpriC mutant produced a relatively modest change to the flow cytometry profile, and yet growth was perturbed (Figure 5C-D). How might these observations be interpreted? Is the absolute frequency of DNA replication initiation critical?

      Please note that, in E. coli, initiation activity correlates closely with the numbers of oriC copies per cell mass (ori/mass), rather than the apparent DNA profiles measured by flow cytometer. When cells were grown in LB at 30˚C, the mean ori/mass values were as follows: 0.34 for R4Tma priC, 0.51 for R4Tma, 0.82 for DATL priC, 0.99 for DATL (Figures 5 & 6 in the original manuscript). These values closely correspond to the cell growth ability shown in Figure 5C in the original manuscript.

      In the revised manuscript, we have cited appropriate references for introduction of the ori/mass values as follows.

      To estimate the number of oriC copies per unit cell mass (ori/mass) as a proxy for initiation activity (Sakiyama et al., 2017, 2022),

      Line 295. Reference for Form I* assay should cite the original publication.

      Done. The following paper is additionally cited.

      Baker, T. A., Sekimizu, K., Funnell, B. E., and Kornberg, A. (1986). Extensive unwinding of the plasmid template during staged enzymatic initiation of DNA replication from the origin of the Escherichia coli chromosome. Cell 45, 53–64. doi: 10.1016/0092-8674(86)90537-4

      Reviewer #2 (Recommendations for the authors):

      The partial complementation of the dnaC2 strain by PriC seems quite straightforward since this particular mutation leads to initiation arrest at the open complex stage and this sets the stage for PriC to load the helicase. The situation is somewhat different for dnaA46. Why is this mutation partly complemented by PriC at 37C? DnaA46 binds neither ATP nor ADP, yet it functions in initiation at permissive temperature. At nonpermissive temperature, it binds oriC as well but does not lead to initiation. Does the present data imply that the true initiation defect of DnaA46 lies in helicase loading? The authors need to comment on this in the text.

      Given the thermolability of the DnaA46 protein, it presumably becomes partially denatured at the sub-permissive temperature of 37˚C. This partial denaturation should impair both origin unwinding and helicase loading, though not to the extent that cell viability is lost. The priC deletion should further exacerbate helicase loading defects by inhibiting the bypass mechanism, resulting in the lethality of dnaA46 cells at this temperature. This explanation is included in the revised Discussion section.

      Relating to the above. In Figure 3 it is shown that the pFH plasmid partly complements dnaA46 in a PriC-dependent manner. Again, it would be nice to know the nature of the DnaA46 protein defect. It would be interesting to see how a pING1-dnaA46 plasmid performs in the experiment presented in Figure 3.

      A previous paper showed that multicopy supply of DnaA46 can suppress temperature sensitivity of the dnaA46 cells (Rao and Kuzminov, G3, 2022). This is reasonable in that DnaA46 has a rapid degradation rate unlike wild-type DnaA. As DnaA46 preserves the intact sequences in DnaB binding sites such as G21, F46 and H136, the suppression would not depend on PriC but would be due to the dosage effect.

      Figure 8 B: The authors should either remove the data or show a genome coverage: it is not clear that yapB is a good reference. A genome coverage would be nice, and show whether initiation can occur at oriC even if it is not the major place of initiation in a rnhA mutant.

      As suggested, we carried out the chromosome loci copy-number analysis by whole-genome sequencing to assess the impact of PriC on cSDR. The new data are shown in Figure 8-supplement 1 with relevant descriptions in the main text of the revised version as shown below. Briefly, results of the chromosome loci copy-number analysis are consistent with those of real-time qPCR (Figure 8B). Given that the role of PriC in stimulating cSDR was unclear, we believe that our finding that PriC has little or no role in cSDR, despite being a negative result, is valuable for the general readership of eLife.

      Line 38-39: .....resulting in replisome assembly.

      Corrected.

      Line 48: Something is wrong with the Michel reference. Also in the reference list.

      Corrected.

      Line 156: replace retarded with reduced.

      Corrected.

      Line 171 and elsewhere: WT priC cells is somewhat misleading. Isn't this simply PriC+ cells?

      Yes. We have revised the wording to “priC<sup>+</sup>” for clarity.

      Line 349-350: "the oriC copy number ratio of the dnaA46 ΔpriC double mutant was lower than that of the dnaA46 single mutant....". This is only provided growth rate of the strains is the same.

      These strains exhibited similar growth rates. This is included in the Results section of the revised manuscript as follows: At the permissive temperature, despite having similar growth rates, the oriC copy number ratio of the dnaA46 ΔpriC double mutant strain was lower than that of the dnaA46 single mutant.

      Reviewer #3 (Recommendations for the authors):

      I would suggest improved or additional experiments, data, or analyses.

      The revised version includes improved or additional experiments, data, or analyses.

    1. Author response:

      The following is the authors’ response to the original reviews

      ANALYTICAL

      (1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981; Figure 1) and the rats in this study (Figure 3). The evidence for this claim, as presented here, is not as strong as it could be. This is because the measure used for identifying trials to criterion in Figure 1 appears to differ from any of the criteria used in Figure 3, and the exact measure used for identifying trials to criterion influences the interpretation of Figure 3***. To make the claim that the quantitative relationship is one and the same in the Gibbon-Balsam and present datasets, one would need to use the same measure of learning on both datasets and show that the resultant plots are statistically indistinguishable, rather than simply plotting the dots from both data sets and spotlighting their visual similarity. In terms of their visual characteristics, it is worth noting that the plots are in log-log axis and, as such, slight visual changes can mean a big difference in actual numbers. For instance, between Figure 3B and 3C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5 in the real numbers. Thus, in order to support the strong claim that the quantitative relationships obtained in the Gibbon-Balsam and present datasets are identical, a more rigorous approach is needed for the comparisons.

      ***The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically based metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not be performing as well on this analysis, and whether the more effective measure is either more reliable or tapping into some aspect of the processes that underlie acquisition that is not accounted for by the nDKL metric.

      Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We have used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. In the revised Figure 3, specifically 3C and 3D, we have plotted trials to acquisition using decision criteria equivalent to those used by Gibbon and Balsam. The criterion they used (at least one peck at the response key on at least 3 out of 4 consecutive trials) cannot be directly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve comparison between the response rate during the CS and the baseline response rate, rather than just counting responses during the CS. We have used two approaches to adapt the Gibbon and Balsam criterion to our data. One approach, plotted in Figure 3C, uses a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, and adopts a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p<.3125). The second method (Figure 3D) estimates the nDkl for the criterion used by Gibbon and Balsam and then applies this criterion to the nDkl for our data. To estimate the nDkl of Gibbon and Balsam’s data, we have assumed there are no responses in the inter-trial interval and the response probability during the CS must be at least 0.75 (their criterion of at least 3 responses out of 4 trials). The nDkl for this difference is 2.2 (odds ratio 27:1). We have then applied this criterion to the nDkl obtained from our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates. These two analyses have been added to the manuscript to replace those previously shown in Figures 3B and 3C.
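
      As a rough illustration of where a threshold of .3125 can come from, the sketch below computes the chance probability of at least 3 "successes" in 4 trials. It assumes (our reading, not stated explicitly in the response) that each trial is an independent coin flip with probability 0.5 under the null hypothesis; the function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(n: int, k_min: int, p: float = 0.5) -> float:
    """Probability of at least k_min successes in n independent trials,
    each succeeding with probability p (assumed null: p = 0.5)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Chance probability of 3 or 4 successes out of 4 trials: (4 + 1) / 16
alpha = binomial_tail(4, 3)
print(alpha)
```

      Under that assumption the value is 5/16 = 0.3125, which matches the statistical criterion quoted above.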

      (2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, it is not clear why the data used to test the ITI proportionality came from the last 5 conditioning sessions. What were the decision criteria used to decide on averaging the final 5 sessions as terminal responses for the analyses in Figure 5? Was this based on consistency with previous work, or based on the greatest number of sessions where stable data for all animals could be extracted?

      If the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before the cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and thus, produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during ITI is zero. Why wouldn't it be better to test the plotted relationship for ITI before cue learning has occurred? Further, based on Figure 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ from pre and post-cue learning? Finally, if the authors' conceptual framework predicts that ITI response rate after cue learning should be proportional to contextual reward rate, why should the cue response rate be proportional to the cue reward rate instead of the cue reward rate plus the contextual reward rate?

      A single regression line, as shown in Figure 5, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
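
      A minimal sketch of such a one-parameter fit is given below, assuming ordinary least squares in the log-log domain with the slope fixed at 1, so that the only free parameter is the scale factor k in response rate = k × reinforcement rate. The function name and the example numbers are hypothetical and are not the data plotted in Figure 5.

```python
import math


def fit_scalar_model(reinforcement_rates, response_rates):
    """Fit log(y) = log(x) + log(k), i.e. slope fixed at 1 in log-log space.

    Returns the scale factor k and the proportion of variance in log(y)
    explained by the model.
    """
    x = [math.log(r) for r in reinforcement_rates]
    y = [math.log(r) for r in response_rates]
    n = len(x)
    # With the slope fixed at 1, the least-squares intercept is the mean of log(y) - log(x).
    log_k = sum(yi - xi for xi, yi in zip(x, y)) / n
    y_mean = sum(y) / n
    ss_res = sum((yi - (xi + log_k)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return math.exp(log_k), 1.0 - ss_res / ss_tot


# Hypothetical, perfectly scalar data: every response rate is 3x the reinforcement rate.
rates = [0.01, 0.02, 0.05, 0.1, 0.2]
responses = [3.0 * r for r in rates]
k, r_squared = fit_scalar_model(rates, responses)
```

      Because the slope is not estimated, the explained variance from this function can be lower than that of an unconstrained regression on the same data; the comparison between the two is what the surrounding discussion turns on.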

      If separate regression lines are fitted to the CS and ITI data, there is a small increase in explained variance (R<sup>2</sup> = 0.82). These regression lines have been added to the plot in the revised manuscript (Figure 5). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

      · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

      · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model when given N data specified with precision of B bits per datum by computing (or approximating) the sum of the maximum likelihoods of the model’s fits to all possible sets of N data with B precision per datum. The greater the sum over the maximum likelihoods, the more complex the model, that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as the Stirling approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between the logarithmic model and the power-function model. Log regressions must be concave, whereas power-function regressions can be concave, linear, or convex, yet they have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit). The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

      · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines. The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits (high maximum likelihood) to data sets in which the red points have slope > 1, slope 0, or slope < 0 and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distribution of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

      · In any event, because the pivot (change in slope between black and red data sets), if any, is small and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y. Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
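
      The "sum of maximum likelihoods over all possible data sets" described in the MDL bullet above can be made concrete with a toy case. The sketch below is an illustration only: it uses a one-parameter Bernoulli (coin-flip) model rather than the regression models under discussion, and the function name is hypothetical. A model with no free parameters would always give a sum of exactly 1, so any excess over 1 reflects the wiggle room bought by the free parameter.

```python
from math import comb


def bernoulli_nml_complexity(n: int) -> float:
    """Sum, over all binary sequences of length n, of the maximized likelihood
    of a Bernoulli model (the parameter is refit to each sequence).

    Sequences with the same number of successes k share the same maximized
    likelihood, so they are grouped using the binomial coefficient.
    """
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n  # maximum-likelihood estimate for a sequence with k successes
        # Python evaluates 0.0 ** 0 as 1.0, which handles k = 0 and k = n correctly.
        total += comb(n, k) * (p_hat**k) * ((1 - p_hat) ** (n - k))
    return total


print(bernoulli_nml_complexity(1))
print(bernoulli_nml_complexity(2))
print(bernoulli_nml_complexity(100))
```

      In this toy case the sum grows with the number of observations, and its logarithm is the complexity penalty that an MDL comparison would charge the model.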

      The data in Figures 4 and 5 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results. We now note this in the revised manuscript. The data for terminal responding by all rats, averaged over both the last 5 sessions and last 10 sessions, can be downloaded from https://osf.io/vmwzr/

      Finally, as noted by the reviewers, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

      (3) There is a disconnect between the gradual nature of learning shown in Figures 7 and 8 and the information-theoretic model proposed by the authors. To the extent that we understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the proportional response rule to the rate of rewards, why is it changing as animals go from 10% to 90% of peak response? The manuscript would be greatly strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, this should be explicitly stated in the manuscript.

      One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R. Gallistel, Balsam, & Fairhurst, 2004). The new, more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8 (now 6 and 7), extending that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response rate to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022), it may not distinguish between a gradual change and multiple step-like changes in responding, and it cannot account for decreases in response rate.

      The other analytic method we used relies on the information theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common, and there are instances where the initial step is to the terminal rate or higher). Second, there is marked within-subject variability in the response rate, characterized by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rate as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups, is to a moderately higher rate of responding later in training than after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference in the two response rates as conditioning progresses. It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy tailed and have peculiar properties (as witness, for example, the distribution of the ratio of two gamma distributions with arbitrary shape and scale parameters), but we are unable at this time to propound an explanation of the third trend.

      (4) Page 27, Procedure, final sentence: The magazine responding during the ITI is defined as the 20 s period immediately before CS onset. The range of ITI values (Table 1) always starts as low as 15 s in all 14 groups. Even in the case of an ITI on a trial that was exactly 20 s, this would also mean that the start of this period overlaps with the termination of the CS from the previous trial and delivery (and presumably consumption) of a pellet. It should be indicated whether the definition of the ITI period was modified on trials where the preceding ITI was < 20 s, and if any other criteria were used to define the ITI. Were the rats exposed to the reinforcers/pellets in their home cage prior to acquisition?

      There was an error in the description provided in the original text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period. The statement about the pre-CS measure has been corrected in the revised manuscript.

      (5) For all the analyses, the exact models that were fit and the software used should be provided. For example, it is not necessarily clear to the reader (particularly in the absence of degrees of freedom) that the model discussed in Figure 3 fits on the individual subject data points or the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regards to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analyses do not correct for the potentially highly correlated multiple measurements from the same subjects, i.e. each rat provides 4 data points which are very unlikely to be independent observations.

      Details about model fitting have been added to the revision. The question about fitting a single model or multiple models to the data in Figure 6 (now 5) is addressed in response 2 above. In Figure 5, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

      CONCEPTUAL

      (1) We take the point that where traditional theories (e.g., Rescorla-Wagner) and rate estimation theory (RET) both explain some phenomenon, the explanation in terms of RET may be preferred as it will be grounded in aspects of an animal's experience rather than a hypothetical construct. However, like traditional theories, RET does not explain a range of phenomena - notably, those that require some sort of expectancy/representation as part of their explanation. This being said, traditional theories have been incorporated within models that have the representational power to explain a broader array of phenomena, which makes me wonder: Can rate estimation be incorporated in models that have representational power; and, if so, what might this look like? Alternatively, do the authors intend to claim that expectancy and/or representation - which follow from probabilistic theories in the RW mould - are unnecessary for explanations of animal behaviour?***

      It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements (failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter) can have causal efficacy only when the successes may be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L. J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).
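
      To make the dependence on discrete trials explicit, the sketch below implements the textbook single-cue delta rule, ΔV = αβ(λ − V), with the two rate parameters collapsed into one `learning_rate` for brevity (an illustrative simplification; the function name and parameter values are hypothetical). Each element of the input is one trial, and a non-reinforced trial is an update with the target set to 0.

```python
def rescorla_wagner(trials, learning_rate=0.1, lam=1.0):
    """Track associative strength V across a sequence of discrete trials.

    trials: iterable of booleans, True if the trial was reinforced.
    A non-reinforced trial drives V toward 0; a reinforced one toward lam.
    """
    v = 0.0
    history = []
    for reinforced in trials:
        target = lam if reinforced else 0.0
        v += learning_rate * (target - v)  # delta-rule update, applied once per trial
        history.append(v)
    return history


# Two reinforced trials followed by one non-reinforced trial.
strengths = rescorla_wagner([True, True, False])
```

      The update can only run if the input has already been carved into trials. With reinforcements scheduled by a Poisson process there is no principled way to build the list of `False` entries, which is the difficulty described above.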

      The information-theoretic approach to associative learning, which sometimes historically travels as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many, including the new Nobelist Geoffrey Hinton, would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

      When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

      (2) The discussion of Rescorla's (1967) and Kamin's (1968) findings needs some elaboration. These findings are already taken to mean that the target CS in each design is not informative about the occurrence of the US - hence, learning about this CS fails. In the case of blocking, we also know that changes in the rate of reinforcement across the shift from stage 1 to stage 2 of the protocol can produce unblocking. Perhaps more interesting from a rate estimation perspective, unblocking can also be achieved in a protocol that maintains the rate of reinforcement while varying the sensory properties of the US (Wagner). How does rate estimation theory account for these findings and/or the demonstrations of trans-reinforcer blocking (Pearce-Ganesan)? Are there other ways that the rate estimation account can be distinguished from traditional explanations of blocking and contingency effects? If so, these would be worth citing in the discussion. More generally, if one is going to highlight seminal findings (such as those by Rescorla and Kamin) that can be explained by rate estimation, it would be appropriate to acknowledge findings that challenge the theory - even if only to note that the theory, in its present form, is not all-encompassing. For example, it appears to me that the theory should not predict one-trial overshadowing or the overtraining reversal effect - both of which are amenable to discussion in terms of rates.

      I assume that the signature characteristics of latent inhibition and extinction would also pose a challenge to rate estimation theory, just as they pose a challenge to Rescorla-Wagner and other probability-based theories. Is this correct?

      The seemingly contradictory evidence of unblocking and trans-reinforcer blocking by Wagner and by Pearce and Ganesan cited above will be hard for any theory to accommodate. It will likely depend on what features of the US are represented in the conditioned response.

      RET predicts one-trial overshadowing, as anyone may verify in a scientific programming language, because it has no free parameters and hence no wiggle room. Overtraining reversal effects appear to depend on aspects of the subjects’ experience other than the rate of reinforcement. It seems unlikely that RET can proffer an explanation.

      Various information-theoretic calculations give pretty good quantitative fits to the relatively few parametric studies of extinction and the partial-reinforcement extinction effect (see Gallistel (2012, Figs 3 & 4); Wilkes & Gallistel (2016, Fig 6); and Gallistel (2025, under review, Fig 6)). The approach has not been applied to latent inhibition, in part for want of parametric data. However, clearly one should not attribute a negative rate to a context in which the subject had never been reinforced. An explanation, if it exists, would have to turn on the effect of that long period on initial rate estimates AND on evidence of a change in rate, as of the first reinforcement.

      Recommendations for authors:

      MINOR POINTS

      (1) It is not clear why Figure 3C is presented but not analyzed, and why the data presented in Figure 4 to clarify the spread of the distribution of the data observed across the plots in Figure 3 uses the data from Figure 3C. This would seem like the least representative data to illustrate the point of Figure 4. It also appears that the data plotted in Figure 4 corresponds to Figure 3A and 3B rather than the odds 10:1 data indicated in the text.

      Figure 3 has changed as already described. The data previously plotted in Figure 4 are now shown in Figure 3B and correspond to those plotted in Figure 3A.

      (2) Log(T) was not correlated with trials to criterion. If trials to criterion is inversely proportional to log(C/T) and C is uncorrelated with T, shouldn't trials to criterion be correlated with log(T)? Is this merely a matter of low statistical power?

      Yes. There is a small, but statistically non-significant, correlation between log(T) and trials to criterion, r = 0.35, p = .22. That correlation drops to .08 (p = .8) after factoring out log(C/T), which demonstrates that the weak correlation between log(T) and trials to criterion is based on the correlation between log(T) and log(C/T).
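
      The "factoring out" step described above can be illustrated as a partial correlation. What follows is only a minimal sketch, not the analysis code deposited on OSF: the data are synthetic, the variable names are hypothetical, and the residual-based method is an assumption about how such a correction might be computed. The sample size of 14 mirrors the fourteen groups mentioned elsewhere in the review.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z from each."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])  # intercept + covariate
    # Residuals of x and of y after least-squares regression on z
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Synthetic, hypothetical data (NOT the study's data): 14 groups in which
# log(C) and log(T) are drawn independently, and the outcome depends only
# on log(C/T) plus noise.
rng = np.random.default_rng(0)
n = 14
log_T = rng.normal(size=n)
log_C = rng.normal(size=n)
log_C_over_T = log_C - log_T
trials = -log_C_over_T + rng.normal(scale=0.5, size=n)

raw_r = float(np.corrcoef(log_T, trials)[0, 1])
partial_r = partial_corr(log_T, trials, log_C_over_T)
print(f"raw r(log T, trials)            = {raw_r:.2f}")
print(f"partial r, controlling log(C/T) = {partial_r:.2f}")
```

      In this toy setup any raw correlation between log(T) and the outcome arises only because log(T) is a component of log(C/T), so the partial correlation isolates what log(T) contributes beyond that shared variance.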

      (3) The rationale for the removal of the high information condition samples in the Fig 8 "Slope" plot appears to be weak. Can the authors justify this choice better? If all data are included, the relationship is clearly different from that shown in the plot.

      We have now reported correlations that include those 3 groups but noted that the correlations are largely driven by the much lower slope values of those 3 groups, which are likely an artefact of their smaller number of trials. We use this to justify a second set of correlations that excludes those 3 groups.

      (4) The discussion states that there is at most one free parameter constrained by the data - the constant of proportionality for response rate. However, there is also another free parameter constrained by data - the informativeness at which expected trials to acquisition is 1.

      I think this comment is referring to two different sets of data. The constant of proportionality of the response rate refers to the scalar relationship between reinforcement rate and terminal response rate shown in Figure 5. The other parameter, the informativeness when trials to acquisition equals 1, describes the intercept of the regression line in Figure 1 (and 3).

      (5) The authors state that the measurement of available information is not often clear. Given this, how is contingency measurable based on the authors' framework?

      (6) Based on the variables provided in Supplementary File 3, containing the acquisition data, we were unable to reproduce the values reported in the analysis of Figure 3.

      Figure 3 has changed, using new criteria for trials to acquisition that attempt to match the criterion used by Gibbon and Balsam. The data on which these figures are based has been uploaded into OSF.

      GRAPHICAL AND TYPOGRAPHICAL

      (1) Y-axis labels in Figure 1 are not appropriately placed. 0 is sitting next to 0.1. 0 should sit at the bottom of the y-axis.

      If this comment refers to the 0 sitting above an arrow in the top right corner of the plot, this is not misaligned. The arrow pointing to zero is used to indicate that this axis approaches zero in the upward direction. 0 should not be aligned to a value on the axis since a learning rate of zero would indicate an infinite number of learning trials. The caption has been edited to explain this more clearly.

      (2) Typo, Page 6, Final Paragraph, line 4. "Fourteen groups of rats were trained with for 42 session"

      Corrected. Thank you.

      (3) Figure 3 caption: Typo, should probably be "Number of trials to acquisition"?

      This change has now been made. The axis shows reinforcements to acquisition to be consistent with Gibbon and Balsam, but trials and number of reinforcements are identical in our 100% reinforcement schedule.

      (4) Typo Page 17 Line 1: "Important pieces evidence about".

      Corrected. Thank you.

      (5) Consider consistent usage of symbols/terms throughout the manuscript (e.g. Page 22, final paragraph: "iota = 2" is used instead of the corresponding symbol that has been used throughout).

      Changed.

      (6) Typo Page 28, Paragraph 1, Line 9: "We used a one-sample t-test using to identify when this".

      This section of text has been changed to reflect the new analysis used for the data in Figure 3.

      (7) Typo Page 29, Paragraph 1, Line 2: "problematic in cases where one of both rates are undefined" either typo or unclear phrasing.

      “of” has been corrected to “or”.

      (8) Typo Page 30: Equation 3 appears to have an error and is not consistent with the initial printing of Equation 3 in the manuscript.

      The typo in the initial expression of Eq 3 (page 23) has been corrected.

      (9) Typo Page 33, Line 5: "Figures 12".

      Corrected.

      (10) Typo Page 34, Line 10: "and the 5 the increasingly"? Should this be "the 5 points that"?

      Corrected.

      (11) Typo Page 35, Paragraph 2: "estimate of the onset of conditioned is the trial after which".

      Corrected.

      (12) Clarify: Page 35, final paragraph: it is stated that four-panel figures are included for each subject in the Supplementary files, but each subject has a six-panel figure in the Supplementary file.

      The text now clarifies that the 4-panel figures are included within the 6-panel figures in the Supplementary materials.

      (13) It is hard to identify the different groups in Figure 2 (Plot 15).

      The figure is simply intended to show that responding across seconds within the trial is relatively flat for each group. Individuation of specific groups is not particularly important.

      (14) It appears that the numbering on the y-axis is misaligned in Figure 2 relative to the corresponding points on the scale (unless I have misunderstood these values and the response rate measure to the ITI can drop below 0?).

      The numbers on the Y axes had become misaligned. That has now been corrected.

      (15) Please include the data from Figure 3A in the spreadsheet supplementary file 3. If it has already been included as one of the columns of data, please consider a clearer/consistent description of the relevant column variable in Supplementary File 1.

      The data from Figure 3 are now available from the linked OSF site, referenced in the manuscript.

      (16) Errors in supplementary data spreadsheets such that the C/T values are not consistent with those provided in Table 1 (C/T values of 4.5, 54, 180, and 300 are slightly different values in these spreadsheets). A similar error/mismatch appears to have occurred in the C/T labels for Figures (e.g. Figure 10) and the individual supplementary figures.

      The C/T values on the figures in the supplementary materials have been corrected and are now consistent with those in Table 1.

      (17) Currently the analysis and code provided at https://osf.io/vmwzr/ are not accessible without requesting access from the author. Please consider making these openly available without requiring a request for authorization. As such, a number of recommendations made here may already have been addressed by the data and code deposited on OSF. Apologies for any redundant recommendations.

      Data and code are now available at the OSF site, which has been made public and no longer requires an access request.

      (18) Please consider a clearer and more specific reference to supplementary materials. Currently, the reader is required to search through 4 separate supplementary files to identify what is being discussed/referenced in the text (e.g. Page 18, final line: "see Supplementary Materials" could simply be "see Figure S1").

      We have added specific page numbers in references to the Supplementary Materials.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes a novel magnetic steering technique to target human adipose-derived mesenchymal stem cells (hAMSC) or induced pluripotent stem cell derivatives (iPSC-TM) to the TM. The authors show that delivery of the stem cells lowered IOP, increased outflow facility, and increased TM cellularity.

      Strengths:

      The technique is novel and shows promise as a novel therapeutic to lower IOP in glaucoma. hAMSC are able to lower IOP below the baseline as well as increase outflow facility above baseline with no tumorigenicity. These data will have a positive impact on the field and will guide further research using hAMSC in glaucoma models.

      Weaknesses:

      The transgenic mouse model of glaucoma the authors used did not show ocular hypertensive phenotypes at 6-7 months of age as previously reported. Therefore, if there is no pathology in these animals the authors did not show a restoration of function, but rather a decrease in pressure below normal IOP.

      We appreciate the reviewer’s feedback and agree with the statement of weakness. Accordingly, we have revised the language to improve clarity. Specifically, all references to "restoration of IOP" or "restoration of conventional outflow function" have been replaced with more precise phrases, in the following locations: 

      • lines 2-3 (title): Magnetically steered cell therapy for reduction of intraocular pressure as a treatment strategy for open-angle glaucoma

      • lines 36-8 (abstract): We observed a 4.5 [3.1, 6.0] mmHg or 27% reduction in intraocular pressure (IOP) for nine months after a single dose of only 1500 magnetically-steered hAMSCs, explained by increased conventional outflow facility and associated with higher TM cellularity.

      • lines 45-6 (one-sentence summary): A novel magnetic cell therapy provided effective intraocular pressure reduction in mice, motivating future translational studies.

      • lines 123-4 (introduction): Despite the absence of ocular hypertension in our MYOC<sup>Y437H</sup> mice, our data demonstrate sustained IOP lowering and a significant benefit of magnetic cell steering in the eye, particularly for hAMSCs, strongly indicating further translational potential.

      • line 207 (results): The observed reductions in IOP and increases in outflow facility after delivery of both cell types suggested functional changes in the conventional outflow pathway.

      • lines 509-10 (discussion): In summary, this work shows the effectiveness of our novel magnetic TM cell therapy approach for long-term IOP reduction through functional changes in the conventional outflow pathway.

      It is very important to note that at the 23rd annual Trabecular Meshwork Study Club meeting (San Diego, December 2024), Dr. Zode, the lead author of reference 26 originally describing the transgenic myocilin mouse model, announced during his talk that this model no longer demonstrates the glaucomatous phenotype in his hands, which incidentally has motivated him to create a new, CRISPR MYOC mouse model. Dr. Zode also stated that he was uncertain of the reason for this loss of phenotype. His observation is consistent with our report. However, other investigators continue to observe the desired phenotype in their colonies of this mouse (Dr. Wei Zhu, personal communication). Continued use of this mouse model should therefore be approached with caution. 

      Reviewer #2 (Public review):

      Summary:

      This observational study investigates the efficacy of intracameral injected human stem cells as a means to re-functionalize the trabecular meshwork for the restoration of intraocular pressure homeostasis. Using a murine model of glaucoma, human adipose-derived mesenchymal stem cells are shown to be biologically safer and functionally superior at eliciting a sustained reduction in intraocular pressure (IOP). The authors conclude that the use of human adipose-derived mesenchymal stem cells has the potential for long-term treatment of ocular hypertension in glaucoma.

      Strengths:

      A noted strength is the use of a magnetic steering technique to direct injected stem cells to the iridocorneal angle. An additional strength is the comparison of efficacy between two distinct sources of stem cells: human adipose-derived mesenchymal vs. induced pluripotent cell derivatives. Utilizing both in vivo and ex vivo methodology coupled with histological evidence of introduced stem cell localization provides a consistent and compelling argument for a sustainable impact exogenous stem cells may have on the refunctionalization of a pathologically compromised TM.

      Weaknesses:

      A noted weakness of the study, as pointed out by the authors, includes the unanticipated failure of the genetic model to develop glaucoma-related pathology (elevated IOP, TM cell changes). While this is most unfortunate, it does temper the conclusion that exogenous human adipose derived mesenchymal stem cells may restore TM cell function. Given that TM cell function was not altered in their genetic model, it is difficult to say with any certainty that the introduced stem cells would be capable of restoring pathologically altered TM function. A restoration effect remains to be seen. 

      We acknowledge that the phrase “restoration of TM function” is not fully supported by our results, given the absence of ocular hypertension in our animal model. Accordingly, we have revised the language to more precisely describe our findings. For specific details regarding these changes, please refer to our response to Reviewer 1’s public comments above.

      Another noted complication to these findings is the observation that sham intracameral-injected saline control animals all showed elevated IOP and reduced outflow facility, compared to WT or Tg untreated animals, which allowed for more robust statistically significant outcomes. Additional comments/concerns that the authors may wish to address are elaborated in the Private Review section.

      We agree that sham-injected animals tended to have higher average IOPs than transgenic animals in our study. However, these differences did not reach statistical significance and therefore remain inconclusive. Further, an increase in IOP following placebo injection has been previously reported (Zhu et al., 2016). 

      Prompted by the Referee’s comments and also a private comment from Referee 1, we further investigated this effect by analyzing IOP in uninjected contralateral eyes at the mid-term time point and comparing the IOPs in these eyes to other cohorts, as now presented as additional data in Supplementary Tables 1 and 2 and Supplementary Figure 4 (see below). In brief, the uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Additionally, we cannot rule out potential contralateral effects induced by the injections.

      Regarding the best way to assess the effect of cell treatment, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control (vehicle)-injected eyes, since this provides the most direct accounting for the effects of injection itself on IOP. Other comparisons, such as WT or untreated Tg eyes vs. cell-treated eyes, are interesting but harder to interpret. However, in response to the referee’s comment, we have added comparisons between cell-treated groups and untreated Tg eyes to Table 2, adjusting the post-hoc corrections accordingly. All hAMSC treated groups show statistically significant decrease in IOP even compared to Tg untreated eyes, while iPSC-TMs fail to reach such significance.

      The following changes were made to the manuscript:

      Lines 326 et seq.: Eyes subjected to saline injection exhibited marginally higher IOPs and lower outflow facilities on average, in comparison to the transgenic animals at baseline. However, due to the lack of statistical significance in these differences and the inherent age difference between the saline-injected animals and the non-injected controls at baseline, no conclusive inference can be drawn regarding the effect of saline injection. To investigate this phenomenon further, we also analyzed IOPs in uninjected contralateral eyes at the midterm time point (Supplementary Tables 1 and 2, Supplementary Figure 4). The uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham-injected group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Of note, contralateral hypertension has been previously reported after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles (34), and we similarly cannot definitively rule out potential contralateral effects induced by our stem cell injections. Thus, we cannot draw any definite conclusions from these additional IOP comparisons at this time.

      Reviewer #3 (Public review):

      Summary:

      The purpose of the current manuscript was to investigate a magnetic cell steering technique for efficiency and tissue-specific targeting, using two types of stem cells, in a mouse model of glaucoma. As the authors point out, trabecular meshwork (TM) cell therapy is an active area of research for treating elevated intraocular pressure as observed in glaucoma. Thus, further studies determining the ideal cell choice for TM cell therapy are warranted. The experimental protocol of the manuscript involved the injection of either human adipose derived mesenchymal stem cells (hAMSCs) or induced pluripotent cell derivatives (iPSC-TM cells) into a previously reported mouse glaucoma model, the transgenic MYOCY437H mice and wild-type littermates followed by the magnetic cell steering. Numerous outcome measures were assessed and quantified including IOP, outflow facility, TM cellularity, retention of stem cells, and the inner wall BM of Schlemm's canal.

      Strengths:

      All of these analyses were carefully carried out and appropriate statistical methods were employed. The study has clearly shown that the hAMSCs are the cells of choice over the iPSC-TM cells, the latter of which caused tumors in the anterior chamber. The hAMSCs were shown to be retained in the anterior segment over time and this resulted in increased cellular density in the TM region and a reduction in IOP and outflow facility. These are all interesting findings and there is substantial data to support it.

      Weaknesses:

      However, where the study falls short is in the MYOCY437H mouse model of glaucoma that was employed. The authors clearly state that a major limitation of the study is that this model, in their hands, did not exhibit glaucomatous features as previously reported, such as a significant increase in IOP, which was part of the overall purpose of the study. The authors state that it is possible that "the transgene was silenced in the original breeders". The authors did not show PCR, western blot, or immuno of angle tissue of the tg to determine transgenic expression (increased expression of MYOC was shown in the angle tissue of the transgenics in the original paper by Zode et al, 2011). This should be investigated given that these mice were rederived. Thus, it is clearly possible that these are not transgenic mice.

      All MYOC mice that were used in this study were genotyped and confirmed to carry the transgene as noted in the original version of the paper (see lines 590-2). However, the transgene seems not to have been active, based on the lack of ocular hypertension as well as the lack of differences in supporting endpoints such as outflow facility and TM cellularity. While it would have been possible to carry out the recommended assays to investigate the root cause of this loss of phenotype, this was not an objective of our study. We therefore focus here simply on communicating the observed loss of phenotype to readers. We also refer the referee to the final paragraph of our response to Referee 1.

      If indeed they are transgenics, the authors may want to consider the fact that in the Zode paper, the most significant IOP elevation in the mutant mice was observed at night and thus this could be examined by the authors. 

      This is a good point. However, while the dark-phase IOP does exhibit a distinctly larger elevation (as previously observed in hypertonic saline sclerosis), Zode et al. also reported a notable 3 mmHg IOP increase during the light phase. The complete absence of such daytime (light phase) IOP elevation in our animals diminished our enthusiasm for pursuing dark-phase IOP measurements.

      Other glaucomatous features of these mice could also have been investigated such as loss of RGCs, to further determine their transgenic phenotype. 

      We agree that these other phenotypes could be studied, but in the absence of any detectable IOP elevation (and thus lack of mechanical insult on RGC axons), loss of RGCs is extremely unlikely. We also note that the loss of retinal ganglion cells (RGCs) in the Myocilin model remains a subject of controversy. For example, despite a significant increase in IOP (>10 mmHg) in this model across four mouse strains, three, including C57BL6/J, did not exhibit any signs of optic nerve damage (McDowell et al., 2012). In contrast, Zhu et al. observed considerable nerve damage in this model, which was reversed following iPSC-TM cell transplantation (Zhu et al., 2016). Given these conflicting findings, we directed our efforts toward outcome measures directly related to aqueous humor dynamics.

      Finally, while increased cellular density in the TM region was observed, proliferative markers could be employed to determine if the transplanted cells are proliferating.

      We agree that identifying the source of the increased trabecular meshwork (TM) cellularity we observed is interesting and we plan to pursue that in future studies. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The sham-injected transgenic animals showed elevated IOP 3-4 weeks after the baseline measurements in the transgenic mice. The authors justify this may be due to the increase in age in these animals. However, this seems unlikely due to the short duration of time between measurement of the baseline IOP and the Short time point (3-4 weeks). The authors do not provide IOP data for any WT sham injected eyes or naïve Tg eyes at these time points. These data are essential to determine if the elevation is due to the sham injection, age, or the transgene. Could it be that the IOP in this cohort of Tg mice didn't increase until 7-8 months of age instead of 6-7 months of age? The methods state only unilateral injections of the stem cells were done so it is assumed the contralateral eye was uninjected. What was the IOP in these eyes? These data would clarify the confusion in the data from sham-injected animals compared to baseline (naive) measurements.

      We agree that the average IOP in saline-injected groups is higher than in WT or non-treated Tg mice, although the difference is inconclusive due to a lack of statistical significance. It is important to note, however, that this difference is subtle and not comparable to the 3 mmHg light-phase IOP elevation previously observed in this model (Zode et al., 2011). 

      We appreciate the reviewer’s suggestion to include IOP data from the contralateral uninjected eyes, and we have now provided this information along with the comparative statistics in the supplementary materials. Additional details can be found in our response to a similar comment from Reviewer 2’s public review. In summary, the IOP difference in contralateral non-injected ten-month-old transgenic eyes was even smaller than in the original Tg group. IOP elevation following saline injection in mice has been reported previously (Zhu et al., 2016). As a potential confounding factor, we highlight possible contralateral effects of the injection itself (which is why we initially did not analyze IOP in the contralateral eyes).

      The hAMSC-treated eyes appear to lower IOP even from baseline (although stats were only provided compared to the sham-injected eyes, which as stated above appear to have increased).

      However, the iPSC-TM-treated eyes had IOPs equal to that of the baseline measurements taken 3 weeks prior. The significance is coming from the "sham-treated" eyes which had elevated IOPs. The controls listed above should be included to make these conclusions.

      The reviewer makes an astute observation. Please refer to our response to a similar observation by Reviewer 2 under public reviews, where we provide and discuss the comparative statistics noted by the reviewer. However, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control-injected eyes. 

      If the transgenic mouse model truly did not have a phenotype, then the authors are testing the ability of the stem cells to lower IOP from baseline normal pressures. Therefore, the authors are not "restoring function of the conventional outflow pathway" as there is no damage to begin with. The language in the manuscript should be corrected to reflect this if the transgenics have no phenotype.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to your public review.

      The authors noted in the iPSC-TM-treated eyes there was a high rate of tumorigenicity. If the magnetic steering of these cells is specific and targeted to the TM, why do the tumors form near the central iris?

      While magnetic steering is more specific to the trabecular meshwork (TM) than previously used approaches (Bahrani Fard et al., 2023), it is not perfect, and a modest amount of off-target delivery to the iris, including its central portion, still occurs. Apparently, it took only a few misdirected iPSC-TM cells to lead to tumors in this work, which is a serious concern for future translational approaches.

      Reviewer #2 (Recommendations for the authors):

      (1) It appears that mice were injected unilaterally (Line 590). I may have missed this, but was the companion un-injected eye analyzed in this study? If not analyzed, was there a confounding concern or limitation that necessitated omitting this possible control option?

      Contralateral effects, such as hypertension in the untreated eye after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles, have previously been reported in the literature (Li et al., 2019) and also reported anecdotally by other leaders in the field to the senior authors, which is why we did not initially analyze contralateral eyes in this study. However, prompted by this comment and others, we have now included the IOP measurements for contralateral uninjected ten-month-old transgenic eyes in the supplementary materials. For further details, please refer to our response to your public review.

      (2) Were all these mice the same gender? Would gender be expected to alter the findings of this study?

      Animals of both sexes were randomly chosen and included in the study. We added the following statement to the Materials and Methods section (line 530): After breeding and genotyping, mice, regardless of sex, were maintained to age 6-7 months, when transgenic animals were expected to have developed a POAG phenotype.

      (3) As noted in the public review, the use of PBS for a control seems to have resulted in a slight elevation in IOP (Figure 2) as well as a reduction in outflow facility (Figure 3B) when compared to WT or Tg mice. Was this difference statistically significant? 

      The differences between the sham (saline)-injected groups at any time point and untreated Tg mice did not reach statistical significance for IOP, facility, or TM cellularity and, for facility, did not even show clear trends. For example, WT mice had, on average, 0.2 mmHg higher IOP and 0.6 nl/min/mmHg greater facility than the Tg group. Meanwhile, on a similar scale, the long-term sham group exhibited 0.4 nl/min/mmHg higher facility compared to the Tg group. As the statistical tests indicate, these differences should be interpreted more as noise than as meaningful signal.

      If so, then it should be noted as to whether the observed decrease in IOP following stem cell injection remained statistically significant when compared to these un-injected control animals. If significance was lost, then this should be appropriately noted and discussed. It is not apparently obvious why sham controls should have elevated IOP. This is a design and statistical concern.

      Please refer to our response to a similar observation by Reviewer 1. We believe that comparing the treatment (cell suspension in saline) with its age-matched vehicle (saline) is the appropriate approach which maintains rigor by most directly accounting for the effects of injection. 

      (4) The tonicity of the PBS used as a vehicle control was not stated and I did not see within the methods whether the stem cells were suspended using this same PBS vehicle. I assume isotonic phosphate buffered saline was used and that the stem cells were resuspended using the same sterile PBS. 

      Thanks for catching this. We added “sterile PBS (1X, Thermo Fisher Scientific, Waltham, MA)” to the Methods section of the manuscript (line 567). 

      With regards to using PBS as an injection control, I wonder if a better comparable control might have been to use mesenchymal stem cells that were rendered incapable of proliferating prior to intracameral injection. This, of course, addresses the unexplained mechanism(s) by which mesenchymal stem cells elicit a decrease in IOP.

      This is an interesting idea, and represents another level of control. However, we explicitly chose not to use non-proliferating hAMSCs as a control, for several reasons. Firstly, a saline injection is the simplest control and in this initial study with multiple groups, we did not feel another experimental group should be added. Second, this control would not rule out paracrine effects from injected cells, which our data suggested are an important effect. Third, rendering injected cells truly non-proliferative could introduce unwanted/unknown phenotypes in these cells that would need to be carefully characterized. That being said, if an efficient method could be developed to render an entire population of these cells irreversibly non-proliferating, the reviewer’s suggestion would be worth pursuing to better understand the mechanism of TM cell therapies. 

      (5) As noted in Figure 4C, TM cellular density as quantified was not altered in the sham control, so a loss of cellular density can not explain the elevated IOP with this group. Injecting viable (not determined?) mesenchymal stem cells did show, over the short term, a noted increase in TM cellular density. 

      Thank you for noting this. We agree that changes in cell density do not explain the mild IOP elevation in the sham group. As the referee certainly is aware, there are multiple reasons that IOP can be elevated (changes in trabecular meshwork extracellular matrix, changes in trabecular meshwork stiffness) that are not necessarily related to cell density. Since we do not know definitively the cause of this mild elevation, we would prefer to not speculate about it in the manuscript.

      Thanks for pointing out our omission of a statement about injected cell viability. We have now included the following statement in the Materials and Methods section (lines 564-566): “For all the experiments where animals received hAMSC, cell count and >90% viability was verified using a Countess II Automated Cell Counter (Thermo Fisher Scientific, Waltham, MA).”

      I'm confused, as clearly stated (Lines 431-432), mesenchymal stem cells accumulated close to, but not within, the TM. How is it that TM cellular density increased if these stem cells did not enter the TM? The authors may wish to clarify this distinction. Given that mesenchymal stem cells did not increase the risk of tumorigenicity, do the authors have any evidence that these cells actually proliferated post-injection, or did they undergo senescence, thereby displaying a senescence-associated secretory phenotype as a source of paracrine support?

      As the reviewer correctly noted, our observations show that hAMSCs primarily accumulated close to, but outside, the TM (likely caught up in the pectinate ligaments). Based on observations of increased TM cellularity, we think that the most likely explanation of these findings is paracrine signaling, as the reviewer suggests and which was discussed at length in the original version of the manuscript (lines 453-477). 

      We agree that, despite observing little signal from hAMSCs within the TM, labeling with proliferation markers (e.g., Ki-67) and searching for co-localization with exogenous cells, and/or labeling for senescence markers would have provided more mechanistic information. This is an excellent topic for future study, which we plan to pursue, but was outside the scope of this study. 

      (6) As noted in the public review, I think it is a bit of a stretch to even suggest that the findings of this study support stem cell restoration of TM function given that the model apparently did not produce TM cell dysfunction as anticipated. A restoration effect remains to be seen.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to Reviewer 1’s public comment.

      Reviewer #3 (Recommendations for the authors):

      (1) Show PCR, western blot, or immuno of angle tissue of the MYOC tg to confirm transgenic expression.

      (2) Examine the IOP of mice at night.

      (3) Investigate other glaucomatous features in the mice to determine if they have any of the transgenic phenotypes previously reported.

      (4) Examine proliferative markers in the TM region of angles injected with stem cells.

      Please see our responses to all four of these comments in the public section.

      Bibliography (for this response letter only)

      Bahrani Fard, M.R., Chan, J., Sanchez Rodriguez, G., Yonk, M., Kuturu, S.R., Read, A.T., Emelianov, S.Y., Kuehn, M.H., Ethier, C.R., 2023. Improved magnetic delivery of cells to the trabecular meshwork in mice. Exp. Eye Res. 234, 109602. https://doi.org/10.1016/j.exer.2023.109602

      Li, G., Lee, C., Agrahari, V., Wang, K., Navarro, I., Sherwood, J.M., Crews, K., Farsiu, S., Gonzalez, P., Lin, C.-W., Mitra, A.K., Ethier, C.R., Stamer, W.D., 2019. In vivo measurement of trabecular meshwork stiffness in a corticosteroid-induced ocular hypertensive mouse model. Proc. Natl. Acad. Sci. U. S. A. 116, 1714–1722. https://doi.org/10.1073/pnas.1814889116

      Zhu, W., Gramlich, O.W., Laboissonniere, L., Jain, A., Sheffield, V.C., Trimarchi, J.M., Tucker, B.A., Kuehn, M.H., 2016. Transplantation of iPSC-derived TM cells rescues glaucoma phenotypes in vivo. Proc. Natl. Acad. Sci. 113, E3492–E3500.

      Zode, G.S., Kuehn, M.H., Nishimura, D.Y., Searby, C.C., Mohan, K., Grozdanic, S.D., Bugge, K., Anderson, M.G., Clark, A.F., Stone, E.M., Sheffield, V.C., 2011. Reduction of ER stress via a chemical chaperone prevents disease phenotypes in a mouse model of primary open angle glaucoma. J. Clin. Invest. 121, 3542–3553. https://doi.org/10.1172/JCI58183

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      The authors attempted to replicate previous work showing that counterconditioning leads to more persistent reduction of threat responses, relative to extinction. They also aimed to examine the neural mechanisms underlying counterconditioning and extinction. They achieved both of these aims and were able to provide some additional information, such as how counterconditioning impacts memory consolidation. Having a better understanding of which neural networks are engaged during counterconditioning may provide novel pharmacological targets to aid in therapies for traumatic memories. It will be interesting to follow up by examining the impact of varying amounts of time between acquisition and counterconditioning phases, to enhance replicability to real-world therapeutic settings.

      Major strengths

      · This paper is very well written and attempts to comprehensively assess multiple aspects of counterconditioning and extinction processes. For instance, the addition of memory retrieval tests is not core to the primary hypotheses but provides additional mechanistic information on how episodic memory is impacted by counterconditioning. This methodical approach is commonly seen in animal literature, but less so in human studies.

      · The Group x Cs-type x Phase repeated measure statistical tests with 'differentials' as outcome variables are quite complex, however, the authors have generally done a good job of teasing out significant F test findings with post hoc tests and presenting the data well visually. It is reassuring that there is a convergence between self-report data on arousal and valence and the pupil dilation response. Skin conductance is a notoriously challenging modality, so it is not too concerning that this was placed in the supplementary materials. Neural responses also occurred in logical regions with regard to reward learning.

      · Strong methodology with regards to neuroimaging analysis, and physiological measures.

      · The authors are very clear on documenting where there were discrepancies from their pre-registration and providing valid rationales for why.

      We thank reviewer 1 for the positive feedback and for pointing out the strengths of our work. We agree that future research should investigate varying times between acquisition and counterconditioning to assess its success in real-life applications.

      Major Weaknesses

      (1) The statistics showing that counterconditioning prevents differential spontaneous recovery are the weakest p values of the paper (and using one-tailed tests, although this is valid due to directions being pre-hypothesized). This may be due to a relatively small number of participants and some variability in responses. It is difficult to see how many people were included in the final PDR and neuroimaging analyses, with exclusions not clearly documented. Based on Figure 3, there are relatively small numbers in the PDR analyses (n=14 and n=12 in counterconditioning and extinction, respectively). Of these, each group had 4 people with differential PDR results in the opposing direction to the group mean. This perhaps warrants mention as the reported effects may not hold in a subgroup of individuals, which could have clinical implications.

      General exclusion criteria are described on page 17. We have added more detailed information on the reasons for exclusion (see page 17). All exclusions were in line with pre-registered criteria. For the analysis the reviewer is referring to (the PDR analysis that investigated whether CC can prevent the spontaneous recovery of differential conditioned threat responses), 18 participants were excluded: 2 participants did not show evidence for successful threat acquisition, as was already indicated on page 17, and 16 participants were excluded due to (partially) missing data. We now explicitly mention the exclusion of the additional 16 participants on page 7 and have updated Figure 3 to improve visibility of the individual data points. Therefore, for this analysis both experimental groups consisted of 15 participants (total N=30).

      It is true that in both groups a few participants show the opposite pattern. Although this may also be due to measurement error, we agree that it is relevant to further investigate this in future studies with larger sample sizes. It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      Reviewer #2:

      Summary:

      The present study sets out to examine the impact of counterconditioning (CC) and extinction on conditioned threat responses in humans, particularly looking at neural mechanisms involved in threat memory suppression. By combining behavioral, physiological, and neuroimaging (fMRI) data, the authors aim to provide a clear picture of how CC might engage unique neural circuits and coding dynamics, potentially offering a more robust reduction in threat responses compared to traditional extinction.

      Strengths:

      One major strength of this work lies in its thoughtful and unique design - integrating subjective, physiological, and neuroimaging measures to capture the various aspects of counterconditioning (CC) in humans. Additionally, the study is centered on a well-motivated hypothesis and the findings have the potential to improve the current understanding of pathways associated with emotional and cognitive control. The data presentation is systematic, and the results on behavioral and physiological measures fit well with the hypothesized outcomes. The neuroimaging results also provide strong support for distinct neural mechanisms underlying CC versus extinction.

      We thank reviewer 2 for the feedback and for valuing the thoughtfulness that went into designing the study.

      Weaknesses:

      (1) Overall, this study is a well-conducted and thought-provoking investigation into counterconditioning, with strong potential to advance our understanding of threat modulation mechanisms. Two main weaknesses concern the scope and decisions regarding analysis choices. First, while the findings are solid, the topic of counterconditioning is relatively niche and may have limited appeal to a broader audience. Expanding the discussion to connect counterconditioning more explicitly to widely studied frameworks in emotional regulation or cognitive control would enhance the paper's accessibility and relevance to a wider range of readers. This broader framing could also underscore the generalizability and broader significance of the results. In addition, detailed steps in the statistical procedures and analysis parameters seem to be missing. This makes it challenging for readers to interpret the results in light of potential limitations given the data modality and/or analysis choices.

      In this updated version of the manuscript, we included the notion that extinction has been interpreted as a form of implicit emotion regulation. In addition to our discussion on active coping (avoidance), we believe that our discussion has an important link to the more general framework of emotion regulation, while remaining within the scope of relevance. Please see pages 14 and 15 for the changes. In addition to being informative to theories of emotion regulation, our findings are also highly relevant for forms of psychotherapy that build on principles of counterconditioning (e.g. the use of positive reinforcement in cognitive behavioral therapy), as we point out in the introduction. We believe this relevance shows that counterconditioning is more than a niche topic. In line with the recommendation from reviewer 2, we added more details and explanations to the statistical procedures and analyses where needed (see responses to recommendations).

      Reviewer #3:

      Summary:

      In this manuscript, Wirz et al use neuroimaging (fMRI) to show that counterconditioning produces a longer lasting reduction in fear conditioning relative to extinction and appears to rely on the nucleus accumbens rather than the ventromedial prefrontal cortex. These important findings are supported by convincing evidence and will be of interest to researchers across multiple subfields, including neuroscientists, cognitive theory researchers, and clinicians.

      In large part, the authors achieved their aims of giving a qualitative assessment of the behavioural mechanisms of counterconditioning versus extinction, as well as investigating the brain mechanisms. The results support their conclusions and give interesting insights into the psychological and neurobiological mechanisms of the processes that underlie the unlearning, or counteracting, of threat conditioning.

      Strengths:

      · Mostly clearly written with interesting psychological insights

      · Excellent behavioural design, well-controlled and tests for a number of different psychological phenomena (e.g. extinction, recovery, reinstatement, etc).

      · Very interesting results regarding the neural mechanisms of each process.

      · Good acknowledgement of the limitations of the study.

      We thank reviewer 3 for the detailed feedback and suggestions.

      Weaknesses:

      (1) I think the acquisition data belongs in the main figure, so the reader can discern whether or not there are directional differences prior to CC and extinction training that could account for the differences observed. This is particularly important for the valence data which appears to differ at baseline (supplemental figure 2C).

      Since our design is quite complex with a lot of results, we left the fear acquisition results as a successful manipulation check in the Supplementary Information to not overload the reader with information that is not the main focus of this manuscript. If the editor would like us to add the figure to the main text, we are happy to do so. During fear acquisition, both experimental groups showed comparable differential conditioned threat responses as measured by PDRs and SCRs. Subjective valence ratings indeed differed depending on CS category. Importantly, however, the groups only differed with respect to their rating to the CS- category, but not the CS+ category, which suggests that the strength of the acquired fear is similar between the groups. To make sure that these baseline differences cannot account for the differences in valence after CC/Ext, we ran an additional group comparison with differential valence ratings after fear acquisition added as a covariate. Results show that despite the baseline difference, the group difference in valence after CC/Ext is still significant (main effect Group: F<sub>(1,43)</sub>=7.364, p=0.010, η<sup>2</sup>=0.146). We have added this analysis to the manuscript (see page 7).
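
      As a quick consistency check on the reported covariate analysis, the effect size can be recovered from the F statistic and its degrees of freedom. The sketch below assumes that the reported η<sup>2</sup> is a partial eta-squared; the function name is hypothetical and is not taken from the authors' analysis code.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta-squared from an F statistic and its degrees of freedom.

    Assumption: the effect size reported alongside F is partial eta-squared,
    i.e. SS_effect / (SS_effect + SS_error), which equals
    (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the main effect of Group: F(1,43) = 7.364
eta_p2 = partial_eta_squared(7.364, 1, 43)
print(round(eta_p2, 3))  # 0.146
```

      Under that assumption, the value matches the reported η<sup>2</sup>=0.146.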

      (2) I was confused in several sections about the chronology of what was done and when. For instance, it appears that individuals went through re-extinction, but this is just called extinction in places.

      We understand that the complexity of the design may require a clearer description. We therefore made some changes throughout the manuscript to improve understanding. Figure 1 is very helpful in understanding the design and we therefore refer to that figure more regularly (see pages 6-7). We also added the time between tasks where appropriate (e.g. see page 7). Re-extinction after reinstatement was indeed mentioned once in the manuscript. Given that the reinstatement procedure was not successful (see page 9), we could not investigate re-extinction and it is therefore indeed not relevant to explicitly mention and may cause confusion. We therefore removed it (see page 12).

      (3) I was also confused about the data in Figure 3. It appears that the CC group maintained differential pupil dilation during CC, whereas extinction participants didn't, and the authors suggest that this is indicative of the anticipation of reward. Do reward-associated cues typically cause pupil dilation? Is this a general arousal response? If so, does this mean that the CSs become equally arousing over time for the CC group whereas the opposite occurs for the extinction group (i.e. Figure 3, bottom graphs)? It is then further confusing as to why the CC group lose differential responding on the spontaneous recovery test. I'm not sure this was adequately addressed.

      Indeed, reward and reward anticipation also evoke an increase in pupil dilation. This was an important reason for including a separate valence-specific response characterization task. Independently from the conditioning task, this task revealed that both threat and reward anticipation induced strong arousal-related PDRs and SCRs. This was also reflected in the explicit arousal ratings, which were stronger for both the shock-reinforced (negative valence) and reward-reinforced (positive valence) stimuli. Therefore, it is not surprising that reward anticipation leads to stronger PDRs for CS+ (which predict reward) compared to CS- stimuli (which do not predict reward) during CC, whereas this differential response is reduced during extinction due to a decrease in shock anticipation. During the spontaneous recovery test, a return of stronger PDRs for CS+ compared to CS- stimuli in the standard extinction group can only reflect a return of shock anticipation. Importantly, the CC group received no rewards during the spontaneous recovery task and was aware of this, so it is to be expected that the effect is weakened in the CC group. However, CS+ and CS- items were still rated as similar in valence and PDRs did not differ between CS+ and CS- items in the CC group, whereas the Ext group rated the CS+ significantly more negatively and threat responses to the CS+ did return. It therefore is reasonable to conclude that associating the CS+ with reward helps to prevent a return of threat responses. We have added some clarifications and conclusions to this section on page 8.

      (4) I am not sure that the memories tested were truly episodic

      In line with previous publications from Dunsmoor et al.[1-4], our task allows for the investigation of memory for elements of a specific episode. In the example of our task, retrieval of a picture probes retrieval of the specific episode in which the picture was presented. In contrast, fear retrieval relies on the retrieval of the category-threat association, which does not rely on retrieval of these specific episodic elements, but could be semantic in nature, as retrieval takes place at a conceptual level. We have added a small note on what we mean by episodic in this context on page 4. We do agree that we cannot investigate other aspects of episodic memories here, such as context, as this was not manipulated in this experiment.

      (5) Twice as many female participants as male participants

      It is indeed unfortunate that there is no equal distribution between female and male participants. Investigating sex differences was not the goal of this study, but we do hope that future studies with the appropriate sample sizes are able to investigate this specifically. We have added this to the limitations of this study on page 17.

      (6) No explanation as to why shocks were varied in intensity and how (pseudo-randomly?)

      The shock determination procedure is explained on pages 18-19 (Peripheral stimulation). As is common in fear conditioning studies in humans (see references), an ascending staircase procedure was used. The goal of this procedure is to try and equalize the subjective experience of the electrical shocks to be “maximally uncomfortable but not painful”.

      Recommendations for the authors:

      Reviewer #1:

      Very well written. No additional comments

      We thank reviewer 1 for valuing our original manuscript version. To further improve the manuscript, we adapted the current version based on the reviewer’s public review (see response to reviewer #1 public review comment 1).

      Reviewer #2:

      (1) I feel that more justification/explanation is needed on why other regions highly relevant to different aspects of counterconditioning (e.g., threat, memory, reward processing) were not included in the analyses.

      We first performed whole-brain analyses to get a general idea of the different neural mechanisms of CC compared to Ext. Clusters revealing significant group differences were then further investigated by means of preregistered ROI analyses. We included regions that have previously been shown to be most relevant for affective processing/threat responding (amygdala), memory (hippocampus), reward processing (NAcc) and regular extinction (vmPFC). We restricted our analyses to these most relevant ROIs as preregistered to prevent inflated or false-positive findings[5]. Beyond these preregistered ROIs, we applied appropriate whole-brain FWE corrections. The activated regions are listed in Supplementary Table 1 and include additional regions that were expected, such as the ACC and insula.

      (2) Were there observed differences across participants in the experiment? Any information on variance in the data such as how individual differences might influence these findings would provide a richer understanding of counterconditioning and increase the depth of interpretation for a broad readership.

      We agree that investigating individual differences is crucial to gain a better understanding of treatment efficacy in the framework of personalized medicine. Specifically, future research should aim to identify factors that help predict which treatment will be most effective for a particular patient. The results of this study provide a good basis for this, as we could show that the vmPFC, in contrast to regular extinction, is not required in CC to improve the retention of safety memory. Therefore, this provides a viable option for patients who are not responding to treatments that rely on the vmPFC. In addition, as noted by Reviewer 1, in both groups a few participants show the opposite pattern (see Figure 3). It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      (3) While most figures are informative and clear, Figure 3 would benefit from detailed axis labels and a more descriptive caption. Currently, it is challenging to navigate the results presented to support the findings related to differential PDRs. A supplementary figure consolidating key patterns across conditions might also further facilitate understanding of this rather complicated result.

      We have made some changes to the figure to improve readability and understanding. Specifically, we changed the figure caption to “Change from last 2 trials CC/Ext to first 2 trials Spontaneous recovery test”, to give more details on what exactly is shown here. We also simplified the x-axis labels to “counterconditioning”, “recovery test” and “extinction”. With the addition of a clearer figure description, we hope to have improved understanding and do not think that another supplemental figure is needed.
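
      To illustrate what a change score of this kind measures, the following minimal sketch computes the change in differential responding (CS+ minus CS-) from the last 2 trials of CC/Ext to the first 2 trials of the recovery test. The function name, the per-trial input format, and the example values are hypothetical and only illustrate the arithmetic; they are not the authors' pipeline or data.

```python
from statistics import mean


def change_score(training_diffs, recovery_diffs, n_trials=2):
    """Change in differential response from late training to early recovery test.

    training_diffs / recovery_diffs: per-trial differential responses
    (CS+ minus CS-), in trial order. Both inputs are hypothetical.
    A positive score means differential responding increased at test.
    """
    late_training = mean(training_diffs[-n_trials:])   # last n trials of CC/Ext
    early_recovery = mean(recovery_diffs[:n_trials])   # first n trials of the test
    return early_recovery - late_training


# Made-up differential PDR values for one participant
score = change_score([0.5, 0.4, 0.2, 0.0], [0.3, 0.1, 0.0])
print(score)
```

      In this toy example the score is positive, which would correspond to a return of differential responding at test.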

      (4) Additional details on the statistical tests are needed. For example, please clarify whether p-values reported were corrected across all experimental conditions. Also, it would be helpful for the authors to discuss why for example repeated measures ANOVA or mixed-effects conditions were not used in this study. Might those tests not capture variance across participants' PDRs and SCRs over time better?

      We added that significant interactions were followed by Bonferroni-adjusted post-hoc tests where applicable (see page 21). We have used repeated measures ANOVAs to capture early versus late phases of acquisition and CC/extinction, as well as to compare late CC/extinction (last 2 trials) with early spontaneous recovery (first 2 trials), as is often done in the literature. A trial-level factor in a small sample would cost too many degrees of freedom and is not expected to provide more information. We have added this information and our reasoning to the methods section on page 21.
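
      For readers less familiar with the correction mentioned above, a Bonferroni adjustment multiplies each p-value by the number of tests in the family and caps the result at 1. The sketch below shows only this generic rule; the function name and the example p-values are hypothetical, and which comparisons form a family in the manuscript is not specified here.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values for one family of tests.

    Each p-value is multiplied by the number of tests and capped at 1.0.
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Made-up p-values from three post-hoc comparisons
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
print(adjusted)
```

      With three tests, only a raw p-value below 0.05/3 remains below 0.05 after adjustment.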

      Reviewer #3:

      (1) Suggest putting acquisition data into the main figures. In fact many of the supplemental figures could be integrated into the main figures in my opinion.

      See response to reviewer #3 public review comment 1.

      (2) Include explanations for why shock intensity was varied

      See response to reviewer #3 public review comment 6.

      (3) Include a better explanation for the change in differential responding from training to spontaneous recovery in the CC group (I think the loss of such responding in extinction makes more sense and is supported by the notion of spontaneous recovery, but I'm not sure about the loss in the CC group. There is some evidence from the rodent literature - which I am most familiar with - regarding a loss in contextual gradient across time which could account for some loss in specificity, could it be something like this?).

      See response to reviewer #3 public review comment 3.

      If we understand the reviewer correctly, the suggestion is that we see a loss of differential responding due to a generalization to the CS-. This would imply an increase in responding to the CS-, which is not what we see. Our data should therefore be interpreted as a loss of the specific response to the CS+ from the CC phase to the recovery test. Therefore, there is no spontaneous recovery in the CC group, nor a non-specific recovery. To clarify this, we relabeled Figure 3 by indicating “recovery test” instead of “spontaneous recovery”.

      (4) Is there a possibility that baseline differences, particularly that in Supplemental Figure 2C, could account for later differences? If differences persist after some transformation (e.g. percentage of baseline responding) this would be convincing to suggest that it doesn't.

      See response to reviewer #3 public review comment 1.

      (5) As I mentioned, I got confused by the chronology as I read through. Maybe mention early on when reporting the spontaneous recovery results that testing occurred the next day and that participants were undergoing re-extinction when talking about it for the second time.

      See response to reviewer #3 public review comment 2.

      (6) Page 8 - I was confused as to why it is surprising that the CC group were more aroused than the extinction group, the latter have not had CSs paired with anything with any valence, so doesn't this make sense? Or perhaps I am misunderstanding the results - here in text the authors refer back to Figure 2B, but I'm not sure if this is showing data from the spontaneous recovery test or from CC/extinction. If it is the latter, as the caption suggests, why are the authors referring to it here?

      Participants in the CC group showed increased differential self-reported arousal after CC, whereas arousal ratings did not differ between CS+ and CS- items after extinction. We interpret this in line with the valence and PDR results as an indication of reward-induced arousal. At the start of the next day, however, participants from the CC and extinction groups gave comparable ratings. It may therefore be surprising that participants in the CC group do not still show stronger ratings, since nothing happened between these two ratings besides a night’s sleep (see design overview in Figure 1A). We removed the “surprisingly” to prevent any confusion.

      (7) I suggest that the authors comment on whether there were any gender differences in their results.

      See response to reviewer #3 public review comment 5.

      (8) The study makes several claims about episodic memory, but how can the authors be sure that the memories they are tapping into are episodic? Episodic has a very specific meaning - a biographical, contextually-based memory, whereas the information being encoded here could be semantic. Perhaps a bit of clarification around this issue could be helpful.

      See response to reviewer #3 public review comment 4.

      References

      (1) Dunsmoor, J. E. & Kroes, M. C. W. Episodic memory and Pavlovian conditioning: ships passing in the night. Curr Opin Behav Sci 26, 32-39 (2019). https://doi.org/10.1016/j.cobeha.2018.09.019

      (2) Dunsmoor, J. E. et al. Event segmentation protects emotional memories from competing experiences encoded close in time. Nature Human Behaviour 2, 291-299 (2018). https://doi.org/10.1038/s41562-018-0317-4

      (3) Dunsmoor, J. E., Murty, V. P., Clewett, D., Phelps, E. A. & Davachi, L. Tag and capture: how salient experiences target and rescue nearby events in memory. Trends Cogn Sci 26, 782-795 (2022). https://doi.org/10.1016/j.tics.2022.06.009

      (4) Dunsmoor, J. E., Murty, V. P., Davachi, L. & Phelps, E. A. Emotional learning selectively and retroactively strengthens memories for related events. Nature 520, 345-348 (2015). https://doi.org/10.1038/nature14106

      (5) Gentili, C., Cecchetti, L., Handjaras, G., Lettieri, G. & Cristea, I. A. The case for preregistering all region of interest (ROI) analyses in neuroimaging research. Eur J Neurosci 53, 357-361 (2021). https://doi.org/10.1111/ejn.14954

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      Audio et al. measured cerebral blood volume (CBV) across cortical areas and layers using high-resolution MRI with contrast agents in non-human primates. While the non-invasive CBV MRI methodology is often used to enhance fMRI sensitivity in NHPs, its application for baseline CBV measurement is rare due to the complexities of susceptibility contrast mechanisms. The authors determined the number of large vessels and the areal and laminar variations of CBV in NHP, and compared those with various other metrics.

      Strengths:

      Noninvasive mapping of relative cerebral blood volume is novel for non-human primates. A key finding was the observation of variations in CBV across regions; primary sensory cortices had high CBV, whereas other higher areas had low CBV. The measured CBV values correlated with previously reported neuronal and receptor densities.

      We appreciate your recognition of the novelty of our non-invasive relative cerebral blood volume (CBV) mapping in non-human primates, as well as the observed areal variations and their correlations with neuronal and receptor densities. However, we are concerned that key contributions of our work—such as cortical layer-specific vasculature mapping and benchmarking surface vessel density estimations against anatomical ground truth—are being framed as limitations rather than significant advances in the field pushing the boundaries of current neuroimaging capabilities and providing a valuable foundation for future research. Additionally, we would like to clarify that dynamic susceptibility contrast (DSC) MRI using gadolinium is the gold standard for CBV measurement in clinical settings and the argument that “baseline CBV measurements are rare due to the complexities of susceptibility contrast” is simply not true. The limited use of ferumoxytol for CBV imaging is primarily due to previous FDA regulatory restrictions, rather than inherent methodological shortcomings.

      Changes in text:

      Compared to clinically used gadolinium-based agents, ferumoxytol's substantially longer half-life and stronger R<sub>2</sub>* effect allow for higher-resolution and more sensitive vascular volume measurements (Buch et al., 2022), albeit these methodologies are hampered by confounding factors such as vessel orientation relative to the magnetic field (B<sub>0</sub>) direction (Ogawa et al., 1993).

      Weaknesses:

      A weakness of this manuscript is that the quantification of CBV with postprocessing approaches to remove susceptibility effects from pial and penetrating vessels is not fully validated, especially on a laminar scale. Further specific comments follow.

      (1) Baseline CBV indices were determined using contrast agent-enhanced MRI (deltaR<sub>2</sub>*). Although this approach is suitable for areal comparisons, its application at a laminar scale poses challenges due to significant contributions from large vessels including pial vessels. The primary concern is whether large-vessel contributions can be removed from the measured deltaR<sub>2</sub>* through processing techniques.

      Eliminating the contribution of large vessels completely is unlikely, and we agree with the reviewer that ΔR<sub>2</sub>* results likely reflect a weighted combination of signals from both large vessels and capillaries. However, the distribution of ΔR<sub>2</sub>* more closely aligns with capillary density in areas V1–V5 than with large vessel distributions (Weber et al., 2008), suggesting that our ΔR<sub>2</sub>* results are more weighted toward capillaries. Moreover, we demonstrated that the pial vessel induced signal-intensity drop-outs are clearly limited to the superficial layers and exhibit smaller spatial extent than generally thought (Supp. Figs. 2 and 4).

      (2) High-resolution MRI with a critical sampling frequency estimated from previous studies (Weber 2008, Zheng 1991) was performed to separate penetrating vessels. However, this approach is still insufficient to accurately identify the number of vessels due to the blooming effects of susceptibility and insufficient spatial resolution. The reported number of penetrating vessels is only applicable to the experimental and processing conditions used in this study, which cannot be generalized.

      Our intention was not to suggest that our measurements provide a general estimate of vessel density across the macaque cerebral cortex. At 0.23 mm isotropic resolution, we successfully delineated approximately 30% of the penetrating vessels in V1. Our primary objective was to demonstrate a proof-of-concept quantifiable measurement rather than to establish a generalized vessel density metric for all brain regions. We have consistently emphasized this throughout the manuscript, but if there is a specific point of misunderstanding, we would be happy to consider revisions for clarity.

      (3) Baseline R<sub>2</sub>* is sensitive to baseline R<sub>2</sub>, vascular volume, iron content, and susceptibility gradients. Additionally, it is sensitive to imaging parameters; higher spatial resolution tends to result in lower R<sub>2</sub>* values (closer to the R<sub>2</sub> value). Thus, it is difficult to correlate baseline R<sub>2</sub>* with physiological parameters.

      The observed correlation between R<sub>2</sub>* and neuron density is likely indirect, as R<sub>2</sub>* is strongly influenced by iron, myelin, and deoxyhemoglobin densities. However, the robust correlation between R<sub>2</sub>* and neuron density, peaking in the superficial layers (R = 0.86, p < 10<sup>-10</sup>), is striking and difficult to ignore (revised Supp. Fig. 6D-E). Upon revision, we identified an error in Supp. Fig. 6D-E, where the previous version used single-subject R<sub>2</sub>* and ΔR<sub>2</sub>* maps instead of the group-averaged maps. The revised correlations are slightly stronger than in the earlier version.

      Given that the correlation between neuron density and R<sub>2</sub>* is strongest in the superficial layers, we suggest this relationship reflects an underlying association with tissue cytochrome oxidase (CO) activity and cumulative effect of deoxygenated venous blood drainage toward the pial network. The superficial cortical layers are also less influenced by myelin and iron densities, which are more concentrated in the deeper cortical layers. Additional factors may contribute to this relationship, including the iron dependence of mitochondrial CO activity, as iron is an essential component of CO’s heme groups. Moreover, myelin maintenance depends on iron, which is predominantly stored in oligodendrocytes. The presence of myelinated thin axons and a higher axonal surface density may, in turn, be a prerequisite for high neuron density.

      In this context, it is also valuable to note the absolute range of superficial R<sub>2</sub>* values (≈ 6 s<sup>-1</sup>; Supp. Fig. 6D). This variation in cortical surface R<sub>2</sub>* is about 12-30 times larger than the signal changes observed during task-based fMRI (6 vs. 0.2-0.5 s<sup>-1</sup>). This relation seems reasonable because regional increases in absolute blood flow associated with imaging signals, as measured by PET, typically do not exceed 5%–10% of the brain's resting blood flow (Raichle and Mintun, 2006). The venous oxygenation level is typically 60%, with task-induced activation increasing it by only a few percent. We suggest that this ~40% oxygen extraction is reflected in the superficial R<sub>2</sub>*. Finally, the large intercept (≈ 14.5 1/s; Supp. Fig. 6D), which is not equivalent to the water R<sub>2</sub>* (≈ 1 1/s), suggests that R<sub>2</sub>* is substantially influenced by factors other than neuron density, such as receptor density, myelin, iron, susceptibility gradients and spatial resolution.
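      As a quick check of the arithmetic in the paragraph above, the following minimal Python sketch recomputes the 12-30× ratio and the ~40% oxygen extraction from the values quoted in this response. The variable names are ours, and the extraction estimate assumes fully oxygenated arterial blood, which is a simplification not stated in the text.

```python
# Range of superficial baseline R2* across cortical areas (Supp. Fig. 6D), in 1/s.
superficial_r2s_range = 6.0
# Task-evoked R2* change quoted in the response (lower and upper bound), in 1/s.
task_delta_r2s = (0.2, 0.5)

# Ratio of the areal range to the task-evoked change.
ratio_low = superficial_r2s_range / max(task_delta_r2s)
ratio_high = superficial_r2s_range / min(task_delta_r2s)

# Oxygen extraction implied by ~60% venous oxygenation,
# ASSUMING arterial blood is fully oxygenated (illustrative simplification).
venous_oxygenation = 0.60
oxygen_extraction = 1.0 - venous_oxygenation

print(f"areal range / task change: {ratio_low:.0f}-{ratio_high:.0f}x")
print(f"oxygen extraction: {oxygen_extraction:.0%}")
```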

      The R<sub>2</sub>* values are well known to be influenced by intra-voxel phase coherence and thus spatial resolution. However, our view is that the proposed methodology of acquiring cortical-layer thickness adjusted high-resolution (spin-echo) R<sub>2</sub> maps poses more methodological limitations and is less practical. Notwithstanding, to further corroborate the relationship between R<sub>2</sub>* and neuron density, we investigated whether a similar correlation exists between the signal intensity of non-quantitative T2w SPACE-FLAIR images (0.32 mm isotropic) and neuron density. Using B<sub>1</sub> bias-field and B<sub>0</sub> orientation bias corrected T2w SPACE-FLAIR images (N=7), we parcellated the equivolumetric surface maps using Vanderbilt sections. Our findings showed that signal intensity (where regions with high signal intensity correspond to low R<sub>2</sub> values, and areas with low signal intensity correspond to high R<sub>2</sub> values) was positively correlated with neuron density, particularly in the superficial layers (R = 0.77, p = 10<sup>-11</sup>; Author response image 1). This analysis confirmed that the correlation between neuron density and R<sub>2</sub> peaks in the superficial layers. However, this correlation was slightly weaker than that obtained with quantitative R<sub>2</sub>* (Supp. Fig. 6D), suggesting either that the variable flip-angle spin-echo train refocused signal-phase coherence loss from large draining vessels or that non-quantitative T2w-FLAIR images may be confounded by other factors such as B<sub>1</sub> transmission field biases (Glasser et al., 2022). Notwithstanding, this non-quantitative fast spin-echo approach with variable flip angles, which is in principle less dependent on image resolution and closer to R<sub>2,intrinsic</sub> than R<sub>2</sub>*, yields findings similar to those of quantitative gradient-echo imaging.

      Author response image 1.

      (A) T2w-FLAIR SPACE normalized signal-intensity plotted vs neuron density. Note that low signal-intensity corresponds to high R<sub>2</sub> and high neuron density, consistent with findings using ME-GRE. (B) Correlation between T2w-FLAIR SPACE and neuron density across equivolumetric layers. Notably, a similar relationship with neuron density was observed using a variable spin-echo pulse sequence as with quantitative gradient-echo-based imaging.

      Changes in text:

      Results:

      “Because the Julich cortical area atlas covers only a section of the cerebral cortex, and the neuron density estimates are interpolated maps, we extended our analysis using the original Collins sample borders encompassing the entire cerebral cortex (Supp. Fig. 6A-C). This analysis reaffirmed the positive correlation with ΔR<sub>2</sub>* (peak at EL2, R = 0.80, p < 10<sup>-11</sup>) and baseline R<sub>2</sub>* (peak at EL2a, R = 0.86, p < 10<sup>-13</sup>), yielding linear coefficients of ΔR<sub>2</sub>* = 102 × 10<sup>3</sup> neurons/s and R<sub>2</sub>* = 41 × 10<sup>3</sup> neurons/s (Supp. Fig. 6D-G). This suggests that the sensitivity of quantitative layer R<sub>2</sub>* MRI in detecting neuronal loss is relatively weak, and the introduction of the Ferumoxytol contrast agent has the potential to enhance this sensitivity by a factor of 2.5.”
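      The factor of 2.5 quoted in the revised Results is the ratio of the two reported linear coefficients. A minimal sketch of that calculation, using only the numbers stated in the text (variable names are illustrative):

```python
# Linear coefficients reported in the revised Results (Supp. Fig. 6D-G),
# both in the units quoted in the text (x 10^3).
coef_delta_r2s = 102.0    # ferumoxytol-induced change, delta R2*
coef_baseline_r2s = 41.0  # baseline R2*

# Relative gain in sensitivity to neuron density with the contrast agent.
sensitivity_gain = coef_delta_r2s / coef_baseline_r2s

print(f"sensitivity gain: {sensitivity_gain:.2f}")
```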

      A new paragraph was added into discussion section 4.3 corroborating the relation between R<sub>2</sub>* and neuron density:

      “Another key finding of this study was the strong correlation between baseline R<sub>2</sub>* and neuron density (Supp. Fig. 6D, E). While R<sub>2</sub>* is well known to be influenced by iron, myelin, and deoxyhemoglobin densities, this correlation peaks in the superficial layers (Supp. Fig. 6E), suggesting a link to CO activity and the accumulation of deoxygenated venous blood draining from all cortical layers toward the pial network. Notably, the absolute range of superficial R<sub>2</sub>* values (max - min ≈ 6 s<sup>-1</sup>; Supp. Fig. 6D) is approximately 12-30 times larger than the ΔR<sub>2</sub>* observed during task-based BOLD fMRI at 3T (0.2-0.5 1/s) (Yablonskiy and Haacke 1994). Since venous oxygenation is around 60% and task-induced changes in blood flow account for only 5%–10% of the brain's resting blood flow (Raichle & Mintun, 2006), these results suggest that superficial R<sub>2</sub>* (Fig. 1D) may serve as a more accurate proxy for total deoxyhemoglobin content (and thus total oxygen consumption), which scales with the neuron density of the underlying cortical gray matter. Importantly, superficial layers may also provide a more specific measure of deoxyhemoglobin, as they are less influenced by myelin and iron, which are more concentrated in deeper cortical layers. Additionally, smaller but direct contributors, such as mitochondrial CO density—an iron-dependent factor—may also play a role in this relationship.”

      References:

      Raichle, M.E., Mintun, M.A., 2006. BRAIN WORK AND BRAIN IMAGING. Annu. Rev. Neurosci. 29, 449–476. https://doi.org/10.1146/annurev.neuro.29.051605.112819

      (4) CBV-weighted deltaR<sub>2</sub>* is correlated with various other metrics (cytoarchitectural parcellation, myelin/receptor density, cortical thickness, CO, cell-type specificity, etc.). While testing the correlation between deltaR<sub>2</sub>* and these other metrics may be acceptable as an exploratory analysis, it is challenging for readers to discern a causal relationship between them. A critical question is whether CBV-weighted deltaR<sub>2</sub>* can provide insights into other metrics in diseased or abnormal brain states.

      We acknowledge that a multivariate analysis using dense histological maps would be valuable for establishing causality among these metrics:

      “To comprehensively understand the factors contributing to the vascular organization of the brain, experimental disentanglement through multivariate analysis of laminar cell types and receptor densities is needed (Hayashi et al., 2021, Froudist-Walsh et al., 2023). Moreover, employing more advanced statistical modeling, including considerations for synapse-neuron interactions, may be important for refined evaluations.”

      We think the primary contributors to the brain's energy budget are neurons and receptors, as shown in several references and stated in the manuscript. To investigate the relationship between neuron density and CBV, we estimated the energy budget allocated to neurons and extrapolated the remaining CBV to other contributing factors:

      Changes in text:

      “However, this is a simplified estimation, and a more comprehensive assessment would need to account for an aggregate of biophysical factors such as neuron types, neuron membrane surface area, firing rates, dendritic and synaptic densities (Fig. 6F-G), neurotransmitter recycling, and other cell types (Kageyama 1982; Elston and Rose 1997; Perge et al., 2009; Harris et al., 2012). Indeed, the majority of the mitochondria reside in the dendrites and synaptic transmission is widely acknowledged to drive the majority of the energy consumption and blood flow (Wong-Riley, 1989; Attwell et al., 2001).

      Extrapolating cortical ΔR<sub>2</sub>* to zero neuron density results in a large intercept (~35 1/s), corresponding to 60% of the maximum cortical CBV (57 1/s; Supp. Fig. 6F). This supports the view that the majority of energy consumption occurs in the neuropil—comprising dendrites, synapses, and axons—which accounts for ~80–90% of cortical gray matter volume, whereas neuronal somata constitute only ~10–20% (Wong-Riley, 1989). Although neuronal cell bodies exhibit higher CO activity per unit volume due to their dense mitochondrial content, these results suggest their overall contribution to the total CBV per mm<sup>3</sup> tissue remains lower than that of the neuropil, given the latter's substantially larger volume fraction in cortical tissue.

      Contrary to our initial expectations, we observed a relatively smaller CBV in regions and layers with high receptor density (Fig. 6B, D, F). This relationship extends to other factors, such as number of spines (putative excitatory inputs) and dendrite tree size across the entire cerebral cortex (Supp. Fig. 7) (Froudist-Walsh et al., 2023, Elston 2007). These results align with the work of Weber and colleagues, who reported a similar negative correlation between vascular length density and synaptic density, as well as a positive correlation with neuron density in macaque V1 across cortical layers (Weber et al., 2008).”
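      The 60% figure in the revised text above follows from the ratio of the extrapolated intercept to the maximum cortical ΔR<sub>2</sub>*. A minimal sketch, assuming only the two values quoted in the text (names are ours):

```python
# Values quoted in the revised text (Supp. Fig. 6F), in 1/s.
intercept_at_zero_neurons = 35.0  # delta R2* extrapolated to zero neuron density
max_cortical_delta_r2s = 57.0     # maximum cortical delta R2* (CBV proxy)

# Share of the maximum CBV proxy that remains at zero neuron density.
intercept_fraction = intercept_at_zero_neurons / max_cortical_delta_r2s

print(f"intercept / maximum: {intercept_fraction:.0%}")
```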

      Variations in neurons and receptors are reflected in cytoarchitecture, myelin (axon density likely scales with neuron density and myelin inhibits synaptic connections), and cell-type composition. For example, fast-spiking parvalbumin interneurons, which target the soma or axon hillock, are well-suited for regulating activity in regions with high neuron density, whereas bursting calretinin interneurons, which target distal dendrites, are more adapted to areas with high synaptic density. These factors, in turn, change gradually along the cortical hierarchy (higher levels have a thinner cortical layer IV, more complex dendritic trees and more numerous inter-areal connectivity patterns). In our view, these factors are tightly interlinked and explain the strong correlations and metabolic demands observed across different metrics.

      We also agree that cortical layer imaging of vasculature in diseased or abnormal brain states is an intriguing direction for future research; however, it falls beyond the scope of the present study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a new approach for non-invasive, MRI-based measurements of cerebral blood volume (CBV). Here, the authors use ferumoxytol, a high-contrast agent, and apply specific sequences to infer CBV. The authors then move to statistically compare measured regional CBV with the known distribution of different types of neurons, markers of metabolic load and others. While the presented methodology captures an estimated 30% of the vasculature, the authors corroborated previous findings regarding the lack of vascular compartmentalization around functional neuronal units in the primary visual cortex.

      Strengths:

      Non-invasive methodology geared to map vascular properties in vivo.

      Implementation of a highly sensitive approach for measuring blood volume.

      Ability to map structural and functional vascular metrics to other types of published data.

      Weaknesses:

      The key issue here is the underlying assumption about the appropriate spatial sampling frequency needed to capture the architecture of the brain vasculature, namely ~7 penetrating vessels/mm<sup>2</sup>, as derived from Weber et al. 2008 (Cer Cor). The cited work begins by characterizing the spacing of penetrating arteries and ascending veins using vascular casts of 7 monkeys (Macaca mulatta, same as in the current paper). The ~7 penetrating vessels/mm<sup>2</sup> is computed by dividing the total number of identified vessels by the area imaged. The problem here is that all measurements were made in a "non-volumetric" manner and only in V1. Extrapolating from here to the entire brain seems like an over-assumption, particularly given the region-dependent heterogeneity that the current paper reports.

      We appreciate the reviewer’s concerns regarding spatial sampling frequency and its implications for characterizing brain vasculature, which we investigated in this study. To clarify, our analysis of surface vessel density was explicitly restricted to V1 precisely due to the limitations of our experimental precision. While we reported the total number of vessels identified in the cortex, we intentionally chose not to present density values across regions in this manuscript. Although these calculations are feasible, we focused on the data directly analyzed and avoided extrapolating density values beyond the scope of our findings. Thus, we are uncertain about the suggestion that we extrapolated vessel density values across the entire brain, as we have taken care to limit our conclusions of our vessel density precision to V1.

      Regarding methodology, we conducted two independent analyses of vessel density specifically in V1. The first involved volumetric analysis using the Frangi filter, while the second used surface-based analysis of local signal-intensity gradients (as illustrated in Fig. 2E and Supp. Figs. 3 and 4), although the final surface density analysis was performed using the ultra-high resolution equivolumetric layers. Notably, these two approaches produced consistent and comparable vessel density estimates, supporting the reliability of our findings within the scope of V1 (we found 30% of the vessels relative to the ground truth).

      Comments on revisions:

      I appreciate the effort made to improve the manuscript. That said, the direct validation of the underlying assumption about spatial resolution sampling remains unaddressed in the final version of this manuscript. With the only intention to further strengthen the methodology presented here, I would encourage again the authors to seek a direct validation of this assumption for other brain areas.

      In their reply, the authors stated "... line scanning or single-plane sequences, at least on first impression, seem inadequate for whole-brain coverage and cortical surface mapping. ". This seems to emanate from a misunderstanding, as the method could be used to validate the mapping, not to map per se.

      We apologize for any misunderstanding in our previous response and appreciate your clarification. We now understand that you were suggesting the use of line-scanning or single-plane sequences as a method to validate, rather than map, our spatial sampling assumptions.

      We agree that single-plane sequences at very high in-plane resolution (e.g., 50 × 50 × 1000 µm) have great potential to detect penetrating vessels and even vessel branching patterns. These techniques could indeed provide valuable insights into region-specific vessel density variations, which could then be used to validate whole-brain 3D acquisitions. However, as noted above, we have refrained from reporting vessel densities outside V1 precisely due to sampling limitations (we found only 30% of the penetrating vessels in V1, or only about 2 of 30 vessels/mm<sup>2</sup> ≈ 7% of the branching-vessel ground truth; see Discussion).
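      The detection fractions quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes the ~7 penetrating vessels/mm<sup>2</sup> reference density attributed to Weber et al. (2008) in the review and the upper anatomical estimate of 30 vessels/mm<sup>2</sup> cited in the Discussion; variable names are illustrative.

```python
# Reference densities in V1, in vessels per mm^2.
penetrating_density = 7.0  # penetrating vessels (Weber et al., 2008)
branching_density = 30.0   # upper estimate when branching is counted

# Fraction of penetrating vessels delineated at 0.23 mm isotropic resolution.
detected_fraction = 0.30

# Detected density and its share of the branching-vessel ground truth.
detected_density = detected_fraction * penetrating_density
fraction_of_branching = detected_density / branching_density

print(f"detected density: {detected_density:.1f} vessels/mm^2")
print(f"share of branching ground truth: {fraction_of_branching:.0%}")
```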

      We acknowledge the merit of incorporating such methods to validate regional vessel densities and agree that this would be an important avenue for future research. Thank you for this suggestion; we now briefly mention the advantage of single-plane EPI in the Discussion.

      Changes in text:

      “4.1 Methodological considerations - vessel density informed MRI

      …anatomical studies accounting for branching patterns have reported much higher vessel densities up to 30 vessels/mm<sup>2</sup> (Keller et al., 2011; Adams et al., 2015). Further investigations are warranted, taking into account critical sampling frequencies associated with vessel branching patterns (Duverney 1981), achieving higher SNR through ultra-high B<sub>0</sub> MRI (Bolan et al., 2006; Harel et al., 2010; Kim et al., 2013), and utilizing high-resolution single-plane sequences and prospective motion correction schemes to accurately characterize regional vessel densities. Such advancements hold promise for improving vessel quantification, classifying veins and arteries, and constructing detailed cortical surface maps of the vascular networks, which may have diagnostic and neurosurgical utility (Fig. 2A, B) (Iadecola, 2013; Qi and Roper, 2021; Sweeney et al., 2018).”

      During the revision we found a typo and corrected it in Supp. Fig. 8: Dosal -> Dorsal.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides useful findings about the effects of heterozygosity for Trio variants linked to neurodevelopmental and psychiatric disorders in mice. However, the strength of the evidence is limited and incomplete mainly because the experimental flow is difficult to follow, raising concerns about the conclusions' robustness. Clearer connections between variables, such as sex, age, behavior, brain regions, and synaptic measures, and more methodological detail on breeding strategies, test timelines, electrophysiology, and analysis, are needed to support their claims.

      We appreciate the opportunity to address the constructive feedback provided by eLife and the reviewers. Below, we respond to the overall assessment and individual reviewers' comments, clarifying our experimental approach, addressing concerns, and providing additional details where necessary.

      We thank the editors for highlighting the significance of our findings regarding the effects of Trio variant heterozygosity in mice. We acknowledge the feedback concerning the experimental flow and agree that clarity is paramount. To address these concerns:

      (1) Connections between variables: The word limit of the initial submission constrained our ability to provide adequate details and connections between variables. We have revised the manuscript to explicitly outline and extend explanations and the relationships between sex, age, behavior, brain regions, and synaptic measures, ensuring that the rationale for each experiment and its relevance to the overall conclusions are improved.

      (2) Methodological details: The Methods section of our initial submission was condensed, with key details provided in the Supplemental Methods section. We have merged all into an extended section to improve clarity. We have expanded our description of breeding strategies, test timelines, electrophysiological protocols, and data analysis methods in the revised Methods section. We believe the additions have enhanced the transparency and reproducibility of our study and ensured full support of our conclusions.

      (3) Experimental flow: We have revised and extended our results, methods, and discussion sections to clarify the rationale and experimental design to guide readers through the experimental sequence and rationale.

      We are confident these revisions address the concerns raised and enhance the robustness and coherence of our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study explores how heterozygosity for specific neurodevelopmental disorder-associated Trio variants affects mouse behavior, brain structure, and synaptic function, revealing distinct impacts on motor, social, and cognitive behaviors linked to clinical phenotypes. Findings demonstrate that Trio variants yield unique changes in synaptic plasticity and glutamate release, highlighting Trio's critical role in presynaptic function and the importance of examining variant heterozygosity in vivo.

      Strengths:

      This study generated multiple mouse lines to model each Trio variant, reflecting point mutations observed in human patients with developmental disorders. The authors employed various approaches to evaluate the resulting behavioral, neuronal morphology, synaptic function, and proteomic phenotypes.

      Weaknesses:

      While the authors present extensive results, the flow of experiments is challenging to follow, raising concerns about the strength of the experimental conclusions. Additionally, the connection between sex, age, behavioral data, brain regions, synaptic transmission, and plasticity lacks clarity, making it difficult to understand the rationale behind each experiment. Clearer explanations of the purpose and connections between experiments are recommended. Furthermore, the methodology requires more detail, particularly regarding mouse breeding strategies, timelines for behavioral tests, electrophysiology conditions, and data analysis procedures.

      We appreciate the reviewer’s recognition of the novelty and comprehensiveness of our approach, particularly the generation of multiple mouse lines and our efforts to model Trio variant effects in vivo.

      Weaknesses

      (1) Experimental flow and rationale and connection between variables: We have expanded on the connections between behavioral data, neuronal morphology, synaptic function, and proteomics in the Results and Discussion sections to clarify how each experiment informs the reasoning and the conclusions and to highlight the relationships between sex, age, behavior, and synaptic measures.

      (2) Methodological details: Our initial Methods section was formatted to be short to fulfill word limits on the submitted version, with additional details provided in the Supplemental Methods section. We have merged our Methods and Supplemental Methods sections and expanded on our breeding strategies, test timelines, electrophysiological protocols, and data analysis. We believe these additions enhance the transparency and reproducibility of our study.

      (3) Recommendations for the authors: We thank Reviewer #1 for providing several recommendations to improve our manuscript. We have addressed their comments in the revision, as detailed below, adding key experiments that bolster our findings.

      Reviewer #2 (Public review):

      Summary:

      The authors generated three mouse lines harboring ASD, Schizophrenia, and Bipolar-associated variants in the TRIO gene. Anatomical, behavioral, physiological, and biochemical assays were deployed to compare and contrast the impact of these mutations in these animals. In this undertaking, the authors sought to identify and characterize the cellular and molecular mechanisms responsible for ASD, Schizophrenia, and Bipolar disorder development.

      Strengths:

      The establishment of TRIO dysfunction in the development of ASD, Schizophrenia, and Bipolar disorder is very recent and of great interest. Disorder-specific variants have been identified in the TRIO gene, and this study is the first to compare and contrast the impact of these variants in vivo in preclinical models. The impact of these mutations was carefully examined using an impressive host of methods. The authors achieved their goal of identifying behavioral, physiological, and molecular alterations that are disorder/variant specific. The impact of this work is extremely high given the growing appreciation of TRIO dysfunction in a large number of brain-related disorders. This work is very interesting in that it begins to identify the unique and subtle ways brain function is altered in ASD, Schizophrenia, and Bipolar disorder.

      Weaknesses:

      (1) Most assays were performed in older animals and perhaps only capture alterations that result from homeostatic changes resulting from prodromal pathology that may look very different.

      (2) Identification of upregulated (potentially compensating) genes in response to these disorder-specific Trio variants is extremely interesting. However, a functional demonstration of compensation is not provided.

      (3) There are instances where data is not shown in the manuscript. See "data not shown". All data collected should be provided even if significant differences are not observed.

      I consider weaknesses 1 and 2 minor. While they would be very interesting to explore, these experiments might be more appropriate for a follow-up study. I would recommend that the missing data in 3 should be provided in the supplemental material.

      We are grateful for the reviewer’s recognition of our study’s significance and methodological rigor. The acknowledgment of Trio dysfunction as a novel and impactful area of research is deeply appreciated.

      Weaknesses:

      We agree that focusing on older animals limits insights into early-stage pathophysiology. However, our goal in this study was to examine the functional impacts of Trio heterozygosity at an adolescent stage and to reveal the ultimate impact of these alleles on synaptic function. Our choice of age aligns with our objectives. Future studies of earlier developmental stages will be beneficial and complement these findings.

      Functional compensation:

      We tested functional compensation through rescue experiments in +/K1431M brain slices using a Rac1-specific inhibitor, NSC23766, which prevents Rac1 activation by Trio or Tiam1. Our finding that direct Rac1 inhibition normalizes deficient neurotransmitter release in +/K1431M mice strongly suggests that increased Rac1 activity drives this phenotype.

      Data not shown:

      We will incorporate all data previously not shown into the Supplemental Materials, even when results are nonsignificant. We agree that this ensures full transparency and facilitates a more comprehensive evaluation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1K-N, the lack of observed differences in +/M2145T mice across all tests raises questions about its validity as a BPD model. Furthermore, the differences in female behavior data compared to males, as shown in the Supplemental section, lack clarification: specifically, it is not discussed whether these variations are due to sex differences or sample size disparities. Additionally, it's unclear if the same mice were used in tests K through L-N, as the reported numbers differ without explanation; if relevant, any mortality should be reported. Given the observed body weight differences, it is important to display locomotor data, despite the mention of no change in open field results. Lastly, a detailed breeding strategy and timeline for behavioral testing would enhance clarity.

      We thank Reviewer 1 for recognizing these confusing points in our behavioral data and seek to add clarification in our Revision as below:

      (a) We have revised the text to emphasize our goal to evaluate the impact of NDD-related Trio alleles that have discrete and measurable effects on brain development and function, and not to model specific NDDs (e.g. ASD, SCZ, or BPD). The three specific Trio mutations were chosen based on strong evidence of these mutations impairing the biochemical functions of Trio. We reasoned our approach would reveal how impairing Trio in different ways – i.e. altering protein level or GEF1/GEF2 function – and under genetic conditions (heterozygosity) that mimic those found in individuals with Trio-related disorders impacts brain development and function. The lack of behavioral phenotypes in +/M2145T mice is indeed intriguing, especially given the alterations in electrophysiology and biochemistry experiments. It remains possible that further behavioral analyses of these mice will reveal behavioral phenotypes.

      (b) Given that the prevalence and clinical presentation of individuals with various NDDs are influenced by sex, it is possible that the behavioral differences we see in male versus female Trio variant mice reflect human sex difference phenotypes. We have reorganized the Figure panels to clarify these sex differences in behaviors (new Fig. 2, Supp. Fig. 2). We focused on the most significant behavioral phenotypes shared by both sexes in the main text, or in males alone, as our anatomical and electrophysiological experiments were restricted to males to reduce variation due to estrus. The observed behavioral sex differences are not likely due to sample size disparities as power analyses were performed for all experimental results to ensure adequate sample size. A comprehensive study of the mechanisms underlying these behavioral findings merits examination but is outside the scope of this study.

      (c) All mice were subjected to all behavioral tests described. No sudden mortality was observed during the behavioral experiments. Outliers in post-hoc statistical analyses were removed, which explains the apparent sample size differences between behavioral tests. We have revised the Data analysis section in our Methods to include these details (Lines 216-289, 450-457).

      (d) Results of the open field test have been added to the Supplemental Data (new Supp. Fig. 2) and Results (Lines 532-537).

      (e) The Methods section was expanded to include more detail on the breeding strategy (Lines 98-106). A timeline for behavioral testing has also been included in the Figures to enhance clarity (new Fig. 2A).

      (2) In Figure 2A-E, head width and brain weight showed significant differences, but not body weight, how come the ratio does not change? Comparing with female results in Supplementary Figure 2A-E, it does show a difference between males and females. It is essential to clarify which sex authors use in all follow-up experiments, including synapse, transmission, and plasticity. Since the males and females have different phenotypes, why do the authors focus on males only? The E plot has no data points on the bar graph. In Figure 2I, it lacks example images for all four conditions.

      We greatly appreciate this Reviewer’s attention to details in our brain and body weight data and revised the manuscript to address these concerns.

      (a) The ratios of head width/body weight were calculated for each individual mouse. Hence the distribution of the ratio data (old Fig. 2D; new Fig. 3D) differs from the distribution of head width or body weight data alone (old Fig. 2A, 2C, resp.; now Fig. 3A, 3C), which can affect the p-value for statistical significance. The body weight of +/M2145T males is 21.217 ± 0.327 g, while that of WT males is 21.745 ± 0.224 g, a non-significant decrease of 0.528 g (adjusted p=0.3806). These values have been added to the Fig. 3 legend (Lines 1020-1034) for clarity.

      (b) Similar to the behavioral experiments in comment (1), we observed sex differences in head width, brain weight, and body weight in Trio heterozygous variant mice compared to WT counterparts. The differences in the ratios of head width/body weight or brain weight/body weight were the same for both males and females (i.e. head width/body weight ratio is decreased in +/K1431M mice compared to WT regardless of sex, and brain weight/body weight ratio is decreased in both +/K1431M and +/K1918X mice compared to WT regardless of sex). These findings affirm the impact of Trio mutations on these phenotypes across both sexes. We have modified the text to draw more attention to this key point (Lines 554-566 and 777-801).

      (c) All experiments (excluding behavior and weight data) were performed in males only to minimize the variation in spine and synapse morphology and physiological activity that can occur due to estrus. We have clarified this in the ‘Animal Work’ section of the Methods (Lines 103-106) as well as in the Figure Legends.

      (d) We thank the Reviewer for pointing out Fig. 3E lacks individual data points on the bar graph. Fig. 3E has been modified to now include the brain weight/body weight ratio for each individual mouse rather than across the population, to be consistent with the calculation of head width/body weight ratio (see point 2a).

      On original submission, only a representative WT image was selected due to space constraints. The figure (new Fig. 3H and 3K) and figure legend have been revised to include representative traces for all genotypes examined.
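      The per-mouse ratio calculation described in points (a) and (d) can be illustrated with a minimal sketch. The measurements below are made-up placeholder values, not data from the study, and the function and variable names are hypothetical. The sketch shows that per-animal ratios form their own distribution, whose mean generally differs from the ratio of the group means, and it checks the reported difference in mean male body weight.

```python
from statistics import mean, stdev

def per_mouse_ratios(numerators, denominators):
    """Return one ratio per animal (e.g. head width / body weight)."""
    if len(numerators) != len(denominators):
        raise ValueError("each animal needs both measurements")
    return [n / d for n, d in zip(numerators, denominators)]

# Hypothetical placeholder measurements for four mice (not study data).
head_width_mm = [10.0, 10.4, 9.8, 10.2]
body_weight_g = [20.0, 22.0, 21.0, 23.0]

ratios = per_mouse_ratios(head_width_mm, body_weight_g)
ratio_of_means = mean(head_width_mm) / mean(body_weight_g)

# The mean of per-mouse ratios generally differs from the ratio of group means,
# and its spread depends on how the two measurements co-vary within animals.
print(f"mean of per-mouse ratios: {mean(ratios):.4f} (SD {stdev(ratios):.4f})")
print(f"ratio of group means:     {ratio_of_means:.4f}")

# Difference in mean male body weight quoted in the response (WT minus +/M2145T).
weight_difference_g = round(21.745 - 21.217, 3)
print(f"body weight difference: {weight_difference_g} g")
```

      Because the statistical test is run on the per-mouse ratios, its outcome need not match the outcome of tests on head width or body weight alone.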

      (3) In lines 315-320, "None of the Trio variant heterozygotes exhibited altered dendritic spine density on M1 L5 pyramidal neurons compared to WT mice on either apical or basal arbors (Supplementary Figure 3L, M). Electron microscopy of cortical area M1 L5 revealed that synapse density was significantly increased in +/K1918X mice compared to WT (Figure 3A, B), possibly due to a net reduction in neuropil resulting from smaller dendritic arbors." The proposed explanation does not adequately address the observed discrepancy between spine density and synapse density reported in these two experiments. A more thorough analysis is needed to reconcile these conflicting findings and clarify how these distinct measurements may relate to each other in the context of the study's conclusions.

      We acknowledge the apparent discrepancy between our dendritic spine density data, which is unchanged from WT for all three Trio variant heterozygotes, and our synapse density data, which showed an increase in +/K1918X M1 L5 compared to WT. We have expanded the explanation for this discrepancy below and added this to the Discussion (Lines 802-811):

      a) Because spine density can vary by dendritic branch order and distance from the soma, only protrusions from secondary dendritic arbors of M1 L5 pyramidal neurons were quantified for consistency in analyses. However, all synapses meeting criteria were quantified in EM images, regardless of where they were located along an individual neuron’s arbors. It is possible that the density and distribution of spines along other arbors differ between genotypes but were not captured in our current data.

      b) +/K1918X L5 pyramidal neurons are smaller and less complex than WT neurons, especially in the basal compartment corresponding to L5 where EM images were obtained, consistent with the smaller brain size and reduced cortical thickness of +/K1918X mice. We posit that due to their smaller dendritic field size, L5 neurons pack more densely contributing to the increased synapse density observed in +/K1918X M1 L5 cortex. Consistent with this hypothesis, we observed a trend toward increased DAPI+ cell density in M1 L5 of +/K1918X neurons (Supp. Fig. 3N).

      (4) In Figure 4, one potential rationale for measuring AMPAR mEPSC frequency is to infer synapse density changes. However, the findings show no frequency change in +/K1431M and +/K1918X, with an increase only in +/M2145T, which contradicts Figure 3 results indicating a trend toward increased density across variants.

      This inconsistency is confusing, especially since the authors claim to follow the methodology from the study "Trio Haploinsufficiency Causes Neurodevelopmental Disease-Associated Deficits"; yet, the observed mEPSC amplitude differs significantly from that study, while the frequency remains unaffected. Additionally, the NMDAR mEPSCs reflect combined AMPAR and NMDAR responses at positive holding potentials, with peak amplitude dominated by AMPAR. This inconsistency between holding potential results is unclear, as frequency should theoretically align across negative and positive potentials. For accurate NMDAR mEPSC measurement, it would be optimal to assess amplitude 50 ms post-initial peak and, if possible, increase the holding potential to enhance the driving force given the typically low signal of NMDAR response.

      We thank the Reviewer for highlighting these important points.

      a) Previous work from our lab and others demonstrates that Trio regulates synaptic AMPA receptor levels, which is why we chose to focus on AMPAR-mediated evoked and miniature EPSC frequencies and amplitudes in the current study. We acknowledge Reviewer 1’s comment on seemingly contradictory results regarding AMPAR mEPSC frequency and synapse density; however, the unchanged AMPAR mEPSC frequency in +/K1431M and +/K1918X mice is consistent with our finding of unaltered dendritic spine density in these mice compared to WT (Supp. Fig. 4L,M). The difference between dendritic spine counts and synapse density is addressed in Response (3) above.

      b) While synapse density changes can be inferred from AMPAR mEPSC frequency, mEPSC frequency also reflects changes in spontaneous neurotransmitter release, especially in the absence of changes in synapse number. Notably, the increased mEPSC frequency in the +/M2145T variant is linked to enhanced spontaneous release, not to spine or synapse density changes. These findings are reinforced by increased synaptic vesicle counts, calculated PPR changes, and estimates of Pr and RRP from HFS train analysis. We have included these points in the Discussion (Lines 861-863).

      c) While it is tempting to compare the current study to our previously published conditional Trio haploinsufficiency model, we highlight key distinctions that may underlie phenotypic differences between these two mouse models. First, our prior model used a NEX-Cre transgene to ablate one Trio allele from excitatory neurons only beginning at embryonic day 11. In contrast, our Trio variants are expressed in all cell types throughout development, akin to the genetic variants found in individuals with TRIO-related disorders. Second, the Trio variant mice in this study are on a C57BL/6 background, while the Trio haploinsufficient mice were on a mixed 129Sv/J X C57BL/6 background. These differences in the current study may explain why some measures, such as mEPSC amplitude, may not align with those from the Trio conditional haploinsufficiency model.

      d) Recordings were performed using specific inhibitors to isolate AMPA and NMDA mEPSCs; these missing methodological details have now been clarified in the updated Methods section (Lines 353-360).

      (5) In Supplementary Figure 4, the sample traces indicate a higher NMDA/AMPA ratio, raising the question of whether the AMPA EPSC amplitude changes, as this could reflect PSD length. In Figure 4B, the increased AMPAR mEPSC amplitude in the +/K1918X condition compared to WT suggests an enhanced postsynaptic response, yet the PSD length is reduced in Figure 3C. Can the authors provide a potential hypothesis to explain this?

      We appreciate the Reviewer’s feedback. Yes, both evoked and miniature recordings indicate increased AMPAR amplitudes in the +/K1918X variants compared to WT. While PSD length is often linked to synaptic strength, the reduction in PSD length observed by EM in +/K1918X synapses is small (~6% of WT) and clearly does not correlate with significant changes in synaptic strength. We also note that the whole-cell recordings of mEPSCs represent input from all active synapses on the neuron, while PSD length was measured only in synapses within L5.

      (6) In Figure 4, synaptic plasticity appears to decrease to around 50% of baseline; could this reduction be attributed to LTD, or might it result from changes in pipette resistance? Additionally, is the observed potentiation due to changes in presynaptic release probability? Measuring paired-pulse ratio (PPR) before and after induction would clarify this aspect.

      We thank the Reviewer for highlighting these important points.

      a) We used a well-established theta burst stimulation method for LTP induction in M1 L5 pyramidal neurons. This protocol reliably evokes LTP in WT neurons, as shown in Fig. 5J and K. Both +/K1431M and +/K1918X variants exhibit a slight but discernible increase in evoked excitatory postsynaptic currents (eEPSCs), indicative of the initiation of LTP. Although this increase is smaller compared to WT, the presence of potentiation indicates that long-term depression (LTD) is an unlikely explanation for the observed reduction.

      b) To rule out the influence of technical artifacts, pipette resistance was carefully monitored before and after LTP induction. Any cells exhibiting resistance changes exceeding 20% during electrophysiological recordings were excluded from the analysis, ensuring that fluctuations in pipette resistance did not confound LTP measurements. These technical details are denoted in the Methods (Lines 344-346 and 364-366).

      c) The potentiation in the +/M2145T variant may stem from increased release probability (Pr) and greater synaptic vesicle availability, but is beyond the scope of this work. We agree this is an intriguing question, not only for +/M2145T but also for +/K1431M mice. Future studies should address this, ideally using models where the Trio variant is selectively introduced into the presynaptic neuron.
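      The exclusion criterion in point b) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the change is taken relative to the pre-induction value, resistances are given in MΩ, and the recordings and function name are hypothetical rather than taken from the study.

```python
def exceeds_resistance_change(before_mohm, after_mohm, max_fraction=0.20):
    """True if resistance changed by more than max_fraction of the initial value.

    Assumption: the change is expressed relative to the pre-induction reading.
    """
    if before_mohm <= 0:
        raise ValueError("initial resistance must be positive")
    return abs(after_mohm - before_mohm) / before_mohm > max_fraction

# Hypothetical recordings: (cell id, resistance before, resistance after), in MΩ.
recordings = [("cell_1", 5.0, 5.5), ("cell_2", 5.0, 6.5), ("cell_3", 4.0, 3.0)]

# Keep only cells whose resistance stayed within the 20% tolerance.
kept = [cell for cell, before, after in recordings
        if not exceeds_resistance_change(before, after)]
print(kept)
```

      In this toy data set, cell_1 changes by 10% and is kept, while cell_2 (30%) and cell_3 (25%) are excluded.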

      (7) In lines 377-380, "The +/M2145T PPR curve was unusual, with significantly reduced PPF at short ISIs, yet clearly increased PPF at longer ISI (Figure 5A, B) compared to WT." The unusual PPR observed at the 100 ms ISI appears unexpected. Can the authors provide an explanation for this anomaly? This finding could suggest atypical presynaptic dynamics or modulation at this specific interval, which may differ from typical synaptic behavior. Further insights into possible mechanisms or experimental conditions affecting this result would be valuable.

      "The decreased PPF at initial ISI in +/M2145T mice correlated with increased mEPSC frequency (Fig. 4A-C), suggestive of a possible increase in spontaneous glutamate Pr." If this is the case, it raises the question of why the increased PPR at the initial ISI in +/K1431M does not correspond to the result shown in Figure 4C. This discrepancy suggests that factors beyond initial presynaptic release probability might be influencing the observed synaptic response, or that compensatory mechanisms could be affecting PPR and mEPSC frequency differently in this variant. Further clarification on the interplay between these measurements would help resolve this inconsistency.

      We appreciate the Reviewer’s critical reading and genuine interest in this phenotype in +/M2145T mice.

      a) The unusual shift of the PPR in +/M2145T at the 100 ms ISI is fascinating, and addressing it will require significant additional experimentation that lies beyond the scope of this report. We propose it results from altered presynaptic regulators, including increased Syt3 and reduced RhoA activity. Notably, Syt3 influences calcium-dependent SV replenishment, which can cause similar PPR defects (Weingarten DJ et al., 2022); this is now included in the Discussion (Lines 915-918).

      Weingarten DJ, Shrestha A, Juda-Nelson K, Kissiwaa SA, Spruston E, Jackman SL. Fast resupply of synaptic vesicles requires synaptotagmin-3. Nature. 2022 Nov;611(7935):320-325. doi: 10.1038/s41586-022-05337-1. Epub 2022 Oct 19. PMID: 36261524.

      b) Thank you for raising the concern about the clarity of the statement "The decreased PPF at initial ISI in +/M2145T mice correlated with increased mEPSC frequency (Fig. 4A-C), suggestive of a possible increase in spontaneous glutamate Pr." We have edited the sentence to make it clearer (Lines 701-703). First, the K1431M and M2145T variants impact different TRIO catalytic activities, disrupting distinct GTPase pathways and differentially affecting presynaptic regulators, which can lead to non-overlapping phenotypes. Second, we have expanded our discussion to note that the +/K1431M variant data suggest increased AMPAR numbers and fewer silent synapses (Lines 850-855), potentially increasing AMPAR mEPSC frequency and masking the expected decrease in spontaneous release (Lines 905-910). Further experiments are needed, ideally using mixed cultures in which presynaptic neurons carrying TRIO variants synapse onto WT neurons, as minimal stimulation variance analysis in slices would be inconclusive because, like mEPSC frequency, it reflects changes in both Pr and silent synapses.
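      For readers less familiar with the measure discussed in this response, the paired-pulse ratio is the amplitude of the second evoked response divided by the first, with values above 1 indicating facilitation (PPF) and values below 1 indicating depression. The sketch below is illustrative only: the amplitudes are hypothetical magnitudes in pA, and apart from the 100 ms interval named above, the inter-stimulus intervals are assumed for the example.

```python
def paired_pulse_ratio(first_amplitude, second_amplitude):
    """PPR = second response amplitude / first response amplitude."""
    if first_amplitude == 0:
        raise ValueError("first response amplitude must be non-zero")
    return second_amplitude / first_amplitude

def classify(ppr):
    """Label facilitation (PPR > 1) or depression (PPR < 1)."""
    if ppr > 1:
        return "facilitation"
    if ppr < 1:
        return "depression"
    return "no change"

# Hypothetical eEPSC magnitudes in pA, keyed by inter-stimulus interval in ms.
example_pairs = {35: (100.0, 150.0), 100: (100.0, 120.0), 300: (100.0, 90.0)}
for isi_ms, (first, second) in example_pairs.items():
    ppr = paired_pulse_ratio(first, second)
    print(f"ISI {isi_ms} ms: PPR = {ppr:.2f} ({classify(ppr)})")
```

      Plotting PPR against ISI for each genotype gives the kind of curve compared in Figure 5A, B.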

      (8) In Figure 5, there is no evidence demonstrating that the NSC inhibitor functions specifically in the +/K1431M condition without affecting other conditions. To verify its specificity, the authors should test the NSC inhibitor's effects across other conditions in parallel, including a control group. Additionally, cumulative RRP measurements should be provided for a more comprehensive assessment of the inhibitor's impact on synaptic function.

      We appreciate the Reviewer’s feedback.

      a) Previous studies have shown that Rac1 activity can bidirectionally regulate synchronous release probability (Pr). We used the Rac1-specific inhibitor NSC23766 (NSC) to test how Rac1 inhibition impacted the neurotransmitter release deficits observed in +/K1431M mice. We also added control experiments testing the impact of NSC on WT slices. These new experiments are now presented in new Fig. 8 of the revised manuscript, with expanded details in the Results (Lines 737-750) and Discussion (Lines 892-900).

      b) To estimate Pr and the RRP, we employed the Decay method described by Ruiz et al. (2011), which does not rely on cumulative EPSC plots for RRP estimation. This approach was chosen to account for the initial facilitation in these synapses, and fits are performed on EPSCs plotted against stimulus number. Additional details have been provided in the Methods section (Lines 367-373).

      Ruiz R, Cano R, Casañas JJ, Gaffield MA, Betz WJ, Tabares L. Active zones and the readily releasable pool of synaptic vesicles at the neuromuscular junction of the mouse. J Neurosci. 2011 Feb 9;31(6):2000-8. doi: 10.1523/JNEUROSCI.4663-10.2011. PMID: 21307238; PMCID: PMC6633039.

      (9) Given the relevance to NDD, specifying the age window of the mice used is crucial. It is confusing that the synaptic function studies were conducted at P42, while the proteomic analysis was performed at P21. Could the authors clarify the rationale behind using different age points for these analyses? Consistency in age selection, or an explanation for this variation, would help in interpreting the developmental relevance of the findings.

      P42 was chosen because it represents young adulthood, by which time clinical features will have already presented in individuals with neurodevelopmental disorders. Our prior studies of NEX-Cre Trio<sup>-/-</sup> mice found significant measurable differences from WT at this age, after neuronal migration, differentiation, synaptogenesis and pruning have occurred. An earlier developmental timepoint, P21, which coincides with juvenile age in mice, was chosen for proteomics studies to identify earlier changes and potentially targetable and modifiable mechanisms that could influence the phenotypes we observed in older mice. The experiments in P42 versus P21 mice were originally two independent lines of investigation that converged in the current study.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1:

      First, I thank the authors for clarifying some of the confusion I had in the previous comment and I appreciate the efforts the authors put into improving the quality of the manuscript. However, my concerns about the lack of novelty of the key findings are not perfectly addressed and there is no additional analysis done in this revision. Currently in this version of the manuscript, asserting that a p-value of 10<sup>-6</sup> is close to genome-wide significance may be considered an overstatement. Further analysis focusing on finding novel and additional discovery is very necessary.

      We thank the reviewer for their comments. Reviewer #2 also made a comment regarding the genome-wide threshold, “However, it remains unclear why the authors found it appropriate to apply STEAM to the LAAA model, a joint test for both allele and ancestry effects, which does not benefit from the same reduction in testing burden.” The reviewers have correctly identified our oversight, and we have amended the manuscript as follows:

      (1) The abstract, “We identified a suggestive association peak (rs3117230, p-value = 5.292 x 10<sup>-6</sup>, OR = 0.437, SE = 0.182) in the HLA-DPB1 gene originating from KhoeSan ancestry.”

      (2) From line 233 to 239: “The R package STEAM (Significance Threshold Estimation for Admixture Mapping) (Grinde et al., 2019) was used to determine the admixture mapping significance threshold given the global ancestral proportions of each individual and the number of generations since admixture (g = 15). For the LA model, a genome-wide significance threshold of p-value < 2.5 x 10<sup>-6</sup> was deemed significant by STEAM. The traditional genome-wide significance threshold of 5 x 10<sup>-8</sup> was used for the GA, APA and LAAA models, as recommended by the authors of the LAAA model (Duan et al., 2018).”

      (3) We excluded the results for the signal on chromosome 20, since this also did not reach the LAAA model genome-wide significance threshold.  

      (4) From line 296 to 308: “LAAA models were successfully applied for all five contributing ancestries (KhoeSan, Bantu-speaking African, European, East Asian and Southeast Asian). However, no variants passed the threshold for statistical significance. Although no variants reached genome-wide significance, a suggestive peak was identified in the HLA-II region of chromosome 6 when using the LAAA model and adjusting for KhoeSan ancestry (Figure 3). The QQ-plot suggested minimal genomic inflation, which was verified by calculating the genomic inflation factor (λ = 1.05289) (Supplementary Figure 1). The lead variants identified using the LAAA model whilst adjusting for KhoeSan ancestry in this region on chromosome 6 are summarised in Table 3. The suggestive peak encompasses the HLA-DPA1/B1 (major histocompatibility complex, class II, DP alpha 1/beta 1) genes (Figure 4). It is noteworthy that without the LAAA model, this suggestive peak would not have been observed for this cohort. This highlights the importance of utilising the LAAA model in future association studies when investigating disease susceptibility loci in admixed individuals, such as the SAC population.”
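      The model-specific thresholds quoted in amendment (2) can be summarised in a short sketch that shows why the lead variant is described as suggestive rather than significant. The dictionary and function names are hypothetical; the numeric values are those quoted above. No numeric cutoff for "suggestive" is stated in the text, so none is encoded here.

```python
# Genome-wide significance thresholds quoted in the amended Methods text.
THRESHOLDS = {
    "LA": 2.5e-6,   # admixture mapping threshold estimated with STEAM
    "GA": 5e-8,     # traditional genome-wide threshold
    "APA": 5e-8,
    "LAAA": 5e-8,
}

def is_genome_wide_significant(model, p_value):
    """Compare a p-value with the threshold assigned to its model."""
    return p_value < THRESHOLDS[model]

# Lead variant rs3117230 from the LAAA model adjusting for KhoeSan ancestry.
lead_p = 5.292e-6
print(is_genome_wide_significant("LAAA", lead_p))  # False
```

      The lead p-value of 5.292 x 10<sup>-6</sup> is roughly two orders of magnitude above the 5 x 10<sup>-8</sup> threshold applied to the LAAA model, which is consistent with reporting the peak as suggestive only.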

      We acknowledge that our results are not statistically significant. However, our study advances this area of research by identifying suggestive African-specific ancestry associations with TB in the HLA-II region. These findings build upon the work of the ITHGC, which did not identify any significant associations or suggestive peaks in their African-specific analyses. We have included this argument in our manuscript (from lines 425 to 432):

      “The ITHGC did not identify any significant associations or suggestive peaks in their African ancestry-specific analyses. Notably, the suggestive peak in the HLA-DPB1 region was only captured in our cohort using the LAAA model whilst adjusting for KhoeSan local ancestry. This underscores the importance of incorporating global and local ancestry in association studies investigating complex multi-way admixed individuals, as the genetic heterogeneity present in admixed individuals (produced as a result of admixture-induced and ancestral LD patterns) may cause association signals to be missed when using traditional association models (Duan et al., 2018; Swart, van Eeden, et al., 2022).”

      We appreciate the comment regarding additional analyses. We acknowledge that we did not validate our SNP peak in the HLA-II region through fine-mapping due to the lack of a suitable reference panel (see lines 490 to 500). Our long-term goal is to develop a HLA-imputation reference panel incorporating KhoeSan ancestry; however, this is beyond the scope and funding allowances of this study.

      Reviewer #2 (Recommendations for the authors):

      The authors we think have done an excellent job with their responses and the manuscript has been substantially improved.

      Thank you for taking the time to help us improve our manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study reveals that TRPV1 signaling plays a key role in tympanic membrane (TM) healing by promoting macrophage recruitment and angiogenesis. Using a mouse TM perforation model, researchers found that blood-derived macrophages accumulated near the wound, driving angiogenesis and repair. TRPV1-expressing nerve fibers triggered neuroinflammatory responses, facilitating macrophage recruitment. Genetic Trpv1 mutation reduced macrophage infiltration, angiogenesis, and delayed healing. These findings suggest that targeting TRPV1 or stimulating sensory nerve fibers could enhance TM repair, improve blood flow, and prevent infections. This offers new therapeutic strategies for TM perforations and otitis media in clinical settings. This is an excellent and high-quality study that provides valuable insights into the mechanisms underlying TM wound healing.

      Strengths:

      The work is particularly important for elucidating the cellular and molecular processes involved in TM repair. However, there are several concerns about the current version.

      We sincerely thank Reviewer #1 for their time and effort in evaluating and improving our study. Below, we are pleased to address the Reviewer's concerns point by point.

      Weaknesses:

      Major concerns

      (1) The method of administration will be a critical factor when considering potential therapeutic strategies to promote TM healing. It would be beneficial if the authors could discuss possible delivery methods, such as topical application, transtympanic injection, or systemic administration, and their respective advantages and limitations for targeting TRPV1 signaling. For example, Dr. Kanemaru and his colleagues have proposed the use of Trafermin and Spongel to regenerate the eardrum.

      We are grateful to the reviewer for raising this important point. While the present study primarily focuses on the mechanistic role of TRPV1 in TM repair, we agree that the mode of therapeutic delivery will be pivotal in translating these findings into clinical practice. In response, we will expand the discussion to explore possible delivery methods—such as topical application, transtympanic injection, and systemic routes—along with their respective benefits and challenges. We will also cite the work by Dr. Kanemaru and colleagues as an example of how local delivery systems may facilitate TM regeneration.

      (2) The authors appear to have used surface imaging techniques to observe the TM. However, the TM consists of three distinct layers: the epithelial layer, the fibrous middle layer, and the inner mucosal layer. The authors should clarify whether the proposed mechanism involving TRPV1-mediated macrophage recruitment and angiogenesis is limited to the epithelial layer or if it extends to the deeper layers of the TM.

      We apologize for any confusion caused by our previous description. In our study, we utilized Z-stack confocal imaging to capture the full thickness of the TM, as illustrated in Author response image 1 (reconstructed from the acquired Z-sections). This imaging technique allowed us to encompass all three layers of the TM. Each sample was imaged using a 10X objective on an Olympus fluorescence microscope. Given the conical shape and size of the TM, we imaged it in four quadrants, acquiring approximately 30 optical sections (with a 3 µm step) per region. The acquired images were projected and exported using FV10ASW 4.2 Viewer, then stitched together using Photoshop. The resulting Z-stack projections enabled us to visualize the distribution of macrophages, angiogenesis, and the localization of nerve fibers throughout the TM. We will include this detailed methodology in our revision to clarify any potential confusion.
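      As a rough illustration of the acquisition described above, the sketch below computes the axial span covered by a stack and collapses a stack into a single image. The projection type is not stated in the response, so the maximum-intensity projection used here is an assumption, and the tiny stack and function names are hypothetical.

```python
def stack_depth_um(n_sections, step_um):
    """Axial distance between the first and last optical section."""
    return (n_sections - 1) * step_um

def max_projection(stack):
    """Collapse a stack (list of 2D sections) by taking the per-pixel maximum.

    Assumption: maximum-intensity projection; the source only says "projected".
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(section[r][c] for section in stack) for c in range(cols)]
            for r in range(rows)]

# About 30 sections at a 3 µm step span roughly 87 µm between first and last.
print(stack_depth_um(30, 3.0))  # 87.0

# Tiny hypothetical 3-section stack of 2x2 images.
stack = [[[0, 5], [1, 0]],
         [[2, 0], [0, 7]],
         [[1, 1], [9, 0]]]
print(max_projection(stack))  # [[2, 5], [9, 7]]
```

      A projection of this kind keeps the brightest signal at each pixel across depth, which is why structures from all TM layers can appear in one image.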

      Author response image 1.

      Representative confocal images showing one quadrant of the TM collected from a CSF1R<sup>EGFP</sup> bone marrow transplanted mouse at day 7 post-perforation. (A-B) 3D-rendered views from different angles reveal the close spatial relationship between CSF1R<sup>EGFP</sup> cells (green) and blood vessels (red) within the TM. (C) Cross-sectional view highlights the depth-wise distribution of CSF1R<sup>EGFP</sup> cells (green) and blood vessels (red) across the layered TM architecture. All images were processed using Imaris Viewer x64 (version 10.2.0).

      Minor concerns

      In Figure 8, the schematic illustration presents a coronal section of the TM. However, based on the data provided in the manuscript, it is unclear whether the authors directly obtained coronal images in their study. To enhance the clarity and impact of the schematic, it would be helpful to include representative images of coronal sections showing macrophage infiltration, angiogenesis, and nerve fiber distribution in the TM.

      As noted above, we utilized Z-stack confocal imaging to capture the full thickness of the TM, enabling us to visualize structures across all three layers. This approach ensured that all layers were included in our analysis. Due to the thin and curved nature of the TM, traditional cross-sectional imaging often struggles to clearly depict the spatial relationships between macrophages, blood vessels, and nerve fibers, especially at low magnification as shown in Author response image 2. In response to the reviewer's suggestion, we will include representative coronal images in the revised manuscript to better illustrate the distribution of these structures at higher magnification.

      Author response image 2.

      Confocal images of eardrum cross-sections collected at day 1 (A), 3 (B), and 7 (C) post perforation to demonstrate the wound healing processes.

      Reviewer #2 (Public review):

      Summary:

      This study examines the role of TRPV1 signaling in the recruitment of monocyte-derived macrophages and the promotion of angiogenesis during tympanic membrane (TM) wound healing. The authors use a combination of genetic mouse models, macrophage depletion, and transcriptomic approaches to suggest that neuronal TRPV1 activity contributes to macrophage-driven vascular responses necessary for tissue repair.

      Strengths:

      (1) The topic of neuroimmune interactions in tissue regeneration is of interest and underexplored in the context of the TM, which presents a unique model due to its anatomical features.

      (2) The use of reporter mice and bone marrow chimeras allows for some dissection of immune cell origin.

      (3) The authors incorporate transcriptomic data to contextualize inflammatory and angiogenic processes during wound healing.

      We sincerely thank Reviewer #2 for their time and effort in improving our study and recognizing its strengths. Below, we are pleased to address the reviewer's concerns point by point.

      Weaknesses:

      (1) The primary claims of the manuscript are not convincingly supported by the evidence presented. Most of the data are correlative in nature, and no direct mechanistic experiments are included to establish causality between TRPV1 signaling and macrophage recruitment or function.

      We appreciate Reviewer #2's perspective on the lack of molecular mechanisms linking TRPV1 signaling and macrophages. However, our data demonstrate that TRPV1 mutations significantly affect macrophage recruitment and angiogenesis. This initial study primarily focuses on how sensory nerve fibers are involved in eardrum immunity and wound healing, an area that has not been clearly reported in the literature before. We believe that further research is necessary to explore this topic in greater depth.

      (2) Functional validation of key molecular players (such as Tac1 or Spp1) is lacking, and their roles are inferred primarily from gene expression data rather than experimentally tested.

      Although we have identified the TAC1 and SPP1 signals as potentially important for TM wound healing for the first time, we agree with the Reviewer's view regarding the lack of molecular mechanisms explored in this study. We have not yet tested the downstream signaling pathways, but we plan to investigate them in a series of future studies. As this is an early report, we will continue to explore these signals and their potential clinical applications, building on our initial findings.

      (3) The reuse of publicly available scRNA-seq data is not sufficiently integrated or extended to yield new biological insights, and it remains largely descriptive.

      We thank Reviewer #2 for highlighting this point. Leveraging publicly available scRNA-seq databases and established analysis pipelines saves time and resources, and it allows researchers to build on previous work and focus on new biological questions without repeating experiments. For context, our lab recently collected macrophages from the eardrums of postnatal day 15 (P15) mice, and each trial required 20 eardrums from 10 animals to obtain a sufficient number of cells. A prior study by Dr. Tward and his team used scRNA-seq data to make initial discoveries related to eardrum wound healing, focusing primarily on epithelial cells rather than macrophages. We are building on their raw data to uncover new biological insights regarding macrophages, which we believe will be valuable to our peers, even though we have not yet tested these signals experimentally.

      (4) The macrophage depletion model (CX3CR1CreER; iDTR) lacks specificity, and possible off-target or systemic effects are not addressed.

      We agree with Reviewer #2. Although the macrophage depletion model used in our study is a standard animal model (Shi, Hua et al. 2018) that has been used by many other laboratories, any macrophage depletion model may have potential issues. We will discuss this in our revision.

      (5) Several interpretations of the data appear overstated, particularly regarding the necessity of TRPV1 for monocyte recruitment and wound healing.

      We thank the reviewer for pointing this out. We will revise the overstated passages of our manuscript accordingly.

      (6) Overall, the study appears to apply known concepts - namely, TRPV1-mediated neurogenic inflammation and macrophage-driven angiogenesis - to a new anatomical site without providing new mechanistic insight or advancing the field substantially.

      Although our study may not seem highly innovative at first glance, it reveals a previously unknown role of the TRPV1 pain signaling pathway in promoting eardrum healing. This healing process includes the recruitment of monocyte-derived macrophages and the formation of new blood vessels (angiogenesis). While this process has been documented in other organs, most research on macrophage-driven angiogenesis has been conducted using in vitro models, with very few studies demonstrating this process in vivo. Our findings could lead to new translational opportunities, especially considering that tympanic membrane perforation, damage-induced otitis media, and conductive hearing loss are common clinical issues affecting millions of people worldwide. Targeting TRPV1 signaling could enhance tympanic membrane immunity, improve blood circulation, promote the repair of damaged tympanic membranes, and ultimately prevent middle ear infections, an idea that has not been previously proposed.

      Overall:

      While the study addresses an interesting topic, the current version does not provide sufficiently strong or novel evidence to support its major conclusions. Additional mechanistic experiments and more rigorous validation would be necessary to substantiate the proposed model and clarify the relevance of the findings beyond this specific tissue context.

      We greatly thank the two reviewers for their helpful critiques to improve our study. We especially thank the Section Editors for their insightful and constructive comments on this initial study.

      References:

      Shi, J., L. Hua, D. Harmer, P. Li and G. Ren (2018). "Cre Driver Mice Targeting Macrophages." Methods Mol Biol 1784: 263-275.

    1. Author response:

      We are grateful to the reviewers for their extensive and constructive feedback. By and large, the three reviewers noted the following main points:

      (1) The overall evidence for any rhythmicity in this data is not ‘very strong’.

      We do agree and will tone down the conclusions accordingly. However, as one of the reviewers noted, a qualitative interpretation of the specific statistical results remains somewhat vague and speculative by necessity.

      (2) The differences between the results for the individual experiments are generally small. Yet, the same reviewer also asks for speculations as to how differences between experiments can be interpreted.

      We will consider these, but also note that a clear demonstration of the robustness of specific effects requires the replication of individual experiments in a separate experiment.

      (3) A clear-cut interpretation of the current experimental design in the context of continuous listening and true vigilance tasks remains difficult. This makes the interpretation and generalization of the results difficult.

      We do agree in principle, but also note that task designs vary widely in previous work, which may be one reason why there is no clear consensus on the existence or absence of a rhythmic mode of listening. We will consider specific suggestions for future work to be included in the revision.

      (4) The adjustment of task difficulty in the present task design may pose a challenge. Reviewers also suggest analyzing potential rhythmicity in this task difficulty parameter.

      We will consider this for the revision.

      (5) A more clear-cut interpretation of what potential differences in the rhythmicity of sensitivity and bias would mean should be included.

      We will provide this in the revision.

      (6) The study should provide a stronger conceptual framework both for the source of "rhythmic modes" and why one may expect differences between ears.

      By and large, this has been put forward by many previous studies testing and reporting rhythmicity in auditory tasks. Rhythmicity is pervasive in neural activity, but whether and how this relates to behavioral data remains less clear. These points will be clarified in a revision.

      (7) Parallels to work in the visual domain by Fiebelkorn, Landau & Fries should be included.

      We will discuss similarities and differences between studies on perceptual rhythmicity in the visual and auditory domains.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We again thank you for the positive and constructive feedback on our manuscript, and for highlighting its contributions to understanding the role of CARD8 in viral protease-triggered sensing of viral spread, and the potential impact of our findings on chronic inflammation and immune activation. We agree that it will be important for future work to address whether or not HIV-1 protease-triggered CARD8 inflammasome activation contributes to chronic inflammation in PLWH who are receiving ART.

      In response to the question about the baseline level of IL-1β in Fig. 4D, the figure below shows the mock condition for the CD4+ T cell:MDM coculture. We had done this control in parallel with the data presented in the submitted figure. Levels of IL-1β during HIV-1 infection are increased over background (i.e., mock infection). We note that for donor G the IL-1β concentration is below the limit of detection for this assay. Thus, it remains possible that other inflammasomes contribute modestly during cell-to-cell transmission of HIV-1; however, incomplete knockout of CARD8 in a minority of cells may also contribute to the observed levels of IL-1β in response to HIV-1 infection. Nonetheless, collectively, our data strongly support a role for CARD8 in HIV-1 protease-triggered inflammasome activation.


      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Following up on their previous work, the authors investigated whether cell-to-cell transmission of HIV-1 activates the CARD8 inflammasome in macrophages, an important question given that inflammasome activation in myeloid cells triggers proinflammatory cytokine release. The data support the idea that CARD8 is activated by the viral protease and promotes inflammation. However, time-course analyses in primary T cells and macrophages and further information on the specific inflammasome involved would further increase the significance of the study.

      Strengths:

      The manuscript is well-written and the data are of good quality. The evidence that CARD8 senses the HIV-1 protease in the context of cell-to-cell transmission is important since cell-to-cell transmission is thought to play a key role in viral spread in vivo, and inflammation is a major driver of disease progression. Clean knockout experiments in primary macrophages are a notable strength and the results clearly support the role of CARD8 in protease-dependent sensing of viral spread and the induction of IL-1β release and cell death. The finding that HIV-1 strains that are resistant to protease inhibitors differ in CARD8 activation and IL-1β production is interesting and underscores the potential clinical relevance of these results.

      Weaknesses:

      One weakness is that the authors used T cell lines, which might not faithfully reflect the efficiency of HIV-1 production and cell-cell transfer by primary T cells. To assess whether CARD8 is also activated by protease from incoming viral particles, earlier time points should be analyzed. Finally, while the authors exclude the role of NLRP3 in IL-1β release and the death of macrophages, it would be interesting to know whether the effect is still Gasdermin D dependent.

      Recommendations for the authors

      (1) Co-culture assay should also be done between primary CD4 cells and primary MDMs, because T-cell lines produce much more virus, and the efficiency of cell-to-cell transmission might be dramatically different in primary cells compared to cell lines.

      We have now added data from experiments using infected primary CD4 cells as the donor cells in cell-to-cell HIV-1 transmission to MDMs in new Figure 4. The results largely phenocopy the SUPT1:MDM coculture in that we observe inflammasome activation after co-culture of HIV-infected primary T cells with primary MDMs. We find that this inflammasome activity induced by the CD4:MDM cell-to-cell transmission is abrogated by knockout of CARD8 in the MDMs or by treatment with the HIV protease inhibitor lopinavir (LPV) or the caspase 1 inhibitor VX765, suggesting that this activation is dependent on CARD8, HIV protease, and caspase 1. Additionally, the signal persists in the presence of the reverse transcriptase inhibitor nevirapine (NVP), suggesting that the incoming protease is driving activation.

      (2) For all co-culture experiments, supernatants were collected at 48 or 72 hours. Since CARD8 activation is expected to be driven by incoming viral particles without RT, they should measure cytokine production at much earlier time points. 2-3 days co-culture raises concerns. Ideally, the authors can provide a time-course.

      We have now added a time course of the SUPT1:MDM coculture from 3 unique donors taken at 4, 24, 48, and 72 hours post coculture in the presence or absence of reverse transcriptase inhibitor (see new Figure 3B) as well as for the primary CD4 cells to MDM co-culture (see new Figure 4B). We detect IL-1β at the 24-hour time point (and later), but not at the 4-hour time point, which is slower than what was detected by direct cell-free infection (Kulsuptrakul et al., 2023). However, we still hypothesize that this is driven by active incoming viral protease because the signal is not abrogated by a reverse transcriptase inhibitor, which indicates that de novo protease production is not necessary. We also observed that IL-1β levels do not increase after plateauing 24 hours after establishing the co-culture, suggesting that secondary infection does not further amplify inflammasome activation. We now speculate on this in the Discussion.

      (3) A potential confounder in the data in Figure 4 is that despite rightly including the cognate adaptations in the Gag cleavage sites with the PI-R protease mutants, some of these viruses still display Gag processing defects. Can the authors disentangle the potency of PR mutant cleavage with either reduced cell entry or reduced protease availability due to processing defects in the incoming virions?

      The reviewer is correct that although the western blot with the p24<sup>gag</sup> antibody suggests that Gag is processed, we cannot rule out that other variables contribute to the observed difference in CARD8 inflammasome activation. For example, PI-R clones may differ from the LAI strain in protease substrate specificity, efficiency or kinetics of viral assembly, gag dimerization, or other factors that may ultimately influence CARD8 inflammasome activation. We have updated the text to reflect these possibilities. Nonetheless, this argument does not change the conclusion that CARD8 inflammasome activation is affected by protease mutations acquired during drug resistance.

      (4) There is considerable donor variation in the macrophages (unsurprising) but can the authors correlate this with CARD8 expression and are there any off-target effects on macrophage permissivity to HIV-1 infection?

      We have now considerably increased the number of primary cell donors from the first submission (see Author response table 1 below). We find that the non-responsive donor presented in the first submission is aberrant since all others do respond to a greater or lesser degree (Figure 3, Figure 4). However, the reviewer may be correct that the particular aberrant donor MDMs were poorly infected. We also note that despite donor variability in the degree of activation (IL-1β secretion) from cocultures with HIV<sub>BaL</sub>-infected SUPT1 cells, HIV-induced activation is comparable to the activation induced by VbP (see new Figure 3–figure supplement 1B). We do not see a notable difference in CARD8 expression between donors. Nonetheless, with the added number of primary cell donors, the data are consistent with a role of primary MDMs from nearly all donors in supporting a CARD8-dependent, HIV-protease dependent inflammasome response after co-culture with infected T cells. We have left in data from all of the donors so that readers can appreciate the variability among primary cells.

      Author response table 1.

      In addition, to address the reviewer's concerns about off-target effects of the sgRNAs on macrophage permissivity, we assessed our CD4:MDM cocultures for percent infectivity via intracellular p24<sup>gag</sup> in AAVS1 vs CARD8 KO MDMs and observed no significant difference in infectivity (see Author response image 1). This indicates that infection of MDMs after co-culture with T cells is not affected by any potential off-target effects of the sgRNAs.

      Author response image 1.

      Equivalent infection in AAVS1 vs CARD8 KO MDMs. AAVS1 or CARD8 KO MDMs from donor 12 were cocultured with mock or HIV infected CD4 T cells as described in Figure 4D for 72 hours then assessed for HIV infection of the MDMs by washing away CD4 T cells, harvesting MDMs, and staining attached MDMs for intracellular p24<sup>gag</sup> for flow cytometry analysis. Datasets represent mean ± SD (n=2 technical replicates from one donor). One-way ANOVA with Dunnett’s test using GraphPad Prism 10. ns = not significant, *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.

      (5) The authors suggest that NLRP3 is unlikely to be the mediator of IL-1β and cell death in the macrophages. Is this death still GSDMD-dependent, what other NLRs are expressed in this system and does it make a difference what PAMP you use to prime the response?

      We have now added additional data in support of the conclusion that NLRP3 is not a mediator of the IL-1β secretion in the coculture of infected SUPT1 cells with primary MDMs. In addition to using an NLRP3 inhibitor, we have now also made NLRP3 KO MDMs and used these in the coculture experiments, which show that the IL-1β secretion after coculture of infected SUPT1 cells and primary MDMs is mediated by CARD8 and not NLRP3 because the signal is abrogated by CARD8 knockout, but not by NLRP3 knockout. These new data are shown in Figure 3C and D.

      To assess the role of GSDMD, we treated SUPT1:MDM cocultures with disulfiram, a GSDMD inhibitor (Hu et al., 2020). Disulfiram treatment abrogated IL-1β secretion, suggesting that this activation is indeed GSDMD-mediated (see Author response image 2 below). We chose not to include the disulfiram result in the final manuscript since we have not ruled out cytotoxic effects of the drug.

      There are likely other NLRs expressed in primary MDMs; however, since inflammasome activation is completely absent in the CARD8 KO MDMs, we infer that CARD8 is the main inflammasome-forming sensor in this system. However, we cannot rule out the possibility of other innate sensors being activated downstream of CARD8 or under different differentiation conditions.

      To address the concern that alternative priming affects CARD8 activation, we compared pre-treatment of cells with Pam3CSK4 or lipopolysaccharide (LPS) in the presence or absence of HIV protease inhibitor and reverse transcriptase inhibitor. Regardless of the priming agent used, we observed HIV protease-dependent activation that persisted in the presence of reverse transcriptase inhibitor, suggesting that CARD8 is the main sensor under LPS and Pam3CSK4 priming (new Figure 3–figure supplement 1A).

      Author response image 2.

      Inflammasome activation following cell-to-cell HIV infection is mediated by GSDMD. SUPT1-CCR5 cells were either mock-infected or infected with HIV-1<sub>NL4.3BaL</sub> for 20 hours before coculturing with MDMs in either the presence or absence of GSDMD inhibitor disulfiram (25μM). Cocultures were harvested 24 hours later to assess (left) IL-1β secretion via IL-1 reporter assay and (right) cell viability via CellTiter-Glo® assay. Viability was calculated by normalizing to relative luminescence units in the mock untreated control. Dotted line indicates limit of detection (LoD). Dashed line indicates 100% viability as determined by untreated mock control. Datasets represent mean ± SD (n=2 technical replicates for one donor). Two-way ANOVA with Sidak’s test using GraphPad Prism 10. ns = not significant, *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.

      Minor points

      (1) In Figure 1, the authors should clarify whether LAI or LAI-VSV-G was used.

      Wild-type virus (LAI strain) was used in Figure 1. This has now been clarified in the figure legend.

      (2) In Figure 1, the fraction of infected cells without DEAE was ~20% in both WT and CARD8 KO THP-1, suggesting somewhat efficient viral entry even in the absence of DEAE. How do the authors reconcile this with the lack of IL-1β production? The increase in infection observed in WT THP-1 +DEAE was overall modest (from ~20% to 25-30%) compared to the dramatic difference in IL-1β production. Can they provide more evidence or discuss how DEAE might be impacting cytokine production? If differences in viral entry are the explanation for differences in inflammasome activation, then they should be able to overcome this by using virus at a higher MOI in the absence of DEAE. Experiments proposed in Figure 1 +/- DEAE should be repeated using a range of MOI for LAI and showing the corresponding percent infection in THP-1 cells (which is not shown in Figure S2 for LAI-VSVG).

      We hypothesize that the lack of IL-1β production without DEAE is likely due to an insufficient amount of incoming viral protease to induce CARD8 activation. Though the increase in infection with DEAE is modest by intracellular p24<sup>gag</sup> at 24 hours post infection, we infer that intracellular p24<sup>gag</sup> may be largely underestimating the actual increase in viral efficiency achieved with DEAE (now in Supplemental Note). We have also updated Figure S2 (now Figure 2–figure supplement 1) legend to include the percent infection for HIV-1<sub>LAI</sub> and HIV-1<sub>LAI-VSVG</sub> infections. We agree that activation in the absence of DEAE could be overcome by infecting with a more concentrated viral stock to increase the MOI. Indeed, our decision to use the cell-to-cell transmission model achieves this in a more physiologic context.

      (3) In Figure S1, the authors point out that RT-activity in the supernatants was similar in the cell-free vs. cell-to-cell model. While in the transwell system THP-1 cells are the only cells capable of producing new virions, how are they able to differentiate viral production from sup-T1 vs. THP-1 in the cell-to-cell system? At a minimum, they should provide some data on the observed RT activity in matching wells containing the same number of infected sup-T1 cells utilized in coculture experiments.

      We think this may have been a misinterpretation. In Figure S1 (now Figure 1B, right), we compare the amount of virus available in the lower chamber of the transwell versus the cell-to-cell condition. We are not comparing cell-free to cell-to-cell infection. We have changed the text and figure title to clarify this point.

      (4) Can the authors provide additional comments on the lack of IL-1β release in donor C in Figure 3? The donor did not produce IL-1β in response to VbP or HIV, although the WB for CARD8 appears similar to the other two donors.

      We have now tested MDMs from additional donors and continue to find a range of IL-1β secretion after the coculture. However, donor C is aberrant since each of the other donors had detectable IL-1β secretion in response to VbP and HIV-1 to greater or lesser extents. Nonetheless, we have included additional donors summarized in the table above corresponding to major comment #4.

      (5) For Figure 3, can the authors provide information on the fraction of MDMs that were infected after coculture with sup-T1 cells? Why didn't the authors measure cell death in MDMs?

      It is difficult to measure the fraction of MDMs infected or dying in the cocultures since it is hard to separate signal from the T cells. Although it would be possible to do so, in this manuscript, we instead prefer to focus on the potential contribution of CARD8 inflammasome activation in exacerbating chronic inflammation in response to HIV rather than the depletion of macrophages.

      (6) In Figure 4, did the authors introduce the mutations associated with PI resistance into the same LAI backbone? If not, this is not a fair comparison, as viral protein expression levels were not at the same level, indicated in Figure 4A. Additionally, such comparison will be further strengthened by using cells other than 293T cells for the coculture assay.

      No, we did not introduce these mutations into LAI, since they were already in an NL4.3 backbone and NL4.3 and LAI differ by only 1 amino acid in protease. We have updated Table S1 to report this amino acid difference. We also note that in our previous manuscript we tested much more diverse proteases, such as those from a clade A HIV-1, HIV-2, and SIVs, and found comparable CARD8 cleavage to LAI.

      Additions not requested by Reviewers:

      THP-1 characterization

      In our previous work, we noticed that different “wildtype” THP-1 lines behaved uniquely in response to DEAE-dextran. In particular, one THP-1 line showed inflammasome activation in response to DEAE-dextran alone at the concentration used for spinoculations (20μg/mL), whereas the other THP-1 line did not. Thus, we performed STR profiling on each THP-1 cell line and determined that the THP-1 cells used in our studies (JK THP-1s) are distinct from THP-1 cells from ATCC at 3 different loci. These data are now included in the Supplemental Note (Figure A1). Please note that all experiments in this and the accompanying manuscript were performed in JK THP-1 cells.

      Whole plasmid sequencing of the PI-resistant HIV clones

      Since preprint submission, we have done whole plasmid Oxford Nanopore sequencing on the PI-resistant HIV clones obtained from the NIAID HIV/AIDS Specimen Repository Program. Of note, there were a handful of previously unreported mutations included in these plasmid stocks within protease. We have updated Table S1 to include an additional column titled “Additional amino acid changes in HIV<sup>PR</sup> relative to NL4.3.”

      References

      Hu JJ, Liu X, Xia S, Zhang Z, Zhang Y, Zhao J, Ruan J, Luo X, Lou X, Bai Y, Wang J, Hollingsworth LR, Magupalli VG, Zhao L, Luo HR, Kim J, Lieberman J, Wu H. 2020. FDA-approved disulfiram inhibits pyroptosis by blocking gasdermin D pore formation. Nat Immunol 21:736–745. doi:10.1038/s41590-020-0669-6

      Kulsuptrakul J, Turcotte EA, Emerman M, Mitchell PS. 2023. A human-specific motif facilitates CARD8 inflammasome activation after HIV-1 infection. eLife 12:e84108. doi:10.7554/eLife.84108

    1. Author Response:

      eLife assessment

      This is a valuable initial study of cell type and spatially resolved gene expression in and around the locus coeruleus, the primary source of the neuromodulator norepinephrine in the human brain. The data are generated with cutting-edge techniques, and the work lays the foundation for future descriptive and experimental approaches to understand the contribution of the locus coeruleus to healthy brain function and disease. However, due to small sample size and the need for additional confirmatory data, the data only incompletely support the main conclusions presented here. With the strengthening of the analyses, this paper, and the associated web application, will be of great interest to neuroscientists working on arousal-based behaviors and neurological and neuropsychiatric phenotypes.

      Thank you for the assessment and comments. Overall, the majority of the issues raised by the reviewers relate either directly or indirectly to limitations of the sample size that precluded further optimization of protocols and expansion of the dataset. We fully acknowledge the limited sample size in this dataset and aim to be transparent about the limitations of the study. This is the first report of snRNA-seq and spatially-resolved transcriptomics in the human locus coeruleus (LC). The LC is a very small nucleus, located deep within the brainstem, which is extremely challenging to study due to its small size, difficult-to-access location, and the very small number of norepinephrine (NE) neurons located within the nucleus, which were of prime interest for this study. This study represents our initial attempt to molecularly and spatially characterize cell types within the human LC. We did not have significant, established funding from extramural sources dedicated to this study, and tissue resources for the LC are difficult to ascertain, contributing to the small sample size in this initial study. We acknowledge that there are limitations in sample size as well as data quality. Findings from this study will be used to inform, improve, and optimize future and ongoing experimental design, as well as technical and analytical workflows for larger-scale studies. As brought up by one of the reviewers, this field is still in its infancy -- pilot experimentation in new brain regions is labor-intensive and these sequencing approaches remain costly. Moreover, due to the small size and difficulties in dissecting, tissue resources from the human brain in this area are a highly limited resource. Hence, notwithstanding limitations, in our view it is important to release the data for community access at this time. Specific responses to the reviewers’ comments are provided point-by-point in the following sections.

      Reviewer #1 (Public Review):

      Weber et al. collect locus coeruleus (LC) tissue blocks from 5 neurotypical European men, dissect the dorsal pons around the LC and prepare 2-3 tissue sections from each donor on a slide for 10X spatial transcriptomics. […] The authors transparently present limitations of their work in the discussion, but some points discussed below warrant further attention.

      Specific comments:

      1) snRNAseq:

      a. Major concerns with the snRNAseq dataset are A) the low recovery rate of putative LC-neurons in the snRNAseq dataset, B) the fact that the LC neuron cluster is contaminated with mitochondrial RNA, and C) that a large fraction of the nuclei cannot be assigned to a clear cell type (presumably due to contamination or damaged nuclei). The authors chose to enrich for neurons using NeuN antibody staining and FACS. But it is difficult to assess the efficacy of this enrichment without images of the nuclear suspension obtained before FACS, and of the FACS results. As this field is in its infancy, more detail on preliminary experiments would help the reader to understand why the authors processed the tissue the way they did. It would be nice to know whether omitting the FACS procedure might in fact result in higher relative recovery of LC-neurons, or if the authors tried this and discovered other technical issues that prompted them to use FACS.

      Thank you for these comments. We agree these are valid concerns in assessing the data quality and validity of the findings from the snRNA-seq dataset. We will respond to these concerns here to the best of our ability, but in some cases, we do not have definitive answers since comparison data are not yet available for this region. In particular, we were limited in resources for this initial study -- some of the results of the study and issues that we identified in attempting to molecularly profile cells in the human LC were surprising to us, and we intend to generate additional samples and troubleshoot these issues to improve data quality and increase recovery in future work. However, these experiments are (i) expensive, (ii) time- and labor-intensive, and (iii) the tissue for this region is limited and difficult to ascertain. Given the extremely small size of the LC, the tissue resource is quickly depleted. For this study, we had fixed resources and made best-guess decisions on how to proceed with the experimental design, based on our experience with snRNA-seq in other human brain regions (Tran and Maynard et al. 2021). However, the LC is a unique region, and our experiences with this dataset will guide us to make technical adjustments in future studies. Due to the limitations in the tissue resources and the lack of data currently available to the community, we wanted to share these results immediately while acknowledging the limitations of the study as we work to increase our resource availability to expand molecular and spatial profiling studies in this region of the human brain.

      Regarding the reviewer’s concern that our choice to use FANS to enrich for neurons could have potentially led to more damage and contributed to the low recovery rate of LC-NE neurons and the mitochondrial contamination -- we do not have a definitive answer to this question, since we did not perform a direct comparison with non-sorted data. As noted above, our limited tissue resource dictated that we could not do both. We made the decision to enrich for neurons based on our previous experience with identifying relatively rare populations in other brain regions (e.g. nucleus accumbens and amygdala; Tran and Maynard et al. 2021). Based on this previous work, our rationale was that without neuronal enrichment, we could potentially miss the LC-NE population, given the relative scarcity of this neuronal population. The low recovery rate and relatively lower quality / contamination issues may be due to technical issues that lead to LC-NE neurons being more susceptible to damage during nuclear preparation and sorting. We agree that directly comparing to data prepared without NeuN labeling and sorting is reasonable, as the additional perturbations may indeed contribute to cell damage. As mentioned in the discussion, we do not have a definitive answer to the reasons for increased mitochondrial contamination and we suspect that multiple technical factors may contribute -- including the relatively large size and increased fragility of LC-NE neurons. We agree that systematically optimizing the preparation to attempt to increase recovery rate and decrease mitochondrial contamination are important avenues for future work.

      b. It is unclear what percentage of cells make up each cluster.

      We will add this information in the clustering heatmaps or as a supplementary plot in a revised version of the manuscript.

      c. The number of subjects used in each analysis was not always clear. Only 3 subjects were used for snRNAseq, and one of them only yielded 4 LC-nuclei. This means the results are essentially based on n=2. The authors report these numbers in the corresponding section, but the first sentence of the results section (and Figure 1C specifically!) create the impression that n=5 for all analyses. Even for spatial transcriptomics, if I understood it correctly, 1 sample had to be excluded (n=4).

      This is correct. We will update the figures and text in a revised version of the manuscript to make this limitation (small sample size) more clear, and to further emphasize that the intention of this study is to provide initial data to help determine next steps and best practices for a larger scale and more comprehensive study on this region, especially given the limited availability of tissue resources and currently limited data resources available for this region.

      2) Spatial transcriptomics:

      a. It is not clear to me what the spatial transcriptomics provides beyond what can be shown with snRNAseq, nor how these two sets of results compare to each other. It would be more intuitive to start the story with snRNAseq and then try to provide spatial detail using spatial transcriptomics. The LC is not a homogeneous structure but can be divided into ensembles based on projection specificity. Spatial transcriptomics could - in theory - offer much-needed insights into the spatial variation of mRNA profiles across different ensembles, or as a first step across the spatial (rostral/caudal, ventral/dorsal) extent of the LC. The current analyses, however, cannot address this issue, as the orientation of the LC cannot be deduced from the slices analyzed.

      We understand the point of the reviewer. However, we structured the manuscript in this format due to our aims of creating a data resource for the community as well as being transparent about the limitations of our study. Our experiments began with the spatial experiments on the tissue blocks because this (i) helped orient ourselves to the region, and (ii) provided guidance for how best to score the tissue blocks for the snRNA-seq experiments to maximize recovery of LC-NE neurons. Therefore, we also decided to present the results in this sequence.

      The spatial data also provides more information in that the measurements are from nuclei, cytoplasm, and cell processes (instead of nuclei only). This is one of the main differences / advantages between the platforms at this level of spatial resolution. As noted above, we were also working with a finite tissue resource -- if we ran snRNA-seq first and captured no neurons, the tissue block would be depleted. Due to the logistics / thickness of the required tissue sections for Visium and snRNA-seq respectively, running Visium first allowed us to ensure that we could collect data from both assays.

      Regarding a point raised below on why we only ran snRNA-seq on a subset of the donors -- this was due to resource depletion and not enough available tissue remaining on the tissue blocks to run the assay. We have conducted extensive piloting in other brain regions on the amount (mg) of tissue that is needed from various sized cryosections, and the LC is particularly difficult since these are small tissue blocks and the extent of the structure is small. Hence, in some of the subjects, we did not have sufficient tissue available for the snRNA-seq assay.

      We agree with the reviewer that spatial studies could, in future work, offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other small, challenging brain regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies with larger numbers of donors and samples, e.g. spaced serial sections across the extent of the LC to make these types of insights. Due to the rarity of the tissue, limited availability of information in this region, and high expense of conducting these studies, we want to share this initial data with the community immediately. We also note that in addition to the 10x Genomics Visium platform, which lacks cellular and sub-cellular resolution, many new and exciting spatial platforms are entering the market, which may be able to address questions in very small regions such as the LC at higher spatial resolution.

      b. Unfortunately, spatial transcriptomics itself is plagued by sampling variability to a point where the RNAscope analyses the authors performed prove more powerful in addressing direct questions about gene expression patterns. Given that the authors compare their results to published datasets from rodent studies, it is surprising that a direct comparison of genes identified with spatial transcriptomics vs snRNAseq is lacking (unless this reviewer missed this comparison). Supplementary Figure 17 seems to be a first step in that direction, but this is not a gene-by-gene comparison of which analysis identifies which LC-enriched genes. Such an analysis should not compare numbers of enriched genes using artificial cutoffs for significance/fold-change, but rather use correlations to get a feeling for which genes appear to be enriched in the LC using both methods. This would result in one list of genes that can serve as a reference point for future work.

      We agree this is a good suggestion, and will add additional computational analyses to address this point in a revised version of the manuscript.
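
      A minimal sketch of what such a cutoff-free, gene-by-gene comparison could look like: take a per-gene LC-enrichment log fold change from each platform and rank-correlate them over the genes measured by both. The function names, gene labels, and all numeric values below are hypothetical placeholders for illustration, not results or code from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def compare_enrichment(lfc_a, lfc_b):
    """Spearman correlation of log fold changes over genes present in both dicts."""
    shared = sorted(set(lfc_a) & set(lfc_b))
    a = [lfc_a[g] for g in shared]
    b = [lfc_b[g] for g in shared]
    return shared, pearson(ranks(a), ranks(b))


# Hypothetical placeholder values (gene label -> LC-enrichment log fold change).
visium_lfc = {"geneA": 3.0, "geneB": 2.5, "geneC": 2.0, "geneD": 1.0, "geneE": 0.1}
snrna_lfc = {"geneA": 4.0, "geneB": 3.5, "geneC": 3.0, "geneD": 0.5, "geneF": 0.0}

shared, rho = compare_enrichment(visium_lfc, snrna_lfc)
print(shared, rho)
```

      Ranking the shared genes by their combined enrichment would then give the single reference list the reviewer describes, without imposing a significance or fold-change threshold on either platform.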

      c. Maybe the spatial transcriptomics could be useful to look at the peri-LC region, which has generated some excitement in rodent work recently, but remains largely unexplored in humans.

      We agree this is an excellent suggestion -- assessing cross-species comparisons related to convergence, especially of GABAergic cell populations in the human LC, is of high interest. We note that these types of extensions are exactly the reason why we have provided the publicly accessible web app (R/Shiny app, which includes the ability to annotate regions). We hope that others will use these apps for specialized topics they are interested in. As discussed above, we note that our initial dissections precluded the ability to keep track of the exact orientation of our tissue sections on the Visium arrays with respect to their location within the brainstem, so definitive localization of this region across subjects is difficult in our current study. However, it is possible, for example, to investigate whether there is a putative, densely GABAergic peri-LC region that is homologous with the GABAergic peri-LC region in rodents. We also raise attention to a recent preprint by Luskin and Li et al. (2022), who apply snRNA-seq and spatially-resolved transcriptomics to molecularly define both LC and peri-LC cell types in mice -- in a revised version of our manuscript, we will extend our computational analyses of inhibitory neuronal subtypes in our data (Supplementary Figures 13, 16) to directly compare with those identified in this study in more detail. As noted above, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation of sections from the tissue block onto a single spatial array, and we feel that, combined with optimized dissection strategies for this region and RNAscope for GABAergic markers on serial sections as a guide, annotating the peri-LC region on spatial arrays in future studies will be possible.

      3) The comparison of snRNAseq data to published literature is laudable. Although the authors mention considerable methodological differences between the chosen rodent work and their own analyses, this needs to be further explained. The mouse dataset uses TRAPseq, which looks at translating mRNAs associated with ribosomes, very different from the nuclear RNA pool analyzed in the current work. The rat dataset used single-cell LC laser microdissection followed by microarray analyses, leading to major technical differences in terms of tissue processing and downstream analyses. The authors mention and reference a recent 10x mouse LC dataset (Luskin et al, 2022), however they only pick some neuropeptides from this study for their analysis of interneuron subtypes (Figure S13). Although this is a very interesting part of the manuscript, a more in-depth analysis of these two datasets would be very useful. It would likely allow for a better comparison between mouse and human, given that the technical approach is more similar (albeit without FACS), and Luskin et al have indicated that they are willing to share their data.

      As noted above, we plan to extend our comparisons with the dataset from Luskin and Li et al. (2022) in a revised version of the manuscript, which will provide a more in-depth cross-species comparison. In addition, we also note that there are some additional recent studies using TRAPseq of LC-NE neurons in a functional context, i.e. treatment vs. control experiments or in model systems (e.g. Iannitelli et al. 2023), which provide new opportunities for understanding disease context using in-depth cross-species comparisons. By providing our dataset and reproducible code, we will enable others to adapt and extend these types of comparisons (i.e. TRAPseq of LC-NE neurons or LC snRNA-seq following functional manipulations or in the context of disease or behavioral models) in the future.

      4) Statements in the manuscript about the unexpected identification of a 5-HT (serotonin) cell-cluster seem somewhat contradictory. Figure S14 suggests that 5-HT markers are expressed in the LC-regions just as much as anywhere else, but the RNAscope image in Figure S15 suggests spatial separation between these two populations. And Figure S17 again suggests almost perfect overlap between the LC and 5HT clusters. Maybe I misunderstood, in which case the authors should better clarify/explain these results.

      In our view, the most likely scenario is that the 5-HT neurons come from contamination from the dorsal raphe nucleus based on spatial separation from the RNAscope images, which we agree are more definitive. As mentioned above, since we do not have definitive documentation for the tissue sections in terms of orientation, it is difficult to say with clarity that the regions are the dorsal raphe and which sub-portion of the dorsal raphe they are. This initial study has now allowed us to optimize and improve our dissection strategy and approaches for retaining documentation of the orientation of the tissue sections from their intact position within the brainstem as they move from cryosection to placement on the array, which will enable us to better annotate regions with definitive anatomical information with respect to the rostral/caudal and dorsal/ventral axes in future experiments. Given that there are reports in the rodent that 5-HT markers have been identified in LC-NE neurons (Iijima 1993; Iijima 1989), and taking into account the technical limitations in our study, we felt that it was premature to definitively conclude in the manuscript that we were sure these signals arose from the dorsal raphe. We will update this language in a revised version of the manuscript to ensure that these limitations are clear (referring to Supplementary Figures S14-15, S17).

      Reviewer #2 (Public Review):

      The data generated for this paper provides an important resource for the neuroscience community. The locus coeruleus (LC) is the known seed of noradrenergic cells in the brain. Due to its location and size, it remains scarcely profiled in humans. Despite the physically minute structure containing these cells, its impact is wide-reaching due to the known neuromodulatory function of norepinephrine (NE) in processes like attention and mood. As such, profiling NE cells has important implications for most neurological and neuropsychiatric disorders. This paper generates transcriptomic profiles that are not only cell-specific but which also maintain their spatial context, providing the field with a map for the cells within the region.

      Strengths:

      Using spatial transcriptomics in a morphologically distinct region is a very attractive way to generate a map. Overlaying macroscopic information, i.e. a region with greater pigmentation, with its corresponding molecular profile in an unbiased manner is an extremely powerful way to understand the specific cellular and molecular composition of that brain structure.

      The technologies were used with an astute awareness of their limitations, as such, multiple technologies were leveraged to paint a more complete and resolved picture of the cellular composition of the region. For example, the lack of resolution in the spatial transcriptomic platform was compensated by complementary snRNA-seq and single molecule FISH.

      This work has been made publicly available and accessible through a user-friendly application such that any interested researcher can investigate the level of expression of their gene of interest within this region.

      Two important implications from this work are 1) the potential that the gene regulatory profiles of these cells are only partially conserved across species, humans, and rodents, and 2) that there may be other neuromodulatory cell types within the region that were otherwise not previously localized to the LC.

      Weaknesses:

      Given that the markers used to identify cells are not as specific as they need to be to definitively qualify the desired cell type, the results may be over-interpreted. Specifically, TH is the primary marker used to qualify cells as noradrenergic, however, TH catalyzes the synthesis of L-DOPA, a precursor to dopamine, which in turn is a precursor for epinephrine and norepinephrine, suggesting some of the cells in the region may be dopaminergic and not NE cells. Indeed, there are publications to support the presence of dopaminergic cells in the LC (see Kempadoo et al. 2016, Takeuchi et al., 2016, Devoto et al. 2005). This discrepancy is further highlighted by the apparent lack of overlap per given Visium spots with TH, SLC6A2, or DBH. While the single-nucleus FISH confirms that some of the cells in the region are noradrenergic, others very possibly represent a different catecholamine. As such it is suggested that the nomenclature for the cells be reconsidered.

      We appreciate the reviewer’s comment, and are aware of the reports suggesting the potential presence of dopaminergic cells in the LC. We initially had the same thought as the reviewer when we observed Visium spots in the spatial data with lack of overlap between TH, SLC6A2, and DBH as well as single nuclei in the snRNA-seq data with lack of overlap between TH, SLC6A2, and DBH. This surprising result was exactly why we performed the smFISH/RNAscope experiment with these three marker genes. Given known issues with read depth and coverage in the 10x Genomics assays, we wanted to better understand if this was a technical limitation in the sequencing coverage, or rather a true biological finding. The RNAscope data showed very clearly that nearly every cell body we looked at had co-localization of these three marker genes. We included an image from a single capture array of one tissue section in Supplementary Figure 11, but could, in a revised version of the manuscript, provide additional examples to illustrate how conclusive the images were by visualization. As such, we were quite convinced that the lack of overlap on Visium spots and in single nuclei in the snRNA-seq data was more likely related to technical issues with sequencing coverage, rather than a biological finding. We also note that we checked for the presence of the dopamine transporter, SLC6A3, and as can be appreciated in the iSEE web app for the snRNA-seq data or the R/Shiny web app for the Visium data, there is virtually no expression of SLC6A3 in the dataset, which in our view provides additional evidence against the possibility that there are substantial quantities of dopaminergic cells in this human LC dataset. We will include supplementary plots showing the lack of SLC6A3 expression in a revised version of the manuscript.

      The authors are unable to successfully implement unsupervised clustering with the spatial data, this greatly reduces the impact of the spatial technology as it implies that the transcriptomic data generated in the study did not have enough resolution to identify individual cell types.

      The reviewer is correct -- this is a fundamental limitation of the 10x Genomics Visium platform, i.e. the spatial resolution captures multiple cells per spot (e.g. around 1-10 cells per spot in human brain tissue). We note that new spatial platforms now provide cellular resolution (e.g. Vizgen MERSCOPE, 10x Genomics Xenium, 10x Genomics Visium HD), which will help address this in future work. However, many of these cellular-resolution in situ sequencing platforms have the limitation that they do not quantify genome-wide expression, and instead require users to select a priori gene panels to investigate. This is a problem if no genome-wide reference datasets are available. Hence, despite the limited spatial resolution of the Visium platform, this dataset is useful precisely for helping investigators choose gene panels for higher-resolution platforms or higher-order smFISH multiplexing.

      We also applied spatial clustering (using BayesSpace; Zhao et al. 2021) to attempt to segment the LC regions within the Visium samples in a data-driven manner as an alternative to the manual annotations, which was unsuccessful (and hence we relied on the manually annotated regions for downstream analyses) (Supplementary Figure S5). However, this is a different application of unsupervised clustering, which is separate from the task of identifying cell types.

      The sample contribution to the results is highly unbalanced, which consequently, may result in ungeneralizable findings in terms of regional cellular composition, limiting the usefulness of the publicly available data.

      We acknowledge the limitations of the work due to the small/unbalanced sample sizes. As mentioned above for Reviewer 1, this was an initial study in this region -- results of which will inform our (and hopefully others’) experimental design and approach to molecular profiling in this difficult to access brain region. Overall, this study was executed with finite tissue and financial resources and was intended to uncover limitations and help develop best practices and design workflows for future studies with larger numbers of donors and samples. Given the limited data availability for this brain region, we wanted to make this dataset available for the research community immediately. In addition, we note that making this genome-wide dataset available will help inform targeted gene panel design for higher-resolution platforms (e.g. 10x Genomics Xenium).

      This study aimed to deeply profile the LC in humans and provide a resource to the community. The combination of data types (snRNA-seq, SRT, smFISH) does in fact represent this resource for the community. However, due to the limitations, of which, some were described in the manuscript, we should be cautious in the use of the data for secondary analysis. For example, some of the cellular annotations may lack precision, the cellular composition also may not reflect the general population, and the presence of unexpected cell types may represent the accidental inclusion of adjacent regions, in this case, serotonergic cells from the Raphe nucleus.

      We agree, and have attempted to explain these limitations in the manuscript. We will clarify the language regarding the interpretation of the annotated cell populations and unexpected cell types, and the limited sample sizes, in a revised version of the manuscript.

      Nonetheless having a well-developed app to query and visualize these data will be an enormous asset to the community especially given the lack of information regarding the region in general.

      Reviewer #3 (Public Review):

      […] This study has many strengths. It is the first reported comprehensive map of the human LC transcriptome, and uses two independent but complementary approaches (spatial transcriptomics and snRNA-seq). Some of the key findings confirmed what has been described in the rodent LC, as well as some intriguing potential genes and modules identified that may be unique to humans and have the potential to explain LC-related disease states. The main limitations of the study were acknowledged by the authors and include the spatial resolution probably not being at the single cell level and the relatively small number of samples (and questionable quality) for the snRNA-seq data. Overall, the strengths greatly outweigh the limitations. This dataset will be a valuable resource for the neuroscience community, both in terms of methodology development and results that will no doubt enable important comparisons and follow-up studies.

      Major comments:

      Overall, the discovery of some cells in the LC region that express serotonergic markers is intriguing. However, no evidence is presented that these neurons actually produce 5-HT.

      The reviewer is correct that we did not provide any additional evidence to show that these neurons actually produce 5-HT. As noted above in the response to Reviewer 1, in our view, the most likely explanation is that these neurons are from dorsal raphe contamination on the tissue section. However, due to technical and logistical limitations in this study, we could not definitively say this because we did not clearly track the orientation of the tissue sections, and we did not have remaining tissue sections from all donor tissue blocks to repeat RNAscope experiments. For some of the donors, where we had remaining tissue sections to go back to repeat RNAscope experiments after completion of the snRNA-seq and Visium assays, we could see clear separation of the LC region / LC-NE neuron core from where putative 5-HT neurons were located (Supplementary Figure 15). However, we did not have sufficient tissue resources to map this definitively in all donors, and the orientation and anatomy of each tissue block were not fully annotated.

      Due to the lack of clarity, and the fact that there have been reports that LC-NE neurons express serotonergic markers (Iijima 1993; Iijima 1989), we felt that it was premature to declare that the putative 5-HT neurons we identified were definitively from the raphe. We will clarify the language around this discrepancy in a revised version of the manuscript to ensure that these limitations are clearly described.

      Concerning the snRNA-seq experiments, it is unclear why only 3 of the 5 donors were used, particularly given the low number of LC-NE nuclear transcriptomes obtained, why those 3 were chosen, and how many 100 μm sections were used from each donor. It is also unclear if the 295 nuclei obtained are truly representative of the LC population or whether they are just the most "resilient" LC nuclei that survive the process.

      As discussed above for Reviewer 1, the reason we included only 3 of the 5 donors for the snRNA-seq assays was due to the tissue availability on the tissue blocks. We will clarify the language in a revised version of the manuscript to make this limitation more clear. We will also include additional details in the Methods section on the number of 100 μm sections used for each donor (which varied between 10-15, approximating 60-80 mg of tissue).

      The LC displays rostral/caudal and dorsal/ventral differences, including where they project, which functions they regulate, and which parts are vulnerable in neurodegenerative disease (e.g. Loughlin et al., Neuroscience 18:291-306, 1986; Dahl et al., Nat Hum Behav 3:1203-14, 2019; Beardmore et al., J Alzheimer's Dis 83:5-22, 2021; Gilvesy et al., Acta Neuropathol 144:651-76, 2022; Madelung et al., Mov Disord 37:479-89, 2022). It was not clear which part(s) of the LC was captured for the SRT and snRNAseq experiments.

      As discussed above for Reviewer 1, a limitation of this study was that we did not record the orientation of the anatomy of the tissue sections, precluding our ability to annotate the tissue sections with the rostral/caudal and dorsal/ventral axis labels. We agree with the reviewer that additional spatial studies, in future work, could offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other small, challenging brain regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies with larger numbers of donors and samples in order to make these types of insights.

      The authors mention that in other human SRT studies, there are typically between 1-10 cells per expression spot. I imagine that this depends heavily on the part of the brain being studied and neuronal density, but it was unclear how many LC cells were contained in each expression spot.

      The reviewer is correct that we did not include this information in the manuscript. We attempted to apply a computational method to count nuclei contained in each gene expression spot based on analyzing the histological H&E images (VistoSeg; Tippani et al. 2022), which we have developed and previously applied in data from the dorsolateral prefrontal cortex (DLPFC) (Maynard and Collado-Torres et al. 2021). Based on the segmentation using this workflow we observe that the counts in this region are similar to what we observed in the DLPFC, i.e., typically between 1-10 LC cells per expression spot, with approximately 1-2 LC-NE neurons (which are characterized by their large size) per expression spot. However, these analyses had several technical issues related to the images themselves, the relatively large size and pigmentation of LC-NE neurons, and parameter settings that had been optimized for different brain regions. We are currently optimizing this analysis workflow for these images to provide more accurate estimates of cell counts per spot to give readers additional context on the number of nuclei per spot in the annotated LC regions and outside the LC regions in a revised version of the manuscript.

      Regarding comparison of human LC-associated genes with rat or mouse LC-associated genes (Fig. 2D-F), the authors speculate that the modest degree of overlap may be due to species differences between rodents and human and/or methodological differences (SRT vs microarray vs TRAP). Was there greater overlap between mouse and rat than between mouse/rat and human? If so, that is evidence for the former. If not, that is evidence for the latter. Also would be useful for more in-depth comparison with snRNA-seq data from mouse LC: https://www.biorxiv.org/content/10.1101/2022.06.30.498327v1.

      We will investigate this question and discuss this in updated results in a revised version of the manuscript.
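
      The reviewer's test can be sketched as a pairwise set-overlap calculation: if the mouse-rat overlap clearly exceeds both rodent-human overlaps, that points toward species differences; if all three are similarly modest, methodology is the more likely driver. The sketch below uses the Jaccard index, assumes gene symbols have already been mapped to a common ortholog naming scheme, and uses hypothetical placeholder gene sets (`G1`, `G2`, ...), not the actual LC-associated gene lists.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of intersection divided by size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Jaccard index for every pair of named gene sets."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


# Hypothetical placeholder gene sets, assumed to be ortholog-mapped.
gene_sets = {
    "human": {"G1", "G2", "G3", "G4"},
    "mouse": {"G1", "G2", "G3", "G5"},
    "rat": {"G1", "G2", "G5", "G6"},
}

overlaps = pairwise_overlap(gene_sets)
for pair, value in overlaps.items():
    print(pair, round(value, 3))
```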

      The finding of ACHE expression in LC neurons is intriguing, especially in light of work from Susan Greenfield suggesting that ACHE has functions independent of ACH metabolism that contributes to cellular vulnerability in neurodegenerative disease.

      We thank the reviewer for pointing this out. We were very surprised too by the observed expression of SLC5A7 and ACHE in the LC regions (Visium data) and within the LC-NE neuron cluster (snRNA-seq data), coupled with absence of other typical cholinergic marker genes (e.g. CHAT, SLC18A3), and we do not have a compelling explanation or theory for this. Hence, the work of Susan Greenfield and colleagues suggesting non-cholinergic actions of ACHE, particularly in other catecholaminergic neurons (e.g. dopaminergic neurons in the substantia nigra) is very interesting. We will include references to this work and how it could inform interpretation of this expression in a revised version of the manuscript (Greenfield 1991; Halliday and Greenfield 2012).

      High mitochondrial reads from snRNA-seq can indicate lower quality. It was not clear why, given the mitochondrial read count, the authors are confident in the snRNA-seq data from presumptive LC-NE neurons.

      We will include additional analyses to further investigate and/or confirm this finding (e.g. comparing sum of UMI counts / number of detected genes and mitochondrial percentage per nucleus for this population to confirm data quality) in additional supplementary figures in a revised version of the manuscript.
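
      For readers unfamiliar with these quality-control metrics, the sketch below shows how the three per-nucleus quantities named above (sum of UMI counts, number of detected genes, mitochondrial percentage) can be derived from a genes-by-nuclei count matrix. It assumes mitochondrial genes are identified by the `MT-` symbol prefix used for human genes; the function name, gene labels, and counts are hypothetical and do not come from the study.

```python
def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-nucleus QC metrics.

    counts: list of rows, one row per gene, each holding one UMI count per nucleus.
    gene_names: gene symbol for each row of counts.
    """
    n_nuclei = len(counts[0])
    is_mito = [g.startswith(mito_prefix) for g in gene_names]
    metrics = []
    for j in range(n_nuclei):
        col = [row[j] for row in counts]
        total = sum(col)                               # sum of UMI counts
        detected = sum(1 for c in col if c > 0)        # genes with at least one UMI
        mito = sum(c for c, m in zip(col, is_mito) if m)
        pct = 100.0 * mito / total if total else 0.0   # mitochondrial percentage
        metrics.append(
            {"sum_umi": total, "detected_genes": detected, "mito_percent": pct}
        )
    return metrics


# Hypothetical toy matrix: 4 genes (rows) x 2 nuclei (columns).
gene_names = ["MT-CO1", "MT-ND1", "GENE_A", "GENE_B"]
counts = [
    [10, 0],
    [10, 5],
    [60, 0],
    [20, 15],
]

metrics = qc_metrics(counts, gene_names)
for m in metrics:
    print(m)
```

      Comparing the distributions of these metrics between the LC-NE cluster and the other neuronal clusters is one way to judge whether a high mitochondrial fraction coincides with low library complexity.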

      References

      • Greenfield (1991), A noncholinergic action of acetylcholinesterase (AChE) in the brain: from neuronal secretion to the generation of movement, Cellular and Molecular Neurobiology, 11, 1, 55-77.

      • Halliday and Greenfield (2012), From protein to peptides: a spectrum of non-hydrolytic functions of acetylcholinesterase, Protein & Peptide Letters, 19, 2, 165-172.

      • Iannitelli et al. (2023), The neurotoxin DSP-4 dysregulates the locus coeruleus-norepinephrine system and recapitulates molecular and behavioral aspects of prodromal neurodegenerative disease, eNeuro, 10, 1, ENEURO.0483-22.2022.

      • Iijima K. (1989), An immunocytochemical study on the GABA-ergic and serotonin-ergic neurons in rat locus ceruleus with special reference to possible existence of the masked indoleamine cells, Acta Histochemica, 87, 1, 43-57.

      • Iijima K. (1993), Chemocytoarchitecture of the rat locus ceruleus, Histology and Histopathology, 8, 3, 581-591.

      • Luskin A.T., Li L. et al. (2022), A diverse network of pericoerulear neurons control arousal states, bioRxiv (preprint).

      • Maynard and Collado-Torres et al. (2021), Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex, Nature Neuroscience, 24, 425-436.

      • Tippani et al. (2022), VistoSeg: processing utilities for high-resolution Visium/Visium-IF images for spatial transcriptomics data, bioRxiv (preprint).

      • Tran M.N., Maynard K.R. et al. (2021), Single-nucleus transcriptome analysis reveals cell-type-specific molecular signatures across reward circuitry in the human brain, Neuron, 109, 3088-3103.

      • Zhao E. et al. (2021), Spatial transcriptomics at subspot resolution with BayesSpace, Nature Biotechnology, 39, 1375-1384.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2:

      Line 295 – was the time post-infection, which varies considerably between groups and across samples, taken into consideration when comparison of response was between ChatCre mice (4-9 weeks post-infection) and WT mice (four to five weeks post-infection)?

      Thank you for your comment. We did not originally assess the effects of time post-injection on DREADD response. Generally, AAV transgene expression has been demonstrated to be long-term and stable in the CNS of mice.[1] However, there is some variation in the reporting time of peak transgene expression[2], and this may potentially impact our results.

      In investigating this issue further, we discovered an error in our reporting as we did have n = 1 wild-type mouse that underwent EMG recordings 62 days (~9 weeks) post-AAV injection. This has been corrected in the manuscript (lines 87-88).

      Addressing this question is challenging due to the uneven distribution of time points within the 4–9-week windows for each group. Essentially, there were two groups per cohort, one studied at 4-5 weeks and one at 8-9 weeks. More specifically:

      - Wild-type cohort: n = 10 animals were studied 28–33 days post-injection, and n = 1 at 62 days.

      - ChAT-Cre cohort: n = 4 animals were studied 28–30 days post-injection, and n = 5 at 56–59 days.

      We performed Pearson correlation analyses between time post-injection and diaphragm EMG response to DREADD activation (peak amplitude and area under the curve, AUC) for both cohorts (Author response image 1):

      - ChAT-Cre: No significant correlations were found (peak amplitude: r<sup>2</sup> = -0.117, r = -0.1492, p = 0.702, Figure 1a-b; AUC: r<sup>2</sup> = -0.0883, r = 0.2184, p = 0.572, Figure 1c-d).

      - Wild type: Initial analysis of all data showed significant correlations (peak amplitude: r<sup>2</sup> = 0.362, r = 0.6523, p = 0.0296, Figure 1a; AUC: r<sup>2</sup> = 0.347, r = 0.6424, p = 0.033, Figure 1c), suggesting a moderate positive correlation between time post-injection and EMG response. However, when the single 8–9-week wild-type mouse was excluded, these correlations were no longer significant (peak amplitude: r<sup>2</sup> = 0.172, r = 0.5142, p = 0.128, Figure 1b; AUC: r<sup>2</sup> = 0.23, r = 0.5614, p = 0.0913, Figure 1d).
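
      The negative r<sup>2</sup> values above can look contradictory, since a squared correlation coefficient cannot be negative. A quick check suggests that the reported r<sup>2</sup> values are adjusted r<sup>2</sup>: each one can be reproduced from the reported r and the cohort size (n = 9 for ChAT-Cre; n = 11, or n = 10 with one animal excluded, for wild type) using the standard adjustment for a single predictor. This is an inference from the numbers, not something stated in the response, and the helper name `adjusted_r2` is hypothetical. A minimal Python sketch:

```python
def adjusted_r2(r, n, k=1):
    """Adjusted R^2 computed from Pearson r, sample size n, and k predictors.

    Hypothetical helper for checking the reported values; not the authors' code.
    """
    r2 = r * r
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# (label, reported r, n) as given in the response text above
reported = [
    ("ChAT-Cre peak amplitude", -0.1492, 9),
    ("ChAT-Cre AUC", 0.2184, 9),
    ("Wild type peak amplitude, all animals", 0.6523, 11),
    ("Wild type AUC, all animals", 0.6424, 11),
    ("Wild type peak amplitude, one excluded", 0.5142, 10),
    ("Wild type AUC, one excluded", 0.5614, 10),
]

for label, r, n in reported:
    print(f"{label}: adjusted r2 = {adjusted_r2(r, n):.4f}")
```

      With n = 9 and a weak correlation, the adjustment factor (n - 1)/(n - 2) = 8/7 is enough to push the adjusted value below zero, which would explain the negative entries for the ChAT-Cre cohort.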

      Comparing wild-type and ChAT-Cre groups directly was unreliable due to the single wild-type mouse studied at the later time point. We attempted to model time post-injection as a continuous variable (i.e., exact days post-injection) using a restricted maximum likelihood mixed linear model in JMP; however, the analysis could not be performed because there were not sufficient overlapping time points between the two cohorts (i.e., not all days post-injection were represented in both groups). To mitigate this, we binned animals into two groups: 4–5 weeks and 8–9 weeks post-injection. This analysis returned a significant interaction between cohort and time post-injection (p = 0.0391); however, none of the pairwise comparisons were significant on Tukey's post hoc test (i.e., p > 0.05).

      Based on these findings, we feel confident that time post-injection is unlikely to have a significant impact on diaphragm EMG response to DREADD activation in the ChAT-Cre cohort. However, in the wild-type cohort, it is difficult to draw definitive conclusions, as only one animal was studied at the 8–9-week time point. For similar reasons, it remains unclear whether the relationship between time post-AAV transduction and DREADD response differs between cohorts. Given the inconclusive nature of these results, we have elected not to include this analysis in the manuscript. Nevertheless, to ensure transparency, we have provided Author response image 1 below of peak amplitude and AUC plotted against time, allowing readers to evaluate the data independently.

      Author response image 1.

      Plots of diaphragm EMG peak amplitude (a-b) and area under the curve (c-d) vs. days post-AAV injection for wild-type (blue) and ChAT-Cre (orange) mice. Pearson correlation analyses were performed to assess the relationship between time post-AAV injection and diaphragm EMG DREADD response in wild-type and ChAT-Cre mouse cohorts. r<sup>2</sup>, r, and p-values are shown in each panel for both cohorts. Panels a and c display peak amplitude and AUC, respectively, including all animals. Panels b and d present the same variables with the n = 1 wild-type mouse at the 9-week time point excluded; ChAT-Cre data is unchanged between corresponding panels. Scatter points represent data from individual animals. Polynomial trendlines are displayed for each cohort with wild-type in blue and ChAT-Cre in orange.

      REFERENCES

      (1) Kim, J. Y., Grunke, S. D., Levites, Y., Golde, T. E. & Jankowsky, J. L. Intracerebroventricular viral injection of the neonatal mouse brain for persistent and widespread neuronal transduction. J Vis Exp, 51863 (2014). https://doi.org/10.3791/51863

      (2) Hollidge, B. S. et al. Kinetics and durability of transgene expression after intrastriatal injection of AAV9 vectors. Front Neurol 13, 1051559 (2022). https://doi.org/10.3389/fneur.2022.1051559


      The following is the authors’ response to the original reviews.

      Response to reviewer’s public reviews:

      We chose the dose of J60 based on a prior publication that established that off-target effects were possible at relatively high doses[1]. The dose that we used (0.1 mg/kg) was 30-fold less than the dose that was reported in that paper to potentially have off-target responses (3 mg/kg). Further, Author response image 1 shows the results of experiments in which J60 was given to animals that did not have the excitatory DREADD expressed in the spinal cord. This includes a sample of mice (n = 2) and rats (n = 3), recorded using the same diaphragm EMG procedure described in the manuscript. The figure shows that there was no consistent response to J60 at 0.1 mg/kg in the “control experiment” in which the DREADD was not expressed in the spinal cord.

      Author response image 1.

      Diaphragm EMG response to J60 administered to naïve rats and mice. Panels a-b show raw EMG values at baseline and following vehicle (saline) and J60 administration for the left and right hemidiaphragm. Panels c-d show EMG values normalized to baseline. Neither one-way RM ANOVA (panels a-b) nor paired t-test (panels c-d) returned a significant result at the p < 0.05 threshold.

      Response to specific reviewer comments:

      Reviewer #1:

      How old were the animals at the time of AAV injection, and in subsequent experiments?

      The wildtype cohort of mice was 7-9 weeks old at the time of AAV injection, and DREADD experiments took place 4-5 weeks after AAV injection. ChAT-Cre mice were 6-10 weeks old at the time of AAV injection, and DREADD experiments took place 4-9 weeks after AAV injection. ChAT-Cre rats were 2-5 months old at the time of AAV spinal injection. These animals underwent plethysmography recordings 3-4 months post-AAV injection and subsequently phrenic nerve recording 3-8 weeks later. These details have been added to the Methods section.

      How many mice were excluded from electrophysiology experiments due to deteriorating electrode contact?

      No mice were excluded from electrophysiology experiments due to deteriorating electrode contact. If you are referring to the n = 1 excluded ChAT-Cre mouse (line 368) this animal was excluded because it showed no histological evidence of DREADD expression (lines 200-206).

      What was the urethane dose?

      The urethane dose for phrenic nerve recordings was 2.1 g/kg. See methods section line 395.

      A graphical timeline of the experimental progression for plethysmography and electrophysiology studies would enhance clarity.

      A graphical timeline has been added. See Figure S6.

      Significance indicators in the figures would greatly enhance clarity. It is a little awkward to have to refer to supplemental tables to figure out statistical differences.

      Significance indicators have been added. See Figures 1, 2, 4, and 5.

      In Figures 1, 2, and 5, individual data points should be shown, as in Fig 4.

      Thank you for this suggestion. We agree that, in general, it is best practice to scatter individual data points. However, when we drafted the new figures, it was apparent that including individual scatter points, in this case, created very “cluttered” figures that were very difficult to interpret.

      More detail regarding the plethysmography studies is needed. Was saline/J60 infused via a tail vein catheter? Were animals handled during the infusion? How long is the "IV" period? What volume of fluid was delivered?

      All IV infusions were delivered via a tail vein catheter. Animals were not handled during infusion or at any other point during the recording. An IV catheter was externalized via a port in the plethysmograph, allowing IV infusion without handling the animal or opening the plethysmograph. The infusion period for both saline and J60 was standardized to 2 minutes. The volume of both saline and J60 was standardized to 0.6 mL. This information has been added to the methods section (lines 408-410, 415-416, 419-420).

      Reviewer #2:

      The abstract could be improved by briefly highlighting the rationale, scope, and novelty of the study - the intro does a great job of highlighting the scope of the study and the research questions.

      A brief explanation of the rationale, scope, and novelty of the study has been added to the abstract. See lines 2-8.

      Line 18, specify that this was done under urethane anesthesia.

      This detail has been added to the abstract (line 20).

      The methods section should be moved to the end of the manuscript according to Journal policy.

      The methods section has been moved to the end of the manuscript.

      The authors mention the use of both female and male rats but it is not indicated if they tested for and observed any differences between sexes across experiments.

      We included the use of both male and female animals in this study to improve the generalizability of the results. However, we were not adequately powered for sex comparisons and therefore did not perform any statistical analysis to assess differences between sexes across experiments. Text has been added to the methods section (lines 534-537) to clarify.

      Line 40, since delivery of J60 was performed in both IV and IP, this general statement should be updated.

      This detail has been revised to include both IV and IP. See line 43.

      Line 42. "First, we determined if effective diaphragm activation requires focal DREADD expression targeting phrenic motor neurons, or if non-specific expression in the immediate vicinity of the phrenic motor nucleus would be sufficient...." I don't think that in the experiments with wild-type mice the authors can claim that they selectively targeted the cervical propriospinal network (in isolation from the motoneurons). Given the fact that the histological analysis did not quantify interneurons or motoneurons in the spinal cord, authors should be cautious in proposing which neuronal population is activated in the non-specific approach.

      We agree, and this was a poorly worded statement in our original text. We agree that wild-type DREADD expression was not limited to the cervical propriospinal networks but likely a mix of interneurons and motoneurons. The text has been edited to reflect that (see lines 56-60).

      AAV virus source is not described.

      All AAVs were obtained from the UF Powell Gene Therapy Center. Details of virus source and production have been added to the methods section. See lines 336-347.

      Line 108-125. Because the diaphragm EMG recordings are only described for mice here, I would suggest editing this methods section to clearly state mice instead of vaguely describing "animals" in the procedure.

      “Animals” has been changed to “mice” to avoid ambiguity.

      Line 120, add parenthesis.

      Parenthesis has been added.

      Line 126. Whole body plethysmography protocol. Three hypercapnic hypoxic challenges are a lot for a rat within a 3-hour recording session in freely behaving rats. Did the authors verify with control/ vehicle experiments that repeated challenges in the absence of J60 do not cause potentiation of the response? I understand that it is not possible to invert the order of the injections (due to likely long-term effects of J60) or it is too late to perform vehicle and J60 injections on different days, but controls for repeated challenges should be performed in this type of experiment, especially considering the great variability in the response observed in Figure 4 (in normoxic conditions).

      We did not conduct control experiments to assess the impact of repeated hypercapnic hypoxic challenges on the naïve response (i.e., in the absence of J60). However, our experimental protocol was designed such that each experimental period (i.e., post-vehicle or post-J60 infusion) was normalized to baseline recordings taken immediately prior to the vehicle or J60 infusion. While repeated exposure to hypercapnic hypoxic challenges may have altered respiratory output, we are confident that normalizing each experimental period to its respective baseline effectively captures the impact of DREADD activation on ventilation, independent of any potential potentiation that may have occurred due to gas challenge exposure. We have included raw values for all plethysmography outcomes (see Figure 4, panels a-c) to ensure full data transparency. Still, we believe that the baseline-normalized values more accurately reflect the impact of DREADD activation on the components of ventilation.
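
      The normalization argument above can be illustrated with a small sketch. It assumes that normalization means expressing each post-infusion value as a percentage of the baseline recorded immediately before that infusion, and that any potentiation from earlier gas challenges scales the baseline and the response alike. Both are assumptions; the function name and the numbers are hypothetical placeholders in arbitrary units, not study data.

```python
def percent_of_baseline(value, baseline):
    """Express a measurement as a percentage of its own preceding baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * value / baseline


# Placeholder numbers (arbitrary units), chosen only to illustrate the idea.
# The second baseline is higher than the first, standing in for possible
# potentiation after an earlier gas challenge.
vehicle_period = percent_of_baseline(value=100.0, baseline=100.0)
j60_period = percent_of_baseline(value=180.0, baseline=120.0)

print(f"post-vehicle: {vehicle_period:.1f}% of its baseline")
print(f"post-J60: {j60_period:.1f}% of its baseline")
```

      Because each period is divided by its own baseline, a shift in baseline between periods does not by itself inflate the normalized J60 value. If potentiation changed the response to J60 differently from the baseline, this normalization would not remove it, which is why the raw values remain useful alongside the normalized ones.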

      Furthermore, why the response to the hypercapnic hypoxic challenges are not reported? These could be very interesting to determine the effects of DREADD stimulation on chemosensory responses and enhance the significance of the study.

      The response to the hypercapnic hypoxic challenges has been added to the manuscript. See Figure S3 and results section lines 162-167. Briefly, there were no statistically significant (p < 0.05) differences in tidal volume, respiratory rate, or minute ventilation between the J60 and sham conditions during hypercapnic-hypoxic ventilatory challenges.

      Line 200 - what is the reason behind performing a qualitative analysis of mCherry in various quadrants? This limits the interpretation of the results. If the authors used Chat-cre rats, the virus should only be in Chat+ MN. Knowing how selective the virus is, and whether its expression was selective for Phrenic MN versus other MN pools, could address several technical questions.

      We agree that detailed quantification of expression by motoneuron pool would be of value in future work. However, for these initial proof-of-concept experiments, we performed the quadrant-based qualitative analysis of mCherry expression to provide a simple comparison of mCherry expression between groups (i.e., ChAT-Cre vs. wildtype mice). This analysis allowed us to: 1) show the reader that each animal included in the study showed evidence of mCherry expression and 2) give the reader an idea of patterns of mCherry expression throughout the mid-cervical spinal cord. Additionally, it is important to note that while ChAT is a marker of motoneurons, some populations of interneurons also express ChAT(2-4).

      Given the increased values of Dia EMG AUC and no changes in respiratory rate, did the authors determine if there was a change in the inspiratory time with J60 administration?

      We did not assess inspiratory time.

      High death rate in DREADD WT mice - was histological analysis performed on these mice? Could it be due to the large volume injected into the spinal cord that affects not only descending pathways but also ascending ones? Or caused by neuronal death due to the large volume of viral solution injected in mice.

      Histological analysis was performed on these animals to assess mCherry expression only (i.e., no staining for NeuN or other markers was performed). While the reviewer's speculations are reasonable, we feel these reasons are unlikely to explain the death rate in DREADD WT mice, as ChAT-Cre mice received the same volume injected into their spinal cord and survived up to and through the diaphragm EMG recordings. Additionally, WT mice lived for 4-5 weeks post-injection, which is past the acute phase in which a large immune response to the viral dose would be expected.

      Line 299-304. Can you please clarify whether these rats were tested under anesthesia?

      These rats were assessed under anesthesia. This detail has been added (line 146).

      Given some of the unexpected results on cardiovascular parameters in urethane anesthetized rats, did the authors test the effects of J60 in the absence of AAV construct infection?

      A small cohort (n = 2) of urethane anesthetized naïve wildtype rats were given the J60 ligand (IV, 0.1 mg/kg dose). We did observe a sudden drop in blood pressure after J60 administration that was sustained for the duration of the recording. One animal showed a 12% decrease in mean arterial blood pressure following J60 administration while the other showed a 35% decrease. Thus, it does appear that in this preparation the J60 ligand is producing a drop in arterial blood pressure.

      Line 393. I believe this comment is referred to the intrapleural and diaphragmatic injection. Maybe this should be clarified in the sentence.

      This sentence has been revised for clarity (see lines 248-250).

      Figures 1 and 2. It would be informative to show raw traces of the Diaphragm EMG to demonstrate the increase in tonic EMG. It is not possible to determine that from the integrated traces in Figures 1A and B.

      Thank you for bringing up this concern. While the mean data in Figures 1F and 2F do indicate that, on average, animals had tonic diaphragm EMG responses to DREADD activation, the examples given in Figures 1A and 2A show minimal responses. This makes it difficult to fully appreciate the tonic response from those particular traces. However, clear tonic activity can be appreciated from Figures 5A and S2. In these figures, tonic activity is evident from the integrated EMG signals, presenting as a sustained increase in baseline activity between bursts—essentially an upward shift from the zero point.

      References

      (1) Van Savage, J. & Avegno, E. M. High dose administration of DREADD agonist JHU37160 produces increases in anxiety-like behavior in male rats. Behav Brain Res 452, 114553 (2023). https://doi.org/10.1016/j.bbr.2023.114553

      (2) Mesnage, B. et al. Morphological and functional characterization of cholinergic interneurons in the dorsal horn of the mouse spinal cord. J Comp Neurol 519, 3139-3158 (2011). https://doi.org/10.1002/cne.22668

      (3) Gotts, J., Atkinson, L., Yanagawa, Y., Deuchars, J. & Deuchars, S. A. Co-expression of GAD67 and choline acetyltransferase in neurons in the mouse spinal cord: A focus on lamina X. Brain Res 1646, 570-579 (2016). https://doi.org/10.1016/j.brainres.2016.07.001

      (4) Alkaslasi, M. R. et al. Single nucleus RNA-sequencing defines unexpected diversity of cholinergic neuron types in the adult mouse spinal cord. Nat Commun 12, 2471 (2021). https://doi.org/10.1038/s41467-021-22691-2

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the public reviewers and editors for their insightful comments on the manuscript. We have made the following changes to address their concerns and think the resulting manuscript is stronger as a result. Specifically, we have 1) added RNA FISH data of specific STB-2 and STB-3 RNA markers to confirm their distribution changes between STB<sup>in</sup> and STB<sup>out</sup> TOs, 2) removed language throughout the text that refers to STB-3 as a terminally differentiated nuclear subtype, and 3) generated CRISPR-mediated knock-outs of two genes identified by network analysis and validated their roles in mediating STB nuclear subtype gene expression.

      Reviewer #1 (Public review): 

      Strengths: 

      The study offers a comprehensive SC- and SN-based characterization of trophoblast organoid models, providing a thorough validation of these models against human placental tissues. By comparing the older STB<sup>in</sup> and newer STB<sup>out</sup> models, the authors effectively demonstrate the improvements in the latter, particularly in the differentiation and gene expression profiles of STBs. This work serves as a critical resource for researchers, offering a clear delineation of the similarities and differences between TO-derived and primary STBs. The use of multiple advanced techniques, such as high-resolution sequencing and trajectory analysis, further enhances the study's contribution to the field. 

      Thank you for your thoughtful review—we appreciate your recognition of our efforts to comprehensively validate trophoblast organoid models and highlight key advancements in STB differentiation and gene expression.

      Weaknesses: 

      While the study is robust, some areas could benefit from further clarification. 

      (1) The importance of the TO model's orientation and its impact on outcomes could be emphasized more in the introduction. 

      We agree that TO orientation may significantly influence STB nuclear subtype differentiation. As the STB is critical for both barrier formation and molecular transport in vivo, lack of exposure to the surrounding media in STB<sup>in</sup> TOs in vitro could compromise these functions and the associated environmental cues that influence STB nuclear differentiation. We have added text to the introduction to highlight this point (lines 117-120).

      (2) The differences in cluster numbers/names between primary tissue and TO data need a clearer explanation, and consistent annotation could aid in comparison. 

      Thank you for highlighting that the comparisons and cluster annotations need clarification. In Figure 1, we did not aim to directly compare CTB and STB nuclear subtypes between TOs and tissue. Each dataset was analyzed independently, with clusters determined separately and with different resolutions decided via a clustering algorithm (Zappia and Oshlack, 2018). For example, for the STB, this approach identified seven subtypes in tissue but only two in TOs, making direct comparison challenging. To address this challenge, we integrated the SN datasets from TOs and tissue in Figure 6. This integration allowed us to directly compare gene expression between the sample types and examine the proportions within each STB subtype. Similarly, in Figure 2, direct comparison of individual CTB or STB clusters across the separate datasets is challenging (Figures 2A-C) due to differences in clustering. To overcome this, we integrated the datasets to compare cluster gene expression and relative proportions (Figures 2D-E). Nonetheless, to address the reviewer's concern, we have added text to the results section to clarify that subclusters of CTB and STB between datasets should not be directly compared until the datasets are integrated in Figure 2D-E and Figure 6 (lines 166-167).

      (3) The rationale for using SN sequencing over SC sequencing for TO evaluations should be clarified, especially regarding the potential underrepresentation of certain trophoblast subsets. 

      This is an important point, as the challenges of studying a giant syncytial cell are often underappreciated by researchers who study mononucleated cells. We have added text to the introduction to clarify why traditional single cell RNA sequencing techniques were inadequate to collect and characterize the STB (lines 91-93).

      (4) Additionally, more evidence could be provided to support the claims about STB differentiation in the STB<sup>out</sup> model and to determine whether its differentiation trajectory is unique or simply more advanced than in STB<sup>in</sup>. 

      Our original conclusion that STB<sup>out</sup> nuclei are more terminally differentiated than STB<sup>in</sup> was based on two observations: (1) STB<sup>out</sup> TOs exhibit increased expression of STB-specific pregnancy hormones and many classic STB marker genes and (2) STB<sup>out</sup> nuclei show an enrichment of the STB-3 nuclear subtype, which appears at the end of the slingshot pseudotime trajectory. However, upon consideration of the reviewer comments, we agree that this evidence is not sufficient to definitively distinguish if STB<sup>out</sup> nuclei are more advanced or follow a unique differentiation trajectory dependent on new environmental cues. Pseudotime analyses provided only a predictive framework for lineage tracing, and these predictions must be experimentally validated. Real-time tracking of STB nuclear subtypes in TOs would require a suite of genetic tools beyond the scope of this study. Therefore, to address the reviewers' concerns we have removed language suggesting that STB-3 is a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei throughout the text until the discussion. Therein we present both our original hypothesis (that STB nuclei are further differentiated in STB<sup>out</sup>) and alternative explanations like changing trajectories due to local environmental cues (lines 619-625).

      Reviewer #2 (Public review): 

      Strengths: 

      (1) The use of SN and SC RNA sequencing provides a detailed analysis of STB formation and differentiation. 

      (2) The identification of distinct STB subtypes and novel gene markers such as RYBP offers new insights into STB development. 

      Thank you for highlighting these strengths—we appreciate your recognition of our use of SN and SC RNA sequencing to analyze STB differentiation and the discovery of distinct STB subtypes and novel gene markers like RYBP.

      Weaknesses: 

      (1) Inconsistencies in data presentation. 

      We address the individual comments of reviewer 2 later in this response.

      (2) Questionable interpretation of lncRNA signals: The use of long non-coding RNA (lncRNA) signals as cell type-specific markers may represent sequencing noise rather than true markers. 

      We appreciate the reviewer’s attention to detail in noticing the lncRNA signature seen in many STB nuclear subtypes. However, we disagree that these molecules simply represent sequencing noise. In fact, many studies have rigorously demonstrated that lncRNAs have both cell- and tissue-specific gene expression (e.g., Zhao et al 2022, Isakova et al 2021, Zheng et al 2020). Further, they have been shown to be useful markers of unique cell types during development (e.g., Morales-Vicente et al 2022, Zhou et al 2019, Kim et al 2015) and can enhance clustering interpretability in breast cancer (Malagoli et al 2024). Many lncRNAs have also been demonstrated to play a functional role in the human placenta, including H19, MEG3, and MEG8 (Adu-Gyamfi et al 2023), and differences are even seen in nuclear subtypes in trophoblast stem cells (Khan et al 2021). Therefore, we prefer to keep these lncRNA signatures included and let future researchers test their functional role.

      To improve the study's validity and significance, it is crucial to address the inconsistencies and to provide additional evidence for the claims. Supplementing with immunofluorescence staining for validating the distribution of STB_in, STB_out, and EVT_enrich in the organoid models is recommended to strengthen the results and conclusions. 

      Each general trophoblast cell type (CTB, STB, EVT) has been visualized by immunofluorescence by the Coyne laboratory in their initial papers characterizing the STB<sup>in</sup>, STB<sup>out</sup>, and EVT<sup>enrich</sup> models (Yang et al, 2022 and 2023). We agree that it is important to validate the STB nuclear subtypes found in our genomic study. However, one challenge in studying a syncytium is that immunofluorescence may not be a definitive method when the nuclei share a common cytoplasm. This is because protein products from mRNAs transcribed in one nucleus are translated in the cytoplasm and could diffuse beyond sites of transcription. Therefore, RNA fluorescence in situ hybridization (RNA-FISH) is needed instead. While a systematic characterization of the spatial distribution of the many marker genes found in each subtype is outside the scope of this study, we include RNA-FISH of one STB-2 marker (PAPPA2) and one STB-3 marker (ADAMTS6) in Figure 3F-G and Supplemental Figure 3.3. This demonstrates there is an increase in STB-2 marker gene expression in STB<sup>in</sup> TOs and an increase in STB-3 marker gene expression in STB<sup>out</sup> TOs.

      Reviewer #3 (Public review):  

      The authors present outstanding progress toward their aim of identifying, "the underlying control of the syncytiotrophoblast". They identify the chromatin remodeler, RYBP, as well as other regulatory networks that they propose are critical to syncytiotrophoblast development. This study is limited in fully addressing the aim, however, as functional evidence for the contributions of the factors/pathways to syncytiotrophoblast cell development is needed. Future experimentation testing the hypotheses generated by this work will define the essentiality of the identified factors to syncytiotrophoblast development and function. 

      We thank the reviewer for their thoughtful assessment, constructive feedback, and encouraging comments. We acknowledge that the initial manuscript primarily presented analyses suggesting correlations between RYBP and other factors identified in the gene network analysis and STB function. Understanding how gene networks in the STB are formed and regulated is a long-term goal that will require many experiments with collaborative efforts across multiple research groups.

      Nonetheless, to address this concern we have knocked out two key genes, RYBP and AFF1, in TOs using CRISPR-Cas9-mediated gene targeting. Bulk RNA sequencing of STB<sup>in</sup> TOs from both wild-type (WT) and knockout strains revealed that deletion of either gene caused a statistically significant decrease in the expression of the pregnancy hormone human placental lactogen and an increase in the expression of several genes characteristic of the oxygen-sensing STB-2 subtype, including FLT-1, PAPPA2, SPON2, and SFXN3. These findings demonstrate that knocking out RYBP or AFF1 increases STB-2 marker gene expression, and therefore that both genes play a role in inhibiting the expression of these markers in WT TOs (Figure 5D-E and supplemental Figure 5.2). We also note that this is the first application of CRISPR-mediated gene silencing in a TO model.

      Future work will visualize the distribution of STB nuclear subtypes in these mutants and explore the mechanistic role of RYBP and AFF1 in STB nuclear subtype formation and maintenance. However, these investigations fall outside the scope of the current study.

      Localization and validation of the identified factors within tissue and at the protein level will also provide further contextual evidence to address the hypotheses generated. 

      We agree that visualizing STB nuclear subtype distribution is essential for testing the many hypotheses generated by our analysis. To address this, we have included RNA-FISH experiments for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3) in TOs. These experiments reveal an increase in PAPPA2 expression in STB<sup>in</sup> TOs and an increase in ADAMTS6 expression in STB<sup>out</sup> TOs (Figure 3F-G and Supplemental Figure 3.3). Genomic studies serve as powerful hypothesis generators, and we look forward to future work—both our own and that of other researchers—to validate the markers and hypotheses presented from our analysis.

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      We strongly encourage the authors to further strengthen the study by addressing all reviewers' comments and recommendations, with particular attention to the following key aspects:

      (1) Clarifying the uniqueness of the STB differentiation trajectory between STB<sup>in</sup> and STB<sup>out</sup>, and determining whether STB<sup>out</sup> represents a more advanced stage of differentiation compared to STB<sup>in</sup>. It is also important to specify which developmental stage of placental villi the STB<sup>out</sup> and STB<sup>in</sup> are simulating. 

      We have revised the manuscript to remove definitive language claiming that STB-3 represents a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei. Instead, we now present our hypothesis and alternative explanations in the discussion (lines 619-625), and emphasize the need for experimental validation of pseudotime predictions to test these hypotheses.

      (2) Utilizing immunofluorescence to validate the distribution of cell types in the organoid models. 

      The Coyne lab has previously performed immunofluorescence of CTB and STB markers in STB<sup>in</sup> and STB<sup>out</sup> TOs (Yang et al 2023). The syncytial nature of STBs complicates immunofluorescence-based validation of the STB nuclear subtypes because translated proteins all share a single common cytoplasm and can therefore diffuse and mix. Instead, we performed RNA-FISH for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3), which showed subtype-specific nuclear enrichment in STB<sup>in</sup> and STB<sup>out</sup> TOs, respectively (Figure 3F-G and Supplemental Figure 3.3).

      (3) Addressing concerns regarding the use of lncRNA as cell marker genes. Employing canonical markers alongside critical TFs involved in differentiation pathways to perform a more robust cell-type analysis and validation is recommended.  

      As discussed in detail above, we maintain that lncRNAs are valuable markers, supported by their demonstrated roles in cell and tissue specificity and placental function. These signatures provide important insights and hypotheses for future research, and we have clarified this rationale in the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors have presented an extensive SC- and SN-based characterization of their improved trophoblast TO model, including a comparison to human placental tissues and the previous TO iteration. In this way, the authors' work represents an invaluable resource for investigators by providing thorough validation of the TO model and a clear description of the similarities and differences between primary and TO-derived STBs. I would suggest that the authors reshape the study to further highlight and emphasize this aspect of the study. 

      We thank the reviewer for their thoughtful recommendation and agree that our datasets will serve as an invaluable resource for comparing in vitro models to in vivo gene expression. However, extensive validation is required to make definitive conclusions about the extent to which these systems mirror one another and where they diverge. For this reason, in this manuscript, we have focused on characterizing STB subtypes to provide a foundational understanding of the model and this poorly characterized subtype.

      (2) Introduction, Paragraph 3: What is the importance of orientation for the trophoblast TO model? The authors may consider removing some of the less important methodologic details from this paragraph and including more emphasis on why their TO model is an improvement. 

      Text has been added to this paragraph to highlight the importance of outward facing STB orientation, which is essential to mirror the STB’s transport function in vitro (lines 118-120).

      (3) Results, Figure 1: In addition to the primary placental tissue plots showing all cell populations, it may be useful to have side-by-side versions of similar plots showing only the trophoblast subsets, so that the primary and TO data could be more easily compared visually. 

      This has been implemented and added to the Supplemental Figure 1.4.

      (4) Results, Figure 1: In simple terms, what is the reason for ending up with different cluster numbers/names from the primary tissue and TO? Would it be possible to apply the same annotation to each (at least for trophoblast types) and thus allow direct comparison between the two? 

      As described above, each dataset was analyzed separately, and its clusters were defined using an algorithm that selects the optimal clustering resolution. Therefore, the number of clusters cannot be directly compared between datasets until the SN TO and tissue datasets are integrated in Figure 6. We have added text to the manuscript to make it clear that, before this point, the datasets should be compared only in bulk numbers (lines 230-232).

      (5) Results, Figure 2: For subsequent evaluation of different in vitro TO conditions, did the authors use only SN sequencing because they wanted to focus on STB? Based on Figure 1, it seems some CTB subsets would be underrepresented if using only SN. Given that the authors look at both STB and CTB in their different TOs, is this an issue? 

      The CTB clusters that showed the greatest divergence between SC and SN datasets were those associated with mitosis and the cell cycle, likely due to nuclear envelope breakdown interfering with capture by the 10x microfluidics pipeline. While cytoplasmic gene expression provides valuable insights into CTB function, our manuscript focuses on the STB starting from Figure 2. Since the STB is captured exclusively by the SN dataset, we concentrated on this approach to streamline our analysis.

      (6) Results, Figure 3: What do the authors consider to be the primary contributing factors for why the STB subsets display differential gene expression between STB<sup>in</sup> and STB<sup>out</sup>? Is this due primarily to the cultural conditions and/or a result of the differing spatial arrangement with CTBs? 

      This is an intriguing question that is challenging to disentangle because the culture conditions are integral to flipping the orientation. The two primary factors that differ between STB<sup>in</sup> and STB<sup>out</sup> TOs are the presence of extracellular matrix in STB<sup>in</sup> and direct exposure to the surrounding media in STB<sup>out</sup>. We believe these environmental cues play a significant role in shaping the gene expression of STB subsets. Fully disentangling this relationship would require a method to alter the TO orientation without changing the culture conditions. While this is an exciting direction for future research, it falls outside the scope of the present study.

      (7) Results, Figure 4: The authors' analysis indicates that the STB nuclei from the STB<sup>out</sup> TO are likely "more differentiated" than those in STB<sup>in</sup> TO. Could the authors provide some qualitative or quantitative support for this? Is the STB<sup>out</sup> differentiated phenotype closer to what would be observed in a fully formed placenta? 

      As discussed earlier, we agree with the reviewers that this claim should be removed from the text outside of the discussion.

      (8) Results, Figure 5: Based on the trajectory analysis, do the authors consider that the STB from STB<sup>out</sup> TO are simply further along the differentiation pathway compared to those from STB<sup>in</sup> TO, or do the STB from STB<sup>out</sup> TO follow a differentiation pathway that is intrinsically distinct from STB<sup>in</sup> TO? 

      We think the idea of an intrinsically distinct pathway is a fascinating alternative hypothesis and have added it into the discussion. We do not find the pseudotime currently allows us to answer this question without additional experiments, so we have removed claims that the STB<sup>out</sup> STB nuclei are further along the differentiation pathway.

      (9) Results, Figure 6: A notable difference between the STB<sup>out</sup> TO and the term tissue is that the CTB subsets are much more prevalent. Is this simply a scale difference, i.e. due to the size of the human placenta compared to the limited STB nuclei available in the STB<sup>out</sup> TO? Or are there other contributing factors? 

      The proportion of CTB to STB nuclei in our term tissue (9:1) aligns with expectations based on stereological estimates. We believe the relatively low number of CTB nuclei in our dataset is due to the need for a larger sample size to capture more of this less abundant cell type. Since the primary focus of this paper is on STB, and we analyzed over 4,000 STB nuclei, we do not view this as a limitation. However, future studies utilizing SN to investigate term tissue should account for the abundance of STB nuclei and plan their sampling carefully to ensure sufficient representation of CTB nuclei if this is a desired focus.

      Reviewer #2 (Recommendations for the authors): 

      (1) The color annotations for cell types in Figure 2 are inconsistent between the different panels, and the term "Prolif" in Figure 2E is not explained by the authors. 

      We chose colors to enhance visibility on the UMAP. We do not wish readers to make direct comparisons between the different CTB or STB subtypes of the sample types until the datasets are integrated in Figure 2D. This is because an algorithm for the clustering resolution has been chosen independently for each dataset. Cluster proportions are better compared in the integrated datasets in Figure 2D. We have added text to the results section to make this clear to the reader (lines 166-167).

      (2) In Figure 3 and Supplementary Figures 1.3, the authors frequently present long non-coding RNA (lncRNA) signals as cell type-specific markers in the bubble plots. These signals are likely sequencing noise and may not accurately represent true markers for those cell types. It is recommended to revise this interpretation. 

      As referenced above, there are many examples of lncRNAs that have biological and pathological significance in the placenta (H19, Meg3, Meg8) and lncRNAs often have cell type specific expression that can enhance clustering. We prefer to keep these signatures included and let future researchers determine their biological significance.

      (3) In Figure 3C, the authors performed pathway enrichment analysis on the STB subtypes after integrating STB_in and STB_out organoids. The enrichment of the "transport across the blood-brain barrier" pathway in the STB-3 subtype does not align with the current understanding of STB cell function. Please provide corresponding supporting evidence. Additionally, please verify whether the other functional pathways represent functions specific to the STB subtypes. 

      Interestingly, many of the genes categorized under “transport across the blood-brain barrier” are transporters shared with “vascular transport.” These include genes involved in the transport of amino acids (SLC7A1, SLC38A1, SLC38A3, SLC7A8), molecules essential for lipid metabolism (SLC27A4, SLC44A1), and small molecule exchange (SLC4A4, SLC5A6). Given that the vasculature, the STB, and the blood-brain barrier all perform critical barrier functions, it is unsurprising that molecules associated with these GO terms are enriched in the STB-3 subtype, which expresses numerous transporter proteins. Since the transport of materials across the STB is a well-established function, we have not included additional supporting evidence but have clarified the genes associated with this GO term in the text (lines 392-394 and supplemental Table 9).

      (4) The pseudotime heatmap in Figure 4B is not properly arranged and is inconsistent with the differentiation relationships shown in Figure 4A. It is recommended to revise this. 

      We are uncertain which aspect of the heatmap in Figure 4A is perceived as inconsistent with Figure 4B. One distinction is that pseudotime in Figure 4A is normalized from 0 to 100 to fit the blue-to-yellow-to-red color scale, whereas in Figure 4B the color scale is not normalized and the color bar ranges from white to red. This difference reflects our intent to simplify Figure 4B-C, as the abundance of color between cell types and gene expression changes required a streamlined representation to ensure the figure remained clear and easy to interpret. This is classically done in the field and is consistent with the default code in the slingshot package.
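
      The 0-to-100 normalization mentioned above can be illustrated with a minimal sketch. It assumes a simple min-max rescaling, which is not confirmed as the exact procedure used for Figure 4A; the function name and the input values are hypothetical and serve only to show the arithmetic.

```python
def rescale_pseudotime(values, new_min=0.0, new_max=100.0):
    """Min-max rescale a list of pseudotime values onto [new_min, new_max].

    Assumption: plain min-max scaling; the source only states that
    pseudotime was normalized from 0 to 100.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


# Hypothetical raw pseudotime values for four nuclei.
raw_pseudotime = [2.0, 4.0, 6.0, 10.0]
scaled = rescale_pseudotime(raw_pseudotime)
print(scaled)
```

      Rescaling changes only the range of the color bar; the ordering of nuclei along the trajectory is unchanged.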

      (5) In Figures 4C and 4D, although RYBP is highly expressed in STB, it is difficult to support the conclusion that RYBP shows the most significant expression changes. It is recommended to provide additional evidence. 

      The claim that RYBP exhibits the most significant expression changes was based on p-value ordering of genes associated with pseudotime via the associationTest function in slingshot and not with immunofluorescence data. The text has been revised to make this distinction clear (lines 390-393).

      (6) In Figure 4E, staining for CTB marker genes is missing, and in Figure 4F, CYTO is difficult to use as a classical STB marker. It is recommended to use the CGBs antibody from Figure 4E as a STB marker for staining to provide evidence.  

      We have revised the Figure 5B-C to use e-Cadherin as a CTB marker gene in TOs and CGB antibody as a marker of STB.

      In tissue, however, obtaining a good STB marker that does not overlap with the RYBP antibody (rabbit) in term tissue is difficult as the STB downregulates hCG expression closer to term to initiate contractions. SDC1 is often used but only labels the plasma membrane so does not help in distinguishing the STB cytoplasm. We have added an image of cytokeratin, e-Cadherin, and the STB marker ENDOU to validate that our current approach with e-Cadherin and cytokeratin allows us to accurately distinguish between CTB and STB cells.

      (7) The velocity results in Figure 5A do not align with the differentiation relationships between cells and contradict the pseudotime results presented in Figure 4 by the authors. 

      The reviewer raises an interesting observation regarding the velocity map in Figure 5A, which appears to show a bifurcation into two STB subtypes. This observation aligns with similar findings reported in tissue by our colleagues (Wang et al., 2024). However, given the low number of CTB cells in our tissue dataset, we were cautious about making definitive conclusions about pseudotime without a larger sample size. Notably, the RNA velocity map closely resembles the pseudotime trajectory in TOs, with CTB transitioning into the CTB-pf subtype and subsequently into the STB. One potential explanation for discrepancies between tissue and TOs is the difference in nuclear age: nuclei in tissue can be up to nine months old, whereas those in TOs are only hours or days old. It is possible that the lineage in TOs could bifurcate if cultured for longer than 48 hours, but our current dataset captures only the early stages of the STB differentiation process. While exploring these hypotheses is fascinating, they are beyond the scope of this current study.  

      Reviewer #3 (Recommendations for the authors): 

      Amazing work - I greatly enjoyed reading the manuscript. Here are a few questions and suggestions for consideration: 

      Evidence presented throughout the results sections hints that the organoids may represent an earlier stage of placental development compared to the term. Increased hCG gene expression is observed, but as noted expression is decreased in term STB. STB:CTB ratios are also higher at term compared to the first trimester, etc. It was difficult to conclude definitively based on how data is presented in Fig 6 and discussed. Maybe there is no clear answer. Perhaps the altered cell type ratios in the organoid models (e.g., few STB in EVT enrich conditions) impact recapitulation of the in vivo local microenvironment signaling. As such, can the authors speculate on whether cell ratios could be strategically leveraged to model different gestational time points? 

      Along these same lines, syncytiotrophoblast in early implantation (before proper villi development) is often described as invasive and later at the tertiary villi stage defined by hormone production, barrier function, and nutrient/gas exchange. Do the authors think the different STB subtypes captured in the organoid models represent different stages/functions of syncytiotrophoblast in placental development? 

      Minor Comments 

      (1) Please clarify what the third number represents in the STB:CTB ratio (e.g., 1:3:1 and 2:5:1). EVT? 

      The first separator is a decimal point and not a colon (i.e., 1.3 and 2.5). These numbers should therefore be read as an STB:CTB ratio of 1.3 to 1 or 2.5 to 1.

      (2) Could consider co-localizing RYBP in term tissue with a syncytio-specific marker like CGB used for organoids (Fig 4F). 

      We addressed this concern in comment 6 to reviewer 2.

      (3) Recommend defining colors-which colors represent which module in Figure 5C in the legend and main body text. I see the labels surrounding the heatmap in 5B, but defining colors in text (e.g. cyan, magenta, etc.) would be helpful. Do the gray circles represent targets that don't belong to a specific module? Are the bolded factor names based on a certain statistical cutoff/defining criteria or were they manually selected? 

      The text of both the results and figure legends has been revised to clarify these points.

      (4) Data Availability: It would be helpful to provide supplemental table files for analyses (e.g., 5C to list the overlapping relationships in TGs for each TF/CR (5C) and 3E/6F to list DEG genes in comparisons). 

      Supplemental files for each analysis have been added (Supplemental Tables 8-14). In addition, the raw and processed data are available on GEO, and we have created an interactive Shiny App so that people without coding experience can interact with each dataset (lines 917-919).

      (5) “...and found that each sample expressed these markers (Figure 6D), suggesting..." Consider clarifying "these". 

      Text has been added to refer to a few of these marker genes within the text (line 540).

      Citations

      (1) Zappia L, Oshlack A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. GigaScience. 2018;7(7):giy083. PMCID: PMC6057528

      (2) Zhou J, Xu J, Zhang L, Liu S, Ma Y, Wen X, Hao J, Li Z, Ni Y, Li X, Zhou F, Li Q, Wang F, Wang X, Si Y, Zhang P, Liu C, Bartolomei M, Tang F, Liu B, Yu J, Lan Y. Combined Single-Cell Profiling of lncRNAs and Functional Screening Reveals that H19 Is Pivotal for Embryonic Hematopoietic Stem Cell Development. Cell Stem Cell. 2019;24(2):285-298.e5. PMID: 30639035

      (3) Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers. 2024;16(7):1350. PMCID: PMC11011054

      (4) Adu-Gyamfi EA, Cheeran EA, Salamah J, Enabulele DB, Tahir A, Lee BK. Long non-coding RNAs: a summary of their roles in placenta development and pathology†. Biol Reprod. 2023;110(3):431–449. PMID: 38134961

      (5) Zheng M, Hu Y, Gou R, Nie X, Li X, Liu J, Lin B. Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation. Biomed Pharmacother. 2020;124:109810. PMID: 32000042

      (6) Khan T, Seetharam AS, Zhou J, Bivens NJ, Schust DJ, Ezashi T, Tuteja G, Roberts RM. Single Nucleus RNA Sequence (snRNAseq) Analysis of the Spectrum of Trophoblast Lineages Generated From Human Pluripotent Stem Cells in vitro. Front Cell Dev Biol. 2021;9:695248. PMCID: PMC8334858

      (7) Isakova A, Neff N, Quake SR. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc Natl Acad Sci U S A. 2021;118(51):e2113568118. PMCID: PMC8713755

      (8) Morales-Vicente DA, Zhao L, Silveira GO, Tahira AC, Amaral MS, Collins JJ, Verjovski-Almeida S. Single-cell RNA-seq analyses show that long non-coding RNAs are conspicuously expressed in Schistosoma mansoni gamete and tegument progenitor cell populations. Front Genet. 2022;13:924877. PMCID: PMC9531161

      (9) Kim DH, Marinov GK, Pepke S, Singer ZS, He P, Williams B, Schroth GP, Elowitz MB, Wold BJ. Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming. Cell Stem Cell. 2015;16(1):88–101. PMCID: PMC4291542

      (10) Yang L, Liang P, Yang H, Coyne CB. Trophoblast organoids with physiological polarity model placental structure and function. bioRxiv. 2023;2023.01.12.523752. PMCID: PMC9882188

    1. Author response:

      General Statements

      In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcripts and cross-linking of nascent transcripts, CRAC) with in vitro analysis, we could show that SuperPol reduces premature transcript release due to altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, this increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug that inhibits Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.

      We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.

      All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two of the three question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the SuperPol mutant increasing rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:

      “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.

      Reviewer 1's main criticism centers on the use of the CRAC technique in our study. While we address this point in detail below, we would like to emphasize that, although we agree with the reviewer’s comments regarding its application to Pol II studies, CRAC limits contamination with mature rRNA and therefore remains the only suitable method for studying Pol I elongation over the entire transcription unit. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.

      We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.

      (1) Description of the planned revisions

      Despite numerous text modifications (see below), we agree that one major point of discussion is the consequence of increased processivity in the SuperPol mutant on the “quality” of the rRNA produced. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between the rate-limiting catalytic steps of Pol I and Pol II (see David Schneider's review on this topic for further details).

      Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously and showed an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with an increased processivity in vitro and in vivo, and we agree that this might come at the cost of decreased fidelity.

      Regulatory aspect of the process:

      To address the reviewer’s remarks, we propose to test our model by performing experiments that evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.

      We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.

      (2) Description of the revisions that have already been incorporated in the transferred manuscript

      Some very important modifications have now been incorporated:

      Statistical Analyses and CRAC Replicates:

      Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which highlights significant findings supporting the conclusion that the WT Pol I and SuperPol distribution profiles are different. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1) to confirm consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.

      CRAC vs. Net-seq:

      Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. Both methods complement each other but serve different purposes depending on the biological question and the context of the transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups that represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with a 3′ phosphate, found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along rDNA, and provides a quantitative dataset for the entire rDNA gene.

      CRAC vs. Other Methods:

      Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and cotranscriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit RNase activities that are involved in the process. CRAC analysis avoids these artifacts because it is a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise and artifact-free data on Pol I activity and pausing.

      Pol I ChIP Signal Comparison:

      The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.

      BMH-21 Effects:

      As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.

      We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.

      Minor points:

      Reviewer #1:

      •  In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example: 

      •  On page 14: "SuperPol accumulation is decreased (compared to Pol I)". 

      •  On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph." 

      We clarified and improved the overall writing style in response to the reviewer's comment.

      •  There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.  

      This was corrected in this version of the manuscript.

      •  In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA? 

      Data are indeed normalized relative to the 5S rRNA, but the value for the 18S rRNA is arbitrarily set to 100%.
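      As an illustration only, the two-step normalization described above can be sketched as follows. The function name, the probe names, and the signal values are hypothetical and are not taken from the manuscript; the sketch also assumes that the 18S reference comes from the same sample, which the response does not specify.

```python
def normalize_to_reference(signals, internal_control="5S", reference="18S"):
    """Divide each signal by the internal control, then rescale so that
    the reference probe equals 100 (percent)."""
    control = signals[internal_control]
    # Step 1: express every probe relative to the internal control (5S rRNA).
    ratios = {
        name: value / control
        for name, value in signals.items()
        if name != internal_control
    }
    # Step 2: rescale so that the reference probe (18S rRNA) is set to 100%.
    scale = 100.0 / ratios[reference]
    return {name: ratio * scale for name, ratio in ratios.items()}


# Hypothetical raw signal intensities, for illustration only.
example = {"5S": 200.0, "18S": 50.0, "25S": 40.0, "5'ETS": 10.0}
normalized = normalize_to_reference(example)
```

      In this reading, the 5S rRNA corrects for loading differences, while setting 18S to 100% only changes the units in which the other probes are displayed.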

      •  Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling. 

      The presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity assays we performed. In the absence of these subunits (which can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we wanted to show the subunit composition of the purified enzyme. This important control should be shown, but can indeed be moved to a supplementary figure if desired.

      The y-axis in Figure 7D is now correctly labelled.

      •  In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BMH-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result? 

      The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.

      Reviewer #2:

      •  pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates" 

      As suggested by the reviewer, the statement has been simplified in this version of the manuscript.

      •  Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors. 

      We agree that this is a very important point for the discussion of our results. Transcription is a very dynamic process in which paused polymerase is easily detected using the CRAC assay. Elongating polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large amount of rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.

      •  A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility? 

      The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively.

      •  pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well-documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research. 

      As suggested by the reviewer, we have referenced these studies.

      •  p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement. 

      This statement is based on unpublished data from our lab. KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that the resistance conferred by the two alleles is synergistic, with a much higher resistance to BMH-21 in the double mutant than in each single mutant (data not shown). Comparing their resistance mechanisms is a very important point, and we could provide these data upon request. This was added to the statement.

      •  p. 23: "The released of the premature transcript" - this phrase contains a typo 

      This is now corrected.

      Reviewer #3:

      •  Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6D cells? Some RNAs could be degraded even in such short period of time, as even stated by the authors, so maybe an exosome depleted background could provide a more complete picture. Could also the authors explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method could be good to add already characterized mutations or other drugs to show that the technique can readily detect also well-known and expected changes. 

      The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.

      •  Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly. 

      We corrected Figure 1A.  

      •  Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most of the position along the rDNA unit. It would be preferable to use some kind of opacity that allows to visualize both curves. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture. 

      We addressed this point to make the figure more visually informative. In Northern blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but they do not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if requested.

      •  "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C. 

      Thanks for pointing out this mistake. It has been corrected.

      •  Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region. 

      We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.

      •  Figure 4: should probably be provided as supplementary material. 

      As we mentioned earlier (see comments above), the presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity assays we performed. This important control should be shown, but can indeed be moved to a supplementary figure if desired.

      •  "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected. 

      We agree with this comment and we modified the text accordingly.

      •  Figure 7D: the legend of the y-axis is missing as well as the title of the plot. 

      The y-axis legend and the title of the plot are now present.

      •  The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.

      This was discussed previously. See comment above.

      •  Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b. 

      The bibliography is now fixed.

      (3) Description of analyses that authors prefer not to carry out

      Does the SuperPol mutant produce more functional rRNAs?

      We agree with Reviewer 1 that this point requires clarification. In cells expressing SuperPol, a higher steady-state level of (pre)-rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre)-rRNAs are unable to accumulate in the absence of ribosomal proteins and/or assembly factors (AFs). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being further addressed in our lab but is beyond the scope of this article.

      Is premature termination coupled with rRNA processing?

      We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      This study investigates the hypoxia rescue mechanisms of neurons by non-neuronal cells in the brain from the perspective of exosomal communication between brain cells. Through multi-omics combined analysis, the authors revealed this phenomenon and logically validated this intercellular rescue mechanism under hypoxic conditions through experiments. The study proposed a novel finding that hemoglobin maintains mitochondrial function, expanding the conventional understanding of hemoglobin. This research is highly innovative, providing new insights for the treatment of hypoxic encephalopathy.

      Overall, the manuscript is well organized and written, however, there are some minor/major points that need to be revised before this manuscript is accepted.

      We thank the reviewer for the detailed analysis of our study. Please find our answers to the points raised by the reviewer below.

      Major points:

      (1) Hypoxia can induce endothelial cells to release exosomes carrying hemoglobin, however, how neurons are able to actively take up these exosomes? It is possible for other cells to take up these exosomes also? This point needs to be clarified in this study.

      We sincerely appreciate the reviewer’s valuable comments. Regarding the question of how neurons actively take up extracellular vesicles (EVs) carrying hemoglobin mRNA, existing studies suggest that EVs can enter cells via three main pathways: direct fusion, receptor-mediated endocytosis, and phagocytosis (PMID: 25288114). Our experimental results show that neurons are able to actively take up EVs from endothelial cells without any treatment, and hypoxic conditions did not significantly increase the uptake of endothelial EVs by neurons (Fig. 5A and I). As for the specific uptake mechanism, there is currently no definitive conclusion. Some studies have found that hypoxic-ischemic injury may induce neurons to upregulate Cav-1, which could enhance the uptake of endothelial-derived EVs via Cav-1-mediated endocytosis (PMID: 31740664), but this mechanism still requires further validation.

      Regarding whether other cell types also take up these EVs, we focused on neurons based on existing literature and our own data, which show that the increased hemoglobin in the brain under hypoxic conditions is primarily found in neurons (Fig. 4H-J, PMID: 19116637). Moreover, we observed that, under hypoxic conditions, almost all non-neuronal supporting cells in the brain transcribe hemoglobin in large amounts and release it via EVs (Fig. 3J). Furthermore, we would like to emphasize that although neurons do not transcribe hemoglobin, we observed substantial expression of hemoglobin within neurons. This suggests that it may serve as an important protective mechanism for the brain. Therefore, the focus of our study is on the protective effect of EVs carrying hemoglobin mRNA on neurons, and the uptake by other cell types was not explored. We greatly appreciate the reviewer’s question, and we believe this is an intriguing avenue for further investigation. This could provide new insights for interventions in hypoxic brain injury, and we plan to delve into this topic in future studies.

      (2) The expression of hemoglobin in neurons is important for mitochondrial homeostasis, but its relationship with mitochondrial homeostasis needs to be further elucidated in the study.

      We sincerely appreciate the reviewer’s valuable comments. We fully agree with the importance of hemoglobin expression in neurons for mitochondrial homeostasis. In this study, we have confirmed through in vitro experiments that when neurons are treated with conditioned medium from endothelial cells, they exhibit increased hemoglobin expression. This, in turn, enhances their resistance to hypoxia by restoring mitochondrial membrane potential and increasing mitochondrial numbers, thereby effectively improving neuronal viability. Notably, this protective effect disappears when EVs are removed from the endothelial-conditioned medium or when hemoglobin in endothelial cells is disrupted, further supporting the notion that endothelial cells transfer hemoglobin via EVs, helping neurons express hemoglobin under hypoxic conditions and exert protective effects.

      In summary, hemoglobin primarily helps maintain mitochondrial membrane potential, thereby supporting the restoration of energy metabolism and production under hypoxic conditions, which effectively improves the neuronal resistance to hypoxia. Although we were unable to explore the specific mechanisms of hemoglobin’s role in mitochondrial homeostasis in detail within this study, we recognize the importance of this aspect and plan to further investigate how hemoglobin regulates mitochondrial homeostasis and function in neurons in future research.

      Once again, we greatly appreciate the reviewer’s insightful comments. We will continue to optimize our research direction and look forward to further elucidating these important biological mechanisms in future studies.

      Minor points:

      (1) In Figures 1-3, the authors use "Endo" to represent endothelial cells, while in Figures 4-7, the abbreviation "EC" is used. Please standardize the format.

      Thank you for the reviewer’s suggestion. We will use “EC” consistently to refer to endothelial cells throughout the manuscript to ensure uniformity.

      (2) In all qPCR statistical results, please italicize the gene names on the axis.

      Thank you for the reviewer’s valuable suggestion. We will make sure to italicize the gene names on the axis in all qPCR statistical results to adhere to the formatting requirements.

      (3) In the Western blot result of Figure 3C, what type of cell-derived exosomes does the Control group represent, and why can it be used as a control group for brain-derived exosomes?

      Thank you for the reviewer’s insightful question. In Fig. 3C, the control group (Control) represents the cell lysate sample, which serves as a positive control in the EVs Western blot analysis. In this experiment, the positive control is primarily used to validate the specificity of the antibody and the accuracy of the experimental procedure. We used cell lysate as the control to confirm that the antibody can detect EV-associated markers in the cell lysates, thus providing a comparative basis for the identification of brain-derived EVs.

      (4) In Figure 4F, the morphology of hemoglobin in the Con group and the H28d group is not entirely consistent with Figure 4H. Is this difference due to different experimental batches?

      Thank you for the reviewer’s careful observation. The observed difference may indeed be due to variations between different experimental batches. To ensure consistency of the results, we have updated the representative immunofluorescence images, which are now presented in Fig. 4H.

      (5) Supplement the transcription and expression levels of hemoglobin in neurons under different treatment conditions after medium exchange with exosome removal and medium exchange after HBA1 interference.

      Thank you for the reviewer’s valuable suggestions. We have added the experimental data regarding the exchange of culture medium after the removal of EVs. As shown in Fig. S6, the endothelial-derived medium without EVs does not enhance the hemoglobin levels in neurons under hypoxic conditions. Additionally, we have included the detection results of hemoglobin expression in neurons after HBA1 interference, as shown in Fig. S7E-F. The results indicate that the culture medium derived from HBA1-interfered endothelial cells also fails to help neurons increase hemoglobin expression under hypoxic conditions.

      (6) Figure S3 should be split to separately explain the increased exosome release induced by hypoxia, the non-toxic effect of endothelial cell culture medium on neurons, and the successful screening of the HBA1 interference plasmid.

      Thank you for the reviewer’s suggestions. Based on your feedback, we have split the original Fig. S3 into multiple parts to more clearly present the different experimental results. Specifically, the hypoxia-induced increase in EV release is now shown in Fig. S4, the non-toxic effects of endothelial cell culture medium on neurons are shown in Fig. S5, and the successful screening of the HBA1 interference plasmid is presented in Fig. S7.

      (7) Regarding the extracellular vesicles/exosomes, it should be expressed consistently in the whole manuscript.

      Thank you for the reviewer’s reminder. We will ensure that the term “extracellular vesicles” is used consistently throughout the manuscript.

      (8) In lines 70 and 80, the O2 should be changed to "O<sub>2</sub>".

      Thank you for the reviewer’s careful observation. We have corrected the formatting of “O2” to “O₂” in lines 70 and 80.

      We would like to thank the Reviewer for taking the time to thoroughly examine our work, for their helpful feedback that has significantly contributed to improving our manuscript, and for their kind and encouraging words.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting study with a lot of data. Some of these ideas are intriguing. But a few major points require further consideration.

      We thank the reviewer for the detailed assessment of our study and pinpointing its current weaknesses. Please find our answers to all comments below.

      Major points:

      (1) What disease is this model of whole animal hypoxia supposed to mimic? If one is focused on the brain, can one just use a model of focal or global cerebral ischemia?

      Thank you for the reviewer’s insightful question. The chronic hypoxia model we employed is designed to mimic the multi-organ damage caused by systemic hypoxia, which is relevant to clinical conditions such as high-altitude hypoxia, chronic obstructive pulmonary disease, and acute hypoxic brain injury. In contrast to focal or global cerebral ischemia models, the focus of our study is on how the brain, under extreme systemic hypoxia, utilizes endothelial cell-derived extracellular vesicles (EVs) to transfer hemoglobin mRNA, thereby protecting neurons and aiding the brain’s response to hypoxia-induced damage.

      We understand the reviewer’s concern that focal or global ischemia models are typically used to simulate localized brain hypoxia or ischemic injury. However, the core of our research is to explore the brain’s overall adaptive mechanisms under systemic hypoxic conditions. By using a systemic hypoxia model, we can more comprehensively simulate the effects of global hypoxia on the brain and uncover how the brain engages specific molecular mechanisms for self-protection. This approach offers a novel perspective on brain hypoxic-ischemic diseases and holds potential clinical applications, particularly in the study of stroke, vascular cognitive impairment and dementia (VCID), and related conditions.

      Additionally, we have observed that hemoglobin significantly increases in the brain in an animal model of focal ischemia (as shown in Author response image 1 below). This finding further supports the idea that hemoglobin upregulation may be a universal protective mechanism for the brain’s response to hypoxic damage. While this part of the research is still ongoing, preliminary results suggest that both systemic hypoxia and focal ischemia might trigger protective effects through hemoglobin regulation.

      Author response image 1.

      The expression level of Hba-a1 in the brain of VCID mouse.

      Therefore, the core of our study is to elucidate the brain’s self-protection mechanisms under systemic hypoxia, rather than focusing solely on cerebral ischemia models. We believe this approach provides new insights into the prevention and treatment of brain hypoxic-ischemic diseases, with significant clinical application potential.

      In light of this, we have added a related discussion to the manuscript, clearly explaining the rationale for choosing the systemic hypoxia model. The updated content can be found on P11, Line 13-21 as follows: “To investigate this phenomenon, we employed a chronic hypoxia model in which mice were exposed to 7% oxygen for 28 days. This model aims to mimic systemic hypoxia-induced multi-organ damage, a condition observed in diseases such as high-altitude hypoxia, chronic obstructive pulmonary disease, and acute hypoxic brain injury. The primary goal of this model is to explore how the brain adapts under extreme low-oxygen conditions and employs specific mechanisms to protect itself from hypoxia-induced damage. This approach provides valuable insight into diseases related to hypoxic-ischemic injury in the brain, including stroke and vascular dementia, offering a novel perspective for potential clinical applications.”

      (2) If this model subjects the entire animal to hypoxia, then other organs will also be hypoxic. Should one also detect endothelial upregulation and release of extracellular vesicles containing hemoglobin mRNA in non-CNS organs? Where do these vesicles go? Into blood?

      Thank you for the reviewer’s valuable feedback. Indeed, in a whole-body hypoxia model, other organs are also affected by hypoxia. Therefore, future research may need to investigate the upregulation of endothelial cells in organs other than the central nervous system, as well as the release of EVs containing hemoglobin mRNA from these organs. However, in this study, we isolated EVs from the brain tissue in situ following perfusion with physiological saline, a method that effectively eliminates the influence of EVs from blood or other organs. As a result, our primary focus was on studying how EVs released by brain endothelial cells are actively taken up by neurons to exert neuroprotective effects. The potential for these EVs to enter the bloodstream and their subsequent fate is indeed a topic worthy of further investigation. Future research could offer new insights into the cross-organ effects of systemic hypoxia.

      (3) What other mRNA are contained in the vesicles released from brain endothelial cells?

      Thank you for the reviewer’s valuable suggestions. We have further analyzed EVs derived from brain endothelial cells, and in addition to hemoglobin mRNA, these EVs also contain a variety of other mRNAs, including Vwf, Hbb-bt, Hba-a1, Hbb-bs, Hba-a2, Acer2, Angpt2, Ldha, Gm42418, Slc16a1, Cxcl12, B2m, Ctla2a, Ccnd1, and Hmgcs2 (Log2FC > 1.2). The biological processes associated with these mRNAs primarily involve: cell-substrate adhesion, regulation of cellular amide metabolic process, negative regulation of cell migration, negative regulation of cell motility, and negative regulation of cellular component movement. These processes may be closely related to the neuroprotective effects of endothelial cell EVs in a hypoxic environment, especially in terms of regulating cell behavior and maintaining cell structure and function. Additionally, these EVs contain multiple key factors associated with intracellular metabolism, movement, and migration, which may collectively influence neuronal function and survival. Notably, our study also found that mRNA of various hemoglobin subunits ranks among the top five in terms of abundance in the mRNA secreted by hypoxic endothelial EVs, further emphasizing the importance of hemoglobin mRNA in endothelial-derived EVs. Therefore, future research may explore the functions of these mRNAs and reveal how they act in concert to protect neurons from hypoxia-induced damage.

      We have updated and added these results in Fig. S4, and have further elaborated on the findings in the revised figure. Once again, we thank the reviewer for the attention and valuable suggestions regarding our work.

      (4) Where do the endothelial vesicles go? Only to neurons? Or to other cells as well?

      Thank you for the reviewer’s important question. As previously mentioned, the focus of this study is to investigate how EVs carrying hemoglobin mRNA influence neuronal function. Through a combined analysis of single-cell transcriptomics and EV transcriptomics from brain tissue, we found that, with the exception of neurons, almost all types of supportive cells in the brain and their secreted EVs contain a significant amount of hemoglobin mRNA (Fig. 3J, 4B). Notably, although neurons do not transcribe hemoglobin mRNA themselves, they significantly increase hemoglobin expression under hypoxic conditions, so that the transcription and expression levels of hemoglobin in neurons are inconsistent. This phenomenon has been observed both in our study and in others (Fig. 4H-J, PMID: 19116637). This observation led us to focus on the active uptake of EVs by neurons and the potential neuroprotective effects they might bring.

      Regarding whether other cell types take up these EVs and what their potential functions are, although our current research is focused on neurons, this is indeed an important area for further investigation. Given that non-neuronal supportive cells may also transfer hemoglobin mRNA via EVs under hypoxic conditions, future research will further explore the uptake of EVs by different cell types and their roles in hypoxic adaptation.

      We are particularly interested in the hemoglobin expression in neurons under hypoxic conditions and consider neurons to be the primary expressers of hemoglobin, providing a new perspective for exploring the neuroprotective role of hemoglobin. We plan to delve deeper into these issues in future studies.

      (5) Neurons can express endogenous hemoglobin. Is it useful to subject neurons to hypoxia and then see how much the endogenous mRNA goes up? How large is the magnitude of endogenous hemoglobin gene upregulation compared to the hypothesized exogenous mRNA that is supposed to be donated from endothelial vesicles?

      Thank you for the reviewer’s valuable question. We have observed that, in the absence of treatment with endothelial cell-derived conditioned medium, there is no significant change in the transcription and expression levels of endogenous hemoglobin in neurons under hypoxic conditions (Fig. 5I, 6C-D). However, when neurons were treated with endothelial cell-conditioned medium, under the same hypoxic conditions, the transcription levels of hemoglobin increased by approximately 1.2-fold, and the expression levels increased by approximately 3.8-fold (Fig. 6B-D). Additionally, we have added pre-treatment experiments involving EV depletion from the endothelial culture medium and HBA interference. The results show that, after these two pre-treatments, the conditioned medium lost its ability to enhance the transcription and expression of hemoglobin in neurons under hypoxic conditions (Fig. S6, S7D-F), further emphasizing the important role of endothelial EVs in this process. This finding indicates that endothelial-derived EVs significantly promote hemoglobin expression in neurons, and this effect is far greater than the upregulation of endogenous hemoglobin in neurons. Therefore, while neurons can express endogenous hemoglobin, exogenous hemoglobin mRNA significantly enhances its expression, which may help neurons tolerate the hypoxic environment and provide additional protection.

      (6) Finally, it may be useful to provide more information and data to explain how the expression of this exogenous endothelial-derived hemoglobin binds to neuronal mitochondria to alter function.

      Thank you for the reviewer’s valuable suggestion. As we previously mentioned, hemoglobin plays a protective role in neurons by maintaining mitochondrial membrane potential, helping neurons restore energy metabolism and energy production under hypoxic conditions. We fully agree on the importance of this research direction. Several studies have shown that when hemoglobin is expressed in neurons, it predominantly localizes to mitochondria, which aligns with the physiological process of heme synthesis within mitochondria (PMID: 23187133). Furthermore, in the brains of Parkinson’s disease patients, the localization of hemoglobin in neuronal mitochondria is altered compared to normal conditions (PMID: 27181046). Therefore, the interaction between hemoglobin and mitochondria plays a crucial role in neuronal function.

      Although existing research indicates the role of hemoglobin in neuronal mitochondria, studies in this area remain limited. We plan to further investigate how hemoglobin binds to mitochondria and its specific effects on mitochondrial function in our future work. We believe that a deeper understanding of this mechanism will provide essential theoretical insights into the effects of hypoxia on neurons and offer new potential strategies for neuroprotective therapies.

      We would like to thank the Reviewer for taking the time to thoroughly examine our work, for their helpful feedback that has significantly contributed to improving our manuscript, and for their kind and encouraging words.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors introduced an essential role of AARS2 in maintaining cardiac function. They also investigated the underlying mechanism that through regulating alanine and PKM2 translation are regulated by AARS2. Accordingly, a therapeutic strategy for cardiomyopathy and MI was provided. Several points need to be addressed to make this article more comprehensive:

      We thank the reviewer for their overall support of our manuscript.

      (1) Include apoptotic caspases in Figure 2B, and Figure 4 B and E as well.

      This is a good point for further investigating the role of apoptosis signaling in cardiac-specific AARS2 knockout hearts. Since we are focusing on cardiomyocyte phenotypes, we used TUNEL staining combined with anti-cTnT immunostaining to directly evaluate the level of cardiomyocyte apoptosis, which was supported by Western blots with anti-Bcl-2 and anti-BAX antibodies in control and mutant hearts. TUNEL data accurately represent the biochemical and morphological characteristics of apoptotic cells, and the assay is more sensitive than conventional histochemical and biochemical methods. Future studies are needed to address how apoptosis components, including apoptotic caspases, are involved in cardiomyocyte apoptosis in AARS2 mutant hearts.

      (2) It would be better to show the change of apoptosis-related proteins upon the knocking down of AARS2 by small interfering RNA (siRNA).

      Because primary cultures of neonatal cardiomyocytes also contain non-cardiomyocytes, Western blots for apoptosis-related proteins cannot directly assess cardiomyocyte phenotypes. In this work, our data on the elevation of cTnT<sup>+</sup>/TUNEL<sup>+</sup> cardiomyocytes and cardiac fibrosis in AARS2 mutant hearts suggest that AARS2 deficiency induces cardiomyocyte death.

      (3) In Figure 5, the authors performed Mass Spectrometry to assess metabolites of homogenates. I was wondering if the change of other metabolites could be provided in the form of a heatmap.

      Indeed, we assessed other metabolites by mass spectrometry, as shown below. We found that overexpression of AARS2 in either transgenic mouse hearts or neonatal cardiomyocytes caused no consistent changes in the levels of fumarate, succinate, malate, alpha-ketoglutarate (alpha-KG), citrate, oxaloacetate (OAA), ATP, and ADP, suggesting that AARS2 overexpression has a more specific effect on the levels of lactate, pyruvate, and acetyl-CoA.

      Author response image 1.

      (4) The amounts of lactate should be assessed using a lactate assay kit to validate the Mass Spectrometry results.

      We carried out several rounds of mass spectrometry experiments, which showed that lactate is consistently elevated after AARS2 overexpression in neonatal cardiomyocytes, as shown below. We will establish other lactate assays in future studies.

      Author response image 2.

      (5) How about the expression pattern of PKM2 before and after mouse MI. Furtherly, the correlation between AARS2 and PKM2?

      Previous studies have shown that the expression level of PKM2 in mice is significantly increased at different time points after cardiac surgery, which may be related to cardiometabolic changes [1]. Our co-IP experiments showed no direct interaction between AARS2 and PKM2 (Figure 6K), while both AARS2 protein and mRNA levels decreased at 3 days (Figure 1A-B) and 7 days (Author response image 3) after myocardial infarction in mice. Thus, the level of AARS2 is inversely related to that of PKM2 after myocardial infarction.

      Author response image 3.

      (6) In Figure 5, how about the change of apoptosis-related proteins after administration of PKM2 activator TEPP-46?

      It has been shown that TEPP-46 treatment decreases cardiomyocyte death in different models of induced cardiomyocyte apoptosis [2, 3]. We therefore refer to these published studies, which show that TEPP-46 treatment improves heart function by inhibiting cardiac injury-induced cardiomyocyte death.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to elucidate the role of AARS2, an alanyl-tRNA synthetase, in mouse hearts, specifically its impact on cardiac function, fibrosis, apoptosis, and metabolic pathways under conditions of myocardial infarction (MI). By investigating the effects of both deletion and overexpression of AARS2 in cardiomyocytes, the study aims to determine how AARS2 influences cardiac health and survival during ischemic stress.

      The authors successfully achieved their aims by demonstrating the critical role of AARS2 in maintaining cardiomyocyte function under ischemic conditions. The evidence presented, including genetic manipulation results, functional assays, and mechanistic studies, robustly supports the conclusion that AARS2 facilitates cardiomyocyte survival through PKM2-mediated metabolic reprogramming. The study convincingly links AARS2 overexpression to improved cardiac outcomes post-MI, validating the proposed protective AARS2-PKM2 signaling pathway.

      This work may have a significant impact on the field of cardiac biology and ischemia research. By identifying AARS2 as a key player in cardiomyocyte survival and metabolic regulation, the study opens new avenues for therapeutic interventions targeting this pathway. The methods used, particularly the cardiomyocyte-specific genetic models and ribosome profiling, are valuable tools that can be employed by other researchers to investigate similar questions in cardiac physiology and pathology.

      Understanding the metabolic adaptations in cardiomyocytes during ischemia is crucial for developing effective treatments for MI. This study highlights the importance of metabolic flexibility and the role of specific enzymes like AARS2 in facilitating such adaptations. The identification of the AARS2-PKM2 axis adds a new layer to our understanding of cardiac metabolism, suggesting that enhancing glycolysis can be a viable strategy to protect the heart from ischemic damage.

      We thank the reviewer for their support of our manuscript.

      Strengths:

      (1) Comprehensive Genetic Models: The use of cardiomyocyte-specific AARS2 knockout and overexpression mouse models allowed for precise assessment of AARS2's role in cardiac cells.

      (2) Functional Assays: Detailed phenotypic analyses, including measurements of cardiac function, fibrosis, and apoptosis, provided evidence for the physiological impact of AARS2 manipulation.

      (3) Mechanistic Insights: This study used ribosome profiling (Ribo-Seq) to uncover changes in protein translation, specifically highlighting the role of PKM2 in metabolic reprogramming.

      (4) Therapeutic Relevance: The use of the PKM2 activator TEPP-46 to reverse the effects of AARS2 deficiency presents a potential therapeutic avenue, underscoring the practical implications of the findings.

      Weaknesses:

      (1) Species Limitation: The study is limited to mouse and rat models, and while these are highly informative, further validation in human cells or tissues would strengthen the translational relevance.

      We fully agree with the reviewer that this study is limited to mouse and rat models. It will certainly be important to address how the AARS2-PKM2 axis relates to patients with myocardial infarction in the future.

      (2) Temporal Dynamics: The study does not extensively address the temporal dynamics of AARS2 expression and PKM2 activity during the progression of MI and recovery, which could offer deeper insights into the timing and regulation of these processes.

      We thank the reviewer for this important point. Indeed, we found that both AARS2 protein and mRNA levels decreased at 3 days (Figure 1A-B) and 7 days (Author response image 3) after myocardial infarction in mice. Others have reported that PKM2 protein levels increased at different time points after heart surgery in mice [1]. Thus, the level of AARS2 is inversely related to that of PKM2 after myocardial infarction.

      Reviewer #3 (Public Review):

      In the present study, the author revealed that cardiomyocyte-specific deletion of mouse AARS2 exhibited evident cardiomyopathy with impaired cardiac function, notable cardiac fibrosis, and cardiomyocyte apoptosis. Cardiomyocyte-specific AARS2 overexpression in mice improved cardiac function and reduced cardiac fibrosis after myocardial infarction (MI), without affecting cardiomyocyte proliferation and coronary angiogenesis. Mechanistically, AARS2 overexpression suppressed cardiomyocyte apoptosis and mitochondrial reactive oxide species production, and changed cellular metabolism from oxidative phosphorylation toward glycolysis in cardiomyocytes, thus leading to cardiomyocyte survival from ischemia and hypoxia stress. Ribo-Seq revealed that AARS2 overexpression increased pyruvate kinase M2 (PKM2) protein translation and the ratio of PKM2 dimers to tetramers that promote glycolysis. Additionally, PKM2 activator TEPP-46 reversed cardiomyocyte apoptosis and cardiac fibrosis caused by AARS2 deficiency. Thus, this study demonstrates that AARS2 plays an essential role in protecting cardiomyocytes from ischemic pressure via fine-tuning PKM2-mediated energy metabolism, and presents a novel cardiac protective AARS2-PKM2 signaling during the pathogenesis of MI. This study provides some new knowledge in the field, and there are still some questions that need to be addressed in order to better support the authors' views.

      We thank the reviewer for their overall support of our manuscript.

      (1) WGA staining showed obvious cardiomyocyte hypertrophy in the AARS2 cKO heart. Whether AARS affects cardiac hypertrophy needs to be further tested.

      WGA staining is widely used in the literature to measure cardiomyocyte size. Here, we found that the size of mutant cardiomyocytes increased by ~20% after AARS2 knockout. In addition, we found that the heart-to-body weight ratio was increased in AARS2 mutant mice compared with control siblings, as shown below.

      Author response image 4.

      (2) The authors observed that AARS2 can improve myocardial infarction, and whether AARS2 has an effect on other heart diseases.

      We thank the reviewer for this important point. We agree that it will be important to address in the future whether overexpression of AARS2 is cardioprotective in other heart disease models, such as transverse aortic constriction.

      (3) Studies have shown that hypoxia conditions can lead to mitochondrial dysfunction, including abnormal division and fusion. AARS2 also affects mitochondrial division and fusion and interacts with mitochondrial proteins, including FIS and DRP1, the authors are suggested to verify.

      This is a good point. Mitochondrial dysfunction occurs when cardiomyocytes are subjected to hypoxia conditions such as myocardial infarction. Our ribosome sequencing data suggested that overexpression of AARS2 had no effect on the level of FIS1 and DRP2 as shown below. We agree with this reviewer that future studies are needed to clarify potential interactions between AARS2 and FIS/DRP1 proteins.

      Author response image 5.

      (4) The authors only examined the role of AARS2 in cardiomyocytes, and fibroblasts are also an important cell type in the heart. Authors should examine the expression and function of AARS2 in fibroblasts.

      We fully agree with the reviewer that AARS2 may also function in cardiac fibroblasts, since it is expressed in fibroblasts and cardiomyocyte-specific AARS2 knockout led to more fibrosis after myocardial infarction. This certainly warrants future investigation.

      (5) Overexpression of AARS2 can inhibit the production of mtROS, and has a protective effect on myocardial ischemia and H/ R-induced injury, and the occurrence of iron death is also closely related to ROS, whether AARS protects myocardial by regulating the occurrence of iron death?

      We thank the reviewer for this important point. Our current data cannot rule out the involvement of iron-dependent cell death (ferroptosis) in AARS2-mediated cardiac protection, which warrants future investigation.

      (6) Please revise the English grammar and writing style of the manuscript, spelling and grammatical errors should be excluded.

      We apologize for the spelling and grammatical errors. We have now carefully revised the manuscript.

      (7) Recent studies have shown that a decrease in oxygen levels leads to an increase in AARS2, and lactic acid rises rapidly without being oxidized. Both of these factors inhibit oxidative phosphorylation and muscle ATP production by increasing mitochondrial lactate acylation, thereby inhibiting exercise capacity and preventing the accumulation of reactive oxygen species ROS. The key role of protein lactate acylation modification in regulating oxidative phosphorylation of mitochondria, and the importance of metabolites such as lactate regulating cell function through feedback mechanisms, i.e. cells adapt to low oxygen through metabolic regulation to reduce ROS production and oxidative damage, and therefore whether AARS2 in the heart also acts in this way.

      This is an interesting question. Overexpression of AARS2 in muscle has previously been reported to increase PDHA1 lactylation and decrease its activity [4]. We therefore examined whether overexpression of AARS2 in cardiomyocytes has a similar effect on PDHA1 lactylation. However, our results showed no evident increase in lactylated PDHA1 in cardiomyocytes after AARS2 overexpression, as shown below. Future studies are needed to explore whether lactylation of other proteins by AARS2 is involved in its cardioprotective function.

      Author response image 6.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for Improved or Additional Experiments, Data, or Analyses:

      (1) Validation in Human Models: It would be great if, in the future, the authors could conduct experiments with human cardiomyocytes derived from induced pluripotent stem cells (iPSCs) to validate the findings in a human context. This would strengthen the translational relevance of the results.

      We fully agree with the reviewer that this study is limited to mouse and rat models. It will certainly be important to address how the AARS2-PKM2 axis relates to patients with myocardial infarction and/or human iPSC-derived cardiomyocytes in the future.

      (2) Broader Metabolic Analysis: To perform comprehensive metabolic profiling (e.g., metabolomics) to identify other metabolic pathways influenced by AARS2 overexpression or deficiency. This could provide a more holistic view of the metabolic changes and potential compensatory mechanisms.

      As noted above, we assessed other metabolites by mass spectrometry. We found that overexpression of AARS2 in either transgenic mouse hearts or neonatal cardiomyocytes caused no consistent changes in the levels of fumarate, succinate, malate, alpha-ketoglutarate (alpha-KG), citrate, oxaloacetate (OAA), ATP, and ADP, suggesting that AARS2 overexpression has a more specific effect on the levels of lactate, pyruvate, and acetyl-CoA.

      (3) Temporal Dynamics: Investigate the temporal expression and activity of AARS2 and PKM2 during the progression and recovery phases of myocardial infarction. Time-course studies could elucidate the dynamics and regulatory mechanisms involved.

      As noted above, we found that both AARS2 protein and mRNA levels decreased on the third and seventh days after myocardial infarction in mice. Others have reported that PKM2 protein levels increased at different time points after heart surgery in mice [1]. Thus, the level of AARS2 is inversely related to that of PKM2 after myocardial infarction.

      (4) Investigate Additional Pathways: Explore the involvement of other signaling pathways and tRNA synthetases that might interact with or complement the AARS2-PKM2 axis. This could uncover broader regulatory networks affecting cardiomyocyte survival and function.

      We thank the reviewer for this important point. This certainly warrants future investigation.

      (5) Mitochondrial Function Assays: Perform detailed mitochondrial function assays, including measurements of mitochondrial respiration and membrane potential, to further elucidate the role of AARS2 in mitochondrial health and function under stress conditions.

      We fully agree with this reviewer that future studies are needed to address how AARS2 is involved in mitochondrial function.

      (6) Single-Cell Analysis: Utilize single-cell RNA sequencing to examine the heterogeneity in cardiomyocyte responses to AARS2 manipulation, providing insights into cell-specific adaptations and potential differential effects within the heart tissue.

      We fully agree with the reviewer that it will be important to address in the future how AARS2 (cKO or overexpression) regulates cardiomyocyte heterogeneity and function.

      Recommendations for Improving the Writing and Presentation:

      (1) Visual Aids: Include more schematic diagrams to illustrate the proposed mechanisms, especially the AARS2-PKM2 signaling pathway and its impact on metabolic reprogramming. This can help readers better understand complex interactions.

      Below is our working hypothesis on the role of AARS2 in cardiac protection. AARS2 deficiency causes mitochondrial dysfunction due to increased ROS production and apoptosis and decreased PKM2 function and glycolysis, thus leading to cardiomyopathy in mutant mice. Conversely, overexpression of AARS2 in mice activates PKM2 and glycolysis while decreasing ROS production and apoptosis, thus improving heart function after myocardial infarction.

      Author response image 7.

      (2) Discussion: Shorten the Discussion and systematically address the significance of the findings, limitations of the study, and potential future directions. This will provide a clearer narrative and context for the results.

      We have now revised the Discussion to highlight the significance of this work and to give a brief perspective on future directions.

      (3) Minor corrections to the text and figures.

      We have now revised the full text carefully.

      (4) Typographical Errors: Carefully proofread the manuscript to correct any typographical errors and ensure consistent use of terminology and abbreviations throughout the text.

      Thank you. Based on the reviewer’s suggestions, we have carefully revised and proofread the entire manuscript.

      Availability of data, code, reagents, research ethics, or other issues:

      (1) Data Presentation: Ensure that all graphs and charts are clearly labeled with appropriate units, scales, and legends. Use color schemes that are accessible to color-blind readers.

      We followed these rules to present the data.

      (2) Supplementary Information: Provide detailed supplementary information, including raw data, experimental protocols, and analysis scripts, to enhance the reproducibility of the study.

      We provided the raw data, experimental protocols, and analysis scripts in the manuscript.

      (3) Data and Code Availability. Data Sharing: Authors should ensure that all raw data, processed data, and relevant metadata are deposited in publicly accessible repositories. Provide clear instructions on how to access these data. Code Availability: Make all analysis code available in a public repository, such as GitHub, with adequate documentation to allow other researchers to replicate the analyses.

      We have deposited RNA-Seq data at ArrayExpress (E-MTAB-13767). We have also uploaded the original data in the supplementary file.

      (4) Research Ethics and Compliance. Ethics Statement: Include a detailed statement on the ethical approval obtained for animal experiments, specifying the institution and ethical review board that granted approval. Conflict of Interest: Clearly state any potential conflicts of interest and funding sources that supported the research to ensure transparency.

      Thank you. The manuscript includes an ethics statement, as well as statements on conflicts of interest and sources of funding.

      References:

      (1) Y. Tang, M. Feng, Y. Su, T. Ma, H. Zhang, H. Wu, X. Wang, S. Shi, Y. Zhang, Y. Xu, S. Hu, K. Wei, D. Xu, Jmjd4 Facilitates Pkm2 Degradation in Cardiomyocytes and Is Protective Against Dilated Cardiomyopathy, Circulation, 147 (2023) 1684-1704.

      (2) L. Guo, L. Wang, G. Qin, J. Zhang, J. Peng, L. Li, X. Chen, D. Wang, J. Qiu, E. Wang, M-type pyruvate kinase 2 (PKM2) tetramerization alleviates the progression of right ventricle failure by regulating oxidative stress and mitochondrial dynamics, Journal of translational medicine, 21 (2023) 888.

      (3) B. Saleme, V. Gurtu, Y. Zhang, A. Kinnaird, A.E. Boukouris, K. Gopal, J.R. Ussher, G. Sutendra, Tissue-specific regulation of p53 by PKM2 is redox dependent and provides a therapeutic target for anthracycline-induced cardiotoxicity, Science translational medicine, 11 (2019).

      (4) Y. Mao, J. Zhang, Q. Zhou, X. He, Z. Zheng, Y. Wei, K. Zhou, Y. Lin, H. Yu, H. Zhang, Y. Zhou, P. Lin, B. Wu, Y. Yuan, J. Zhao, W. Xu, S. Zhao, Hypoxia induces mitochondrial protein lactylation to limit oxidative phosphorylation, Cell research, 34 (2024) 13-30.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Manley and Vaziri investigate whole-brain neural activity underlying behavioural variability in zebrafish larvae. They combine whole brain (single cell level) calcium imaging during the presentation of visual stimuli, triggering either approach or avoidance, and carry out whole brain population analyses to identify whole brain population patterns responsible for behavioural variability. They show that similar visual inputs can trigger large variability in behavioural responses. Though visual neurons are also variable across trials, they demonstrate that this neural variability does not degrade population stimulus decodability. Instead, they find that the neural variability across trials is in orthogonal population dimensions to stimulus encoding and is correlated with motor output (e.g. tail vigor). They then show that behavioural variability across trials is largely captured by a brain-wide population state prior to the trial beginning, which biases choice - especially on ambiguous stimulus trials. This study suggests that parts of stimulus-driven behaviour can be captured by brain-wide population states that bias choice, independently of stimulus encoding.

      Strengths:

      - The strength of the paper principally resides in the whole brain cellular level imaging in a well-known but variable behaviour.

      - The analyses are reasonable and largely answer the questions the authors ask.

      - Overall the conclusions are well warranted.

      Weaknesses:

      A more in-depth exploration of some of the findings could be provided, such as:

      - Given that thousands of neurons are recorded across the brain a more detailed parcelation of where the neurons contribute to different population coding dimensions would be useful to better understand the circuits involved in different computations.

      We thank the reviewer for noting the strengths of our study and agree that these findings have raised a number of additional avenues which we intend to explore in depth in future studies. In response to the reviewer’s comment above, we have added a number of additional figure panels (new Figures S1E, S3F-G, 4I(i), 4K(i), and S5F-G) and updated panels (Figures 4I(ii) and 4K(ii) in the revised manuscript) to show a more detailed parcellation of the visually-evoked neurons, noise modes, turn direction bias population, and responsiveness bias population. To do so, we have aligned our recordings to the Z-Brain atlas (Randlett et al., 2015) as shown in new Figure S1E. In addition, we provided a more detailed parcellation of the neuronal ensembles by providing projections of the full 3D volume along the xy and yz axes, in addition to the unregistered xy projection shown in Figures 4H and 4J in the revised manuscript. We also found that the distribution of neurons across our huc:h2b-gcamp6s recordings is very similar to the distribution of labeling in the huc:h2b-rfp reference image from the Z-Brain atlas (Figure S1E), which further supports our whole-brain imaging results.

      Overall, we find that this more detailed quantification and visualization is consistent with our interpretations. In particular, we show that the optimal visual decoding population (w<sub>opt</sub>) and the largest noise mode (e1) are localized to the midbrain (Figures S3F-G). This is expected, as in Figure 3 we first extracted a low-dimensional subspace of whole-brain neural activity that optimally preserved visual information. Additionally, we provide new evidence that the populations correlated with the turn bias and responsiveness bias are distributed throughout the brain, including a relatively dense localization to the cerebellum, telencephalon, and dorsal diencephalon (habenula, new Figures 4H-K and S5F-G).

      - Given that the behaviour on average can be predicted by stimulus type, how does the stimulus override the brain-wide choice bias on some trials? In other words, a better link between the findings in Figures 2 and 3 would be useful for better understanding how the behaviour ultimately arises.

      We agree with the reviewer that one of the most fundamental questions that this study has raised is how the identified neuronal populations predictive of decision variables (which we describe as an internal “bias”) interact with the well-studied, visually-evoked circuitry. A major limitation of our study is that the slow dynamics of the NL-GCaMP6s prevent clearly distinguishing any potential difference in the onset time of various neurons during the short trials, which might provide clues into which neurons drive versus later reflect the motor output. However, given that these ensembles were also found to be correlated with spontaneous turns, our hypothesis is that these populations reflect brain-wide drives that enable efficient exploration of the local environment (Dunn et al. 2016, doi.org/10.7554/eLife.12741). Further, we suspect that a sufficiently strong stimulus drive (e.g., large, looming stimuli) overrides these ongoing biases, which would explain the higher average pre-stimulus predictability in trials with small to intermediate-sized stimuli. An important follow-up line of experimentation could involve comparing the neuronal dynamics of specific components of the visual circuitry at distinct internal bias states, ideally utilizing emerging voltage indicators to maximize spatiotemporal specificity. For example, what is the difference between trials with a large looming stimulus in the left visual field when the turn direction bias indicates a leftward versus rightward drive?

      - What other motor outputs do the noise dimensions correlate with?

      To better demonstrate the relationship between neural noise modes and motor activity that we described, we have provided a more detailed correlation analysis in new Figure S4A. We extracted additional features related to the larva’s tail kinematics, including tail vigor, curvature, principal components of curvature, angular velocity, and angular acceleration (S4A(i)). Some of these behavioral features were correlated with one another; for example, in the example traces, PC1 appears to capture nearly the same behavioral feature as tail vigor. The largest noise modes showed stronger correlations with motor output than the smaller noise modes, which is reminiscent of recent work in the mouse showing that some of the neural dimensions with highest variance were correlated with various behavioral features (Musall et al. 2019; Stringer et al. 2019; Manley et al. 2024). We anticipate additional motor outputs would exhibit correlations with neural noise modes, such as pectoral fin movements (not possible to capture in our preparation due to immobilization) and eye movements.
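
      As a rough illustration of this kind of analysis, the sketch below projects trial-to-trial residuals (single-trial responses minus the across-trial mean for one stimulus) onto a given mode and correlates the projection with a behavioral feature. It is a minimal toy example, not the pipeline used in the study: the helper names `project` and `pearson`, the mode vector, and all numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def project(trials, mode):
    """Project mean-subtracted single-trial responses onto one mode.

    trials: list of per-trial population vectors (same stimulus).
    mode:   unit-norm vector over neurons (assumed to be given).
    """
    n_trials, n_neurons = len(trials), len(mode)
    mean = [sum(tr[i] for tr in trials) / n_trials for i in range(n_neurons)]
    return [
        sum((tr[i] - mean[i]) * mode[i] for i in range(n_neurons))
        for tr in trials
    ]

# Hypothetical toy data: 5 trials x 3 neurons for a single stimulus.
trials = [
    [1.0, 2.0, 0.5],
    [1.5, 2.6, 0.4],
    [0.8, 1.7, 0.6],
    [2.0, 3.1, 0.5],
    [1.2, 2.2, 0.5],
]
mode = [0.6, 0.8, 0.0]                  # hypothetical unit-norm noise mode
tail_vigor = [0.2, 0.6, 0.1, 1.0, 0.3]  # hypothetical per-trial tail vigor

scores = project(trials, mode)
r = pearson(scores, tail_vigor)
print(f"correlation between mode projection and tail vigor: {r:.2f}")
```

      In this toy data the projection covaries strongly with tail vigor by construction; with real recordings the same two steps would be repeated for each mode and each behavioral feature.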

      The dataset that the authors have collected is immensely valuable to the field, and the initial insights they have drawn are interesting and provide a good starting ground for a more expanded understanding of why a particular action is determined outside of the parameters experimenters set for their subjects.

      We thank the reviewer for noting the value of our dataset and look forward to future efforts motivated by the observations in our study.

      Reviewer #2 (Public Review):

      Overview

      In this work, Manley and Vaziri investigate the neural basis for variability in the way an animal responds to visual stimuli evoking prey-capture or predator-avoidance decisions. This is an interesting problem and the authors have generated a potentially rich and relevant data set. To do so, the authors deployed Fourier light field microscopy (Flfm) of larval zebrafish, improving upon prior designs and image processing schemes to enable volumetric imaging of calcium signals in the brain at up to 10 Hz. They then examined associations between neural activity and tail movement to identify populations primarily related to the visual stimulus, responsiveness, or turn direction - moreover, they found that the activity of the latter two populations appears to predict upcoming responsiveness or turn direction even before the stimulus is presented. While these findings may be valuable for future more mechanistic studies, issues with resolution, rigor of analysis, clarity of presentation, and depth of connection to the prior literature significantly dampen enthusiasm.

      Imaging

      - Resolution: It is difficult to tell from the displayed images how good the imaging resolution is in the brain. Given scattering and lensing, it is important for data interpretation to have an understanding of how much PSF degrades with depth.

      We thank the reviewer for their comments and agree that the dependence of the PSF and resolution as a function of depth is an important consideration in light field imaging. To quantify this, we measured the lateral resolution of the fLFM as a function of distance from the native image plane (NIP) using a USAF target. The USAF target was positioned at various depths using an automated z-stage, and the slice of the reconstructed volume corresponding to that depth was analyzed. An element was considered resolved if the modulation transfer function (MTF) was greater than 30%.
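
      The 30% criterion can be sketched as follows. This is a simplified illustration, assuming that the modulation of a bar pattern is measured as the Michelson contrast of an intensity line profile drawn across one USAF element; the exact MTF computation used for the measurements is not specified here, and the profiles and function names below are hypothetical.

```python
def michelson_contrast(profile):
    """Contrast (Imax - Imin) / (Imax + Imin) of an intensity line profile."""
    i_max, i_min = max(profile), min(profile)
    if i_max + i_min == 0:
        return 0.0
    return (i_max - i_min) / (i_max + i_min)

def is_resolved(profile, threshold=0.30):
    """Treat an element as resolved if its modulation exceeds the threshold."""
    return michelson_contrast(profile) > threshold

# Hypothetical line profiles across the bars of one element.
sharp = [10, 90, 10, 90, 10, 90, 10]    # bars clearly separated
blurred = [45, 55, 45, 55, 45, 55, 45]  # bars nearly merged

for name, profile in (("sharp", sharp), ("blurred", blurred)):
    print(name, round(michelson_contrast(profile), 2), is_resolved(profile))
```

      Repeating such a check for successively finer elements at each depth gives the smallest resolved spacing as a function of distance from the native image plane.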

      In new Figure S1A, we plot the resolution measurements of the fLFM as compared to the conventional LFM (Prevedel et al., 2014), which shows the increase in resolution across the axial extent of imaging. In particular, the fLFM does not exhibit the dramatic drop in lateral resolution near the NIP which is seen in conventional LFM. In addition, the expanded range of high-resolution imaging motivates our increase from an axial range of 200 microns in previous studies to 280 microns in this study.

      - Depth: In the methods it is indicated that the imaging depth was 280 microns, but from the images of Figure 1 it appears data was collected only up to 150 microns. This suggests regions like the hypothalamus, which may be important for controlling variation in internal states relevant to the behaviors being studied, were not included.

      The full axial range of imaging was 280 microns, i.e. spanning from 140 microns below to 140 microns above the native imaging plane. After aligning our recordings to the Z-Brain dataset, we have compared the 3D distribution of neurons in our data (new Figure S1E(i)) to the labeling of the reference brain (Figure S1E(ii)). This provides evidence that our imaging preparation largely captures the labeling seen in a dense, high-resolution reference image within the indicated 280 microns range.

      - fLFM data processing: It is important for data interpretation that the authors are clearer about how the raw images were processed. The de-noising process specifically needs to be explained in greater detail. What are the characteristics of the noise being removed? How is time-varying signal being distinguished from noise? Please provide a supplemental with images and algorithm specifics for each key step.

      We thank the reviewer for their comment. To address the reviewer’s point regarding the data processing pipeline utilized in our study, in our revised manuscript we have added a number of additional figure panels in Figure S1B-E to quantify and describe the various steps of the pipeline in greater depth.

      First, the raw fLFM images are denoised. The denoising approach utilized in the fLFM data processing pipeline is not novel, but rather a custom-trained variant of Lecoq et al.’s (2021) DeepInterpolation method. In our original manuscript, we also described the specific architecture and parameters used to train our variant of the DeepInterpolation model. To make this procedure clearer, we have added the following details to the methods:

      “DeepInterpolation is a self-supervised approach to denoising, which denoises the data by learning to predict a given frame from a set of frames before and after it. Time-varying signal can be distinguished from shot noise because shot noise is independent across frames, but signal is not. Therefore, only the signal is able to be predicted from adjacent frames. This has been shown to provide a highly effective and efficient denoising method (Lecoq et al., 2021).”

      Therefore, time-varying signal is distinguished from noise based on the correlations of pixel intensity across consecutive imaging frames. To better visualize this process, in new Figure S1B we show example images and fluorescence traces before and after denoising.
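
      The principle can be illustrated with a toy example. DeepInterpolation itself uses a trained neural network; the sketch below swaps in a plain average of neighbouring frames as the predictor, which is an assumption made only to show why noise that is independent across frames cannot be predicted while a slowly varying signal can. The window sizes, noise level, and test signal are all hypothetical.

```python
import math
import random


def neighbor_prediction(trace, n_pre=5, n_post=5):
    """Predict each sample from the mean of its neighbours, excluding the sample itself."""
    predicted = []
    for t in range(len(trace)):
        lo, hi = max(0, t - n_pre), min(len(trace), t + n_post + 1)
        neighbours = [trace[k] for k in range(lo, hi) if k != t]
        predicted.append(sum(neighbours) / len(neighbours))
    return predicted


def rmse(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(0)
signal = [math.sin(2 * math.pi * t / 200) for t in range(1000)]  # slowly varying signal
noisy = [s + rng.gauss(0.0, 0.5) for s in signal]                # independent per-frame noise
denoised = neighbor_prediction(noisy)
```

      Comparing `rmse(denoised, signal)` with `rmse(noisy, signal)` shows the prediction lying closer to the underlying signal, because the independent noise averages out across the neighbouring frames.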

      - Merging: It is noted that nearby pixels with a correlation greater than 0.7 were merged. Why was this done? Is this largely due to cross-contamination due to a drop in resolution? How common was this occurrence? What was the distribution of pixel volumes after aggregation? Should we interpret this to mean that a 'neuron' in this data set is really a small cluster of 10-20 neurons? This of course has great bearing on how we think about variability in the response shown later.

      First, to be clear, nearby pixels were not merged; instead neuronal ROIs identified by CNMF-E were merged, as we had described: “the CNMF-E algorithm was applied to each plane in parallel, after which the putative neuronal ROIs from each plane were collated and duplicate neurons across planes were merged.” If this merging was not performed, the number of neurons would be overestimated due to the relatively dense 3D reconstruction with voxels of 4 µm axially. Therefore, this merging is a requisite component of the pipeline to avoid double counting of neurons, regardless of the resolution of the data.
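
      A merging step of this kind can be sketched as below. This is an illustration only: the correlation threshold of 0.7 is the value mentioned in the reviewer's comment, whereas the distance threshold, the data layout, and all function names are hypothetical and do not describe the authors' CNMF-E pipeline.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long traces (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def merge_rois(centroids, traces, max_dist_um=8.0, min_corr=0.7):
    """Group ROIs that are both close in space and highly correlated in time."""
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            close = math.dist(centroids[i], centroids[j]) <= max_dist_um
            if close and pearson(traces[i], traces[j]) > min_corr:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical example: ROIs 0 and 1 sit on adjacent planes and co-fluctuate; ROI 2 is distant.
centroids = [(0.0, 0.0, 0.0), (0.0, 0.0, 4.0), (50.0, 50.0, 0.0)]
traces = [[0, 1, 0, 2, 0], [0, 1.1, 0, 1.9, 0.1], [1, 0, 1, 0, 1]]
merged = merge_rois(centroids, traces)
```

      The length of each group gives the number of putative 2D ROIs contributing to one final 3D ROI, which is the quantity summarized in a merging histogram.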

      However, we agree with the reviewer that the practical consequences of this merging were not previously described in sufficient detail. Therefore, in our revision we have added additional quantification of the two critical components of the merging procedure: the number of putative neuronal ROIs merged and the volume of the final 3D neuronal ROIs, which demonstrate that a neuron in our data should not be interpreted as a cluster of 10-20 neurons.

      In new Figure S1C(i), we summarize the rate of occurrence of merging by assessing the number of putative 2D ROIs which were merged to form each final 3D neuronal ROI. Across n=10 recordings, approximately 75% of the final 3D neuronal ROIs involved no merging at all, and few instances involved merging more than 5 putative ROIs. Next, in Figure S1C(ii), we quantify the volume of the final 3D ROIs. To do so, we counted the number of voxels contributing to each final 3D neuronal ROI and multiplied that by the volume of a single voxel (2.4 x 2.4 x 4 µm<sup>3</sup>). The majority of neurons had a volume of less than 1000 µm<sup>3</sup>, which corresponds to a spherical volume with a radius of roughly 6.2 µm. In summary, both the merging statistics and volume distribution demonstrate that few neuronal ROIs could be consistent with “a small cluster of 10-20 neurons”.
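
      The volume arithmetic quoted above can be checked with a few lines. The voxel dimensions and the 1000 µm<sup>3</sup> figure come from the text; the function names are illustrative.

```python
import math

VOXEL_DIMS_UM = (2.4, 2.4, 4.0)  # voxel size stated in the text (µm)


def roi_volume_um3(n_voxels, voxel_dims=VOXEL_DIMS_UM):
    """Volume of an ROI made of n_voxels voxels."""
    dx, dy, dz = voxel_dims
    return n_voxels * dx * dy * dz


def equivalent_sphere_radius_um(volume_um3):
    """Radius of the sphere with the same volume: r = (3V / 4π)^(1/3)."""
    return (3.0 * volume_um3 / (4.0 * math.pi)) ** (1.0 / 3.0)


single_voxel = roi_volume_um3(1)                # 2.4 * 2.4 * 4 µm^3
radius = equivalent_sphere_radius_um(1000.0)    # radius matching 1000 µm^3
voxels_per_1000 = 1000.0 / single_voxel         # voxels needed to reach 1000 µm^3
```

      Under these numbers, an ROI of 1000 µm<sup>3</sup> spans roughly 43 voxels.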

      - Bleaching: Please give the time constants used in the fit for assessing bleaching.

      As described in the Methods, the photobleaching correction was performed by fitting a bi-exponential function to the mean fluorescence across all neurons. We have provided the time constants determined by these fits for n=10 recordings in new Figure S1D(i). In addition, we provided an example of raw mean activity, the corresponding bi-exponential fit, and the mean activity after correction in Figure S1D(ii). These data demonstrate that the dominant photobleaching effect is a steep decrease in mean signal at the beginning of the recording (represented by the estimated time constant τ<sub>1</sub>), followed by a slow decay (τ<sub>2</sub>).
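
      A bi-exponential bleaching model and one possible correction can be sketched as follows. The text does not state how the fitted curve was applied, so dividing by the fit (rescaled to the first sample) is an assumption, as are the parameter values; in practice the parameters would come from a least-squares fit to the mean fluorescence.

```python
import math


def biexponential(t, a1, tau1, a2, tau2):
    """Sum of a fast (tau1) and a slow (tau2) exponential decay."""
    return a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2)


def correct_bleaching(times, mean_trace, params):
    """Divide out the fitted decay, rescaled so the first sample is unchanged."""
    baseline0 = biexponential(times[0], *params)
    return [f * baseline0 / biexponential(t, *params)
            for t, f in zip(times, mean_trace)]


# Hypothetical parameters (amplitudes in arbitrary units, time constants in seconds).
params = (0.3, 60.0, 0.7, 3000.0)
times = [float(t) for t in range(0, 1800, 10)]
bleached = [biexponential(t, *params) for t in times]   # a trace that only bleaches
corrected = correct_bleaching(times, bleached, params)  # flat after correction
```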

      Analysis

      - Slow calcium dynamics: It does not appear that the authors properly account for the slow dynamics of calcium-sensing in their analysis. Nuclear-localized GCaMP6s will likely have a kernel with a multiple-second decay time constant for many of the cells being studied. The value used needs to be given and the authors should account for variability in this kernel time across cell types. Moreover, by not deconvolving their signals, the authors allow for contamination of their signal at any given time with a signal from multiple seconds prior. For example, in Figure 4A (left turns), it appears that much of the activity in the first half of the time-warped stimulus window began before stimulus presentation - without properly accounting for the kernel, we don't know if the stimulus-associated activity reported is really stimulus-associated firing or a mix of stimulus and pre-stimulus firing. This also suggests that in some cases the signals from the prior trial may contaminate the current trial.

      We would like to respond to each of the points raised here by the reviewer individually.

      (1) “It does not appear that the authors properly account for the slow dynamics of calcium-sensing in their analysis. Nuclear-localized GCaMP6s will likely have a kernel with a multiple-second decay time constant for many of the cells being studied. The value used needs to be given…”

      We disagree with the reviewer’s claim that the slow dynamics of the calcium indicator GCaMP were not accounted for. While we did not deconvolve the neuronal traces with the GCaMP response kernel, in every step in which we correlated neural activity with sensory or motor variables, we convolved the stimulus or motor timeseries with the GCaMP kernel, as described in the Methods. Therefore, the expected delay and smoothing effects were accounted for when analyzing the correlation structure between neural and behavioral or stimulus variables, as well as during our various classification approaches. To better describe this, we have added the following description of the kernel to our Methods:

      “The NL-GCaMP6s kernel was estimated empirically by aligning and averaging a number of calcium events. This kernel corresponds to a half-rise time of 400 ms and half-decay time of 4910 ms.”
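
      Convolving a regressor with an indicator kernel can be sketched as below. The authors' kernel was estimated empirically; the rise-and-decay shape used here, the conversion of half-times to exponential time constants, the 10 Hz sampling rate, and the example stimulus timing are all assumptions chosen for illustration.

```python
import numpy as np

FS = 10.0                                  # hypothetical sampling rate (Hz)
HALF_RISE_S, HALF_DECAY_S = 0.400, 4.910   # half-times quoted in the text (s)


def gcamp_kernel(fs=FS, duration_s=30.0):
    """Illustrative rise-and-decay kernel with unit area (shape is an assumption)."""
    t = np.arange(0.0, duration_s, 1.0 / fs)
    tau_rise = HALF_RISE_S / np.log(2.0)
    tau_decay = HALF_DECAY_S / np.log(2.0)
    k = (1.0 - np.exp(-t / tau_rise)) * np.exp(-t / tau_decay)
    return k / k.sum()


def convolve_causal(regressor, kernel):
    """Causal convolution: the output at time t depends only on samples at or before t."""
    return np.convolve(regressor, kernel, mode="full")[: len(regressor)]


stimulus = np.zeros(600)      # 60 s at 10 Hz
stimulus[100:130] = 1.0       # one 3-second stimulus starting at t = 10 s
predicted = convolve_causal(stimulus, gcamp_kernel())
```

      Because the convolution is causal, `predicted` is zero before stimulus onset and is delayed and smoothed afterwards, which is the directional effect discussed in the response.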

      This approach accounts for the GCaMP kernel when relating the neuronal dynamics to stimuli and behavior, while avoiding any artifacts that could be introduced from improper deconvolution or other corrections directly to the calcium dynamics. Deconvolution of calcium imaging data, and in particular nuclear-localized (NL) GCaMP6s, is not always a robust procedure. In particular, GCaMP6s has a much more nonlinear response profile than newer GCaMP variants such as jGCaMP8 (Zhang et al. 2023, doi:10.1038/s41586-023-05828-9), as the reviewer notes later in their comments. The nuclear-localized nature of the indicator used in our study also provides an additional nonlinear effect. Accounting for a nonlinear relationship between calcium concentration and fluorescence readout is significantly more difficult because such nonlinearities remove the guarantee that the optimization approaches generally used in deconvolution will converge to global extrema. This means that deconvolution assuming nonlinearities is far less robust than deconvolution using the linear approximation (Vogelstein et al. 2010, doi:10.1152/jn.01073.2009). We are not currently aware of any appropriate methods for deconvolving our NL-GCaMP6s data, and therefore take a more conservative approach in our study.

      We also argue that the natural smoothness of calcium imaging data is important for the analyses utilized in our study (Shen et al., 2022, doi:10.1016/j.jneumeth.2021.109431). Even if our data were deconvolved in order to estimate spike trains or more point-like activity patterns, such data are generally smoothed (e.g., by estimating firing rates) before dimensionality reduction, which is a core component of our neuronal population analyses. Further, Wei et al. (2020, doi:10.1371/journal.pcbi.1008198) showed in detail that deconvolved calcium data resulted in less accurate population decoding, whereas binned electrophysiological data and raw calcium data were equally accurate. When using other techniques, such as clustering of neuronal activity patterns (a method we do not employ in this study), spike and deconvolved calcium data were instead shown to be more accurate than raw calcium data. Therefore, we do not believe deconvolution of the neuronal traces is appropriate in this case without a better understanding of the NL-GCaMP6s response, and do not rely on the properties of deconvolution for our analyses. Still, we agree with the reviewer that one must be mindful of the GCaMP kernel when analyzing and interpreting these data, and therefore have noted the delayed and slow kinematics of the NL-GCaMP within our manuscript, for example: “To visualize the neuronal activity during a given trial while accounting for the delay and kinematics of the nuclear-localized GCaMP (NL-GCaMP) sensor, a duration of approximately 15 seconds is extracted beginning at the onset of the 3-second visual stimulus period.”

      (2) “… and the authors should account for variability in this kernel time across cell types.”

      In addition to the points raised above, we are not aware of any deconvolution procedures which have successfully shown the ability to account for variability in the response kernel across cell types in whole-brain imaging data when cell type is unknown a priori. Pachitariu et al. (2018, doi:10.1523/JNEUROSCI.3339-17.2018) showed that the best deconvolution procedures for calcium imaging data rely on a simple algorithm with a fixed kernel. Further, more complicated approaches either utilize explicit priors about the calcium kernel or learn implicit priors using supervised learning, neither of which we would be able to confirm are appropriate for our dataset without ground truth electrophysiological spike data.

      However, we agree with the reviewer that we must interpret the data while being mindful that there could be variability in this kernel across neurons, which is not accounted for in our fixed calcium kernel. We have added the following sentence to our revised manuscript to highlight this limitation:

      “The use of a fixed calcium kernel does not account for any variability in the GCaMP response across cells, which could be due to differences such as cell type or expression level. Therefore, this analysis approach may not capture the full set of neurons which exhibit stimulus correlations but exhibit a different GCaMP response.”

      (3) “without properly accounting for the kernel, we don't know if the stimulus-associated activity reported is really stimulus-associated firing or a mix of stimulus and pre-stimulus firing”

      While we agree with the reviewer that the slow dynamics of the indicator will cause a delay and smoothing of the signal over time, we would like to point out that this effect is highly directional. In particular, we can be confident that pre-stimulus activity is not contaminated by the stimulus given the data we describe in the next point regarding the timing of visual stimuli relative to the GCaMP kernel. The reviewer is correct that post-stimulus firing can be mixed with pre-stimulus firing due to the GCaMP kernel. However, our key claims in Figure 4 center around turn direction and responsiveness biases, which are present even before the onset of the stimulus. Still, we have highlighted this delay and smoothing to readers in the updated version of our manuscript.

      (4) “This also suggests that in some cases the signals from the prior trial may contaminate the current trial”

      We have carefully chosen the inter-stimulus interval for maximum efficiency of stimulation, while ensuring that contamination from the previous stimulus is negligible. The inter-stimulus interval was chosen by empirically analyzing preliminary data of visual stimulation with our preparation. New Figure S3C shows the delay and slow kinematics due to our indicator; indeed, visually-evoked activity peaks after the end of the short stimulus period. Importantly, however, the visually-evoked activity is at or near baseline at the start of the next trial.

      Finally, we would like to note that our stimulation protocol is randomized, as described in the Methods. Therefore, the previous stimulus has no correlation with the current stimulus, which would prevent any contamination from providing predictive power that could be identified by our visual decoding methods.

      - Partial Least Squares (PLS) regression: The steps taken to identify stimulus coding and noise dimensions are not sufficiently clear. Please provide a mathematical description.

      We have updated the Results and Methods sections of our revised manuscript to describe in more mathematical detail the approach taken to identify the relevant dimensions of neuronal activity:

      “The comparison of the neural dimensions encoding visual stimuli versus trial-to-trial noise was modeled after Rumyantsev et al. (2020). Partial least squares (PLS) regression was used to find a low-dimensional space that optimally predicted the visual stimuli, which we refer to as the visually-evoked neuronal activity patterns. To perform regression, a visual stimulus kernel was constructed by summing the timeseries of each individual stimulus type, weighted by the stimulus size and negated for trials on the right visual field, thus providing a single response variable encoding the location, size, and timing of all the stimulus presentations. This stimulus kernel was then convolved with the temporal response kernel of our calcium indicator (NL-GCaMP6s).

      PLS regression identifies, for two sets of paired observations, the normalized dimensions that maximize the covariance between them. In our case, the visual stimulus is represented by a single variable, simplifying the problem to identifying the subspace of neural activity that optimally preserves information about the visual stimulus (sometimes referred to as PLS1 regression). That is, the N x T neural time series matrix X is reduced to a d x T matrix spanned by a set of orthonormal vectors. PLS1 regression is performed as follows:

      PLS1 algorithm

      Let X<sub>i</sub> = X and . For i = 1…d,

      (1) 

      (2) 

      (3) 

      (4) 

      (5)  (note this is scalar)

      (6) 

      The projections of the neural data {p<sub>i</sub>} thus span a subspace that maximally preserves information about the visual stimulus. Stacking these projections into the N x d matrix P that represents the transform from the whole-brain neural state space to the visually-evoked subspace, the optimal decoding direction w<sub>opt</sub> is given by the linear least squares solution. The dimensionality d of PLS regression was optimized using 6-fold cross-validation with 3 repeats, choosing the dimensionality between d = 1 and 20 with the lowest cross-validated mean squared error for each larva. Then, w<sub>opt</sub> was computed using all time points.

      For each stimulus type, the noise covariance matrix was computed in the low-dimensional PLS space, given that direct estimation of the noise covariances across many thousands of neurons would likely be unreliable. A noise covariance matrix was calculated separately for each stimulus, and then averaged across all stimuli. As before, the mean activity µ<sub>i</sub> for each neuron i was computed over each stimulus presentation period. The noise covariance then describes the correlated fluctuations δ<sub>i</sub> around this mean response for each pair of neurons i and j.

      The noise modes e<sub>α</sub> for α = 1…d were subsequently identified by eigendecomposition of the mean noise covariance matrix across all stimuli. The angle between the optimal stimulus decoding direction w<sub>opt</sub> and each noise mode e<sub>α</sub> was then computed.”
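
      The analysis described in the passage above can be sketched in code. This is a textbook NIPALS form of PLS1, not necessarily the authors' exact steps or notation; it takes the data as time points by neurons (the transpose of the N x T layout in the text), and all function and variable names are illustrative. Convolution of the regressor with the indicator kernel and the cross-validated choice of d are omitted.

```python
import numpy as np


def stimulus_regressor(n_frames, presentations):
    """Sum per-stimulus time series, weighted by size and negated for the right field.

    presentations: iterable of (onset_frame, offset_frame, size, side) tuples.
    """
    y = np.zeros(n_frames)
    for onset, offset, size, side in presentations:
        sign = -1.0 if side == "right" else 1.0
        y[onset:offset] += sign * size
    return y


def pls1(X, y, d):
    """Textbook NIPALS PLS1 for X of shape (T, N) and y of shape (T,)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, d))   # unit-norm weight vectors
    P = np.zeros((n_features, d))   # loadings
    q = np.zeros(d)                 # scalar regression coefficients
    for i in range(d):
        w = X.T @ y
        w = w / np.linalg.norm(w)
        t = X @ w                   # scores along this component
        tt = t @ t
        p = X.T @ t / tt
        q[i] = (y @ t) / tt
        X = X - np.outer(t, p)      # deflate X
        y = y - q[i] * t            # deflate y
        W[:, i], P[:, i] = w, p
    return W, P, q


def decoding_direction(W, P, q):
    """Regression vector in the original space: W (P^T W)^{-1} q."""
    return W @ np.linalg.solve(P.T @ W, q)


def noise_modes(responses_by_stimulus):
    """Eigendecompose the noise covariance averaged over stimuli.

    responses_by_stimulus: list of (n_trials, d) arrays, one per stimulus type.
    Returns eigenvalues (descending) and the matching eigenvectors as columns.
    """
    covs = []
    for R in responses_by_stimulus:
        delta = R - R.mean(axis=0)                    # fluctuations around the mean response
        covs.append(delta.T @ delta / (len(R) - 1))
    eigvals, eigvecs = np.linalg.eigh(np.mean(covs, axis=0))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def angle_deg(u, v):
    """Angle between two directions, ignoring sign (0 to 90 degrees)."""
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))
```

      With a decoding direction and the noise modes expressed in the same space, `angle_deg` returns values near 90 degrees when the dominant noise mode is close to orthogonal to the stimulus decoding direction.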

      - No response: It is not clear from the methods description if cases where the animal has no tail response are being lumped with cases where the animal decides to swim forward and thus has a large absolute but small mean tail curvature. These should be treated separately. 

      We thank the reviewer for raising the potential for this confusion and agree that forward-motion trials should not be treated the same as motionless trials. While these types of trial were indeed treated separately in our original manuscript, we have updated the Methods section of our revised manuscript to make this clear:

      “Left and right turn trials were extracted as described previously. Response trials included both left and right turn trials (i.e., the absolute value of mean tail curvature > σ<sub>active</sub>), whereas nonresponse trials were motionless (absolute mean tail curvature < σ<sub>active</sub>). In particular, forward-motion trials were excluded from these analyses.”
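
      The trial categories can be sketched as below. The turn and nonresponse rules follow the quoted Methods text, with positive curvature taken as leftward as stated elsewhere in the response; the rule used to flag forward-motion trials (large mean absolute curvature with a small mean, reusing the same threshold) is an assumption based on the reviewer's description, and the function name is hypothetical.

```python
def classify_trial(curvature, sigma_active):
    """Label one trial from its tail-curvature samples."""
    mean_curv = sum(curvature) / len(curvature)
    mean_abs = sum(abs(c) for c in curvature) / len(curvature)
    if abs(mean_curv) > sigma_active:
        # Response trial: sign of the mean curvature gives the turn direction.
        return "left" if mean_curv > 0 else "right"
    if mean_abs > sigma_active:
        # Assumed criterion: symmetric tail beats, excluded from the analyses.
        return "forward"
    return "nonresponse"


# Hypothetical curvature traces and threshold.
labels = [
    classify_trial([0.5, 0.6, 0.4], 0.2),
    classify_trial([-0.5, -0.4], 0.2),
    classify_trial([0.5, -0.5, 0.5, -0.5], 0.2),
    classify_trial([0.01, -0.02, 0.0], 0.2),
]
```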

      While our study has focused specifically on left and right turns, we hypothesize that the responsiveness bias ensemble may also be involved in forward movements and look forward to future work exploring the relationship between whole-brain dynamics and the full range of motor outputs.

      - Behavioral variability: Related to Figure 2, within- and across-subject variability are confounded. Please disambiguate. It may also be informative on a per-fish basis to examine associations between reaction time and body movement.

      The reviewer is correct that our previously reported summary statistics in Figure 2D-F were aggregated across trials from multiple larvae. Following the reviewer’s suggestion to make the magnitudes of across-larvae and within-larva variability clear, in our revised manuscript we have added two additional figure panels to Figure S2.

      New Figure S2A highlights the across-larvae variability in mean head-directed behavioral responses to stimuli of various sizes. Overall, the relationship between stimulus size and the mean tail curvature across trials is largely consistent across larvae; however, the crossing-over point between leftward (positive curvature) and rightward (negative curvature) turns for a given side of the visual field exhibits some variability across larvae.

      New Figure S2B shows examples of within-larva variability by plotting the mean tail curvature during single trials for two example larvae. Consistent with Figure 2G which also demonstrates within-larva variability, responses to a given stimulus are variable across trials in both examples. However, this degree of within-larva variability can appear different across larvae. For example, the larva shown on the left of Figure S2B exhibits greater overlap between responses to stimuli presented on opposite visual fields, whereas the larva shown on the right exhibits greater distinction between responses.

      - Data presentation clarity: All figure panels need scale bars - for example, in Figure 3A there is no indication of timescale (or time of stimulus presentation). Figure 3I should also show the time series of the w_opt projection.

      We appreciate the reviewer’s attention to detail in this regard. We have added scalebars to Figures 3A, 3H-I, S4B(ii), 4H, 4J in the revised manuscript, and all new figure panels where relevant. In addition, the caption of Figure 3A has been updated to include a description of the time period plotted relative to the onset of the visual stimulus.

      Additionally, we appreciate the reviewer’s idea to show w<sub>opt</sub> in Figure 3J of the revised manuscript (previously Figure 3I). This clearly shows that the visual decoding projection is inactive during the short baseline period before visual stimulation begins, whereas the noise mode is correlated with motor output throughout the recording.

      - Pixel locations: Given the poor quality of the brain images, it is difficult to tell the location of highlighted pixels relative to brain anatomy. In addition, given that the midbrain consists of much more than the tectum, it is not appropriate to put all highlighted pixels from the midbrain under the category of tectum. To aid in data interpretation and better connect this work with the literature, it is recommended that the authors register their data sets to standard brain atlases and determine if there is any clustering of relevant pixels in regions previously associated with prey-capture or predator-avoidance behavior.

      We agree with the reviewer that registration of our datasets to a standard brain atlas is a highly useful addition. While the dense, pan-neuronal labeling makes the isolation of highly specific circuit components difficult, we have shown in more detail the specific brain regions contributing to these populations by aligning our recordings to the Z-Brain atlas (Randlett et al., 2015) as shown in new Figures S1E, S3F-G, 4I, 4K, and S5F-G. In addition, we provided a more detailed parcellation of the neuronal ensembles by providing projections of the full 3D volume along the xy and yz axes, in addition to the unregistered xy projection shown in new Figures 4H and 4J. We also found that the distribution of neurons in our huc:H2B-GCaMP6s recordings is very similar to the distribution of labeling in the huc:H2B-RFP reference image from the Z-Brain atlas (new Figure S1E), which further supports our whole-brain imaging results.

      Overall, we find that this more detailed quantification and visualization is consistent with the interpretations in the previous version of our manuscript. In particular, we show that the optimal visual decoding population (w<sub>opt</sub>) and largest noise mode (e<sub>1</sub>) are localized to the midbrain (new Figures S3F-G), which is expected since in Figure 3 we first extracted a low-dimensional subspace of whole-brain neural activity that optimally preserved visual information. Additionally, we provide further evidence that the populations correlated with the turn bias and responsiveness bias are distributed throughout the brain, including a relatively dense localization to the cerebellum, telencephalon, and dorsal diencephalon (habenula, new Figures 4H-K and S5F-G).

      Finally, the reviewer is correct that our original label of “tectum” was a misnomer; the region analyzed corresponded to the midbrain, including the tegmentum, torus longitudinalis, and torus semicircularis in addition to the tectum. We have updated the brain regions shown and labels throughout the manuscript.

      Interpretation

      - W_opt and e_1 orthogonality: The statement that these two vectors, determined from analysis of the fluorescence data, are orthogonal, actually brings into question the idea that true signal and leading noise vectors in firing-rate state-space are orthogonal. First, the current analysis is confounding signals across different time periods - one could assume linearity all the way through the transformations, but this would only work if earlier sources of activation were being accounted for. Second, the transformation between firing rate and fluorescence is most likely not linear for GCaMP6s in most of the cells recorded. Thus, one would expect a change in the relationship between these vectors as one maps from fluorescence to firing rate.

      Unfortunately, we are not entirely sure we have understood the reviewer’s argument. We are assuming that the reviewer’s first sentence is suggesting that the observation of orthogonality in the neural state space measured in calcium imaging precludes the possibility (“actually brings into question”, as the reviewer states) that the same neural ensembles could be orthogonal in firing rate state space measured by electrophysiological data. If this is the reviewer’s conjecture, we respectfully disagree with it. Consider a toy example of a neural network containing N ensembles of neurons, where the neurons within an ensemble all fire simultaneously, and no two ensembles ever fire at the same time. As long as the “switching” of firing between ensembles is not fast relative to the resolution of the GCaMP kernel, the largest principal components would represent orthogonal dimensions differentiating the various ensembles, whether observing firing rates or timeseries convolved with the GCaMP kernel. This is a simple example where the observed orthogonality would appear similar in both calcium imaging and electrophysiological data, demonstrating that we should not allow conclusions from fluorescence data to “bring into question” that the same result could be observed in firing rate data.

      We also disagree with the reviewer’s argument that we are “confounding signals across time periods”. Indeed, we must interpret the data in light of the GCaMP response kernel. However, all of the analyses presented here are performed on instantaneous measurements of population activity patterns. These activity patterns do represent a smoothed, likely nonlinear integration of recent neuronal activity, but unless the variability in the GCaMP response kernel (discussed above) is widely different across these populations (which has not been observed in the literature), we do not expect that the GCaMP transformations would artificially induce orthogonality in our analysis approach. Such smoothing operations tend to instead increase correlations across neurons and population decoding approaches generally benefit from this smoothness, as we have argued above. However, a much more problematic situation would be if we were comparing the activity of two neuronal populations at different points in time (which we do not include in this study), in which case the nonlinearities could overaccentuate orthogonality between non-time-matched activity patterns.

      Finally, we agree with the reviewer that the transformation between firing rate and fluorescence is very likely nonlinear and that these vectors of population activity do not perfectly represent what would be observed if one had access to whole-brain, cellular-resolution electrophysiology spike data. However, similar observations regarding the brain-wide, distributed encoding of behavior have been confirmed across recording modalities in the mouse (Stringer et al., 2019; Steinmetz et al., 2019), where large-scale electrophysiology utilizing highly invasive probes (e.g., Neuropixels) is more feasible than in the larval zebrafish. With the advent of whole-brain voltage imaging in the larval zebrafish, we expect any differences between calcium and voltage dynamics will be better understood, yet such techniques will likely continue to suffer to some extent from the nonlinearities described here.

      - Sources of variability: The authors do not take into account a fairly obvious source of variability in trial-to-trial response - eye position. We know that prey capture responsiveness is dependent on eye position during stimulus (see Figure 4 of PMID: 22203793). We also expect that neurons fairly early in the visual pathway with relatively narrow receptive fields will show variable responses to visual stimuli as the degree of overlap with the receptive field varies with eye movement. There can also be small eye-tracking movements ahead of the decision to engage in prey capture (Figure 1D, PMID: 31591961) that can serve as a drive to initiate movements in a particular direction. Given these possibilities indicating that the behavioral measure of interest is gaze, and the fact that eye movements were apparently monitored, it is surprising that the authors did not include eye movements in the analysis and interpretation of their data.

      We agree with the reviewer that eye movements, such as saccades and convergence, are important motor outputs that are well-known to play a role in the sequence of motor actions during prey capture and other behaviors. Therefore, we have added the following new eye tracking results to our revised manuscript:

      “In order to confirm that the observed neural variability in the visually-evoked populations was not predominantly due to eye movements, such as saccades or convergence, we tracked the angle of each eye. We utilized DeepLabCut, a deep learning tool for animal pose estimation (Mathis et al., 2018), to track keypoints on the eye which are visible in the raw fLFM images, including the retina and pigmentation (Figure S3D(i)). This approach enabled identification of various eye movements, such as convergence and the optokinetic reflex (Figure S3D(ii-iii)). Next, we extracted a number of eye states, including those based on position (more leftward vs. rightward angles) and speed (high angular velocity vs. low or no motion). Figure S3E(i) provides example stimulus response profiles across trials of the same visual stimulus in each of these eye states, similar to a single column of traces in Figure 3A broken out into more detail. These data demonstrate that the magnitude and temporal dynamics of the stimulus-evoked responses show apparently similar levels of variability across eye states. If neural variability were driven by eye movement during the stimulus presentation, for example, one would expect to see much more variability during the high angular velocity trials than low, which is not apparent. Next, we asked whether the dominant neural noise modes vary across eye states, which would suggest that the geometry of neuronal variability is influenced by eye movements or states. To do so, the dominant noise modes were estimated in each of the individual eye conditions, as well as bootstrapped trials from across all eye conditions. The similarity of these noise modes estimated from different eye conditions (Figure S3E(ii), right) was not significantly different from the similarity of noise modes estimated from bootstrapped random samples across all eye conditions (Figure S3E(ii), left).
      Therefore, while movements of the eye likely contribute to aspects of the observed neural variability, they do not dominate the observed neural variability here, particularly given our observation that the largest noise mode represents a considerable fraction of the observed neural variance (Figure 3E).”

      While these results provide an important control in our study, we anticipate further study of the relationship between eye movements or states, visually-evoked neural activity, and neural noise modes would identify the additional neural ensembles which are correlated with and drive this additional motor output.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Manley and Vaziri designed and built a Fourier light-field microscope (fLFM) inspired by previous implementations but improved and built exclusively from commercially available components so others can more easily reproduce the design. They combined this with the design of novel algorithms to efficiently extract whole-brain activity from larval zebrafish brains.

      This new microscope was applied to the question of the origin of behavioral variability. In an assay in which larval zebrafish are exposed to visual dots of various sizes, the fish respond by turning left or right or not responding at all. Neural activity was decomposed into an activity that encodes the stimulus reliably across trials, a 'noise' mode that varies across trials, and a mode that predicts tail movements. A series of analyses showed that trial-to-trial variability was largely orthogonal to activity patterns that encoded the stimulus and that these noise modes were related to the larvae's behavior.

      To identify the origins of behavioral variability, classifiers were fit to the neural data to predict whether the larvae turned left or right or did not respond. A set of neurons that were highly distributed across the brain could be used to classify and predict behavior. These neurons could also predict spontaneous behavior that was not induced by stimuli above chance levels. The work concludes with findings on the distributed nature of single-trial decision-making and behavioral variability.

      Strengths:

      The design of the new fLFM microscope is a significant advance in light-field and computational microscopy, and the open-source design and software are promising to bring this technology into the hands of many neuroscientists.

      The study addresses a series of important questions in systems neuroscience related to sensory coding, trial-to-trial variability in sensory responses, and trial-to-trial variability in behavior. The study combines microscopy, behavior, dynamics, and analysis and produces a well-integrated analysis of brain dynamics for visual processing and behavior. The analyses are generally thoughtful and of high quality. This study also produces many follow-up questions and opportunities, such as using the methods to look at individual brain regions more carefully, applying multiple stimuli, investigating finer tail movements and how these are encoded in the brain, and the connectivity that gives rise to the observed activity. Answering questions about variability in neural activity in the entire brain and its relationship to behavior is important to neuroscience and this study has done that to an interesting and rigorous degree.

      Points of improvement and weaknesses:

      The results on noise modes may be a bit less surprising than they are portrayed. The orthogonality between neural activity patterns encoding the sensory stimulus and the noise modes should be interpreted within the confounds of orthogonality in high-dimensional spaces. In higher dimensional spaces, it becomes more likely that two random vectors are almost orthogonal. Since the neural activity measurements performed in this study are quite high dimensional, a more explicit discussion is warranted about the small chance that the modes are not almost orthogonal.

      We agree with the reviewer that orthogonality is less “surprising” in high-dimensional spaces, and we have added this important point of interpretation to our revised manuscript. Still, it is important to remember that while the full neural state space is very high-dimensional (we record the activity of up to tens of thousands of neurons simultaneously), our analyses regarding the relationship between the trial-to-trial noise modes and decoding dimensions were performed in a low-dimensional subspace (up to 20 dimensions), identified by PLS regression, that optimally preserved visual information. This is a key step in our analysis which serves two purposes: 1. it removes some of the confound described by the reviewer regarding the dimensionality of the neural state space analyzed; and 2. it ensures that the noise modes we analyze are relevant to sensorimotor processing. It would certainly not be surprising or interesting if we identified a neural dimension outside the midbrain which was orthogonal to the optimal visual decoding dimension.

      Regardless, in order to better control for this confound, we estimated the distribution of angles between random vectors in this subspace. As we describe in the revised manuscript:

      “However, in high-dimensional spaces, it becomes increasingly common that two random vectors could appear orthogonal. While this is particularly a concern when analyzing a neural state space spanned by tens of thousands of neurons, our application of PLS regression to identify a low-dimensional subspace of relevant neuronal activity partially mitigates this concern. In order to control for this confound, we compared the angles between w<sub>opt</sub> and e<sub>1</sub> across larvae to those computed with shuffled versions, w<sub>opt,shuff</sub>, estimated by randomly shuffling the stimulus labels before identifying the optimal decoding direction. While it is possible to observe shuffled vectors which are nearly orthogonal to e<sub>1</sub>, the shuffled distribution spans a significantly greater range of angles than the observed data, demonstrating that this orthogonality is not simply a consequence of analyzing multi-dimensional activity patterns.”
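      The reviewer's underlying point, that random vectors become nearly orthogonal as dimensionality grows, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it draws independent Gaussian random vectors and is not the label-shuffling control described above. The dimensions (3, 20, 2000), the number of pairs, and all function names are hypothetical choices made for this example.

```python
import math
import random


def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def random_angles(dim, n_pairs, rng):
    """Angles between pairs of independent Gaussian random vectors."""
    angles = []
    for _ in range(n_pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        angles.append(angle_deg(u, v))
    return angles


def spread_around_90(angles):
    """Mean absolute deviation of the angles from 90 degrees."""
    return sum(abs(a - 90.0) for a in angles) / len(angles)


rng = random.Random(0)  # fixed seed so the run is repeatable
spreads = {
    dim: spread_around_90(random_angles(dim, 200, rng))
    for dim in (3, 20, 2000)  # hypothetical dimensionalities
}
for dim, spread in spreads.items():
    print(f"dim={dim:5d}  mean |angle - 90 deg| = {spread:.2f}")
```

      In a roughly 20-dimensional subspace, random angles still scatter widely around 90 degrees, which is why a tightly orthogonal observed distribution can be informative there, whereas in thousands of dimensions near-orthogonality is expected by chance.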

      The conclusion that sparsely distributed sets of neurons produce behavioral variability needs more investigation because the way the results are shown could lead to some misinterpretations. The prediction of behavior from classifiers applied to neural activity is interesting, but the results are insufficiently presented for two reasons.

      (1) The neurons that contribute to the classifiers (Figures 4H and J) form a sufficient set of neurons that predict behavior, but this does not mean that neurons outside of that set cannot be used to predict behavior. Lasso regularization was used to create the classifiers and this induces sparsity. This means that if many neurons predict behavior but they do so similarly, the classifier may select only a few of them. This is not a problem in itself but it means that the distributions of neurons across the brain (Figures 4H and J) may appear sparser and more distributed than the full set of neurons that contribute to producing the behavior. This ought to be discussed better to avoid misinterpretation of the brain distribution results, and an alternative analysis that avoids the confound could help clarify.

      We thank the reviewer for raising this point, which we agree should be discussed in the manuscript. Lasso regularization was a key ingredient in our analysis; l2 regularization alone was not sufficient to prevent overfitting to the training trials, particularly when decoding turn direction and responsiveness. Previous studies have also found that sparse subsets of neurons better predict behavior than single neuron or non-sparse populations, for example Scholz et al. (2018).

      While showing l2 regularization would not be a fair comparison given the poor performance of the l2-regularized classifiers, we opted to identify a potentially “fuller” set of neurons correlated with these biases based on the correlation between each neuron’s activity over the recording and the projection along the turn direction or responsiveness dimension identified using l1 regularization. This procedure has the potential to identify all neurons correlated with the final ensemble dynamics, rather than just a “sufficient set” for lasso regression. In new Figures S5F-G, we show the 3D distribution of all neurons significantly correlated with these biases, which appear similar to those in Figures 4H-K and widely distributed across practically the entire labeled area of the brain.
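      The idea of recovering a “fuller” set of neurons by correlating every neuron with the projection along a sparse decoding dimension can be sketched as follows. This is a toy illustration, not the authors' code: the activity values, the sparse weights, and the 0.9 cutoff are made-up assumptions (the response describes a significance test rather than a fixed threshold).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: rows are neurons, columns are time points (made-up values).
activity = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],  # neuron 0: kept by the sparse classifier
    [0.1, 0.9, 2.2, 2.9, 4.1, 4.8],  # neuron 1: redundant with neuron 0
    [5.0, 1.0, 4.0, 0.0, 3.0, 2.0],  # neuron 2: unrelated
]
# Hypothetical l1-regularized weights: only one of the redundant neurons is kept.
sparse_weights = [1.0, 0.0, 0.0]

# Project population activity onto the sparse decoding dimension.
n_time = len(activity[0])
projection = [
    sum(w * row[t] for w, row in zip(sparse_weights, activity))
    for t in range(n_time)
]

# Correlate every neuron with the projection, regardless of its weight.
correlations = [pearson(row, projection) for row in activity]
fuller_set = [i for i, r in enumerate(correlations) if abs(r) > 0.9]

print(correlations)
print(fuller_set)
```

      Neuron 1 has zero weight in the sparse classifier yet is recovered by the correlation step, which is the sense in which this procedure can reveal more than a “sufficient set”.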

      (2) The distribution of neurons is shown in an overly coarse manner in only a flattened brain seen from the top, and the brain is divided into four coarse regions (telencephalon, tectum, cerebellum, hindbrain). This makes it difficult to assess where the neurons are and whether those four coarse divisions are representative or whether the neurons are in other non-labeled deeper regions. For these two reasons, some of the statements about the distribution of neurons across the brain would benefit from a more thorough investigation.

      We agree with the reviewer that a more thorough description and visualization of these distributed populations is warranted.

      While the dense, pan-neuronal labeling makes the isolation of highly specific circuit components difficult, we have shown in more detail the specific brain regions contributing to these populations by aligning our recordings to the Z-Brain atlas (Randlett et al., 2015) as shown in new Figures S1E, S3F-G, 4I, 4K, and S5F-G. In addition, we provided a more detailed parcellation of the neuronal ensembles by providing projections of the full 3D volume along the xy and yz axes, in addition to the unregistered xy projection shown in new Figures 4H and 4J. We also found that the distribution of neurons in our huc:H2B-GCaMP6s recordings is very similar to the distribution of labeling in the huc:H2B-RFP reference image from the Z-Brain atlas (new Figure S1E), which further supports our whole-brain imaging results.

      Overall, we find that this more detailed quantification and visualization is consistent with the interpretations in the previous version of our manuscript. In particular, we show that the optimal visual decoding population (w<sub>opt</sub>) and the largest noise mode (e<sub>1</sub>) are localized to the midbrain (new Figures S3F-G), which is expected since in Figure 3 we first extracted a low-dimensional subspace of whole-brain neural activity that optimally preserved visual information. We also provide additional evidence that the populations correlated with the turn bias and responsiveness bias are distributed throughout the brain, including a relatively dense localization to the cerebellum, telencephalon, and dorsal diencephalon (habenula, new Figures 4H-K and S5F-G).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In addition to the overall strengths and weaknesses above, I have a few specific comments that I think could improve the study:

      (1) In lines 334-335 you write that 'We proceeded to build various logistic regression classifiers to decode'. Do you mean you tested this with other classifier types as well (e.g. SVM, Naive Bayes) or do you mean various because you trained the classifier described in the methods on each animal? This is not clear. If it is the first, more information is needed about what other classifiers you used.

      We appreciate the reviewer raising this point of clarification. Here, we simply meant that we fit the multiclass logistic regression classifier in the one-vs-rest scheme. In this sense, a single multiclass logistic regression classifier was fit for each larva. We have updated our revised manuscript with this clarification: “The visual stimuli were decoded using a one-versus-rest, multiclass logistic regression classifier with lasso regularization.”

      (2) In Figure 3 you train the decoder on all visually responsive cells identified across the brain. Does this reliability of stimulus decoding also hold for neurons sampled from specific brain regions? For example, does this reliable decoding come from stronger and more reliable responses in the optic tectum, whereas stimulus decodability is not as good in visual encoding neurons identified in other structures?

      In new Figure S5B, we show the performance of stimulus decoding from various brain regions. We find that stimulus classification is possible from the midbrain and cerebellum, very poor from the hindbrain, and not possible from the telencephalon during the period between stimulus onset and the decision.

      (3) In relation to point 2, it would be good to show in which brain areas the visually responsive neurons are located, and maybe the average coefficients per brain area. Plots like Figures 3G, and H would benefit from a quantification into areas. Similarly, a parcellation into more specific brain areas in Figure 4 would also be valuable.

      In addition to providing a more detailed parcellation of the turn direction and responsiveness bias populations in Figure 4, we have provided a similar visualization and quantification of the optimal stimulus decoding population and the dominant noise mode in new Figures S3F-G, respectively.

      (4) In Figure 3f, it is not clear to me how this shows that w<sub>opt</sub> and e1 are orthogonal. They appear correlated.

      The orthogonality we quantify is related to the pattern of coefficients across neurons, not necessarily the timeseries of their projections. The slight shift in the noise mode activations as you move from stimuli on the left visual field to the right actually comes from the motor outputs. Large left stimuli tend to evoke a rightward turn and vice versa, and the example noise mode shown encodes the directionality and vigor of tail movements, resulting in the slight shifts observed.
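      The distinction between orthogonal coefficient patterns and correlated projection time series can be made concrete with a two-neuron toy example. The values and weight vectors below are hypothetical and chosen only to make the point; they are not derived from the data in the manuscript.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two neurons whose activity shares a common drive (made-up values).
neuron_a = [0.0, 1.0, 2.0, 3.0, 4.0]
neuron_b = [0.2, 0.9, 2.1, 3.2, 3.9]

# Two weight vectors over (neuron_a, neuron_b) that are exactly orthogonal.
w_first = [1.0, 0.0]
w_second = [0.0, 1.0]
dot_product = sum(a * b for a, b in zip(w_first, w_second))

# Project the population activity onto each weight vector over time.
proj_first = [w_first[0] * a + w_first[1] * b for a, b in zip(neuron_a, neuron_b)]
proj_second = [w_second[0] * a + w_second[1] * b for a, b in zip(neuron_a, neuron_b)]
projection_r = pearson(proj_first, proj_second)

print(dot_product)
print(projection_r)
```

      The weight vectors have a dot product of zero, yet their projections are strongly correlated because the underlying neurons covary, which is the situation described for the noise mode and the motor output.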

      (5) I think the wording of this conclusion is too strong for the results and a bit illogical:

      'Thus, our data suggest that the neural dynamics underlying single-trial action selection are the result of a widely-distributed circuit that contains subpopulations encoding internal time-varying biases related to both the larva's responsiveness and turn direction, yet distinct from the sensory encoding circuitry.'

      If that is the case, how is it even possible that the larvae can do a visually guided behaviour?

      Especially given Suppl Fig 4C it would be more appropriate to say something along the lines of: 'When stimuli are highly ambiguous, single trial action selection is dominated by widely-distributed circuit that contains subpopulations encoding internal time-varying biases related to both the larva's responsiveness and turn direction, that encode choice distinctly from the sensory encoding circuitry'.

      We appreciate the reviewer’s suggestion and have re-worded this line in the discussion in order to clarify that these time-varying biases are predominant in the case of ambiguous stimuli, as shown in Figure S5C in our revised manuscript (corresponding to Figure S4C in our original submission).

      (6) Line 599: typo: trial-to-trail

      We thank the reviewer for noting this error, which has been corrected in the revised text of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:  

      (1) The heatmaps (for example, Figure 3A, B) are challenging to read and interpret due to their size. Is there a way to alter the visualization to improve interpretability? Perhaps coloring the heatmap by general anatomical region could help? We feel that these heatmaps are critical to the utility of the registration strategy, and hence, clear visualization is necessary. 

      We thank the reviewers for this point on aesthetic improvement, and we agree that clearer visualization of our correlation heatmaps is important. To address this point, we have incorporated the capability of grouping “child” subregions in anatomical order by their more general “parent” region into the package function, plot_correlation_heatmaps(). Parent regions can now be plotted as smaller sub-facets in the heatmaps. We have also rearranged our figures to fit enlarged heatmaps in Figures 3-5, and Supplementary Figure 10 for easier visualization.

      (2) Additional context in the Introduction on the use of immediate early genes to label ensembles of neurons that are specifically activated during the various behavioral manipulations would enable the manuscript and methodology to be better appreciated by a broad audience. 

      We thank the reviewers for this suggestion and have revised the first part of our Introduction to reflect the broader use and appeal of immediate early genes (IEGs) for studying neural changes underlying behavior.

      (3) The authors mention that their segmentation strategies are optimized for the particular staining pattern exhibited by each reporter and demonstrate that the manually annotated cell counts match the automated analysis. They mention that alternative strategies are compatible, but don't show this data. 

      We thank the reviewers for this comment. We also appreciate that integration with alternative strategies is a major point of interest to readers, given that others may be interested in compatibility with our analysis and software package, rather than completely revising their own pre-existing pipelines. 

      Generally, we have validated the ability to import datasets generated from completely different workflows for segmentation and registration. We have since released documentation on our package website with step-by-step instructions on how to do so (https://mjin1812.github.io/SMARTTR/articles/Part5.ImportingExternalDatasets). We believe this tutorial is a major entry point to taking advantage of our analysis package, without adopting our entire workflow.

      This specific point on segmentation refers to the import_segmentation_custom() function in the package. As there is currently not a standard cell segmentation export format adopted by the field, this function still requires some data wrangling into an import format saved as a .txt file. However, we chose not to visually demonstrate this capability in the paper for a few reasons.

      i) A figure showing the broad testing of many different segmentation algorithms, (e.g., Cellpose, Vaa3d, Trainable Weka Segmentation) would better demonstrate the efficacy of segmentation of these alternative approaches, which have already been well-documented. However, demonstrating importation compatibility is more of a demonstration of API interface, which is better shown in website documentation and tutorial notebooks.

      ii) Additionally, showing importation with one well-established segmentation approach is still a demonstration of a single use case. There would be a major burden of proof in establishing importation compatibility with all potential alternative platforms, their specific export formats, which may be slightly different depending on post-processing choices, and the needs of the experimenters (e.g., exporting one versus many channels, having different naming conventions, having different export formats). For example, output from Cellpose can take the form of a NumPy file (_seg.npy file), a .png, or a native ImageJ ROI archive, and users may have selected up to four channels. Until the field adopts a standardized file format, one flexible enough to account for all the variables of experimental interest, we currently believe it is more efficient to advise external groups on how to transform their specific data to be compatible with our generic import function.

      (4) The authors provided highly detailed information for their segmentation strategy, but the same level of detail was not provided for the registration algorithms. Additional details would help users achieve optimal alignment.

      We apologize for this lack of detail. The registration strategy depends upon the WholeBrain (Fürth et al., 2018) package for registration to the Allen Mouse Common Coordinate Framework. While this strategy has been published and documented elsewhere, we have substantially revised our methods section on the registration process to better incorporate details of this approach.

      (5) The authors illustrate registration to the Allen atlas. Can they comment on whether the algorithm is compatible with other atlases or with alternative sectioning planes (horizontal/sagittal)? 

      Since the current registration workflow integrates WholeBrain (Fürth et al., 2018), any limitations of WholeBrain apply to our approach, which means limited support for registering non-coronal sectioning planes and reliance on the Allen Mouse Atlas (Dong, 2008). However, network analysis and plotting functions are currently compatible with the Allen Mouse Brain Atlas and the Kim Unified Mouse Brain Atlas version (2019) (Chon et al., 2019). Therefore, current limitations in registration do not preclude the usefulness of the SMARTTR software in generating valuable insights from network analysis of externally imported datasets. 

      There are a number of alternative workflows, such as the QUINT workflow (Yates et al., 2019), that support multiple different mouse atlases and registration of arbitrarily sectioned angles. We plan to support and facilitate an entry point for this workflow in a future iteration of SMARTTR, but believe it is of benefit to the wider community to release and support SMARTTR in its current state.

      (6) Supplemental Figures S10-13 do not have a legend panel to define the bar graphs. 

      We apologize for this omission and have fixed our legends in our resubmission. Our supplement figure orders have changed and the corresponding figures are now Supplemental Figures S11-14.

      (7) When images in a z-stack were collapsed, was this a max intensity projection or average?

      Assuming this question is in regard to our manual cell counting validation approach, the z-stacks were collapsed as a maximum intensity projection.
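      For readers less familiar with the distinction, the following sketch contrasts a maximum intensity projection with an average projection on a tiny made-up z-stack. It is a generic illustration using nested lists, not the ImageJ procedure used in the study; the pixel values and function names are assumptions for this example.

```python
def project(stack, reducer):
    """Collapse a z-stack (a list of 2D slices) into one 2D image, pixel by pixel."""
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    return [
        [reducer([z_slice[r][c] for z_slice in stack]) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Three 2x2 slices with made-up intensities; a bright cell appears in one slice only.
stack = [
    [[0, 10], [0, 0]],
    [[0, 200], [5, 0]],
    [[0, 30], [0, 90]],
]

max_projection = project(stack, max)    # keeps the brightest value along z
mean_projection = project(stack, mean)  # dilutes signal confined to a few slices

print(max_projection)
print(mean_projection)
```

      A signal that is bright in a single slice survives intact in the maximum projection but is reduced in the average, which matters when counting sparsely labeled cells.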

      Reviewer #2 (Public review): 

      Weaknesses: 

      (1) While I was able to install the SMARTR package, after trying for the better part of one hour, I could not install the "mjin1812/wholebrain" R package as instructed in OSF. I also could not find a function to load an example dataset to easily test SMARTR. So, unfortunately, I was unable to test out any of the packages for myself. Along with the currently broken "tractatus/wholebrain" package, this is a good example of why I would strongly encourage the authors to publish SMARTR on either Bioconductor or CRAN in the future. The high standards set by Bioc/CRAN will ensure that SMARTR is able to be easily installed and used across major operating systems for the long term. 

      We greatly thank the reviewer for pointing out this weakness; long-term maintenance of this package is certainly a mutual goal. Loading an .RData file is accomplished either by double-clicking directly on the file in a directory window, after specifying that this file type should be opened in RStudio, or by using the load() function (e.g., load("directory/example.RData")). We have now explicitly outlined these directions in the online documentation.

      Moreover, we have recently submitted our package to CRAN and are currently working on revisions following comments. This has required a package rebranding to “SMARTTR”, as there were naming conflicts with a previously archived repository on CRAN. Currently, SMARTTR is not dependent on the WholeBrain package, which remains optional for the registration portion of our workflow. Ultimately, this independence will allow us to maintain the analysis and visualization portion of the package independently.

      In the meantime, we have fully revised our installation instructions (https://mjin1812.github.io/SMARTTR/articles/SMARTTR). SMARTTR is now downloadable from a CRAN-like repository as a bundled .tar.gz file, which should ease the burden of installation significantly. Installation has been verified on a number of different versions of R on different platforms. Again, we hope these changes are sufficient and improve the process of installation. 

      (2) The package is quite large (several thousand lines including comments and whitespace). While impressive, this does inherently make the package more difficult to maintain - and the authors currently have not included any unit tests. The authors should add unit tests to cover a large percentage of the package to ensure code stability.

      We have added unit testing to improve the reliability of our package. Unit tests now cover over 71% of our source code base and are available for evaluation on our github website (https://github.com/mjin1812/SMARTTR). We focused on coverage of the most front-facing functions. We appreciate this feedback, which has ultimately enhanced the longevity of our software.

      (3) Why do the authors choose to perform image segmentation outside of the SMARTTR package using ImageJ macros? Leading segmentation algorithms such as CellPose and StarMap have well-documented APIs that would be easy to wrap in R. They would likely be faster as well. As noted in the discussion, making SMARTTR a one-stop shop for multi-ensemble analyses would be more appealing to a user. 

      We appreciate this feedback. We believe parts of our response to Reviewer 1, Comment 3, are relevant to this point. Interfaces for CellPose and ClusterMap (which processes in situ transcriptomic approaches, like STARmap) are both in Python, and there are currently ways to call Python from within R (https://rstudio.github.io/reticulate/index.html). We will certainly explore incorporating these APIs from R. However, we anticipate that this capability would be closer to “translation” between programming languages and would not remove the need for users to have some familiarity with the capabilities of these Python packages, and thus with Python syntax.

      (4) Given the small number of observations for correlation analyses (n=6 per group), Pearson correlations would be highly susceptible to outliers. The authors chose to deal with potential outliers by dropping any subject per region that was > 2 SDs from the group mean. Another way to get at this would be using Spearman correlation. How do these analyses change if you use Spearman correlation instead of Pearson? It would be a valuable addition for the authors to include Spearman correlations as an option in SMARTTR.

      We thank the reviewers for this suggestion and have updated our code base to allow the use of Spearman’s correlation coefficient, as opposed to Pearson’s correlation coefficient, for heatmaps in the get_correlations() function. Users can now set the `method` parameter to either “pearson” or “spearman”, and the chosen method will propagate throughout the rest of the analysis.

      Below, in Author response image 1, we show a visual comparison of the correlation heat maps for active eYFP<sup>+</sup> ensembles in the CT and IS groups using both Pearson and Spearman correlations. We see a strong qualitative similarity between the heat maps. Of course, since the statistical assumptions underlying the relationship between variables using Pearson correlation (linear) vs Spearman correlation (monotonic) are different, users should take this into account when interpreting results using different approaches.

      Author response image 1.

      Pearson and Spearman regional correlations of eYFP<sup>+</sup> ensemble activity in the CT and IS groups.
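      The reviewer's concern about outlier sensitivity at n=6 can be illustrated with a small self-contained sketch. The counts below are invented for illustration and are not taken from the dataset; the functions are a plain-Python stand-in for the two correlation methods, not the package's get_correlations() implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical normalized counts in two regions for six animals.
# Five animals show a decreasing relationship; the sixth is an extreme outlier.
region_a = [1, 2, 3, 4, 5, 30]
region_b = [5, 4, 3, 2, 1, 40]

r_pearson = pearson(region_a, region_b)
r_spearman = spearman(region_a, region_b)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

      In this toy case a single animal drives the Pearson coefficient close to 1, while the rank-based Spearman coefficient stays slightly negative, which is why comparing both methods is a useful sanity check with small groups.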

      (5) I see the authors have incorporated the ability to adjust p-values in many of the analysis functions (and recommend the BH procedure) but did not use adjusted p-values for any of the analyses in the manuscript. Why is this? This is particularly relevant for the differential correlation analyses between groups (Figures 3P and 4P). Based on the unadjusted p-values, I assume few if any data points will still be significant after adjusting. While it's logical to highlight the regional correlations that strongly change between groups, the authors should caution which correlations are "significant" without adjusting for multiple comparisons. As this package now makes this analysis easily usable for all researchers, the authors should also provide better explanations for when and why to use adjusted p-values in the online documentation for new users.

      We appreciate the feedback and note that our dataset is presented as a more demonstrative and exploratory resource for readers; as such, we accept a high tolerance for false positives in exchange for a lower risk of missing potentially interesting findings. As noted by Reviewer #2, it is still “logical to highlight the regional correlations that strongly change between groups.” We have clarified in our methods that we chose to present uncorrected p-values when speaking of significance.

      We have also removed any previous recommendations for preferred methods for multiple comparisons adjustment in our function documentations, as some previous documentation was outdated. Moreover, the standard multiple comparisons adjustment approaches assume complete independence between tests, whereas this assumption is violated in our differential correlational analysis (i.e., a region with one significantly altered connection is more likely than another to have another significantly altered connection).

      Ultimately, the decision to correct for multiple comparisons with standard FDR, and the choice of significance threshold, should still be informed by standard statistical theory and the user-defined tolerance for false positives and false negatives. This will be influenced by factors such as the nature and purpose of the study and the quality of the dataset.
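      For users weighing this decision, the Benjamini-Hochberg (BH) adjustment mentioned by the reviewer can be sketched in a few lines. This is a generic illustration of the standard step-up procedure, not code from the package; the p-values are made up, and the function name is an assumption for this example.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up unadjusted p-values for five regional comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005, 0.5]
adjusted_p = benjamini_hochberg(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p = {raw:.3f}  ->  BH-adjusted p = {adj:.3f}")
```

      The adjusted values are never smaller than the raw ones, so comparisons that sit just under a threshold before adjustment can move to or above it afterwards; this is the trade-off between false positives and missed findings discussed above.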

      (6) The package was developed in R3.6.3. This is several years and one major version behind the current R version (4.4.3). Have the authors tested if this package runs on modern R versions? If not, this could be a significant hurdle for potential users. 

      We thank reviewers for pointing out concerns regarding versioning. We have since updated our installation approach for SMARTTR, which is compatible with versions of R >= 3.6 and has been tested on Mac ARM-based (Apple silicon) architecture (R v4.4.2), and Windows 10 (R v3.6.3, v4.5.0 [devel]). 

      The recommendation for users to install R 3.6.3 is primarily for those interested in using our full workflow, which requires installation of the WholeBrain package, which is currently a suggested package. We anticipate updating and supporting the visualization and network analysis capabilities, whilst maintaining previous versioning for the full workflow presented in this paper.  

      (7) In the methods section: "Networks were constructed using igraph and tidygraph packages." - As this is a core functionality of the package, it would be informative to specify the exact package versions, functions, and parameters for network construction. 

      We thank reviewers for pointing out the necessity for these details for code reproducibility. We have since clarified our language in the manuscript on the exact functions we use in our analysis and package versions, which we also fully document in our online tutorial. Additionally, we have printed our package development and analysis environment online at https://mjin1812.github.io/SMARTTR/articles/Part7.Development.

      (8) On page 11, "Next, we examined the cross-correlations in IEG expression across brain regions, as strong co-activation or opposing activation can signify functional connectivity between two regions" - cross-correlation is a specific analysis in signal processing. To avoid confusion, the authors should simply change this to "correlations". 

      We thank the reviewer for pointing out this potentially confusing phrasing. We have changed all instances of “cross-correlation” to “correlation”.

      (9) Panels Q-V are missing in Figure 5 caption. 

      We thank the reviewer for pointing out this oversight. We have now fixed this in our revision.

      References

      Chon, U., Vanselow, D. J., Cheng, K. C., & Kim, Y. (2019). Enhanced and unified anatomical labeling for a common mouse brain atlas. Nature Communications, 10(1), 5067. https://doi.org/10.1038/s41467-019-13057-w

      Dong, H. W. (2008). The Allen reference atlas: A digital color brain atlas of the C57Bl/6J male mouse (pp. ix, 366). John Wiley & Sons Inc.

      Fürth, D., Vaissière, T., Tzortzi, O., Xuan, Y., Märtin, A., Lazaridis, I., Spigolon, G., Fisone, G., Tomer, R., Deisseroth, K., Carlén, M., Miller, C. A., Rumbaugh, G., & Meletis, K. (2018). An interactive framework for whole-brain maps at cellular resolution. Nature Neuroscience, 21(1), 139–149. https://doi.org/10.1038/s41593-017-0027-7

      Yates, S. C., Groeneboom, N. E., Coello, C., Lichtenthaler, S. F., Kuhn, P.-H., Demuth, H.-U., Hartlage-Rübsamen, M., Roßner, S., Leergaard, T., Kreshuk, A., Puchades, M. A., & Bjaalie, J. G. (2019). QUINT: Workflow for Quantification and Spatial Analysis of Features in Histological Images From Rodent Brain. Frontiers in Neuroinformatics, 13. https://www.frontiersin.org/articles/10.3389/fninf.2019.00075

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary:  

      This study provides new insights into the role of miR-19b, an oncogenic microRNA, in the developing chicken pallium. The dynamic expression pattern of miR-19b is associated with its role in regulating cell cycle progression in neural progenitor cells. Furthermore, miR-19b is involved in determining neuronal subtypes by regulating Fezf2 expression during pallial development. These findings suggest an important role for miR-19b in the coordinated spatio-temporal regulation of neural progenitor cell dynamics and its evolutionary conservation across vertebrate species.  

      Strengths:  

      The authors identified conserved roles of miR-19 in the regulation of neural progenitor maintenance between mouse and chick, and the latter is mediated by the repression of E2f8 and NeuroD1. Furthermore, the authors found that miR-19b-dependent cell cycle regulation is tightly associated with specification of Fezf1 or Mef2c-positive neurons, in spatio-temporal manners during chicken pallial development. These findings uncovered molecular mechanisms underlying microRNA-mediated neurogenic controls.  

      Weaknesses:  

      Although the authors in this study claimed striking similarities of miR-19a/b in neurogenesis between mouse and chick pallium, a previous study by Bian et al. revealed that miR-19a contributes to the expansion of radial glial cells by suppressing PTEN expression in the developing mouse neocortex, while miR-19b maintains apical progenitors via inhibiting E2f2 and NeuroD1 in chicken pallium. Thus, it is still unclear whether the orthologous microRNAs regulate common or species-specific target genes.  

      In this study, we have proposed that miR-19b regulates similar phenomena in both species using different targets, such as regulation of proliferation through PTEN in mouse and through E2f8 in the chicken.

      The spatiotemporal expression patterns of miR-19b and several genes are not convincing. For example, the authors claim that NeuroD1 is initially expressed uniformly in the subventricular zone (SVZ) but disappears in the DVR region by HH29 and becomes detectable by HH35 (Figure 1). However, the in situ hybridization data revealed that NeuroD1 is highly expressed in the SVZ of the DVR at HH29 (Figure 4F). Thus, perhaps due to the problem of immunohistochemistry, the authors have not been able to detect NeuroD1 expression in Figure 1D, and the interpretation of the data may require significant modification.  

      While Fig. 1B may suggest that NeuroD1 expression has disappeared from the DVR region by HH29, this is not true in general because we have observed NeuroD1 to be expressed in the DVR at HH29 in images of other sections. In the revised version, we will include improved images for panels of Fig. 1B which accurately show the expression pattern of NeuroD1 and miR19b at stages HH29 and HH35.  

      It seems that miR-19b is also expressed in neurons (Figure 1), suggesting the role of miR-19b must be different in progenitors and differentiated neurons. The data on the gain- and loss-of-function analysis of miR-19b on the expression of Mef2c should be carefully considered, as it is possible that these experiments disturb the functions of miR-19b in neurons rather than in progenitors.

      As pointed out by the reviewer, it is quite possible that manipulating miR-19b perturbs its neuronal functions in addition to its function in progenitor cells. After introducing the gain-of-function construct into progenitor cells, we observed changes in the morphology of these cells. These data will be included in the revised version.

      The regions of chicken pallium were not consistent among figures: in Figure 1, they showed caudal parts of the pallium (HH29 and 35), while the data in Figure 4 corresponded to the rostral part of the pallium (Figure 4B).  

      We will address this by providing images from a similar region of the pallium showing Fezf2 and Mef2c expression patterns.

      The neurons expressing Fezf2 and Mef2c in the chicken pallium are not homologous neuronal subtypes to mammalian deep and superficial cortical neurons. The authors must understand that chicken pallial development proceeds in an outside-in manner. Thus, Mef2c-positive neurons in a superficial part are early-born neurons, while Fezf2-positive neurons residing in deep areas are later-born neurons. It should be noted that the expression of a single marker gene does not support cell type homology, and the authors' description "the possibility of primitive pallial lamina formation in common ancestors of birds and mammals" is misleading.  

      We appreciate this clarification and will modify or remove this statement regarding the “primitive pallial lamina formation” to avoid any confusion and misinterpretation. 

      Overexpression of CDKN1A or Sponge-19b induced ectopic expression of Fezf2 in the ventricular zone (Figure 3C, E). Do these cells maintain a progenitor state or prematurely differentiate into neurons? In addition, the authors must explain why the induction of Fezf2 is also detected in GFP-negative cells.  

      We propose to follow the fate of these cells by extending the observation period after overexpression of CDKN1A or Sponge-19b to assess whether they retain progenitor characteristics or differentiate. The presence of Fezf2 in GFP-negative cells could be due to non-cell-autonomous effects, and we will discuss this possibility in the revised manuscript.

      Reviewer #2 (Public review):  

      Summary:  

      This paper investigates the general concept that avian and mammalian pallium specifications share similar mechanisms. To explore that idea, the authors focus their attention on the role of miR-19b as a key controlling factor in the neuronal proliferation/differentiation balance. To do so, the authors checked the expression and protein level of several genes involved in neuronal differentiation, such as NeuroD1 or E2f8, genes also expressed in mammals after conducting their functional gene manipulation experiments. The work also shows a dysregulation in the number of neurons from lower and upper layers when miR-19b expression is altered.  

      To test it, the authors conducted a series of functional experiments of gain and loss of function (G&LoF) and enhancer-reporter assays. The enhancer-reporter assays demonstrate a direct relationship between miR-19b and NeuroD1 and E2f8 which is also validated by the G&LoF experiments. It's also noteworthy to mention that the way miR-19b acts is maintaining the progenitor cells from the ventricular zone in an undifferentiated stage, thus promoting them into a stage of cellular division.  

      Overall, the paper argues that the expression of miR-19b in the ventricular zone promotes the cells in a proliferative phase and inhibits the expression of differentiation genes such as E2f8 and NeuroD1. The authors claim that a decrease in the progenitor cell pool leads to an increase and decrease in neurons in the lower and upper layers, respectively.  

      Strengths:  

      (1) Novelty Contribution  

      The paper offers strong arguments to prove that the neurodevelopmental basis between mammals and birds is quite the same. Moreover, this work contributes to a better understanding of brain evolution along the animal evolutionary tree and will give us a clearer idea about the roots of how our brain has been developed. This stands in contrast to the conventional framing of mammal brain development as an independent subject unlinked to the "less evolved species". The authors also nicely show a concept that was previously restricted to mammals - the role of microRNAs in development.  

      (2) Right experimental approach  

      The authors perform a set of functional experiments correctly adjusted to answer the role of miR-19b in the control of neuronal stem cell proliferation and differentiation. Their histological, functional, and genetic approach gives us a clear idea about the relations between several genes involved in the differentiation of the neurons in the avian pallium. In this idea, they maintain the role of miR-19b as a hub controller, keeping the ventricular zone cells in an undifferentiated stage to perpetuate the cellular pool.  

      (3) Future directions  

      The findings open a door to future experiments, particularly to a better comprehension of the role of microRNAs and pallial genetic connections. Furthermore, this work also proves the use of avians as a model to study cortical development due to the similarities with mammals.  

      Weaknesses:  

      While there are questions answered, there are still several that remain unsolved. The experiments analyzed here lead us to speculate that the early differentiation of the progenitor cells from the ventricular zone entails a reduction in the cellular pool, affecting thereafter the number of later-born neurons (upper layers). The authors should explore that option by testing progenitor cell markers in the ventricular zone, such as Pax6. Even so, it remains possible that miR-19b is also changing the expression pattern of neurons that are going to populate the different layers, instead of their numbers, so the authors cannot rule that out or verify it. Since the paper focuses on the role of miR-19b in patterning, I think the authors should check the relationship and expression between progenitors (Pax6) and intermediate (Tbr2) cells when miR-19b is affected. Since neuronal expression markers change so fast within a few days (HH24-HH35), I don't understand why the authors stop the functional experiments at different time points.  

      To address this, we will examine the expression of Pax6 and Tbr2 following both gain-of-function and loss-of-function manipulations of miR-19b. We agree with the reviewer that miR-19b may influence not only the number of neurons but also the expression pattern of neuronal markers.  Due to the limitations of our experimental design, we acknowledge that this possibility cannot be ruled out. 

      Regarding time points chosen for the functional experiments: We selected different stages based on the expression dynamics of specific markers. To detect possible ectopic induction, we analyzed developmental stages where the expression of a given marker is normally absent. Conversely, to detect loss of expression we examined stages in which the marker is typically expressed robustly. This approach allowed us to better interpret the functional consequences of miR-19b manipulation within relevant developmental windows. 

      Reviewer #3 (Public review):  

      Summary:  

      This is a timely article that focuses on the molecular machinery in charge of the proliferation of pallial neural stem cells in chicks, and aims to compare them to what is known in mammals. miR19b is related to controlling the expression of E2f8 and NeuroD1, and this leads to a proper balance of division/differentiation, required for the generation of the right number of neurons and their subtype proportions. In my opinion, many experiments do reflect an interaction between all these genes and transcription factors, which likely supports the role of miR19b in participating in the proliferation/differentiation balance.  

      Strengths:  

      Most of the methodologies employed are suitable for the research question, and present data to support their conclusions.  

      The authors were creative in their experimental design, in order to assess several aspects of pallial development.  

      Weaknesses:  

      However, there are several important issues that I think need to be addressed or clarified in order to provide a clearer main message for the article, as well as to clarify the tools employed. I consider it utterly important to review and reinterpret most of the anatomical concepts presented here. The way they are currently used is confusing and may mislead readers towards an understanding of the bird pallium that is no longer accepted by the community.  

      Major Concerns:  

      (1) Inaccurate use of neuroanatomy throughout the entire article. There are several aspects to it, that I will try to explain in the following paragraphs:  

      a) Figure 1 shows a dynamic and variable expression pattern of miR19b and its relation to NeuroD1. Regardless of the terms used in this figure, it shows that miR19b may be acting differently in various parts of the pallium and developmental stages. However, all the rest of the experiments in the article (except a few cases) abolish these anatomical differences. It is not clear, but it is very important, where in the pallium the experiments are performed. I refer here, at least, to Figures 2C, E, F, H, I; 3D, E; 4C, D, G, I. Regarding time, all experiments were done at HH22, and the article does not show the native expression at this stage. The sacrifice timing is variable, and this variability is not always justified. But more importantly, we don't know where those images were taken, or what part of the pallium is represented in the images. Is it always the same? Do results reflect differences between DVR and Wulst gene expression modifications? The authors should include low magnification images of the regions where experiments were performed. And they should consider the variable expression of all genes when interpreting results.  

      We agree that precise anatomical context is essential. In the revised version, we propose to: 

      a) Include schematics of the regions of interest where experimental manipulations were performed.

      b) Provide low-magnification panoramic images where appropriate, for anatomical reference.

      c) Show the expression patterns of relevant marker genes to better justify stages and region selection. 

      d) Provide the expression pattern of markers in panoramic view to show differential expression in the DVR and Wulst region and interpret our results accordingly.

      b) SVZ is not a postmitotic zone (as stated in line 123, and wrongly assigned throughout the text and figures). On the contrary, the SVZ is a secondary proliferative zone, organized in a layer, located in a basal position to the VZ. Both (VZ and SVZ) are germinative zones, containing mostly progenitors. The only postmitotic neurons in VZ and SVZ occupy them transiently when moving to the mantle zone, which is closer to the meninges and is the postmitotic territory. Please refer to the original Boulder committee articles to revise the SVZ definition. The authors, however, misinterpret this concept, and label the whole mantle zone as if it were the SVZ. Indeed, the term "mantle zone" does not appear in the article. Please, revise and change the whole text and figures, as SVZ statements and photographs are nearly always misinterpreted. Indeed, SVZ is only labelled well in Figure 4F.  

      The two articles mentioning the expression of NeuroD1 in the SVZ (line 118) are research in Xenopus. Is there a proliferative SVZ in Xenopus?  

      For the actual existence of the SVZ in the chick pallium, please refer to the recent Rueda-Alaña et al., 2025 article that presents PH3 stainings at different timepoints and pallial areas.  

      We appreciate the correction suggested by the reviewer. In the revised manuscript: a) the SVZ will be labeled correctly in all figures and descriptions; b) the mantle zone terminology will be incorporated appropriately; c) the two Xenopus-based references in line 118 will be removed, as they are not directly relevant; and d) we will refer to Rueda-Alaña et al. (2025) to guide accurate anatomical labeling and interpretation of proliferative zones.

      We also acknowledge that while some proliferative cells exist in the SVZ of the chicken, they are relatively few and do not express typical basal progenitor markers such as Tbr2 (Nomura et al., 2016, Development). We will ensure that this nuance is clearly reflected in the text. 

      c) What is the Wulst, according to the authors of the article? In many figures, the Wulst includes the medial pallium and hippocampus, whereas sometimes it is used as a synonym of the hyperpallium (which excludes the medial pallium and hippocampus). Please make it clear, as the addition or not of the hippocampus definitely changes some interpretations.  

      We propose to modify the text and figures to accurately represent the correct location of the Wulst in the chick pallium.

      d) The authors compare the entirety of the chick pallium - including the hippocampus (see above), hyperpallium, mesopallium, nidopallium - to only the neocortex of mammals. This view - as shown in Suzuki et al., 2012 - forgets the specificity of pallial areas of the pallium and compares it to cortical cells. This is conceptually wrong, and leads to incorrect interpretations (please refer to Luis Puelles' commentaries on Suzuki et al results); there are incorrect conclusions about the existence of upper-layer-like and deep-layer-like neurons in the pallium of birds. The view is not only wrong according to the misinterpreted anatomical comparisons, but also according to novel scRNAseq data (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025). These articles show that many avian glutamatergic neurons of the pallium have highly diversified, and are not comparable to mammalian cortical cells. The authors should therefore avoid this incorrect use of terminology. There are no such upper-layer-like and deep-layer-like neurons in the pallium of birds.  

      We acknowledge this conceptual oversight. In the revised manuscript: a) we will avoid direct comparisons between the entire chick pallium and the mammalian neocortex; b) terms like “upper-layer-like” and “deep-layer-like” neurons will be removed or modified; and c) we will cite and integrate recent findings from Rueda-Alaña et al. (2025), Zaremba et al. (2025), and Hecker et al. (2025), which provide updated insights from scRNAseq analyses into the complexity of avian pallial neurons. Cell types will be described based on marker gene expression only, without unsupported evolutionary or homology claims.

      (2) From introduction to discussion, the article uses misleading terms and outdated concepts of cell type homology and similarity between chick and pallial territories and cells. The authors must avoid this confusing terminology, as non-expert readers will come to evolutionary conclusions which are not supported by the data in this article; indeed, the article does not deal with those concepts.  

      We agree with the reviewer. In the revised version, we will remove the misleading terms and outdated concepts and avoid speculative evolutionary conclusions.  

      a) Recent articles published in Science (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) directly contradict some views presented in this article. These articles should be presented in the introduction as they are utterly important for the subject of this article and their results should be discussed in the light of the new findings of this article. Accordingly, the authors should avoid claiming any homology that is not currently supported. The expression of a single gene is not enough anymore to claim the homology of neuronal populations.  

      In the revised version, the above-mentioned articles (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) will be included in the introduction and discussion. Our interpretations will be updated to reflect these new insights into neuronal diversity and regionalization in the chick pallium. 

      b) Auditory cortex is not an appropriate term, as there is no cortex in the pallium of birds. Cortical areas require the existence of neuronal arrangements in laminae that appear parallel to the ventricular surface. It is not the case of either hyperpallium or auditory DVR. The accepted term, according to the Avian Nomenclature forum, is Field L.  

      We will replace all instances of “auditory cortex” with “Field L”, as per the accepted terminology in the Avian Nomenclature Forum.

      c) Forebrain, a term overused in the article, is very unspecific. It includes vast areas of the brain, from the pretectum and thalamus to the olfactory bulb. However, the authors are not researching most of the forebrain here. They should be more specific throughout the text and title.  

      In the revised version, we will replace “forebrain” with “pallium” throughout the manuscript to more accurately reflect the regions studied.

      (3) In the last part of the results, the authors claim miR19b has a role in patterning the avian pallium. What they see is that modifying its expression induces changes in gene expression in certain neurons. Accordingly, the altered neurons would differentiate into other subtypes, not similar to the wild type example. In this sense, miR19b may have a role in cell specification or neuronal differentiation. However, patterning is a different developmental event, which refers to the determination of broad genetic areas and territories. I don't think miR19b has a role in patterning.  

      We agree with the reviewer that an alteration in one marker for a particular cell type may not indicate a change in patterning. However, including the effect of miR-19b gain- and loss-of-function on Pax6 and Tbr2 may strengthen the idea that it affects patterning, as suggested by Reviewer #2. 

      (4) Please add a scheme of the molecules described in this article and the suggested interaction between them.  

      In the revised version, we propose to include a diagram to visually summarize the proposed interactions between miR-19b, E2f8, NeuroD1, and other key regulators.  

      (5) The methods section is way too brief to allow for repeatability of the procedures. This may be due to an editorial policy but if possible, please extend the details of the experimental procedures.  

      We will expand the Methods section to provide more detailed protocols and justifications for experimental design, in alignment with journal policy.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand the neural basis of implicit causal inference, specifically how people infer causes of illness. They use fMRI to explore whether these inferences rely on content-specific semantic networks or broader, domain-general neurocognitive mechanisms. The study explores two key hypotheses: first, that causal inferences about illness rely on semantic networks specific to living things, such as the 'animacy network,' given that illnesses affect only animate beings; and second, that there might be a common brain network supporting causal inferences across various domains, including illness, mental states, and mechanical failures. By examining these hypotheses, the authors aim to determine whether causal inferences are supported by specialized or generalized neural systems.

      The authors observed that inferring illness causes selectively engaged a portion of the precuneus (PC) associated with the semantic representation of animate entities, such as people and animals. They found no cortical areas that responded to causal inferences across different domains, including illness and mechanical failures. Based on these findings, the authors concluded that implicit causal inferences are supported by content-specific semantic networks, rather than a domain-general neural system, indicating that the neural basis of causal inference is closely tied to the semantic representation of the specific content involved.

      Strengths:

      (1) The inclusion of the four conditions in the design is well thought out, allowing for the examination of the unique contribution of causal inference of illness compared to either a different type of causal inference (mechanical) or non-causal conditions. This design also has the potential to identify regions involved in a shared representation of inference across general domains.

      (2) The presence of the three localizers for language, logic, and mentalizing, along with the selection of specific regions of interest (ROIs), such as the precuneus and anterior ventral occipitotemporal cortex (antVOTC), is a strong feature that supports a hypothesis-driven approach (although see below for a critical point related to the ROI selection).

      (3) The univariate analysis pipeline is solid and well-developed.

      (4) The statistical analyses are a particularly strong aspect of the paper.

      Weaknesses:

      Based on the current analyses, it is not yet possible to rule out the hypothesis that inferring illness causes relies on neurocognitive mechanisms that support causal inferences irrespective of their content, neither in the precuneus nor in other parts of the brain.

      (1) The authors, particularly in the multivariate analyses, do not thoroughly examine the similarity between the two conditions (illness-causal and mechanical-causal), as they are more focused on highlighting the differences between them. For instance, in the searchlight MVPA analysis, an interesting decoding analysis is conducted to identify brain regions that represent illness-causal and mechanical-causal conditions differently, yielding results consistent with the univariate analyses. However, to test for the presence of a shared network, the authors only perform the Causal vs. Non-causal analysis. This analysis is not very informative because it includes all conditions mixed together and does not clarify whether both the illness-causal and mechanical-causal conditions contribute to these results.

      (2) To address this limitation, a useful additional step would be to use as ROIs the different regions that emerged in the Causal vs. Non-causal decoding analysis and to conduct four separate decoding analyses within these specific clusters:

      (a) Illness-Causal vs. Non-causal - Illness First;

      (b) Illness-Causal vs. Non-causal - Mechanical First;

      (c) Mechanical-Causal vs. Non-causal - Illness First;

      (d) Mechanical-Causal vs. Non-causal - Mechanical First.

      This approach would allow the authors to determine whether any of these ROIs can decode both the illness-causal and mechanical-causal conditions against at least one non-causal condition.

      (3) Another possible analysis to investigate the existence of a shared network would be to run the searchlight analysis for the mechanical-causal condition versus the two non-causal conditions, as was done for the illness-causal versus non-causal conditions, and then examine the conjunction between the two. Specifically, the goal would be to identify ROIs that show significant decoding accuracy in both analyses.

      The hypothesis that a neural mechanism supports causal inference across domains predicts higher univariate responses when causal inferences occur than when they do not. This prediction was not generated by us ad hoc but rather has been made by almost all previous cognitive neuroscience papers on this topic (Ferstl & von Cramon, 2001; Satpute et al., 2005; Fugelsang & Dunbar, 2005; Kuperberg et al., 2006; Fenker et al., 2010; Kranjec et al., 2012; Pramod, Chomik-Morales, et al., 2023; Chow et al., 2008; Mason & Just, 2011; Prat et al., 2011). Contrary to this hypothesis, we find that the precuneus (PC) is most activated for illness inferences and most deactivated for mechanical inferences relative to rest, suggesting that the PC does not support domain-general causal inference. To further probe the selectivity of the PC for illness inferences, we created group overlap maps that compare PC responses to illness inferences and mechanical inferences across participants. The PC shows a strong preference for illness inferences and is therefore unlikely to support causal inferences irrespective of their content (Supplementary Figures 6 and 7). We also note that, in whole-cortex analysis, no shared regions responded more to causal inference than noncausal vignettes across domains. Therefore, the prediction made by the ‘domain-general causal engine’ proposal as it has been articulated in the literature is not supported in our data.

      Taking a multivariate approach, the hypothesis that a neural mechanism supports causal inference across domains also predicts that relevant regions can decode between all possible pairs of causal vs. noncausal conditions (e.g., Illness-Causal vs. Noncausal-Illness First, Mechanical-Causal vs. Noncausal-Illness First, etc.). The analysis described by the reviewer in (2), in which the regions that distinguish between causal vs. noncausal conditions in searchlight MVPA are used as ROIs to test various causal vs. noncausal contrasts, is non-independent. Therefore, we cannot perform this analysis. In accordance with the reviewer’s suggestions in (3), we now include searchlight MVPA results for the mechanical inference condition compared to the two noncausal conditions (Supplementary Figure 9). No regions are shared across the searchlight analyses comparing all possible pairs of causal and noncausal conditions, providing further evidence that there are no shared neural responses to causal inference in our dataset.
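
      The conjunction logic behind this test can be sketched in a few lines: a location counts as shared only if it is significant in every causal vs. noncausal searchlight map. The maps below are invented six-element arrays used purely for illustration, and `conjunction_mask` is a hypothetical helper, not part of the analysis pipeline described in the manuscript.

```python
import numpy as np


def conjunction_mask(significance_maps):
    """Logical AND across boolean significance maps of identical shape."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in significance_maps])
    return stacked.all(axis=0)


# Toy 1-D "maps": 1 marks a location with significant decoding.
illness_vs_noncausal_1 = [1, 1, 0, 0, 1, 0]
illness_vs_noncausal_2 = [1, 0, 0, 1, 1, 0]
mechanical_vs_noncausal_1 = [0, 1, 1, 0, 0, 0]
mechanical_vs_noncausal_2 = [0, 0, 1, 1, 0, 1]

# Locations significant for both illness contrasts.
illness_only = conjunction_mask([illness_vs_noncausal_1, illness_vs_noncausal_2])

# Locations significant for all four causal vs. noncausal contrasts.
shared = conjunction_mask([
    illness_vs_noncausal_1,
    illness_vs_noncausal_2,
    mechanical_vs_noncausal_1,
    mechanical_vs_noncausal_2,
])
n_illness_only = int(illness_only.sum())
n_shared = int(shared.sum())
```

      In this toy example two locations survive the illness-only conjunction and none survive the conjunction across all four maps, which is the pattern a content-specific account would predict.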

      (4) Along the same lines, for the ROI MVPA analysis, it would be useful not only to include the illness-causal vs. mechanical-causal decoding but also to examine the illness-causal vs. non-causal conditions and the mechanical-causal vs. non-causal conditions. Additionally, it would be beneficial to report these data not just in a table (where only the mean accuracy is shown) but also using dot plots, allowing the readers to see not only the mean values but also the accuracy for each individual subject.

      We have performed these analyses and now include a table of the results as well as figures displaying the dispersion across participants (Supplementary Tables 2 and 3, Supplementary Figures 10 and 11). In the left PC, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the same noncausal condition. The language network did not decode between any causal/noncausal pairs. In the logic network, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the other noncausal condition. Thus, no regions showed the predicted ‘domain-general’ pattern, i.e., significant decoding between all causal/noncausal pairs. 

      Importantly, the decoding results must be interpreted in light of significant univariate differences across conditions (e.g., greater responses to illness inferences compared to noncausal vignettes in the PC). Linear classifiers are highly sensitive to univariate differences (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022).
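
      The sensitivity of linear classifiers to univariate differences can be illustrated with a small simulation. The sketch below is an assumption-laden toy: it uses simulated Gaussian "voxels", an invented mean shift between two conditions, and a nearest-centroid classifier as a simple stand-in for whichever linear classifier is used in practice. The two conditions differ only in overall response level, yet decoding is near ceiling; removing each trial's mean across voxels brings accuracy back to roughly chance.

```python
import numpy as np


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Classify each test trial by the closer class centroid (a linear rule)."""
    c0 = train_x[train_y == 0].mean(axis=0)
    c1 = train_x[train_y == 1].mean(axis=0)
    d0 = np.linalg.norm(test_x - c0, axis=1)
    d1 = np.linalg.norm(test_x - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == test_y).mean())


rng = np.random.default_rng(0)
n_trials, n_voxels, shift = 200, 50, 1.0  # illustrative values only


def simulate():
    """Two conditions that differ only by a uniform shift across all voxels."""
    y = np.repeat([0, 1], n_trials // 2)
    x = rng.normal(size=(n_trials, n_voxels)) + shift * y[:, None]
    return x, y


def center(x):
    """Remove each trial's mean across voxels (the univariate component)."""
    return x - x.mean(axis=1, keepdims=True)


train_x, train_y = simulate()
test_x, test_y = simulate()

acc_raw = nearest_centroid_accuracy(train_x, train_y, test_x, test_y)
acc_centered = nearest_centroid_accuracy(
    center(train_x), train_y, center(test_x), test_y
)
```

      Here `acc_raw` reflects the mean difference alone, so above-chance decoding in such a case does not by itself indicate a distinct multivoxel pattern.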

      (5) The selection of Regions of Interest (ROIs) is not entirely straightforward:

      In the introduction, the authors mention that recent literature identifies the precuneus (PC) as a region that responds preferentially to images and words related to living things across various tasks. While this may be accurate, we can all agree that other regions within the ventral occipital-temporal cortex also exhibit such preferences, particularly areas like the fusiform face area, the occipital face area, and the extrastriate body area. I believe that at least some parts of this network (e.g., the fusiform gyrus) should be included as ROIs in this study. This inclusion would make sense, especially because a complementary portion of the ventral stream known to prefer non-living items (i.e., anterior medial VOTC) has been selected as a control ROI to process information about the mechanical-causal condition. Given the main hypothesis of the study - that causal inferences about illness might depend on content-specific semantic representations in the 'animacy network' - it would be worthwhile to investigate these ROIs alongside the precuneus, as they may also yield interesting results.

      We thank the reviewer for their suggestion to test the FFA region. We think this provides an interesting comparison to the PC and hypothesized that, in contrast to the PC, the FFA does not encode abstract causal information about animacy-specific processes (i.e., illness). As we mention in the Introduction, although the fusiform face area (FFA) also exhibits a preference for animates, it does so primarily for images in sighted people (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Konkle & Caramazza, 2013; Connolly et al., 2016; Bi et al., 2016).

      We did not select the FFA as a region of interest when preregistering the current study because we did not predict it would show sensitivity to causal knowledge. In accordance with the reviewer’s suggestions, we now include the FFA as an ROI in individual-subject univariate analysis (Supplementary Figure 8, Appendix 4). Because we did not run a separate FFA localizer task when collecting the data, we used FFA search spaces from a previous study investigating responses to face images (Julian et al., 2012). We followed the same analysis procedure that was used to investigate responses to illness inferences in the PC. Neither left nor right FFA exhibited a preference for illness inferences compared to mechanical inferences or to the noncausal conditions. This result is interesting and is now briefly discussed in the Discussion section.

      (6) Visual representation of results:

      In all the figures related to ROI analyses, only mean group values are reported (e.g., Figure 1A, Figure 3, Figure 4A, Supplementary Figure 6, Figure 7, Figure 8). To better capture the complexity of fMRI data and provide readers with a more comprehensive view of the results, it would be beneficial to include a dot plot for a specific time point in each graph. This could be a fixed time point (e.g., a certain number of seconds after stimulus presentation) or the time point showing the maximum difference between the conditions of interest. Adding this would allow for a clearer understanding of how the effect is distributed across the full sample, such as whether it is consistently present in every subject or if there is greater variability across individuals.

      We thank the reviewer for this suggestion. We now include scattered box plots displaying the dispersion in average percent signal change across participants in Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14.

      (7) Task selection:

      (a) To improve the clarity of the paper, it would be helpful to explain the rationale behind the choice of the selected task, specifically addressing: (i) why an implicit inference task was chosen instead of an explicit inference task, and (ii) why the "magic detection" task was used, as it might shift participants' attention more towards coherence, surprise, or unexpected elements rather than the inference process itself.

      (b) Additionally, the choice to include a large number of catch trials is unusual, especially since they are modeled as regressors of non-interest in the GLM. It would be beneficial to provide an explanation for this decision.

      We chose an orthogonal foil detection task, rather than an explicit causal judgment task, to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes (see Kuperberg et al., 2006 for discussion). Analogous foil detection paradigms have been used to study sentence processing and word recognition (Pallier et al., 2011; Dehaene-Lambertz et al., 2018). We now clarify this in the Introduction. The “magical” element occurred both within and across sentences so that participants could not use coherence as a cue to complete the task. Approximately 1/5 (19%) of the trials were magical catch trials to ensure that participants remained attentive throughout the experiment.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors hypothesize that "causal inferences about illness depend on content-specific semantic representations in the animacy network". They test this hypothesis in an fMRI task, by comparing brain activity elicited by participants' exposure to written situations suggesting a plausible cause of illness with brain activity in linguistically equivalent situations suggesting a plausible cause of mechanical failure or damage and non-causal situations. These contrasts identify PC as the main "culprit" in a whole-brain univariate analysis. Then the question arises of whether the content-specificity has to do with inferences about animates in general, or if there are some distinctions between reasoning about people's bodies versus mental states. To answer this question, the authors localize the mentalizing network and study the relation between brain activity elicited by Illness-Causal > Mech-Causal and Mentalizing > Physical stories. They conclude that inferring about the causes of illness partially differentiates from reasoning about people's states of mind. The authors finally test the alternative yet non-mutually exclusive hypothesis that both types of causal inferences (illness and mechanical) depend on shared neural machinery. Good candidates are language and logic, which justifies the use of a language/logic localizer. No evidence of commonalities across causal inferences versus non-causal situations is found.

      Strengths:

      (1) This study introduces a useful paradigm and well-designed set of stimuli to test for implicit causal inferences.

      (2) Another important methodological advance is the addition of physical stories to the original mentalizing protocol.

      (3) With these tools, or a variant of these tools, this study has the potential to pave the way for further investigation of naïve biology and causal inference.

      Weaknesses:

      (1) This study is missing a big-picture question. It is not clear whether the authors investigate the neural correlates of causal reasoning or of naïve biology. If the former, the choice of an orthogonal task, making causal reasoning implicit, is questionable. If the latter, the choice of mechanical and physical controls can be seen as reductive and problematic.

      We have modified the Introduction to clarify that the primary goal of the current study is to test the claim that semantic networks encode causal knowledge – in this case, causal intuitive theories of biology. Most conceptions of intuitive biology, intuitive psychology, and intuitive physics describe them as causal frameworks (e.g., Wellman & Gelman, 1992; Simons & Keil, 1995; Keil et al., 1999; Tenenbaum, Griffiths, & Niyogi, 2007; Gopnik & Wellman, 2012; Gerstenberg & Tenenbaum, 2017). As noted above, we chose an implicit task to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes. We are not sure what the reviewer means when they say that mechanical and physical controls are reductive. This is the standard control condition in neural and behavioral paradigms that investigate intuitive psychology and intuitive biology (e.g., Saxe & Kanwisher, 2003; Gelman & Wellman, 1991).

      (2) The rationale for focusing mostly on the precuneus is not clear and this choice could almost be seen as a post-hoc hypothesis.

      This study is preregistered (https://osf.io/6pnqg). The preregistration states that the precuneus is a hypothesized area of interest, so this is not a post-hoc hypothesis. Our hypothesis was informed by multiple prior studies implicating the precuneus in the semantic representation of animates (e.g., people, animals) (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). We also conducted a pilot experiment with separate participants prior to pre-registering the study. We now clarify our rationale for focusing on the precuneus in the Introduction:

      “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses). Thinking about living things (animates) as opposed to non-living things (inanimate objects/places) recruits partially distinct neural systems (e.g., Warrington & Shallice, 1984; Hillis & Caramazza, 1991; Caramazza & Shelton, 1998; Farah & Rabinowitz, 2003). The precuneus (PC) is part of the ‘animacy’ semantic network and responds preferentially to living things (i.e., people and animals), whether presented as images or words (Devlin et al., 2002; Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). By contrast, parts of the visual system (e.g., fusiform face area) that respond preferentially to animates do so primarily for images (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Mahon et al., 2009; Konkle & Caramazza, 2013; Connolly et al., 2016; see Bi et al., 2016 for a review). We hypothesized that the PC represents causal knowledge relevant to animates and tested the prediction that it would be activated during implicit causal inferences about illness, which rely on such knowledge (preregistration: https://osf.io/6pnqg).”

      (3) The choice of an orthogonal 'magic detection' task has three problematic consequences in this study:

      (a) It differs in nature from the 'mentalizing' task that consists of evaluating a character's beliefs explicitly from the corresponding story, which complicates the study of the relation between both tasks. While the authors do not compare both tasks directly, it is unclear to what extent this intrinsic difference between implicit versus explicit judgments of people's body versus mental states could influence the results.

      (b) The extent to which the failure to find shared neural machinery between both types of inferences (illness and mechanical) can be attributed to the implicit character of the task is not clear.

      (c) The introduction of a category of non-interest that contains only 36 trials compared to 38 trials for all four categories of interest creates a design imbalance.

      We disagree with the reviewer’s argument that our use of an implicit “magic detection” task is problematic. Indeed, we think it is one of the advances of the current study over prior work.

      a) Prior work has shown that implicit mentalizing tasks (e.g., naturalistic movie watching) engage the theory of mind network, suggesting that the implicit/explicit nature of the task does not drive the activation of this network (Jacoby et al., 2016; Richardson et al., 2018). With these data in mind, it is unlikely that the implicit/explicit nature of the causal inference and theory of mind tasks in the present experiment can explain observed differences between them.

      b) Explicit causal inferences introduce a collection of executive processes that potentially confound the results and make it difficult to know whether neural signatures are related to causal inference per se. The current study focuses on the neural basis of implicit causal inference, a type of inference that is made routinely during language comprehension. We do not claim to find neural signatures of all causal inferences, and we do not think any study could claim to do so, because causal inferences are a highly varied class.

      c) Our findings do not exclude the possibility that content-invariant responses are elicited during explicit causality judgments. We clarify this point in the Results (e.g., “These results leave open the possibility that domain-general systems support the explicit search for causal connections”) and Discussion (e.g., “The discovery of novel causal relationships (e.g., ‘blicket detectors’; Gopnik et al., 2001) and the identification of complex causes, even in the case of illness, may depend in part on domain-general neural mechanisms”).

      d) Because the magic trials are excluded from our analyses, it is unclear how the imbalance in the number of magic trials could influence the results and our interpretation of them. We note that the number of catch trials in standard target detection paradigms is sometimes much lower than the number of target trials in each condition (e.g., Pallier et al., 2011).

      (4) Another imbalance is present in the design of this study: the number of trials per category is not the same in each run of the main task. This imbalance does not seem to be accounted for in the 1st-level GLM and renders a bit problematic the subsequent use of MVPA.

      Each condition is shown either 6 or 7 times per run (maximum difference of 1 trial between conditions), and the number of trials per condition is equal across the whole experiment: each condition is shown 7 times in two of the runs and 6 times in four of the runs. This minor design imbalance is typical of fMRI experiments and should not impact our interpretations of the data, particularly because we average responses from each condition within a run before submitting them to MVPA.
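      The trial arithmetic and the effect of within-run averaging can be checked with a short worked example. This is an illustrative sketch only: the assignment of the 7-trial runs to particular run positions, the three-voxel toy patterns, and the helper name `run_average` are hypothetical. The point is that averaging within a run gives the classifier exactly one pattern per condition per run, whether that run held 6 or 7 trials.

```python
# Hypothetical ordering: which two runs hold 7 trials is not stated in the text.
trials_per_run = [7, 7, 6, 6, 6, 6]
total_trials = sum(trials_per_run)  # 2 * 7 + 4 * 6

def run_average(trial_patterns):
    """Average trial-wise voxel patterns into a single pattern for the run."""
    n = len(trial_patterns)
    return [sum(voxel_values) / n for voxel_values in zip(*trial_patterns)]

# Toy data: every trial in run r has the 3-voxel pattern [r, r, r].
runs = [[[float(r)] * 3 for _ in range(n_trials)]
        for r, n_trials in enumerate(trials_per_run)]

run_patterns = [run_average(trials) for trials in runs]
print(total_trials, len(run_patterns))
```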

      (5) The main claim of the authors, encapsulated by the title of the present manuscript, is not tested directly. While the authors included in their protocol independent localizers for mentalizing, language, and logic, they did not include an independent localizer for "animacy". As such, they cannot provide a within-subject evaluation of their claim, which is entirely based on the presence of a partial overlap in PC (which is also involved in a wide range of tasks) with previous results on animacy.

      We respectfully disagree with this assertion. Our primary analysis uses a within-subject leave-one-run-out approach. This approach allows us to use part of the data itself to localize animacy-relevant causal responses in the PC without engaging in ‘double-dipping’ or statistical non-independence (Vul & Kanwisher, 2011). We also use the mentalizing network localizer as a partial localizer for animacy. This is because the control condition (physical reasoning) does not include references to people or any animate agents (Supplementary Figures 1 and 15). We now clarify this point in the Methods section of the paper (see below).

      From the Methods: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant (Saxe & Kanwisher, 2003; Dodell-Feder et al., 2011; http://saxelab.mit.edu/use-our-efficient-false-belief-localizer)...Our physical stories incorporated more vivid descriptions of physical interactions and did not make any references to human agents, enabling us to use the mentalizing localizer as a localizer for animacy.”
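      The logic of the leave-one-run-out approach can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the top-k voxel selection rule, the search-space size, and all function names are hypothetical. The data are pure noise, so any condition difference measured in the selected voxels reflects selection bias alone. Selecting and measuring on the same runs inflates the difference, whereas selecting on five runs and measuring on the held-out run does not.

```python
import random

def mean(values):
    return sum(values) / len(values)

def top_voxels(cond_a, cond_b, runs, top_k):
    """Indices of the top_k voxels by the A-minus-B contrast averaged over `runs`."""
    n_voxels = len(cond_a[0])
    contrast = [mean([cond_a[r][v] - cond_b[r][v] for r in runs])
                for v in range(n_voxels)]
    return sorted(range(n_voxels), key=lambda v: contrast[v], reverse=True)[:top_k]

def roi_difference(cond_a, cond_b, voxels, runs):
    """Mean A-minus-B response over the given voxels and runs."""
    return mean([cond_a[r][v] - cond_b[r][v] for r in runs for v in voxels])

def leave_one_run_out(cond_a, cond_b, top_k):
    """Select voxels on all runs but one, then measure in the held-out run."""
    n_runs = len(cond_a)
    estimates = []
    for held_out in range(n_runs):
        training = [r for r in range(n_runs) if r != held_out]
        voxels = top_voxels(cond_a, cond_b, training, top_k)
        estimates.append(roi_difference(cond_a, cond_b, voxels, [held_out]))
    return mean(estimates)

def circular(cond_a, cond_b, top_k):
    """Non-independent analysis: select and measure on the same runs."""
    all_runs = list(range(len(cond_a)))
    voxels = top_voxels(cond_a, cond_b, all_runs, top_k)
    return roi_difference(cond_a, cond_b, voxels, all_runs)

# Hypothetical sizes: 6 runs, a 500-voxel search space, top 25 voxels selected.
rng = random.Random(1)
N_RUNS, N_VOXELS, TOP_K = 6, 500, 25
cond_a = [[rng.gauss(0.0, 1.0) for _ in range(N_VOXELS)] for _ in range(N_RUNS)]
cond_b = [[rng.gauss(0.0, 1.0) for _ in range(N_VOXELS)] for _ in range(N_RUNS)]

independent_diff = leave_one_run_out(cond_a, cond_b, TOP_K)
circular_diff = circular(cond_a, cond_b, TOP_K)
print(f"leave-one-run-out: {independent_diff:.2f}, circular: {circular_diff:.2f}")
```

      Because voxel selection never sees the held-out run, the leave-one-run-out estimate stays near zero when no true effect exists, which is what licenses using part of the data as its own localizer.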

      Reviewer #3 (Public review):

      Summary:

      This study employed an implicit task, showing vignettes to participants while a BOLD signal was acquired. The aim was to capture automatic causal inferences that emerge during language processing and comprehension. In particular, the authors compared causal inferences about illness with two control conditions, causal inferences about mechanical failures and non-causal phrases related to illnesses. All phrases that were employed described contexts with people, to avoid animacy/inanimate confound in the results. The authors had a specific hypothesis concerning the role of the precuneus (PC) in being sensitive to causal inferences about illnesses.

      These findings indicate that implicit causal inferences are facilitated by semantic networks specialized for encoding causal knowledge.

      Strengths:

      The major strength of the study is the clever design of the stimuli (which are nicely matched for a number of features) which can tease apart the role of the type of causal inference (illness-causal or mechanical-causal) and the use of two localizers (logic/language and mentalizing) to investigate the hypothesis that the language and/or logical reasoning networks preferentially respond to causal inference regardless of the content domain being tested (illnesses or mechanical).

      Weaknesses:

      I have identified the following main weaknesses:

      (1) Precuneus (PC) and Temporo-Parietal junction (TPJ) show very similar patterns of results, and the manuscript is mostly focused on PC (also the abstract). To what extent does the fact that PC and TPJ show similar trends affect the inferences we can derive from the results of the paper? I wonder whether additional analyses (connectivity?) would help provide information about this network.

      We thank the reviewer for this suggestion. While the PC shows the most robust univariate preference for illness inferences compared to both mechanical inferences and noncausal vignettes, the TPJ also shows a preference for illness inferences compared to mechanical inferences in individual-subject fROI analysis. However, as we mention in the Results section, the TPJ does not show a preference for illness inferences compared to noncausal vignettes, suggesting that the TPJ is selective for animacy but may not be as sensitive to causal knowledge about animacy-specific processes. When describing our results, we refer to the ‘animacy network’ (i.e., PC and TPJ) but also highlight that the PC exhibited the most robust responses to illness inferences (from the Results: “Inferring illness causes preferentially recruited the animacy semantic network, particularly the PC”; from the Discussion: “We find that a semantic network previously implicated in thinking about animates, particularly the precuneus (PC), is preferentially engaged when people infer causes of illness…”). We did not collect resting state data that would enable a connectivity analysis, as the reviewer suggests. This is an interesting direction for future work.

      (2) Results are mainly supported by a univariate ROI approach, and the MVPA ROI approach is performed on a subregion of one of the ROI regions (left precuneus). Results could then have a limited impact on our understanding of brain functioning.

      The original and current versions of the paper include results from multiple multivariate analyses, including whole-cortex searchlight MVPA and individual-subject fROI MVPA performed in multiple search spaces (see Supplementary Figures 10 and 11, Supplementary Tables 2 and 3).

      We note that our preregistered predictions focused primarily on univariate differences. This is because the current study investigates neural responses to inferences, and univariate increases in activity are thought to reflect the processing of such inferences. We use multivariate analyses to complement our primary univariate analyses. However, given that we observe significant univariate effects and that multivariate analyses are heavily influenced by significant univariate effects (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022), our univariate results constitute the main findings of the paper.

      (3) In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). These figures show that there is high overlap across participants in PC responses to illness inferences but not mechanical inferences. In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3. 

      (4) Sometimes acronyms are defined in the text after they appear for the first time.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I was unable to access the pre-registration on OSF because special permission is required.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      (2) The length of the MRI session is quite long (around 2 hours). It is generally discouraged to have such extended data acquisition periods, as this can affect the stability and cleanliness of the data. Did you observe any effects of fatigue or attention decline in your data?

      The session was 2 hours long including 1-2 10-minute breaks. Without breaks, the scan would be approximately 1.5 hours. This is a standard length for MRI experiments. The main experiment (causal inference task) was always conducted first and lasted approximately 1 hour. Accuracy did not decrease across the 6 runs of this experiment (repeated measures ANOVA, F(5,114) = 1.35, p = .25).

      (3) The last sentence of the results states: "Although MVPA searchlight analysis identified several areas where patterns of activity distinguished between causal and non-causal vignettes, all of these regions showed a preference for non-causal vignettes in univariate analysis (Supplementary Figure 5)." This statement is not entirely accurate. As I previously pointed out, the MVPA searchlight analysis is not very informative and is difficult to interpret. However, as previously suggested, there are additional steps that could be taken to better understand and interpret these results. It is incorrect to conclude that because the brain regions identified in the MVPA analyses show a preference for non-causal vignettes in univariate analyses, the multivariate results lack value. While univariate analyses may show a preference for a specific condition, multivariate analyses can reveal more fine-grained representations of multiple conditions. For a notable example, consider the fusiform face area (FFA) that shows a clear preference for faces at the univariate level but can significantly decode other categories at the multivariate level, even when faces are not included in the analysis.

      The decoding analysis that the reviewer is suggesting for the current study would be analogous to identifying univariate differences between faces and places in the FFA and then decoding between faces and places and claiming that the FFA represents places because the decoding is significant. The decoding analyses enabled by our design are not equivalent to decoding within a condition (e.g., among face identities, among types of illness inferences), as the reviewer suggests above. It is not that such multivariate analyses “lack value” but that they recapitulate established univariate differences. Multivariate analyses are useful for revealing more fine-grained representations when i) significant univariate differences are not observed, or ii) it is possible to decode among categories within a condition (e.g., among face identities, among types of illness inferences). We are currently collecting data that will enable us to perform within-condition decoding analyses in future work, but the design of the current study does not allow for such a comparison.

      We note that the original quotation from the manuscript has been removed because it is no longer accurate. When including participant response time as a covariate of no interest in the GLM, no regions are shared across the 4 searchlight analyses comparing causal and noncausal conditions, suggesting that there are no shared neural responses to causal inference in our dataset.

      Reviewer #2 (Recommendations for the authors):

      (1) Moderating the strength of some claims made to justify the main hypothesis (e.g., "people but not machines transmit diseases to each other through physical contact").

      We changed this wording so that it now reads: “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses).” (Introduction)

      (2) Expanding the paragraph introducing the sub-question about inferring people's "body states" vs "mental states". In addition, given the order in which the hypotheses are introduced, and the results are presented, I would suggest switching the order of presentation of both localizers in the methods section and adding a quick reminder of the hypotheses that justify using these localizers.

      We thank the reviewer for these suggestions. In accordance with their suggestions, we have expanded the paragraph in the Introduction that introduces the “body states” vs. “mental states” question (see below). We have also switched the order of the localizer descriptions in the Methods section and added a sentence at the start of each section describing the relevant hypotheses (see below).

      From the Introduction: “We also compared neural responses to causal inferences about the body (i.e., illness) and inferences about the mind (i.e., mental states). Both types of inferences are about animate entities, and some developmental work suggests that children use the same set of causal principles to think about bodies and minds (Carey, 1985, 1988). Other evidence suggests that by early childhood, young children have distinct causal knowledge about the body and the mind (Springer & Keil, 1991; Callanan & Oakes, 1992; Wellman & Gelman, 1992; Inagaki & Hatano, 1993; 2004; Keil, 1994; Hickling & Wellman, 2001; Medin et al., 2010). For instance, preschoolers are more likely to view illness as a consequence of biological causes, such as contagion, rather than psychological causes, such as malicious intent (Springer & Ruckel, 1992; Raman & Winer, 2004; see also Legare & Gelman, 2008). The neural relationship between inferences about bodies and minds has not been fully described. The ‘mentalizing network’, including the PC, is engaged when people reason about agents’ beliefs (Saxe & Kanwisher, 2003; Saxe et al., 2006; Saxe & Powell, 2006; Dodell-Feder et al., 2011; Dufour et al., 2013). We localized this network in individual participants and measured its neuroanatomical relationship to the network activated by illness inferences.”

      From the Methods, localizer descriptions: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant… To test for the presence of domain-general responses to causal inference in the language and logic networks (e.g., Kuperberg et al., 2006; Operskalski & Barbey, 2017), we used an additional localizer task to identify both networks in each participant.”

      (3) Adding a quick analysis of lateralization to support the corresponding claim of left lateralization of responses to causal inferences.

      In accordance with the reviewer’s suggestion, we now include hemisphere as a factor in all ANOVAs comparing univariate responses across conditions.

      From the Results: “In individual-subject fROI analysis (leave-one-run-out), we similarly found that inferring illness causes activated the PC more than inferring causes of mechanical breakdown (repeated measures ANOVA, condition (Illness-Causal, Mechanical-Causal) x hemisphere (left, right): main effect of condition, F(1,19) = 19.18, p < .001, main effect of hemisphere, F(1,19) = 0.3, p = .59, condition x hemisphere interaction, F(1,19) = 27.48, p < .001; Figure 1A). This effect was larger in the left than in the right PC (paired samples t-tests; left PC: t(19) = 5.36, p < .001, right PC: t(19) = 2.27, p = .04)…In contrast to the animacy-responsive PC, the anterior PPA showed the opposite pattern, responding more to mechanical inferences than illness inferences (leave-one-run-out individual-subject fROI analysis; repeated measures ANOVA, condition (Mechanical-Causal, Illness-Causal) x hemisphere (left, right): main effect of condition, F(1,19) = 17.93, p < .001, main effect of hemisphere, F(1,19) = 1.33, p = .26, condition x hemisphere interaction, F(1,19) = 7.8, p = .01; Figure 4A). This effect was significant only in the left anterior PPA (paired samples t-tests; left anterior PPA: t(19) = 4, p < .001, right anterior PPA: t(19) = 1.88, p = .08).”

      (4) Making public and accessible the pre-registration OSF link.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      Reviewer #3 (Recommendations for the authors):

      In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3.

      Minor

      (1) Figure 2: Spatial dissociation between responses to illness inferences and mental state inferences in the precuneus (PC). If the analysis is the result of the MVPA, the figure should report the fact that only the left precuneus was analyzed.

      Figure 2 depicts the spatial dissociation in univariate responses to illness inferences and mental state inferences. We now clarify this in the figure legend.

      (2) VOTC and PSC acronyms are defined in the text after they appear for the first time. TPJ is never defined.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the knowledge gap between the representation of goal direction in the central complex and how motor systems stabilize movement toward that goal. The authors focused on two descending neurons, DNa01 and 02, and showed that they play different roles in steering the fly toward a goal. They also explored the connectome data to propose a model to explain how these DNs could mediate response to lateralized sensory inputs. They finally used lateralized optogenetic activation/inactivation experiments to test the roles of these neurons in mediating turnings in freely walking flies.

      Strengths:

      The experiments are well-designed and controlled. The experiment in Figure 4 is elegant, and the authors put a lot of effort into ensuring that ATP puffs do not accidentally activate the DNs. They also have explained complex experiments well. I only have minor comments for the authors.

      We are grateful for this positive feedback.

      Weaknesses:

      (1) I do not fully understand how the authors extracted the correlation functions from the population data in Figure 1. Since the ipsilateral DNs are anti-correlated with the contralateral ones, I expected that the average will drop to zero when they are pooled together (e.g., 1E-G). Of course, this will not be the case if all the data in Figure 1 are collected from the same brain hemisphere. It would be helpful if the authors could explain this.

      We regret that this information was not easy to find in our initial submission. As noted in the Figure 1D legend, “Here and elsewhere, ipsi and contra are defined relative to the recorded DN(s).” We have now added a sentence to the Results (right after we introduce Figure 1D) that also makes this point.

      (2) What constitutes the goal directions in Figures 1-3 and 8, as the authors could not use EPG activity as a proxy for goal directions? If these experiments were done in the dark, without landmarks, one would expect the fly's heading to drift randomly at times, and they would not engage the DNa01/02 for turning. Do the walking trajectories in these experiments qualify as menotactic bouts?

      Published work (Green et al., 2019) has shown that, even in the dark, flies will often walk for extended periods while holding the bump of EPG activity at a fixed location. During these epochs, the brain is essentially estimating that the fly is walking in a straight line in a fixed direction. (The fact that the fly is actually rotating a bit on the spherical treadmill is not something the fly can know, in the dark.) Thus, epochs where the EPG bump is held fixed are treated as menotactic bouts, even in darkness.

      Our results provide additional support for this interpretation. We find that, when flies are walking in darkness and holding the bump of EPG activity at a fixed location, they will make a corrective behavioral turning maneuver in response to an imposed bump-jump. This result argues that the flies are actually engaging in goal-directed straight-line walking, i.e. menotaxis, and it reproduces the findings of Green et al. (2019).

      To clarify this point, we have adjusted the wording of the Results pertaining to Figure 4.

      (3) In Figure 2B, the authors mentioned that DNa02 overpredicts and 01 underpredicts rapid turning and provided single examples. It would be nice to see more population-level quantification to support this claim.

      In this revision, we have reorganized Figures 1 and 2 (and associated text) to improve clarity. As part of this reorganization, we have removed this passage from the text, as it was a minor point in any event.

      Reviewer #2 (Public review):

      The data is largely electrophysiological recordings coupled with behavioral measurements (technically impressive) and some gain-of-function experiments in freely walking flies. Loss-of-function was tested but had minimal effect, which is not surprising in a system with partially redundant control mechanisms. The data is also consistent with/complementary to subsequent manuscripts (Yang 2023, Feng 2024, and Ros 2024) showing additional descending neurons with contributions to steering in walking and flying.

      The experiments are well executed, the results interesting, and the description clear. Some hypotheses based on connectome anatomy are tested: the insights on the pre-synaptic side - how sensory and central complex heading circuits converge onto these DNs are stronger than the suggestions about biomechanical mechanisms for how turning happens on the motor side.

      Of particular interest is the idea that different sensory cues can converge on a common motor program. The turn-toward or turn-away mechanism is initiated by valence rather than whether the stimulus was odor or temperature or memory of heading. The idea that animals choose a direction based on external sensory information and then maintain that direction as a heading through a more internal, goal-based memory mechanism, is interesting but it is hard to separate conclusively.

      To clarify, we mention the role of memory in connection with two places in the manuscript. First, we note that the EPG/head direction system relies on learning and memory to construct a map of directional cues in the environment. These cues are, in principle, inherently neutral, i.e. without valence. Second, we note that specific mushroom body output neurons rely on learning and memory to store the valence associated with an odor. This information is not necessarily associated with an allocentric direction: it is simply the association of odor with value. Both of these ideas are well-attested by previous work.

      The reviewer may be suggesting a sequential scheme whereby the brain initializes an allocentric goal direction based on valence, and then maintains that goal direction in memory, based on that initialization. In other words, memory is used to associate valence with some allocentric direction. This seems plausible, but it is not a claim we make in our manuscript.

      The "see-saw", where left-right symmetry is broken to allow a turn, presumably by excitation on one side and inhibition of the other leg motor modules, is interesting but not well explained here. How hyperpolarization affects motor outputs is not clear.

      We have added several sentences to the Discussion to clarify this point. According to this see-saw model, steering can emerge from right/left asymmetries in excitation, or inhibition, or both. It may be nonintuitive to think that inhibitory input to a DN can produce an action. However, this becomes more plausible given our finding that DNa02 has a relatively high basal firing rate (Fig. 1D), and DNa02 hyperpolarization is associated with contraversive turning (Fig. 5A). It is also relevant to note that there are many inhibitory cell types that form strong unilateral connections onto DNa02 (e.g., AOTU019).

      The statement near Figure 5B that "DNa02 activity was higher on the side ipsilateral to the attractive stimulus, but contralateral to the aversive stimulus" is really important - and only possible to see because of the dual recordings.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary:

      Rayshubskiy et al. performed whole-cell recordings from descending neurons (DNs) of fruit flies to characterize their role in steering. Two DNs implicated in "walking control" and "steering control" by previous studies (Namiki et al., 2018, Cande et al., 2018, Chen et al., 2018) were chosen by the authors for further characterization. In-vivo whole-cell recordings from DNa01 and DNa02 showed that their activity predicts spontaneous ipsilateral turning events. The recordings also showed that while DNa02 predicts transient turns DNa01 predicts slow sustained turns. However, optogenetic activation or inactivation showed relatively subtle phenotypes for both neurons (consistent with data in other recent preprints, Yang et al 2023 and Feng et al 2024). The authors also further characterized DNa02 with respect to its inputs and showed a functional connection with olfactory and thermosensory inputs as well as with the head-direction system. DNa01 is not characterized to this extent.

      Strengths:

      (1) In-vivo recordings and especially dual recordings are extremely challenging in Drosophila and provide a much higher resolution DN characterization than other recent studies that have relied on behavior or calcium imaging. Especially impressive are the simultaneous recordings from bilateral DNs (Figure 3). These bilateral recordings show clearly that DNa02 cells not only fire more during ipsilateral turning events but that they get inhibited during contralateral turns. In line with this observation, the difference between left and right DNa02 neuronal activity is a much better predictor of turning events compared to individual DNa02 activity.

      (2) Another technical feat in this work is driving local excitation in the head-direction neuronal ensemble (PEN-1 neurons), while simultaneously imaging its activity and performing whole-cell recordings from DNa02 (Figure 4). This impressive approach provided a way to causally relate changes in the head-direction system to DNa02 activity. Indeed, DNa02 activity could predict the rate at which an artificially triggered bump in the PEN-1 ring attractor returns to its previous stable point.

      (3) The authors also support the above observations with connectomics analysis and provide circuit motifs that can explain how the head direction system (as well as external olfactory/thermal stimuli) communicated with DNa02. All these results unequivocally put DNa02 as an essential DN in steering control, both during exploratory navigation as well as stimulus-directed turns.

      We are grateful for this detailed positive feedback.

      Weaknesses:

      (1) I understand that the first version of this preprint was already on bioRxiv in 2020, and some of the "weaknesses" I list are likely a reflection of the fact that I'm tasked to review this manuscript in late 2024 (more than 4 years later). But given this is a 2024 updated version it suffers from laying out the results in contemporary terms. For instance, the manuscript lacks any reference to the DNp09 circuit implicated in object-directed turning and upstream to DNa02 even though the authors cite one of the papers where this was analyzed (Braun et al, 2024). More importantly, these studies (both Braun et al 2024 and Sapkal et al 2024) along with recent work from the authors' lab (Yang et al 2023) and other labs (Feng et al 2024) provide a view that the entire suite of leg kinematics changes required for turning are orchestrated by populations of heterogeneous interconnected DNs. Moreover, these studies also show that this DN-DN network has some degree of hierarchy with some DNs being upstream to other DNs. In this contemporary view of steering control, DNa02 (like DNg13 from Yang et al 2023) is a downstream DN that is recruited by hierarchically upstream DNs like DNa03, DNp09, etc. In this view, DNa02 is likely to be involved in most turning events, but by itself unable to drive all the motor outputs required for the said events. This reasoning could be used while discussing the lack of major phenotypes with DNa02 activation or inactivation observed in the current study, which is in stark contrast to strong phenotypes observed in the case of hierarchically upstream DNs like DNp09 or DNa03. In the section, "Contributions of single descending neuron types to steering behavior": the authors start off by asking if individual DNs can make measurable contributions to steering behavior. Once more, any citations to DNp09 or DNa03 - two DNs that are clearly shown to drive strong turning on activation (Bidaye et al, 2020, Feng et al 2024) - are lacking. Besides misleading the reader, such statements also digress the results away from contemporary knowledge in the field. I appreciate that the brief discussion in the section titled "Ensemble codes for steering" tries to cover these recent updates. However, I think this would serve a better purpose in the introduction and help guide the results.

      We apologize for these omissions of relevant citations, which we have now fixed. Specifically, in our revised Discussion, we now point out that:

      - Braun et al. (2024) reported that bilateral optogenetic activation of either DNa02 or DNa01 can drive turning (in either direction). 

      - Braun et al. (2024) also identified DNb02 as a steering-related DN.

      - Bidaye et al. (2020), Sapkal et al. (2024), and Braun et al. (2024) all contributed to the identification of DNp09 as a broadcaster DN with the capacity to promote ipsiversive turning.

      We have also revised the beginning of the Results section titled “Contributions of single descending neuron types to steering behavior”, as suggested by the Reviewer.

      Finally, we agree with the Reviewer’s overall point that steering is influenced by multiple DNs. We have not claimed that any DN is solely responsible for steering. As we note in the Discussion: “We found that optogenetically inhibiting DNa01 produced only small defects in steering, and inhibiting DNa02 did not produce statistically significant effects on steering; these results make sense if DNa02 is just one of many steering DNs.”

      (2) The second major weakness is the lack of any immunohistochemistry (IHC) images quantifying the expression of the genetic tools used in these studies. Even though the main split-Gal4 tools for DNa01 and DNa02 were previously reported by Namiki et al, 2018, it is important to document the expression with the effectors used in this work and explicitly mention the expression in any ectopic neurons. Similarly, for any experiments where drivers were combined together (double recordings, functional connectivity) or modified for stochastic expression (Figure 8), IHC images are absolutely necessary. Without this evidence, it is difficult to trust many of the results (especially in the case of behavioral experiments in Figure 8). For example, the DNa01 genetic driver used by the authors is also expressed in some neurons in the nerve cord (as shown on the Flylight webpage of Janelia Research Campus). One wonders if all or part of the results described in Figure 8 are due to DNa01 manipulation or manipulation of the nerve cord neurons. The same applies for optic lobe neurons in the DNa02 driver.

      This is a reasonable request. We used DN split-Gal4 lines to express three types of UAS-linked transgenes:

      (1) GFP

      In these flies, we know that expression in DNs is restricted to the DN types in question, based on published work (Namiki et al., 2018), as well as the fact that we see one labeled DN soma per hemisphere. When we label both cells with GFP, we use the spike waveform to identify DNa02 and DNa01, as described in Figure S1.

      (2) ReaChR

      In these flies, expression patterns were different in different flies because ReaChR expression was stochastically sparsened using hs-FLP. Expression was validated in each fly after the experiment, as described in the Methods (“Stochastic ReaChR expression”). hs-FLP-mediated sparsening will necessarily produce stochastic patterns of expression in both DNa02 and off-target cells, and this is true of all the flies in this experiment. What makes the “unilateral” flies distinct from the “bilateral” flies is that unilateral flies express ReaChR in one copy of DNa02, whereas bilateral flies express ReaChR in both copies of DNa02. On average, off-target expression will be the same in both groups.

      (3) GtACR1

      In these flies, we initially assumed that GtACR1 expression was the same as GFP expression under control of the same driver. However, we agree with the reviewer’s point that these two expression patterns are not necessarily identical. Therefore, to address the reviewer’s question, we performed immunofluorescence microscopy to characterize GtACR1 patterns in the brain and VNC of both genotypes. These expression patterns are now shown in a new supplemental figure (Figure S8). This figure shows that, as it happens, expression of GtACR1 is indeed indistinguishable from the GFP expression patterns for the same lines (archived on the FlyLight website). Both DN split-Gal4 lines are largely selective for the DNs in question, with limited off-target labeling. We have now drawn attention to this off-target labeling in the last paragraph of the Results, where the GtACR1 results are discussed.

      (3) The paper starts off with a comparative analysis of the roles of DNa01 and DNa02 during steering. Unfortunately, after this initial analysis, DNa01 is largely ignored for further characterization (e.g. with respect to inputs, connectomics, etc.), only to return in the final figure for behavioral characterization where DNa01 seems to have a stronger silencing phenotype compared to DNa02. I couldn't find an explanation for this imbalance in the characterization of DNa01 versus DNa02. Is this due to technical reasons? Or was it an informed decision due to some results? In addition to being a biased characterization, this also results in the manuscript lacking a coherent thread, which in turn makes it a bit inaccessible to the non-specialist.

      Yes, the first portion of the manuscript focuses on DNa01 and DNa02. The latter part of the manuscript transitions to focus mainly on DNa02. 

      Our rationale is noted at the point in the manuscript where we make this transition, with the section titled “Steering toward internal goals”: “Having identified steering-related DNs, we proceeded to investigate the brain circuits that provide input to these DNs. Here we decided to focus on DNa02, as this cell’s activity is predictive of larger steering maneuvers.” When we say that DNa02 is predictive of larger steering maneuvers, we are referring to several specific results:

      - We obtain larger filter amplitudes for DNa02 versus DNa01 (Fig. 2A-C). This means that, just after a unit change in DN firing rate, we see on average a larger change in steering velocity for DNa02 versus DNa01.

      - The linear filter for DNa02 has a higher variance explained, as compared to DNa01 (Fig. 2D). This means that DNa02 is more predictive of steering.

      - The relationship between firing rate and rotational velocity (150 ms later) is steeper for DNa02 than for DNa01 (Fig. 2G). This means that, if we ignore dynamics and we just regress firing rate against subsequent rotational velocity, we see a higher-gain relationship for DNa02.
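
      The third comparison above (the gain of the firing-rate-to-velocity relationship at a fixed lag) and the notion of variance explained can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function name `lagged_gain`, the 100 Hz sampling rate, and the synthetic data are assumptions made purely for illustration, and a single-lag regression is a simplification of a full linear filter.

```python
import random


def lagged_gain(firing_rate, rot_velocity, lag):
    """Regress rot_velocity[t + lag] on firing_rate[t].

    Returns (slope, r_squared): the slope is the gain of the relationship,
    and r_squared is the fraction of velocity variance explained.
    """
    x = firing_rate[:len(firing_rate) - lag]
    y = rot_velocity[lag:]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Synthetic example (all numbers are invented, not measured values).
rng = random.Random(0)
lag = 15  # 150 ms at an assumed 100 Hz sampling rate
rate = [rng.uniform(0, 50) for _ in range(2000)]
# Velocity follows the firing rate 15 samples later, with added noise.
vel = [0.0] * lag + [2.0 * r + rng.gauss(0, 10) for r in rate[:-lag]]

slope, r2 = lagged_gain(rate, vel, lag)
print(f"gain = {slope:.2f}, variance explained = {r2:.2f}")
```

      In this sketch, a cell with a steeper slope at the same lag would correspond to a higher-gain relationship, and a larger `r_squared` would correspond to a larger variance explained.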

      Our focus on DNa02 was also driven by connectivity considerations. In the same paragraph (the first paragraph in the section titled “Steering toward internal goals”), we note that “there are strong anatomical pathways from the central complex to DNa02”; the same is not true of DNa01. This point has also been noted by other investigators (Hulse et al. 2021).

      We don’t think this focus on DNa02 makes our work biased or inaccessible. Any study must balance breadth with depth. A useful general way to balance these constraints is to begin a study with a somewhat broader scope, and then narrow the study’s focus to obtain more in-depth information. Here, we began with comparative study of two cell types, and we progressed to the cell type that we found more compelling.

      (4) There seems to be a discrepancy with regard to what is emphasized in the main text and what is shown in Figures S3/S4 in relation to the role of these DNs in backward walking. There are only two sentences in the main text where these figures are cited.

      a) "DNa01 and DNa02 firing rate increases were not consistently followed by large changes in forward velocity (Figs. 1G and S3)."

      b) "We found that rotational velocity was consistently related to the difference in right-left firing rates (Fig. 3B). This relationship was essentially linear through its entire dynamic range, and was consistent across paired recordings (Fig. 3C). It was also consistent during backward walking, as well as forward walking (Fig. S4)." These main text sentences imply the role of the difference between left and right DNa02 in turning. However, the actual plots in the Figures S3 and S4 and their respective legends seem to imply a role in "backward walking". For instance, see this sentence from the legend of Figure S3 "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward. When (firing rateDNa02>>firing rateDNa01), the fly is also often moving backward, but forward movement is still more common overall, and so the net effect is that forward velocity is small but still positive when (firing rateDNa02>>firing rateDNa01). Note that when we condition our analysis on behavior rather than neural activity, we do see that backward walking is associated with a large firing rate differential (Fig. S4)." This sort of discrepancy in what is emphasized in the text, versus what is emphasized in the figures, ends up confusing the reader. More importantly, I do not agree with any of these conclusions regarding the implication of backward walking. Both Figures S3 and S4 are riddled with caveats, misinterpretations, and small sample sizes. As a result, I actually support the authors' decision to not infer too much from these figures in the "main text". In fact, I would recommend going one step further and removing/modifying these figures to focus on the role of "rotational velocity". Please find my concerns about these two figures below:

      a) In Figures S3 and S4, every heat map has a different scale for the same parameter: forward velocity. S3A is -10 to +10 mm/s, S3B is -6 to +6, S4B (left) is -12 to +12, and S4B (right) is -4 to +4. Since the authors are trying to depict results based on the color-coding, this is highly problematic.

      b) Figure S3A legend "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward." There are also several instances when ΔvoltageDNa02= ΔvoltageDNa01 and both are low (lower left quadrant) when the fly is typically moving backwards. So in my opinion, this figure in fact suggests DNa02 has no role in backward velocity control.

      c) Based on the example traces in S4A, every time the fly walks backwards it is also turning. Based on this it is important to show absolute rotational velocity in Figure S4C. It could be that the fly is turning around the backward peak which would change the interpretation from Figure S4C. Also, it is important to note that the backward velocities in S4A are unprecedentedly high. No previous reports show flies walking backwards at such high velocities (for example see Chen et al 2018, Nat Comm. for backward walking velocities on a similar setup).

      d) In my opinion, Figure S4D showing that right-left DNa02 correlates with rotational velocity, regardless of whether the fly is in a forward or backward walking state, is the only important and conclusive result in Figures S3/S4. These figures should be rearranged to only emphasize this panel.

      We agree that it is difficult to interpret some of the correlations between DN activity and forward velocity, given that forward velocity and rotational velocity are themselves correlated to some degree. This is why we did not make claims based on these results in the main text. In response to these comments, we have taken the Reviewer’s suggestion to preserve Figure S4D (now Figure S3). The other components of these supplemental figures have been removed.

      (5) Figure 3 shows a really nice analysis of the bilateral DNa02 recordings data. While Figure S5 [now Figure S4] shows that authors have a similar dataset for DNa01, a similar level analysis (Figures 3D, E) is not done for DNa01 data. Is there a reason why this is not done?

      The reason we did not do the same analysis for DNa01 is that we only have two paired DNa01-DNa01 recordings. It turned out to be substantially more difficult to perform DNa01-DNa01 recordings, as compared to DNa02-DNa02 recordings. For this reason, we were not able to get more than two of these recordings.

      (6) In Figure 4 since the authors have trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded, I wonder if the authors could quantify hyperpolarization in DNa02 as is predicted from connectomics data in Figure 7.

      We agree this is an interesting question. However, DNa02 firing rate and membrane potential are variable, and stimulus-evoked hyperpolarizations in these DNs tend to be relatively small (on the order of 1 mV, in the case of a contralateral fictive olfactory stimulus, Figure 5A). In the case of our fictive olfactory stimuli, we could look carefully for these hyperpolarizations because we had a very large number of trials, and we could align these trials precisely to stimulus onset. By contrast, for the bump-jump experiments, we have a more limited number of trials, and turning onset is not so tightly time-locked to the chemogenetic stimuli; for these reasons, we are hesitant to make claims about any bump-jump-related hyperpolarization in these trials.

      (7) Figure 6 suggests that DNa02 contains information about latent steering drives. This is really interesting. However, in order to unequivocally claim this, a higher-resolution postural analysis might be needed. Especially given that DNa02 activation does not reliably evoke ipsilateral turning, these "latent" steering events could actually contain significant postural changes driven by DNa02 (making them "not latent"). Without this information, at least the authors need to explicitly mention this caveat.

      This is a good point. We cannot exclude the possibility that DNa02 is driving postural changes when the fly is stopped, and these postural changes are so small we cannot detect them. In this case, however, there would still be an interesting mismatch between the stimulus-evoked change in DNa02 firing rate (which is large) and the stimulus-evoked postural response (which would be very small). We have added language to the relevant Results section in order to make this explicit.

      (8) Figure 7 would really benefit from connectome data with synapse numbers (or weighted arrows) and a corresponding analysis of DNa01.

      In response to this comment, we have added synapse number information (represented by weighted arrows) to Figures 7C, E, and F. We also added information to the Methods to explain how cells were chosen for inclusion in this diagram. (In brief: we thresholded these connections so as to discard connections with small numbers of synapses.)

      We did perform an analogous connectome circuit analysis for DNa01, but if we use the same thresholds as we do for DNa02, we obtain a much sparser connectivity graph. We now show this in a new supplemental figure (Figure S9). MBON32 makes no monosynaptic connections onto DNa01, and it only forms one disynaptic connection, via LAL018, which is relatively weak. PFL3 and PFL2 make no mono- or disynaptic connections onto DNa01 comparable in strength to what we find for DNa02. 

      The sparser connectivity graph for DNa01 is partly due to the fact that fewer cell types converge onto DNa01 as compared to DNa02 (110 cell types, versus 287 cell types). Also, it seems that DNa01 is simply less closely connected to the central complex and mushroom body, as compared to DNa02.
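
      The thresholding step described above, and the way it can leave one target cell with a much sparser input graph than another, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the cell labels (`up_1`, `mid_1`, `DN_x`, `DN_y`), the synapse counts, and the threshold of 10 are all hypothetical placeholders, not values from the connectome or from the Methods.

```python
def threshold_graph(edges, min_synapses):
    """Keep only (pre, post, n_synapses) edges with at least min_synapses."""
    return [(pre, post, n) for pre, post, n in edges if n >= min_synapses]


def input_types(edges, target):
    """Set of cell types with a direct (monosynaptic) edge onto target."""
    return {pre for pre, post, _ in edges if post == target}


def disynaptic_paths(edges, source, target):
    """List (intermediate, n_first, n_second) for source -> intermediate -> target."""
    first_hop = {post: n for pre, post, n in edges if pre == source}
    return [(pre, first_hop[pre], n)
            for pre, post, n in edges
            if post == target and pre in first_hop]


# Hypothetical edge list: (presynaptic type, postsynaptic type, synapse count).
edges = [
    ("up_1", "DN_x", 120),
    ("up_1", "mid_1", 80),
    ("mid_1", "DN_x", 60),
    ("up_1", "DN_y", 3),
    ("mid_1", "DN_y", 4),
    ("up_2", "DN_x", 40),
    ("up_2", "DN_y", 30),
]

kept = threshold_graph(edges, min_synapses=10)  # threshold value is an assumption
for target in ("DN_x", "DN_y"):
    print(target,
          "direct inputs:", sorted(input_types(kept, target)),
          "paths from up_1:", disynaptic_paths(kept, "up_1", target))
```

      Applying one threshold to both targets, as in this sketch, keeps the two graphs comparable: a target whose inputs are individually weak simply ends up with fewer surviving edges.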

      (9) In Figure 8E, the most obvious neuronal silencing phenotype is decreased sideways velocity in the case of DNa01 optogenetic silencing. In Figure S2, the inverse filter for sideways velocity for DNa01 had a higher amplitude than the rotational velocity filter. Taken together, does this point at some role for DNa01 in sideways velocity specifically?

      No. The forward filters describe the average velocity impulse response, given a brief step change in firing rate.

      Figure 1 and Figure S2 show that the sideways velocity forward filter is actually smaller for DNa01 than for DNa02. This means that a brief step change in DNa01 firing rate is followed by only a very small sideways velocity response. Conversely, the reverse filters describe the average firing rate impulse response, given a brief step change in sideways velocity. Figure S2 shows that the sideways velocity reverse filter is larger for DNa01 than for DNa02, but this means that the relationship between DNa01 activity and sideways velocity is so weak that we would need to see a very large neural response in order to get a brief step change in sideways velocity. In other words, the reverse filter says that DNa01 likely has very little role in determining sideways velocity.

      (10) In Figure 8G, the effect on inner hind leg stance prolongation is very weak, and given the huge sample size, hard to interpret. Also, it is not clear how this fits with the role of DNa01 in slow sustained turning based on recordings.

      Yes, this effect is small in magnitude, which is not too surprising, given that many DNs seem to be involved in the control of steering in walking. To clarify the interpretation of these phenotypes, we have added a paragraph to the end of the Results:

      “All these effects are weak, and so they should be interpreted with caution. Also, both DN split-Gal4 lines drive expression in a few off-target cell types, which is another reason for caution (Fig. S8). However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would cause ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also cause ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.”

      We have also added caveats and clarifications in a new Discussion paragraph:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would cause ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found the sign conventions for rotational velocity particularly confusing. Figure 3 represents clockwise rotations as +ve values, but Figure 4H represents anticlockwise rotations as positive values. But for EPG bumps, anticlockwise rotations are given negative values. Please make them consistent unless I am missing something obvious.

      Different fields use different conventions for yaw velocity. In aeronautics, a clockwise turn is generally positive. In robotics and engineering of terrestrial vehicles, a counterclockwise turn is generally positive. Historically, most Drosophila studies that quantified rotational (yaw) velocity were focused on the behavior of flying flies, and these studies generally used the convention from aeronautics, where a clockwise turn is defined as a positive turn. When we began working in the field, we adopted this convention, in order to conform to previous literature. It might be argued that walking flies are more like robots than airplanes, but it seemed to us that it was confusing to have different conventions for different behaviors of the same animal. Thus, all of the published studies from our lab define clockwise rotation as having positive rotational velocity.
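
      For readers combining data from sources that use different yaw conventions, the conversion amounts to a sign flip. The helper below is a hypothetical illustration (the function name and the convention labels are assumptions, not part of any published analysis code); it maps either convention onto the clockwise-positive convention described above.

```python
def to_clockwise_positive(yaw_velocity, convention):
    """Convert a yaw velocity to the clockwise-positive convention.

    convention: "cw_positive"  (clockwise turns are positive, as in aeronautics)
                "ccw_positive" (counterclockwise turns are positive, as in robotics)
    """
    if convention == "cw_positive":
        return yaw_velocity
    if convention == "ccw_positive":
        return -yaw_velocity
    raise ValueError(f"unknown convention: {convention!r}")


# A 30 deg/s counterclockwise (leftward) turn reported as +30 in a
# counterclockwise-positive dataset becomes -30 in clockwise-positive terms.
print(to_clockwise_positive(30.0, "ccw_positive"))
```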

      Figure 4 focuses on the role of the central complex in steering. As the fly turns clockwise (rightward), the bump of activity in EPG neurons normally moves counterclockwise around the ellipsoid body, as viewed from the posterior side (Turner-Evans et al., 2017). The posterior view is the conventional way to represent these dynamics, because (1) we and others typically image the brain from the posterior side, not the anterior side, and (2) in a posterior view, the animal’s left is on the left side of the image, and vice versa. We have added a sentence to the Figure 4A legend to clarify these points.

      Previous work has shown that, when an experimenter artificially “jumps” the EPG bump, this causes the fly to make a compensatory turn that returns the bump to (approximately) its original location (Green et al., 2019). Our work supports this observation. Specifically, we find that clockwise bump jumps are generally followed by rightward turns (which drive the bump to return to its approximate original location via a counterclockwise path), and vice versa. This is noted in the Figure 4D legend. Note that Figure 4D plots the fly’s rotational velocity during the bump return, plotted against the initial bump jump. 

      Figure 4H shows that clockwise (blue) bump returns were typically preceded by leftward turning, and counter-clockwise (green) bump returns were typically preceded by rightward turning, as expected. This is detailed in the Figure 4H legend, and it is consistent with the coordinate frame described above.

      (2) It would be helpful to have images of the DNa01 and DNa02 split lines used in this paper, considering this paper would most likely be used widely to describe the functions of these neurons. Similarly, images of their reconstructions would be a useful addition.

      High-quality three-dimensional confocal stacks of all the driver lines used in our study are publicly available. We have added this information to the Methods (under “Fly husbandry and genotypes”). Confocal images of the full morphologies of DNa01 and DNa02 have been previously published (Namiki et al., 2018). Figure 1A is a schematic that is intended to provide a quick visual summary of this information.

      EM reconstructions of DNa01 and DNa02 are publicly accessible in a whole-brain dataset (https://codex.flywire.ai/) and a whole-VNC dataset (https://neuprint.janelia.org/). Both datasets are referenced in our study. As these datasets are easy to search and browse via user-friendly web-based tools, we expect that interested readers will have no difficulty accessing the underlying datasets directly.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of the activity of the DNs that they "PREDICT steering during walking". This is an interesting word choice. Not causes, not correlates with, not encodes... does that mean the activity always precedes the action? Does that mean when you see activity, you will get behavior? This is important for assessing whether the DN activity is a cause or an effect. It is good to be cautious but it might be worth expanding on exactly what kind of connection is implied to justify the use of the word 'predict'.

      Conventionally, “predict” means “to indicate in advance”. We write that DNs “predict” certain features of behavior. We use this term because (1) these DNs correlate with certain features of behavior, and (2) changes in DN activity precede changes in behavior.

      The notion that neurons can “predict” behavior is not original to our study. Whenever neuroscientists summarize the relationship between neural activity and behavior by fitting a mathematical model (which may be as simple as a linear regression), the fitted model can be said to represent a “prediction” of behavior. These models are evaluated by comparing their predictions with measured behaviors. A good model is predictive, and this implies that the underlying neural signal is also predictive (Levenstein et al., 2023 Journal of Neuroscience 43: 1074-1088; DOI: 10.1523/JNEUROSCI.1179-22.2022). Here, prediction simply means correlation, without necessarily implying causation. We use “prediction” in the same sense.

      We do not think the term “prediction” implies determinism. Meteorologists are said to predict the weather, but it is understood that their predictions are probabilistic, not deterministic. Certainly, we would not claim that there is a deterministic relationship between DN activity and behavior. Figure 2D shows that neither DN type can explain all the variance in the fly’s rotational or sideways velocity. At the same time, both DNs have significant predictive power.

      We might equally say that these DNs “encode” behavior. We have chosen to use the word “predict” rather than “encode” because we do not think it is necessary to use the framework of symbolic communication in connection with these DNs.

      We agree with the Reviewer that it is helpful to test whether any neuron that “predicts” a behavior might also “cause” this behavior. In Figure 8, we show that directly perturbing these DNs can indeed alter locomotor behavior, which suggests a causal role. Connectome analyses also suggest a causal role for these DNs in locomotor behavior (Figure 1B; see especially Cheong et al., 2024).

      At the same time, it is clear from our results that these DNs are not “command neurons” for turning: they do not deterministically cause turning. Therefore, to avoid misunderstanding, we have generally been careful to summarize the results of our perturbation experiments by avoiding the statement that “this DN causes this behavior”. Rather, we have generally tried to say that “this DN influences this behavior”, or “this DN promotes this behavior”.

      (2) There is some concern about how the linear filter models were developed and then used to predict the relationship between firing rate and steering behavior: how exactly were the build and test data separated to avoid re-extracting the input? It reads like a self-fulfilling prophecy/tautology.

      We used conventional cross-validation for model fitting and evaluation. We apologize that this was not made explicit in our original submission; this was due to an oversight on our part. To be clear: linear filters were computed using the data from the first 20% of a given experiment. We then convolved each cell’s firing rate estimate with the computed Neuron→Behavior filter (the “forward filter”) using the data from the final 80% of the experiment, in order to generate behavioral predictions. Thus, when a model has high variance explained, this is not attributable to overfitting: rather, it quantifies the bona fide predictive power of the model. We have added this information to the Methods (under “Data analysis - Linear filter analysis”).
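
      The fit-then-test procedure described above can be sketched as follows. This is only an illustrative sketch: it assumes the forward filter is estimated by ordinary least squares on lagged firing rates, which is not necessarily the estimation method used in the study. The function names, the number of lags, and the synthetic data are all hypothetical.

```python
import numpy as np

def lagged_design(rate, n_lags):
    """Design matrix whose column k holds the firing rate k samples in the past."""
    n = len(rate)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = rate[:n - k]
    return X

def fit_forward_filter(rate, behavior, n_lags):
    """Least-squares estimate of a causal Neuron->Behavior filter (assumed method)."""
    X = lagged_design(rate, n_lags)
    filt, *_ = np.linalg.lstsq(X, behavior, rcond=None)
    return filt

def variance_explained(rate, behavior, filt):
    """Convolve the rate with the filter and score the prediction against behavior."""
    pred = np.convolve(rate, filt)[:len(rate)]
    resid = behavior - pred
    return 1.0 - resid.var() / behavior.var()

def holdout_score(rate, behavior, n_lags=20, fit_fraction=0.2):
    """Fit on the first `fit_fraction` of the data, evaluate on the remainder."""
    split = int(len(rate) * fit_fraction)
    filt = fit_forward_filter(rate[:split], behavior[:split], n_lags)
    return variance_explained(rate[split:], behavior[split:], filt)

# Synthetic example (hypothetical data): behavior is a filtered copy of the rate plus noise.
rng = np.random.default_rng(0)
rate = rng.normal(size=5000)
true_filter = np.exp(-np.arange(20) / 5.0)
behavior = np.convolve(rate, true_filter)[:5000] + 0.1 * rng.normal(size=5000)
score = holdout_score(rate, behavior)
print(f"held-out variance explained: {score:.3f}")
```

      Because the filter never sees the held-out segment, a high score here reflects predictive power rather than overfitting. The first few held-out samples are predicted without the preceding firing-rate history, which slightly lowers the score.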

      (3) Typo right above Figure 2 [now Figure 1E]: I assume spike rate fluctuations in DNa02 precede DNa01?

      Fixed. Thank you for reading the manuscript carefully.

      (4) The description of the other manuscripts about neural control of the steering as "follow-up" papers is a bit diminishing. They were likely independent works on a similar theme that happened afterwards, rather than deliberate extensions of this paper, so "subsequent" might be a more accurate description.

      We apologize, as we did not intend this to be diminishing. Given this request, we have revised “follow-up” to “subsequent”.

      (5) The idea that DNa02 is high-gain because it is more directly connected to motor neurons is a hypothesis and this should be made clear. We really don't know the functional consequences of the directness of a path or the number of synapses, and which circuits you compare to would change this. DNa02 may be a higher gain than DNa01, but what about relative to the other DNs that enter pre-motor regions? How do you handle a few synapses and several neurons in a common class? All of these connectivity-based deductions await functional tests - like yours! I think it is better to make this clear so readers don't assume a higher level of certainty than we have.

      The Reviewer asks how we handled few-synapse connections, and how we combined neurons in the same class. We apologize for not making this explicit in our original submission. We have now added this information to the Methods. Briefly, to select cell types for inclusion in Figure 7C, we identified all individual cells postsynaptic to PFL3 and presynaptic to DNa02, discarding any unitary connections with <5 synapses. We then grouped unitary connections by cell type, and then summed all synapse numbers within each connection group (e.g., summing all synapses in all PFL3→LAL126 connections). We then discarded connection groups having <200 synapses or <1% of a cell type’s pre- or postsynaptic total. Reported connection weights are per hemisphere, i.e. half of the total within each connection group. For Figure 7F we did the same, but now discarding connection groups having <70 synapses or <0.4% of a cell type’s pre- or postsynaptic total. In Figure S9, we used the same procedures for analyzing connections onto DNa01.
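
      The thresholding steps above can be sketched as follows. This is only an illustration under stated assumptions: the input format, the function name, and all synapse counts are hypothetical, and the fraction criterion is read here as requiring a connection group to reach the minimum fraction of both the presynaptic type's output total and the postsynaptic type's input total. The original analysis may apply that criterion differently.

```python
from collections import defaultdict

def filter_connections(unitary, type_outputs, type_inputs,
                       min_unitary=5, min_group=200, min_fraction=0.01):
    """Group unitary connections by cell type and apply the thresholds.

    unitary: iterable of (pre_type, post_type, n_synapses), one entry per cell pair.
    type_outputs / type_inputs: total output / input synapses per cell type.
    Returns {(pre_type, post_type): per-hemisphere weight}.
    """
    groups = defaultdict(int)
    for pre_type, post_type, n_syn in unitary:
        if n_syn >= min_unitary:  # discard weak unitary connections
            groups[(pre_type, post_type)] += n_syn

    kept = {}
    for (pre_type, post_type), total in groups.items():
        if total < min_group:
            continue
        # Assumed reading of the fraction criterion (see lead-in).
        if total < min_fraction * type_outputs[pre_type]:
            continue
        if total < min_fraction * type_inputs[post_type]:
            continue
        kept[(pre_type, post_type)] = total / 2  # report per hemisphere
    return kept

# Hypothetical counts, for illustration only.
unitary = [
    ("PFL3", "LAL126", 120),
    ("PFL3", "LAL126", 100),
    ("PFL3", "LAL126", 4),    # dropped: fewer than 5 synapses
    ("LAL126", "DNa02", 150),
    ("LAL126", "DNa02", 30),  # group total 180 falls below 200
]
type_outputs = {"PFL3": 10000, "LAL126": 5000}
type_inputs = {"LAL126": 8000, "DNa02": 20000}
result = filter_connections(unitary, type_outputs, type_inputs)
print(result)
```

      The Figure 7F variant would correspond to calling the same function with `min_group=70` and `min_fraction=0.004`.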

      We agree that it is tricky to infer function from connectome data, and this applies to motor neuron connectivity. We bring up DN connectivity onto motor neurons in two places. First, in the Results, we note that “steering filters (i.e., rotational and sideways velocity filters) were larger for DNa02 (Fig. 2A,B). This means that an impulse change in firing rate predicts a larger change in steering for this neuron. In other words, this result suggests that DNa02 operates with higher gain. This may be related to the fact that DNa02 makes more direct output synapses onto motor neurons (Fig. 1B) [emphasis added].” We feel this is a relatively conservative statement.

      Subsequently, in the Discussion, we ask, “why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B) [emphasis added].” Again, we feel this is a relatively conservative statement.

      To be sure, none of the motor neurons postsynaptic to DNa02 actually receive most of their synaptic input from DNa02 (or indeed any DN), and this is typical of motor neurons controlling leg muscles. Instead, leg motor neurons tend to get most of their input from interneurons rather than DNs (Cheong et al. 2024). Available data suggest that the walking rhythm originates with intrinsic VNC central pattern generators, and the DNs that influence walking do so, in large part, by acting on VNC interneurons. These points have been detailed in recent connectome analyses (see especially Cheong et al. 2024).

      We are reluctant to broaden the scope of our connectome analyses to include other DNs for comparison, because we think these analyses are most appropriate to full-central-nervous-system-(CNS)-connectomes (brain and VNC together), which are currently under construction. Without a full-CNS-connectome, many of the DN axons in the VNC cannot be identified. In the future, we expect that full-CNS-connectomes will allow a systematic comparison of the input and output connectivity of all DN types, and probably also the tentative identification of new steering DNs. Those future analyses should generate new hypotheses about the specializations of DNa02, DNa01, and other DNs. Our study aims to help lay a conceptual foundation for that future work.

      (6) Given the emphasis on the DNa02 to Motor Neuron connectivity shown (Figure 1B) and multiple text mentions, could you include more analyses of which motor neurons are downstream and how these might be expected to affect leg movements? I would like to see the synapse numbers (Figure 1B) as well as the fraction of total output synapses. These additions would help understand the evidence for the "see-saw" model.

      We agree this is interesting. In follow-up work from our lab (Yang et al., 2023), we describe the detailed VNC connectivity linking DNa02 to motor neurons. We refer the Reviewer specifically to Figure 7 of that study (https://www.cell.com/cell/fulltext/S0092-8674(24)00962-0).

      We regret that the see-saw model was perhaps not clear in our original submission. Briefly, this model proposes that an increase in excitatory synaptic input to one DN (and/or a disinhibition of that DN) is often accompanied by an increase in inhibitory synaptic input to the contralateral DN. This model is motivated by connectome data on the brain inputs to DNa02 (Figure 7), along with our observation that excitation of one DN is often accompanied by inhibition of the contralateral DN (Figure 5). We have now added text to the Results in several places in order to clarify these points. 

      This model specifically pertains to the brain inputs to DNs, so comparing the downstream targets of these DNs in the VNC would not be a test of this hypothesis. The Reviewer may be asking to see whether there is any connectivity in the brain from one DN to its contralateral partner. We do not find connections of this sort, aside from multisynaptic connections that rely on very weak links (~10 synapses per connection). Figure 7 depicts a much stronger basis for this hypothesis, involving feedforward see-saw connections from PFL3 and MBON32.

      (7) The conclusions from the data in Figure 8 could be explained more clearly. These seem like small effect sizes on subtle differences in leg movements - maybe like what was seen in granular control by Moonwalker's circuits? Measuring joint angles or step parameters might help clarify, but a summary description would help the reader.

      We agree that these results were not explained very well in our original submission. 

      In our revised manuscript, we have added a new paragraph to the end of this Results section providing some summary and interpretation:

      “All these effects are weak, and so they should be interpreted with caution. However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would promote ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also promote ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.”

      Moreover, in the Discussion, we have also added a new paragraph that synthesizes these results with other results in our study, while also noting the limitations of our study:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would promote ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      In Figure 8D-H, we measure step parameters in freely walking flies during acute optogenetic inhibition of DNa01 and DNa02. In experiments measuring neural activity in flies walking on a spherical treadmill, we did not have a way to measure step parameters. Subsequently, this methodology was developed by Yang et al. (2023) and results for DNa02 are described in that study. 

      Reviewer #3 (Recommendations for the authors):

      Minor Points:

      (1) If space allows, actual membrane potential should be mentioned when raw recordings are shown (for example Figure 1D).

      We have now added absolute membrane potential information to Figure 1D.

      (2) Typo in the sentence "To address this issue directly, we looked closely at the timing of each cell's recruitment in our dual recordings, and found that spike rate fluctuations in DNa02 typically preceded the spike rate fluctuations in DNa02 (Fig. 2A)." The final word should be "DNa01".

      Fixed. Thank you for reading the manuscript carefully.

      (3) Figure 2A - although there aren't direct connections between a01 and a02 in the connectome, the authors never rule out functional connectivity between these two. Given a02 precedes a01, shouldn't this be addressed?

      In the full brain FAFB data set, there are two disynaptic connections from DNa02 onto the ipsilateral copy of DNa01. One connection is via CB0556 (which is GABAergic), and the other is via LAL018 (which is cholinergic). The relevant DNa02 output connections are very weak: each DNa02→CB0556 connection consists of 11 synapses, whereas each DNa02→LAL018 connection consists of 10 synapses (on average). Conversely, each CB0556→DNa01 connection consists of 29 synapses, whereas each LAL018→DNa01 connection consists of 64 synapses. In short, LAL018 is a nontrivial source of excitatory input to DNa01, but DNa02 is not positioned to exert much influence over LAL018, and the two disynaptic connections from DNa02 onto DNa01 also have opposite signs. Thus, it seems unlikely that DNa02 is a major driver of DNa01 activity. At the same time, it is difficult to completely exclude this possibility, because we do not understand the logic of the very complicated premotor inputs to these DNs in the brain. Thus, we are hesitant to make a strong statement on this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Sammons, Masserini et al. examine the connectivity of different types of CA3 pyramidal cells ("thorny" and "athorny"), and how their connectivity putatively contributes to their relative timing in sharp-wave-like activity. First, using patch-clamp recordings, they characterize the degree of connectivity within and between athorny and thorny cells. Based upon these experimental results, they compute a synaptic product matrix, and use this to inform a computational model of CA3 activity. This model finds that this differential connectivity between these populations, augmented by two different types of inhibitory neurons, can account for the relative timing of activity observed in sharp waves in vivo.

      We thank the reviewer for reading our manuscript, as well as for their nice summary and constructive comments.

      Strengths:

      The patch-clamp experiments are exceptionally thorough and well done. These are very challenging experiments and the authors should be commended for their in-depth characterization of CA3 connectivity.

      Thank you for the recognition of our efforts.

      Weaknesses:

      (1) The computational elements of this study feel underdeveloped. Whereas the authors do a thorough job experimentally characterizing connections between excitatory neurons, the inhibitory neurons used in the model seem to be effectively "fit neurons" and appear to have been tuned to produce the emergent properties of CA3 sharp wave-like activity. Although I appreciate the goal was to implicate CA3 connectivity contributions to activity timing, a stronger relationship seems like it could be examined. For example, did the authors try to "break" their model? It would be informative if they attempted different synaptic product matrices (say, the juxtaposition of their experimental product matrix) and see whether experimentally-derived sequential activity could not be elicited. It seems as though this spirit of analysis was examined in Figure 4C, but only insofar as individual connectivity parameters were changed in isolation.

      Including the two interneuron types (B and C) in the model is, on the one hand, necessary to align our modeling framework to the state-of-the-art model by Evangelista et al. (2020), which assumes that these populations act as switchers between an SPW and a non-SPW state, and on the other hand, less straightforward because the connectivity involving these interneurons is largely unknown.

      For B cells, the primary criterion to set their connections to and from excitatory cells was to balance the effect of the strong recurrent excitation and to achieve a mid-range firing rate for each population during sharp wave events. Our new simulations (Figure 5B) show that the initial suppression of population T (resulting in the long delay) indeed depends in equal proportions on the outlined excitatory connections and on how strongly each excitatory population is targeted by the B interneurons. However, these simulations demonstrate that there is a broad, clearly distinct, region of the parameter space that supports a long delay between the peaks, rather than a marginal set of fine-tuned parameters. In addition, the simulations show that B interneurons optimally contribute to the suppression of T when they primarily target T (Fig. 5B, panels 3,7,11,12,13) rather than A (panels 4,8,9,10,11). In contrast, as reported in the parameter table, and now also displayed graphically in the new Figure 4A (included above, with arrow sizes proportional to the synaptic product between the parameters determining the total strength of each connection), we assume B to target A more strongly than T (to make up for the higher excitability of population A). Therefore, the long delay between the peaks in our model emerges in spite of the interneuron connectivity, rather than because of it, and it is an effect of the asymmetric connectivity between the two excitatory populations, in particular the extremely low connection from A to T.

      (2) Additional explanations of how parameters for interneurons were incorporated in the model would be very helpful. As it stands, it is difficult to understand the degree to which the parameters of these neurons are biologically constrained versus used as fit parameters to produce different time windows of activity in types of CA3 pyramidal cells.

      Response included in point (1).

      Reviewer #2 (Public Review):

      Sharp wave ripples are transient oscillations occurring in the hippocampus that are thought to play an important role in organising temporal sequences during the reactivation of neuronal activity. This study addresses the mechanism by which these temporal sequences are generated in the CA3 region focusing on two different subtypes of pyramidal neurons, thorny and athorny. Using high-quality electrophysiological recordings from up to 8 pyramidal neurons at a time the authors measure the connectivity rates between these pyramidal cell subtypes in a large dataset of 348 cells. This is a significant achievement and provides important data. The most striking finding is how similar connection characteristics are between cell types. There are no differences in synaptic strength or failure rates and some small differences in connectivity rates and short-term plasticity. Using model simulations, the authors explore the implications of the differences in connectivity rates for the temporal specificity of pyramidal cell firing within sharp-wave ripple events. The simulations show that the experimentally observed connectivity rates may contribute to the previously observed temporal sequence of pyramidal cell firing during sharp wave ripples.

      Thank you very much for your careful review of our manuscript and the overall positive assessment.

      The conclusions drawn from the simulations are not experimentally tested so remain theoretical. In the simple network model, the authors include basket cell and anti-SWR interneurons but the connectivity of these cell types is not measured experimentally and variations in interneuron parameters may also influence temporal specificity of firing.

      As variations in some of these parameters can indeed influence the temporal specificity of firing, we have now performed additional simulations, the results of which are in the new Figures 5 and S5. Please also see response to Reviewer 1, point 1.

      In addition, the influence of short-term plasticity measured in their experiments is not tested in the model.

      We have now included short-term synaptic depression in all the excitatory-to-excitatory synapses and compensated for the weakened recurrent excitation by scaling some of the other parameters. The results of re-running our simulations in this alternative version of the model are reported in Figure S3 and are qualitatively analogous to those in Figure 4.

      Interestingly, the experimental data reveal a large variability in many of the measured parameters. This may strongly influence the firing of pyramidal cells during SWRs but it is not represented within the model which uses the averaged data.

      We have now incorporated variability in the following simulation parameters: the strength and latency of the four excitatory-to-excitatory connections as well as the reversal potential and leak conductance of both types of pyramidal cells, assuming variabilities similar to those observed experimentally (see Materials and Methods for details). Upon a slight re-balancing of some inhibitory connection strengths, in order to achieve comparable firing rates, we found that this version of the model also supports the generation of sharp waves with two pyramidal components (Figure S4B), and is, thus, fully analogous to our basic model. Varying the excitatory connectivities as in the original simulations (cf. Figure 4C and Figure S4C) reveals that increasing the athorny-to-athorny or decreasing the athorny-to-thorny connectivity still increases the delay between the peaks, although for some connectivity values the peak of the athorny population appears more spread out in time.

      Reviewer #3 (Public Review):

      Summary:

      The hippocampal CA3 region is generally considered to be the primary site of initiation of sharp wave ripples (highly synchronous population events involved in learning and memory), although the precise mechanism remains elusive. A recent study revealed that CA3 comprises two distinct pyramidal cell populations: thorny cells that receive mossy fiber input from the dentate gyrus, and athorny cells that do not. That study also showed that it is athorny cells in particular that play a key role in sharp wave initiation. In the present work, Sammons, Masserini, and colleagues expand on this by examining the connectivity probabilities among and between thorny and athorny cells. First, using whole-cell patch clamp recordings, they find an asymmetrical connectivity pattern, with athorny cells receiving the most synaptic connections from both athorny and thorny cells, and thorny cells receiving fewer. They then demonstrate in spiking neural network simulations how this asymmetrical connectivity may underlie the preferential role of athorny cells in sharp wave initiation.

      Strengths:

      The authors provide independent validation of some of the findings by Hunt et al. (2018) concerning the distinction between thorny and athorny pyramidal cells in CA3 and advance our understanding of their differential integration in CA3 microcircuits. The properties of excitatory connections among and between thorny and athorny cells described by the authors will be key in understanding CA3 functions including, but not limited to, sharp wave initiation.

      As stated in the paper, the modeling results lend support to the idea that the increased excitatory connectivity towards athorny cells plays a key role in causing them to fire before thorny cells in sharp waves. More generally, the model adds to an expanding pool of models of sharp wave ripples which should prove useful in guiding and interpreting experimental research.

      Thank you very much for your careful review of our manuscript and this positive assessment.

      Weaknesses:

      The mechanism by which athorny cells initiate sharp waves in the model is somewhat confusingly described. As far as I understood, random fluctuations in the activities of A and B neurons provide windows of opportunity for pyramidal cells to fire if they have additionally recovered from adaptive currents. Thorny and athorny pyramidal cells are then set in a winner-takes-all competition which is quickly won by the athorny cells. The main thesis of the paper seems to be that athorny cells win this competition because they receive more inputs both from themselves and from thorny cells, hence, the connectivity "underlies the sequential activation". However, it is also stated that athorny cells activate first due to their lower rheobase and steeper f-I curve, and it is also indicated in the methods that athorny (but not thorny) cells fire in bursts. It seems that it is primarily these features that make them fire first, something which apparently happens even when the A to A connectivity is set to 0, albeit with a very small lag. Perhaps the authors could further clarify the differential role of single cell and network parameters in determining the sequential activation of athorny and thorny cells. Is the role of asymmetric excitatory connectivity only to enhance the initial intrinsic advantage of athorny cells? If so, could this advantage also be enhanced in other ways?

      Thank you for the time invested in the review of our manuscript. We especially thank you for pointing out that the description of these dynamics was unclear: we have now improved it in the main text and we provide here an additional summary. As correctly highlighted by Reviewer 3, athorny neurons (A) are more excitable than thorny (T) ones due to single-neuron parameters: therefore, if there is a winner-takes-all competition, they are going to win it. Whether there is a competition in the first place, however, depends on the excitatory (and inhibitory) connections. In particular, we should distinguish two questions: does the activity of populations A and B (PV baskets), without adaptation (so at the beginning of the sharp wave) suppress T? And does the activity of populations T and B suppress A?

      The four possible combinations can be appreciated, for example, in the new Figure 5A5. If A can suppress T, but T cannot suppress A (low A-to-T, high T-to-A, bottom right corner, like in the data), A “wins” and T fires later, after a long delay. If both A and T can suppress each other (both cross-connections are low, bottom left corner), we still get the same outcome: A wins because of its earlier and sharper onset (due to single-neuron parameters). If neither population can suppress the other (high cross-connections, top right corner), then there is no competition and the populations reach the peak approximately at the same time. Only in the case in which T can suppress A, but A cannot suppress T (low T-to-A, high A-to-T, top left corner, opposite to the data), then A “loses” the competition. However, since A neurons nevertheless display some early activity (again, due to the single neuron parameters), this scenario is not as clean as the reversed one: rather, A cells have an initial, small peak, then T neurons quickly take over and grow to their own peak, and then, depending on how strongly T neurons suppress A neurons, there may or may not be a second peak for the A neurons. This is the reason why, in the top left corner of Figure 5B, the statistics show either a long positive or long negative delay, depending on whether the first (small) or second (absent, for some parameters) peak of A is taken into account. In summary, the experimentally measured connectivity does not only enhance the initial intrinsic advantage of A cells, but sets up the competitive dynamics in the first place, which are crucial for the emergence of two distinct peaks, rather than a single peak involving both populations.
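
      The four cases above can be condensed into a small lookup. This is a purely qualitative restatement of the text, not a simulation of the spiking model. The function names are hypothetical, and it assumes, as the paragraph states, that a population can suppress the other when its cross-connection onto that population is low.

```python
def competition_outcome(a_suppresses_t: bool, t_suppresses_a: bool) -> str:
    """Qualitative outcome of the A/T competition at sharp-wave onset."""
    if a_suppresses_t:
        # Whether or not T can also suppress A, A wins: its onset is earlier
        # and sharper because of single-neuron parameters.
        return "A peaks first; T follows after a long delay"
    if t_suppresses_a:
        return "small early A peak, then T takes over; a second A peak may or may not follow"
    return "no competition; A and T peak at about the same time"

def outcome_from_connectivity(a_to_t_high: bool, t_to_a_high: bool) -> str:
    """Map the four corners of the cross-connectivity plane to an outcome."""
    # Low A-to-T connectivity lets A (via B) suppress T, and likewise for T-to-A.
    return competition_outcome(a_suppresses_t=not a_to_t_high,
                               t_suppresses_a=not t_to_a_high)

for a_to_t_high in (False, True):
    for t_to_a_high in (False, True):
        label = (f"A-to-T {'high' if a_to_t_high else 'low'}, "
                 f"T-to-A {'high' if t_to_a_high else 'low'}")
        print(f"{label}: {outcome_from_connectivity(a_to_t_high, t_to_a_high)}")
```

      The experimentally measured connectivity corresponds to the low A-to-T, high T-to-A case.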

      Although a clear effort has been made to constrain the model with biological data, too many degrees of freedom remain that allow the modeler to make arbitrary decisions. This is not a problem in itself, but perhaps the authors could explain more of their reasoning and expand upon the differences between their modeling choices and those of others. For example, what are the conceptual or practical advantages of using adaptation in pyramidal neurons as opposed to short-term synaptic plasticity as in the model by Hunt et al.?

      It should be pointed out that the model by Hunt et al. features adaptation in pyramidal neurons as well, as the neuronal units employed are also adaptive-exponential integrate-and-fire. In an early stage of this project, we obtained from Hunt et al. the code for their model, and ascertained that adaptation is the main mechanism governing the alternations between the sharp-wave and the non-sharp-wave states, to the extent that fully removing short-term plasticity from their model does not have any significant impact on the network dynamics. Therefore, our choices are, in this regard, fully consistent with theirs. In order to confirm that synaptic depression does not significantly impact the dynamics also in our model, we now performed additional simulations (Figure S3), addressed in the main text (lines 149-151) and in the response to Reviewer 1, who expressed similar concerns.

      Relatedly, what experimental observations could validate or falsify the proposed mechanisms?

      As sharp wave generation in this model relies on disinhibitory dynamics (suppression of the anti-sharp-wave interneurons C), the model could be validated/falsified by proving/disproving that a class of interneurons with anti-sharp-wave features exists. In addition, the mechanism we proposed for the long delay between the peaks of the athorny and thorny activity requires at least some connectivity from athorny to basket and from basket to thorny neurons.

      In the data by Hunt et al., thorny cells have a higher baseline (non-SPW) firing rate, and it is claimed that it is actually stochastic correlations in their firing that are amplified by athorny cells to initiate sharp waves. However, in the current model, the firing of both types of pyramidal cells outside of ripples appears to be essentially zero. Can the model handle more realistic firing rates as described by Hunt et al., or as produced by e.g., walking around an environment tiled with place cells, or would that trigger SPWs continuously?

      When building this model, we aimed at having two clearly distinct states the network could alternate between, so we picked a rather polarized connectivity to and from the anti-sharp wave cells (C), resulting in polarized states. As a result, we obtain a low, although non-zero, activity of pyramidal neurons in non-SPW states (0.4 spikes/s for athorny and 0.2 spikes/s for thorny). These assumptions can be partially relaxed, for example in the original model by Evangelista et al. (2020), where the background firing rate of pyramidal cells is ~2 spikes/s. It should also be noted that, when walking in an environment tiled with place cells, the hippocampus is subject to additional extra-hippocampal inputs (e.g. from the medial septum, resulting in theta oscillations) and to neuromodulation, which can alter the network in various ways that we have not included in our model. However, our results are not in contradiction to transient SPW-like activity states initiated at a certain phase of the theta oscillation, when the inhibition is weakest.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript reads like it was intended as a short-form manuscript for another journal. The introduction and discussion in particular are very brief and would benefit from being expanded and providing a bigger picture for the reader.

      We had originally aimed to submit in the eLife “short report” format. However, also thanks to the suggestion of Reviewer 1, we realized that our text would be better supported by extended introduction and discussion sections, as well as additional figures.

      (2) Graphs would benefit from including all datapoints, where appropriate.

      All datapoints have now been added to boxplots in the main figures and supplement.

      (3) The panels of Figure 4 are laid out strangely, it may be worthwhile to adjust.

      We thank the reviewer for this suggestion. We have now adjusted the layout of Figure 4 and believe it is now easier to follow.

      Reviewer #2 (Recommendations For The Authors):

      Useful points to address include:

      (1) Explore within the model the effect of altering interneuron connectivity. Are there other factors that can influence temporal specificity within SWRs?

      The effects of varying the connectivity to and from B interneurons (the ones which are SPW-active and therefore relevant for temporal specificity) have now been investigated in the new Figure 5B, in which such parameters were varied in pairs or combined with the two most relevant excitatory-to-excitatory connections.

      (2) Implement the experimentally observed short-term plasticity in the model to determine how this influences temporal specificity.

      All the findings in Figure 4 have now been replicated in the new Figure S3, in which excitatory-to-excitatory synapses feature synaptic depression.

      (3) Consider if it is possible to incorporate observed experimental variability in the model and explore the implications.

      All the findings in Figure 4 have now been replicated in the new Figure S4, in which heterogeneity has been introduced in multiple neuronal and synaptic parameters of thorny and athorny neurons.

      (4) Include the co-connectivity rates in the data. Ie how many of the recorded neurons are reciprocally connected? Does this change the model simulations?

      We have now added the rates of reciprocal connections that we observed into the main text (lines 86-88). We found 2 pairs of reciprocally connected athorny neurons and 2 pairs of reciprocally connected thorny neurons. These rates of reciprocity were not statistically significant. We did not observe reciprocal connections in other paired neuron combinations (i.e. athorny-thorny or vice-versa). Co-connectivity does not have any effect on the model simulations, as the model includes thousands of neurons grouped in populations without specific sub-structures. It might, however, be more relevant if the excitatory populations were further subdivided in assemblies.

      Reviewer #3 (Recommendations For The Authors):

      (1) Specify which part of CA3 you are recording from.

      We have added this information into our results section - we recorded from 20 cells in CA3a, 274 cells in CA3b and 54 cells in CA3c. This information can now be found in the text on lines 68-69.

      (2) Comment on why you might observe a larger fraction of athorny cells than Hunt et al.

      Hunt et al. cite a broad range for the fraction of athorny cells in their discussion (10-20%). It is unclear where these estimates originate from. In their study, Hunt et al. use the bursting and non-bursting phenotypes as proxies for athorny and thorny cells, respectively, and report 32 and 70 cells, equating to 31% athorny and 69% thorny. This fraction of athorny cells is more or less in line with our own findings, albeit slightly lower than ours (34% athorny and 66% thorny). However, we believe this difference falls within the range of experimental variability. One caveat is that our electrophysiological recordings likely represent a biased sample of cells. In particular, with multipatch recordings, placement of later electrodes is often restricted to the borders of the pyramidal layer so as not to disturb already patched cells. Thus, our recorded cells do not represent a fully random sample of CA3 pyramidal cells. We believe that the size of this cell population can be properly estimated only once a reliable genetic marker for athorny cells has been established. Furthermore, the ratio of thorny and athorny cells varies along the proximal-distal axis of CA3, so differences in ratios seen between our study and Hunt et al. may arise from sampling differences along this axis.

      (3) In Figure 3, Aiii (the cell fractions) could also be represented as a vector of two squares stacked one on top of the other, then you could add multiplication signs between Ai, Aii and Aiii, and an equal sign between Aiii and Aiv.

      Thank you! We have implemented this very nice suggestion.

      (4) In Figure 4A, it would be helpful to display the strength of the connections similar to how it is done in Figure 3B.

      We thank the reviewer for this suggestion. We have now updated Fig 4A to include connection strengths.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cognitive and brain development during the first two years of life is vast and determinant for later development. However, longitudinal infant studies are complicated and restricted to occidental high-income countries. This study uses fNIRS to investigate the developmental trajectories of functional connectivity networks in infants from a rural community in Gambia. In addition to resting-state data collected from 5 to 24 months, the authors collected growth measures from birth until 24 months and administered an executive functioning task at 3 or 5 years of age.

      The results show left and right frontal-middle and right frontal-posterior negative connections at 5 months that increase with age (i.e., become less negative). Interestingly, contrary to previous findings in high-income countries, there was a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months. Additionally, the study describes that some connectivity patterns related to better cognitive flexibility at pre-school age.

      Strengths:

      - The authors analyze data from 204 infants from a rural area of Gambia, already a big sample for most infant studies. The study might encourage more research on different underrepresented infant populations (i.e., infants not living in occidental high-income countries).

      - The study shows that fNIRS is a feasible instrument to investigate cognitive development when access to fMRI is not possible or outside a lab setting.

      - The fNIRS data preprocessing and analysis are well-planned, implemented, and carefully described. For example, the authors report how the choices in the parameters for the motion artifacts detection algorithm affect data rejection and show how connectivity stability varies with the length of the data segment to justify the threshold of at least 250 seconds free of artifacts for inclusion.

      - The authors use proper statistical methods for analysis, considering the complexity of the dataset.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - No co-registration of the optodes is implemented. The authors checked for correct placement by looking at pictures taken during the testing session. However, head shape and size differences might affect the results, especially considering that the study involves infants from 5 months to 24 months and that the same fNIRS array was used at all ages.

      The fNIRS array used in this work was co-registered onto age-appropriate MNI templates at every time point in a previously published work (L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021)). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’. The procedure mentioned by the reviewer, involving the examination of pictures showing the placement of headbands on participants, aimed to exclude infants with excessive cap displacement from further analysis.
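      The grouping of channels into regions described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that region-level FC is the plain average of channel-pair Pearson correlations; the channel names, region labels and the function `roi_connectivity` are hypothetical and are not taken from the manuscript's pipeline. With six regions, counting within-region and between-region pairs gives 21 connection types.

```python
from itertools import combinations_with_replacement
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def roi_connectivity(signals, roi_of_channel):
    """Average channel-pair correlations within and between regions.

    signals: dict mapping channel name -> time series (list of floats).
    roi_of_channel: dict mapping channel name -> region label (assumed labels).
    Returns a dict mapping (region_a, region_b) -> mean correlation.
    Region pairs with no channel pair are omitted.
    """
    rois = sorted(set(roi_of_channel.values()))
    fc = {}
    for roi_a, roi_b in combinations_with_replacement(rois, 2):
        values = [
            pearson(signals[c1], signals[c2])
            for c1 in signals
            for c2 in signals
            if c1 < c2  # count each unordered channel pair once
            and {roi_of_channel[c1], roi_of_channel[c2]} == {roi_a, roi_b}
        ]
        if values:
            fc[(roi_a, roi_b)] = sum(values) / len(values)
    return fc
```

      Averaging raw correlation coefficients is only one possible choice; a Fisher z-transform before averaging is a common alternative.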

      - The authors regress the global signal to remove systemic physiological noise. While the authors also report the changes in connectivity without global signal regression, there are some critical differences. In particular, the apparent decrease in frontal inter-hemispheric connections is not present when global signal regression is omitted, even though it is present for deoxy-Hb. The authors use connectivity results obtained after applying global signal regression for further analysis. The choice of regressing the global signal is questionable since it has been shown to introduce anti-correlations in fMRI data (Murphy et al., 2009), and fNIRS in young infants does not seem to be highly affected by physiological noise (Emberson et al., 2016). Systemic physiological noise might change at different ages, which makes its removal critical for investigating functional network development. However, global signal regression might also affect the data differently. The study would have benefited from having short separation channels to measure the systemic physiological component in the data.

      The work of Emberson et al. (2016) mentioned by the reviewer indeed highlights the challenges of removing systemic changes from the infants’ haemodynamic signal with short-separation channels (SSC). In fact, even an SSC of 1 cm detected changes in the blood in the brain; therefore, by regressing this signal from the recorded one, the authors removed both systemic changes AND haemodynamic signal. This paper from Emberson et al. (2016) is taken as a reference in the field to suggest that SSC might not be an ideal tool to remove systemic changes when collecting fNIRS data on young infants, as we did in this work.

      We agree with the reviewer's observation that systemic physiological noise may vary with age and among infants. Therefore, for each infant at each age, we regressed the mean value calculated across all channels. This ensures that the regressed signal is not biased by averaged calculations at group levels.
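      As an illustration of the per-infant, per-visit regression described above, the following is a minimal sketch rather than the pipeline used in the manuscript. It assumes that the global signal is the across-channel mean at each time point and that it is removed from each channel by ordinary least squares with an intercept; the function name `global_signal_regression` is hypothetical.

```python
from statistics import fmean


def global_signal_regression(channels):
    """Remove the across-channel mean time course from every channel.

    channels: list of equal-length lists, one time series per channel,
    all from a single infant at a single visit.
    Returns the residuals of a least-squares fit of each channel on the
    global signal (with an intercept), in the same channel order.
    """
    n_samples = len(channels[0])
    # Global signal: mean across channels at each time point.
    global_signal = [fmean(ch[t] for ch in channels) for t in range(n_samples)]
    g_mean = fmean(global_signal)
    g_centered = [g - g_mean for g in global_signal]
    g_var = sum(g * g for g in g_centered)

    cleaned = []
    for ch in channels:
        ch_mean = fmean(ch)
        # Least-squares slope of this channel on the global signal.
        if g_var > 0:
            beta = sum((x - ch_mean) * g for x, g in zip(ch, g_centered)) / g_var
        else:
            beta = 0.0
        cleaned.append([(x - ch_mean) - beta * g for x, g in zip(ch, g_centered)])
    return cleaned
```

      Because the regressor is computed from the same recording it is applied to, the residuals sum to zero across channels at every time point, which is one reason global signal regression can shift the distribution of correlations towards negative values.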

      We are aware of the criticisms directed towards global signal regression in the fMRI literature, although other works have shown anticorrelations in functional connectivity networks both with and without global signal regression (Chai et al., 2012). Furthermore, Murphy himself revised his criticism of the use of global signal regression in functional connectivity analysis in one of his more recent works (Murphy & Fox, 2017). The fact that the decreased FC is significant in results from data pre-processed without global signal regression gives us confidence that this finding is statistically robust and not solely driven by this preprocessing choice in our pipeline.

      An interesting study by Abdalmalak et al. (2022) demonstrated that failing to correct for systemic changes using any method is inappropriate when estimating FC with fNIRS, as it can lead to a high risk of elevated connectivity across the whole brain (see Figure 4 of the mentioned paper). Consequently, we strongly advocate for the implementation of global signal regression in our analysis pipeline as a fundamental step for accurate functional connectivity estimations.

      References:

      Emberson, L. L., Crosswhite, S. L., Goodwin, J. R., Berger, A. J., & Aslin, R. N. (2016). Isolating the effects of surface vasculature in infant neuroimaging using short-distance optical channels: a combination of local and global effects. Neurophotonics, 3(3), 031406-031406.

      Chai, X. J., Castañón, A. N., Öngür, D., & Whitfield-Gabrieli, S. (2012). Anticorrelations in resting state networks without global signal regression. NeuroImage, 59(2), 1420–1428. https://doi.org/10.1515/9783050076010-014

      Murphy, K., & Fox, M. D. (2017). Towards a consensus regarding global signal regression for resting state functional connectivity MRI. NeuroImage, 154(November 2016), 169–173. https://doi.org/10.1016/j.neuroimage.2016.11.052

      Abdalmalak, A., Novi, S. L., Kazazian, K., Norton, L., Benaglia, T., Slessarev, M., ... & Owen, A. M. (2022). Effects of systemic physiology on mapping resting-state networks using functional near-infrared spectroscopy. Frontiers in neuroscience, 16, 803297.

      - I believe the authors bypass a fundamental point in their framing. When discussing the results, the authors compare the developmental trajectories of the infants tested in a rural area of Gambia with the trajectories reported in previous studies on infants growing in occidental high-income countries (likely in urban contexts) and attribute the differences to adverse effects (i.e., nutritional deficits). Differences in developmental trajectories might also derive from other environmental and cultural differences that do not necessarily lead to poor cognitive development.

      We agree with the reviewer that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to investigate this further” (line 238).

      - While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level, the evidence regarding the links between adverse situations, developmental trajectories, and later cognitive capacities is weaker. The authors find that early restricted growth predicts specific connectivity patterns at 24 months and that certain connectivity patterns at specific ages predict cognitive flexibility. However, the link between development trajectories (individual changes in connectivity) with growth and later cognitive capacities is missing. To address this question adequately, the study should have compared infants with different growing profiles or those who suffered or did not from undernutrition. However, as the authors discussed, they lacked statistical power.

      We agree with the reviewer, and indeed we highlighted this as one of the main limitations of our work: “Even given the large sample in our study, we were underpowered to test for group comparisons between sets of infants with distinct undernutrition growth profiles, e.g., infants with early poor growth that later resolved and infants with standard growth early that had a poor growth later. We were also underpowered to test the associations between early growth and FC on clinically undernourished infants (defined as having DWLZ two standard deviations below the mean)” (line 311, discussion section).

      We believe this is an important point to consider for the field, as it addresses the sample size required for studies investigating brain development in clinically malnourished infants. We hope this will serve as a valuable reference for future studies in the field. For example, a new study led by Prof. Sophie Moore and other members of the BRIGHT team (INDiGO) is currently recruiting six hundred pregnant women with the aim of obtaining a broader distribution of infants’ growth measures (https://www.kcl.ac.uk/research/sophie-moore-research-group).

      Reviewer #2 (Public Review):

      Summary and strengths:

      The article pertains to a topic of importance, specifically early life growth faltering, a marker of undernutrition, and how it influences brain functional connectivity and cognitive development. In addition, the data collection was laborious, and data preprocessing was quite rigorous to ensure data quality, utilizing cutting-edge preprocessing methods.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      However, the subsequent analysis and explanations were not very thorough, which made some results and conclusions less convincing. For example, corrections for multiple tests need to be consistently maintained; if the results do not survive multiple corrections, they should not be discussed as significant results. Additionally, alternative plans for analysis strategies could be worth exploring, e.g., using ΔFC in addition to FC at a certain age. Lastly, some analysis plans lacked a strong theoretical foundation, such as the relationship between functional connectivity (FC) between certain ROIs and the development of cognitive flexibility.

      Thus, as much as I admire the advanced analysis of connectivity that was conducted and the uniqueness of longitudinal fNIRS data from these samples (even the sheer effort to collect fNIRS longitudinally in a low-income country at such a scale!), I have reservations about the importance of this paper's contribution to the field in its present form. Major revisions are needed, in my opinion, to enhance the paper's quality. 

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings as well as hypothesis-generating findings that may not pass stringent significance thresholds. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      The relationship between FC and cognitive flexibility (as well as the relationship between growth and FC) has been explored focusing on those FC that showed a significant change with age, as specified in the results sections: ‘To investigate the impact of early nutritional status on FC at 24 months, we used multiple regression with the infant growth trajectory [...] and FC at 24 months [...]. To maximise power, we considered only those FC that showed a statistically significant change with age’ (line 183) and ‘To investigate whether FC early in life predicted cognitive flexibility at preschool age, we used multiple regression of FC across the first two years of life against later cognitive flexibility in preschoolers at three and five years. As per the analysis above, we focused on only those FC that showed a statistically significant change with age’ (line 198).

      We explored the possibility of investigating the relationship between changes in FC and changes in growth. However, the degrees of freedom in these analyses dropped dramatically (~25/30), thereby putting the significance and the meaning of the results at risk. We look forward to future longitudinal studies with less attrition across these time points to maintain the statistical power necessary to run such analyses.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants have different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility at 4-5 years of age, but results did not survive corrections for multiple comparisons.

      Strengths:

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little-studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - Analyzing such a huge amount of collected data at several ages is not an easy task to test developmental relationships between growth, FC, and behavioral capacities. In its present form, this study and the performed analyses lack clarity, unity and perhaps modeling, as it suggests that all possible associations were tested in an exploratory way without clear mechanistic hypotheses. Would it be possible to specify some hypotheses to reduce the number of tests performed? In particular, considering metrics at specific ages or changes in the metrics with age might allow us to test different hypotheses: the authors might clarify what they expect specifically for growth-FC-behaviour associations. Since some FC measures and changes might be related to one another, would it be reasonable to consider a dimensionality reduction approach (e.g., ICA) to select a few components for further correlation analyses?

      We confirm that this work was motivated by a compelling theoretical question: whether neural mechanisms, specifically FC, can be influenced by early adversity, such as growth, and subsequently impact cognitive outcomes, such as cognitive flexibility. This aligns with the overarching goal of the BRIGHT project, established in 2015 (Lloyd-Fox, 2023). We believe this was evident throughout the manuscript in several instances, for example:

      - “The goal of the study was to investigate early physical growth in infancy, developmental trajectories of brain FC across the first two years of life, and cognitive outcome at school age in a longitudinal cohort of infants and children from rural Gambia, an environment with high rates of maternal and child undernutrition. Specifically, we aimed to: (i) investigate whether differences in physical growth through the first two years of life are related to FC at 24 months, and (ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children.” (page 4, introduction)

      - “This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age.” (page 6, discussion)

      - We had a clear hypothesis regarding short-range connectivity decreasing with age and long-range connectivity increasing with age, as stated at the end of the introduction: “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (page 4, line 147). However, we were not able to formulate clear hypotheses about the localization of these connections due to the scarcity of previous studies conducted within this age range, particularly in low-resource settings. The ROI approach for analysis was chosen to mitigate this challenge by reducing the number of comparisons while still enabling us to estimate the developmental trajectories of all the connections from which we acquired data.

      Regarding the use of a dimensionality reduction approach, we did not consider using ICA in our analysis. These methods require selecting a fixed number of components to remove from all participants. However, due to the high variability of infant fNIRS data across the five timepoints, we considered it untenable to precisely determine the number of components to remove at the group level. Such a procedure carries the risk of over-cleaning the data for some participants while leaving noise in for others (Di Lorenzo, 2019). We also felt that using PCA in this initial study would be beyond the scope of the brain-region-specific hypotheses and would be more appropriate in a follow-up analysis of these important data.

      References:

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … and the BRIGHT Study Team. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      Di Lorenzo, R., Pirazzoli, L., Blasi, A., Bulgarelli, C., Hakuno, Y., Minagawa, Y., & Brigadoi, S. (2019). Recommendations for motion correction of infant fNIRS data applicable to multiple data sets and acquisition systems. NeuroImage, 200(April), 511–527.

      - It seems that neurodevelopmental trajectories over the whole period (5-24 months) are little investigated, and considering more robust statistical analyses would be an important aspect to strengthen the results. The discussion mentions the potential use of structural equation modelling analyses, which would be a relevant way to better describe such complex data.

      We appreciate the complexity of the dataset we are working with, which includes multiple measures and time points. Currently, our focus within the outputs from the BRIGHT project is on examining the relationship between selected measures. While this may not involve statistically advanced modelling at the moment, it is worth noting that most of the results presented in this work have survived correction for multiple comparisons, indicating their statistical robustness. We believe that more advanced statistical analyses are beyond the scope of this rich initial study. In the next phase of the project, known as BRIGHT IMPACT, our team is collaborating with statisticians and experts in statistical modelling to apply more sophisticated and advanced statistical techniques to the data.

      - Given the number of analyses performed, only describing results that survive correction for multiple comparisons is required. Unifying the correction approach (FDR / Bonferroni) is also recommended. For the association between cognitive flexibility and FC, results are not significant, and one might wonder why FC at specific ages was considered rather than the change in FC with age. One of the relevant questions of such a study would be whether early growth and later cognitive flexibility are related through FC development, but testing this would require a mediation analysis that was not performed.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      We did not perform a mediation analysis because (i) ΔWLZ between birth and the subsequent time points positively predicted frontal interhemispheric FC at 24 months, whereas (ii) frontal interhemispheric FC at 18 months (and right fronto-posterior connectivity at 24 months) predicted cognitive flexibility at preschool age. Because the frontal interhemispheric FC at 24 months, which was positively predicted by growth, did not significantly predict cognitive outcome at preschool age, a mediation model was not warranted.

      The reviewer raised concerns about using different methods to correct for multiple comparisons throughout the work. Results showing changes in FC with age were Bonferroni corrected, while we used FDR correction for the regression analyses investigating the relationship between growth and FC, as well as FC and cognitive flexibility. Both methods have good control over Type I errors (false positives), but Bonferroni is very conservative, increasing the likelihood of Type II errors (false negatives). We considered Bonferroni an appropriate method for correcting results showing changes in FC with age, where we had a large sample with strong statistical power (i.e. linear mixed models with 132 participants who had at least 250 seconds of good data for 2 out of 5 visits). However, Bonferroni was too conservative for the regression analyses (with N between 57 and 78) (Acharya, 2014; Félix & Menezes, 2018; Narkevich et al., 2020; Narum, 2006; Olejnik et al., 1997).
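      The difference in conservativeness between the two corrections can be seen in a short sketch. This is an illustration only, assuming the Benjamini-Hochberg step-up procedure as the FDR method and a set of made-up p-values; the function names are hypothetical and the example does not reproduce any result from the manuscript.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha.

    Finds the largest rank k such that the k-th smallest p-value is at
    most (k / m) * alpha, then rejects the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Made-up p-values for five hypothetical tests.
example_p = [0.001, 0.012, 0.02, 0.045, 0.3]
print(bonferroni(example_p))          # only the smallest p-value survives
print(benjamini_hochberg(example_p))  # the three smallest p-values survive
```

      With these illustrative values, Bonferroni compares every p-value with 0.05 / 5 = 0.01, whereas Benjamini-Hochberg uses rank-dependent thresholds of 0.01, 0.02, 0.03, 0.04 and 0.05, which is why it retains more findings at the cost of a weaker error guarantee.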

      References:

      Acharya, A. (2014). A Complete Review of Controlling the FDR in a Multiple Comparison Problem Framework--The Benjamini-Hochberg Algorithm. ArXiv Preprint ArXiv:1406.7117.

      Félix, V. B., & Menezes, A. F. B. (2018). Comparisons of ten corrections methods for t-test in multiple comparisons via Monte Carlo study. Electronic Journal of Applied Statistical Analysis, 11(1), 74–91.

      Narkevich, A. N., Vinogradov, K. A., & Grjibovski, A. M. (2020). Multiple comparisons in biomedical research: the problem and its solutions. Ekologiya Cheloveka (Human Ecology), 27(10), 55–64.

      Narum, S. R. (2006). Beyond Bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 7, 783–787.

      Olejnik, S., Li, J., Supattathum, S., & Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational and Behavioral Statistics, 22(4), 389–406.

      - Growth is measured at different ages through different metrics. Justifying the use of weight-for-length z-scores would be welcome since weight-for-age z-scores might be a better marker of growth and possible undernutrition (this impacting potentially both weight and length). Showing the distributions of these z-scores at different ages would allow the reader to estimate the growth variability across infants.

      We consistently used WLZ as the metric to measure growth throughout. Our analysis investigating the relationship between growth and FC included HCZ at 7/14 days to correct for head size at birth. When selecting the best growth measure for this paper, we opted for WLZ over WAZ, given extant evidence that infants in our sample are smaller and shorter compared to the reference WHO standard for the same age group (Nabwera et al., 2017). Therefore, using WLZ allows us to adjust each infant's weight for its own length.

      References:

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      - Regarding FC, clarifications about the long-range vs short-range connections would be welcome, as well as drawing a summary of what is expected in terms of FC "typical" trajectory, for the different brain regions and connections, as a marker of typical development. For instance, the authors suggest that an increase in long-range connectivity vs a decrease in short-range is expected based on previous fNIRS studies. However anatomical studies of white matter growth and maturation would suggest the reverse pattern (short-range connections developing mostly after birth, contrarily to long-range connections prenatally).

      We expected an increase in long-range functional connectivity with age, as discussed in the introduction:

      - “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). This maturation in FC has been shown to be related to the cascading maturation of myelination and synaptogenesis (32, 33) - fundamental processes for healthy brain development (34)” (line 93, page 3, introduction);

      - “Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 103, page 3, introduction);

      - “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (line 147, page 4, introduction).

      Since inferences about FC patterns recorded with fNIRS are highly limited by the number and locations of the optodes, it is challenging to make strong inferences about specific brain regions. Moreover, infant FC fNIRS studies are still limited, which is why we focused our inferences on long-range versus short-range connectivity, without specifically pinpointing particular brain regions.

      Additionally, we were unable to locate the works mentioned by the reviewer regarding an increase in short-range white matter connectivity immediately after birth. On the contrary, we found several studies documenting an increase in white-matter long-range connectivity after birth, which is consistent with the hypothesised increase in FC long-range connectivity, such as:

      Yap, P. T., Fan, Y., Chen, Y., Gilmore, J. H., Lin, W., & Shen, D. (2011). Development trends of white matter connectivity in the first years of life. PloS one, 6(9), e24678.

      Dubois, J., Dehaene-Lambertz, G., Kulikova, S., Poupon, C., Hüppi, P. S., & Hertz-Pannier, L. (2014). The early development of brain white matter: a review of imaging studies in fetuses, newborns and infants. Neuroscience, 276, 48-71.

      Stephens, R. L., Langworthy, B. W., Short, S. J., Girault, J. B., Styner, M. A., & Gilmore, J. H. (2020). White matter development from birth to 6 years of age: a longitudinal study. Cerebral Cortex, 30(12), 6152-6168.

      Hagmann, P., Sporns, O., Madan, N., Cammoun, L., Pienaar, R., Wedeen, V. J., ... & Grant, P. E. (2010). White matter maturation reshapes structural connectivity in the late developing human brain. Proceedings of the National Academy of Sciences, 107(44), 19067-19072.

      Collin G, van den Heuvel MP. The ontogeny of the human connectome: development and dynamic changes of brain connectivity across the life span. Neuroscientist. 2013 Dec;19(6):616-28. doi: 10.1177/1073858413503712.

      The authors test associations between FC and growth, but making sense of such modulation results is difficult without a clearer view of developmental changes per se (e.g., what does an early negative FC mean? Is it an increase in FC when the value gets close to 0? In particular, at 24m, it seems that most FC values are not significantly different from 0, Figure 2B). Observing positive vs negative association effects depending on age is quite puzzling. It is also questionable, for some correlation analyses with cognitive flexibility, to focus on FC that changes with age but to consider FC at a given age.

      We thank the reviewer for bringing up this important point and understand that it requires some additional consideration. The negative FC values decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age. The trajectory seems to suggest that this will keep increasing with age but of course further data need to be collected to assess this.

      Unfortunately, when considering ΔFC to predict cognitive flexibility, the numbers of participants dropped significantly, with N=~15/20 infants per group of preschoolers, making it very challenging to interpret the results with meaningful statistical power.

      - The manuscript uses inappropriate terms "to predict", "prediction" whereas the conducted analyses are not prediction analyses but correlational.

      We thank the reviewer for giving us the opportunity to thoroughly revise the manuscript on this matter. In this work, we had clear hypotheses regarding which variables predicted which measures (such as growth predicting FC and FC predicting cognitive outcomes). Therefore, we performed regression analyses rather than correlational analyses to investigate these associations. Hence, we believe that using the terms ‘predict’ and ‘prediction’ is appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the introduction and discussion, the authors talk about the link between developmental trajectories and cognitive capacities, and undernutrition. However, they did not compare developmental trajectories but connectivity patterns at different ages with ΔWLZ and cognitive flexibility. I recommend that the authors rephrase the introduction and discussion.

      We thank the reviewer for pointing out places requiring better clarity in the text. We made edits throughout the introduction and discussion to better match our investigations. In particular, we changed:

      - ‘our understanding of the relationships between early undernutrition, developmental trajectories of brain connectivity, and later cognitive outcomes is still very limited,’ to, ‘our understanding of the relationships between early undernutrition, brain connectivity, and later cognitive outcomes is still very limited’ (line 89, introduction);

      - ‘(ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children,’ to, ‘(ii) investigate if early FC has an impact on cognitive outcome at pre-school age in these children’ (line 137, introduction);

      - ‘This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age,’ to, ‘This study investigated how early adversity via undernutrition drives brain functional connectivity throughout the first two years of life and how these early functional connections are associated with cognitive flexibility at preschool age’ (line 215, discussion).

      (2) Considering most research is done in occidental high-income countries, and this work is one of the few presenting research in another context, I think the authors should discuss in the manuscript that differences with previous studies might also be due to environmental and cultural differences. Since the study lacks the statistical power to perform a statistical analysis that directly establishes a link between developmental trajectories and restricted growth and cognitive flexibility, the authors cannot disentangle which differences are related to undernutrition and which might result from growing up in a different environment. I recommend that the authors avoid phrases like (lines 57-58): "We observed that early physical growth before the fifth month of life drove optimal developmental trajectories of FC..." or (lines 223-224) "...our cohort of Gambian infants exhibit atypical developmental trajectories of functional connectivity...".

      We thank the reviewer for this observation, and we agree that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to explore this further” (line 238). We revised the whole manuscript to reflect similar statements.

      (3) To better interpret the results, it would be interesting to know if poor early growth predicts late cognitive flexibility in the tested sample and if the ΔWLZ distributions differ compared to a population in a high-income country where undernutrition is less frequent.

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      Mean and SD values of WLZ are reported in Table 3. The values at every age are negative, indicating that the infants' weight-for-length is below the expected norm at all ages. To our knowledge, no other studies have assessed changes in growth in an infant sample with similar closely spaced age time points in high-income countries, making comparisons on growth changes challenging.

      (4) It is unclear why WLZ at birth and HCZ at 7-14 days are included in the models. I imagine this is to ensure that differences are not due to growing restrictions before birth. It would be nice if the authors could explain this.

      As the reviewer pointed out, HCZ at 7-14 days was included to ensure associations between growth and FC are not due to physical differences at birth. This can be considered a 'baseline' measure for cerebral development, in the same way that WLZ at birth was used as a baseline for physical development. Therefore, we can more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth. We specified this in the manuscript as follows: “These analyses were adjusted by WLZ at birth and HCZ at 7/14 days, to more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth” (line 520, statistical analysis section in the method section).
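
      The logic of adjusting for baseline measures can be sketched as a multiple regression in which FC at 24 months is modelled on ΔWLZ together with WLZ at birth and HCZ at 7/14 days as covariates. The Python sketch below is an illustration only: all values are invented, the variable names are ours, and the small least-squares solver stands in for whatever statistical software was actually used.

```python
def ols(X, y):
    """Ordinary least squares by solving the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical values for six infants (illustration only).
delta_wlz = [0.2, -0.5, 1.0, 0.3, -0.1, 0.7]     # postnatal change in WLZ
wlz_birth = [-1.0, -0.3, -1.5, 0.2, -0.8, -0.4]  # baseline physical size
hcz_birth = [-0.5, 0.1, -0.2, -1.0, 0.4, -0.7]   # baseline head circumference

# Simulated outcome built from known coefficients, so the fit can be checked.
true_beta = [0.1, 0.5, -0.2, 0.3]  # intercept, delta_wlz, wlz_birth, hcz_birth
X = [[1.0, d, w, h] for d, w, h in zip(delta_wlz, wlz_birth, hcz_birth)]
fc_24m = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = ols(X, fc_24m)
# beta[1] is the association of delta_wlz with FC, holding both baselines fixed.
print([round(b, 3) for b in beta])
```

      In such a model the coefficient on ΔWLZ describes its association with FC for infants with the same baseline WLZ and HCZ, which is the sense in which the analyses were adjusted.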

      (5) Right frontal-posterior connections at 24 months negatively correlate with ΔWLZ. Thus, restricted growth results in stronger frontal-posterior connections at 24 months. However, the same connections at 24 months positively correlate with cognitive flexibility (stronger connections predict better cognitive flexibility). Do the authors have any interpretation of this? How could this relate to previous findings of the authors (Bulgarelli et al. 2020), showing first an increase and then a decrease in functional connectivity between frontal and parietal regions?

      We acknowledge that interpreting the negative relationship between changes in growth and fronto-posterior FC at 24 months, alongside the positive association between the same connection and later cognitive flexibility, is challenging. We refrain from relating these findings to those published by Bulgarelli in 2020 due to differences in optode locations and because in that work the decrease in fronto-posterior FC was observed after 24 months (up to 36 months), whereas the endpoint in this study is right at 24 months.

      (6) With the growth of the head, the frontal channels move to more temporal areas, right? Could this determine the decrease in frontal inter-hemisphere connections?

      As shown in Nabwera (2017), head size does not increase that much in Gambian infants, or at least not as much as expected by the WHO standard measures. We have added HCZ mean and SD values per age in Table 3.

      Minor points

      - HCZ is used in line 184 but not defined.

      We thank the reviewer for spotting this. We have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Table SI2: NIRS not undertaken = the participant was assessed but did want or could not perform... I imagine there is a missing "not".

      We thank the reviewer for spotting this. We have now modified the legend of Table SI2 as follows: ‘the participant was assessed but did not want or could not perform the NIRS assessments.’

      - The authors should explain what weight-for-length is for those who are not familiar with it.

      We have added an explanation of weight-for-length in the experimental design section, line 339 as follows: ‘We then tested for relationships between brain FC at age 24 months with measures of early growth, as indexed by changes in weight-for-length z-scores (reflecting body weight in proportion to attained growth in length) at one month of age, and at each of the four subsequent visits (details provided below).’

      Reviewer #2 (Recommendations For The Authors):

      (1) I am confused about the authors' interpretation that left and right front-middle and right front-back FC increased with age. It appears in Figure 2 that the negative FC among these ROIs should actually decrease with age. This means that as individuals grow older, the FC values between these regions and zero diminished, albeit starting with negative FC (anticorrelation values) in younger age groups.

      Yes, the reviewer is correct. The negative values of the left and right front-middle and right front-back FC decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age.

      (2) Are these negative values mentioned above at 24 months still negative? Have t-tests been run to examine the differences from zero?

      As suggested, we performed t-tests against zero for the mentioned FC at 24 months, and only the left and right fronto-middle FC are significantly different from zero (left fronto-middle FC: t(94) = 1.8, p = 0.036; right fronto-middle FC: t(94) = 2.7, p = 0.003).
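
      For readers less familiar with this test, the sketch below shows how a one-sample t statistic against zero and its degrees of freedom are obtained; in a one-sample test the degrees of freedom equal n - 1, so t(94) corresponds to 95 observations. The FC values in the example are invented for illustration and the function name is ours.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(values) - mu0) / standard_error
    return t, n - 1


# Hypothetical FC values for five infants (illustration only).
fc_values = [0.1, 0.3, -0.1, 0.2, 0.0]
t_stat, dof = one_sample_t(fc_values)
print(round(t_stat, 3), dof)
```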

      (3) With so many correlation analyses, have multiple comparisons been consistently controlled for? While I assume this was done according to the Methods section, could the authors clarify whether FDR adjustment was applied to all the p-values at once or to a group of p-values each time? I found the following way of reporting FDR-adjusted p-values quite informative, such as PFDR, 24 pairs of ROIs < 0.05.

      We thank the reviewer for this insightful comment. P-values of regression analyses were FDR corrected per connection investigated, i.e. 21 possible ΔWLZ values per connection. We have specified this in the method section as follows: “To ensure statistical reliability, results from the regression analyses on each FC were corrected for multiple comparisons using false discovery rate (FDR)(Benjamini & Hochberg, 1995) per each connection investigated, i.e. 21 possible ΔWLZ values per each connection,” (page 12, Statistical Analyses section).

      (4) Can early growth trajectories predict changes in FC? Why not use ΔWLZ to predict ΔFC?

      Unfortunately, when considering ΔWLZ to predict ΔFC, the numbers of participants dropped significantly, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing multiple measures.

      (5) I might have missed the rationale, but why weren't the growth changes after 5 months studied?

      ΔWLZ between all time points were assessed as predictors of FC at 24 months. We have specified this at line 183 as follows: ‘we used multiple regression with the infant growth trajectory (delta weight for length z-score between all time points, ΔWLZ) and FC at 24 months’. As indicated in Table 2 and 3 the associations between ΔWLZ at all time points and FC at 24 months were tested, but only those with ΔWLZ calculated between birth and 1 month and the subsequent time points were significant. ΔWLZ between 5 months and the subsequent time points, ΔWLZ between 8 months and the subsequent time points, ΔWLZ between 12 months and the subsequent time points, and ΔWLZ between 18 months and the subsequent time points did not significantly predict FC at 24 months. These are highlighted in Table 2 and Figure 3 in blue and marked as NS (non-significant).
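
      As a sketch of how the set of ΔWLZ predictors arises, the Python snippet below enumerates every pair of growth time points and computes the later-minus-earlier difference in WLZ. The list of seven time points (birth, 1, 5, 8, 12, 18 and 24 months) is inferred here from the ages named in this response, and it yields the 21 ΔWLZ values mentioned above; the WLZ values themselves are invented for illustration.

```python
from itertools import combinations

# Assumed visit schedule, inferred from the ages mentioned in the text.
visits = ["birth", "1m", "5m", "8m", "12m", "18m", "24m"]

# Hypothetical WLZ values for a single infant (illustration only).
wlz = {"birth": -0.9, "1m": -0.6, "5m": -0.4, "8m": -0.7,
       "12m": -0.8, "18m": -0.5, "24m": -0.6}

# One delta per ordered pair (earlier visit, later visit).
delta_wlz = {(earlier, later): wlz[later] - wlz[earlier]
             for earlier, later in combinations(visits, 2)}

print(len(delta_wlz))
```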

      (6) Once more, the advantage of longitudinal data is that it allows us to tap into developmental changes. Analyzing and predicting cognitive development based solely on FC values at a single age stage (i.e., 24 months) would overlook the benefits of a longitudinal design, which is regrettable. I suggest that the authors attempt to use ΔFC for predictions and observe the outcomes.

      As mentioned in response to point (4) raised by the reviewer, unfortunately, when considering ΔWLZ to predict ΔFC, the numbers of participants dropped significantly, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing various measures.

      (7) In the section "Early FC predicts cognitive flexibility at preschool age", the authors pointed out that "...,none of these survived FDR correction for multiple comparisons." However, the paper discussed the association between FC at 24 months of age and cognitive flexibility, as it was supported by the statistical analysis in the following sections. If FDR correction cannot be satisfied, I would rephrase the implication/conclusion of the results to suggest that early FC does not predict cognitive flexibility at preschool age.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings, even those not passing multiple comparisons corrections, as they may motivate hypothesis-generation for future studies. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further support these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      Following the reviewer’s suggestion, we specified in the discussion that results from the regression analysis are significant but did not survive multiple comparisons, as follows: ‘While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample.’ (line 290, discussion section).

      (8) Have the authors assessed the impact of growth trajectories on cognitive flexibility?

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      (9) Are there no other cognitive or behavioural measures available? Cognitive flexibility is just one domain of cognitive development, and would the impact of undernutrition on cognitive development be domain-specific? There is a lack of theoretical support here. Why choose cognitive flexibility, and should the impact of undernutrition be domain-specific or domain-general?

      We agree with the reviewer that in this work, we chose to focus on one specific cognitive outcome. While this does not imply that the impact of undernutrition is domain-specific, cognitive flexibility, being a core executive function, has been extensively studied in terms of its neural underpinnings using other neuroimaging modalities, especially fMRI (for example see Dajani, 2015; Uddin, 2021).

      Moreover, other studies looking at the effect of adversity on cognitive outcomes focus on specific cognitive skills, such as working memory (Roberts, 2017), reading and arithmetic skills (Soni, 2021).

      We also assessed infants with the Mullen Scales of Early Learning (MSEL), although the cognitive flexibility task within the Early Years Toolbox has been specifically designed for preschoolers (Howard, 2015), and this set of tasks has recently been validated by our team in The Gambia (Milosavljevic, 2023). Future works from the BRIGHT team will investigate performance on the MSEL in relation to other variables of the project.

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      L. Q. Uddin, Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 22, 167–179 (2021).

      Roberts, S. B., Franceschini, M. A., Krauss, A., Lin, P. Y., de Sa, A. B., Có, R., ... & Muentener, P. (2017). A pilot randomized controlled trial of a new supplementary food designed to enhance cognitive performance during prevention and treatment of malnutrition in childhood. Current developments in nutrition, 1(11), e000885.

      Soni, A., Fahey, N., Bhutta, Z. A., Li, W., Frazier, J. A., Moore Simas, T., ... & Allison, J. J. (2021). Early childhood undernutrition, preadolescent physical growth, and cognitive achievement in India: A population-based cohort study. PLoS Medicine, 18(10), e1003838.

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Milosavljevic, B., Cook, C. J., Fadera, T., Ghillia, G., Howard, S. J., Makaula, H., ... & Lloyd‐Fox, S. (2023). Executive functioning skills and their environmental predictors among pre‐school aged children in South Africa and The Gambia. Developmental Science, e13407.

      (10) I would review more previous fNIRS studies on infants if they exist (e.g., the work by S Lloyd-Fox, L Emberson, and others). These studies can help identify brain ROIs likely linked to undernutrition and cognitive flexibility. The current analysis methods lean towards exploratory research. This makes the paper more of a proof-of-concept report rather than a strongly theoretically-driven study.

      We thank the reviewer for this important point. While we have reviewed existing fNIRS infant studies, there are no extant works that showed whether specific brain regions are related to undernutrition. However, several fMRI studies assessed regions that do support cognitive flexibility, and we mentioned these in the manuscript (for example see Dajani, 2015; Uddin, 2021).

      Other than the BRIGHT project, we are aware of two other projects that assessed the effect of undernutrition on brain development, assessing cognitive outcomes in poor-resource settings:

      - the BEAN project in Bangladesh in which fNIRS data were recorded from the bilateral temporal cortex (i.e. Pirazzoli, 2022);

      - a project in India in which fNIRS data were recorded from frontal, temporal and parietal cortex bilaterally (i.e. Delgado Reyes, 2020)

      The brain regions recorded in these studies largely overlap with the brain regions we recorded from in this study.

      Another aspect to consider is that infants underwent several fNIRS tasks as part of the BRIGHT project, focusing on social processing, deferred imitation, and habituation responses. Therefore, brain regions for data acquisition were chosen to maximize the likelihood of recording meaningful data for all tasks (Lloyd-Fox, 2023). To clarify the text, we specified this information in the methods section (line 383).

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      Pirazzoli, L., Sullivan, E., Xie, W., Richards, J. E., Bulgarelli, C., Lloyd-Fox, S., ... & Nelson III, C. A. (2022). Association of psychosocial adversity and social information processing in children raised in a low-resource setting: an fNIRS study. Developmental Cognitive Neuroscience, 56, 101125.

      Delgado Reyes, L., Wijeakumar, S., Magnotta, V. A., Forbes, S. H., & Spencer, J. P. (2020). The functional brain networks that underlie visual working memory in the first two years of life. NeuroImage, 219, Article 116971.

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      (11) Last but not least, in the paper, the authors mentioned that fNIRS offers better spatial resolution and anatomical specificity compared to EEG, thereby providing more precise and reliable localization of brain networks. While I partially agree with this perspective, it remains to be explored whether the current fNIRS analysis strategies indeed yield higher spatial resolution. It is hoped that the authors will delve deeper into this discussion in the paper.

      The brain regions of focus were selected based on coregistration work previously conducted at each time point on the array used in this project (Collins-Jones, 2021). We deliberately avoided making claims about small brain regions, considering that head size might increase slightly less with age in The Gambia compared to Western countries (Nabwera, 2017). However, we maintain that the conclusions drawn in this study offer higher brain-region specificity than could have been identified with current common EEG methods alone.

      References:

      L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021).

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      - Among important developmental mechanisms to mention are the development of exuberant connections and the further selection/stabilization of the relevant ones according to environmental stimulation, vs the pruning of others.

      We agree with the reviewer that the development of exuberant connections and subsequent pruning is a universal process of paramount importance during the first years of life. However, after revising our introduction, given the word limit of the journal, we maintained focus on neurodevelopment and early adversity.

      Results

      - Adding a few more information on the 6 sections and 21 connections would be welcome. In particular for within-section FC: how was this computed?

      The 6 sections were created based on the co-registration of the array used in this study at each age in a previously published work (L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021)). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’.

      The 21 connections were defined as all the possible links between the 6 regions, specifically: the interhemispheric homotopic connections (in orange in Figure SI1), which connect the same regions between hemispheres (i.e., front left with front right); the intrahemispheric connections (in green in Figure SI1), which correlate channels belonging to the same region; the fronto-posterior connections (in blue in Figure SI1), which link front and middle, middle and back, and front and back regions of the same hemisphere; and the crossing interhemispheric connections (non-homotopic interhemispheric, in yellow in Figure SI1), which link the front, middle, and back areas between left and right hemispheres. We added these specifications also in the legend of Figure SI1 for clarity.
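
      The count of 21 follows from taking every unordered pair of the 6 regions, including each region paired with itself. The Python sketch below enumerates these pairs and labels them with the four categories described above; the labelling function and region names are our own illustrative shorthand, not code from the analysis pipeline.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Six regions: front, middle and back in each hemisphere.
regions = [(hemi, part)
           for hemi in ("left", "right")
           for part in ("front", "middle", "back")]


def classify(a, b):
    """Label a region pair using the four categories named in the text."""
    (hemi_a, part_a), (hemi_b, part_b) = a, b
    if a == b:
        return "intrahemispheric (within-region)"
    if hemi_a == hemi_b:
        return "fronto-posterior"
    if part_a == part_b:
        return "interhemispheric homotopic"
    return "crossing interhemispheric"


connections = {pair: classify(*pair)
               for pair in combinations_with_replacement(regions, 2)}
counts = Counter(connections.values())

print(len(connections))
for label, n in sorted(counts.items()):
    print(label, n)
```

      Under this enumeration there are 6 within-region, 6 fronto-posterior, 3 homotopic interhemispheric and 6 crossing interhemispheric connections, which sum to 21.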

      - The denomination intrahemispheric vs fronto-posterior vs crossed connections is not clear. Maybe prefer intra-hemispheric vs inter-hemispheric homotopic vs inter-hemispheric non-homotopic (also in Figure SI1).

      We appreciate the reviewer's suggestion regarding terminology. However, we believe that the term 'inter-hemispheric non-homotopic' could potentially refer to both connections within the same brain hemisphere from front to back and connections crossing between hemispheres, leading to increased confusion. Therefore, we have chosen not to include the term 'non-homotopic' and instead added 'homotopic' to 'interhemispheric' throughout the manuscript to emphasize that these functional connections occur between corresponding regions of the two hemispheres.

      - with time -> with age.

      We replaced “with time” with “with age” throughout the manuscript, as suggested.

      - The description of both HbO2 and HHb results overloads the main text: would it be relevant to present one of the two in Supplementary Information if the results are coherent?

      We understand the reviewer’s concern regarding overloading the results section with reporting both chromophores. However, reporting results for both HbO and HHb is considered a crucial step for publications in the fNIRS field, as emphasized in recent formal guidance (Yücel et al., 2020). One of the strengths of fNIRS compared to fMRI is its ability to record from both chromophores, enabling a more precise characterization of brain activations and oscillations. Moreover, in FC studies like this one, ensuring that HbO and HHb results overlap is an important check that increases confidence in interpreting the findings.

      References:

      Yücel, M. A., von Lühmann, A., Scholkmann, F., Gervain, J., Dan, I., Ayaz, H., Boas, D., Cooper, R. J., Culver, J., Elwell, C. E., Eggebrecht, A., Franceschini, M. A., Grova, C., Homae, F., Lesage, F., Obrig, H., Tachtsidis, I., Tak, S., Tong, Y., … Wolf, M. (2020). Best Practices for fNIRS publications. Neurophotonics, 1–34. https://doi.org/10.1117/1.NPh.8.1.012101

      - HCZ is not defined when first used.

      We thank the reviewer for spotting this; we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Choosing the analyzed measures to "maximize power" could be criticised.

      We appreciate the reviewer’s concern. However, correlating all the FC values with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and those FC values that showed a significant change with age, considering these the most interesting from a developmental perspective in our sample.

      Discussion

      - I would recommend using the same order to synthesize results and further discuss them.

      We agree with the reviewer that the suggested structure is optimal for a clear discussion section. We have indeed followed it, with each paragraph covering specific aspects:

      - Recap of the study aims

      - Results summary and discussion of developmental changes

      - Results summary and discussion of the relationship between changes in growth and FC

      - Results summary and discussion of the relationship between FC and cognitive flexibility

      - Limitations

      - Conclusion

      Given the numerous results presented in this paper, we believe that readers will better digest them by first reading a summary of the results followed by their interpretations, rather than condensing all the interpretations together.

      - Highlighting how "atypical" developmental trajectories are in Gambian infants would be welcome in the Results section. Other interpretations can be found than "The observed decrease in frontal inter-hemispheric FC with increasing age may be due to the exposure to early life undernutrition adversity".

      We agree with the reviewer that other factors that differ between low- and high-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to further investigate cultural, environmental, and genetic effects on brain FC” (line 238).

      - Focusing on FC at 24m for the relationship with growth is questionable.

      Correlating the FC values at 5 time points with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and FC at 24 months, our final time point of data collection. We added this information in the methods section as follows: “To investigate the impact of undernutrition on FC development, we used DWLZ as independent variables in regression analyses on HbO2 (as the chromophore with the highest signal-to-noise ratio) FC at 24 months, our final time point of data collection” (line 517, method section).
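
      To illustrate why restricting the analysis to one time point eases the multiple-comparisons burden, here is a minimal sketch using a Bonferroni-style threshold. Bonferroni is chosen only for simplicity; the response does not state which correction applies to this analysis, and the number of connections below is a hypothetical placeholder.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

alpha = 0.05
n_connections = 4   # hypothetical number of FC connections carried forward

# Testing at a single time point versus at all 5 time points.
threshold_one_timepoint = bonferroni_threshold(alpha, n_connections * 1)
threshold_five_timepoints = bonferroni_threshold(alpha, n_connections * 5)

print(threshold_one_timepoint, threshold_five_timepoints)
```

      With five time points, each test would have to meet a threshold five times stricter than with one.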

      - There is too much emphasis on the correlation between FC and cognitive flexibility, whereas results are not significant after correction for multiple comparisons.

      Following the reviewer’s suggestion, we specified in the discussion that the results from the regression analysis are significant but did not survive correction for multiple comparisons, as follows: “While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample” (line 290, discussion section).

      Methods

      - I would recommend detailing how z-scores were computed in the paragraph "Anthropometric measures".

      We specified how z-scores were computed in the statistical analysis section as follows: “Anthropometric measures were converted to age and sex adjusted z‐scores that are based on World Health Organization Child Growth Standards (93). Weight‐for‐Length (WLZ) and Head Circumference (HCZ) z-scores were computed” (line 509, method section). As transforming data is the first step of statistical analysis and is not directly related to data collection, we believe it is more appropriate to retain this description in the statistical analysis section.
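
      As a rough illustration of this conversion: the WHO Child Growth Standards are published as age- and sex-specific L, M, and S parameters, and a z-score is commonly obtained from them with the LMS formula. The sketch below assumes that formula; the parameter values are hypothetical placeholders rather than entries from the WHO tables, and the WHO adjustment for extreme values (beyond ±3 SD) is omitted.

```python
import math

def lms_zscore(x, L, M, S):
    """Z-score of measurement x given LMS parameters for a child's age and sex.

    L: Box-Cox power, M: median, S: coefficient of variation.
    """
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)

# Hypothetical parameters and measurement (NOT taken from the WHO tables).
z = lms_zscore(9.0, L=-0.35, M=9.6, S=0.08)
print(round(z, 2))
```

      A measurement equal to the median M gives a z-score of 0, and values below the median give negative z-scores.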

      - FC computation: the mention of "correlating the first and the last 250s" is not clear.

      We specified this more clearly in the text as follows: “We found that correlating the first and the last 250 seconds of valid data after pre-processing provided the highest percentage of infants with strong correlation between the first and the last portion of data” (line 467).
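
      One way to read this check is as a split-half reliability measure: FC is computed separately on the first and last 250 seconds, and the two sets of FC values are then correlated. The sketch below follows that reading on synthetic data. The sampling rate, the channel count, and the helper names (`fc_vector`, `split_half_reliability`) are assumptions made for illustration, not details from the manuscript.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fc_vector(channels):
    """Pairwise channel correlations (upper triangle), flattened to a list."""
    n = len(channels)
    return [pearson(channels[i], channels[j])
            for i in range(n) for j in range(i + 1, n)]

def split_half_reliability(channels, fs_hz, window_s=250):
    """Correlate FC from the first window_s seconds with FC from the last."""
    n = int(window_s * fs_hz)
    first = [ch[:n] for ch in channels]
    last = [ch[-n:] for ch in channels]
    return pearson(fc_vector(first), fc_vector(last))

# Synthetic example: 4 channels sharing a common signal to different degrees.
random.seed(0)
fs_hz = 2.0        # hypothetical sampling rate
duration_s = 600   # hypothetical amount of valid data
n_samples = int(fs_hz * duration_s)
shared = [random.gauss(0, 1) for _ in range(n_samples)]
weights = [0.9, 0.7, 0.3, 0.1]
channels = [[w * s + (1 - w) * random.gauss(0, 1) for s in shared]
            for w in weights]

r = split_half_reliability(channels, fs_hz)
print(round(r, 3))
```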

      - The manuscript mentions "age 3 years" for the younger preschoolers but ~48months rather corresponds to 4 years.

      We revised the entire manuscript and the supplementary materials, but we could not find any instance in which preschoolers are referred to by age in months rather than in years.

      - Specify the number of children evaluated at 4 and 5 years. Is the test of cognitive flexibility normalized for age? If not, how were the 2 groups considered in the analyses? (age as a confounding factor).

      We have added the number of children in the two preschooler groups as follows: younger preschoolers (age mean ± SD=47.96 ± 2.77 months, N=77) and older preschoolers (age mean ± SD=57.58 ± 2.11 months, N=84). (line 484).

      The cognitive flexibility test was not normalized for age, as this task was specifically developed for preschoolers (Howard, 2015). As mentioned in ‘Cognitive flexibility at preschool age’ of the methods section, “data were collected in two ranges of preschool ages”, which guided our decision to perform regression analysis on the impact of FC on cognitive flexibility separately within these two age groups, rather than treating them as a single group of preschoolers.

      References:

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Figures and Tables

      - Table 1 could highlight the significant results. It is not clear what the "baseline" results correspond to.

      We have marked in bold the results that are statistically significant in Table 1. In the linear mixed model we performed, the first time point (i.e. 5 months) is chosen as ‘baseline’, i.e. the reference against which the other time points are compared, and its statistical values refer to its significance against 0 (as was done in Bulgarelli 2020).

      - Figures 2 B and C seem redundant? What is SE vs SD?

      We believe that both Figures 2B and 2C are useful for readers. While the first shows the mean FC values at the group level, the second highlights the individual variability of FC values (typical of infant neuroimaging data), which is also why it is interesting to relate these measures to other variables in our dataset (i.e. growth and cognitive flexibility). Figure 2C also reports mean FC values per age, but these might be less visible given that one dot per infant is also plotted.

      SE stands for standard error, and in the legend of the figure we specified this as follows: ‘Mean and standard error of the mean (SE)’. SD stands for standard deviation, and we have now specified this as follows: ‘mean ± standard deviation (SD)’.
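
      For readers less familiar with the distinction, a short sketch of how the two quantities relate: the SD describes the spread of individual values, while the SE of the mean is the SD divided by the square root of the sample size. The FC values below are hypothetical.

```python
import math
import statistics

# Hypothetical FC values for six infants at one age point.
fc_values = [0.21, 0.35, 0.18, 0.40, 0.27, 0.31]

sd = statistics.stdev(fc_values)       # sample standard deviation
se = sd / math.sqrt(len(fc_values))    # standard error of the mean

print(round(sd, 3), round(se, 3))
```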

      - Table 2: I would recommend removing results that don't survive corrections for multiple comparisons.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      - Figure 3: the top is redundant with Table 2: to be merged? B: the statistical results might be shown in a Table.

      We agree with the reviewer that the top part of Figure 3 and Table 2 report the same results. However, given the richness of these findings, we believe that the top part of Figure 3 serves as a useful summary for readers. Additionally, examining both the top and bottom parts of Figure 3 provides a comprehensive overview of the regression analysis conducted in this study.

      - Figure SI6: Is it really a % in x-axis?

      We thank the reviewer for spotting this typo; the percentage is relevant for the y-axis only. We removed the % symbol from the ticks of the x-axis.

      - Table SI1: the presented p-values don't seem to survive Bonferroni correction, contrary to what is written.

      We thank the reviewer for spotting this mistake; we removed the reference to the Bonferroni correction for the p-values.

      - Table SI2: For the proportion of children included in the analysis, maybe be precise that the proportion was computed based on the ones with acquired data. Maybe also add the proportion according to all children, to better show the high drop-out rate at certain ages?

      We thank the reviewer for these useful suggestions. We have specified in the legend of the table how we calculated the proportion of infants included as follows: ‘The proportion of children included in the analysis was computed based on the infants with FC data’. We have also added a column in the table called ‘Inclusion rate (from the 204 infants recruited)’, following the reviewer’s suggestion. This will be a useful reference for future studies.

      - A few typos should be corrected throughout the manuscript.

      We thoroughly revised the main manuscript and the supplementary materials for typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Building on previous in vitro synaptic circuit work (Yamawaki et al., eLife 10, 2021), Piña Novo et al. utilize an in vivo optogenetic-electrophysiological approach to characterize sensory-evoked spiking activity in the mouse's forelimb primary somatosensory (S1) and motor (M1) areas. Using a combination of a novel "phototactile" somatosensory stimulus to the mouse's hand and simultaneous high-density linear array recordings in both S1 and M1, the authors report in awake mice that evoked cortical responses follow a triphasic peak-suppression-rebound pattern. They also find that M1 responses are delayed and attenuated relative to S1. Further analysis revealed a 20-fold difference in subcortical versus corticocortical propagation speeds.

      They also report that PV interneurons in S1 are strongly recruited by hand stimulation. Furthermore, they report that selective activation of PV cells can produce a suppression and rebound response similar to "phototactile" stimuli. Lastly, the authors demonstrate that silencing S1 through local PV cell activation reduces M1 response to hand stimulation, suggesting S1 may directly drive M1 responses.

      Strengths:

      The study was technically well done, with convincing results. The data presented are appropriately analyzed. The authors' findings build on a growing body of both in vitro and in vivo work examining the synaptic circuits underlying the interactions between S1 and M1. The paper is well-written and illustrated. Overall, the study will be useful to those interested in forelimb S1-M1 interactions.

      Weaknesses:

      Although the results are clear and convincing, one weakness is that many results are consistent with previous studies in other sensorimotor systems, and thus not all that surprising. For example, the findings that sensory stimulation results in delayed and attenuated responses in M1 relative to S1 and that PV inhibitory cells in S1 are strongly recruited by sensory stimulation are not novel (e.g., Bruno et al., J Neurosci 22, 10966-10975, 2002; Swadlow, Philos Trans R Soc Lond B Biol Sci 357, 1717-1727, 2002; Gabernet et al., Neuron 48, 315-327, 2005; Cruikshank et al., Nat Neurosci 10, 462-468, 2007; Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016; Yu et al., Neuron 104, 412-427 e414, 2019). Furthermore, the observation that sensory processing in M1 depends upon activity in S1 is also not novel (e.g., Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016). The authors do a good job highlighting how their results are consistent with these previous studies.

      We thank the reviewer for the close reading of the manuscript and the many constructive comments and critiques. As the reviewer notes, there have been many prior studies of related circuits in other sensorimotor systems forming an important context for our study and findings, as we have tried to highlight. We appreciate the suggestions for additional relevant articles to cite.

      Perhaps a more significant weakness, in my opinion, was the missing analyses given the rich dataset collected. For example, why lump all responsive units and not break them down based on their depth? Given superficial and deep layers respond at different latencies and have different response magnitudes and durations to sensory stimuli (e.g., L2/3 is much more sparse) (e.g., Constantinople et al., Science 340, 1591-1594, 2013; Manita et al., Neuron 86, 1304-1316, 2015; Petersen, Nat Rev Neurosci 20, 533-546, 2019; Yu et al., Neuron 104, 412-427 e414, 2019), their conclusions could be biased toward more active layers (e.g., L4 and L5). These additional analyses could reveal interesting similarities or important differences, increasing the manuscript's impact. Given the authors use high-density linear arrays, they should have this data.

      We have analyzed the activity patterns as a function of cortical depth, and now include these results in the manuscript as suggested. The key new finding is that the M1 responses are strongest in upper layers, consistent with expectations based on the excitatory corticocortical synaptic connectivity characterized previously. Changes to the manuscript include new figures (Figure 5; Figure 5 - figure supplement 1), which we explain (Methods: page 14, lines 618-621), describe (new Results section: pages 4-5, lines 183-189), comment on (Discussion: page 9, lines 378-391), and summarize the significance of (Abstract: page 1, lines 22-24). In addition, we incorporated the new laminar analysis into a summary schematic (Figure 9). We thank the reviewer for suggesting this analysis.

      Similarly, why not isolate and compare PV versus non-PV units in M1? They did the photostimulation experiments and presumably have the data. Recent in vitro work suggests PV neurons in the upper layers (L2/3) of M1 are strongly recruited by S1 (e.g., Okoro et al., J Neurosci 42, 8095-8112, 2022; Martinetti et al., Cerebral cortex 32, 1932-1949, 2022). Do the authors' data support these in vitro observations?

      These experiments were relatively complex and M1 optotagging was not routinely included in the stimulus and acquisition protocol. Therefore, we don’t have sufficient data for this analysis. We plan to address this in future studies.

      It would have also been interesting to suppress M1 while stimulating the hand to determine if any part of the S1 triphasic response depends on M1 feedback.

      We agree that this is of interest but consider this to be outside the scope of the current study.

      I appreciate the control experiment showing that optical hand stimulation did not evoke forelimb movement. However, this appears to be an N=1. How consistent was this result across animals, and how was this monitored in those animals? Can the authors say anything about digit movement?

      We have performed additional experiments to address this point. A constraint with EMG is that it is limited to the muscle(s) one chooses to record from, and it is difficult to implant tiny muscles of the hand. Therefore, for this analysis, we used kilohertz videography as a high-sensitivity method for movement surveillance across the hand. Hand stimulation did not evoke any detectable movements. Changes in the manuscript include: revised Figure 1 - figure supplement 1; supplementary Figure 1 - video 1; and associated text edits in the Methods (page 13, line 557; page 14, lines 626-639) and Results sections (page 2, lines 84-85).

      A light intensity of 5 mW was used to stimulate the hand, but it is unclear how or why the authors chose this intensity. Did S1 and M1 responses (e.g., amplitude and latency) change with lower or higher intensities? Was the triphasic response dependent on the intensity of the "phototactile" stimuli?

      As we now say in the Methods > Optogenetic photostimulation of the hand section (page 13, lines 562-565), “This intensity was chosen based on pilot experiments in which we varied the LED power, which showed that this intensity was reliably above the threshold for evoking robust responses in both S1 and M1 without evoking any visually detectable movements (as subsequently confirmed by videography)”.

      Reviewer #2 (Public review):

      Summary:

      Communication between sensory and motor cortices is likely to be important for many aspects of behavior, and in this study, the authors carefully analyse neuronal spiking activity in S1 and M1 evoked by peripheral paw stimulation, finding clear evidence for sensory responses in both cortical regions.

      Strengths:

      The experiments and data analyses appear to have been carefully carried out and clearly represented.

      Weaknesses:

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Thank you for pointing this out. The prior studies suggest it is mainly a subset of layer 5B excitatory neurons that may express PV. We checked this in two ways. Anatomically, we did not find double-labeling. An electrophysiology assay showed that, although some evoked excitatory synaptic input could be detected in some neurons, these inputs were very weak. Results from these assays are shown in new Figure 6 - figure supplement 1, with associated text edits in the Methods (page 11, lines 469-471; page 15, lines 657-668) and Results (page 5, lines 198-199) sections.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      As noted above, we have performed additional experiments to address this.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      Thank you for pointing this out; we now cite this article (page 1, line 46; page 10, line 415).

      Reviewer #3 (Public review):

      Summary:

      This is a solid study of stimulus-evoked neural activity dynamics in the feedforward pathway from mouse hand/forelimb mechanoreceptor afferents to S1 and M1 cortex. The conclusions are generally well supported, and match expectations from previous studies of hand/forelimb circuits by this same group (Yamawaki et al., 2021), from the well-studied whisker tactile pathway to whisker S1 and M1, and from the corresponding pathway in primates. The study uses the novel approach of optogenetic stimulation of PV afferents in the periphery, which provides an impulse-like volley of peripheral spikes that is useful for studying feedforward circuit dynamics. These are primarily proprioceptors, so results could differ for specific mechanoreceptor populations, but this is a reasonable tool to probe basic circuit activation. Mice are awake but not engaged in a somatosensory task, which is sufficient for the study goals.

      The main results are:

      (1) brief peripheral activation drives brief sensory-evoked responses at ~ 15 ms latency in S1 and ~25 ms latency in M1, which is consistent with classical fast propagation on the subcortical pathway to S1, followed by slow propagation on the polysynaptic, non-myelinated pathway from S1 to M1;

      (2) each peripheral impulse evokes a triphasic activation-suppression-rebound response in both S1 and M1;

      (3) PV interneurons carry the major component of spike modulation for each of these phases;

      (4) activation of PV neurons in each area (M1 or S1) drives suppression and rebound both in the local area and in the other downstream area;

      (5) peripheral-evoked neural activity in M1 is at least partially dependent on transmission through S1.

      All conclusions are well-supported and reasonably interpreted. There are no major new findings that were not expected from standard models of somatosensory pathways or from prior work in the whisker system.

      Strengths:

      This is a well-conducted and analyzed study in which the findings are clearly presented. This will provide important baseline knowledge from which studies of more complex sensorimotor processing can build.

      Weaknesses:

      A few minor issues should be addressed to improve clarity of presentation and interpretation:

      (1) It is critical for interpretation that the stimulus does not evoke a motor response, which could induce reafference-based activity that could drive, or mask, some of the triphasic response. Figure S1 shows that no motor response is evoked for one example session, but this would be stronger if results were analyzed over several mice.

      As noted above, we have performed additional experiments to address this point.

      (2) The recordings combine single and multi-units, which is fine for measures of response modulation, but not for absolute evoked firing rate, which is only interpretable for single units. For example, evoked firing rate in S1 could be higher than M1, if spike sorting were more difficult in S1, resulting in a higher fraction of multi-units relative to M1. Because of this, if reporting of absolute firing rates is an essential component of the paper, Figs 3D and 4E should be recalculated just for single units.

      Thank you for noting this. Although the absolute firing rates are not essential for the main findings or conclusions (which as noted focus on response modulations and relative differences) we agree that analyzing the single-unit response amplitudes is useful. Therefore, changes in the manuscript now include: revised Figure 3, and associated text edits in the Methods (page 12, lines 543-545), Results (page 3, lines 115-119), and Discussion (page 7, lines 305-311) sections.

      (3) In Figure 5B, the average light-evoked firing rate of PV neurons seems to come up before time 0, unlike the single-trial rasters above it. Presumably, this reflects binning for firing rate calculation. This should be corrected to avoid confusion.

      Yes, this reflects the binning. We agree that this is potentially confusing and have removed these average plots below the raster plots, as the rasters alone suffice to demonstrate the result (i.e., that PV units are strongly activated and thus tagged by optogenetic stimulation). Changes are now reflected in revised Figure 6.
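
      To make the binning effect concrete, here is a minimal sketch of how a firing-rate estimate based on time windows centered on each time point can appear to rise before stimulus onset even when no spike occurs before time 0. The spike times and the 10 ms window width are hypothetical, and the original analysis may have used a different binning scheme.

```python
def centered_rate(spike_times_ms, t_ms, width_ms):
    """Rate (Hz) in a window centered on t_ms; it looks ahead by width_ms / 2."""
    lo, hi = t_ms - width_ms / 2, t_ms + width_ms / 2
    count = sum(lo <= s < hi for s in spike_times_ms)
    return count / (width_ms / 1000.0)

def causal_rate(spike_times_ms, t_ms, width_ms):
    """Rate (Hz) in a window ending at t_ms; it only uses past spikes."""
    count = sum(t_ms - width_ms <= s < t_ms for s in spike_times_ms)
    return count / (width_ms / 1000.0)

# Hypothetical spikes, all occurring after stimulus onset at t = 0 ms.
spikes_ms = [1.0, 2.0, 3.5, 6.0, 8.0]

# Evaluate both estimates 2 ms BEFORE stimulus onset.
rate_centered = centered_rate(spikes_ms, t_ms=-2.0, width_ms=10.0)
rate_causal = causal_rate(spikes_ms, t_ms=-2.0, width_ms=10.0)

print(rate_centered, rate_causal)
```

      The centered window reaches past time 0 and picks up post-stimulus spikes, whereas the causal window stays at zero until after onset.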

      (4) In Figure 6A bottom, please clarify what legends "W. suppression" and "W. rebound" mean.

      In the figure plot legends, the “W.” has been removed. Changes are now reflected in revised Figure 7 and Figure 7 – figure supplement 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Did you filter the neural signals during acquisition? If so, please include these details in the results.

      Signals were bandpass-filtered (2.5 Hz to 7.6 kHz) at the hardware level at acquisition (with no additional software filtering applied), as now clarified in the Methods Electrophysiological recordings section as requested (page 12, lines 525-526).

      Reviewer #2 (Recommendations for the authors):

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Please see above for our response to this issue.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      Please see above for our response to this issue.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      As noted above, we now cite this study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors discovered MYL3 of marine medaka (Oryzias melastigma) as a novel NNV entry receptor, elucidating its facilitation of RGNNV entry into host cells through macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 pathway.

      Strengths:

      In this manuscript, the authors have performed in vitro and in vivo experiments to prove that MmMYL3 may serve as a receptor for NNV via the macropinocytosis pathway. These experiments use different methods, including Co-IP, RNAi, pulldown, SPR, flow cytometry, immunofluorescence assays, and so on. In general, the results are clearly presented in the manuscript.

      Weaknesses:

      In the introduction and discussion sections, Yao et al. mainly focus on viral pathogens and fish in aquaculture, so the significance and novelty of the results provided in this manuscript are limited and not broad in biology. The authors should better convey the likely impact of their work on the viral infection field, and perhaps also on the evolutionary field using the fish model.

      (1) Myosin is a big family, why did authors choose MYL3 as a candidate receptor for NNV?

      We appreciate your insightful question. We selected MYL3 as a candidate receptor based on a combination of proteomic screening, literature evidence, and functional validation. Increasing evidence indicates that myosins are implicated in viral infections. For instance, myosin heavy chain 9 plays a role in multiple viral infections (Li et al., 2018), and non-muscle myosin heavy chain IIA has been identified as an entry receptor for herpes simplex virus-1 (Arii et al., 2010). Furthermore, myosin II light chain activation is essential for influenza A virus entry via macropinocytosis (Banerjee et al., 2014). Our previous studies hinted at a potential interaction between MYL3 and CP (Zhang et al., 2020). Huang et al. also reported, based on proteomic analysis of an immunoprecipitation (IP) assay, that Epinephelus coioides MYL3 might interact with native NNV CP (Huang et al., 2020). Our Co-IP and SPR analyses confirmed a direct interaction between MYL3 and the RGNNV CP. Based on these studies, we selected MYL3 as a candidate receptor for NNV.

      References

      Huang PY, Hsiao HC, Wang SW, Lo SF, Lu MW, Chen LL. 2020. Screening for the Proteins That Can Interact with Grouper Nervous Necrosis Virus Capsid Protein. Viruses 12:1–20.

      Li L, Xue B, Sun W, Gu G, Hou G, Zhang L, Wu C, Zhao Q, Zhang Y, Zhang G, Hiscox JA, Nan Y, Zhou EM. 2018. Recombinant MYH9 protein C-terminal domain blocks porcine reproductive and respiratory syndrome virus internalization by direct interaction with viral glycoprotein 5. Antiviral Res 156:10–20.

      Arii J, Goto H, Suenaga T, Oyama M, Kozuka-Hata H, Imai T, Minowa A, Akashi H, Arase H, Kawaoka Y, Kawaguchi Y. 2010. Non-muscle myosin IIA is a functional entry receptor for herpes simplex virus-1.

      Banerjee I, Miyake Y, Philip Nobs S, Schneider C, Horvath P, Kopf M, Matthias P, Helenius A, Yamauchi Y. 2014. Influenza A virus uses the aggresome processing machinery for host cell entry. Science 346:473–477.

      (2) What is the relationship between MmMYL3 and MmHSP90ab1 and other known NNV receptors? Why does NNV have so many receptors? Which one is supposed to serve as the key entry receptor?

      We acknowledge the functional diversity of receptors for NNV. MmHSP90ab1 and MmHSC70 have been identified as receptors involved in NNV entry through clathrin-mediated endocytosis (CME), whereas MYL3 facilitates entry via macropinocytosis. These pathways serve as complementary mechanisms for the virus to enter host cells, potentially enhancing infection efficiency. While HSP90ab1 facilitates CME, MYL3 promotes macropinocytosis, both of which are critical for viral internalization, but through distinct endocytic mechanisms.

      NNV likely utilizes multiple receptors to increase its host range and infection efficiency. The diversity of receptors ensures that the virus can infect a wide variety of host species. By employing HSP90ab1, HSC70, and MYL3, NNV can exploit different cellular pathways for entry, making it more adaptable to various host environments.

      Regarding the identification of a key entry receptor, we agree this is a critical unresolved question. While HSP90ab1/HSC70 appear essential for CME-mediated entry, our data suggest MYL3 plays a distinct role in macropinocytic uptake. To systematically evaluate receptor hierarchy, we initially proposed comparative knockout studies targeting these candidate genes. However, we must acknowledge that current technical limitations in marine fish models – particularly the extended generation time for stable knockout cell lines and challenges in maintaining viable cell cultures post-editing – have delayed these experiments. Nevertheless, we are actively exploring strategies to overcome these obstacles and will continue to refine our approach to address these questions in future research.

      (3) In vivo knockout of MYL3 using CRISPR-Cas9 should be conducted to verify whether the absence of MYL3 really inhibits NNV infection. Although it might be difficult to do it in marine medaka as stated by the authors, the introduction of zebrafish is highly recommended, since it has already been reported that zebrafish could serve as a vertebrate model to study NNV (doi: 10.3389/fimmu.2022.863096).

      As noted in our manuscript from line 374 to 384, marine medaka is a relatively new model for studying viral infections and is not yet optimized for CRISPR-Cas9-mediated gene knockout. The technical challenges related to precise embryo microinjection and off-target effects using CRISPR-Cas9 in marine medaka complicate the establishment of knockout lines. These limitations, including the time required for multiple breeding generations and molecular screening, currently make this approach difficult to implement.

      We fully agree with your suggestion to consider zebrafish as an alternative model. Zebrafish have been well-established as a vertebrate model for studying NNV, and their genetic tractability and well-developed CRISPR-Cas9 protocols provide a more accessible and efficient platform for generating knockout models. In our future studies, we plan to conduct CRISPR-Cas9-mediated knockout experiments targeting multiple NNV receptors in zebrafish. This will allow us to systematically evaluate the role of different receptors in NNV infection and elucidate their potential interactions. The findings from these studies will be included in a future publication, which will provide a more comprehensive understanding of the molecular mechanisms underlying NNV infection in vertebrate models.

      (4) The results shown in Figure 6 are not enough to support the conclusion that "RGNNV triggers macropinocytosis mediated by MmMYL3". Additional electron microscopy of macropinosomes (sizes, morphological characteristics, etc.) will be more direct evidence.

      A previous study reported that dragon grouper nervous necrosis virus (DGNNV) enters SSN-1 cells primarily through micropinocytosis and macropinocytosis pathways. Electron microscopy revealed several kinds of membrane ruffling and large, disproportionate macropinosomes in DGNNV-infected cells, indicating that NNV infection can trigger macropinocytosis (Liu et al., 2005). In our study, the data from inhibitor treatments, co-localization of MmMYL3 with RGNNV CP, and dextran uptake assays also provide compelling evidence for the involvement of macropinocytosis in RGNNV entry via MmMYL3. These methods are well established in the literature and have been used extensively to study viral entry pathways (Lingemann et al., 2019). Specifically, the dextran uptake assay has been widely used as a marker for macropinocytosis and has provided clear evidence of RGNNV internalization via this pathway. The use of macropinocytosis inhibitors, such as EIPA and Rottlerin, significantly reduced RGNNV entry, further supporting our conclusion. Nonetheless, we acknowledge the potential value of additional electron microscopy studies and will consider this approach in our future research.

      References

      Liu W, Hsu CH, Hong YR, Wu SC, Wang CH, Wu YM, Chao CB, Lin CS. 2005. Early endocytosis pathways in SSN-1 cells infected by dragon grouper nervous necrosis virus. J Gen Virol.

      Lingemann M, McCarty T, Liu X, Buchholz UJ, Surman S, Martin SE, Collins PL, Munir S. 2019. The alpha-1 subunit of the Na+,K+-ATPase (ATP1A1) is required for macropinocytic entry of respiratory syncytial virus (RSV) in human respiratory epithelial cells. PLoS Pathogens.

      (5) MYL3 is "predominantly found in muscle tissues, particularly the heart and skeletal muscles". However, NNV is a virus that mainly causes necrosis of nervous tissues (brain and retina). If MYL3 really acts as a receptor for NNV, how does it balance this difference so that nervous tissues, rather than muscle tissues, have the highest viral titers?

      While MYL3 is highly expressed in cardiac and skeletal muscles, studies have shown that MYL3, like other myosin light chains, can also be present in non-muscle tissues. Additionally, proteins involved in viral entry do not always need to be the most highly expressed in the final target tissue, as long as they facilitate the initial infection process. For instance, rabies virus is a rhabdovirus that exhibits a marked neuronotropism in infected animals. Transferrin receptor protein 1 (TfR1) can serve as a receptor for rabies virus through the CME pathway, but TfR1 is expressed most abundantly in liver tissue, not in the nervous system (Wang et al., 2023).

      Viral tropism is often determined not only by the presence of an entry receptor but also by co-receptors, cellular factors, and post-entry mechanisms. While MYL3 may act as a receptor for NNV, other factors, such as cell-specific proteases, signaling molecules, and intracellular trafficking pathways, likely contribute to NNV’s preferential replication in the brain and retina.

      Reference

      Wang Xinxin, Wen Z, Cao H, Luo J, Shuai L, Wang C, Ge J, Wang Xijun, Bu Z, Wang J. 2023. Transferrin Receptor Protein 1 Is an Entry Factor for Rabies Virus. J Virol 97. doi:10.1128/jvi.01612-22

      Reviewer #2 (Public review):

      Summary:

      The manuscript offers an important contribution to the field of virology, especially concerning NNV entry mechanisms. The major strength of the study lies in the identification of MmMYL3 as a functional receptor for RGNNV and its role in macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 signaling axis. This represents a significant advance in understanding NNV entry mechanisms beyond previously known receptors such as HSP90ab1 and HSC70. The data, supported by comprehensive in vitro and in vivo experiments, strongly justify the authors' claims about MYL3's role in NNV infection in marine medaka.

      Strengths:

      (1) The identification of MmMYL3 as a functional receptor for RGNNV is a significant contribution to the field. The study fills a crucial gap in understanding the molecular mechanisms governing NNV entry into host cells.

      (2) The work highlights the involvement of IGF1R in macropinocytosis-mediated NNV entry and downstream Rac1/Cdc42 activation, thus providing a thorough mechanistic understanding of NNV internalization process. This could pave the way for further exploration of antiviral targets.

      Thanks for your review.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents a detailed study on the role of MmMYL3 in the viral entry of NNV, focusing on its function as a receptor that mediates viral internalization through the macropinocytosis pathway. The use of both in vitro assays (e.g., Co-IP, SPR, and GST pull-down) and in vivo experiments (such as infection assays in marine medaka) adds robustness to the evidence for MmMYL3 as a novel receptor for RGNNV. The findings have important implications for understanding NNV infection mechanisms, which could pave the way for new antiviral strategies in aquaculture.

      Strengths:

      The authors show that MmMYL3 directly binds the viral capsid protein, facilitates NNV entry via the IGF1R-Rac1/Cdc42 pathway, and can render otherwise resistant cells susceptible to infection. This multifaceted approach effectively demonstrates the central role of MmMYL3 in NNV entry.

      Thanks for your review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 94: SPR analysis? The full name should be provided when it first appears.

      We have defined SPR when it first appears at line 97 in the revised manuscript.

      (2) Moreover, is it too many for a manuscript to have a total of nine figures in the main text? Some of them might be moved to the supplementary file.

      We have merged the previous Fig 4 and Fig 5 and combined Fig 8 and Fig 9, reducing the number of figures to seven. For the specific details of the figure adjustments, please refer to the corresponding figure legends.

      Reviewer #2 (Recommendations for the authors):

      (1) Expand on the potential therapeutic implications of targeting MYL3 or the IGF1R pathway in aquaculture settings. Including a discussion of how inhibitors could be developed or tested in future research would give practical context to the findings.

      Thanks for your valuable suggestion to expand on the therapeutic implications of targeting MYL3 and the IGF1R pathway in aquaculture. In response, we have discussed potential strategies for developing inhibitors, such as small molecules, peptides, or monoclonal antibodies targeting MYL3 to block its interaction with the viral capsid, and IGF1R inhibitors to prevent macropinocytosis-mediated viral entry. We propose using virtual screening platforms to identify these inhibitors, followed by in vivo testing in aquaculture models. Additionally, combining MYL3 and IGF1R inhibitors could provide a synergistic approach to enhance antiviral efficacy. The relevant discussions have been supplemented at lines 358 to 368 in the revised manuscript.

      (2) It is recommended to include the data regarding the lack of interaction between the CMNV CP and MmMYL3 as a supplementary figure.

      We have included supplementary data demonstrating that CMNV CP does not interact with MmMYL3, highlighting the specificity of MYL3 for RGNNV. For detailed information, please refer to Fig. S4.

      Reviewer #3 (Recommendations for the authors):

      Consider discussing the broader implications of these findings, particularly whether MYL3 might serve as a receptor for other viruses.

      We appreciate this suggestion. Viral receptor recognition is typically highly specific, and the binding interactions between viral proteins and host receptors often depend on the structural compatibility between the viral capsid or envelope and the host receptor. Our study demonstrates that MYL3 serves as a receptor for NNV based on its direct interaction with the NNV capsid protein (CP). However, when we tested whether MYL3 interacts with CMNV (Covert Mortality Nodavirus), which is phylogenetically close to NNV, we found that CMNV CP does not bind to MYL3. Since MYL3 does not interact with CMNV, a virus closely related to NNV, it is even less likely to function as a receptor for viruses that are more distantly related to NNV. The relevant discussions have been supplemented at lines 306 to 310 in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      Although there is a sizeable amount of data reported in the manuscript, there seems to be a chronic issue of lack of details of how some experiments were performed. This is particularly true in the figure legends, which for the most part lack enough details to allow comprehension without constant return to the text. Additionally, 2 figures are missing. Figure 6 is a repetition of Figure 5, and Figure S4 is an identical replicate of Figure S3.

      We gratefully appreciate your professional comments. Additional details on how the related experiments were performed have been added to the Materials and methods section and the figure legends (e.g., see Line 478-487, Line 996-1001, Line 1010-1012, Line 1019-1020, Line 1031-1033, Line 1041-1042, Line 1051-1053, Line 1082-1083, Line 1087-1088, Line 1093-1094, Line 1105-1107, Line 1113-1114). Furthermore, we sincerely apologize for the mistakes and the inconvenience they caused during your review, and we have added the correct Figure 6 (see Line 1043-1053) and Figure S4 (see Line 1084-1088). We will carefully and thoroughly check the whole submitted manuscript along with the supplementary information to avoid such mistakes in the future.

      Reviewer #2 (Public review):

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays was carried out to test the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      I have two main questions about this work.

      (1) The authors provided the below information about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What are the criteria to choose these samples? Where did these samples originate from? How many strains of bacteria were obtained from which types of samples?

      We apologize for the ambiguous and limited information. More details have been added to the Materials and methods section (see Line 480-496). We gratefully appreciate your professional comments.

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      We apologize for the ambiguous and limited information. We have carefully revised this section, and more details have been added to the Materials and methods section (see Line 480-496). We gratefully appreciate your professional comments.

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consists of 63 Streptococcus strains, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus strains, and 69 Enterococcus strains.

      We apologize for the ambiguous information. We have carefully revised this section and added more details (see Line 129-132). We gratefully appreciate your professional comments.

      (2) As a probiotic, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products, and Lacticaseibacillus rhamnosus is the main bacteria in these products. There are also ATCC type strains such as 53103.

      I am sure the authors are also interested to know whether P118 is better as a probiotic candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus which are currently commercially used, or from the ATCC. Then, the authors can compare the efficacy and antibacterial mechanisms of their P118 with other strains. This would open the windows for future work.

      We gratefully appreciate your professional comments and valuable suggestions. We fully agree that including well-known, recognized, or commercial probiotics as positive controls would allow a more comprehensive evaluation of the isolated P118 strain as a probiotic candidate, particularly in comparison with other well-established probiotics, and would also help assess whether the mechanisms described for P118 apply to other L. rhamnosus strains or to lactic acid bacteria in general. These issues will be fully taken into consideration and addressed in future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 - The sentence "with great probiotic properties" suggests that this strain was already known to have probiotic properties. Is that the case?

      We gratefully appreciate your professional comments. The phrase "with great probiotic properties" in this context was intended as a summary of our findings, emphasizing that L. rhamnosus P118 exerts great probiotic properties as evaluated by traditional and C. elegans-infection screening strategies. We have revised this sentence (see Line 27-30).

      (2) Line 30 - What exactly do authors mean by "traditional"? They should add a bit more information here as to what these methods would be.

      We gratefully appreciate your professional comments. By "traditional" methods, we refer to time-consuming and labor-intensive strategies for screening probiotic candidates, which include bacterial isolation, culturing, phenotypic characterization, randomized controlled trials, and various in vitro and in vivo tests to assess probiotic properties (Sun et al., 2022). We have indicated this strategy in Line 91-94.

      Reference:

      Sun Y, Li HC, Zheng L, Li JZ, Hong Y, Liang PF, Kwok LY, Zuo YC, Zhang WY, Zhang HP. iProbiotics: A machine learning platform for rapid identification of probiotic properties from whole-genome primary sequences. Briefings in Bioinformatics 2022;23.

      (3) Line 37 - I believe "harmful microbes" is not the correct term here. I suggest authors use "potentially harmful".

      Done as requested (see Line 36, 209, 212, 217, 381). We gratefully appreciate your valuable suggestions.

      (4) Line 75 - What exactly do authors mean by "irregular dietary consumption"?

      "Irregular dietary consumption" means "irregular dietary habits", "eating irregularly", or "abnormal eating behaviors". We have changed it to "irregular dietary habits" (see Line 76). We gratefully appreciate your professional comments.

      (5) Line 85 - What exactly do authors mean by "without residues in raw food products"?

      Here, "without residues in raw food products" means that probiotics barely remain in food animal products (e.g., meat, eggs, dairy) after livestock and poultry have been fed probiotics in their feed. We gratefully appreciate your professional comments.

      (6) Line 86 - Please, give a specific example of yeast.

      Done as requested (see Line 85-86), “yeast (e.g., Saccharomyces boulardii, S. cerevisiae)”. We gratefully appreciate your valuable suggestions.

      (7) Line 112 - Lactobacillus reuteri should be written out, since this is the first time the species name appears in the main text.

      Done as requested (see Line 112). We gratefully appreciate your valuable suggestions.

      (8) Lines 115-118 - Please, rewrite for clarity.

      Done as requested (see Line 115-118). We gratefully appreciate your valuable suggestions.

      (9) Line 118 - Lacticaseibacillus rhamnosus should be written out, since this is the first time the species name appears in the main text.

      Done as requested (see Line 118). We gratefully appreciate your valuable suggestions.

      (10) Line 119 - Throughout the text authors make it seem like strain P118 was previously known. Is that the case? If yes, how was it isolated again? This should be briefly mentioned in the introduction.

      We apologize for the misunderstanding caused by this statement. The P118 strain was isolated and its probiotic properties were evaluated by our lab; it was not previously known. We have revised this sentence (see Line 118-120). We gratefully appreciate your professional comments.

      (11) Line 131 - How were strains identified?

      Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) was employed to identify the bacterial species (He et al., 2022). This information is indicated in the Materials and methods section (see Line 485-489). We gratefully appreciate your professional comments.

      Reference

      He D, Zeng W, Wang Y, Xing Y, Xiong K, Su N, Zhang C, Lu Y, Xing X. Isolation and characterization of novel peptides from fermented products of Lactobacillus for ulcerative colitis prevention and treatment. Food Science and Human Wellness 2022;11:1464-74.

      (12) Figure 1 - Legend needs a lot more info. Where are legends to panels PQ? Also, some of the text is too small to read.

      We apologize for the limited information. We have revised the Figure 1 legend and added more details (see Line 1000-1019), and we also provide a vector graphic of Figure 1. We gratefully appreciate your professional comments.

      (13) Line 136 - All strains were screened and 27 strains were positive, right?

      Yes, all strains were screened and 27 strains were positive. We gratefully appreciate your professional comments.

      (14) Figure 2 - What do authors mean by "spleen index" and "liver index"? This should be described in more detail. Also, p values for 'a', 'b', 'ab' should be given.

      The organ indices (spleen index, liver index) were calculated according to the formula: organ index = organ weight (g) / body weight (g) × 1000, as indicated in the Materials and methods section (see Line 587-588). “Different lowercase letters ('a', 'b') indicate a significant difference (P < 0.05)” has been added in Line 1020-1029. We gratefully appreciate your professional comments.
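
      The organ index formula quoted in this reply can be illustrated with a minimal Python sketch. The function name and the example weights below are hypothetical and are not taken from the manuscript; only the formula itself comes from the response.

```python
def organ_index(organ_weight_g: float, body_weight_g: float) -> float:
    """Organ index = organ weight (g) / body weight (g) * 1000."""
    if body_weight_g <= 0:
        # Guard added for illustration; the source does not discuss invalid input.
        raise ValueError("body weight must be positive")
    return organ_weight_g / body_weight_g * 1000


# Hypothetical animal: 0.1 g spleen and 20 g body weight (illustrative values only).
spleen_index = organ_index(0.1, 20.0)
print(f"spleen index: {spleen_index:.2f}")
```

      Under these assumed weights the index comes out at about 5, i.e. the organ accounts for roughly 5 mg per gram of body weight.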

      (15) Line 212-214 - Again, I suggest authors use "potentially harmful" and "potentially beneficial".

      Done as requested (see Line 36, 210, 213, 218, 383). We gratefully appreciate your valuable suggestions.

      (16) Figure 3 - Which groups were tested in panels CD? Is this based on color? Legends should be restated in panels or clearly marked in the legend.

      Sorry for this mistake, we have revised and added group info in Figure 3C-D (see Line 1013-1020). We gratefully appreciate your professional comments.

      (17) Figure 4 - Lacks details.

      Sorry for the mistakes, we have revised and added group info in Figure 4D-E and legend (see Line 1031-1037). We gratefully appreciate your professional comments.

      (18) Figure 6 - This is a repetition of Figure 5.

      Sorry for the mistakes, we have added the correct Figure 6 (see Line 1060-1070). We gratefully appreciate your professional comments.

      (19) Lines 329-330 - C. elegans does not "mimic" animal intestinal physiology.

      Sorry for the mistakes, we have revised this statement (see Line 139-142, 324-325). We gratefully appreciate your professional comments.

      (20) Lines 358 and 418 - What do authors mean by "metabolic dysfunction" and "metabolic disorder"? I assume they mean changes in fecal metabolites. However, these are terms that may have different interpretations in the field of human metabolism. Therefore, I would suggest that the authors specify that they mean changes in fecal metabolite profiles when using these terms.

      Sorry for the mistakes caused by this statement, we have revised this statement in the revised version (see Line 34-35, 122, 353-354, 413). We gratefully appreciate your professional comments.

      (21) Line 475 - What do authors mean by "superficial effects"?

      We apologize for the mistake. We have changed it to “beneficial/protective effects” (see Line 469, Line 1074). We gratefully appreciate your professional comments.

      (22) Line 486 - Were all yogurts artisanal? Where were piglets from? How were samples collected? Feces, rectal swabs? Does the ethics statement at the end of the manuscript also cover work with piglets?

      Yes, all yogurts were artisanal. The 6 rectal content samples were collected from healthy piglets, free of pathogen infection and diarrhea, at a pig farm in Zhejiang province. Yes, the ethics statement at the end of the manuscript also covers the work with piglets.

      (23) Line 490 - Which MALDI platform was used? The database used can have important implications for strain identification. What was the confidence of ID? This should be included.

      Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS, Bruker Daltonik GmbH, Bremen, Germany) was employed to identify the bacterial species with a confidence level > 90%. This information is indicated in the Materials and methods section (see Line 487-489). We gratefully appreciate your professional comments.

      (24) Line 501 - Is this a widely used method to characterize probiotics? Please, add a reference.

      Done as requested (see Line 498). Many probiotics and other microbes can produce milk-clotting enzymes. Milk-clotting activity is an important measurement in the dairy industry, especially in cheese making (Zhang et al., 2023; Arbita et al., 2024; Shieh et al., 2009), and its analysis is commonly used to evaluate the potential of candidate probiotic isolates to clot milk for cheese production.

      Reference:

      Zhang Y, Wang J, He J, Liu X, Sun J, Song X, Wu Y. Characteristics and application in cheese making of newly isolated milk-clotting enzyme from Bacillus megaterium LY114. Food Res Int 2023;172:113202.

      Arbita AA, Zhao J. Milk clotting enzymes from marine resources and their role in cheese-making: A mini review. Crit Rev Food Sci Nutr. 2024;64(27):10036-10047.

      Shieh CJ, Phan Thi LA, Shih IL. Milk-clotting enzymes produced by culture of Bacillus subtilis natto. Biochemical Engineering Journal. 2009;43(1):85-91.

      (25) Line 713 - How were fecal metabolites extracted?

      We apologize for the missing information. Details of the fecal metabolite extraction have been added to the Materials and methods section (see Line 705-706). We gratefully appreciate your professional comments.

      (26) Figure 7 - Please correct "macrophages".

      Done as requested (see Figure 7, Line 1072). We gratefully appreciate your valuable suggestions.

      (27) Table 1 - Should read "number of strains", not size.

      Done as requested (see Line 1084). We gratefully appreciate your valuable suggestions.

      (28) Figure S1B - Is this data for P118?

      Sorry for the mistakes, we have revised Figure S1 legend (see Line 1086-1088). We gratefully appreciate your professional comments.

      (29) Figure S3 - Legends C, S, PS, P are not specified.

      Sorry for the missed information, we have revised and added group info in Figure S3 legend (see Line 1095-1101). We gratefully appreciate your professional comments.

      (30) Figure S3B - What is the "clinical symptom score"? How was this determined?

      We apologize for the missing information. The detailed information has been added to the Materials and methods section (see Line 659-661, Table S7). We gratefully appreciate your professional comments.

      (31) Figure S4 - This is an identical copy of Figure S3.

      Sorry for the mistakes, we have added the correct Figure S4 (see Line 1103-1106). We gratefully appreciate your professional comments.

      (32) Figure S5 - Legend lacks details.

      Sorry for the missed information, we have revised and added group info in Figure S5 legend (see Line 1107-1112). We gratefully appreciate your professional comments.

      (33) Figure S8 - What is "GM"? Since it inhibits growth to a greater extent than the highest metabolite concentration used, I imagine it must be an antibiotic (gentamycin?) as a positive control. This needs to be clearly stated.

      We apologize for the missing information. GM denotes 100 μg/mL gentamicin (see Line 1134). We gratefully appreciate your professional comments.

      (34) Figure S9 - Labels for panels are missing.

      We apologize for the missing information. The labels have been added (see Line 1135-1139). We gratefully appreciate your professional comments.

      Reviewer #2 (Recommendations for the authors):

      (1) This reviewer appreciates the efforts of the authors to provide the details related to this work. In the meantime, the manuscript shall be written in a way that is easy for the readers to follow.

      We have done our best to revise and improve the whole manuscript to make it easier for readers to follow (e.g., see Line 27-30, Line 115-120, Line 129-132, Line 480-496). We gratefully appreciate your valuable suggestions.

      (2) For example, under the sections of Materials and Methods, there are 19 sub-titles. The authors could consider combining some sections, and/or citing other references for the standard procedures.

      We gratefully appreciate your professional comments and valuable suggestions. Some sections have been combined according to the reviewer’s suggestions (see Line 497-530, Line 637-671).

      (3) Another example: the figures have great resolution, but they are way too busy. Figures 1 and 2 have 14-18 panels. Figure 5 has 21 panels. Please consider separating into more figures, or condensing some panels.

      We fully agree that some of the submitted figures are too busy. However, it is not easy to move results into the supplementary information, because all of them are essential for fully supporting our hypothesis and conclusions. Nonetheless, some panels have been combined or condensed according to the reviewer’s suggestions (see Line 1000-1020, Line 1052-1071). We gratefully appreciate your professional comments and valuable suggestions.

      (4) Line 30: spell out "C." please.

      Done as requested (see Line 31). We gratefully appreciate your valuable suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the quality of the work is high. Although experimental data do support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, concerns remain that the central finding of paired pulse depression at very short intervals was more likely caused by Ca<sup>2+</sup> channel inactivation than pool depletion. Overall, this is a solid study with valuable findings, but the results warrant consideration of alternative interpretations.

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was the possible involvement of Ca<sup>2+</sup> channel inactivation in the strong paired pulse depression (PPD). To address it, we measured the total (free plus buffered) calcium increments induced by each of the first four APs in 40 Hz trains at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca<sup>2+</sup> increments did not differ from one another, arguing against a contribution of Ca<sup>2+</sup> channel inactivation to PPD. Please see our reply to the 2nd issue in the Weakness section of Reviewer #3.

      The second critical issue concerned the definition of ‘vesicular probability’. Previously, vesicular probability (p<sub>v</sub>) has been used with reference to the releasable vesicle pool, which includes not only tightly docked vesicles but also reluctant vesicles. In the present study, by contrast, p<sub>v</sub> means the release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weakness sections of Reviewer #2 and Reviewer #3.

      Below, we provide point-by-point replies to the Reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory. Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

      Reviewer #2 (Public review):

      Summary:

      Shin et al. aim to identify, in a very extensive piece of work, a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new, powerful model that is well suited to test whether the pool of SVs ready for fusion is dynamically scaled to adjust supply-demand aspects. The methods applied are state-of-the-art and address quantitative aspects with high signal to noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that loss of Synaptotagmin7 function and the role of the Ca dependent phospholipase activity seem critical for the short term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      (1) While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release probability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * p<sub>v</sub>, where n = RRP size and p<sub>v</sub> = vesicular release probability. The value for p<sub>v</sub> critically depends on the definition of RRP size. Recent studies revealed that docked vesicles exist in different priming states: a loosely or a tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that estimated by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRP<sub>hyper</sub>) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured p<sub>v</sub> based on AP-evoked EPSCs, using strong paired pulse depression (PPD) and the associated failure rates, p<sub>v</sub> in the present study denotes the vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles in the baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (p<sub>occ</sub>) by TS vesicles is subject to dynamic regulation by forward and backward rate constants (denoted by k<sub>1</sub> and b<sub>1</sub>, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and p<sub>occ</sub>, of which N is a fixed parameter whereas p<sub>occ</sub> is given by k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca<sup>2+</sup> (Hosoi, et al., 2008), p<sub>occ</sub> is not a fixed parameter. Therefore, release probability should be re-defined as p<sub>occ</sub> * p<sub>v</sub>. Given that N is fixed, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in p<sub>occ</sub> rather than p<sub>v</sub>, because p<sub>v</sub> is close to unity (Fig. S8). Moreover, strong PPD was observed not only in the baseline but also early in and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in p<sub>v</sub> of reluctant vesicles.
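
      The factorization described in this reply can be illustrated numerically. The snippet below is only a minimal sketch, not the analysis code used in the study; the function names and the example values (N = 5 sites, k1 = 3/s, b1 = 7/s) are hypothetical and chosen solely to show that, when p_v is close to 1, release probability and quantal content track p_occ.

```python
def occupancy(k1: float, b1: float) -> float:
    """Steady-state occupancy of release sites by TS vesicles, k1 / (k1 + b1)."""
    return k1 / (k1 + b1)


def release_probability(p_occ: float, p_v: float) -> float:
    """Release probability per site, re-defined as p_occ * p_v."""
    return p_occ * p_v


def quantal_content(n_sites: int, p_occ: float, p_v: float) -> float:
    """Mean number of quanta released per AP, m = N * p_occ * p_v."""
    return n_sites * release_probability(p_occ, p_v)


if __name__ == "__main__":
    n_sites = 5          # hypothetical number of release sites (N)
    p_v = 1.0            # fusion probability of TS vesicles, assumed near unity
    k1, b1 = 3.0, 7.0    # hypothetical forward/backward rate constants (1/s)

    p_occ = occupancy(k1, b1)
    print("baseline p_occ:", p_occ)
    print("baseline quantal content:", quantal_content(n_sites, p_occ, p_v))
```

      With p_v fixed near 1 and N constant, the only quantity left to raise the product is p_occ, which is the logic of the argument above.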

      We imagine that the Reviewer meant vesicular release or fusion probability (p<sub>v</sub>) by ‘release probability’. If so, p<sub>v</sub> (of TS vesicles) cannot be a major player in STF, because the baseline p<sub>v</sub> is already higher than 0.8 even under the most parsimonious estimate (Fig. 2). Moreover, considering the very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that p<sub>v</sub> is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca<sup>2+</sup>-dependent step increase in p<sub>v</sub> of RRP (Dittman et al., 2000) or reluctant vesicles (Turecek et al., 2016). Given that p<sub>v</sub> of TS vesicles is close to one, an increase in p<sub>v</sub> of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in the fusion probability of LS vesicles (denoted as p<sub>v,LS</sub>) should be considered in two ways, depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that p<sub>v,LS</sub> is near zero at the resting state. Although LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in p<sub>v,LS</sub> of vesicles that reside in a distinct pool. Because the increase in p<sub>v,LS</sub> during facilitation recruits new release sites (increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with the results of our V-M analysis (Fig. 3), which show that EPSCs during STF fell on the same parabola regardless of stimulation frequency. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in p<sub>v,LS</sub> that occurs in parallel with STF. Strong PPD, indicative of high p<sub>v</sub>, was consistently found not only in the baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in p<sub>v,LS</sub>. One may argue that STF may be mediated by a drastic step increase of p<sub>v,LS</sub> from zero to one, but this is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we incorporated these perspectives into the Discussion and further clarified the reasoning behind our conclusions.

      References

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Please note that papers cited in the manuscript are not repeated here.

      (2) Fig 3 I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Related, to measure the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial as the experimental paradigm bias the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition, release probability can be factored into p<sub>v</sub> and p<sub>occ</sub>, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= p<sub>occ</sub> x p<sub>v</sub>). Our novel finding is that the increase in release probability should be attributed to an increase in p<sub>occ</sub>, not to an increase in p<sub>v</sub>.
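
      The reasoning about the Variance-Mean parabola can be sketched as follows. This is an illustrative calculation under a simple binomial model, which assumes a uniform quantal size q, no quantal variability, and a homogeneous release probability across sites; these simplifications and all numerical values are assumptions for illustration, not the parameters fitted in the study.

```python
import math


def binomial_mean_variance(n_sites: int, q: float, p_r: float) -> tuple[float, float]:
    """Mean and variance of the EPSC amplitude under a simple binomial model."""
    mean = n_sites * q * p_r
    variance = n_sites * q ** 2 * p_r * (1.0 - p_r)
    return mean, variance


def parabola_variance(mean: float, n_sites: int, q: float) -> float:
    """Variance predicted by the V-M parabola: V = q*M - M**2 / N."""
    return q * mean - mean ** 2 / n_sites


if __name__ == "__main__":
    n_sites, q = 5, 10.0  # hypothetical N and quantal size (pA)

    # Raising p_r (= p_occ * p_v) with N fixed moves points along one parabola.
    for p_r in (0.3, 0.5, 0.7):
        m, v = binomial_mean_variance(n_sites, q, p_r)
        print(p_r, m, v, math.isclose(v, parabola_variance(m, n_sites, q)))

    # Reaching the same mean by recruiting sites (larger N) leaves that parabola.
    m2, v2 = binomial_mean_variance(10, q, 0.15)
    print(m2, v2, parabola_variance(m2, n_sites, q))
```

      In this sketch, a change in release probability alone keeps every point on the parabola defined by N and q, whereas an increase in N at the same mean gives a variance above it. The analysis cannot by itself separate p_occ from p_v, which is why the paired-pulse and failure-rate data are needed for that distinction.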

      (3) Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience with calyx of Held synapses, OAG, a DAG analogue, increased the size of the fast releasing vesicle pool (FRP) (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, the maximal STF (ratio of facilitated to baseline EPSCs) is expected to become lower, as long as the number of release sites (N) is limited. As mentioned above, the baseline p<sub>v</sub> is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline p<sub>occ</sub> seems to be increased by OAG.

      Reference

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      (4) The literature on Syt7 function is still quite controversial. An observation in the literature is that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling can unlikely explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial p<sub>v</sub>, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial p<sub>v</sub>. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial p<sub>v</sub> has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial p<sub>v</sub>, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial p<sub>v</sub> already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial p<sub>v</sub>. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on p<sub>v</sub>.

      References

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      (1) The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in the interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that the different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy, and we added an explanation of this difference to the Discussion. In summary, p<sub>v</sub> in the present study denotes the vesicular release probability of TS vesicles, whereas the previous studies used it for the vesicular release probability of vesicles in the RRP, which includes both LS and TS vesicles. Accordingly, p<sub>occ</sub> in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at p<sub>v</sub> close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high p<sub>v</sub>, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that p<sub>v</sub> is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in p<sub>occ</sub> (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high p<sub>v</sub> too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support possibilities for recruitment of new release sites (increase in N) having low p<sub>v</sub> or for a gradual increase in p<sub>v</sub> of reluctant vesicles during short-term facilitation.  

      The following statement was added to the Discussion in the revised manuscript:

      “Previous studies suggested that an increase in p<sub>v</sub> is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked 20 Hz/60 pulses train (denoted as ‘RRP<sub>hyper</sub>’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high p<sub>v</sub> based on the observation of strong PPD and failure rates upon paired stimulations at ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, p<sub>v</sub> in the present study indicates the fusion probability of TS vesicles. From the same reasons, p<sub>occ</sub> denotes the occupancy of release sites by TS vesicles. Note that our study does not provide direct clue whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles are kept constant, an increase in p<sub>occ</sub> (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling (1999); Garcia-Perez and Wesseling (2008)) as long as it was measured based on RRP<sub>hyper</sub>.”

      (2) Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca<sup>2+</sup> channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca<sup>2+</sup> channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca<sup>2+</sup> channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca<sup>2+</sup>-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca<sup>2+</sup>-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca<sup>2+</sup> to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca<sup>2+</sup> (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca<sup>2+</sup> channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca<sup>2+</sup> channel inactivation. This interpretation seems to be based on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k<sub>1</sub> and b<sub>1</sub>, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be solely attributed to Ca<sup>2+</sup> channel inactivation. Nevertheless, we acknowledge the possibility that Ca<sup>2+</sup> channel inactivation may contribute to PPD, and we have therefore investigated it further. Specifically, we measured action potential (AP)-evoked Ca<sup>2+</sup> transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using two-dye ratiometry techniques. Our analysis revealed no evidence for Ca<sup>2+</sup> channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca<sup>2+</sup> channel inactivation is unlikely to contribute to the pronounced PPD.

      Figure 2—figure supplement 2 shows how we measured the total Ca<sup>2+</sup> increments at axonal boutons. First, we estimated the endogenous Ca<sup>2+</sup>-binding ratio from analyses of single AP-induced Ca<sup>2+</sup> transients at different concentrations of Ca<sup>2+</sup> indicator dye (panels A to E). Then, using the Ca<sup>2+</sup> buffer properties, we converted free [Ca<sup>2+</sup>] amplitudes to total calcium increments for the first four AP-evoked Ca<sup>2+</sup> transients in a 40 Hz train (panels G-I). We incorporated these results into the revised version of our manuscript to provide evidence against Ca<sup>2+</sup> channel inactivation.

      (3) On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca<sup>2+</sup>-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in p<sub>fusion</sub> of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (p<sub>r</sub>) within a single active zone rather than a heterogeneity in p<sub>fusion</sub> of individual docked vesicles. Therefore both p<sub>occ</sub> and p<sub>v</sub> of TS vesicles would contribute to the p<sub>r</sub> distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, first, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). Second, strong PPD was observed not only in the baseline but also during the early and late phases of facilitation, indicating that vesicles with very high p<sub>v</sub> contribute to EPSCs throughout train stimulations (Fig. 2, 3, and 7). These findings argue against the recruitment of new release sites harboring low p<sub>v</sub> vesicles and against a gradual increase in the fusion probability of reluctant vesicles.

      To address the reviewer’s concern, we incorporated these perspectives into the Discussion and further clarified the reasoning behind our conclusions.

      (4) In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca<sup>2+</sup> below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the Reviewer suggested, low external Ca<sup>2+</sup> concentration can lower release probability (p<sub>r</sub>). Given that both p<sub>v</sub> and p<sub>occ</sub> are regulated by [Ca<sup>2+</sup>]<sub>i</sub>, low external [Ca<sup>2+</sup>] may affect not only p<sub>v</sub> but also p<sub>occ</sub>, both of which would contribute to low p<sub>r</sub>. Under such conditions, it is plausible that the baseline p<sub>r</sub> becomes much lower than 0.1 because both p<sub>v</sub> and p<sub>occ</sub> are low (for instance, if p<sub>v</sub> decreases from 1 to 0.5 and p<sub>occ</sub> from 0.3 to 0.1, then p<sub>r</sub> = 0.05). In that case p<sub>r</sub> (= p<sub>v</sub> x p<sub>occ</sub>) has room to increase by a factor of ten (to 0.5, for example) through short-term facilitation as cytosolic [Ca<sup>2+</sup>] accumulates during a train.

      If p<sub>v</sub> is close to one, p<sub>r</sub> depends on p<sub>occ</sub>, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP of a train. Post-train recovery from facilitation would therefore depend on restoration of the equilibrium between TS and LS vesicles to the baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS), which comprises a slow 1st step (-> LS) and a fast 2nd step (-> TS). Under the framework of the two-step model, the slow 1st step (de novo priming step) is the rate-limiting step regulating the development and recovery kinetics of facilitation. Given that the on and off rates for Ca<sup>2+</sup> binding to Syt7 are slow, it is plausible that Syt7 contributes to short-term facilitation (STF) by Ca<sup>2+</sup>-dependent acceleration of the 1st step (as shown in Fig. 9). During train stimulation, LS vesicles would slowly accumulate in a Syt7- and Ca<sup>2+</sup>-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by the slow recovery of LS vesicles.

      Reference

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Please note that papers cited in the manuscript are not repeated here.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      Slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Nor does higher baseline occupancy always imply faster recovery of PPR. In fact, PPR recovery was slower in Syt7 KD synapses than in WT synapses (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and PPR recovery rate are calculated as k<sub>1</sub> / (k<sub>1</sub> + b<sub>1</sub>) and (k<sub>1</sub> + b<sub>1</sub>), respectively. The baseline occupancy depends on the ratio k<sub>1</sub>/b<sub>1</sub>, while the PPR recovery rate depends on the absolute values of k<sub>1</sub> and b<sub>1</sub>. Based on p<sub>occ</sub> and the PPR recovery time constants of WT and KD synapses, we expect a higher k<sub>1</sub>/b<sub>1</sub> but a lower (k<sub>1</sub> + b<sub>1</sub>) in Syt7 KD synapses compared to WT ones.
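
      The two relations of the simple refilling model can be inverted to recover k<sub>1</sub> and b<sub>1</sub> from an occupancy and a recovery rate. The following sketch does this under stated assumptions: the recovery rates (23/s for WT, 18.5/s for KD) are taken from the reply, whereas the occupancy values are hypothetical placeholders chosen only to show that a higher k<sub>1</sub>/b<sub>1</sub> can coexist with a lower k<sub>1</sub> + b<sub>1</sub>.

```python
def rates_from_observables(p_occ, recovery_rate):
    """Invert p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1."""
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


# Recovery rates (per second) from the reply; occupancies are hypothetical.
wt_k1, wt_b1 = rates_from_observables(p_occ=0.3, recovery_rate=23.0)
kd_k1, kd_b1 = rates_from_observables(p_occ=0.5, recovery_rate=18.5)

print(f"WT: k1/b1 = {wt_k1 / wt_b1:.2f}, k1 + b1 = {wt_k1 + wt_b1:.1f}")
print(f"KD: k1/b1 = {kd_k1 / kd_b1:.2f}, k1 + b1 = {kd_k1 + kd_b1:.1f}")
```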

      The lower number of release sites (N) in Syt7 KD synapses was not anticipated. As you suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. But our results do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al., 2020). This was not the case in our results (Fig. 3). In the first version of the manuscript, we argued against this possibility in lines 203-208.
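
      The expected upward deviation can be illustrated with the standard binomial variance-mean relation, variance = q × mean − mean<sup>2</sup>/N. The sketch below assumes uniform release probability and quantal size across sites and ignores quantal variability; the values of N, p, and q are hypothetical and serve only to show that recruiting extra sites places a point above the parabola drawn for the original, fixed N.

```python
def parabola_variance(mean, q, n_sites):
    """Variance predicted by the binomial model for a fixed number of sites."""
    return q * mean - mean ** 2 / n_sites


def binomial_mean_variance(n_sites, p, q):
    """Mean and variance of the response for n_sites independent sites."""
    mean = n_sites * p * q
    variance = n_sites * p * (1.0 - p) * q ** 2
    return mean, variance


q = 10.0        # hypothetical quantal size (pA)
n_fixed = 5     # hypothetical number of sites at baseline

# With a fixed N, any (mean, variance) point lies on the parabola.
m0, v0 = binomial_mean_variance(n_fixed, p=0.3, q=q)

# If silent sites are recruited during a train (N grows from 5 to 8),
# the point lies above the parabola drawn for the fixed N.
m1, v1 = binomial_mean_variance(8, p=0.6, q=q)

print(f"fixed N: variance {v0:.1f} vs parabola {parabola_variance(m0, q, n_fixed):.1f}")
print(f"recruited: variance {v1:.1f} vs parabola {parabola_variance(m1, q, n_fixed):.1f}")
```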

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to the specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      We used 4-AP in the presence of TTX to increase the length constant of axon fibers and to facilitate the conduction of local depolarization from the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is distant enough from axon terminals that the local depolarization induced by optical stimulation does not evoke synaptic transmission. This methodology has been employed in previous studies, including the work of Little and Carter (2013).

      Reference

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate what aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline p<sub>occ</sub> and short-term facilitation (STF) are enhanced by the baseline DAG level and train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore informed us that slow site clearing is responsible for the late depression of 40 Hz train EPSCs. The knock-down experiments also provided us with information on the critical role of Syt7 in the replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments:

      (1) While the authors claim that Syt7-mediated facilitation is connected to the behavioral deficits they observed, this link is still somewhat speculative. This manuscript could benefit from further discussions of other alternative mechanisms to consider.

      We added the following statement to the Discussion of the revised manuscript:

      “The acquisition of trace fear memory was impaired by inhibition of persistent activity in mPFC during trace period (Gilmartin et al., 2013). The similar deficit observed in Syt7 KD animals is consistent with the hypothesis that STF provides bi-stable ensemble activity in a recurrent network (Mongillo et al., 2012). Nevertheless, alternative mechanisms may be responsible for the behavioral deficit. Not only recurrent network but also long-range loop between the mPFC and the mediodorsal (MD) thalamus play a critical role in maintaining persistent activity within the mPFC especially for a delay period longer than 10 s (Bolkan et al., 2017). Prefrontal L2/3 is heavily innervated by MD thalamus, and L2/3-PCs subsequently relay signals to L5 cortico-thalamic (CT) neurons (Collins et al., 2018). Given that L2/3 is an essential component of the PFC-thalamic loop, loss of STF at recurrent synapses between L2/3 PCs may lead to insufficient L2/3 inputs to L5 CT neurons and failure in the reverberant PFC-MD thalamic feedback loop. Therefore, not only L2/3 recurrent network but also its output to downstream network should be considered as a possible network mechanism underlying behavioral deficit caused by Syt7 KD L2/3.”

      (2) The authors mention that Syt7 contributes to persistent activity during working memory tasks but focus on using only a trace fear conditioning task. However, it would be interesting to see if their results are generalizable to other working memory tasks (i.e. a delayed alternation task).

      We thank the Reviewer for the insightful suggestion. Trace fear conditioning (tFC) shares behavioral properties with working memory (WM) tasks in that tFC is vulnerable to attentional distraction and to the load of a WM task. In general, WM tasks, including delayed alternation tasks such as a T-maze task, need persistent activity of ensemble neurons representing target-specific information among multiple choices. Unlike such WM tasks, tFC is not appropriate for examining target-specific ensemble activity. Because it is not trivial to perform in vivo recordings in KD animals during delayed alternation tasks, it will be appropriate to study the effect of Syt7 KD in a separate study.

      (3) The figure legend in Figure 6A and 6B mentions dotted lines and broken lines in the figure. However, this is confusing, and it is unclear as to what these lines are referring to in the figure.

      To avoid confusion in the figure legend for Figure 6A and 6B, we corrected “dotted line” to “vertical broken line”, and “broken lines” to “dashed parabolas”.

      (4) The manuscript can benefit from close reading and editing to catch typos and improve general readability (i.e. line 173: the word "are" is repeated twice).

      We corrected typographical errors throughout the manuscript and carefully read the manuscript to improve readability. A revised version reflecting these corrections has been prepared and will be resubmitted for your consideration.

      Reviewer #3 (Recommendations for the authors):

      The points in this section are all minor.

      (1) Line 44: Define release probability (p_r) more clearly. Authors use it to mean p<sub>v</sub>*p<sub>occ</sub>, but others routinely use it to mean p<sub>v</sub>*p<sub>occ</sub>*N.

      We understand that the Reviewer meant “others routinely use it to mean p<sub>v</sub>”. In this statement, we meant the conventional definition of release probability, which is the release probability among vesicles of the RRP. We think that it is not appropriate to re-define release probability as p<sub>v</sub> * p<sub>occ</sub> in this first paragraph of the Introduction. Therefore, we clarified this issue in the Discussion, as mentioned in our reply to the 1st weakness raised by Reviewer #3.

      (2) Line 82: For clarity, define better what recurrent excitatory synapses are. It seems that synapses between L2/3 PCs and local targets may all be recurrent?

      Each of L2/3 and L5 of the prefrontal cortical layers harbors intralaminar recurrent excitatory synapses between pyramidal cells, called a recurrent network. Previous theoretical studies have proposed that a single-layer recurrent network model can have bi-stable E/I balanced states (up- and down-states) if recurrent excitatory synapses display short-term facilitation (STF), and thus is able to temporarily hold information once external input shifts the network to the up-state. In this theory, synapses to local targets across layers are not considered, and the specific roles of L2/3 and L5 in working memory tasks are still elusive. For clarity, we added a statement at the beginning of the paragraph (line 82): “Each of layer 2/3 (L2/3) and layer 5 (L5) of neocortex displays intralaminar excitatory synapses between pyramidal cells comprising a recurrent network (Holmgren et al., 2003; Thomson and Lamy, 2007)”

      (3) Cite earlier studies of short-term synaptic plasticity at synapses between L2/3 pyramidal neurons and local targets in mPFC. If there are none, take more explicit credit for being first.

      As we mentioned in the Introduction, previous studies on short-term plasticity (STP) at neocortical excitatory recurrent synapses have focused on synapses between L5 pyramidal cells (PCs) (Hempel et al., 2000; Wang et al., 2006; Morishima et al., 2011; Yoon et al., 2020). The local connectivity between L2/3 PCs in the somatosensory cortex has been elucidated by Holmgren et al. (2003) and Ko et al. (2011). Although these studies showed STP of EPSPs, it was examined at a fixed frequency or stimulus pattern and at high external [Ca<sup>2+</sup>] (2 mM). There is a study on the frequency dependence of STP of EPSPs between L2/3 PCs (Feldmeyer et al., 2006). Unlike our study, Feldmeyer et al. (2006) observed monotonic STD at all frequencies below 50 Hz, but that study was done in the somatosensory cortex and at high external [Ca<sup>2+</sup>] (2 mM). To our knowledge, no previous study has investigated STP at recurrent excitatory synapses of L2/3 pyramidal cells of the mPFC, especially at physiological external [Ca<sup>2+</sup>]. The present study, therefore, represents the first extensive investigation of STP at recurrent excitatory synapses in L2/3 of the mPFC under physiologically relevant external [Ca<sup>2+</sup>].

      References

      Feldmeyer D, Lubke J, Silver RA, Sakmann B (2002) Synaptic connections between layer 4 spiny neurone-layer 2/3 pyramidal cell pairs in juvenile rat barrel cortex: physiology and anatomy of interlaminar signalling within a cortical column. J Physiol 538:803-822.

      Holmgren C, Harkany T, Svennenfors B, Zilberter Y (2003) Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. J Physiol 551:139-153.

      Ko H, Hofer SB, Pichler B, Buchanan KA, Sjöström PJ, Mrsic-Flogel TD (2011) Functional specificity of local synaptic connections in neocortical networks. Nature 473:87-91.

      Morishima M, Morita K, Kubota Y, Kawaguchi Y (2011) Highly differentiated projection-specific cortical subnetworks. Journal of Neuroscience 31:10380-10391.

      Wang Y, Markram H, Goodman PH, Berger TK, Ma J, Goldman-Rakic PS (2006) Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat Neurosci 9:534-542.

      (4) I couldn't figure out the significance of Figure S3. Perhaps this could be explained better.

      Optical minimal stimulation methods have not been previously documented in detail. This figure illustrates which parameters should be carefully examined in order to attain optical minimal stimulation, which ideally stimulates a single afferent fiber. Single-fiber stimulation by optical minimal stimulation is supported by the similarity of our estimate for the number of release sites (N) to the previous morphological estimate (Holler et al., 2021). For minimal stimulation, we used a collimated DMD-coupled LED to restrict 470 nm illumination to a small and well-defined region within layer 2/3 of the prelimbic mPFC, and carefully adjusted the illumination radius such that illumination one step smaller (by 1 μm) failed to evoke EPSCs. Our typical illumination area ranged between 3–4 μm, as shown in Figure S3A. Under this minimal illumination area, we confirmed unimodal distributions for the EPSC parameters (amplitude, rise time, decay time and time to peak; Figure 3B-E). Otherwise, we excluded the recordings from analysis. We hope this explanation provides a clearer understanding of the figure's significance.

      (5) Note that CTZ seems to alter p_r at some synapses.

      We acknowledge that CTZ can increase release probability by blocking presynaptic K<sup>+</sup> currents. Indeed, Ishikawa and Takahashi (2001) reported that CTZ slowed the repolarizing phase of presynaptic action potentials and increased the frequency of miniature EPSCs at calyx synapses. Consistently, we observed a slight increase in the baseline EPSC amplitude, from 33.3 pA to 41.9 pA (p=0.045), following the application of 50 µM CTZ. However, given that vesicular release probability (p<sub>v</sub>) is already close to 1 at the synapse of our interest, we believe that the observed effect is more likely attributable to an increase in release site occupancy (p<sub>occ</sub>), which would be reflected as the increase in miniature EPSC frequency in Ishikawa and Takahashi (2001). Given that PPR depends on p<sub>v</sub> rather than p<sub>occ</sub>, this increase in p<sub>occ</sub> would not critically change our conclusion that AMPA receptor desensitization is not responsible for the strong PPD.

      Reference

      Ishikawa, T., & Takahashi, T. (2001). Mechanisms underlying presynaptic facilitatory effect of cyclothiazide at the calyx of Held of juvenile rats. The Journal of Physiology, 533(2), 423-431.

      (6) Figure 8B. The result in Figure 8C seems important, but I couldn't figure out why behaviour was not altered during the acquisition phase summarized in Figure 8B. Perhaps this could be explained more clearly for non-experts.

      Little difference in freezing behavior during acquisition has also been observed when prelimbic persistent firing was optogenetically inhibited (Gilmartin et al., 2013). Not only the CS (tone) but also other sensory inputs (visual, olfactory, etc.) and the spatial context could serve as cues predicting the US (shock). Moreover, during the acquisition phase, the presence of the electric shock inherently induces a freezing response as a natural defensive behavior, which may obscure specific behavioral changes related to the associative learning process. Therefore, the freezing behavior during acquisition cannot be regarded as a sign of a specific association between CS and US. Instead, on the next day, we specifically evaluated the CS-US association of the conditioned animals by measuring freezing behavior in response to the CS in a distinct context. We explicitly documented the small difference between WT and KD animals during the acquisition phase in the relevant paragraph (line 397).

  2. Apr 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given the fact that later you describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along in the same order for all sections, so that it's very clear for comparative purposes what the new data are and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the cat shark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data.)

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1). Also, it was significantly higher than that of stage 32 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra. While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more data summarized in results sub-heading in the abstract as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies and I don't think your list is exhaustive. You need to expand this list and history which will help with your ultimate comparative analysis without you needed to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While it was a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications. No changes were made.

      You allude to the fossil record and that is great. That said historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above. Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question. But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at additional taxa than what you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7. Thanks for the de Beer REFs. While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data, and you jump directly into using an argument of those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about how other elasmobranchs are mineralizing their neural arches, etc. for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph is describing why you are doing a particular method and comparison because it shows me (the reader) where you're sampling from. Something else is that maybe, as part of the first figure, rather than having just the graph for each, have a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would relate back, then, to clarifying other figures as well.

      Done (also adding a phylogenetic tree).

      Second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspect of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results. We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format, that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7 so it's hard to describe any state specifically as ancestral or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place! We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species. For the Fig7 branching and catshark inclusion, please see above.

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends. We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends. That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively. For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”. We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of those both to the cat shark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put that into methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue. We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603).

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words.

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different Micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimera synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods (lines 486-488).

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters, all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they do not undermine our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward. Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session. Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate. Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, which actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characters of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291.

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades. In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision. We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled. However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding. We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development, that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group. In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis. In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In the times of Gadow & Abbott (1895), science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species) on the other hand, have bone but do not have SCPP genes (Lui et al. 206). Other calcium binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor, at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally. The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)! In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439). We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      No change; see above

      L53: down tune language, remove "severely" and "major"

      Done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      No change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      No change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      Changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      Thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      All regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also only one chimera is analysed, not several species.

      Sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      Added Arratia and Kölliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      References to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      Sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred. Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      Done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term 'non-tesseral trabecular mineralization'. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      No change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologists have not been "careful" enough?

      Apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      Sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      No action; see above

      L436: remove paragraph

      No action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      Yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study (lines 440-453).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript investigates the role of the membrane-deforming cytoskeletal regulator protein Abba in cortical development and its potential implications for microcephaly. It is a valuable contribution to the understanding of Abba's role in cortical development. The strengths and weaknesses identified in the manuscript are outlined below:

      Clinical Relevance:

      The authors identified a patient with microcephaly and intellectual disability harboring a mutation in Abba (R671W), adding a clinically relevant dimension to the study.

      Mechanistic Insights:

      The study offers valuable mechanistic insights into the development of microcephaly by elucidating the role of Abba in radial glial cell proliferation, radial fiber organization, and the migration of neuronal progenitors. The identification of Abba's involvement in the cleavage furrow during cell division, along with its interaction with Nedd9 and positive influence on RhoA activity, adds depth to our understanding of the molecular processes governing cortical development.

      In Vivo Validation:

      The overexpression of mutant Abba protein (R671W), which results in phenotypic similarities to Abba knockdown effects, supports the significance of Abba in cortical development.

      Weaknesses:

      The findings in the study suggest that heterozygous expression of the R671W variant may exert a dominant-negative effect on ABBA's role, disrupting normal brain development and leading to microcephaly and cognitive delay. However, evidence also points to a possible gain-of-function effect, as the mutation does not decrease RhoA activity or PH3 expression in vivo. Additionally, the impact of ABBA depletion on cell fate is not fully addressed. While abnormal progenitor accumulation in the ventricular and subventricular zones is observed, the transition of progenitors to neuroblasts and their ability to support neuroblast migration remains unclear. Impaired cleavage furrow ingression and disrupted Nedd9 and RhoA signaling could lead to structural abnormalities in radial glial progenitors, affecting their scaffold function and neuroblast progression. The manuscript lacks an exploration of the loss or decrease in interaction between Abba and NEDD9 in the case of the pathogenic patient-derived mutation in Abba. Furthermore, addressing the changes in localization and interaction of NEDD9 following over-expression of the mutant is important to further mechanistically characterize this interaction in future studies. These gaps suggest the need for further exploration of ABBA's role in progenitor cell fate and neuroblast migration to clarify its mechanistic contributions to cortical development.

      (1) Response to statement on dominant-negative vs. gain-of-function effect of R671W variant:

      We appreciate the reviewer’s thoughtful analysis of the potential mechanisms underlying the R671W variant. We agree that the heterozygous expression of the human R671W mutation may initially suggest a dominant-negative effect. However, our data indicate that this variant may instead exert a gain-of-function effect. As highlighted in the discussion section, overexpression of ABBA-R671W in cells that also express wild-type ABBA did not result in a dominant-negative decrease in RhoA activation nor affect PH3 expression in vivo. These findings suggest that the R671W mutation does not impair the canonical ABBA-mediated activation of RhoA, and instead, the resulting phenotype may involve post-mitotic processes, such as altered cell migration. This interpretation is further supported by previous clinical studies reporting additional patients with the same mutation and phenotypic outcomes.

      (2) Response to statement on ABBA depletion and progenitor-to-neuroblast transition:

      We agree that the question of how ABBA depletion affects cell fate and the progression of radial glial progenitors (RGPs) to neuroblasts is of significant importance. Our findings suggest that ABBA knockdown disrupts cleavage furrow ingression, which may block radial glial cells prior to abscission. This likely contributes to the observed accumulation of cells in the ventricular and subventricular zones, as seen in Figures 2A and 4D. Additionally, disrupted Nedd9 expression and impaired RhoA signaling appear to alter the structural integrity of RGPs, leading to detachment of apical and basal endfeet (Supplementary Figure 3). These structural abnormalities compromise the ability of RGPs to function as scaffolds for neuroblast migration. Although direct live imaging of neuroblast migration was beyond the scope of the current dataset, we believe our evidence strongly supports a model in which ABBA depletion disrupts progenitor structure and migration. Future studies will address these transitions more directly using live imaging and fate-mapping strategies.

      (3) Response to statement on loss of interaction between ABBA and NEDD9 with the R671W mutation:

      We fully agree with the importance of investigating whether the R671W mutation alters ABBA’s interaction with NEDD9. While our study provides evidence for a role of NEDD9 in mediating ABBA function, we acknowledge that we did not directly test whether the R671W mutation disrupts this interaction. We apologize if our manuscript conveyed the impression that this point had been fully addressed. Due to technical limitations, particularly the poor performance of anti-NEDD9 antibodies in slice immunohistochemistry, we were unable to reliably assess the interaction or localization changes in vivo. Nevertheless, this remains a priority for future studies aimed at better understanding the mechanistic underpinnings of the R671W mutation.

      (4) Response to statement on future directions for mechanistic characterization of NEDD9 localization and interaction:

      We agree with the reviewer that further investigation into NEDD9 localization and its interaction with the ABBA R671W mutant is essential to better define the molecular consequences of this mutation. Unfortunately, as mentioned above, the current tools available to us did not permit reliable immunohistochemical detection of NEDD9 in tissue. We fully intend to pursue alternative approaches, such as tagging strategies or the use of more sensitive detection platforms, to determine whether the R671W mutation affects the subcellular localization or stability of the ABBA-NEDD9 interaction. These experiments will be critical to elucidate the pathway through which ABBA regulates progenitor cell behavior and cortical development.

      Reviewer #2 (Public review):

      Summary:

      Carabalona and colleagues investigated the role of the membrane-deforming cytoskeletal regulator protein Abba (MTSS1L/MTSS2) in cortical development to better understand the mechanisms of abnormal neural stem cell mitosis. The authors used short hairpin RNA targeting Abba20 with a fluorescent reporter coupled with in utero electroporation of E14 mice to show changes to neural progenitors. They performed flow cytometry for in-depth cell cycle analysis of the impact of Abba-shRNA on neural progenitors and determined an accumulation in S phase. Using cultured rat glioma cells and live imaging of cortical organotypic slices from mice in utero electroporated with Abba-shRNA, the authors found Abba played a prominent role in cytokinesis. They then used a yeast-two-hybrid screen to identify three high confidence interactors: Beta-Trcp2, Nedd9, and Otx2. They used immunoprecipitation experiments from E18 cortical tissue coupled with C6 cells to show Abba requirement for Nedd9 localization to the cleavage furrow/cytokinetic bridge. The authors performed an shRNA knockdown of Nedd9 by in utero electroporation of E14 mice and observed similar results as with the Abba-shRNA. They tested a human variant of Abba using in utero electroporation of cDNA and found disorganized radial glial fibers and misplaced, multipolar neurons, but lacked the impact on cell division seen in the shRNA-Abba model.

      Strengths:

      Fundamental question in biology about the mechanics of neural stem cell division.

      Directly connecting effects in Abba protein to downstream regulation of RhoA via Nedd9.

      Incorporation of human mutation in ABBA gene.

      Use of novel technologies in neurodevelopment and imaging.

      Weaknesses:

      Unexplored components of the pathway (such as what neurogenic populations are impacted by Abba mutation) and unleveraged aspects of their data (such as the live imaging) limit the scope of their findings and left significant questions about the effect of ABBA on radial glia development.

      (1) Claim of disorganized radial glial fibers lacks quantifications.

      - On page 11, the authors claim that knockdown of Abba led to changes in radial glial morphology observed with vimentin staining. Here they claim misoriented apical processes, detached end feet, and decreased number of RGP cells in the VZ. However, they do not provide quantification of process orientation to better support their first claim. Measurements of radial glia fiber morphology (directionality, length) and of angle of division would be metrics that can be applied to data. Some of these analyses could be done in their time-lapse microscopy images, such as to quantify the number of cell divisions during their period of analysis (though that is short: 15 hours).

      Response to: Lack of quantification of disorganized radial glial fibers and cell divisions in time-lapse data

      We appreciate the reviewer’s insightful comment regarding the need for quantification of radial glial (RG) fiber morphology. In the revised manuscript, we have addressed this by providing new quantification of changes in vimentin staining, specifically measuring the dispersion of the signal as a proxy for fiber disorganization (see Supplementary Figure 1). These data support the observed morphological changes, including misoriented apical processes and detachment of endfeet, following Abba knockdown.

      Regarding time-lapse analysis to track cell divisions, we attempted to follow individual cells during the 15-hour imaging window. However, due to the relatively short duration and limited number of cells that could be reliably tracked, the dataset did not allow for statistically meaningful conclusions. As an alternative approach, we performed live-cell imaging using Anillin-GFP, a reliable marker of mitotic progression. The distribution and accumulation of Anillin-GFP were analyzed in ABBA-shRNA3 and control conditions, and the results (now included in Supplementary Figure 3) indicate an increased number of cells arrested in late mitosis upon ABBA knockdown. This supports the notion of disrupted cytokinesis as a consequence of Abba depletion.

      (2) Unclear where effect is:

      - In RG or neuroblasts? Is it in cell cleavage that results in accumulation of cells at VZ (as sometimes indicated by their data like in Fig 2A or 4D)? Interrogation of cell death (such as by cleaved caspase 3) would also help. Given their time lapse, can they identify what is happening to the RG fiber? The authors describe a change in "migration" but do not show evidence for this for either progenitor or neuroblast populations. Given they have nice time-lapse imaging data, could they visualize progenitor versus young neuron migration? Analysis of neuroblasts (such as with doublecortin expression in the tissue) would also help understand any issues in migration (of neurons v stem cells).

      - At cleavage furrow? In abscission? There is high resolution data that highlights the cleavage furrow as the location of interest (fig 3A), however there is also data (fig 3B) to suggest Abba is expressed elsewhere as well and there is an overall soma decrease. More detail of the localization of Abba during the division process would be helpful; for example, could co-localization (and potentially co-IP) with cleavage furrow proteins, such as Aurora B, help delineate subpopulations of Abba protein? Furthermore, the FRET imaging is a unique way to connect their mutation with function; could they measure/quantify differences at furrow compared to rest of soma to further corroborate that Abba-associated RhoA effect was furrow-enriched?

      - The data highlights nicely that a furrow doesn't clearly form when ABBA expression and subsequent RhoA activity are decreased (in Fig 3 or 5A). Does this lead to cells that can't divide because of poor abscission, especially since "rounding" still occurs? Or abnormal progenitors (with loss of fiber or inability to support neuroblast migration)? Or abnormal progression of progenitors to neuroblasts?

      Response to: Unclear location of the effect (RG vs. neuroblasts; cleavage furrow/abscission; migration issues)

      We thank the reviewer for this comprehensive and thought-provoking set of questions.

      a) Site of the effect – Radial Glia vs. Neuroblasts:

      Our data suggest that the primary effect of ABBA depletion occurs in radial glial progenitors (RGPs), specifically prior to abscission. We observed accumulation of electroporated cells in the ventricular zone (VZ), which we interpret as a result of cytokinetic failure (e.g., Figure 2A, 4D). We also documented detachment of apical and basal endfeet (see Supplementary Figure 3), further supporting structural disruption of RG fibers.

      b) Cell death analysis:

      We considered using cleaved caspase-3 as a marker for apoptosis, but due to its transient and non-specific activation during development, we opted to assess overall survival via quantification of RGP cell numbers and localization. This approach better reflects the developmental impact of ABBA knockdown (Supplementary Figure 3).

      c) Migration defects:

      We agree that distinguishing between progenitor and neuroblast migration would be highly informative. Although we did not perform doublecortin or similar staining to differentiate these populations in this dataset, the accumulation of electroporated cells in VZ/SVZ strongly suggests a migration deficit. Addressing this in detail will require new experiments using lineage-specific markers and longer time-lapse recordings, which we plan to explore in future studies.

      d) Cleavage furrow and abscission:

      Our high-resolution imaging of Anillin-GFP and FRET-based RhoA activity shows that ABBA localizes predominantly at the cleavage furrow. New quantifications of RhoA activity (now in Figure 5) show that the reduction in signaling is most pronounced at the furrow in ABBA knockdown cells. These findings align with the hypothesis that ABBA, through Nedd9 and RhoA, is essential for proper furrow formation and abscission.

      e) Mechanistic implications:

      As the reviewer notes, ABBA knockdown leads to cells that "round" but do not complete division, likely due to poor cleavage furrow ingression. This could generate abnormal progenitors that are structurally compromised (detached fibers) and thus unable to support neuroblast migration or proper differentiation. The cumulative result is disrupted progression from RGPs to neuroblasts, impaired structural scaffolding, and possibly reduced cell viability.

      (3) Limited to a singular time point of mouse cortical development

      On page 13, the authors outline the results of their Y2H screen with the identification of three high confidence interactors. Notably, they used an E10.5-E12.5 mouse brain embryo library rather than one that includes E14, the age of their in utero electroporation mice. Many of the authors' claims focus on in utero electroporation of shRNA-Abba of E14 mice that are then evaluated at E16-18. Justification for the focus on this age range should be included to support that their findings can then be applied to all of mouse corticogenesis.

      Response to: Use of E10.5–E12.5 library for yeast-two-hybrid (Y2H) screen

      We appreciate the reviewer’s concern regarding the developmental stage of the Y2H library. We chose the E10.5–E12.5 brain embryo library based on prior work demonstrating that ABBA expression is strongest during early cortical development, particularly in radial glia at these stages (see Saarikangas et al., J Cell Sci 2008). The radial glia-specific expression of ABBA was previously validated using RC2 and Tuj1 markers at E12.5. Thus, the library we used is well-suited for identifying interactors relevant to radial glial function, including Nedd9. We have clarified this rationale in the revised manuscript.

      (4) Detail of the effect of the human variant of the ABBA mutation in mouse is lacking.

      Their identification of the R671W mutation is interesting and the IUE model warrants more characterization, as they did with their original KD experiments.

      - Could they show that Abba protein levels are decreased (in either cell lines or electroporated tissue)?

      - While time-lapse morphology might not have been performed, more analysis on cell division phenotype (such as plane of division and radial glia morphology) would be helpful.

      Response to: Lack of detail on R671W human variant effects

      We thank the reviewer for encouraging further characterization of the R671W variant. In the revised manuscript, we now provide additional data on interkinetic nuclear migration (INM) defects resulting from R671W overexpression (see Supplementary Figure 3). These changes are consistent with disrupted radial glial organization and mirror aspects of the ABBA knockdown phenotype.

      a) Protein levels:

      We quantified ABBA expression in cells overexpressing the R671W variant (Supplementary Figure 5) and found no significant reduction compared to wild-type. This argues against a loss-of-function mechanism and supports a gain-of-function or dominant-interfering effect.

      b) Morphological and division phenotyping:

      While time-lapse imaging of R671W-expressing cells was not available in our dataset, we acknowledge that analyses such as division angle or radial glial morphology would be informative. Unfortunately, we were unable to perform these with the current data, but we agree these are important goals for future work.

      Reviewer 2 conclusion:

      The resubmission has addressed many of the questions raised.

      I have a few comments that should be addressed:

      (1) The authors maintain a deficit in "migration of immature neurons" which remains unsubstantiated. In their response, they state: "we believe that the data showing the accumulation of migrating electroporated cells in the ventricular (V) and subventricular (SV) zones provide compelling evidence of abnormal migration in ABBA-shRNA electroporated cells."

      - Firstly, they do not demonstrate that it's immature neurons, not RGs, that are affected. Secondly, accumulation of cells at the V-SVZ could be due solely to the inability of the RGCs to undergo mitosis, so that they remain stuck.

      The commentary of migration, especially of neurons, should be modified.

      We appreciate the reviewer’s careful reading and valid concern regarding our use of the term "migration of immature neurons." We fully agree that the current dataset does not definitively distinguish whether the accumulated cells in the ventricular (V) and subventricular (SV) zones are immature neurons or radial glial progenitors (RGPs) arrested in mitosis.

      To clarify, our observations indicate that electroporated cells accumulate in the VZ/SVZ following ABBA knockdown (Figures 2A and 4D), and this was interpreted as evidence of impaired migration. However, we now recognize that this accumulation may primarily reflect a block in cell cycle progression—specifically, at the stage of cleavage furrow ingression and abscission—rather than a migratory defect per se. This is supported by our new data using Anillin-GFP (Supplementary Figure 3), which show increased accumulation of cells with persistent Anillin expression, consistent with mitotic arrest. Furthermore, the detachment of apical and basal processes (also shown in Supplementary Figure 3) suggests that ABBA knockdown affects the structural integrity of RGPs, potentially compromising their scaffold function.

      In light of these points, we have revised the manuscript to temper our conclusions regarding “migration defects.” Specifically, we now refer to the phenotype as “abnormal accumulation of progenitor cells” and clarify that, while these findings are consistent with impaired cell progression or scaffolding required for migration, we do not directly demonstrate impaired migration of immature neurons. As suggested, addressing this would require additional analyses, such as time-lapse imaging of post-mitotic cells or staining with markers like Doublecortin, which are beyond the scope of the current dataset but will be a focus of future investigations.

      We thank the reviewer again for encouraging a more precise interpretation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Supplementary Fig 4B - The figure doesn't show an increase in percentage of PH3 positive cells in the NEDD9-shRNA condition. The control images are also missing for comparison. The figure legend needs to be corrected to match with the figure showing no significant changes.

      Thank you for this comment. This has been amended in the revised manuscript in the form of a new revised Supplementary Fig 4.

      Reviewer #2 (Recommendations for the authors):

      Minor annotations for slice culture assay

      The authors should make note of ages of slice cultures in text and have better annotations of slice cultures (for example, in Fig 4-where is mitosis?)

      We apologize for the mistake: the stage shown is not mitosis but the cleavage furrow stage. In addition, a new amended Figure 4 is provided.

      The effects are hard to see in lower mag slice images in Fig. 6. Would recommend focusing on higher mag to highlight RG differences.

      Thank you for this comment. This has been amended in the revised manuscript in the form of a new revised Figure 6.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) The mechanism by which fenofibrate rescues memory loss in Kallistatin-transgenic mice is unclear. As a PPARalpha agonist, does fenofibrate target the Kallistatin pathway directly or indirectly? Please provide a discussion based on literature supporting either possibility.

      Thank you for your important suggestion. Fenofibrate does indeed act as a PPARα agonist. Fenofibrate has been shown to protect memory and cognitive function by downregulating α- and β-secretases[1]. Activation of PPARα can reduce Aβ plaques by upregulating ADAM10, thereby protecting memory and cognition[2]. However, Fenofibrate can also act through a PPARα-independent pathway[3]. In our previous study, we showed that Fenofibrate can directly down-regulate the expression of Kallistatin in hepatocytes[4]. Here, our findings showed that Kallistatin induces cognitive memory deterioration by increasing amyloid-β plaque accumulation and tau protein hyperphosphorylation (Fig. 1-3), and that Fenofibrate can directly down-regulate the serum level of Kallistatin (Fig. 8G). In addition, the expression of PPARα in the hippocampal tissue of Kallistatin (KAL-TG) mice showed no significant difference compared to the WT group (Author response image 1A-B). Therefore, we think Fenofibrate may improve memory and cognitive function at least in part through a PPARα-independent effect, which provides a new mechanism of Fenofibrate in AD with elevated Kallistatin levels.

      Author response image 1.

      (A-B) Protein levels of PPARα in hippocampal tissue were measured by western blot analysis, and the results were then statistically analyzed.

      (2) The current study exclusively investigated the hippocampus. What about other cognitive memory-related regions, such as the prefrontal cortex? Including data from these regions or discussing the possibility of their involvement could provide a more comprehensive understanding of the role of Kallistatin in memory impairment.

      Thank you for your suggestion. In addition to hippocampal tissue analysis, we performed immunohistochemical detection of Aβ and phosphorylated Tau levels in the prefrontal cortex. Our findings revealed that KAL-TG mice exhibited significantly elevated Aβ and phosphorylated Tau levels in the prefrontal cortex compared to WT mice. These observations align with the pathological patterns observed in hippocampal tissues, demonstrating consistent neurodegenerative pathology across both the hippocampus and prefrontal cortex. The data are shown below.

      Author response image 2.

      (A-B) Immunofluorescence staining of Aβ and phosphorylated tau (p-tau T231) was carried out in the prefrontal cortex tissue of KAL-TG and WT mice. Error bars represent the standard error of the mean (SEM); **p < 0.01. Scale bar, 100 μm.

      (3) Fenofibrate rescued phenotypes in Kallistatin-transgenic mice while rosiglitazone, a PPARgamma agonist, did not. This result contradicts the manuscript's emphasis on a PPARgamma-associated mechanism. Please address this inconsistency.

      Thank you for the reminder. In fact, our results showed a trend towards improved memory and cognitive function in KAL-TG mice treated with Rosiglitazone, although its effect was not as significant as that of Fenofibrate. Several studies have reported that Rosiglitazone has a beneficial effect on memory and cognitive function in mouse models of dementia, but these studies involved treatment periods of 3 to 4 months[5, 6], whereas our treatment period was only one month. Extending the treatment period with Rosiglitazone may result in a more pronounced improvement. In addition, Fenofibrate may act through a PPAR-independent pathway by directly downregulating Kallistatin, as discussed above, and may therefore show stronger effects.

      (4) Most of the immunohistochemistry images are unclear. Inserts have similar magnification to the original representative images, making judgments difficult. Please provide larger inserts with higher resolution.

      According to your suggestion, we provided larger inserts with higher resolution in Fig 3A and Fig 4B, as follows:

      (5) The immunohistochemistry images in different figures were taken from different hippocampal subregions with different magnifications. Please maintain consistency, or explain why CA1, CA3, or DG was analyzed in each experiment.

      Thank you for your advice. The trends of changes in the different hippocampal subregions (CA1, CA3, and DG) are consistent. Following your suggestion, we have replaced the images of the different hippocampal subregions with images of the DG region and repeated the statistical analysis in Fig 5I & 6C, as follows. Because significant Aβ deposition occurred only in the CA1 region, Fig 2A was not replaced.

      (6) Figure 5B is missing a title. Please add a title to maintain consistency with other graphs.

      Thanks for your suggestion. We have added a title to Figure 5B, as follows:

      (7) Please list statistical methods used in the figure legends, such as t-test or One-way ANOVA with post-hoc tests.

      Thanks for your suggestion. We have listed the statistical methods used in the figure legends.

      Reviewer #2:

      (1) It was suggested that Kallistatin is primarily produced by the liver. The study demonstrates increased Kallistatin levels in the hippocampus tissue of AD mice. It would be valuable to clarify if Kallistatin is also increased in the liver of AD mice, providing a comprehensive understanding of its distribution in disease states.

      Thank you for your suggestion. We extracted liver tissue from APP/PS1 mice, and the Western blot results indicated that the expression of Kallistatin in the liver of APP/PS1 mice was elevated, as follows:

      Author response image 3.

      (A-B) Protein levels of Kallistatin in liver tissue were measured by western blot analysis, and the results were then statistically analyzed. Error bars represent the standard error of the mean (SEM); **p < 0.01.

      (2) Does Kallistatin interact directly with Notch1 ligands? Clarifying this interaction mechanism would enhance understanding of how Kallistatin influences Notch1 signaling in AD pathology.

      Thank you for your suggestion. This study reveals that Kallistatin directly binds to Notch1 and contributes to the activation of the Notch1-HES1 signaling pathway. Whether Kallistatin can also bind to the ligands of Notch1 requires further investigation in future studies. Our preliminary data showed that Jagged1 was upregulated in the hippocampal tissues of KAL-TG mice by qPCR and Western blot analyses.

      Author response image 4.

      Kallistatin promoted Notch ligand Jagged1 expression to activate Notch1 signaling. (A) qPCR analysis of Notch ligand (Dll1, Dll3, Jagged1, Jagged2) expression in hippocampal tissue at 9 months. (B) Western blot analysis of Notch ligand Jagged1 expression in hippocampal tissue. (C) Western blot analysis of Notch ligand Jagged1 expression in primary hippocampal neurons. β-actin served as the loading control. Error bars represent the standard error of the mean (SEM); *p < 0.05.

      (3) Is there any observed difference in AD phenotype between male and female Kallistatin-transgenic (KAL-TG) mice? Including this information would address potential gender-specific effects on cognitive decline and pathology.

      Thank you for your suggestion. We have previously used female mice for Morris Water Maze experiments, and the results showed that both male and female KAL-TG mice exhibited a phenotype of decreased memory and cognitive function compared to the gender-matched WT group, while there was no significant difference between male and female KAL-TG mice, as follows:

      Author response image 5.

      (A-D) Behavioral performance was assessed through the Morris water maze test. (A) Escape latency during days 1-5. (B-D) Cognitive function was evaluated by the spatial probe test on day 6, analyzing for each group of mice the number of platform crossings (B), the percentage of time spent in the target area (C), and the path-trace heatmap (D). Error bars represent the standard error of the mean (SEM); F represents Female, M represents Male, and TG refers to KAL-TG; *p < 0.05.

      (4) It is recommended to include molecular size markers in Western blots for clarity and accuracy in protein size determination.

      Thank you for your reminder. We have shown the molecular weights in each blot.

      (5) The language should be revised for enhanced readability and clarity, ensuring that complex scientific concepts are communicated effectively to a broader audience.

      Following your suggestion, we have polished the article to enhance readability and clarity.

      Reviewer #3:

      (1) The authors did not illustrate whether the protective effect of fenofibrate against AD depends on Kallistatin.

      Thank you for your important suggestion. Fenofibrate does indeed act as a PPARα agonist. Fenofibrate has been shown to protect memory and cognitive function by downregulating α- and β-secretases[1]. Activation of PPARα can reduce Aβ plaques by upregulating ADAM10, thereby protecting memory and cognition[2]. However, Fenofibrate can also act through a PPARα-independent pathway[3]. In our previous study, we showed that Fenofibrate can directly down-regulate the expression of KAL in hepatocytes[4]. Here, our findings showed that Kallistatin induces cognitive memory deterioration by increasing amyloid-β plaque accumulation and tau protein hyperphosphorylation (Fig. 1-3), and that Fenofibrate can directly down-regulate the serum level of Kallistatin (Fig. 8G). In addition, the expression of PPARα in the hippocampal tissue of Kallistatin (KAL-TG) mice showed no significant difference compared to the WT group (Author response image 1A-B). Therefore, we think Fenofibrate may improve memory and cognitive function at least in part through downregulating Kallistatin. To conclusively determine whether fenofibrate’s therapeutic effects depend on Kallistatin, future studies should employ Kallistatin-knockout AD animal models to evaluate fenofibrate’s impact on cognitive and memory functions. These investigations will further clarify the mechanistic underpinnings of fenofibrate in AD therapy.

      (2) The conclusions are supported by the results, but the quality of some results should be improved.

      Thank you for your kind suggestion. We have updated the magnified images in the immunohistochemistry section of the article, ensuring that the fields of view for the immunohistochemistry are within the same brain region, and have shown the molecular weights in each blot. Additionally, we have conducted a quantitative analysis of the protein levels in the Western blot results presented in Fig 6 & 8.

      (3) Figures 2c, 3c, and 4a present the Western blot results of p-tau from mice of different ages on one membrane, showing age-dependent expression. The authors analyzed the results of mice of different ages in one statistical chart, which will create ambiguity with the results of the representative images. For example, the expression of p-tau 396 in the blot was lower in the WT-12 M group than in the WT-9 M group (Figure 3c), which is contradictory to the statistical analysis.

      Thank you for your reminder. The statistical presentation here does not match the figure. The WB experiments for the hippocampal tissue of each age group were conducted separately, so it was not appropriate to compare different age groups together. This graph cannot illustrate age dependency. We have replaced the statistical graph in Figure 3B&D, as follows:

      (4) Figure 4b shows that KAL-TG-9 M had greater BACE1 expression than KAL-TG-12 M. Furthermore, the nuclei are not uniformly colored. Please provide more representative figures.

      Thank you for your reminder. Because these sets of data were not processed in a single batch, the ages in the graph are not comparable. Regarding the issue of inconsistent nuclear staining, we have provided another representative image from this group, as follows:

      (5) It is unclear why the BACE1 and Aβ levels seem lower with KAL+shHES1 treatment than with GFP+shNC treatment (Fig 6H). This finding contradicts the conclusion.

      Thank you for your reminder. This experiment was repeated three times, and we now show representative results along with the corresponding statistical data. There is no difference between KAL+shHES1 treatment and GFP+shNC treatment. We have updated Fig. 6H.

      (6) The Western blot results in figure 6e-h, 8h-i, and S3-S5 were not quantified.

      Thank you for your reminder. We have added statistical graphs and the original images for figure 6e-h, 8h-i, and S3-S5.

      (7) The authors did not provide the detection range of the Aβ42 ELISA kit.

      Thank you for your suggestion. The Aβ42 ELISA kit is from IBL (product number 27721). Its standard range is 1.56 - 100 pg/mL, and its sensitivity is 0.05 pg/mL.

      (8) The authors did not specify the sex of the mice. This is important since sex could have had a dramatic impact on the results.

      Thank you for your suggestion. All results presented in the text were obtained from male mice. We have previously used female mice for Morris Water Maze experiments, and the results showed that both male and female KAL-TG mice exhibited a phenotype of decreased memory and cognitive function compared to the gender-matched WT group, while there was no significant difference between male and female KAL-TG mice (Author response image 5).

      Minor:

      (1) In Figure 2b, there are no units for the vertical coordinates of the statistical graph.

      Thank you for your reminder. We have added units for the vertical coordinates in Figure 2b.

      (2) In Figure 2c, the left Y-axis title is lacking in the statistic chart.

      Thank you for your reminder. We have added the left Y-axis title in the statistic chart.

      References:

      (1) Assaf N, El-Shamarka ME, Salem NA, Khadrawy YA, El Sayed NS. Neuroprotective effect of PPAR alpha and gamma agonists in a mouse model of amyloidogenesis through modulation of the Wnt/beta catenin pathway via targeting alpha- and beta-secretases. Progress in Neuro-Psychopharmacology and Biological Psychiatry 2020, 97: 109793.

      (2) Rangasamy SB, Jana M, Dasarathi S, Kundu M, Pahan K. Treadmill workout activates PPARα in the hippocampus to upregulate ADAM10, decrease plaques and improve cognitive functions in 5XFAD mouse model of Alzheimer’s disease. Brain, Behavior, and Immunity 2023, 109: 204-218.

      (3) Yuan J, Tan JTM, Rajamani K, Solly EL, King EJ, Lecce L, et al. Fenofibrate Rescues Diabetes-Related Impairment of Ischemia-Mediated Angiogenesis by PPARα-Independent Modulation of Thioredoxin-Interacting Protein. Diabetes 2019, 68(5): 1040-1053.

      (4) Fang Z, Shen G, Wang Y, Hong F, Tang X, Zeng Y, et al. Elevated Kallistatin promotes the occurrence and progression of non-alcoholic fatty liver disease. Signal Transduct Target Ther 2024, 9(1): 66.

      (5) Nelson ML, Pfeifer JA, Hickey JP, Collins AE, Kalisch BE. Exploring Rosiglitazone's Potential to Treat Alzheimer's Disease through the Modulation of Brain-Derived Neurotrophic Factor. Biology (Basel) 2023, 12(7).

      (6) Pedersen WA, McMillan PJ, Kulstad JJ, Leverenz JB, Craft S, Haynatzki GR. Rosiglitazone attenuates learning and memory deficits in Tg2576 Alzheimer mice. Exp Neurol 2006, 199(2): 265-273.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      The authors investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions. They observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype of flwr-1 mutants to wild-type levels. By contrast, cholinergic neuron expression did not rescue aldicarb sensitivity at all. They also showed that FLWR-1 removal leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. From these findings, the authors conclude that FLWR-1 helps maintain the balance between excitation and inhibition (E/I) by preferentially regulating GABAergic neuronal excitability in a cell-autonomous manner. 

      Overall, the work presents solid data and interesting findings, however the proposed cell-autonomous model of GABAergic FLWR-1 function may be overly simplified in my opinion. 

      Most of my previous comments have been addressed; however, two issues remain. 

      (1) I appreciate the authors' efforts conducting additional aldicarb sensitivity assays that combine muscle-specific rescue with either cholinergic or GABAergic neuron-specific expression of FLWR-1. In the revised manuscript, they conclude, "This did not show any additive effects to the pure neuronal rescues, thus FLWR-1 effects on muscle cell responses to cholinergic agonists must be cell-autonomous." However, I find this interpretation confusing for the reasons outlined below. 

      Figure 1 - Figure Supplement 3B shows that muscle-specific FLWR-1 expression in flwr-1 mutants significantly restores aldicarb sensitivity. However, when FLWR-1 is co-expressed in both cholinergic neurons and muscle, the worms behave like flwr-1 mutants and no rescue is observed. Similarly, cholinergic FLWR-1 alone fails to restore aldicarb sensitivity (shown in the previous manuscript).

      These data are still shown in the manuscript (Fig. 3D). We interpreted our finding in the muscle/cholinergic co-rescue experiment as meaning that FLWR-1 in cholinergic neurons over-compensates, so worms should be resistant, and the rescuing effect of muscle FLWR-1 is therefore cancelled. It is true, however, that if this were the case, one would have to ask why the pure cholinergic rescue does not show over-compensation. We added a sentence to acknowledge this inconsistency, and we added a sentence in the discussion (see also comment 1 of reviewer #2 below).

      These observations indicate a non-cell-autonomous interaction between cholinergic neurons and muscle, rather than a strictly muscle cell-autonomous mechanism. In other words, FLWR-1 expressed in cholinergic neurons appears to negate or block the rescue effect of muscle-expressed FLWR-1. Therefore, FLWR-1 could play a more complex role in coordinating physiology across different tissues. This complexity may affect interpretations of Ca<sup>2+</sup> dynamics and/or functional data, particularly in relation to E/I balance, and thus warrants careful discussion or further investigation. 

      For the Ca<sup>2+</sup> dynamics, we think the effects of flwr-1 are likely very immediate, as the imaging assay relies on a sensor expressed directly in the neurons or muscles under study, and not on indirect phenotypes such as muscle contraction and behavior, which depend on an interplay of several cell types influencing each other.

      (2) The revised manuscript includes new GCaMP analyses restricted to synaptic puncta. The authors mention that "we compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences," concluding that "FLWR-1's impact is local, in synaptic boutons." This is puzzling: the similarity of Ca<sup>2+</sup> signals in synaptic regions and axon shafts seems to indicate a more global effect on Ca<sup>2+</sup> dynamics or may simply reflect limited temporal resolution in distinguishing local from global signals due to rapid Ca<sup>2+</sup> diffusion. The authors should clarify how they reached the conclusion that FLWR-1 has a localized impact at synaptic boutons, given that synaptic and axonal signals appear similar. Based on the presented data, the evidence supporting a local effect of FLWR-1 on Ca<sup>2+</sup> dynamics appears limited.

      We apologize; we simply overlooked this misleading wording in our rebuttal letter. The data we mentioned, showing no obvious difference between axon and bouton, are shown below, including time constants for the onset and the offset of the stimulus (data are peak-normalized for better visualization):

      Author response image 1.

      One can see that axonal Ca<sup>2+</sup> signals may rise a bit slower than synaptic Ca<sup>2+</sup> signals, as expected for Ca<sup>2+</sup> entering the boutons and then diffusing out into the axon. The loss of FLWR-1 does not affect this. However, the GCaMP6f sensor used takes ca. 200 ms to reach peak, and its decay time (to t1/2) is ca. 400 ms (PMID: 23868258). Thus, it would be difficult to see effects based on Ca<sup>2+</sup> diffusion using this assay. The decay is similar for both axon and synapse, while flwr-1 mutants do not reduce Ca<sup>2+</sup> as much as wt. In the axon, there is a seemingly slightly slower reduction in flwr-1 mutants; however, given the kinetics of the sensor, this is likely not a meaningful difference. Therefore, we wrote that we did not find differences. The interpretation should not have been that the impact of FLWR-1 is local. This may turn out to be true if one could image at faster time scales, i.e. if there is more FLWR-1 localized in boutons (as indicated by our data showing FLWR-1 enrichment in boutons; Fig. 3), and when considering its possible effect on MCA-3 localization (and assuming that MCA-3 is the active player in Ca<sup>2+</sup> removal), i.e. FLWR-1 recruiting MCA-3 to boutons (Fig. 9C, D).
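      The reasoning about peak normalization and sensor kinetics can be illustrated with a minimal sketch. It is not the analysis code used for the data above: the function names and the example trace are hypothetical, and the decay half-time is estimated by simple linear interpolation, which is only one of several possible approaches.

```python
def peak_normalize(trace):
    """Scale a fluorescence trace so that its maximum equals 1.0."""
    peak = max(trace)
    if peak <= 0:
        raise ValueError("trace must contain a positive peak")
    return [value / peak for value in trace]


def half_decay_time(times, trace):
    """Time elapsed between the peak and the first crossing of half the peak.

    The crossing is located by linear interpolation between the two
    samples that bracket 0.5 in the peak-normalized trace. Returns None
    if the signal never decays to half of its peak within the recording.
    """
    norm = peak_normalize(trace)
    peak_index = max(range(len(norm)), key=norm.__getitem__)
    for i in range(peak_index + 1, len(norm)):
        if norm[i] <= 0.5:
            t0, t1 = times[i - 1], times[i]
            y0, y1 = norm[i - 1], norm[i]
            crossing = t0 + (0.5 - y0) * (t1 - t0) / (y1 - y0)
            return crossing - times[peak_index]
    return None


# Hypothetical example trace (arbitrary units), sampled every 100 ms.
example_times = [0.0, 0.1, 0.2, 0.3, 0.4]
example_trace = [0.0, 2.0, 1.5, 0.5, 0.25]
example_half_time = half_decay_time(example_times, example_trace)
```

      Under these assumptions, a difference in half-decay time between two traces that is smaller than the sensor's own decay half-time (ca. 400 ms for GCaMP6f, as cited above) could not be attributed with confidence to the underlying Ca<sup>2+</sup> signal.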

      Reviewer #2 (Public review): 

      Summary: 

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions. 

      Strengths: 

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function. 

      The authors have adequately addressed most of my previous concerns, however, I recommend minor revisions to further strengthen the study's rigor and interpretation: 

      Major suggestions 

      (1) This study relies heavily on aldicarb assays to support its conclusions. While these assays are valuable, their results may not fully align with direct assessment of neurotransmitter release from motor neurons. For instance, prior work has shown that two presynaptic modulators identified through aldicarb sensitivity assays exhibited no corresponding electrophysiological defects at the neuromuscular junction (Liu et al., J Neurosci 27: 10404-10413, 2007). Similarly, at least one study from the Kaplan lab has noted discrepancies between aldicarb assays and electrophysiological analyses. The authors should consider adding a few sentences in the Discussion to acknowledge this limitation and the potential caveats of using aldicarb assays, especially since some of the aldicarb assay results in this study are not easily interpretable. 

      Aldicarb assays have been used very successfully in identifying mutants with defects in chemical synaptic transmission, and entire genetic screens have been conducted this way. The reviewer is right: one needs to realize that it is the balance of excitation and inhibition at the NMJ of C. elegans that underlies the effects on the rate of aldicarb-induced paralysis, not just cholinergic transmission. That is, if a given mutant affects cholinergic and GABAergic transmission differently, things become difficult to interpret, particularly if muscle physiology is also affected. Therefore, we combined mutant analyses with cell-type specific rescue. We acknowledge that results are nonetheless difficult to interpret. We thus added a sentence in the first paragraph of the discussion.

      (2) The manuscript states, "Elevated Ca<sup>2+</sup> levels were not further enhanced in a flwr-1;mca-3 double mutant." (lines 549-550). However, Figure 7C does not include statistical comparisons between the single and double mutants of flwr-1 and mca-3. Please add the necessary statistical analysis to support this statement. 

      We only marked significant differences in that figure; non-significant (n.s.) comparisons were not shown. This was stated in the figure legend.

      (3) The term "Ca<sup>2+</sup> influx" should be avoided, as this study does not provide direct evidence (e.g. voltage-clamp recordings of Ca<sup>2+</sup> inward currents in motor neurons) for an effect of the flwr-1 mutation of Ca<sup>2+</sup> influx. The observed increase in neuronal GCaMP signals in response to optogenetic activation of ChR2 may result from, or be influenced by, Ca<sup>2+</sup> mobilization from of intracellular stores. For example, optogenetic stimulation could trigger ryanodine receptor-mediated Ca<sup>2+</sup> release from the ER via calcium-induced calcium release (CICR) or depolarization-induced calcium release (DICR). It would be more appropriate to describe the observed increase in Ca<sup>2+</sup> signal as "Ca<sup>2+</sup> elevation" rather than increased "Ca<sup>2+</sup> influx". 

      Yes, we can do this. By ‘influx’ we referred to Ca<sup>2+</sup> that fluxes into the cytosol, whether from the internal stores or from the extracellular space. To our understanding, extracellular influx will more or less inevitably trigger further influx from internal stores. We changed the wording to “elevated Ca<sup>2+</sup> levels”, “Ca<sup>2+</sup> level rise” or “Ca<sup>2+</sup> level increase”.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      A thorough discussion on the impact of cell-autonomous versus non-cell-autonomous effects is necessary. 

      Revise and clarify the distinction between local and global Ca²⁺ changes. 

      See above.

      Reviewer #2 (Recommendations for the authors): 

      Minor suggestions 

      (1) In "Few-Ubi was shown to facilitate recovery of neurons following intense synaptic activity (Yao et al.,....." (lines 283-284), please specify which aspects of neuronal recovery are influenced by the Flower protein. 

      We added “refilling of SV pools”.

      (2) The abbreviation "Few-Ubi" is used for the Drosophila Flower protein (e.g., line 283, Figure 1A, and Figure 8A). Please clarify what "Ubi" stands for and verify whether its inclusion in the protein name is appropriate.

      This is inconsistent across the literature; Fwe-Ubi is sometimes also referred to as FweA. We have now added this term. Ubi refers to ubiquitous (“Therefore, we named this isoform fweubi because it is expressed ubiquitously in imaginal discs”) (Rhiner 2010).

      (3) The manuscript uses "pflwr-1" (line 303 and elsewhere) to denote the flwr-1 promoter. This notation could be misleading, as it may be interpreted as a gene name. Please consider using either "flwr-1p" or "Pflwr-1" instead. Additionally, ensure proper italicization of gene names throughout the manuscript. 

      We changed this throughout. We will italicize gene names at the proof stage; it would be too time-consuming to find all instances now.

      (4) The authors tagged the C-terminus of FLWR-1 by GFP (lines 321). The fusion protein is referred to as "GFP::FLWR-1" throughout the manuscript. Please verify whether "FLWR-1::GFP" would be the more appropriate designation.

      Thank you. GFP is indeed N-terminal, and we corrected the statement in the text accordingly.

      (5) In "This did not show any additive effects...." (line 363), please clarify what "This" refers to. 

      Altered to “The combined rescues did not show any additive effects…”

      (6) In "..., supporting our previous finding of increased neurotransmitter release in GABAergic neurons" (lines 412-413), please provide a citation for the referenced previous study.

      This refers to our aldicarb data within this paper, just further up in the text. We removed “previous”.

      (7) Figure 4C, D examines the effect of flwr-1 mutation on body length in the genetic background of the unc-29 mutation, which selectively disrupts the levamisole-sensitive acetylcholine receptor. Please comment on the rationale for implicating only the levamisole receptor rather than the nicotinic acetylcholine receptor in muscle cells. 

      This was because we used a behavioral assay. Although the homopentameric ACR-16/N-AChR mediates about 2/3 of the peak current in response to acute ACh application to the NMJ (e.g. Almedom et al., EMBO J, 2009), the acr-16 mutant has virtually no behavioral / locomotion phenotype. This is likely because the heteropentameric, UNC-29-containing L-AChR, while contributing only 1/3 of the peak current, desensitizes much more slowly, and thus unc-29 mutants show a severe behavioral phenotype (uncoordinated locomotion, etc.). We therefore did not expect a major effect when performing the behavioral assay in acr-16 mutants and chose the unc-29 mutant background.

      (8) In "we found no evidence ....insertion into the PM (Yao et al., 2009)", it appears that the cited paper was not authored by any of the authors of the current manuscript. Please confirm whether this citation is correctly attributed. 

      This sentence was arranged in a misleading way; we did not mean that we authored this paper. It was changed in the text: “While a facilitating role of Flower in endocytosis appears to be conserved in C. elegans, in contrast to previous findings from Drosophila (Yao et al., 2009), we found no evidence that FLWR-1 conducts Ca<sup>2+</sup> upon insertion into the PM.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Xiao et al. classified retroperitoneal liposarcoma (RPLS) patients into two subgroups based on whole transcriptome sequencing of 88 patients. The G1 group was characterized by active metabolism, while the G2 group exhibited high scores in cell cycle regulation and DNA damage repair. The G2 group also displayed more aggressive molecular features and had worse clinical outcomes compared to G1. Using a machine learning model, the authors simplified the classification system, identifying LEP and PTTG1 as the key molecular markers distinguishing the two RPLS subgroups. Finally, they validated these markers in a larger cohort of 241 RPLS patients using immunohistochemistry. Overall, the manuscript is clear and well-organized, with its significance rooted in the large sample size and the development of a classification method.

      Thank you for your positive assessment of our study on classifying RPLS patients based on whole transcriptome sequencing. We appreciate your recognition of the distinct characteristics of the G1 and G2 groups, as well as the significance of our simplified classification system and the identification of LEP and PTTG1 as key molecular markers. Your acknowledgment of the clarity and organization of our manuscript, along with the importance of the large sample size, is greatly appreciated. We will continue to refine our work based on your feedback as we prepare for resubmission.

      Weakness:

      (1) While the authors suggest that LEP and PTTG1 serve as molecular markers for the two RPLS groups, the process through which these genes were selected remains unclear. The authors should provide a detailed explanation of the selection process.

      The selection criteria for identifying LEP and PTTG1 as biomarkers involved selecting prognostic genes that were highly expressed in C1 and C2, respectively, and achieved the highest AUC value in distinguishing the two RPLS groups (Page 17, lines 288-290).
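
The AUC-based ranking described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`auc`, `best_marker`), the group labels and the toy expression values are all hypothetical, the prognostic-gene filter is not modelled, and AUC is computed here in its rank-based (Mann-Whitney) form.

```python
def auc(pos, neg):
    """Rank-based AUC: probability that a random value from `pos`
    exceeds a random value from `neg`; ties count as one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_marker(expr, labels, target):
    """Return the gene whose expression best separates `target` from the rest.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    labels : list of group labels, aligned with the expression lists
    target : label of the group in which the marker should be highly expressed
    """
    results = {}
    for gene, values in expr.items():
        pos = [v for v, g in zip(values, labels) if g == target]
        neg = [v for v, g in zip(values, labels) if g != target]
        results[gene] = auc(pos, neg)
    return max(results, key=results.get), results


# Hypothetical toy data: two samples per group, two candidate genes.
labels = ["A", "A", "B", "B"]
expr = {"geneX": [5, 6, 1, 2], "geneY": [3, 1, 2, 4]}
top_gene, aucs = best_marker(expr, labels, target="A")
```

In this toy example `geneX` separates group A perfectly, so it is the gene that would be retained as the marker for that group.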

      (2) To ensure the broader applicability of LEP and PTTG1 as classification markers, the authors should validate their findings in one or two external datasets.

      We sincerely appreciate your insightful suggestion regarding the external validation of LEP and PTTG1 as classification biomarkers. To address this concern, we performed an independent validation using an external liposarcoma cohort (GSE30929; Page 6, Lines 104-105), which comprises 140 primary liposarcoma samples with annotated clinicopathological and survival data. This dataset was selected due to its relevance to RPLS (N=63, 45%) and the availability of distant recurrence-free survival (DRFS) outcomes, aligning with the clinical focus of our study. 

      Applying our previously established prognostic model (Risk value = 2.182 × PTTG1 - 2.204 × LEP) to this cohort, we stratified patients into high- and low-risk groups using the median risk score as the cutoff. Consistent with our original findings, the high-risk group exhibited significantly worse DRFS compared to the low-risk group. The ROC curves based on the 1-, 3-, 5-year survival status of patients demonstrated that this model can effectively predict patient DRFS (log-rank P < 0.001, Figure S3A-B). Furthermore, the high-risk group demonstrated a higher proportion of high-grade histology (P < 0.001, Fisher’s exact test, Figure S3C-D).
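
As a rough illustration of the stratification step, the sketch below applies the quoted formula (Risk value = 2.182 × PTTG1 - 2.204 × LEP) and splits a cohort at the median risk value. The sample identifiers and expression values are hypothetical, and assigning samples that fall exactly on the median to the low-risk group is an assumption; the response does not state how ties were handled or on what scale expression was measured.

```python
from statistics import median

# Coefficients as quoted in the response.
PTTG1_COEF = 2.182
LEP_COEF = -2.204


def risk_value(pttg1, lep):
    """Risk value = 2.182 * PTTG1 - 2.204 * LEP."""
    return PTTG1_COEF * pttg1 + LEP_COEF * lep


def stratify(samples):
    """Split samples into high- and low-risk groups at the cohort median.

    samples: dict mapping sample id -> (PTTG1, LEP) expression values.
    Samples exactly at the median are labelled "low" (an assumption).
    """
    scores = {sid: risk_value(p, l) for sid, (p, l) in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
    return scores, cutoff, groups


# Hypothetical expression values, for illustration only.
cohort = {"s1": (1.0, 3.0), "s2": (2.0, 2.0), "s3": (3.0, 1.0), "s4": (4.0, 0.5)}
scores, cutoff, groups = stratify(cohort)
```

With these toy values, the samples with high PTTG1 and low LEP (`s3`, `s4`) fall in the high-risk group, which matches the direction of the two coefficients.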

      These results validate the robustness and generalizability of our risk stratification model across distinct liposarcoma cohorts. The external dataset’s alignment with our findings underscores the potential of LEP and PTTG1 as reproducible biomarkers for prognosis and therapeutic stratification in liposarcoma. We have incorporated these validation results into the revised manuscript (Page 18, Lines 305-315) to strengthen the clinical applicability of our conclusions.

      (3) Since molecular subtyping is often used to guide personalized treatment strategies, it is recommended that the authors evaluate therapeutic responses in the two distinct groups. Additionally, they should validate these predictions using cell lines or primary cells.

      We sincerely appreciate your insightful comments and suggestions regarding the evaluation of therapeutic responses and the validation of our predictions using cell lines or primary cells. We would like to address these points in detail below:

      (1) Purpose of the PTTG1- and LEP-based RPLS Classification Model

      The primary objective of our study was to develop a molecular subtyping model based on PTTG1 and LEP to guide personalized treatment strategies for patients with RPLS, particularly those classified as low-grade by traditional histopathological criteria but exhibiting poor prognosis. This subgroup of patients may benefit from more aggressive surgical resection, which is a potentially curative approach for RPLS. Our model aims to identify these high-risk patients to ensure complete tumor resection, thereby improving their clinical outcomes.

      (2) Therapeutic Response Evaluation in Distinct Groups

      In both our validation cohort and external validation cohort, surgical resection was the primary treatment modality for RPLS. After stratifying patients using our model, we observed significant differences in surgical outcomes between the two groups: the high-risk group exhibited poor prognosis, while the low-risk group showed favorable outcomes (Figure 5D-E and Figure S3A-B). Importantly, our model successfully identified low-grade histopathological cases with poor prognosis, who might otherwise be undertreated (Figure 5G-I and Figure S3C-D). By advocating for more thorough surgical resection in these high-risk patients, we aim to improve their prognosis. This achievement aligns with the primary goal of our study, which is to provide a molecular tool for personalized treatment guidance.

      (3) Future Validation and Functional Exploration of PTTG1 and LEP

      Our study has identified PTTG1 and LEP as key biomarkers for RPLS classification, and we recognize the urgent need to elucidate their molecular functions in RPLS pathogenesis. Here, we are pleased to report that we have already initiated cellular and animal experiments to investigate the roles of PTTG1 and LEP in RPLS. These experiments aim to validate our predictions and explore the underlying mechanisms by which these biomarkers contribute to tumor behavior and treatment response. We anticipate that the results of these studies will provide further mechanistic insights and will be submitted for publication in a suitable journal in the near future.

      Reviewer #2 (Public review):

      Surgical resection remains the most effective treatment for retroperitoneal liposarcoma. However, postoperative recurrence is very common and is considered the main cause of disease-related death. Considering the importance and effectiveness of precision medicine, the identification of molecular characteristics is particularly important for the prognosis assessment and individualized treatment of RPLS. In this work, the authors described the gene expression map of RPLS and illustrated an innovative strategy of molecular classification. Through the pathway enrichment of differentially expressed genes, characteristic abnormal biological processes were identified, and RPLS patients were simply categorized based on the two major abnormal biological processes. Subsequently, the classification strategy was further simplified through nonnegative matrix factorization. The authors finally narrowed the classification indicators to two characteristic molecules LEP and PTTG1, and constructed novel molecular prognosis models that presented obviously a great area under the curve. A relatively interpretable logistic regression model was selected to obtain the risk scoring formula, and its clinical relevance and prognostic evaluation efficiency were verified by immunohistochemistry. Recently, prognostic model construction has been a hot topic in the field of oncology. The interesting point of this study is that it effectively screened characteristic molecules and practically simplified the typing strategy on the basis of ensuring high matching clinical relevance. Overall, the study is well-designed and will serve as a valuable resource for RPLS research.

      Thank you for your insightful feedback on our manuscript. We appreciate your recognition of the importance of precision medicine and molecular characteristics in improving prognosis and individualized treatment for RPLS.

      We are pleased that you found our gene expression mapping and innovative molecular classification strategy valuable. Your positive remarks on our pathway enrichment analysis and the categorization of RPLS patients based on abnormal biological processes affirm our approach.

      We are also grateful for your acknowledgment of our focus on the characteristic molecules LEP and PTTG1, as well as the development of novel molecular prognosis models with significant predictive capability.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary

      In this manuscript, Dong et al. study the directed cell migration of tracheal stem cells in Drosophila pupae. The authors study how the directionality of these cells is regulated along the dorsal trunk. They show that inter-organ communication between the tracheal stem cells and the nearby fat body plays a role in posterior migration. They provide compelling evidence that Upd2 production in the fat body and JAK/STAT activation in the tracheal stem cells play a role. Moreover, they show that JAK/STAT signalling might induce the expression of apicobasal and planar cell polarity genes in the tracheal stem cells which appear to be needed to ensure unidirectional migration. Finally, the authors suggest that trafficking and vesicular transport of Upd2 from the fat body towards the tracheal cells might be important.

      Strengths

      The manuscript is well written and presents extensive and varied experimental data to show a link between Upd2-JAK/STAT signaling from the fat body and tracheal progenitor cell migration. The authors provide convincing evidence that the fat body, located near the trachea, secretes vesicles containing the Upd2 cytokine and that affecting JAK-STAT signaling results in aberrant migration of some of the tracheal stem cells towards the anterior. Using ChIP-seq as well as analysis of GFP-protein trap lines of planar cell polarity genes in combination with RNAi experiments, the authors show that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells which appear to be needed for unidirectional migration. The work presented here provides some novel insights into the mechanism that ensures polarized migration of tracheal stem cells, preventing bidirectional migration. This might have important implications for other types of directed cell migration in invertebrates or vertebrates including cancer cell migration. Overall, the authors have substantially improved their manuscript since the first submission but there are still some weaknesses.

      Weaknesses

      Overall, the manuscript lacks insights into the potential significance of the observed phenotypes and of the proposed new signaling model. Most of our concerns could be dealt with by adjusting the text (explaining some parts better and toning down some statements).

      (1) Directional migration of tracheal progenitors is only partially compromised, with some cells migrating anteriorly and others maintaining their posterior migration, a quite discrete phenotype.

      The strongest migration defects quantified in graphs (e.g. 100 μm) are not shown in images, since they would be out of frame, it would be beneficial to see them. In addition, the consequence of defects in polarized migration on tracheal development is not clear and data showing phenotypes on the final trachea morphology in pupae are not explained nor linked to the previous phenotypes.

      We agree with you that it is informative to show strong anterior migration (> 100 μm). Accordingly, we have shown examples in Figure 3B and Figure 7R-S. In addition, in the revised manuscript we also discuss the links between the migration defects and the consequent phenotypes of the animal at a later developmental stage. The undisciplined migration leads to insufficient regeneration and incomplete remodeling of the airway and causes pupal lethality.

      (2) Some important information is lacking, such as the origin of mutant and UAS-RNAi lines, which are not reported in the material and methods. For instance, mutants for components of the JAK-STAT pathway are used but not described. Are they all viable at the pupal stage? Otherwise, pupae would not be homozygous mutants. From the figure legend, it seems that the Stat92EF allele has been used, which is a point mutation, thus not leading to an absence of protein. If the hopTUM allele has been used, as mentioned in the legend, it is a gain-of-function allele. Thus, the authors should not conclude that "The aberrant anterior migration of tracheal progenitors in the absence of JAK/STAT components led to impairment of tracheal integrity and caused melanization in the trachea (Figure 3-figure supplement 1E-I)".

      We apologize for the inadequate description of the experimental materials and methods. We have listed the stock numbers of the mutant and RNAi alleles in the Key resource table and Materials. The mutant alleles that we chose to examine can survive to the pupal stage, which is key to the success of our subsequent characterization of these mutants. Following your suggestion, we modified the statement for accuracy.

      (3) The authors observe that tracheal progenitors display a polarized distribution of Fat that is controlled by JAK-STAT signaling. However, this conclusion is made from a single experiment using only 3 individuals with no statistics. This is insufficient to support the claim that "JAK/STAT signaling promotes the expression of genes involved in planar cell polarity leading to asymmetric localization of Fat in progenitor cells", as mentioned in the abstract, or that "the activated tracheal progenitors establish a disciplined migration through the asymmetrical distribution of polarity proteins which is directed by an Upd2-JAK/STAT signaling stemming from the remote organ of fat body."

      We performed multiple biological replicates of the Ft distribution experiments and observed a similar trend, although we showed only three representative samples. In the revised text, we have included the n numbers and the statistical tests.

      (4) The authors demonstrate that Upd2 is transported through vesicles from the fat body to the tracheal progenitors. It remains somewhat unclear in the proposed model how Upd2 activates JAK-STAT signaling. Are vesicles internalized, as it seems to be proposed, and thus how does Upd2 activate JAK-STAT signaling intracellularly? Or is Upd2 released from vesicles to bind Dome extracellularly to activate the JAK-STAT pathway? Moreover, it is not clear nor discussed what would be the advantage of transporting the ligand in vesicles compared to classical ligand diffusion.

      We do not know whether the association between Upd2 and Lbm occurs inside or outside vesicles. The vesicular trafficking of Upd2 is our observation and is supported by various genetic and biochemical experiments. Our study does not imply that this vesicular trafficking has an advantage over diffusion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Although the use of antimony has been discontinued in India, the observation that there are Leishmania parasites that are resistant to antimony in circulation has been cited as evidence that these resistant parasites are now a distinct strain with properties that ensure their transmission and persistence. It is of interest to determine what are the properties that favor the retention of their drug resistance phenotype even in the absence of the selective pressure that would otherwise be conferred by the drug. The hypothesis that these authors set out to test is that these parasites have developed a new capacity to acquire and utilize lipids, especially cholesterol which affords them the capacity to grow robustly in infected hosts.

      We sincerely appreciate Reviewer 1's thoughtful and positive evaluation of our manuscript. We acknowledge that the reviewer has a few major concerns, and we would like to address them one by one in the following section.

      Major issues:

      (1) There are several experiments for which they do not provide sufficient details, but proceed to make significant conclusions.

      Experiments in section 5 are poorly described. They supposedly isolated PVs from infected cells. No details of their protocol for the isolation of PVs are provided. They reference a protocol for PV isolation that focused on the isolation of PVs after L. amazonensis infection. In the images of infection that they show, by 24 hrs, infected cells harbor a considerable number of parasites. Is it at the 24 hr time point that they recover PVs? What is the purity of PVs? The authors should provide evidence of the success of this protocol in their hands. Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.

      We would like to thank the reviewer for correctly pointing out the lack of details regarding PV isolation and its purity. The reviewer raised multiple questions, and we answer them one by one below:

      Firstly, “Is it at the 24 hr time point that they recover PVs?”

      In the ‘Methods’ section of the original submission (Lines 606-611), there is a separate section on “Parasitophorous vacuole (PV) Isolation and cholesterol measurement”, where it is clearly mentioned, “24Hrs LD infected KCs were lysed by passing through a 22-gauge syringe needle to release cellular contents. Parasitophorous vacuoles (PV) were then isolated using a previously outlined protocol [Ref: 73].” However, we do acknowledge that further details might be useful to enrich this section, and hence we would like to include the following details in the Methods section of the revised manuscript (Lines 663-678): “Parasitophorous vacuoles (PV) were isolated using a previously outlined protocol with slight modifications [76]. 10<sup>7</sup> KCs were seeded in a 100 mm plate and allowed to adhere for 24Hrs. Following this, infection was performed with Leishmania donovani (LD) for 24Hrs. The infected KCs were then harvested by gentle scraping and lysed through five successive passages through an insulin needle to ensure membrane disruption while preserving organelle integrity. The lysate was centrifuged at 200 × g for 10mins at 4°C to remove intact cells and large debris. The resulting supernatant was carefully collected and subjected to a discontinuous sucrose density gradient (60%, 40%, and 20%). The gradient was centrifuged at 700 × g for 25mins at 4°C to facilitate organelle separation. The interphase between the 40% and 60% sucrose layers, enriched with PVs, was carefully collected and subjected to a final centrifugation step at 12,000 × g for 25mins at 4°C. The supernatant was discarded, and the resulting pellet was enriched for purified parasitophorous vacuoles, suitable for downstream biochemical and molecular analyses. Cholesterol and protein contents in PV were determined by an Amplex Red assay kit and Bradford assay, respectively. Resulting data were represented as micrograms of cholesterol per microgram of protein.”
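
The final normalisation step (micrograms of cholesterol per microgram of protein) amounts to a simple ratio. The sketch below is only an illustration of that arithmetic; the function name and the example readings are hypothetical and are not taken from the manuscript.

```python
def cholesterol_per_protein(cholesterol_ug, protein_ug):
    """Normalise a PV cholesterol reading to the protein content of the same fraction.

    cholesterol_ug : cholesterol in the fraction, in micrograms (e.g. from an Amplex Red readout)
    protein_ug     : protein in the fraction, in micrograms (e.g. from a Bradford assay)
    Returns micrograms of cholesterol per microgram of protein.
    """
    if protein_ug <= 0:
        raise ValueError("protein content must be positive")
    return cholesterol_ug / protein_ug


# Hypothetical readings for one PV fraction.
ratio = cholesterol_per_protein(cholesterol_ug=1.5, protein_ug=30.0)
```

Expressing cholesterol relative to protein makes fractions of different yield comparable, which is relevant to the reviewer's concern that recovering more membrane could by itself raise the absolute cholesterol reading.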

      Secondly, “What is the purity of PVs? Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.”

      We thank the reviewer for pointing out this critical lack of data in the submitted manuscript. As the reviewer rightly notes, we needed to assess the purity of the isolated PVs in our experiment. In the revised manuscript, we now provide data on the purity of the isolated fraction, obtained by confocal imaging and Western blot of the PV and cytoplasmic fractions, and have included the results in Figure 3C i, C ii and C iii. Our results clearly show efficient PV isolation, with demarcating LAMP-1-positive staining around LD amastigotes, which was further validated by Western blot showing a significant enrichment of LAMP-1 specifically in the PV fraction. This has been included in the revised manuscript (Lines 225-234), which reads: “Parasitophorous vacuole fractions were isolated from LD-S and LD-R-infected KCs at 24Hrs p.i. using a previously established protocol [35]. Following isolation, PV purity was confirmed through LAMP-1 staining which showed a significant enrichment around isolated PV in Confocal microscopy (Figure 3C i). Purity of isolated PV fractions was further confirmed by Western blot which showed an enhanced enrichment of LAMP-1 for LD-R-PV fraction as compared to LD-S-PV fraction, while PV excluded cellular fraction showed residual LAMP-1 expression confirming the purity of the isolated PV fractions (Figure 3C ii, iii). Following isolation, protein concentration was measured for isolated PV fractions using the Bradford assay, and PV fractions from both LD-S- and LD-R-infected KCs were normalized accordingly.”

      (2) In section 6 they evaluate the mechanism of LDL uptake in macrophages. Several approaches and endocytic pathway inhibitors are employed. The authors must be aware that the role of cytochalasin D in the disruption of fluid phase endocytosis is controversial. Although they reference a study that suggests that cytochalasin D has no effect on fluid-phase endocytosis, other studies have found the opposite (doi: 10.1371/journal.pone.0058054). It wasn't readily evident what concentrations were used in their study. They should consider testing more than 1 concentration of the drug before they make their conclusions on their findings on fluid phase endocytosis.

      We thank the reviewer for this insightful comment, and we apologise for omitting the Cytochalasin-D concentration. To clarify, LDL uptake by LD-R-infected KCs is LDL-receptor independent, as clearly shown in Section 6, Figure 4A, Figure S4A, Figure S4B i and Figure S4B ii of the submitted manuscript. In Figure 4F and Figure S4D of the submitted manuscript, as referred to by the reviewer, Cytochalasin-D was used at a concentration of 2.5µg/ml. At this concentration, we did not observe any effect of Cytochalasin-D on LDL-receptor-independent fluid-phase endocytosis: intracellular LD-R amastigotes were able to take up LDL successfully and proliferate in infected Kupffer cells, unlike with Latrunculin-A (5µM) treatment, which completely inhibited intracellular proliferation of LD-R amastigotes by blocking only receptor-independent fluid-phase endocytosis (Video 2A and 2B and Figure 4E in the submitted manuscript). In fact, the study cited by the reviewer (doi: 10.1371/journal.pone.0058054) used 4µg/ml Cytochalasin-D, which did affect both LDL-receptor-dependent and receptor-independent endocytosis in bone marrow-derived macrophages. We would also like to clarify that, in our preliminary experiments, we also tested a higher concentration of Cytochalasin-D (5µg/ml). However, even at this higher concentration there was no significant effect of Cytochalasin-D on LD-R-induced LDL-receptor-independent fluid-phase endocytosis, as judged from the intracellular LD-R amastigote count. Thus, we strongly believe that Cytochalasin-D does not affect LD-R-induced fluid-phase endocytosis even at the higher concentration. We have now included these data as Figure 4F and Figure S4E in the revised manuscript. Further, to avoid any confusion for readers, the concentrations of all inhibitors used in the study will be mentioned in the Results section (Lines 278 and 284) as well as in the revised figure labels.

      (3) In Figure 5 they present a blot that shows increased Lamp1 expression from as early as 4 hrs after infection with LD-R and by 12 hrs after infection of both LD-S and LD-R. Increased Lamp1 expression after Leishmania infection has not been reported by others. By what mechanism do they suggest is causing such a rapid increase (at 4hrs post-infection) in Lamp-1 protein? As they report, their RNA seq data did not show an increase in LAMP1 transcription (lines 432-434).

      We would like to express our gratitude to the reviewer for highlighting the novelty of this observation. Indeed, to the best of our knowledge, no similar findings have been reported previously in primary macrophages infected with Leishmania donovani (LD); we could not find any reference to a quantitative Western blot for LAMP-1. Firstly, we would like to point out that, as stated in the Methods section (Lines 556–566) of the submitted manuscript, “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This indicates that our actual study points correspond to approximately the 8th and 28th hours post-infection. We mention this only to clarify the time points and prevent any potential confusion.

      Regarding LAMP-1 expression, although we could not find any previous reports of its expression in LD-infected primary macrophages, there is a previous report (doi.org/10.1128/mBio.01464-20) that shows a similar punctate LAMP-1 upregulation (as we observed in Figure 5A i of the submitted manuscript) in response to Leishmania infection in nonphagocytic fibroblasts. It is tempting to speculate that the increased LAMP-1 expression observed in LD-R-infected macrophages might be due to increased lysosomal biogenesis, required for degrading the increased amount of endocytosed LDL into bioavailable cholesterol. However, since the RNA-seq data showed no change in LAMP-1 expression (Figure 6 of the submitted manuscript), we can only speculate that this results from post-transcriptional or post-translational modifications. Further work will be required to investigate this mechanism in detail, which is beyond the scope of this work. That is why we discussed this in the submitted manuscript (Lines 432-435): “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 in the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV is worthy of further investigation.”

      However, we agree with the reviewer that this might not be enough for the clarification. Hence in the revised manuscript this has been updated in the Discussion section (Line 465-472) as follows, “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 in the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. How, LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV is worthy of further investigation. It is possible and has been earlier reported that LD infection can regulate host proteins expression through post transcriptional and post translational modifications [61-63]. It is tempting to speculate that LD-R amastigote might be promoting an increased lysosomal biogenesis through any such mechanism to increase supply of bioavailable cholesterol through action of lysosomal acid hydrolases on LDL.”

      (4) In Figure 6, amongst several assays, they reported on studies where NPC-1 is knocked down in PECs. They failed to provide any evidence of the success of the knockdown, but nonetheless showed greater LD-R after NPC-1 was knocked down. They should provide more details of such experiments.

      Although we understand the concern raised by the reviewer, the statement in question is factually incorrect. We would like to point out that in Figure 6F i of the submitted manuscript (Figure 6G ii in the revised manuscript), we demonstrated decreased NPC-1 staining following transfection with NPC-1-specific siRNA, whereas no such reduction was observed with scrambled RNA. Similar immunofluorescence data confirming LDL-receptor knockdown were also provided in Figure S4B i of the submitted manuscript (Figure S4B ii in the revised manuscript). However, we acknowledge that the reviewer may be referring to the lack of quantitative validation of the knockdown via Western blot. We would like to clarify that, although we already had these data, we did not include them in order to avoid duplication and to reduce the data density of the manuscript. As suggested by the reviewer, we have now included Western blots for both NPC-1 and LDL-receptor knockdown in the revised manuscript as Figure 6G i and Figure S4B i, which again confirm an efficient knockdown of NPC-1 and LDLr, as we observed with IFA.

      Additionally, as suggested by the reviewer, we also noticed a lack of detail in the Methods section of the submitted manuscript concerning siRNA-mediated knockdown (KD). Therefore, we have included more details in the revised manuscript (Line 821-828), which read as, “For all siRNA transfections, Lipofectamine® RNAiMAX Reagent (Life Technologies, 13778100) specifically designed for knockdown assays in primary cells was used according to the manufacturer's instructions with slight modifications. PECs were seeded into 24-well plates at a density of 1x10<sup>5</sup> per well, and incubated at 37°C with 5% CO2. The transfection complex, comprising (1µl Lipofectamine® RNAiMAX and 50µl Opti MEM) and (1 µl siRNA and 50µl Opti MEM) mixed together directly added to the incubated PECs. Gene silencing was checked by IFA and by Western blot as mentioned previously.”

      Minor issues

      (1) There is an implication that parasite replication occurs well before 24hrs post-infection?

      Studies on Leishmania parasite replication have reported on the commencement of replication after 24hrs post-infection of macrophages (PMCID: PMC9642900). Is this dramatic increase in parasite numbers that they observed due to early parasite replication?

      We thank the reviewer for this insightful comment and appreciate the opportunity to clarify our findings. Indeed, as rightly assumed by the Reviewer and as our data suggest, we also believe that this increase in intracellular amastigote number is a consequence of early replication of Leishmania donovani. As already mentioned in response to Point number 3 raised by Reviewer 1, we would again like to highlight that the Methods section (Lines 562–566) clearly states: “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This effectively means that our actual study points correspond to approximately the 8th and 28th hours post-infection; we mention this only to avoid any confusion regarding experimental time points.

      Now, regarding the specific concern related to Leishmania parasite replication, we would like to point out that the study referred to by the reviewer on the commencement of replication after 24hrs was conducted on Leishmania major, which may differ significantly from Leishmania donovani owing to its species- and strain-specific characteristics (PMCID: PMC9642900). In fact, the doubling time of Leishmania donovani (LD) has previously been reported to be approximately 11.4 hours (doi: 10.1111/j.1550-7408.1990.tb01147.x). Moreover, multiple studies have indicated an exponential increase in intracellular LD amastigote number (more than a two-fold increase) by 24Hrs post-infection (doi:10.1128/AAC.0119607, doi.org/10.1016/j.ijpara.2011.07.013). We made a similar observation for both infected PEC and KC (as depicted in Figure 1C and Figure S1C in the submitted and revised manuscript), indicating that active replication is happening in this time frame for Leishmania donovani. Hence it was an informed decision on our part to focus on the 24Hrs time point to perform the analysis on intracellular LD proliferation.
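
      The arithmetic behind this argument can be sketched as follows. This is a minimal illustration that assumes idealized, unrestricted exponential growth, which intracellular amastigotes need not follow exactly; the function name `fold_increase` is hypothetical and only the 11.4-hour doubling time comes from the response above. Under that assumption, 24 hours spans a little more than two doublings, so the expected increase is well above two-fold (roughly 4.3-fold).

```python
def fold_increase(elapsed_hours: float, doubling_time_hours: float) -> float:
    """Fold change in population size after ideal exponential growth.

    Assumes a constant doubling time and no parasite loss (illustrative only).
    """
    if doubling_time_hours <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (elapsed_hours / doubling_time_hours)


# Doubling time quoted in the response; treated here as a fixed constant.
REPORTED_DOUBLING_TIME_H = 11.4

fold_24h = fold_increase(24.0, REPORTED_DOUBLING_TIME_H)
print(f"Expected fold increase over 24 h: {fold_24h:.2f}")
```

      The same function can be used to check other windows, for example the 20 hours between the 4Hrs and 24Hrs read-outs.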

      (2) Several of the fluorescence images in the paper are difficult to see. It would be helpful if a blown-up (higher-magnification) version of the images in Figure 1 (especially D), for example, were presented.

      We apologise for the inconvenience. We have provided zoomed images for several other figures in the submitted and revised manuscript, such as Figure 4, Figure 5, Figure 6 and Figure 8. However, this was not always feasible for all figures (for example Figure 1D), owing to lack of space and figure arrangement requirements. To accommodate the Reviewer’s request, we have provided a blown-up image for Figure 1D iii in the revised manuscript.

      (3) The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection. Also, they stop in vitro infection of Apoe-/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.

      Reviewer has raised two independent concerns and we would like to address them individually.

      Firstly, “The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection.”

      We have already provided a detailed justification for time point selection in our response to Reviewer 1, Minor Comment 1. As mentioned already, we observed a significant and sharp rise in the number of intracellular amastigotes between 4Hrs and 24Hrs post-infection in KC, with the replication rate appearing not to increase proportionally (not doubling) after that (Figure 1C in the revised manuscript). This early stage of rapid replication of LD amastigotes therefore likely coincides with a critical period of lipid acquisition by intracellular amastigotes (Video 3A and 3B and Figure 4E in the submitted manuscript and revised manuscript), and thus 24Hrs-infected KC were specifically selected. In this regard, we would further like to add that at 72Hrs post-infection, we noticed that a notable number of infected Kupffer cells began detaching from the wells, with extracellular amastigotes probably egressing. This phenomenon potentially reflects the severe impact of prolonged infection on Kupffer cell viability and adhesion properties, as shown in Video 2 in the revised manuscript and Author response image 1. This observation further influenced our decision to conclude all infection studies in Kupffer cells by 48Hrs post-infection, which made it necessary to complete the infection time point at 24 Hrs in order to allow treatment with Amp-B for another 24 Hrs (Figure 8 and Figure S5 in the submitted manuscript and revised manuscript). We acknowledge that we should have been clearer about our selection of infection time points and, as the Reviewer has suggested, we have included this information in the revised manuscript (Line 134-141) for the reader’s clear understanding. This reads as, “Interestingly, as compared to a significant and sharp rise in the number of intracellular amastigotes between 4Hrs and 24Hrs post infected KC in response to LD-R infection, the number of intracellular amastigotes although increased significantly did not doubled from 24Hrs to 48Hrs p.i. suggesting exponential LD amastigote replication between 4Hrs and 24Hrs time frame and slowing down after that (Figure 1Ci, ii). Moreover, it was also noticed that at 72Hrs p.i. a notable number of infected-KC began detaching from the wells with extracellular amastigotes probably egressing out from the infected-KCs (Video 2). Thus, 24Hrs time point was selected to conduct all further infection studies involving KCs.”

      Author response image 1.

      Representative images of Kupffer cells infected with Leishmania donovani at 72Hrs post-infection showing a significant morphological change. Infected cells exhibit a rounded morphology and progressive detachment. Scale bar 10µm.

      Secondly “Also, they stop in vitro infection of Apoe-/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.”

      We apologize for not providing an explanation regarding the selection of the 11-day time point for Apoe<sup>-/-</sup> experiments (Figure 2 of the submitted and revised manuscript). Our rationale for this choice is based on both previous literature and the specific objectives of our study. A previous report suggests that Leishmania donovani infection in hypercholesterolemic Apoe<sup>-/-</sup> mice triggers a heightened inflammatory response at approximately six weeks post-infection compared to C57BL/6 mice, leading to more efficient parasite clearance. This is owing to the unique membrane composition of Apoe<sup>-/-</sup>, which rectifies Leishmania-mediated defective antigen presentation at a later stage of infection (DOI 10.1194/jlr.M026914). Additionally, previous studies have indicated that Leishmania donovani infection is well established in vivo within 6 to 11 days post-infection in murine models (doi: 10.1128/AAC.47.5.1529-1535.2003). Given that in this experiment we particularly aimed to assess the early infection status (parasite load) in diet-induced hypercholesterolemic mice, we would argue that the selection of the 11-day time point was rational and well aligned with our study objectives: a time point within this window is optimal for capturing the initial parasite burden, which depends on initial lipid utilization, before host-driven immune clearance mechanisms can significantly alter infection dynamics. We have included this explanation in the revised manuscript (Line 170-179) as suggested by the Reviewer, and it reads as, “Previous report has suggested that LD infection in hypercholesteremic Apoe<sup>-/-</sup> mice triggers a heightened inflammatory response at approximately six weeks’ post-infection compared to wild type BL/6 mice, leading to more efficient parasite clearance. This is owing to unique membrane composition of Apoe-/- which rectifies leishmania mediated defective antigen presentation at a later stage of LD infection [20]. Additionally, previous studies have also indicated that LD infection is well-established in mice within 6 to 11 days post-infection in murine models [33]. Thus to evaluate impact of initial lipid utilization on LD amastigote replication in vivo, BL/6 and diet-induced hypercholesterolemic Apoe<sup>-/-</sup> mice were infected with GFP expressing LD-S or LD-R promastigotes and sacrificed 11 days p.i.”

      Reviewer #2 (Public review):

      Summary:

      This study by Pradhan et al. offers critical insights into the mechanisms by which antimony-resistant Leishmania donovani (LD-R) parasites alter host cell lipid metabolism to facilitate their own growth and, in the process, acquire resistance to amphotericin B therapy. The authors illustrate that LD-R parasites enhance LDL uptake via fluid-phase endocytosis, resulting in the accumulation of neutral lipids in the form of lipid droplets that surround the intracellular amastigotes within the parasitophorous vacuoles (PV) that support their development and contribute to amphotericin B treatment resistance. The evidence provided by the authors supporting the main conclusions is compelling, presenting rigorous controls and multiple complementary approaches. The work represents an important advance in understanding how intracellular parasites can modify host metabolism to support their survival and escape drug treatment.

      We would like to sincerely thank the reviewer for appreciating our work and for finding the evidence compelling in addressing the emergence of drug resistance in infections with intracellular protozoan pathogens.

      Strengths:

      (1) The study utilizes clinical isolates of antimony-resistant L. donovani and provides interesting mechanistic information regarding the increased LD-R isolate virulence and emerging amphotericin B resistance.

      (2) The authors have used a comprehensive experimental approach to provide a link between antimony-resistant isolates, lipid metabolism, parasite virulence, and amphotericin B resistance. They have combined the following approaches:

      a) In vivo infection models involving BL/6 and Apoe-/- mice.

      b) Ex-vivo infection models using primary Kupffer cells (KC) and peritoneal exudate macrophages (PEC) as physiologically relevant host cells.

      c) Various complementary techniques to ascertain lipid metabolism including GC-MS, Raman spectroscopy, microscopy.

      d) Applications of genetic and pharmacological tools to show the uptake and utilization of host lipids by the infected macrophage resident L. donovani amastigotes.

      (3) The outcome of this study has clear clinical significance. Additionally, the authors have supported their work by including patient data showing a clear clinical significance and correlation between serum lipid profiles and treatment outcomes.

      (4) The present study effectively connects the basic cellular biology of host-pathogen interactions with clinical observations of drug resistance.

      (5) Major findings in the study are well-supported by the data:

      a) Intracellular LD-R parasites induce fluid-phase endocytosis of LDL independent of LDL receptor (LDLr).

      b) Enhanced fusion of LDL-containing vesicles with parasitophorous vacuoles (PV) containing LD-R parasites both within infected KCs and PECs cells.

      c) Intracellular cholesterol transporter NPC1-mediated cholesterol efflux from parasitophorous vacuoles is suppressed by the LD-R parasites within infected cells.

      d) Selective exclusion of inflammatory ox-LDL through MSR1 downregulation.

      e) Accumulation of neutral lipid droplets contributing to amphotericin B resistance.

      Weaknesses:

      The weaknesses are minor:

      (1) The authors do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation.

      (2) The study could have benefited from a more detailed analysis of how lipid droplets physically interfere with amphotericin B access to parasites.

      We have addressed both these concerns in the revised Version of this work as elaborated in the following section.

      Impact and significance:

      This work makes several fundamental advances:

      (1) The authors were able to show the link between antimony resistance and enhanced parasite proliferation.

      (2) They were also able to reveal how parasites can modify host cell metabolism to support their growth while avoiding inflammation.

      (3) They were able to show a certain mechanistic basis for emerging amphotericin B resistance.

      (4) They suggest therapeutic strategies combining lipid droplet inhibitors with current drugs.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Experimental suggestions:

      a) The authors could have provided a more detailed analysis of lipid droplet composition. This is a critically missing piece in this nice study.

      We completely agree with the Reviewer on this: a more detailed analysis of lipid droplet composition, the dynamics of their formation, and the mechanism of lipid transfer to amastigotes residing within the PV would be worthy of further investigation. We are already investigating in this direction and have very promising initial results, which we are willing to share with the Reviewer as an unpublished communication if requested. Since we plan to address these questions independently, we hope the Reviewer will understand our hesitation to include these data in the present work, which is already data dense. We sincerely believe that the existence of lipid droplet contact sites with the PV, along with the specific lipid types transferred to amastigotes and the mechanism of transfer, requires special attention and could stand as an independent work by itself.

      b) The macrophages (PEC, KC) could have been treated with latex beads as a control, which would indicate that cholesterol and lipids are indeed utilized by the Leishmania parasitophorous vacuole (PV) and essential for its survival and proliferation.

      We thank the reviewer for this nice suggestion, which we believe will further strengthen the conclusion of this work. We have now included this data as Figure 5E in the revised manuscript. Our data showed that infected KC harbouring both LD-R amastigotes and fluorescent latex beads showed concentrated cholesterol staining around amastigotes, whereas internalized latex beads, like LD-S amastigotes, showed no positive cholesterol staining. This observation clearly confirmed specific lipid uptake in LD-R-PV, which cannot be replicated by phagocytosed latex beads.

      c) HMGCoA reductase is an important enzyme for the mevalonate pathway and cholesterol synthesis. The authors have not commented on this enzyme in either host or parasite. Additionally, western blots of these enzymes along with SREBP2 could have been performed.

      We appreciate the concern and see why the reviewer is suggesting this. Regarding HMGCoA reductase, we already have real-time qPCR data which align well with our RNAseq data (Figure 6 A i in the submitted and revised manuscript), showing significant downregulation specifically in LD-R-infected KC as compared to uninfected control. We are including these data as Author response image 2. However, we did not proceed with checking the level of HMGCoA reductase at the protein level, as several previous reports have suggested that HMGCoA reductase remains under transcriptional control of SREBP2 (doi.org/10.1016/j.cmet.2011.03.005, doi: 10.1194/jlr.C066712, doi:10.1194/jlr.RA119000201), which acts as the master regulator of the mevalonate pathway and cholesterol synthesis (doi.org/10.1161/ATVBAHA.122.317320), and SREBP2 remains significantly downregulated in response to LD-R infection (Figure 6B i and Figure 6C in the submitted and revised manuscript). However, as suggested by the Reviewer, we have included this in the revised manuscript as Figure 6D. The Western blot data further confirmed the expected significant downregulation of HMGCoA reductase in response to LD-R infection.

      Author response image 2.

      qPCR Analysis of HMGCR Expression Following Leishmania donovani Infection: Quantitative PCR analysis showing the relative expression of hmgcr (3-hydroxy-3-methylglutaryl-CoA reductase) in Kupffer cells after 24 hours of Leishmania donovani (LD) infection compared to uninfected control cells. Gene expression levels are normalized to β-actin as an internal control, and fold change is represented relative to the uninfected condition.
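
      The normalization described in this legend can be illustrated with a short sketch. The response does not state which quantification method was used; the example below assumes the common comparative Ct (2^-ΔΔCt) approach with roughly 100% amplification efficiency. The function name `relative_expression` and all Ct values are hypothetical and are not data from the study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene versus control by the 2^-ddCt method.

    Assumes ~100% PCR efficiency for both target and reference amplicons.
    """
    # Normalize the target gene to the internal reference gene in each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the infected sample with the uninfected control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: hmgcr and beta-actin in infected vs uninfected cells.
fold = relative_expression(
    ct_target_sample=26.0, ct_ref_sample=18.0,    # infected
    ct_target_control=24.0, ct_ref_control=18.0,  # uninfected control
)
print(f"Relative hmgcr expression: {fold:.2f}")
```

      With these made-up inputs the target needs two extra cycles after normalization, which corresponds to a four-fold reduction relative to the uninfected control.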

      d) The authors should discuss the expression pattern of any enzyme of the mevalonate pathway that they have found to be dysregulated in the transcript data.

      As per the reviewer’s suggestion, we looked into the RNA seq data and observed that, apart from hmgcr, hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase), another key enzyme in the mevalonate pathway, is significantly downregulated in host PECs in response to LD-R infection compared to LD-S infection. We have discussed this in the revised manuscript (Line 484-490), which reads as “Further RNA sequencing data also revealed a significant downregulation of hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase) in LD-R infected PECs as compared to LD-S infection. Downregulation of HMGCS which catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA to form 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA), which serves as an intermediate in both cholesterol biosynthesis and ketogenesis further supports our observation that LD-R-infected PECs preferentially rely on endocytosed low-density lipoprotein (LDL)-derived cholesterol rather than de novo synthesized cholesterol to support their metabolic needs.”

      e) The authors have followed a previously published protocol by Real F (reference 73) to enrich for parasitophorous vacuole (PV). However, they do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation. The authors should at least show Western blot data for LAMP1 for different fractions of density gradient from which they enriched the PV.

      As we previously stated in our response to Reviewer 1, in the revised manuscript we have included a detailed analysis of purity for different fractions during PV isolation. We sincerely thank the reviewer for highlighting this important concern and for suggesting an approach to conduct the experiment. We have included this data as Figure 3C (i, ii, iii) in the revised manuscript. Our imaging and Western blot data showed a significant enrichment of LAMP-1 in the PV fraction, and we believe this result further reinforces the conclusions of our study on increased cholesterol.

      (2) Presentation improvements:

      a) Add a clear timeline for infection experiments.

      As suggested by the Reviewer, we have included a schematic of timelines for all the animal infection experiments (Figure 2Ci and Figure 7A,Fi) in the revised manuscript.

      b) Provide more details on patient sample collection and analysis.

      We have included more details on the sample collection in the Method section of the revised manuscript (Line 830-835), “Blood samples were collected from a total of 22 individuals spanning a diverse age range (8 to 70 years) by RMRI, Bihar, India. Among these, nine samples were obtained from healthy individuals residing in endemic regions to serve as controls. Serum was isolated from each blood sample through centrifugation, and the lipid profile was subsequently analysed using a specialized diagnostic kit (Coral Clinical System) following the manufacturer's protocol.”

      c) Consider reorganizing figures to better separate mechanistic and clinical findings.

      We would like to thank the reviewer for this suggestion. We felt that a major rearrangement altering the sequence of the figures as presented in the original submission would impact the smooth flow of the story, and hence we did not change it. However, as suggested by the Reviewer, we have performed a major rearrangement within Figure 2, Figure 5, Figure 6 and Figure 9 of the revised manuscript for better representation of the data and the convenience of the reader. Also, if the reviewer has a specific suggestion regarding the rearrangement of any particular figure, we will be happy to consider it.

      (3) Technical clarifications needed:

      a) Specify exact concentrations used for inhibitors.

      We apologise for this oversight. We have now clearly mentioned the concentration of all the inhibitors used in this study in the Results section and in the Figures of the revised manuscript. For ease of understanding, the revised section (Line 281-287) reads as, “Finally, we infected the KCs with GFP expressing LD-R for 4Hrs, washed and allowed the infection to proceed in presence of fluorescent red-LDL and Latrunculin-A (5µM), a compound which specifically inhibits fluid phase endocytosis by inducing actin depolymerization [41]. Real-time fluorescence tracking demonstrated that Latrunculin-A treatment not only prevented the uptake of fluorescent red-LDL but also severely impacted intracellular proliferation of LD-R amastigotes (Video 2A and 2B and Figure 4E). In contrast, treatment with Cytochalasin-D, which alters cellular F-actin organization but does not affect fluid phase endocytosis [41], had no effect on the intracellular proliferation of LD-R irrespective of Cytochalasin-D concentrations (2.5µg/ml and 5µg/ml respectively) (Figure 4F and Figure S4D).”

      b) Include more details on image analysis methods.

      Please note that in specific sections, such as Line numbers 574-579, 653-658 and 1047-1049 of the submitted manuscript, we paid special attention to describing the image analysis process. However, we agree that in some particular cases more details will be appreciated by the reader. Hence, we have included an additional Image Analysis section in the Methods section of the revised manuscript. This section (Line 727-739) reads as, “Image processing and analysis were conducted using Fiji (ImageJ). For optimal visualization, Giemsa-stained macrophages (MΦs) were represented in grayscale to enhance contrast and structural clarity. To improve the distinction of different fluorescent signals, pseudo-colors were assigned to fluorescence images, ensuring better differentiation between various cellular components. For colocalization analysis (Figures 3, Figure 5, Figure 6, and Figure S2), we utilized the RGB profile plot plugin in ImageJ, which allows for the precise assessment of signal overlap by generating fluorescence intensity profiles across selected regions of interest. This approach provided quantitative insights into the spatial relationship between labelled molecules within infected cells. Additionally, for analyzing the distribution of cofilin in Figure 4, the ImageJ surface plot plugin was employed. This tool enabled three-dimensional visualization of fluorescence intensity variations, facilitating a more detailed examination of cofilin localization and its potential reorganization in response to infection.”
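
      The idea of comparing per-channel intensity profiles along a line can be sketched in plain Python. This is not the ImageJ RGB profile plot plugin and not the authors' pipeline: the helper names `line_profile` and `pearson` are hypothetical, the pixel values are synthetic, and the Pearson correlation is added here only as one simple way to summarize how closely two profiles follow each other.

```python
from math import sqrt


def line_profile(image, row):
    """Return pixel intensities along one horizontal line of a 2D image."""
    return list(image[row])


def pearson(xs, ys):
    """Pearson correlation between two equally long intensity profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Synthetic single-row "images": one red channel and two green channels.
red = [[0, 0, 10, 50, 10, 0, 0]]
green_overlapping = [[0, 0, 12, 48, 9, 0, 0]]  # peak at the same position
green_separate = [[50, 10, 0, 0, 0, 0, 0]]     # peak elsewhere

r_overlap = pearson(line_profile(red, 0), line_profile(green_overlapping, 0))
r_separate = pearson(line_profile(red, 0), line_profile(green_separate, 0))
print(f"overlapping: {r_overlap:.2f}, separate: {r_separate:.2f}")
```

      Profiles whose peaks coincide give a correlation close to 1, whereas spatially separated signals give a low or negative value.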

      c) Clarify statistical analysis procedures.

      We already provided a dedicated Statistical Analysis section in the Methods section of the original submission and also showed the groups being compared for statistical analysis in the Figures and Figure Legends of the submitted manuscript. Furthermore, as suggested by the Reviewer, we have now added additional clarification regarding the statistical analysis performed in the revised manuscript (Line 737-749). In the revised manuscript this section reads as, “All statistical analyses were performed using GraphPad Prism 8 on raw datasets to ensure robust and reproducible results. For datasets involving comparisons across multiple conditions, one-way or two-way analysis of variance (ANOVA) was conducted, followed by Tukey’s post hoc test to assess pairwise differences while controlling for multiple comparisons. A 95% confidence interval (CI) was applied to determine the statistical reliability of the observed differences. For non-parametric comparisons across multiple groups, Wilcoxon rank-sum tests were employed, maintaining a 95% confidence interval, which is particularly useful for analysing skewed data distributions. In cases where only two groups were compared, Student’s t-test was used to determine statistical significance, ensuring an accurate assessment of mean differences. All quantitative data are represented as mean ± standard error of the mean (SEM) to illustrate variability within experimental replicates. Statistical significance was determined at P ≤ 0.05. Notation for significance levels: *P ≤ 0.05; **P ≤ 0.001; ***P ≤ 0.0001.”
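
      Two of the conventions quoted above, reporting mean ± SEM and mapping P values to asterisks, can be sketched as follows. The analyses themselves were run in GraphPad Prism; this snippet only illustrates the stated conventions. The function names and replicate values are hypothetical, and the `ns` label for non-significant results is an assumption that does not appear in the quoted text.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    # SEM = sample standard deviation / sqrt(number of replicates)
    return mean(values), stdev(values) / sqrt(len(values))


def significance_stars(p_value):
    """Map a P value to the asterisk notation quoted in the Methods text."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    if p_value <= 0.0001:
        return "***"
    if p_value <= 0.001:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "ns"  # assumed label for non-significant comparisons


# Hypothetical replicate measurements (arbitrary units).
replicates = [2.0, 4.0, 6.0]
m, sem = mean_sem(replicates)
print(f"{m:.2f} ± {sem:.2f}", significance_stars(0.0005))
```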

      (4) Minor corrections:

      a) Methods section could benefit from more details on Raman spectroscopy analysis.

      We agree with this suggestion of the Reviewer. To provide more clarity, we have incorporated additional details in the Raman methodology section of the revised manuscript (Line 638-649). The updated section reads as follows in the revised manuscript: “For confocal Raman spectroscopy, spectral data were acquired from individual cells at 1000× magnification using a 100 × 100 μm scanning area, following previously established specifications. After spectral acquisition, distinct Raman shifts corresponding to specific biomolecular signatures were extracted for further analysis. These included: Cholesterol (535–545 cm<sup>-1</sup>), Nuclear components (780–790 cm<sup>-1</sup>), Lipid structures (1262–1272 cm<sup>-1</sup>), Fatty acids (1436–1446 cm<sup>-1</sup>). Following spectral extraction, pseudo-color mapping was applied to highlight the spatial distribution of each biomolecular component within the cell. These processed spectral images are presented in Figure 3D1, where the first four panels illustrate the individual biomolecular distributions. A merged composite image was then generated to visualize the co-localization of these biomolecules within the cellular microenvironment, with the final panel specifically representing the spatial distribution of key biomolecules.”
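
      Extracting band-specific signal from a spectrum, as described in the Raman section above, can be sketched in a few lines. Only the four wavenumber windows come from the quoted text. Summing the intensities inside each window is an assumed, simplified reduction (the instrument software may integrate, baseline-correct, or fit peaks instead), and the names `BANDS_CM1`, `band_intensity` and `band_map` as well as the spectrum values are hypothetical.

```python
# Wavenumber windows (cm^-1) taken from the quoted Methods text.
BANDS_CM1 = {
    "cholesterol": (535, 545),
    "nuclear_components": (780, 790),
    "lipid_structures": (1262, 1272),
    "fatty_acids": (1436, 1446),
}


def band_intensity(wavenumbers, intensities, low, high):
    """Sum the intensities whose Raman shift lies within [low, high]."""
    if len(wavenumbers) != len(intensities):
        raise ValueError("wavenumbers and intensities must be the same length")
    return sum(i for w, i in zip(wavenumbers, intensities) if low <= w <= high)


def band_map(wavenumbers, intensities):
    """Return one summed intensity per named band for a single spectrum."""
    return {
        name: band_intensity(wavenumbers, intensities, low, high)
        for name, (low, high) in BANDS_CM1.items()
    }


# Synthetic spectrum from one pixel: Raman shifts and arbitrary intensities.
shifts = [530, 540, 785, 1000, 1267, 1440]
counts = [1.0, 5.0, 2.0, 9.0, 3.0, 4.0]
pixel_bands = band_map(shifts, counts)
print(pixel_bands)
```

      Repeating this for every pixel of a scan yields one intensity image per band, which is the kind of input a pseudo-color map would be built from.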

      b) In the methods section line 609, page 14, the authors cite Real F protocol as reference 73 for PV enrichment. However, in the very next section on GC-MS analysis (lines 615-616, page 15), they state they have used reference 74 for PV enrichment. Can they explain this discrepancy in the PV isolation references? Reference 74 does not mention anything related to PV isolation.

      Response: We would like to sincerely apologise for this confusion, which probably arose from our writing of this section. We would like to confirm that our PV isolation protocol is based on the published protocol of Real F (reference 73). However, the next section of the submitted manuscript described the GC-MS analysis, which was performed based on the protocol in reference 74. In the revised manuscript, we have removed this confusion by placing the references in the proper places. In the revised manuscript, this section (Line 663-678) reads as,

      “GC-MS analysis of LD-S and LD-R-PV

      Following a 24Hrs infection period, KCs were harvested, washed with phosphate-buffered saline (PBS), and pelleted. Subsequent to this, PV isolation was carried out using the previously described protocol [35]. After PV isolation Bradford assay was carried out for normalizing the protein concentration. The resulting equal volume of PV pellet was suspended in 20 ml of dichloromethane: methanol (2:1, vol/vol) and incubated at 4°C for 24hours. After centrifugation (11,000 g, 1 hour, 4°C), the supernatant was checked through thin layer chromatography (TLC) and subsequently evaporated under vacuum. The residue and pellet were saponified with 30% potassium hydroxide (KOH) in methanol at 80°C for 2 hours. Sterols were extracted with n-hexane, evaporated, and dissolved in dichloromethane. A portion of the clear yellow sterol solution was treated with N, O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) and heated at 80°C for 1 hour to form trimethylsilyl (TMS) ethers. Gas chromatography/mass spectrometry (GC/MS) analysis was performed using a Varian model 3400 chromatograph equipped with DB5 columns (methyl-phenylsiloxane ratio, 95/5; dimensions, 30 m by 0.25 mm). Helium was used as the gas carrier (1 ml/min). The column temperature was maintained at 270°C, with the injector and detector set at 300°C. A linear gradient from 150 to 180°C at 10°C/min was used for methyl esters, with MS conditions set at 280°C, 70 eV, and 2.2 kV [77].”

    1. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Idiopathic scoliosis (IS) is a common spinal deformity. Various studies have linked genes to IS, but underlying mechanisms are unclear such that we still lack understanding of the causes of IS. The current manuscript analyzes IS patient populations and identifies EPHA4 as a novel associated gene, finding three rare variants in EPHA4 from three patients (one disrupting splicing and two missense variants) as well as a large deletion (encompassing EPHA4) in a Waardenburg syndrome patient with scoliosis. EPHA4 is a member of the Eph receptor family. Drawing on data from zebrafish experiments, the authors argue that EPHA4 loss of function disrupts the central pattern generator (CPG) function necessary for motor coordination.

      The main strength of this manuscript is the human genetic data, which provides convincing evidence linking EPHA4 variants to IS. The loss of function experiments in zebrafish strongly support the conclusion that EPHA4 variants that reduce function lead to IS.

      The conclusion that disruption of CPG function causes spinal curves in the zebrafish model is not well supported. The authors' final model is that a disrupted CPG leads to asymmetric mechanical loading on the spine and, over time, the development of curves. This is a reasonable idea, but currently not strongly backed up by data in the manuscript. Potentially, the impaired larval movements simply coincide with, but do not cause, juvenile-onset scoliosis. Support for the authors' conclusion would require independent methods of disrupting CPG function and determining if this is accompanied by spine curvature. At a minimum, the language of the manuscript could be toned down, with the CPG defects put forward as a potential explanation for scoliosis in the discussion rather than as something this manuscript has "shown". An additional weakness of the manuscript is that the zebrafish genetic tools are not sufficiently validated to provide full confidence in the data and conclusions.

      We highly appreciate the reviewer’s insightful comments and the acknowledgment of the main values of our study. We agree with the reviewer that further experiments are needed to fully establish the relationship between CPG and scoliosis. In response, we have revised the conclusion in the manuscript to better reflect this. Additionally, we conducted further analyses on the mutants to provide additional evidence supporting this concept.

      Reviewer #1 (Recommendations for the authors):

      Epha4a mutant zebrafish exhibited mild spinal curves, mostly laterally and in the tail. This was 75% of homozygous mutants but also, surprisingly, about 20% of heterozygotes. epha4b mutants also developed some mild scoliosis. If the two zebrafish paralogs can compensate for each other (partial redundancy), we might expect more severe scoliosis in double mutants. Did the authors generate and analyze double mutants? I believe it would be very useful for this study to report the zebrafish phenotype of loss of both paralogs together.

      We appreciate the reviewer’s insightful comment regarding the potential value of reporting the phenotype of epha4a/epha4b double mutants. While we fully agree that this analysis would be valuable, our attempts to generate double mutants have been unsuccessful. These two genes are closely linked on the chromosome, with less than 100 kb separating them, which makes it challenging to generate double mutants through standard genetic crossing. Establishing a double mutant line would require more than a year due to the technical constraints of the process. Although we are unable to address this question directly at this time, we hypothesize that epha4a/epha4b double mutants may exhibit a higher likelihood of body axis abnormalities based on the phenotypes observed in single mutants and the known functions of these genes.

      We hope this perspective will provide some useful context despite the limitations.

      In Figure 1F, a pCDK5 western blot is performed as a readout of EPHA4 signaling after either WT or C849Y mutant EPHA4 is transfected into HEK 293T cells. It would be useful to mention in the text, or at least the figure legend, how this experiment was performed/where the protein samples came from. It is included in the methods, but in the main text, it simply says "we conducted western blotting" without mentioning whether the protein samples were from cell lines, patients, or another source.

      We apologize for the oversight. A detailed description of how the western blotting was conducted has been added to both the “Results” section (page 8, lines 187-190) and the Figure 1 legend.

      Was the relative turn angle biased to the left or right side of the fish? (i.e. is a positive angle a rightward or leftward turn?)

      We apologize for the unclear description. In Figure 3D, a positive angle means turning left, while a negative angle means turning right. In wild-type larvae, the average turning angle over a 4-minute period is approximately 0, whereas in mutants, this value deviates from 0, indicating a directional preference (positive for leftward and negative for rightward turns) in swimming behavior during the recording period. We have also added this information to the text and figure legend.
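
      The sign convention described above can be illustrated with a minimal sketch. It assumes per-turn relative angles in degrees are already available for one larva over the recording period; the function name and the `tolerance_deg` cutoff are hypothetical and are not taken from the manuscript.

```python
from statistics import mean

def directional_bias(turn_angles_deg, tolerance_deg=1.0):
    """Summarize the turning preference of one larva.

    Sign convention (as stated in the response): positive = left turn,
    negative = right turn. `tolerance_deg` is an illustrative cutoff for
    "approximately 0", not a value from the manuscript.
    """
    avg = mean(turn_angles_deg)
    if avg > tolerance_deg:
        return avg, "left"
    if avg < -tolerance_deg:
        return avg, "right"
    return avg, "none"

# Hypothetical example values, not measured data.
balanced = directional_bias([10, -10, 5, -5])
left_biased = directional_bias([12.0, 8.0, -2.0])
```

      An average near 0 corresponds to the wild-type pattern described above, whereas a clearly positive or negative average indicates a leftward or rightward preference.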

      In Figure 4, morpholinos rather than mutants are used, but it is not clear why. Has it been established that the MO used disrupts gene function specifically? Can the effect of the MO be rescued by expressing a wild-type mRNA of Epha4a? Does MO knockdown induce spinal curves if fish are raised? Indeed, this could be a way to determine whether the spinal curves are caused by early events in development (when MOs are active).

      Thanks for the comments. The efficacy of the relevant MOs has been well documented in numerous previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Following this reviewer’s suggestion, we raised the epha4a morphants to adulthood and observed no scoliosis, suggesting that spinal curvature formation may be induced by long-term defects in the absence of Epha4a. Additionally, we reconfirmed the abnormal motor neuron activation frequency phenotype in the mutant background. The corresponding data have replaced the original Figure 4 in the manuscript.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Reviewer #2 (Recommendations for the authors):

      Supplementary Table 3 is missing.

      We apologize for any inconvenience caused to the reviewers. Due to the size of Supplementary Table 3, we have uploaded it separately as an Excel file in the supplementary materials. We also double-checked this during resubmission of the revised manuscript. Thanks for your thorough review.

      The authors report only a single mutant allele for zebrafish epha4a and epha4b. Additionally, they provide no information about how many generations each allele has been outcrossed. The authors should provide some type of validation that the phenotypes they describe result from loss of function of the targeted gene and not from an off-targeting event.

      Thanks for the comments. For epha4a and epha4b mutants, each homozygous mutant was initially derived from the self-crossing of first filial generation heterozygotes, and subsequent homozygous generations were maintained for fewer than three rounds of in-crossing. Interestingly, we observed a reduction in the incidence of scoliosis across successive generations. This trend may be attributed to potential genetic compensation mechanisms, which could mitigate the phenotypic severity over time. To address concerns about possible off-target effects, we synthesized and injected epha4a mRNA to test for phenotypic rescue. Our data show that epha4a mRNA injection partially restored swimming coordination in the mutants (Fig. S5). Moreover, similar motor coordination defects have been reported in Epha4-deficient mice, as documented in previous studies (Kullander et al., 2003; Borgius et al., 2014). These findings collectively strengthen the hypothesis that Epha4a plays a critical role in regulating motor coordination.

      References

      (1) Borgius, L., Nishimaru, H., Caldeira, V., Kunugise, Y., Low, P., Reig, R., Itohara, S., Iwasato, T., and Kiehn, O. (2014). Spinal glutamatergic neurons defined by EphA4 signaling are essential components of normal locomotor circuits. J Neurosci 34, 3841-3853.

      (2) Kullander, K., Butt, S.J., Lebret, J.M., Lundfald, L., Restrepo, C.E., Rydstrom, A., Klein, R., and Kiehn, O. (2003). Role of EphA4 and EphrinB3 in local neuronal circuits that control walking. Science 299, 1889-1892.

      The authors need to provide allele designations for the mutant alleles following accepted nomenclature guidelines.

      Thank you for your careful review! We have reviewed and revised the gene and mutation symbols throughout the entire text.

      The three antisense morpholino oligonucleotides need to be validated for efficacy and specificity.

      Thanks for the comments. The efficacy of these morpholinos has been thoroughly validated in multiple previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Furthermore, we performed swimming behavior analysis in the mutant background, which showed results similar to those in the morphants. Moreover, we performed rescue experiments to confirm the specificity of the mutants (Fig. S5). Finally, we reconfirmed the abnormal calcium signaling in the mutants (Fig. 4), which further supports our previous knockdown results.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Line 229. "While in consistent with previous reports, the hindbrain rhombomeric boundaries were found to be defective....". This sentence is not clear. Please describe how it is "inconsistent".

      Thanks for the comments, and we apologize for the unclear description. We have described this more clearly in our revised manuscript (page 9, lines 229-230).

      Animals frequently are described as "heterozygous mutants" or "mutants". Please make clear that the latter are homozygous mutant animals.

      Thanks for the comments. In the manuscript, all references to mutants specifically indicate homozygous mutants. Heterozygous mutants are explicitly identified as such.

      The chromatin interaction portion of the Methods does not include any information on how these experiments were conducted or where the data were obtained. This information needs to be provided.

      Thanks for your advice. The detailed information on chromatin interaction mapping has been provided in “Methods and Materials” (pages 18-19, lines 450-455). Information about the interacting regions was derived from Hi-C datasets of 21 tissues and cell types provided by GSE87112. The significance of interactions for Hi-C datasets was computed by Fit-Hi-C, with an FDR ≤ 10<sup>-6</sup> considered significant.
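
      As a minimal sketch of the significance cutoff described above: the snippet below keeps only interactions whose FDR is at or below 10<sup>-6</sup>. The record layout (`region_a`, `region_b`, `fdr`) is a hypothetical illustration and does not reproduce the actual Fit-Hi-C output format or the authors' pipeline.

```python
FDR_THRESHOLD = 1e-6  # cutoff stated in the response

def significant_interactions(interactions, threshold=FDR_THRESHOLD):
    """Return the interactions whose FDR is <= threshold.

    `interactions` is assumed to be a list of dicts with an "fdr" key;
    this layout is illustrative only.
    """
    return [rec for rec in interactions if rec["fdr"] <= threshold]

# Hypothetical example records, not real Hi-C results.
example = [
    {"region_a": "bin_1", "region_b": "bin_7", "fdr": 3e-9},
    {"region_a": "bin_1", "region_b": "bin_9", "fdr": 2e-4},
    {"region_a": "bin_2", "region_b": "bin_5", "fdr": 1e-6},
]
kept = significant_interactions(example)
```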

      The authors present single-cell RNA-seq data in Supplementary Figure 5 for which they cite Cavone et al, 2021. This seems like an odd database to use. Can the authors provide an explanation for choosing it? In any case, the citation should also be made in the Supplementary Figure 5 legend.

      Thank you for your rigorous comment. We have cited this reference in the appropriate place in the revised manuscript. Cavone et al. used the her4.3:GFP line to label ependymo-radial glia (ERG) progenitor cells and performed single-cell RNA-seq on FACS-isolated fluorescent cells. The isolated cells included not only ERG progenitors but also undifferentiated and differentiated neurons and oligodendrocytes. The authors attributed this to the relative stability of the GFP protein, which remained in the progeny of GFP-expressing her4.3+ ERG progenitor cells, thus effectively acting as a short-term cell lineage tracer. Indeed, clustering analysis of this data successfully identifies neural progenitors and other neural clusters. Therefore, we consider that this scRNA-seq data encompasses a comprehensive range of neural cell types and is suitable for analyzing the expression of genes of interest. Furthermore, we downloaded and analyzed the scRNA-seq data of the zebrafish nervous system reported by Scott et al. in 2021 (Fig. S7B) (Scott et al., 2021). Despite differences in the developmental stages of the larvae analyzed (Cavone et al. examined larvae at 4 dpf, whereas Scott et al. analyzed larvae at 24, 36, and 48 hpf), our findings are consistent. Specifically, epha4a and epha4b are expressed in interneurons, whereas efnb3a and efnb3b are enriched in floor plate cells.

      References

      (1) Scott, K., O'Rourke, R., Winkler, C.C., Kearns, C.A., and Appel, B. (2021). Temporal single-cell transcriptomes of zebrafish spinal cord pMN progenitors reveal distinct neuronal and glial progenitor populations. Dev Biol 479, 37-50.

      In Figure Legend 1, "expressed from the EPHA4-mutant plasmid" is not an accurate description of the experiment.

      Sorry for the previous inaccurate description. The description has been revised to accurately reflect the experiment. “Western blot analysis of EPHA4-c.2546G>A variant showing the protein expression levels of EPHA4 and CDK5 and the amount of phosphorylated CDK5 (pCDK5) in HEK293T cells transfected with EPHA4-mutant or EPHA4-WT plasmid”.

      Figure 3 panels J and K need more explanation. I don't understand what the different colors represent nor do I understand what are wild type and what are mutant data.

      Thank you for your valuable feedback. We apologize for the lack of clarity in the original figure legend. To address this, we have revised the legend of Figure 3 to provide a more detailed explanation. In panels J and K, each color-coded curve represents the response of an individual larva from an independent experimental trial to the stimulus. Specifically, panel J depicts the response data for the wild-type larvae, whereas panel K presents the response data for the homozygous epha4a mutants.

      Please provide the genotypes for the images in Figure 5A.

      Thanks for the comments, and we apologize for the unclear description. The genotypes are now indicated more clearly in Figure 5.

      Figure legend 6B should also note the heterozygote data with the wild type and homozygous mutant data.

      Thanks for the comments. The data are now included in Figure 6B.

      Epha4 and Efnb3 have well-established roles in axon guidance. Although this is noted in the Discussion, I think a more extensive description of prior findings would be helpful.

      Thanks for your valuable feedback. A more detailed description of the roles of Epha4 and Efnb3 in axon guidance was provided in the “Discussion” (page 16, line 388-396).

      The main conclusion of this manuscript is that EPHA4 variants cause IS by disrupting central pattern generator function. I think this is misleading. I think that the more valid conclusion is that EPHA4 loss of function causes axon pathfinding defects that impair locomotion by disrupting CPG activity, thereby leading to IS. I urge the authors to consider this more nuanced interpretation.

      Thank you for your insightful comments. We appreciate your suggestion to refine our main conclusion. We agree that the proposed revision more accurately reflects our findings and will revise the manuscript accordingly to state that “EPHA4 loss of function causes axon pathfinding defects, which impair locomotion by disrupting central pattern generator activity, potentially leading to IS.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Seidenthal et al. investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions, albeit in an unexpected manner. The authors observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes, which differs from earlier studies in flies that suggested the Flower protein promotes the formation of bulk endosomes. This is a valuable finding. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype seen in flwr-1 mutants to wild-type levels. In contrast, FLWR-1 expression in cholinergic neurons in flwr-1 mutants did not restore aldicarb sensitivity, yet muscle expression of FLWR-1 partially but significantly recovered the aldicarb-resistant defects. The study also revealed that removing FLWR-1 leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. Further, the authors conclude that FLWR-1 contributes to the maintenance of the excitation/inhibition (E/I) balance by preferentially regulating the excitability of GABAergic neurons. Finally, SNG-1::pHluorin data imply that FLWR-1 removal enhances synaptic transmission, however, the electrophysiological recordings do not corroborate this finding.

      Strengths:

      This study by Seidenthal et al. offers valuable insights into the role of the Flower protein, FLWR-1, in C. elegans. Their findings suggest that FLWR-1 facilitates the breakdown of endocytic endosomes, which marks a departure from its previously suggested role in forming endosomes through bulk endocytosis. This observation could be important for understanding how Flower proteins function across species. In addition, the study proposes that FLWR-1 plays a role in maintaining the excitation/inhibition balance, which has potential impacts on neuronal activity.

      Weaknesses:

      One issue is the lack of follow-up tests regarding the relative contributions of muscle and GABAergic FLWR-1 to aldicarb sensitivity. The findings that muscle expression of FLWR-1 can significantly rescue aldicarb sensitivity are intriguing and may influence both experimental design and data interpretation. Have the authors examined aldicarb sensitivity when FLWR-1 is expressed in both muscles and GABAergic neurons, or possibly in muscles and cholinergic neurons? Given that muscles could influence neuronal activity through retrograde signaling, a thorough examination of FLWR-1's role in muscle is necessary, in my opinion.

      We thank the reviewer for this suggestion. Indeed, the retrograde inhibition of cholinergic transmission by signals from muscle has been demonstrated by the Kaplan lab in a number of publications. We have now done the experiments that were suggested, see the new Fig. S3B: rescuing FLWR-1 in cholinergic neurons and in muscle did not perform any better in the aldicarb assay, while co-rescue in GABAergic neurons and muscle, like rescue in GABA neurons, led to a complete rescue to wild type levels. Thus, retrograde signaling from muscle to neurons does not contribute to effects on the E/I imbalance caused by the absence of FLWR-1. The fact that muscle rescue can partially rescue the flwr-1 phenotype is likely due to a cell-autonomous effect of FLWR-1 on muscle excitability, facilitating muscle contraction.

      Would the results from electrophysiological recordings and GCaMP measurements be altered with muscle expression of FLWR-1? Most experiments presented in the manuscript compare wild-type and flwr-1 mutant animals. However, without tissue-specific knockout, knockdown, or rescue experiments, it is difficult to separate cell-autonomous roles from non-cell-autonomous effects, in particular in the context of aldicarb assay results. Also, relying solely on levamisole paralysis experiments is not sufficient to rule out changes in muscle AChRs, particularly due to the presence of levamisole-resistant receptors.

      We repeated the Ca<sup>2+</sup> imaging in cholinergic neurons, in response to optogenetic activation, with expression of FLWR-1 in muscle, see Fig. 4E. This did not significantly alter the increased excitability of the flwr-1 mutant. Thus, we conclude that, along with the findings in aldicarb assays, the function of FLWR-1 in muscle is cell-autonomous, and does not indirectly affect its roles in the motor neurons. Also, cholinergic expression of FLWR-1 by itself reduced Ca<sup>2+</sup> levels to those in wild type (Fig. 4E). In addition, we now also assessed the contribution of the N-AChR (ACR-16) to aldicarb-induced paralysis (Fig. S3C), showing that flwr-1 and acr-16 mutations independently mediate aldicarb resistance, and that these effects are additive. Thus, FLWR-1 does not affect the expression level or function of the N-AChR, as otherwise, the flwr-1; acr-16 double mutation would not exacerbate the phenotype of the single mutants.

      This issue regarding the muscle role of FLWR-1 also complicates the interpretation of results from coelomocyte uptake experiments, where GFP secreted from muscles and coelomocyte fluorescence were used to estimate endocytosis levels. A decrease in coelomocyte GFP could result from either reduced endocytosis in coelomocytes or decreased secretion from muscles. Therefore, coelomocyte-specific rescue experiments seem necessary to distinguish between these possibilities.

      We have performed a rescue of FLWR-1 in coelomocytes to address this, and found that this fully recovered the CC GFP signals to wild type levels. Therefore, the absence of FLWR-1 in muscles does not affect exocytosis of GFP. The data can be found in Fig. 5A, B.

      The manuscript states that GCaMP was used to estimate Ca<sup>2+</sup> levels at presynaptic sites. However, due to the rapid diffusion of both Ca<sup>2+</sup> and GCaMP, it is unclear how this assay distinguishes Ca<sup>2+</sup> levels specifically at presynaptic sites versus those in axons. What are the relative contributions of VGCCs and ER calcium stores here? This raises a question about whether the authors are measuring the local impact of FLWR-1 specifically at presynaptic sites or more general changes in cytoplasmic calcium levels.

      We compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences. The data previously shown have been replaced by data where the ROIs were restricted to synaptic puncta. The outcome is the same as before. These data are provided in Fig. 4A, B, E, F. We thus conclude that the impact of FLWR-1 is local, in synaptic boutons.

      The experiments showing FLWR-1's presynaptic localization need clarification/improvement. For example, in the data shown in Fig. 3B, GFP::FLWR-1 is expressed under its own promoter, and TagRFP::ELKS-1 is expressed exclusively in GABAergic neurons. Given that the pflwr-1 drives expression in both cholinergic and GABAergic neurons, and cholinergic synapses outnumber GABAergic ones in the nerve cord, it would be expected that many green FLWR-1 puncta do not associate with TagRFP::ELKS-1. However, several images in Figure 3B suggest an almost perfect correlation between FLWR-1 and ELKS-1 puncta. It would be helpful for the readers to understand the exact location in the nerve cord where these images were collected to avoid confusion.

      Thank you for making us aware that the provided images may be misleading. We have now extended this Figure (Fig. 3A-C) and provided more intensity profiles along the nerve cords in Fig. S4A-C. The quantitative analysis of average R<sup>2</sup> for the two fluorescent signals in each neuron type did not show any significant difference between the two, also after choosing slightly smaller ROIs for line scan analysis. We also highlighted the puncta corresponding to FLWR-1 in both neuron types, as well as to ELKS-1 in each specific neuron type, to identify FLWR-1 puncta without co-localized ELKS-1 signal. Also, we indicated the region that was imaged, i.e. the DNC posterior of the vulva, halfway to the posterior end of the nerve cord.

      The SNG-1::pHluorin data in Figure 5C is significant, as they suggest increased synaptic transmission at flwr-1 mutant synapses. However, to draw conclusions, it is necessary to verify whether the total amount of SNG-1::pHluorin present on synaptic vesicles remains the same between flwr-1 mutant and wild-type synapses. Without this comparison, a conclusion on levels of synaptic vesicle release based on changes in fluorescence might be premature, in particular given the results of electrophysiological recordings.

      We appreciate the comment. We now added data and experiments that verify that the basal SNG-1::pHluorin signal in the plasma membrane, measured at synaptic puncta and in adjacent axonal areas, is not different in flwr-1 mutants compared to wild type in the absence of stimulation. These data can be found in Fig. S5A. In addition, we cultured primary neurons from transgenic animals to compare total SNG-1::pHluorin to the vesicular fraction, by adding buffers of defined pH externally, or buffers that penetrate the cell and fix intracellular pH. These experiments (Fig. S5B, C) showed no difference in the vesicle fraction of the pHluorin signal in wild type vs. flwr-1 mutant cells, demonstrating that flwr-1 mutants do not per se have altered SNG-1::pHluorin in their SV or plasma membranes.

      Finally, the interpretation of the E74Q mutation results needs reconsideration. Figure 8B indicates that the E74Q variant of FLWR-1 partially loses its rescuing ability, which suggests that the E74Q mutation adversely affects the function of FLWR-1. Why did the authors expect that the role of FLWR-1 should have been completely abolished by E74Q? Given that FLWR-1 appears to work in multiple tissues, might FLWR-1's function in neurons require its calcium channel activity, whereas its role in muscles might be independent of this feature? While I understand there is ongoing debate about whether FLWR-1 is a calcium channel, the experiments in this study do not definitively resolve local Ca<sup>2+</sup> dynamics at synapses. Thus, in my opinion, it may be premature to draw firm conclusions about calcium influx through FLWR-1.

      Thank you for bringing this up. We did not expect E74Q to necessarily abolish FLWR-1 function, unless it would be a Ca<sup>2+</sup> channel. Of course the reviewer is right, FLWR-1 might have functions as an ion channel as well as channel-independent functions. Yet, we are quite confident that FLWR-1 is not an ion channel. Instead, we think that E74Q alters stability of the protein (however, in the absence of biochemical data, we removed this conclusion), and that this impairs the function of FLWR-1 as a modulator, or possibly even, accessory subunit of the PMCA MCA-3. This interaction was indicated by a new experiment we added, where we found that FLWR-1 and MCA-3 must be physically very close to each other in the plasma membrane, using bimolecular fluorescence complementation (see new Fig. 9A, B). This provides a reasonable explanation for findings we obtained, i.e. increased Ca<sup>2+</sup> levels in stimulated neurons of the flwr-1 mutant. If FLWR-1 acts as a stimulatory subunit of MCA-3, then its absence may cause reduced MCA-3 function and thus an accumulation of Ca<sup>2+</sup> in the synaptic terminals. In Drosophila, hyperstimulation of neurons led to reduced Ca<sup>2+</sup> levels (Yao et al., 2017, PLoS Biol 15: e2000931), suggesting that Flower is a Ca<sup>2+</sup> channel. Based on our findings, we suggest an alternative explanation. Based on proteomics, the PMCA is a component of SVs (Takamori et al., 2006, Cell 127: 831-846). Increased insertion of PMCA into the plasma membrane during high stimulation, along with impaired endocytosis in flower mutants, would increase the steady-state levels of PMCA in the PM. This could lead to reduced steady-state levels of Ca<sup>2+</sup>. This ‘g.o.f.’ in Flower may also impact on Ca<sup>2+</sup> microdomains of the P/Q type VGCC required for SV fusion, which could contribute to the rundown of EPSCs we find during synaptic hyperstimulation (Fig. 5G-J). We acknowledge, though, that Yao et al. (2009, Cell 138: 947–960) showed increased uptake of Ca<sup>2+</sup> into liposomes reconstituted with purified Flower protein. However, it cannot be ruled out that a protein contaminant could be responsible, as the controls were empty liposomes, not liposomes reconstituted with a mutated Flower protein purified the same way.

      We also tested the E74Q mutant in its ability to rescue the reduced PI(4,5)P<sub>2</sub> levels in coelomocytes (CCs), where we observed no positive effect. While we have not measured Ca<sup>2+</sup> in CCs, we would assume that here a function of FLWR-1 affecting increased PI(4,5)P<sub>2</sub> levels is not linked to a channel function. It was, nevertheless, compromised by E74Q (Fig. 8D).

      Also, the aldicarb data presented in Figures 8B and 8D show notable inconsistencies that require clarification. While Figure 8B indicates that the 50% paralysis time for flwr-1 mutant worms occurs at 3.5-4 hours, Figure 8D shows that 50% paralysis takes approximately 2.5 hours for the same flwr-1 mutants. This discrepancy should be addressed. In addition, the manuscript mentions that the E74Q mutation impairs FLWR-1 folding, which could significantly affect its function. Can the authors show empirical data supporting this claim?

      We performed the aldicarb assays in a consistent manner, but nonetheless note that some variability from day to day can affect such outcomes. Importantly, we always measured each control (wild type, flwr-1) along with each test strain (FLWR-1 point mutants), to ensure the relevant estimate of a point-mutant’s effect. These assays have been repeated, now including the FLWR-1 wild type rescue strain as a comparison. The data are now combined in Fig. 8B. Regarding the assumed instability of the E74Q mutant, as we, indeed, do not have any experimental data supporting this, we removed this sentence.

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function.

      Weaknesses:

      (1) The observation that flwr-1 knockout increases Ca<sup>2+</sup> levels in motor neurons is notable, especially as it contrasts with prior findings in flies. The authors propose that elevated Ca<sup>2+</sup> levels in flwr-1 knockout motor neurons may stem from "deregulation of MCA-3" (a Ca<sup>2+</sup> ATPase in the plasma membrane) due to FLWR-1 loss. However, this conclusion relies on limited and somewhat inconclusive data (Figure 7). Additional experiments could clarify FLWR-1's role in MCA-3 regulation. For instance, it would be informative to investigate whether mutations in other genes that cause elevated cytosolic Ca<sup>2+</sup> produce similar effects, whether MCA-3 physically interacts with FLWR-1, and whether MCA-3 expression is reduced in the flwr-1 knockout.

      We thank the reviewer for bringing up these critical points. As to other mutations that produce elevated cytosolic Ca<sup>2+</sup>: Possible mutations could be g.o.f. mutations of the ryanodine receptor UNC-68, the sarco-endoplasmic reticulum Ca<sup>2+</sup> ATPase, or mutants affecting VGCCs, like the L-type channel EGL-19 or the P/Q-type channel UNC-2. However, any such mutant would affect muscle contractions (as we have shown for r.o.f. mutations in unc-68, egl-19 and unc-2 in Nagel et al. 2005 Curr Biol 15: 2279-84) and thus would affect aldicarb assays (see aldicarb resistance induced by RNAi of these genes in Sieburth et al., 2005, Nature 436: 510). The same should be expected for g.o.f. mutations of any such gene. In neurons, we would expect increased or decreased Ca<sup>2+</sup> levels in response to stimulation.

      Regarding the physical interaction of MCA-3 and FLWR-1, we performed bimolecular fluorescence complementation, with two fragments of mVenus fused to the two proteins. This assay shows mVenus reconstitution (i.e., fluorescence) if the two proteins are found in close vicinity to each other. Testing MCA-3 and FLWR-1 in muscle indeed showed a robust signal, evenly distributed on the plasma membrane. As a control, FLWR-1 did not interact with another plasma membrane protein, the stomatin UNC-1, which interacts with gap junction proteins (Chen et al., 2007, Curr Biol 17: 1334-9). FLWR-1 also interacted with the ER chaperone Nicalin (NRA2 in C. elegans), which helps assemble the TM domains of integral membrane proteins in association with the SEC translocon. However, this signal only occurred in the ER membrane, demonstrating the specificity of the BiFC assay. These data are presented in Fig. 9A, B. Additionally, we show that FLWR-1 expression has a function in stabilizing MCA-3 localization at synapses, which is also in line with the idea of a direct interaction (Fig. 9C, D).

      (2) In silico analysis identified residues R27 and K31 as potential PIP2 binding sites in FLWR-1. The authors observed that FLWR-1(R27A/K31A) was less effective than wild-type FLWR-1 in rescuing the aldicarb sensitivity phenotype of the flwr-1 knockout, suggesting that FLWR-1 function may depend on PIP2 binding at these two residues. Given that mutations in various residues can impair protein function non-specifically, additional studies may be needed to confirm the significance of these residues for PIP2 binding and FLWR-1 function. In addition, the authors might consider explicitly discussing how this finding aligns or contrasts with the results of a previous study in flies, where alanine substitutions at K29 and R33 impaired a Flower-related function (Li et al., eLife 2020).

      We further investigated the role of these two residues in an in vivo assay for PIP2 binding and membrane association of a reporter. We used the coelomocytes (CCs), in which a previous publication demonstrated that a GFP variant tagged with a PH domain would be recruited to the CC membrane (Bednarek et al., 2007, Traffic 8: 543-53). This assay was performed in wild type, flwr-1 mutants, and flwr-1 mutants rescued with wild type FLWR-1, the FLWR-1(E74Q) mutant, or the FLWR-1(K27A; R31A) double mutant. The data are shown in Fig. 8C, D. While the wild type FLWR-1 rescued PH-GFP levels at the CC membrane to those of the wild type control, the FLWR-1(K27A; R31A) double mutant did not rescue the reporter binding, indicating that, at least in CCs, reduced PIP2 levels are associated with non-functional FLWR-1. Mechanistically, this is not clear at present, though we noted a possible mechanism as found for synaptotagmin, which recruits the PIP2 kinase to the plasma membrane via a lysine- and arginine-containing motif (Bolz et al., 2023, Neuron 111: 3765-3774.e3767). We mention this now in the discussion. We also discussed our data with respect to the findings of Li et al., about the analogous residues K27, R31 (K29, R33) in the discussion section, i.e. lines 667-670, and the differences of our findings in electron microscopy compared to the Drosophila work (more rather than less bulk endosomes) were discussed in lines 713-720.

      (3) A primary conclusion from the EM data was that FLWR-1 participates in the breakdown, rather than the formation, of bulk endosomes (lines 20-22). However, the reasoning behind this conclusion is somewhat unclear. Adding more explicit explanations in the Results section would help clarify and strengthen this interpretation.

      We added a sentence to better explain our reasoning. Mainly, the argument is that accumulation of such endosomes of unusually large size is seen in mutants affecting formation of SVs from the endosome (in endophilin and synaptojanin mutants), while mutants affecting mainly endocytosis (dynamin) cause formation of many smaller endocytic structures that stay attached to the plasma membrane (Kittelmann et al., 2013, PNAS 110: E3007-3016). We changed our data analysis in that we collated the data for what we previously termed endosomes and large vesicles. According to the paper by Watanabe, 2013, eLife 2: e00723, endosomes are defined by their location in the synapse, and their size. However, this work used a much shorter stimulus and froze the preparations within a few dozen to hundreds of msec after the stimulus, while we used the protocol of Kittelmann 2013, which uses 30 sec stimulation and freezing after 5 sec. There, endosomes were defined as structures larger than SVs or DCVs, but no larger than 80 nm, with an electron dense lumen, and were very rarely observed. In contrast, large vesicles or ‘100 nm vesicles’, which ranged from 50-200 nm in diameter and had a clear lumen, were morphologically similar to the bulk endosomes as observed by Li et al., 2021. We thus reordered our data and jointly analyzed these structures as large vesicles / bulk endosomes. The outcome is still the same, i.e. photostimulated flwr-1 mutants showed more LVs than wild type synapses.

      (4) The aldicarb assay results in Figure 3 are intriguing, indicating that reduced GABAergic neuron activity alone accounts for the flwr-1 mutant's hyposensitivity to aldicarb. Given that cholinergic motor neurons also showed increased activity in the flwr-1 mutant, one might expect the flwr-1 mutant to display hypersensitivity to aldicarb in the unc-47 knockout background. However, this was not observed. The authors might consider validating their conclusion with an alternative approach or, at the minimum, providing a plausible explanation for the unexpected result. Since aldicarb-induced paralysis can be influenced by factors beyond acetylcholine release from cholinergic motor neurons, interpreting aldicarb assay results with caution may be advisable. This is especially relevant here, as FLWR-1 function in muscle cells also impacts aldicarb sensitivity (Figure S3B). Previous electrophysiological studies have suggested that aldicarb sensitivity assays may sometimes yield misleading conclusions regarding protein roles in acetylcholine release.

      We tested the unc-47; flwr-1 animals again at a lower concentration of aldicarb, to see if the high concentration may have leveled the differences between unc-47 animals and the double mutant. This experiment is shown in Fig. S3D, demonstrating that the double mutant is significantly less resistant to aldicarb. This verifies that FLWR-1 acts not only in GABAergic neurons, but also in cholinergic neurons (as we saw by electron microscopy and electrophysiology), and that the increased excitability of cholinergic cells leads to more acetylcholine being released. In the double mutant, where GABA release is defective, this conveys hypersensitivity to aldicarb.

      (5) Previous studies have suggested that the Flower protein functions as a Ca<sup>2+</sup> channel, with a conserved glutamate residue at the putative selectivity filter being essential for this role. However, mutating this conserved residue (E74Q) in C. elegans FLWR-1 altered aldicarb sensitivity in a direction opposite to what would be expected for a Ca<sup>2+</sup> channel function. Moreover, the authors observed that E74 of FLWR-1 is not located near a potential conduction pathway in the FLWR-1 tetramer, as predicted by AlphaFold3. These findings raise the possibility that Flower may not function as a Ca<sup>2+</sup> channel. While this is a potentially significant discovery, further experiments are needed to confirm and expand upon these results.

      As above, we do not exclude that FLWR-1 may constitute a channel. However, based on our findings, AF3 structure predictions and data in the literature, we are considering alternative explanations for the observed effect on Ca<sup>2+</sup> levels of Flower mutants in worms and flies. The observations of increased Ca<sup>2+</sup> levels in stimulated flwr-1 mutant neurons could result from a reduced stimulation of the PMCA, and this was also observed with low stimulation in Drosophila (Yao et al., 2017). This idea is supported by the indications of a direct physical interaction, or proximity, of the two proteins. The reduced Ca<sup>2+</sup> levels after hyperstimulation of Drosophila Flower mutants may have to do with increased levels of non-recycling PMCA in the plasma membrane, indicating that PMCA requires Flower for recycling. This could underlie the rundown of evoked PSCs we find in worm flwr-1 mutants, and would also be in line with a function of FLWR-1 and MCA-3 in coelomocytes, cells that constantly endocytose, and in which both proteins are required for proper function (our data, Figs. 5A, B; 8D, E) and Bednarek et al., 2007 (Traffic 8: 543-553). CCs need to recycle / endocytose membranes and membrane proteins, and such proteins, likely including FLWR-1 and MCA-3, need to be returned to the PM effectively.

      We thus refrained from testing a putative FLWR-1 channel function in Xenopus oocytes, in part also because we would not be able to acutely trigger possible FLWR-1 gating. A constitutive Ca<sup>2+</sup> current, if it were present, would induce a large Cl<sup>-</sup> conductance in oocytes, which would likely be problematic or even kill the cells. The demonstration that FLWR-1(E74Q) does not rescue the PI(4,5)P<sub>2</sub> levels in coelomocytes is also more in line with a non-channel function of FLWR-1.

      (6) Phrases like "increased excitability" and "increased Ca<sup>2+</sup> influx" are used throughout the manuscript. However, there is no direct evidence that motor neurons exhibit increased excitability or Ca<sup>2+</sup> influx. The authors appear to interpret the elevated Ca<sup>2+</sup> signal in motor neurons as indicative of both increased excitability and Ca<sup>2+</sup> influx. However, this elevated Ca<sup>2+</sup> signal in the flwr-1 mutant could occur independently of changes in excitability or Ca<sup>2+</sup> influx, such as in cases of reduced MCA-3 activity. The authors may wish to consider alternative terminology that more accurately reflects their findings.

      Thank you, we rephrased the imprecise wording. Ca<sup>2+</sup> influx was meant with respect to the cytosol.

      Reviewer #3 (Public review):

      Summary:

      Seidenthal et al. investigated the role of the Flower protein, FLWR-1, in C. elegans and confirmed its involvement in endocytosis within both synaptic and non-neuronal cells, possibly by contributing to the fission of bulk endosomes. They also uncovered that FLWR-1 has a novel inhibitory effect on neuronal excitability at GABAergic and cholinergic synapses in neuromuscular junctions.

      Strengths:

      This study not only reinforces the conserved role of the Flower protein in endocytosis across species but also provides valuable ultrastructural data to support its function in the bulk endosome fission process. Additionally, the discovery of FLWR-1's role in modulating neuronal excitability broadens our understanding of its functions and opens new avenues for research into synaptic regulation.

      Weaknesses:

      The study does not address the ongoing debate about the Flower protein's proposed Ca<sup>2+</sup> channel activity, leaving an important aspect of its function unexplored. Furthermore, the evidence supporting the mechanism by which FLWR-1 inhibits neuronal excitability is limited. The suggested involvement of MCA-3 as a mediator of this inhibition lacks conclusive evidence, and a more detailed exploration of this pathway would strengthen the findings.

      We added new data showing the likely direct interaction of FLWR-1 with the PMCA, possibly upregulating / stimulating its function. These data are now shown in Fig. 9A, B. Also, we now show that FLWR-1 is required to stabilize MCA-3 expression / localization in the pre-synaptic plasma membrane (Fig. 9C, D). These findings do not support the putative function of FLWR-1 as an ion channel, but suggest that increased Ca<sup>2+</sup> levels following neuron stimulation in flwr-1 mutants are due to an impairment of MCA-3 and thus reduced Ca<sup>2+</sup> extrusion.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors might consider focusing on one or two key findings from this study and providing robust evidence to substantiate their conclusions.

      We did substantiate the interactions of FLWR-1 and the PMCA, as well as assessing the function of FLWR-1 in the coelomocytes and the function of FLWR-1 in regulating PIP2 levels in the plasma membrane.

      Reviewer #3 (Recommendations for the authors):

      (1) Behavioral Analysis of Locomotion

      In Figure 1, the authors are encouraged to examine whether flwr-1 mutants show altered locomotion behaviors, such as velocity, in a solid medium.

      We performed such an analysis for wild type, comparing to flwr-1 mutants and flwr-1 mutants rescued with FLWR-1 expressed from the endogenous promoter. The data are shown in Fig. S1C. There was no difference. We note that we observed differences in swimming assays also only when we strongly stimulated the cholinergic neurons by optogenetic depolarization, but not during unstimulated, normal swimming.

      (2) Validation of FLWR-1 Tagging

      In Figure 2A, it is recommended that the authors confirm the functionality of the C-terminal-tagged FLWR-1.

      We performed such rescue assays during swimming. The data is shown in Fig. S2S, E. While the GFP::FLWR-1 animals were slightly affected right after the photostimulation, they quickly caught up with the wild type controls, while flwr-1 mutants remained affected even after several minutes.

      (3) Explanation of Differential Rescue in GABAergic Neurons and Muscle

      The authors should provide a rationale for why restoring FLWR-1 in GABAergic neurons fully rescues the aldicarb resistance phenotype, while its restoration in muscle also partially rescues it.

      We think that these effects are independent of each other, i.e. loss of FLWR-1 in muscles increases muscular excitability, which becomes apparent in the behavioral assay that depends on locomotion and muscle contraction. To assess this further, we performed combined GABAergic neuron and muscle rescue assays, as shown in Fig. S3B. The double rescue was not different from wild type, and performed better than the muscle rescue alone.

      (4) Rescue Experiments for Swimming Defect in GABAergic Neurons

      Consider adding rescue experiments to determine whether expressing FLWR-1 specifically in GABAergic neurons can restore the swimming defect phenotype.

      We did not perform this assay as swimming is driven by cholinergic neurons, meaning that we would only indirectly probe GABAergic neuron function and a GABAergic FLWR-1 rescue would likely not improve swimming much. Also, given the importance of the correct E/I balance in the motor neurons, it would likely require achieving expression levels that very precisely match endogenous expression levels, which is not possible in a cell-specific manner.

      (5) Further Data on GCaMP Assay for mca-3; flwr-1 Additive Effect

      The additive effect of the mca-3 and flwr-1 mutations on GCaMP signals requires further data for substantiation. Additional GCaMP recordings or statistical analysis would provide stronger support for the proposed interaction between MCA-3 and FLWR-1 in calcium signaling.

      Thank you. We increased the number of observations, which made the outcome of the assay more conclusive: the double mutation did not exacerbate the effect of either single mutant, demonstrating that FLWR-1 and MCA-3 act in the same pathway. The data are in Fig. 7B, C.

      (6) Inclusion of Wild-Type FLWR-1 Rescue in Figures 8B and 8D

      Figures 8B and 8D would benefit from the inclusion of wild-type FLWR-1 as a rescue control.

      We included the FLWR-1 wild type rescue as suggested and summarized the data in Fig. 8B.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Responses to final minor critiques following initial revision

      Reviewer #1 (Recommendations for the authors): 

      The authors have generally done an excellent job of addressing my and the other reviewers' concerns. I have a few additional concerns that the authors could consider addressing through changes to the text: 

      We thank the Reviewer for this assessment and are glad to have addressed the major points.

      - Regarding the gRNA used for NMR studies, I thank the authors for adding additional rationale for their design of the RNA used. However, I still believe that it is misleading to term this RNA as a "gRNA", given that it is mainly composed of a sequence that is arbitrary (the spacer) and the sections of the gRNA that are constant between all gRNAs are truncated in a way that removes secondary structure that is likely essential for specific contacts with the Rec domains. I do not believe the authors need to make alterations to any of their experiments. However, I do think their description of the "gRNA" should be updated to properly reflect that this RNA lacks any of the secondary structure present in a typical gRNA, much of which is necessary to confer specificity of binding between GeoCas9 and the gRNA. As mentioned in my previous review, this may be best achieved by adding a cartoon of the secondary structure of the full-length gRNA and highlighting the region that was used in the truncated "gRNA". 

      We understand the Reviewer’s point. For any experiment in which the gRNA was truncated (i.e. NMR or some MST studies), we have clarified the text and no longer call it a “gRNA.” We state initially that it is a portion of the gRNA and then call it simply an “RNA.” 

      For experiments using the full-length constructs, we have kept the term “gRNA,” as it remains appropriate.

      We have also added a final Supplementary figure (S12) showing the structures of the truncated and full-length RNAs used, based on the _Geo_Cas9 cryo-EM structure and predicted with RNAfold.

      - Lines 256-257: "The ~3-fold decrease in Kd...". I believe the authors are discussing the Kd's of the mutants relative to WT, in which case the Kd increased. Also, the fold-change appears closer to 2-fold than to 3-fold.

      Yes, the Reviewer makes a good catch. We have corrected this.

      - Lines 407-408: "The mutations also diminished the stability of the full-length GeoCas9 RNP complex." This statement seems at odds with the authors' conclusions in the Results section that the full-length GeoCas9 variants had comparable affinities for the gRNAs (lines 376-382) 

      We agree that this seems contradictory. In the absence of full-length structures for all variants, we can’t definitively state what causes this. It could be that the mutation has an interesting allosteric effect on structure that does not affect RNA binding but induces the Cas9 protein to simply fall apart at lower temperatures, rendering the binding interaction moot. We have added a statement to this section.

      - The authors chose to keep "SpCas9" for consistency with their prior work and the work of many others, including Doudna et al and Zhang et al. However, I will note that in their publications on GeoCas9, the Doudna lab did use SpyCas9 to ensure consistent nomenclature within the publications.

      We have made the change to “_Spy_Cas9”

      Reviewer #3 (Recommendations for the authors): 

      The authors clearly answered most of my concerns. I still have some technical questions about the analysis of CPMG-RD data but the numbers provided now seem to make sense. While I still think that crystal structures of the point mutant would make the conclusions more "bullet proof", I do appreciate the work associated with this and consider that the manuscript can be published as is. 

      We agree that additional magnetic fields could allow for additional models of CPMG data fitting and that additional crystal structures of the mutants could add to the conclusions. We appreciate the Reviewer recognizing the balance of the current results and potential future studies in signing off on publication.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary.

      The authors' goal was to map the neural circuitry underlying cold-sensitive contraction in Drosophila. The circuitry underlying most sensory modalities has been characterized, but noxious cold sensory circuitry has not been well studied. The authors achieve their goal and map out sensory and post-sensory neurons involved in this behavior.

      Strengths.

      The manuscript provides convincing evidence for sensory and post sensory neurons involved in noxious cold sensitive behavior. They use both connectivity data and functional data to identify these neurons. This work is a clear advance in our understanding of noxious cold behavior. The experiments are done with a high degree of experimental rigor.

      Positive comments

      - CaMPARI is nicely done to map cold responsive neurons, although it doesn't give data on individual neurons.

      - Chrimson and TNT experiments are nicely done.

      - Cold temperature activates basin neurons, it's a solid and convincing result.

      Weaknesses.

      Among the few weaknesses in this manuscript are the failure to trace the circuit from sensory neuron to motor neuron and the lack of analysis of the muscles driving cold-induced contraction. The authors also need to elaborate more on the novel aspects of their work in the introduction or abstract.

      We have performed a more thorough EM connectivity analysis of the CIII md neuron circuit (Figure 1A, Figure 1 – Figure supplement 1, Figure 10A). We now report all premotor neurons that are connected to CIII md neurons along with two additional projection/command-like neurons. These additional premotor neurons (A01d3, A02e, A02f, A02g, A27k, and A31k), which are primarily implicated in locomotion, were not required for cold nociception (Figure 5 – Figure supplement 2). Collectively, we have tested the requirement in cold nociception for ~94% of synapses between CIII md->premotor neurons and for all premotor neurons with available driver lines. The requirement in cold nociception was also assessed for the two projection/command-like neurons dLIP7 and A02o, which are required for sensory integration and directional avoidance to noxious touch, respectively (Figure 7 – Figure supplement 2) (Hu et al., 2017; Takagi et al., 2017). Silencing dLIP7 neurons resulted in a modest reduction in cold-evoked behaviors, whereas A02o neurons were not required for cold nociception (Figure 7 – Figure supplement 2). To complete the analysis from thermosensation to evoked behavior, we analyzed cold-evoked Ca<sup>2+</sup> responses of larval musculature (Figure 10). Premotor neurons, which are connected to CIII md neurons, target multiple muscle groups (DL, DO, LT, VL, and VO) (Figure 10A). Individual larval segments have unique cold-evoked Ca<sup>2+</sup> responses, where the strongest cold-evoked Ca<sup>2+</sup> response occurs in the central abdominal segments (Figure 10B-D). When motor neuron activity is inhibited or an anesthetic (ethyl ether) is used, there is a negligible cold-evoked Ca<sup>2+</sup> response compared to controls (Figure 10 – Figure supplement 1). Analysis of cold-evoked Ca<sup>2+</sup> in individual muscles reveals unique Ca<sup>2+</sup> dynamics for individual muscle groups (Figure 10E-H).

      Major comments.

      - Class three sensory neuron connectivity is known, and its role in cold response is known (Turner 16, 18). Need to make it clearer what the novelty of the experiments is.

      In Figure 1, we are trying to guide the audience to CIII md neuron circuitry and emphasize the necessity and sufficiency of CIII md neurons in cold nociception. Previously, only transient (GCaMP6) cold-evoked Ca<sup>2+</sup> responses were reported (Turner et al., 2016, 2018). However, here using CaMPARI, we performed dendritic spatial (Sholl) analysis of cold-evoked Ca<sup>2+</sup> responses (Figure 1B-C). During the revision, we evaluated both CIII- and cold-evoked CT throughout larval development (Figure 1G, H). All in all, the findings from the first figure reiterate and replicate previous findings for the role of CIII md neurons in cold nociception. CIII md connectivity might be known; however, we investigated the functional and physiological roles of individual circuit neurons.

      - Why focus on premotor neurons in mechano nociceptive pathways? Why not focus on PMNs innervating longitudinal muscles, likely involved in longitudinal larval contraction? Especially since chosen premotor neurons have only weak effects on cold induced contraction?

      We assessed requirements for all premotor neurons that are connected to CIII md neurons and for which there are validated driver lines. Only premotor neurons (DnB, mCSI and Chair-1) that were previously implicated in mechanosensation were also required for cold nociception. Premotor neurons previously implicated in locomotion (A01d3, A02e, A02f, A02g, A27k, and A31k) are not required for cold-evoked behaviors (Figure 5 – Figure supplement 2).

      Reviewer #2 (Public Review):

      Patel et al. analyze neurons in a somatosensory network involved in responses to noxious cold in Drosophila larvae. Using a combination of behavioral experiments, calcium imaging, optogenetics, and synaptic connectivity analysis in the Drosophila larva, they assess the function of circuit elements in the somatosensory network downstream of multimodal somatosensory neurons involved in sensing innocuous and noxious stimuli and probe their function in noxious cold processing. Consistent with their previous findings, they find the multidendritic class III neurons to be the key cold-sensing neurons that are both required and sufficient for the CT behavioral response (shown to be evoked by noxious cold). They further investigate the downstream neurons, identified based on literature and connectivity from EM, at different stages of sensory processing, characterize the different phenotypes upon activating/silencing those neurons, and monitor their responses to noxious cold. The work reveals diverse phenotypes for the different neurons studied and provides the groundwork for understanding how information is processed in the nervous system from sensory input to motor output and how information from different modalities is processed by neuronal networks. However, at times the writing could be clearer and some interpretations of the results more rigorous.

      Specific comments

      (1) In Figure 1 -supplement 6D-F (Cho co-activation)

      The authors find that Ch neurons are cold sensitive and required for cold nociceptive behavior but do not facilitate behavioral responses induced by CIII neurons.

      The authors show that coactivating mdIII and cho inhibits the CT (a typically observed cold-induced behavioral response) in the second part of the stimulation period, while Cho was required for cold-induced CT. Different levels of activation of md III and Cho (different light intensities) could bring some insights into the observed phenotypes upon Cho manipulation as different levels activate different downstream networks that could correspond to different stimuli. Also, it would be interesting to activate chordotonal during exposure to cold to determine how a behavioral response to cold is affected by the activation of chordotonal sensory neurons.

      Modulating both CIII md and Ch activation to assess the contribution of individual sensory neurons to thermosensation would certainly provide unique insights. However, we believe that such analyses are beyond the scope of the current manuscript and better suited to future follow-up studies.

      (2) Throughout the paper the co-activation experiments investigate whether co-activating the different candidate neurons and md III neurons facilitates the md III-induced CT response. However, the cold noxious stimuli will presumably activate different neurons downstream than optogenetic activation of MdIII and thus can reveal more accurately the role of the different candidate neurons in facilitating cold nociception.

      We agree that the CIII md neuron activation of the downstream circuitry would be different from the cold-evoked activation of neurons downstream of primary sensory neurons. We believe that our current findings lay the foundation for future work to evaluate how multiple sensory neurons work in concert to generate stimulus-specific behavioral responses.

      (3) Use of blue lights in behavioral and imaging experiments

      Strong Blue and UV have been shown to activate MDIV neurons (Xiang, Y., Yuan, Q., Vogt, N. et al. Light-avoidance-mediating photoreceptors tile the Drosophila larval body wall. Nature 468, 921-926 (2010). https://doi.org/10.1038/nature09576) and some of the neurons tested receive input from MdIV.

      In their experiments, the authors used blue light to optogenetically activate CDIII neurons and then monitored Calcium responses in Basin neurons, premotor neurons, and ascending neurons, and UV light is necessary for photoconversion in Campari Experiments. Therefore, some of the neurons monitored could be activated by blue light and not cdIII activation. Indeed, responses of Basin-4 neurons can be observed in the no ATR condition (Fig 3HI), as can quite strong responses of DnB neurons (Figure 6E). How do the authors discern that the effects they see on the different neurons are indeed due to cold nociception and not the synergy of cold and blue light responses? This could especially be the case for DNB, which could have a role in facilitating the response to cold in a multisensory context (where mdIV are activated by light).

      In addition, the silencing of DNB neurons during cold stimulation does not seem to give very robust phenotypes (no significant CT decrease compared to empty GAL4 control).

      It would be important, for example, to show that even in the absence of blue light the DNB facilitates the mdIII activation or cold-induced CT, by using red light and Chrimson or TrpA activation (for coactivation with md III).

      Alternatively, in some other cases, the phenotype upon co-activation could be inhibited by blue light (e.g. chair-1 (Figure 5 H-I)).

      More generally, given the multimodal nature of stimuli activating mdIV, MdIII (and Cho) and their shared downstream circuitry, it is important to either control for using the blue light in these stimuli or take into account the presence of the stimulus in interpreting the results, as the coactivation of, for example, Cho and mdIII using blue light could also activate mdIV (and downstream neurons) and alter the state of the network, which could inhibit the md III induced CT responses.

      Assessing the differences in behavioral phenotypes in the different conditions could give an idea of the influence of combining different modalities in these assays. For example, did the authors observe any other behaviors upon co-activation of MDIII and Cho (at the expense of CT in the second part of the stimulation) or did the larvae resume crawling? Blue light typically induces reorientation behavior. What about when co-activating mdIII and Basin-4?

      Using Chrimson and red light or TrpA in some key experiments, e.g. with Cho, Basin-4, and DNB, would clarify the implication of these neurons in cold nociception.

      We agree that exposure to a bright light source results in avoidance behaviors in Drosophila larvae, which is primarily mediated by CIV md neurons. However, the light intensities used in our assays are much lower than those required to activate sensory neurons. Specifically, based on Xiang et al., 470nm light does not evoke any electrical response at the lowest tested light intensity (0.74mWmm<sup>-2</sup>), whereas the light intensity used in our behavioral experiments was much lower at 0.15mWmm<sup>-2</sup>. Additionally, we assessed larval mobility and turning for control conditions ±ATR and also sensory neuron activation. As expected, there is an increase in larval immobility upon CIII md neuron activation (Author response image 1). Only activation of CIV md neurons resulted in light-evoked turning, whereas the remaining conditions did not show a stimulus time-locked turning response (Author response image 1). Furthermore, we tested whether the intensity of 470nm light used in our behavior experiments was enough to result in a light-evoked Ca<sup>2+</sup> response in CIII md and CIV md neurons. We expressed RCaMP in sensory neurons using a pan-neural driver (GMR51C10<sup>GAL4</sup>). There was no detectable increase in light-evoked Ca<sup>2+</sup> response in either CIII md or CIV md neurons (Author response image 1).

      Furthermore, we also tested multiple optogenetic actuators (ChR2, ChR2-H134R, and CsChrimson) and two CIII md driver lines (19-12<sup>Gal4</sup> and R83B04<sup>Gal4</sup>). Regardless of the optogenetic actuator or the wavelength of light used, we observe light-evoked CT responses (Figure 1– Figure supplement 6). We found that using CsChrimson raises several procedural challenges with our current experimental setup. In our hands, CsChrimson showed extreme sensitivity to any amount of ambient white light, whereas others have used infrared imaging to counteract ambient light sensitivity. Our imaging setup is equipped for visible spectrum imaging and cannot be retrofitted to record with infrared light sources. Thus, we have limited the use of CsChrimson to optogenetic-Ca<sup>2+</sup> imaging experiments, where we are not recording larval behavior.

      The use of TrpA1 would require heat stimulation for activating the channels, which in turn would impact downstream circuit neurons that are shared amongst sensory neurons.

      For CaMPARI experiments, the PC light was delivered using a custom filter cube similar to the one used in the original CaMPARI paper (Fosque et al., 2015). This filter cube delivers 440nm wavelength as the PC light. PC light exposure in the absence of cold stimulus does not result in differential CaMPARI conversion between CIII md and CIV md (F<sub>red/green</sub> = 0.086 and 0.097, respectively). For the same condition, Ch neurons show high CaMPARI conversion, which is expected as they function in proprioception. Therefore, the chances of downstream neurons being solely activated by PC light remain low. The differential baseline CaMPARI F<sub>red/green</sub> ratios of individual circuit neurons could be a result of varying resting state cytosolic Ca<sup>2+</sup> concentrations.

      Lastly, for optogenetic-GCaMP experiments, we use CIII md>CsChrimson and Basin-2/-4 or DnB>GCaMP to visualize CIII md-evoked Ca<sup>2+</sup> responses in downstream neurons. Xiang et al. reported that confocal laser excitation for GCaMP does not activate CIV md neurons, which is consistent with what we have observed.

      Author response image 1.

      (A) For optogenetic experiments, percent turning was assessed in control conditions and sensory neuron activation. Only CIV md neuron activation results in an increase in bending response. Other conditions do not show blue light-evoked turning. (A’) We assessed larval turning based on ellipse fitting using FIJI; the aspect ratio of the radii is indicative of larval bending state. We empirically determined that a radii ratio of <2.5 represents larval turning/bending. This method of ellipse fitting has previously been used to identify C. elegans postures using WrMTrck in FIJI (Nussbaum-Krammer et al., 2015). (B) Percent immobility for all control conditions plus sensory activation driver lines. Only CIII md neuron activation leads to a sustained stimulus-locked increase in immobility. There are also no blue light-evoked reductions in immobility, indicating that there was no increase in larval movement due to blue light. (C) We assessed the responses of CIII md (ddaF) and CIV md (ddaC) neurons to blue light at a light intensity similar to that used in behavioral optogenetic experiments. There is no blue light-evoked increase in RCaMP fluorescence.
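
      The bending criterion described in panel (A’) can be illustrated with a minimal sketch. It assumes that the major and minor axis lengths of each fitted ellipse have already been exported (for example, from an ellipse-fitting measurement table); the function names and the example values are hypothetical and are not taken from the authors' analysis pipeline. Only the threshold of 2.5 comes from the text above.

```python
from typing import Sequence, Tuple

# Threshold stated in the caption: a radii ratio below 2.5 counts as turning/bending.
BEND_THRESHOLD = 2.5


def aspect_ratio(major: float, minor: float) -> float:
    """Return the major/minor axis ratio of a fitted ellipse."""
    if minor <= 0:
        raise ValueError("minor axis must be positive")
    return major / minor


def is_bending(major: float, minor: float, threshold: float = BEND_THRESHOLD) -> bool:
    """Classify one larva as bending when its ellipse is close to round."""
    return aspect_ratio(major, minor) < threshold


def percent_turning(
    ellipses: Sequence[Tuple[float, float]], threshold: float = BEND_THRESHOLD
) -> float:
    """Percentage of larvae (major, minor pairs) classified as bending."""
    if not ellipses:
        return 0.0
    bending = sum(is_bending(major, minor, threshold) for major, minor in ellipses)
    return 100.0 * bending / len(ellipses)


# Hypothetical axis lengths (arbitrary units) for four larvae in one frame.
example_frame = [(4.0, 1.0), (2.0, 1.0), (3.0, 1.5), (5.0, 1.0)]
print(percent_turning(example_frame))
```

      An elongated, straight larva gives a high ratio and a curled larva a ratio closer to 1, so applying the function frame by frame would give a percent-turning trace of the kind plotted in panel (A).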

      (4) Basins

      - Page 17 line 442-3 "Neural silencing of all Basin (1-4) neurons, using two independent driver lines (R72F11<sup>GAL4</sup> and R57F07<sup>GAL4</sup>)."

      Did the authors check the expression profile of the R57F07 line that they use to probe "all basins"? The expression profile published previously (Ohyama et al, 2015, extended data) shows one basin neuron (identified as basin-4) and some neurons in the brain lobes. Also, the split GAL4 that labels Basin-4 (SS00740) is the intersection between R72F11 and R57F07 neurons. Thus the R57F07 likely labels Basin-4, and if that is the case the data in Figure 2 (and supplement) and Figure 3 related to this driver line should be annotated as Basin-4, and the results and their interpretation modified to take into account the different phenotypes for all basins and Basin-4 neurons.

      Due to the non-specific nature of R57F07<sup>GAL4</sup> in labeling Basin-4 and additional neuron types, we have decided to remove the driver line from our current analysis. We would need to perform further independent investigations to identify the other cell types and validate their role in cold nociception.

      Page 19 l. 521-525 I am confused by these sentences as the authors claim that Basin-4 showed reduced Calcium responses upon repetitive activation of CDIII md neurons but then they say they exhibit sensitization. Looking at the plots in FIG 3 F-I the Basin-4 responses upon repeated activation seem indeed to decrease on the second repetition compared to the first. What is the sensitization the authors refer to?

      We have rephrased this section.

      On Page 47-In this section of the discussion, the authors emit an interesting hypothesis that the Basin-1 neuron could modulate the gain of behavioral responses. While this is an interesting idea, I wonder what would be the explanation for the finding that co-activation of Cho and MDIII does not facilitate cold nociceptive responses. Would activation of Basin-1 facilitate the cold response in different contexts (in addition to CH0-mediated stimuli)?

      Page 48 Thus the implication of the inhibitory network in cold processing should be better contextualized.

      The authors explain the lower Basin-2 Ca<sup>2+</sup> response to cold/mdIII activation (compared to Basin-4), despite stronger connectivity, as being due to stronger inputs from inhibitory neurons to Basin-2 (compared to Basin-4). The previously described inhibitory neurons that synapse onto Basin-2 receive a rather small fraction of inputs from the class III sensory neurons. The differences in response to cold could potentially be assigned to the activation of the inhibitory neurons by the cold-sensing cho neurons. However, that cannot explain the differences in responses induced by class III neurons. Do the authors refer to additional inhibitory neurons that would receive significant input from MdIII?

      Alternative explanations could exist for this difference in activation: electrical synapses from mdIII onto Basin-4, and stronger inputs from mdIV (compared to Basin-2) in the case of responses to cold stimulus (cold induces responses in md IV sensory neurons). Different subtypes of CD III may differentially respond to cold, and the cold-sensing ones could synapse preferentially on basin-4, etc.

      A possible explanation for the lack of CT facilitation when Ch and CIII md neurons are both activated is the competing sensory inputs going into Basins and the as yet unknown role of the inhibitory network between sensory and Basin neurons in cold nociception (Jovanic et al., 2016). Mechanical activation of Ch leads to several behavioral responses (hunch, back-up, pause, crawl, and/or bend) and transitions between behaviors (Kernan et al., 1994; Tsubouchi et al., 2012; Zhang et al., 2015; Turner et al., 2016, 2018; Jovanic et al., 2019; Masson et al., 2020).

      Meanwhile, the primary CIII md-/cold-evoked behavior is CT (Turner et al., 2016, 2018; Patel et al., 2022; Himmel et al., 2023). Certain touch- versus cold-evoked behaviors are mutually exclusive, so co-activation of Ch and CIII md likely produces competing neural impulses, leading to a lack of enhancement of any single behavior. Furthermore, the mini circuit motif between Ch and Basins, consisting of feedforward, feedback and lateral inhibitory neurons that play a role in behavioral selection and transitions, might impact the overall output of Basin neurons. Upon Ch and CIII md neuron co-activation, the cumulative Basin neuronal output may be biased towards increased behavioral transitions instead of a sustained singular behavioral response.

      We posited one possible mechanism explaining the differences between cold- or CIII md-evoked Ca<sup>2+</sup> responses in Basin 2 and 4 neurons: the differences in evoked Ca<sup>2+</sup> responses may arise from differential connectivity of TePns and inhibitory network neurons to Basin 2 and/or 4. Furthermore, ascending A00c neurons are connected to a descending feedback SEZ neuron, SeIN128, which has connectivity to Basins (1-3 and strongest with Basin 2), A02o, DnB, Chair-1 and A02m/n (Ohyama et al., 2015; Zhu et al., 2024). However, how the 5 different subtypes of CIII md neurons respond to cold is unknown. Electrical recordings of the dorsal CIII md neurons revealed that within and between neuron subtypes there is variability in the temperature sensitivity of individual neurons, where population coding results in fine-tuned central temperature representation (Maksymchuk et al., 2022). Evaluating how individual CIII md subtypes contribute to Basin activation could reveal important insights into the precise relationship between CIII md neurons and the multisensory integration Basin neurons. However, as of yet there are no known driver lines that mark a subset of CIII md neurons, which limits further clarification of how primary sensory information is transduced to integration neurons.

      (5) A00c

      Page 26 Figure 4F-I line While Goro may not be involved in cold nociception the A00c (and A05q) seems to be.

      A00c could convey information to other neurons other than Goro and thus be part of a pathway for cold-induced CT.

      A deeper look into A00c connectivity reveals that there is a reciprocal relationship between A00c and the SEZ descending neuron, SeIN128 (Ohyama et al., 2015; Zhu et al., 2024). Additionally, this feedback SEZ descending neuron synapses onto A02o, A05q, Basins (highest connectivity to Basin 2 and weak connectivity to Basin 1 & 3), and select premotor neurons (Chair-1, DnB, and A02m/n) (Ohyama et al., 2015; Zhu et al., 2024). Interestingly, the SEZ feedback neuron likely plays a role in the observed cold-/CIII md neuron-evoked differential calcium activity and behavioral requirement amongst Basin-2 and -4 in cold nociception. We have added this to our discussion section.

      (6) Page 31 766-768 the conclusion that "premotor function is required for and can facilitate cold nociception" seems odd to stress as one would assume that some premotor neurons would be involved in controlling the behavioral responses to a stimulus. It would be more pertinent in the summary to specify which premotor neurons are involved and what is their function

      We have updated the section regarding premotor neurons’ role in cold nociception and now there’s a more specific concluding statement.

      (7) There are several Split GAL4 used in the study (with transgenes inserted in attP40 and attP2 sites). A recent study points to a mutation related to attP40 that can have an effect on muscle function: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9750024/. The controls used in behavioral experiments do not contain the attP40 site. It would be important to check a control genotype bearing an attP40 site and characterize the different parameters of the CT behavior to cold and take this into account in interpreting the results of the experiments using the SplitGAL4 lines.

      We have performed control experiments bearing empty attP40;attP2 sites in our neural silencing experiments. The observed muscle phenotypes were present in larvae bearing homozygous copies of attP40 (attP40/attP40) (van der Graaf et al., 2022). However, in our experiments, none of the larvae that we tested behaviorally had homozygous attP40;attP2 insertions. We have updated Table 1 to now include insertion sites.

      Reviewer #3 (Public Review):

      Summary:

      The authors follow up on prior studies where they have argued for the existence of cold nociception in Drosophila larvae. In the proposed pathway, mechanosensitive Class III multidendritic neurons are the noxious cold responding sensory cells. The current study attempts to explore the potential roles of second and third order neurons, based on information of the Class III neuron synaptic outputs that have been obtained from the larval connectome.

      Strengths:

      The major strength of the manuscript is the detailed discussion of the second and third order neurons that are downstream of the mechanosensory Class III multidendritic neurons. These will be useful in further studies of gentle touch mechanosensation and mechanonociception both of which rely on sensory input from these cells. Calcium imaging experiments on Class III activation with optogenetics support the wiring diagram.

      Weaknesses:

      The scientific premise is that a full body contraction in larvae that are exposed to noxious cold is a sensorimotor behavioral pathway. This premise is, to start with, questionable. A common definition of behavior is a set of "orderly movements with recognizable and repeatable patterns of activity produced by members of a species (Baker et al., 2001)." In the case of nociception behaviors, the patterns of movement are typically thought to play a protective role and to protect from potential tissue damage.

      Does noxious cold elicit a set of orderly movements with a recognizable and repeatable pattern in larvae? Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm? Based on the available evidence, the answer to both questions is seemingly no. In response to noxious cold stimulation many, if not all, of the muscles in the larva, simultaneously contract (Turner et al., 2016), and as a result the larva becomes stationary. In response to cold, the larva is literally "frozen" in place and it is incapable of moving away. This incapacitation by cold is the antithesis of what one might expect from a behavior that protects the animals from harm.

      Extensive literature has investigated the physiological responses of insects to cold (reviewed in Overgaard and MacMillan, 2017). In numerous studies of insects across many genera (excluding cold adapted insects such as snow flies), exposure to very cold temperatures quickly incapacitates the animal and induces a state that is known as a chill coma. During a chill coma, the insect becomes immobilized by the cold exposure, but if the exposure to cold is very brief the insect can often be revived without apparent damage. Indeed, it is common practice for many laboratories that use adult Drosophila for studies of behavior to use a brief chilling on ice as a form of anesthesia because chilling is less disruptive to subsequent behaviors than the more commonly used carbon dioxide anesthesia. If flies were to perceive cold as a noxious nociceptive stimulus, then this "chill coma" procedure would likely be disruptive to behavioral studies but is not. Furthermore, there is no evidence to suggest that larval sensation of "noxious cold" is aversive.

      The insect chill coma literature has investigated the effects of extreme cold on the physiology of nerves and muscles and the consensus view of the field is that the paralysis that results from cold is due to complex and combined action of direct effects of cold on muscle and on nerves (Overgaard and MacMillan, 2017). Electrophysiological measurements of muscles and neurons find that they are initially depolarized by cold, and after prolonged cold exposure they are unable to maintain potassium homeostasis and this eventually inhibits the firing of action potentials (Overgaard and MacMillan, 2017). The very small thermal capacitance of a Drosophila larva means that its entire neuromuscular system will be quickly exposed to the effect of cold in the behavioral assays under consideration here. It would seem impossible to disentangle the emergent properties of a complex combination of effects on physiology (including neuronal, glial, and muscle homeostasis) on any proposed sensorimotor transformation pathway.

      Nevertheless, the manuscript before us makes a courageous attempt at this. A number of GAL4 drivers tested in the paper are found to affect parameters of contraction behavior (CT) in cold exposed larvae in silencing experiments. However, notably absent from all of the silencing experiments are measurements of larval mobility following cold exposure. Thus, it is not known from the study if these manipulations are truly protecting the larvae from paralysis following cold exposure, or if they are simply reducing the magnitude of the initial muscle contraction that occurs immediately following cold (i.e. reducing CT). The strongest effect of silencing occurs with the 19-12-GAL4 driver which targets Class III neurons (but is not completely specific to these cells).

      Optogenetic experiments for Class III neurons relying on the 19-12-GAL4 driver combined with a very strong optogenetic actuator (ChETA) show the CT behavior that was reported in prior studies. It should be noted that this actuator drives very strong activation, and other studies with milder optogenetic stimulation of Class III neurons have shown that these cells produce behavioral responses that resemble gentle touch responses (Tsubouchi et al 2012 and Yan et al 2013). As well, these neurons express mechanoreceptor ion channels such as NompC and Rpk that are required for gentle touch responses. The latter makes the reported Calcium responses to cold difficult to interpret in light of the fact that the strong muscle contractions driven by cold may actually be driving mechanosensory responses in these cells (i.e. through deformation of the mechanosensitive dendrites). Are the cIII calcium signals still observed in a preparation where cold induced muscle contractions are prevented?

      A major weakness of the study is that none of the second or third order neurons (that are downstream of CIII neurons) are found to trigger the CT behavioral responses even when strongly activated with the ChETA actuator (Figure 2 Supplement 2). These findings raise major concerns for this and prior studies and it does not support the hypothesis that the CIII neurons drive the CT behaviors.

      Later experiments in the paper that investigate strong CIII activation (with ChETA) in combination with other second and third order neurons do support the idea that activating those neurons can facilitate body-wide muscle contractions. But many of the co-activated cells in question are either repeated in each abdominal neuromere or they project to cells that are found all along the ventral nerve cord, so it is therefore unsurprising that their activation would contribute to what appears to be a non-specific body-wide activation of muscles along the AP axis. Also, if these neurons are already downstream of the CIII neurons the logic of this coactivation approach is not particularly clear. A more convincing experiment would be to silence the different classes of cells in the context of the optogenetic activation of CIII neurons to test for a block of the effects, a set of experiments that is notably absent from the study.

      The authors’ argument that the co-activation studies support "a population code" for cold nociception is a very optimistic interpretation of a brute force optogenetics approach that ultimately results in an enhancement of a relatively non-specific body-wide muscle convulsion.

      We have responded extensively to reviewer 3’s comments in our provisional response to address the critiques regarding conceptual merit of this paper.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Nakagawa and colleagues report the observation that YAP is differentially localized, and thus differentially transcriptionally active, in spheroid cultures versus monolayer cultures. YAP is known to play a critical role in the survival of drug-tolerant cancer cells, and as such, the higher levels of basally activated YAP in monolayer cultures lead to higher fractions of surviving drug-tolerant cells relative to spheroid culture (or in vivo culture). The findings of this study, revealed through convincing experiments, are elegantly simple and straightforward, yet they add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology simply because the abundance of residual cells in this format is much greater than in spheroid or xenograft models. The potential linkage between matrix density and stiffness and YAP activation, while only speculated upon in this manuscript, is intriguing and a rich starting point for future studies.

      Although this work, like any important study, inspires many interesting follow-on questions, I am limiting my questions to only a few minor ones, which may potentially be explored either in the context of the current study or in separate, follow-on studies.

      We appreciate Reviewer #1's comments that our work is of importance to the field and particularly that it will "...add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology..."  We have sought to highlight the importance of how our findings could be applied to study resistance mechanisms at various points in the manuscript.

      Strengths:

      The major strengths of the work are described above.

      Weaknesses:

      Rather than considering the following points as weaknesses, I instead prefer to think of them as areas for future study:

      (1) Given the field's intense interest in the biology and therapeutic vulnerabilities of residual disease cells, I suspect that one major practical implication of this work could be that it inspires scientists interested in working in the residual disease space to model it in monolayer culture. However, this relies upon the assumption that drug-tolerant cells isolated in monolayer culture are at least reasonably similar in nature to drug-tolerant cells isolated from spheroid or xenograft systems. Is this true? An intriguing experiment that could help answer this question would be to perform gene expression profiling on a cell line model in the following conditions: monolayer growth, drug tolerant cells isolated from monolayer growth conditions, spheroid growth, drug tolerant cells isolated from spheroid growth conditions, xenograft tumors, and drug tolerant cells isolated from xenograft tumors. What are the genes and programs shared between drug-tolerant cells cultured in the three conditions above? Which genes and programs differ between these conditions? Data from this exercise could help provide additional, useful context with which to understand the benefits and pitfalls of modeling residual tumor cell growth in monolayer culture.

      We thank the reviewer for suggesting valuable future studies. We agree that the proposed experiments represent important next steps in understanding the role of YAP and other pathways in primary resistance. We believe, however, these experiments are both beyond the scope of the current manuscript and beyond what can reasonably be addressed in a revision. The distinct challenges associated with comparing in vivo and in vitro conditions would require significant optimization of single-cell approaches, especially given the robust cell death driven by afatinib treatment in vivo. Given the complexity of in vivo experimentation, we are concerned that such studies may not guarantee biologically meaningful insights. Nonetheless, we agree that this is a compelling direction for future research. If common gene expression patterns could be identified despite these challenges, such studies could help validate monolayer culture as a relevant model for investigating residual disease.

      (2) In relation to the point above, there is an interesting and established connection between mesenchymal gene expression and YAP/TAZ signaling. For example, analyses of gene expression data from human tumors and cell lines demonstrate an extremely strong correlation between these two gene expression programs. Further, residual persister cancer cells have often been characterized as having undergone an EMT-like transition. From the analysis above, is there evidence that residual tumor cells with increased YAP signaling also exhibit increased mesenchymal gene expression?

      We agree with the reviewer that a connection between YAP/TAZ activity and EMT is likely, given prior studies exploring correlations between these two gene signatures. We believe, however, exploring EMT represents a distinct research direction from the primary focus of the current manuscript.  We are concerned exploration of EMT, especially in the absence of corresponding preclinical models or mechanistic data directly linking EMT to therapy resistance in our models, could distract from the main conclusions of the manuscript. While we plan to stain for EMT-associated markers in the residual cancer tissue from the in vivo studies, it remains unclear whether such data would meaningfully contribute to the revised manuscript, regardless of the outcome.

      Reviewer #2 (Public review):

      The manuscript by Nakagawa R, et al describes a mechanism of how NSCLC cells become resistant to EGFR and KRAS G12C inhibition. Here, the authors focus on the initial cellular changes that occur to confer resistance and identify YAP activation as a non-genetic mechanism of acute resistance.

      The authors performed an initial xenograft study to identify YAP nuclear localization as a potential mechanism of resistance to EGFRi. The increase in the stromal component of the tumors upon Afatinib treatment leads the authors to explore the response to these inhibitors in both 2D and 3D culture. The authors extend their findings to both KRAS G12C and BRAF inhibitors, suggesting that the mechanism of resistance may be shared along this pathway.

      The paper would benefit from additional cell lines to determine the generalizability of the findings they presented. While the change in the localization of YAP upon Afatinib treatment was identified in a xenograft model, the authors do not return to animal models to test their potential mechanism, and the effects of the hyperactivated S127A YAP protein on Afatinib sensitivity in culture are modest. Also, combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies.

      We thank the reviewer for their insightful comments. In this manuscript, we present data from 5 cell lines representing the EGFR/BRAF/KRAS pathway, demonstrating the generalizability of YAP-driven decreased cancer cell sensitivity to targeted inhibitors when cultured in 2D compared to spheroid counterparts. While expanding this analysis to a larger panel of cell lines is beyond the scope of the current study, we believe our findings provide a strong rationale for future investigations, including high-throughput screens conducted by other research groups and pharmaceutical companies, to recognize the value in screening spheroid cell cultures. We hope this work helps shift the field of cancer therapeutics toward incorporating screening approaches that better reflect tumor biology into drug discovery pipelines and believe this could be one of the most impactful and enduring contributions of our study.

      Reviewer #2 also mentions that "...combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies..." The concept that YAP/TAZ inhibitors (i.e. TEAD inhibitors) could be additive or synergistic in 2D culture is one that is being actively tested across several groups and in pharma. Several recent examples include a publication by Hagenbeek, et al., Nat. Cancer, 2023 (PMID: 37277530) showing that a TEAD inhibitor overcomes KRASG12C inhibitor resistance. Additionally, recent work by Pfeifer, et al., Comm. Biol., 2024 (PMID: 38658677) suggests a similar effect between EGFR inhibitors and a different TEAD inhibitor. While neither of these studies probes cell death pathways as extensively as our studies do, they nevertheless provide strong evidence that TEAD + targeted EGFR/RAF/RAS inhibition in 2D has additive, if not synergistic, effects. We feel that these recently published studies affirm our findings and that repeating such experiments is unlikely to add much new information. We thus feel they are beyond the scope of our present studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      Olfactory sensory neurons (OSNs) in the olfactory epithelium detect myriads of environmental odors that signal essential cues for survival. OSNs are born throughout life and thus represent one of the few neurons that undergo life-long neurogenesis. Until recently, it was assumed that OSN neurogenesis is strictly stochastic with respect to subtype (i.e. the receptor the OSN chooses to express).

      However, a recent study showed that olfactory deprivation via naris occlusion selectively reduced birthrates of only a fraction of OSN subtypes and indicated that these subtypes appear to have a special capacity to undergo changes in birthrates in accordance with the level of olfactory stimulation. These previous findings raised the interesting question of what type of stimulation influences neurogenesis, since naris occlusion does not only reduce the exposure to potentially thousands of odors but also to more generalized mechanical stimuli via preventing airflow.

      In this study, the authors set out to identify the stimuli that are required to promote the neurogenesis of specific OSN subtypes. Specifically, they aim to test the hypothesis that discrete odorants selectively stimulate the same OSN subtypes whose birthrates are affected. This would imply a highly specific mechanism in which exposure to certain odors can "amplify" OSN subtypes responsive to those odors suggesting that OE neurogenesis serves, in part, an adaptive function.

      To address this question, the authors focused on a family of OSN subtypes that had previously been identified to respond to musk-related odors and that exhibit higher transcript levels in the olfactory epithelium of mice exposed to males compared to mice isolated from males. First, the authors confirm via a previously established cell birth dating assay in unilateral naris occluded mice that this increase in transcript levels actually reflects a stimulus-dependent birthrate acceleration of this OSN subtype family. In a series of experiments using the same assay, they show that one specific subtype of this OSN family exhibits increased birthrates in response to juvenile male exposure while a different subtype shows increased birthrates to adult mouse exposure. In the core experiment of the study, they finally exposed naris occluded mice to a discrete odor (muscone) to test if this odor specifically accelerates the birth rates of OSN types that are responsive to this odor. This experiment reveals a complex relationship between birth rate acceleration and odor concentrations showing that some muscone concentrations affect birth rates of some members of this family and do not affect two unrelated OSN subtypes.

      In addition to the results nicely summarized by the reviewer, which focus on experiments to examine the effects of odor stimulation on unilateral naris occluded (UNO) mice, an important part of the present study is a set of experiments on non-occluded (i.e., non-UNO-treated) mice. These experiments show: 1) that the exposure of non-occluded mice to odors from adolescent male mice selectively increases quantities of newborn OSNs of the musk-responsive subtype Olfr235 (Figure 3G, H; previously Figure 6), 2) that the exposure of non-occluded female mice to 2 different musk odorants (muscone, ambretone) selectively increases quantities of newborn OSNs of 3 musk-responsive subtypes: Olfr235, Olfr1440 and Olfr1431 (Figure 4D-F; previously Figure 6), and 3) that the exposure of non-occluded adult female mice to a musk odorant selectively increases quantities of newborn OSNs of musk-responsive subtypes (Figure 5; previously Fig. S7). We have reorganized the revised manuscript to more prominently and clearly present the experimental design and findings of these experiments. We have also made changes to clarify (via schematics) the experimental conditions used (i.e., UNO, non-UNO, odor exposure) in each experiment.

      Strengths:

      The scientific question is valid and opens an interesting direction. The previously established cell birth dating assay in naris occluded mice is well performed and accompanied by several control experiments addressing potential other interpretations of the data.

      Weaknesses:

      (1) The main research question of this study was to test if discrete odors specifically accelerate the birth rate of OSN subtypes they stimulate, i.e. does muscone only accelerate the birth rate of OSNs that express muscone-responsive ORs, or vice versa is the birthrate of muscone-responsive OSNs only accelerated by odors they respond to?

      This question is only addressed in Figure 5 of the manuscript and the results only partially support the above claim. The authors test one specific odor (muscone) and find that this odor (only at certain concentrations) accelerates the birth rate of some musk-responsive OSN subtypes, but not two other unrelated control OSN subtypes. This does not at all show that musk-responsive OSN subtypes are only affected by odors that stimulate them and that muscone only affects the birthrate of musk-responsive OSNs, since first, only the odor muscone was tested and second, only two other OSN subtypes were tested as controls, that, importantly, are shown to be generally stimulus-independent OSN subtypes (see Figure 2 and S2).

      As a minimum, the authors should have a) tested whether additional odors that do not activate the three musk-responsive subtypes affect their birthrates and b) chosen 2-3 additional control subtypes that are known to be stimulus-dependent (from their own 2020 study) and tested whether muscone affects their birthrates.

      We appreciate these suggestions. Within the revised manuscript, we have described and included the results from several new experiments:

      (1) As noted by the reviewer, we had previously tested the effects of exposure to only one exogenous musk odorant, muscone, on quantities of newborn OSNs of the musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431. To test whether the effects observed with muscone exposure occur with other musk odorants, we assessed the effects of exposure to ambretone (5-cyclohexadecenone), a musk odorant previously found to robustly activate musk-responsive OSNs (Sato-Akuhara et al., 2016; Shirasu et al., 2014), on quantities of newborn OSNs of 3 musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431, as well as the SBT-responsive subtype Olfr912, in the OEs of non-occluded female mice. Exposure to ambretone was found to significantly increase quantities of newborn OSNs of all 3 musk-responsive subtypes (Figure 4D-F) but not the SBT-responsive subtype (Figure 4–figure supplement 4C-left), indicating that a variety of musk odorants can accelerate the birthrates of musk responsive subtypes.

      (2) To verify that exogenous non-musk odors do not increase quantities of newborn OSNs of musk responsive OSN subtypes (point a, above), we quantified newborn OSNs of 3 musk-responsive subtypes, Olfr235, Olfr1440, and Olfr1431, in non-occluded female mice that were exposed to the non-musk odorants SBT or IAA. As expected, neither of these odorants significantly affected the birthrates of the subtypes tested (Figure 4D-F).

      (3) To confirm that exogenous musk odors do not accelerate the birthrates of non-musk responsive OSN subtypes that were previously found to undergo stimulation-dependent neurogenesis (point b, above), we quantified newborn OSNs of 2 such subtypes, Olfr827 and Olfr1325, in non-occluded female mice that were exposed to muscone. As expected, exposure to muscone did not significantly affect the birthrates of either of these subtypes (Figure 4–figure supplement 4C-middle, right).

      (4) To provide additional confirmation that only some OSN subtypes have a capacity to exhibit increases in newborn OSN quantities in the presence of odors that activate them, we compared quantities of newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were exposed to 0.1% SBT with those in unexposed controls. As expected, exposure to SBT caused no significant increase in quantities of newborn Olfr912 OSNs (Figure 4–figure supplement 4C-left).

      (2) The finding that Olfr1440 expressing OSNs do not show any increase in UNO effect size under any muscone concentration (Figure 5D, no significance in line graph for UNO effect sizes, middle) seems to contradict the main claim of this study that certain odors specifically increase birthrates of OSN subtypes they stimulate. It was shown in several studies that olfr1440 is seemingly the most sensitive OR for muscone, yet, in this study, muscone does not further increase birthrates of OSNs expressing olfr1440. The effect size on birthrate under muscone exposure is the same as without muscone exposure (0%).

      In contrast, the supposedly second most sensitive muscone-responsive OR olfr235 shows a significant increase in UNO effect size between no muscone exposure (0%) and 0.1% as well as 1% muscone.

      The finding that quantities of newborn Olfr1440 OSNs do not show a significantly greater UNO effect size in the OEs of mice exposed to muscone compared to control mice was also somewhat surprising to us. We think that there are two potential explanations for this result: 1) Unlike subtype Olfr235, subtype Olfr1440 exhibits a significant open-side bias in newborn OSN quantities in UNO-treated adolescent females even in the absence of exposure to muscone. We speculate that this subtype (as well as subtype Olfr1431) is stimulated by odors that are emitted by female mice at the adolescent stage, and/or by another environmental source. This may limit the influence of muscone exposure on the UNO effect size. 2) There is compelling evidence that odors within the environment can enter the closed side of the OE transnasally [via the nasopharyngeal canal (Kelemen, 1947)] and/or retronasally (via the nasopharynx) in UNO-treated mice [reviewed in (Coppola, 2012)]. Thus, it is conceivable that chronic exposure of UNO-treated mice to muscone results in the eventual entry on the closed side of the OE of muscone at concentrations sufficient to promote neurogenesis. If Olfr1440 is more sensitive to muscone than Olfr235 [e.g., (Sato-Akuhara et al., 2016; Shirasu et al., 2014)], OSNs of this subtype may be especially sensitive to small amounts of odors that enter the closed side of the OE transnasally and/or retronasally. These explanations are supported by the following results:

      - UNO-treated females exposed to 0.1% muscone show higher quantities of newborn Olfr1440 OSNs on both the open and closed sides of the OE compared to their unexposed counterparts (Figure 4–figure supplement 1A-middle). Similar results were also observed for newborn Olfr235 OSNs (Figure 4C-middle), albeit to a lesser extent, perhaps due to the lower sensitivity of this subtype to muscone.

      - In non-occluded female mice, exposure to 0.1% muscone was found to significantly increase quantities of newborn Olfr1440 OSNs, as well as newborn Olfr235 and Olfr1431 OSNs (Figure 4D-F in revised manuscript; Figure 6 in original version). Similar results were also observed upon exposure to ambretone, another musk odor (Figure 4D-F). These experiments strongly support the hypothesis that musk odors selectively increase birthrates of OSN subtypes that they stimulate.

      We have addressed these points within the results section of the revised manuscript.

      (3) The authors introduce their choice to study this particular family of OSN subtypes with first, the previous finding that transcripts for one of these musk-responsive subtypes (olfr235) are downregulated in mice that are deprived of male odors. Second, musk-related odors are found in the urine of different species. This gives the misleading impression that it is known that musk-related odors are indeed excreted into male mouse urine at certain concentrations. This should be stated more clearly in the introduction (or cited, if indeed data exist that show musk-related odors in male mouse urine) because this would be a very important point from an ethological and mechanistic point of view.

      In addition, this would also be important information to assess if the chosen muscone concentrations fall at all into the natural range.

      These are important points, which we have addressed within the revised manuscript:

      (1) Within the introduction, we have now stated that the emission of musk odors by mice has not been documented. We have also added extensive discussions of what is known about the emission of musk odors by mice in a new subsection within Results, as well as within the Discussion section. Most prominently, we have cited one study (Sato-Akuhara et al., 2016) that noted unpublished evidence for the emission of Olfr1440-activating compounds from male preputial glands: “Indeed, our preliminary experiments suggest that there are unidentified compounds that activate MOR215-1 in mouse preputial gland extracts.” Another study, which used histomorphology, metabolomic and transcriptomic analyses to compare the mouse preputial glands to muskrat scent glands, found that the two glands are similar in many ways, including molecular composition (Han et al., 2022). However, the study did not identify known musk compounds within mouse preputial glands.

      (2) Based on the reviewer’s feedback and our own curiosity, we used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk odorants, particularly those known to activate Olfr235 and Olfr1440 (Sato-Akuhara et al., 2016). Although we were unable to find evidence for known musk odorants in mouse urine extracts (possibly due to insufficient sensitivity of the assay employed), we found that preputial gland extracts contain GC-MS signals that are structurally consistent with known musk odorants. A limitation of this approach, however, is that the conclusive identification of specific musk odorants in extracts derived from mouse urine and tissues requires comparisons to pure standards, many of which we could not readily obtain. For example, we were unable to obtain a pure sample of cycloheptadecanol, a musk molecule with a predicted potential match to a signal identified within preputial gland extracts. Another limitation is that although several known musk odorants have been found to activate Olfr235 and Olfr1440 OSNs, it is conceivable that structurally distinct odorants that have not yet been identified might also activate them. The findings from these experiments have been included in a new figure within the revised manuscript (Appendix 2–figure 1).

      Related: If these are male-specific cues, it is interesting that changes in OR transcripts (Figure 1) can already be seen at the age of P28 where other male-specific cues are just starting to get expressed. This should be discussed.

      We agree that the observed changes in quantities of newborn OSNs of musk-responsive subtypes in mice exposed to juvenile male odors deserve additional discussion. We have included a more extensive discussion of this observation in both the Results and Discussion sections of the revised manuscript.

      (4) Figure 5: Under muscone exposure the number of newborn neurons on the closed sides fluctuates considerably. This doesn't seem to be the case in other experiments and raises some concerns about how reliable the naris occlusion works for strong exposure to monomolecular odors or what other potential mechanisms are at play.

      We agree that the variability in quantities of newborn OSNs of musk-responsive subtypes on the closed side of the OE of UNO-treated mice deserves further discussion. As noted above, we suspect that these fluctuations are due, at least in part, to transnasal and/or retronasal odor transfer via the nasopharyngeal canal (Kelemen, 1947) and nasopharynx, respectively [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed OE to odor concentrations that rise with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 and Olfr1440 OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440) (Figure 4C-middle, Figure 4–figure supplement 1A-middle). It is conceivable that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone reflect overstimulation-dependent reductions in survival. Our findings from UNO-based experiments are consistent with expectations that naris occlusion does not completely block exposure to odorants on the closed side, particularly at high concentrations. However, they also appear consistent with the hypothesis that exposure to musk odors promotes the neurogenesis of musk-responsive OSN subtypes.

      Considering the limitations of the UNO procedure, it is important to note that the present study also includes experimental exposure of non-occluded animals to both male odors (Figure 3G, H) and exogenous musk odorants (Figures 4D-F). Findings from the latter experiments provide strong evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included within the Results section of the revised manuscript a discussion of how observed effects of muscone exposure of UNO-treated mice may be influenced by transnasal/ retronasal odor transfer to the closed side of the OE.

      (5) In contrast to all other musk-responsive OSN types, the number of newborn OSNs expressing olfr1437 increases on the closed side of the OE relative to the open in UNO-treated male mice (Figure 1). This seems to contradict the presented theory and also does not align with the bulk RNAseq data (Figure S1).

      Subtype Olfr1437 is indeed an outlier among musk-responsive subtypes that were previously found to be more highly represented in the OSN population in 6-month-old sex-separated males compared to females (Appendix 1–figure 1) (C. van der Linden et al., 2018; Vihani et al., 2020). Somewhat unexpectedly, our findings from scRNA-seq experiments show slightly greater quantities of immature Olfr1437 OSNs on the closed side of the OE in juvenile males (Figure 1D, E of the revised manuscript, which now includes data from a second OE). Perhaps more informatively, considering the small number of iOSNs of specific subtypes in the scRNA-seq datasets, EdU birthdating experiments show no difference in newborn Olfr1437 OSN quantities on the 2 sides of the OE from UNO-treated juvenile males (Figure 2G). It is unclear to us why subtype Olfr1437 does not show open-side biases in newborn OSN quantities in juvenile male mice, but potential explanations include:

      - Age: The bulk RNA-seq findings that musk-responsive OSN subtypes are more highly represented in mice exposed to male odors came from mice that were 6 months old (C. van der Linden et al., 2018) or > 9 months old (Vihani et al., 2020) at the time of analysis. By contrast, the present study primarily analyzed mice that were juveniles (PD 28) at the time of scRNA-seq analysis (Figure 1) or EdU labeling (Figure 2G). It is conceivable that different musk-responsive subtypes are selectively responsive to distinct odors that are emitted at different ages. In this scenario, odors that increase the birthrates of Olfr235, Olfr1440, and Olfr1431 OSNs may be emitted starting at the juvenile stage, while those that increase the birthrate of Olfr1437 OSNs may be emitted in adulthood. In potential support of this, juvenile males exposed to their adult parents at the time of EdU labeling showed a slightly greater (although not statistically significantly different) UNO effect size in quantities of newborn Olfr1437 OSNs compared to controls (Figure 3–figure supplement 3).

      - Capacity for stimulation-dependent neurogenesis: It is also conceivable that, unlike other musk-responsive OSN subtypes, Olfr1437 OSNs lack the capacity for stimulation-dependent neurogenesis (like the SBT-responsive subtype Olfr912, for example). If so, this would imply that increased representations of Olfr1437 OSNs observed in mice exposed to male odors for long periods (C. van der Linden et al., 2018; Vihani et al., 2020) may be due to male odor-dependent increases in the lifespans of Olfr1437 OSNs.

      Within the Discussion section of the revised manuscript, we have discussed the findings concerning Olfr1437.

      (6) The authors hypothesize in relation to the accelerated birthrate of musk-responsive OSN subtypes that "the acceleration of the birthrates of specific OSN subtypes could selectively enhance sensitivity to odors detected by those subtypes by increasing their representation within the OE". However, for two other OSN subtypes that detect male-specific odors, they hypothesize the opposite "By contrast, Olfr912 (Or8b48) and Olfr1295 (Or4k45), which detect the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT) and (methylthio)methanethiol (MTMT), respectively, exhibited lower representation and/or transcript levels in mice exposed to male odors, possibly reflecting reduced survival due to overstimulation."

      Without any further explanation, it is hard to comprehend why exposure to male-derived odors should, on one hand, accelerate birthrates in some OSN subtypes to potentially increase sensitivity to male odors, but on the other hand, lower transcript levels and does not accelerate birth rates of other OSN subtypes due to overstimulation.

      We agree that this point deserves further explanation. Within the revised manuscript, we have expanded the Introduction and Results to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. In one study (C. J. van der Linden et al., 2020), UNO treatment was found to cause a fraction of OSN subtypes to exhibit lower birthrates and representations on the closed side of the OE relative to the open. By contrast, another fraction of OSN subtypes exhibited higher representations on the closed side of the OEs of UNO-treated mice, but no difference in birthrates between the two sides. The latter subtypes were found to be distinguished by their receipt of extremely high levels of odor stimulation, suggesting that reduced odor stimulation via naris occlusion may lengthen their lifespans. In support of the possibility that this also applies to Olfr912 (and Olfr1295), which detect SBT and MTMT, respectively (Vihani et al., 2020), odors that are emitted specifically by male mice (Lin et al., 2005; Schwende et al., 1986), UNO treatment was previously found to increase total Olfr912 OSN quantities on the closed side compared to the open side in sex-separated males (C. van der Linden et al., 2018), a finding confirmed in the present study (Figure 3–figure supplement 1H).

      Taken together, findings from previous studies as well as the current one indicate that olfactory stimulation can accelerate the birthrates and/or reduce the lifespans of OSNs, depending on the specific subtypes and odors within the environment. As we have now indicated in the Discussion, we do not yet know what distinguishes subtypes that undergo stimulation-dependent neurogenesis, but it is conceivable that they detect odors with a particular salience to mice. Thus, observations that some odorants (e.g., musks) cause stimulation-dependent neurogenesis while others do not (e.g., SBT) might reflect an animal’s specific need to adapt its sensitivity to the former. Alternatively, it is conceivable that stimulation-dependent reductions in representations of subtypes such as Olfr912 and Olfr1295 reflect a fundamentally different mode of plasticity that is also adaptive, as has been hypothesized (C. van der Linden et al., 2018; Vihani et al., 2020).

      Reviewer #1 (Recommendations For The Authors):

      To support the main claim, several controls are necessary as mentioned under point 1 of the public review.

      As outlined in our responses to the public review, new experiments within the revised manuscript indicate the following:

      (1) Accelerated birthrates of 3 different musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) are observed in non-occluded mice following exposure to multiple exogenous musk odorants (muscone, ambretone) (Figure 4D-F).

      (2) Exposure of non-occluded mice to non-musk odors (SBT, IAA) does not accelerate the birthrates of musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) (Figure 4D-F).

      (3) Exposure of mice to exogenous musk odors (muscone, ambretone) does not accelerate the birthrates of non-musk responsive OSN subtypes (e.g., Olfr912), including those previously found to undergo stimulation-dependent neurogenesis (Olfr827, Olfr1325) (Figure 4–figure supplement 4C).

      (4) Only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them (e.g., Olfr912 birthrates are not accelerated by SBT exposure) (Figure 4–figure supplement 4C-left).

      In addition, this study could be considerably improved by showing that the proposed mechanism applies beyond a single OSN subtype (olfr235), especially since the most sensitive OR subtype (expressing olfr1440) does not align with the main claim. The introduction states that this is difficult because the ligands for many ORs are unknown including all subtypes previously found to undergo stimulation-dependent neurogenesis referring to your 2020 study. While this reviewer agrees that the lack of deorphanization is a significant hurdle in the field, the 2020 study states that about 4% of all ORs (which should equal >40 ORs) show a stimulus-dependent down-regulation on the closed side, not only the 7 ORs which are closer examined (Figure 1). It would tremendously improve the impact of the current study to show that the proposed effect applies also to one of these other >40 ORs.

      We appreciate this question, as it alerted us to some shortcomings in how our findings were presented within the original manuscript. We respectfully disagree that only findings regarding subtype Olfr235 align with the main hypothesis of this study, which is that discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate. Specifically, we would like to draw attention to experiments on non-occluded female mice exposed to exogenous musk odorants (muscone, ambretone; revised Figures 4D-F; previously, Figure 6). Findings from these experiments provide compelling evidence that exposure to musk odorants causes selective increases in the birthrates of three different musk-responsive OSN subtypes: Olfr235, Olfr1440, and Olfr1431. Thus, we would suggest that results from the present study already show that the proposed mechanism applies to more than just the Olfr235 subtype. However, we agree with what we think is the essence of the reviewer’s point: that it is important to determine the extent to which this mechanism applies to OSN subtypes that are responsive to other (i.e., non-musk) odorants. While, as noted by the reviewer, our previous study identified several OSN subtypes that undergo stimulation-dependent neurogenesis (as well as many others that are predicted to do so) (C. J. van der Linden et al., 2020), we are not aware of ligands that have been identified with high confidence for those subtypes. Although we are in the process of conducting experiments to identify additional odor/subtype pairs to which the mechanism described in this study applies, the early-stage nature of these experiments precludes their inclusion in the present manuscript.

      The ethological and mechanistic relevance of the current study could be significantly improved by showing that musk-related odors that activate olfr235 are actually found in male mouse urine (and additionally are not found in female mouse urine). Otherwise, the implicated link between the acceleration of OSN birthrates by exposure to male odors and acceleration by specific monomolecular odors does not hold, raising the question of any natural relevance (e.g. the proposed adaptive function to increase sensitivity to certain odors).

      As noted in our responses to the public review, we have addressed this important point within the revised manuscript as follows:

      (1) We have included an extensive discussion of what is known about the emission of musk-like odors by mice.

      (2) We have used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk compounds. Although inconclusive, we report that preputial glands contain signals that are structurally consistent with known musk compounds. The findings of these experiments have been included in the revised manuscript (new Appendix 2–figure 1), along with a discussion of their limitations.

      Reviewer #2 (Public Review):

      In their paper entitled "In mice, discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate" Hossain et al. address lifelong neurogenesis in the mouse main olfactory epithelium. The authors hypothesize that specific odorants act as neurogenic stimuli that selectively promote biased OR gene choice (and thus olfactory sensory neuron (OSN) identity). Hossain et al. employ RNA-seq and scRNA-seq analyses for subtype-specific OSN birthdating. The authors find that exposure to male and musk odors accelerates the birthrates of the respective responsive OSNs. Therefore, Hossain et al. suggest that odor experience promotes selective neurogenesis and, accordingly, OSN neurogenesis may act as a mechanism for long-term olfactory adaptation.

      We appreciate this summary but would like to underscore that a mechanism involving biased OR gene choice is just one of two possibilities proposed in the Discussion section to explain how odorant stimulation of specific subtypes accelerates the birthrates of those subtypes.

      The authors follow a clear experimental logic, based on sensory deprivation by unilateral naris occlusion, EdU labeling of newborn neurons, and histological analysis via OR-specific RNA-FISH. The results reveal robust effects of deprivation on newborn OSN identity. However, the major weakness of the approach is that the results could, in (possibly large) parts, depend on "downregulation" of OR subtype-specific neurogenesis, rather than (only) "upregulation" based on odor exposure. While, in Figure 6, the authors show that the observed effects are, in part, mediated by odor stimulation, it remains unclear whether deprivation plays an "active" role as well. Moreover, as shown in Figure 1C, unilateral naris occlusion has both positive and negative effects in a random subtype sample.

      In our view, the present study involves two distinct and complementary experimental designs: 1) odor exposure of UNO-treated animals and 2) odor exposure of non-occluded animals. Here we address this comment with respect to each of these designs:

      (1) For experiments performed on UNO-treated animals, we agree that observed differences in birthrates on the open and closed sides of the OE reflect, largely, a deceleration (i.e., downregulation) of the birthrates of these subtypes on the closed side relative to the open (as opposed to an acceleration of birthrates on the open side). Our objective in using this design was to test the extent to which specific OSN subtypes undergo stimulation-dependent neurogenesis under various odor exposure conditions. According to the main hypothesis of this study, a lower birthrate of a specific OSN subtype on the closed side of the OE compared to the open is predicted to reflect a lower level of odor stimulation on the closed side received by OSNs of that subtype. However (and as described in our responses to reviewer #1), a limitation of this design is that environmental odorants, especially at high concentrations, are likely to stimulate responsive OSNs on the closed side of the OE in addition to the open side due to transnasal and/or retronasal air flow.

      (2) Experiments performed on non-occluded animals were designed to provide critical complementary evidence that specific OSN subtypes undergo accelerated neurogenesis in the presence of specific odors. Using this design, we have found compelling evidence that:

      - Exposure of non-occluded mice to male odors causes the selective acceleration of the birthrate of Olfr235 OSNs (Figure 3G, H).

      - Exposure of non-occluded female mice to two different musk odorants (muscone and ambretone) selectively accelerates the birthrates of three different musk-responsive subtypes: Olfr235, Olfr1440, and Olfr1431 (Figure 4D-F and Figure 4–figure supplement 4C).

      We have reorganized the revised manuscript to more clearly present the most important experimental findings using these two experimental designs. We have also highlighted (via schematics) the experimental conditions (e.g., UNO, non-occlusion, odor exposure) used for each experiment.

      Another weakness is that the authors build their model (Figure 8), specifically the concept of selectivity, on a receptor-ligand pair (Olfr912 that has been shown to respond, among other odors, to the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT)) that would require at least some independent experimental corroboration. At least, a control experiment that uses SBT instead of muscone exposure should be performed.

      We agree that this important concern deserves additional control experiments and discussion. We have addressed this concern within the revised manuscript as follows:

      - Within the Results section, we have added multiple new control experiments (detailed in response to Reviewer #1), including the one recommended above. As suggested, we quantified newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were either exposed to 0.1% SBT or unexposed controls. Exposure to SBT was found to cause no significant increase in quantities of newborn Olfr912 OSNs (newly added Figure 4–figure supplement 4C-left). These findings further support the model in Figure 7 (previously Figure 8) that only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them.

      - Also within the Results section, we have made efforts to better highlight relevant control experiments that were included in the original version, particularly those showing that quantities of newborn Olfr912 OSNs are not affected by UNO in mice exposed to male odors (Figure 2H and Figure 3–figure supplement 1G; previously Figure 2F and Figure 3H) or by exposure of non-occluded females to male odors (Figure 3H; previously Figure 6E). Since Olfr912 is responsive to component(s) of male odors (C. van der Linden et al., 2018; Vihani et al., 2020), these results indicate that this subtype does not have the capacity for stimulation-dependent neurogenesis, which is consistent with our previous findings that only a fraction of subtypes have this capacity (C. J. van der Linden et al., 2020).

      In this context, it is somewhat concerning that some results, which appear counterintuitive (e.g., lower representation and/or transcript levels of Olfr912 and Olfr1295 in mice exposed to male odors) are brushed off as "reflecting reduced survival due to overstimulation." The notion of "reduced survival" could be tested by, for example, a caspase3 assay.

      This is a point that we agree deserves further discussion. Please see the explanation that we have outlined above in response to Reviewer #1.

      Within the revised manuscript, we have expanded the Introduction to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. We outline evidence from previous studies that Olfr912 and Olfr1295 belong to the latter category, and that the representations of these subtypes are likely reduced by male odor overstimulation-dependent shortening of OSN lifespan.

      Important analyses that need to be done to better interpret the findings are to present (i) the OR+/EdU+ population of olfactory sensory neurons not just as a count per hemisection, but rather as the ratio of OR+/EdU+ cells among all EdU+ cells; and (ii) the ratio of EdU+ cells among all nuclei (UNO versus open naris). This way, data would be normalized to (i) the overall rate of neurogenesis and (ii) any broad deprivation-dependent epithelial degeneration.

      We have addressed this concern in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
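
      The comparison of the three normalization schemes described above can be sketched as follows. This is only an illustrative sketch: the per-section counts are invented, the function names (`normalize`, `open_closed_ratios`) are hypothetical, and the open/closed ratio is used here as a simple stand-in comparison rather than the effect-size metric used in the manuscript.

```python
from statistics import mean

# Hypothetical counts for five coronal sections from one OE (illustrative only).
# Each tuple: (OR+/EdU+ cells, total EdU+ cells, DAPI+ area in arbitrary units)
open_side = [(4, 200, 0.5), (6, 300, 0.5), (5, 250, 0.5), (3, 150, 0.5), (7, 350, 0.5)]
closed_side = [(2, 200, 0.5), (3, 300, 0.5), (2, 200, 0.5), (1, 100, 0.5), (2, 200, 0.5)]


def normalize(sections):
    """Average each normalized quantity across the sections of one side."""
    per_half_section = mean(n for n, _, _ in sections)        # method 1
    per_edu = mean(n / edu for n, edu, _ in sections)         # method 2
    per_dapi_area = mean(n / area for n, _, area in sections)  # method 3
    return per_half_section, per_edu, per_dapi_area


def open_closed_ratios(open_sections, closed_sections):
    """Open/closed ratio under each of the three normalization methods."""
    return tuple(
        o / c for o, c in zip(normalize(open_sections), normalize(closed_sections))
    )


if __name__ == "__main__":
    labels = ("per half-section", "per EdU+ cell", "per DAPI+ area")
    for label, ratio in zip(labels, open_closed_ratios(open_side, closed_side)):
        print(f"{label}: open/closed = {ratio:.2f}")
```

      If the three ratios agree closely for real data, as the supplementary figures cited above report, the simplest normalization (per half-section) can reasonably be retained.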

      Finally, the paper will benefit from improved data presentation and adequate statistical testing. Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH, are hard to interpret. Moreover, t-tests should not be employed when data is not normally distributed (as is the case for most of their samples).

      We have made extensive changes within the revised manuscript to increase the clarity and interpretability of the figures, including:

      (1) Addition of a split-channel, high-magnification view of a representative image that shows the overlap of FISH and EdU signals (Figure 2D).

      (2) Addition of experimental schematics and timelines corresponding to each set of experiments.

      In the revised manuscript, several changes to the statistical tests have been made, as follows:

      (1) To assess deviation from normality of the histological quantifications of newborn and total OSNs of specific subtypes in this study, all datasets were tested using the Shapiro-Wilk test for non-normality and the P values obtained are included in Supplementary file 1 (figure source data). Of the 274 datasets tested, 253 were found to have Shapiro-Wilk P values > 0.05, indicating that the vast majority (92%) do not show evidence of significant deviation from a normal distribution.

      (2) A general lack of deviation of the datasets in this study from a normal distribution is further supported by quantile-quantile (QQ) plots, which compare actual data to a theoretically normal distribution (Appendix 4–figure 1). The datasets analyzed were separated into the following categories:

      a. Quantities of newborn OSNs in UNO treated mice (Appendix 4-figure 1A)

      b. Quantities of total OSNs in UNO treated mice (Appendix 4-figure 1B)

      c. Quantities of newborn OSNs in non-occluded mice (Appendix 4-figure 1C)

      d. UNO effect sizes for newborn or total OSNs (Appendix 4-figure 1D)

      (3) Results of both parametric and non-parametric statistical tests of comparisons in this study have been included in Supplementary file 2 (statistical analyses). In general, the results from parametric and non-parametric tests are in good agreement.

      (4) Statistical analyses of differences in OSN quantities in the OEs of non-occluded mice or UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions have now been performed using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli.
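
      For readers unfamiliar with the FDR adjustment named in point (4), the two-stage linear step-up procedure can be sketched in a few lines. This is a generic textbook-style sketch and not the software used for the manuscript; the function names and the example P values are hypothetical.

```python
def bh_reject_count(sorted_pvals, q):
    """Benjamini-Hochberg step-up: largest k with p_(k) <= (k / m) * q."""
    m = len(sorted_pvals)
    k_max = 0
    for k, p in enumerate(sorted_pvals, start=1):
        if p <= k / m * q:
            k_max = k
    return k_max


def bky_two_stage(pvals, q=0.05):
    """Two-stage linear step-up procedure (Benjamini, Krieger and Yekutieli).

    Returns a list of booleans (True = rejected) in the input order.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    sorted_p = [pvals[i] for i in order]

    # Stage 1: BH at the reduced level q' = q / (1 + q).
    q1 = q / (1 + q)
    r1 = bh_reject_count(sorted_p, q1)
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m

    # Stage 2: estimate the number of true nulls and rerun BH at q' * m / m0.
    m0_hat = m - r1
    r2 = bh_reject_count(sorted_p, q1 * m / m0_hat)

    rejected = [False] * m
    for rank, idx in enumerate(order):
        rejected[idx] = rank < r2
    return rejected


if __name__ == "__main__":
    # Hypothetical P values from pairwise comparisons following an ANOVA.
    print(bky_two_stage([0.001, 0.008, 0.039, 0.041, 0.6]))
```

      In this invented example the first stage rejects two hypotheses and the second stage, using the estimated number of true nulls, rejects four, which illustrates why the two-stage procedure can be more powerful than a single Benjamini-Hochberg pass.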

      Reviewer #2 (Recommendations for the Authors):

      The manuscript by Hossain et al. would benefit from a thorough revision. Here, we outline several points that should be addressed:

      Figure 3E - I & Figure 4E&F: Red lines that connect mean values are misleading.

      Within the revised manuscript, the UNO effect size graphs have been modified for clarity, including removal of the lines between mean values except for those comparing changes over time post EdU injection (Figure 6 and Figure 6-figure supplement 1). For these latter graphs, we think that lines help to illustrate changes in effect sizes over time.

      Figure 3E - I: UNO effect sizes (right) should be tested via ANOVA.

      In the revised manuscript, statistical analyses of UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions were done using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (Figure 2-figure supplement 2; Figure 3; Figure 3-figure supplement 1; Figure 4; Figure 4-figure supplements 1, 2). The same tests were used for analysis of differences in OSN quantities in the OEs of non-occluded mice subjected to more than two different experimental conditions (Figure 3; Figure 3-figure supplement 2; Figure 4; Figure 4-figure supplements 3, 4). For comparisons of differences in quantities of newborn OSNs of musk-responsive subtypes at 4 and 7 days post-EdU between non-occluded mice exposed and unexposed to muscone, a two sample ANOVA - fixed-test, using F distribution (right-tailed) was used (Figure 6; Figure 6-figure supplement 1).

      Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH: Colabeling is hard / often impossible to discern. Show zoom-ins and better explain the criteria for "colabeling" in the methods.

      In the revised manuscript an enlarged and split-channel view of an image showing multiple newborn Olfr235 OSNs (OR+/EdU+) has been added (Figure 2D). A detailed description of the criteria for OR+/EdU+ OSNs is provided in Methods under the section “Histological quantification of newborn and total OSNs of specific subtypes.”

      Figure 1C: add Olfr912.

      As a control group for iOSN quantities of musk-responsive subtypes in Figure 1, we selected random subtypes that are expressed in the same zones: 2 and 3. Olfr912 OSNs were not included because this subtype was not randomly chosen, nor is it expressed in the same zones (Olfr912 is expressed in zone 4). We also note that the scRNA-seq analysis was done to allow an initial exploration of the hypothesis that some OSN subtypes that are more highly represented in mice exposed to male odors show stimulation-dependent neurogenesis. Considering that the scRNA-seq datasets contain only small numbers of iOSNs of specific subtypes, we think they are more useful for analyzing changes in birthrates within groups of subtypes (e.g., musk responsive, random) rather than individual subtypes.

      The time of OE dissection is different for data shown in Figure 1 (P28) as compared to other figures (P35). Please comment/discuss.

      Within the Results section of the revised manuscript, we have now clarified that the PD 28 timepoint chosen for EdU birthdating in the histological quantification of newborn OSNs of specific subtypes is analogous to the PD 28 timepoint chosen for identification of immature (Gap43-expressing) OSNs in the scRNA-seq samples. In the case of EdU birthdating, it is necessary to provide a chase period of sufficient length to enable robust and stable expression of an OR, which defines the subtype. A chase period of 7 days was chosen based on a previous study (C. J. van der Linden et al., 2020). Hence, a dissection date of PD 35 was chosen.

      Figure 3F&G: please discuss the female → female effects

      In the Results and Discussion sections of the revised manuscript, we discuss our observation that the Olfr1440 and Olfr1431 subtypes show significantly higher quantities of newborn OSNs on the open side compared to closed sides in UNO-treated females. We speculate that these subtypes may receive some odor stimulation in juvenile females, perhaps via musk or related odors emitted by females themselves or from elsewhere within the environment.

      Figure 4E (and other examples): male → male displays two populations (no effect versus effect); please explain/speculate.

      For some UNO effect sizes, there appears to be a high degree of variation among mice, and, in some cases, this diversity appears to cause the data to separate into groups. We assessed whether this diversity might reflect mice that came from different litters, but this is not the case. Rather, we speculate that the observed diversity most likely reflects low representations of newborn OSNs of some subtypes and/or under specific conditions. The data referred to by the reviewer (now Figure 3–figure supplement 3D), for example, shows UNO effect sizes for quantities of newborn Olfr1431 OSNs, which has the lowest representation among the musk-responsive subtypes analyzed in this study.

      Figure 5C-E: It is unclear why strong muscone concentrations (10%) have no effect, whereas no muscone sometimes (D&E) has an effect.

      As discussed in response to comments from Reviewer #1, we speculate that fluctuations in UNO effect sizes in muscone-exposed mice, particularly at high muscone concentrations, may be due, at least in part, to transnasal and/or retronasal air flow [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed side of the OE to muscone concentrations that increase with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 (Figure 4C-middle) and Olfr1440 (Figure 4–figure supplement 1A-middle) OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440). We speculate that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone may reflect overstimulation-dependent reductions in survival.

      As emphasized above, our study also includes experiments on non-occluded animals (Figures 3, 4, 5). Findings from these experiments provide additional evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included an extensive interpretation of UNO-based experiments, including their limitations, within the Results section of the revised manuscript.

      Figure S1: please explain the large error bars regarding "Transcript level".

      We have clarified that the error bars in this figure, which is now Appendix 1–figure 1, correspond to 95% confidence intervals.

      The figure captions could be improved for ease of reading.

      Figure captions have been revised for increased clarity.

      Figure 4: Include Olfr235 data for consistency.

      All OSN subtypes analyzed for the effects of exposure to adult mice on UNO-induced open-side biases in quantities of newborn OSNs have been included in a single figure, which is now Figure 3–figure supplement 3.

      Figure S6F&G: Do not run statistics on n = 2 (G) or 3 (F) samples.

      We have removed statistical test results from comparisons involving fewer than 4 observations.

      Reviewer #3 (Public Review):

      Summary:

      Neurogenesis in the mammalian olfactory epithelium persists throughout the life of the animal. The process replaces damaged or dying olfactory sensory neurons. It has been tacitly assumed that replacement of the OR subtypes is stochastic, although anecdotal evidence has suggested that this may not be the case. In this study, Santoro and colleagues systematically test this hypothesis by answering three questions: is there enrichment of specific OR subtypes associated with neurogenesis? Is the enrichment dependent on sensory stimulus? Is the enrichment the result of differential generation of the OR type or from differential cell death regulated by neural activity? The authors provide some solid evidence indicating that musk odor stimulus selectively promotes the OR types expressing the musk receptors. The evidence argues against a random selection of ORs in the regenerating neurons.

      Strengths:

      The strength of the study is a thorough and systematic investigation of the expression of multiple musk receptors with unilateral naris occlusion or under different stimulus conditions. The controls are properly performed. This study is the first to formulate the selective promotion hypothesis and the first systematic investigation to test it. The bulk of the study uses in situ hybridization and immunofluorescent staining to estimate the number of OR types. These results convincingly demonstrate the increased expression of musk receptors in response to male odor or muscone stimulation.

      Weaknesses:

      A major weakness of the current study is the single-cell RNASeq result. The authors use this piece of data as a broad survey of receptor expression in response to unilateral nasal occlusion. However, several issues with this data raise serious concerns about the quality of the experiment and the conclusions. First, the proportion of OSNs, including both the immature and mature types, constitutes only a small fraction of the total cells. In previous studies of the OSNs using the scRNASeq approach, OSNs constitute the largest cell population. It is curious why this is the case. Second, the authors did not annotate the cell types, making it difficult to assess the potential cause of this discrepancy. Third, given the small number of OSNs, it is surprising to have multiple musk receptors detected in the open side of the olfactory epithelium whereas almost none in the closed side. Since each OR type only constitutes ~0.1% of OSNs on average, the number of detected musk receptors is too high to be consistent with our current understanding and the rest of the data in the manuscript. Finally, unlike the other experiments, the authors did not describe any method details, nor was there any description of quality controls associated with the experiment. The concerns over the scRNASeq data do not diminish the value of the data presented in the bulk of the study but could be used for further analysis.

      We are grateful to the reviewer for raising these important questions.

      In the revised manuscript, we have clarified that the scRNA-seq dataset presented in the original version of the manuscript (now called dataset OE 1) was published and described in detail in a previous study (C. J. van der Linden et al., 2020). The reviewer is correct that the proportion of OSNs within that dataset was lower than in other datasets that have been published more recently (using updated methods). We think this is likely because of the way that the cells were processed (e.g., from cryopreserved single cells followed by live/dead selection). However, because the open and closed sides were processed identically, we do not expect the ratios of OSNs of specific subtypes to be greatly affected. Hence, the differences observed for specific OSN subtypes on the open versus closed sides are expected to be valid.

      As the reviewer notes, there is a surprisingly large difference between the number of OSNs of musk-responsive subtypes on the open and closed sides within the OE 1 dataset. This difference is a key piece of information that led us to formulate the hypothesis in the study: that musk-responsive subtypes are born at a higher rate in the presence of male/musk odor stimulation. And while it is true that, on average, each subtype represents ~0.1% of the population, it is known that there is wide variance in representations among different subtypes [e.g., (Ibarra-Soria et al., 2017)]. The frequencies of the musk-responsive subtypes among all OSNs on the open side of OE 1 (0.3% for Olfr235, 0.4% for Olfr1440, 0.06% for Olfr1434, 0% for Olfr1431, and 1% for Olfr1437) are in line with previous findings.

      To confirm that the scRNA-seq findings from dataset OE 1 are not an artifact of the cell preparation methods used, we generated a second scRNA-seq dataset, OE 2, which has been added to the revised manuscript (Figure 1). The OE 2 dataset was prepared according to the same experimental timeline as OE 1, but the cells were captured immediately after dissociation and live/dead sorting via FACS. As expected, most cells within the OE 2 dataset are OSNs (77% on the open side, 66% on the closed). Importantly, like the OE 1 dataset, the OE 2 dataset shows higher quantities of iOSNs of musk-responsive subtypes on the open side of the OE compared to the closed (normalized for either total cells or total OSNs) (Figure 1–figure supplement 1D, E).

      A weakness of the experiment assessing musk receptor expression is that the authors do not distinguish immature from mature OSNs. Immature OSNs express multiple receptor types before they commit to the expression of a single type. The experiments do not reveal whether mature OSNs maintain an elevated expression level of musk receptors.

      While it is established that multiple ORs are coexpressed at a low level during OSN differentiation (Bashkirova et al., 2023; Fletcher et al., 2017; Hanchate et al., 2015; Pourmorady et al., 2024; Saraiva et al., 2015; Scholz et al., 2016; Tan et al., 2015), this has been found to occur primarily at the immediate neuronal precursor 3 (INP3) stage (Bashkirova et al., 2023; Fletcher et al., 2017), which is characterized by expression of Tex15 (Fletcher et al., 2017; Pourmorady et al., 2024) and precedes the immature OSN (iOSN) stage, which is characterized by expression of Gap43 (Fletcher et al., 2017; McIntyre et al., 2010; Verhaagen et al., 1989). Within the scRNA-seq datasets in the present study, iOSNs of specific subtypes are identified based on robust expression of Gap43 (Log2 UMI > 1) and a specific OR gene (Log2 UMI > 2), as described in the figures and methods. Thus, the cells defined as iOSNs are expected to express a single OR gene and this expression should be maintained as iOSNs transition to mOSNs. To confirm these predictions, we carried out a detailed analysis of OR expression at three different stages of OSN differentiation: INP3, iOSN, and mOSN (Figure 1–figure supplement 2). The cells chosen for analysis express the musk-responsive ORs Olfr235 or Olfr1440 or a randomly chosen OR Olfr701, in addition to markers that define INP3, iOSN, or mOSN cells. As expected, individual iOSNs and mOSNs of musk-responsive subtypes were found to exhibit robust and singular OR expression on the open and closed sides of OEs from UNO-treated mice. Moreover, and as observed previously, INP3 cells coexpress multiple OR transcripts at low levels. A detailed description of how the analysis was performed is included in the Methods section under Quantification and statistical analysis.
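
      The thresholding rule stated above (Gap43 Log2 UMI > 1 and OR Log2 UMI > 2) can be illustrated with a minimal sketch. The cell records, the dictionary layout, and the function names are hypothetical, and the values are assumed to be already log2-transformed; the actual pipeline is the one described in the Methods.

```python
GAP43_MIN_LOG2 = 1.0  # threshold for robust Gap43 expression (from the text)
OR_MIN_LOG2 = 2.0     # threshold for robust OR expression (from the text)


def is_iosn_of_subtype(cell, or_gene):
    """True if the cell passes both thresholds for the given OR gene."""
    return (
        cell["Gap43"] > GAP43_MIN_LOG2
        and cell["ors"].get(or_gene, 0.0) > OR_MIN_LOG2
    )


def robust_ors(cell):
    """OR genes expressed above the OR threshold (one is expected in iOSNs)."""
    return sorted(g for g, v in cell["ors"].items() if v > OR_MIN_LOG2)


# Hypothetical cells with log2-transformed UMI values (illustrative only).
cells = [
    {"Gap43": 3.2, "ors": {"Olfr235": 4.5, "Olfr701": 0.0}},  # iOSN, Olfr235
    {"Gap43": 0.4, "ors": {"Olfr235": 5.1}},                  # Gap43 too low
    {"Gap43": 2.8, "ors": {"Olfr235": 1.0, "Olfr1440": 0.8}},  # ORs too low
]

if __name__ == "__main__":
    n = sum(is_iosn_of_subtype(c, "Olfr235") for c in cells)
    print(f"Olfr235 iOSNs: {n} of {len(cells)} cells")
```

      Counting the ORs returned by `robust_ors` for each cell is one simple way to check the expectation of singular OR expression at the iOSN stage.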

      Within the histology-based quantifications, newborn OSNs are identified based on their robust RNA-FISH signals corresponding to a specific OR transcript and an EdU label. Considering the EdU chase time of 7 days, most EdU-positive cells are expected to have passed the INP3 stage and be iOSNs or mOSNs. Moreover, considering the low level of OR expression within INP3 cells, it is unlikely OR transcripts are expressed at a high enough level to be detectable and/or counted at this stage and thereby affect newborn OSN quantifications.

      There are also two conceptual issues that are of concern. The first is the concept of selective neurogenesis. The data show an increased expression of musk receptors in response to male odor stimulation. The authors argue that this indicates selective neurogenesis of the musk receptor types. However, it is not clear what the distinction is between elevated receptor expression and a commitment to a specific fate at an early stage of development. As immature OSNs express multiple receptors, a likely scenario is that some newly differentiated immature OSNs have elevated expression of not only the musk receptors but also other receptors. The current experiments do not distinguish the two alternatives. Moreover, as pointed out above, it is not clear whether mature OSNs maintain the increased expression. Although a scRNASeq experiment can clarify it, the authors, unfortunately, did not perform an in-depth analysis to determine at which point of neurogenesis the cells commit to a specific musk receptor type. The quality of the scRNASeq data unfortunately also does not lend confidence for this type of analysis.

      The addition of a second scRNA-seq dataset within the revised manuscript (Figure 1), combined with the new scRNA-seq-based analyses of OR expression in INP3, iOSN, and mOSN cells (Figure 1-figure supplement 2), provide strong evidence that iOSNs and mOSNs robustly express a single OR gene and that cellular expression is stable from the iOSN to the mOSN stage. These analyses do not support a scenario in which odor stimulation causes upregulated expression of multiple ORs and thereby causes apparent increases in quantities of newly generated OSNs that express musk-responsive ORs. Rather, the data firmly support a mechanism in which odor stimulation increases quantities of newly generated OSNs that have stably committed to the robust expression of a single musk-responsive OR.

      A second conceptual issue, the idea of homeostasis in regeneration, which the authors presented in the Introduction, needs clarification. In its current form, it is confusing. It could mean maintenance of the distribution of receptor types, or it could mean the proper replacement of a specific OR type upon the loss of this type. The authors seem to refer to the latter and should define it properly.

      We have revised the Introduction section to clarify our use of the term homeostatic in one instance (paragraph 4) and replace it with more specific language in a second instance (paragraph 5).

      Reviewer #3 (Recommendations For The Authors):

      Concerns over scRNASeq data. It appears that the samples may have included non-OE tissues, which reduced the representation of the OSNs. This experiment may need to be repeated to increase the number of OSNs.

      As outlined in the response to the public comments, we think that the low proportion of OSNs in the OE 1 data set reflects how the cells were prepared and processed. We have now included a second scRNA-seq dataset to address this concern.

      Cell types should be identified in the scRNASeq analysis, and the number of cells documented for each cell type, at least for the OSNs. The data should be made available for general access.

      We have now clarified that the OE 1 dataset was published as part of a previous study (C. J. van der Linden et al., 2020) and was made publicly available as part of that study (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157119). All cell types in the newly generated OE 2 dataset have been annotated (Figure 1) and this dataset has also been made publicly available (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE278693). The numbers and percentages of OSNs within OE 1 and OE 2 datasets have been added to the legend of Figure 1-figure supplement 1.

      The specific OR types should be segregated for mature and immature OSNs. The percentage of a specific OR type should be normalized to the total number of OSNs, rather than the total cells. The current quantification is misleading because it gives the false sense that the muscone receptors represent ~0.1% of cells when the proportion is much higher if only OSNs are considered.

      In the revised manuscript, quantities of iOSNs (Gap43+ cells) of specific subtypes within the OE 1 and OE 2 scRNA-seq datasets are graphed as percentages of both all OSNs (Figure 1E, Figure 1–figure supplement 1D) and all cells (Figure 1–figure supplement 1E). As a percentage of all OSNs, average quantities of iOSNs of musk responsive subtypes on the open side of the OE range from 0.005% (for Olfr1431) to 0.14% (for Olfr1440) (Figure 1E).

      Within the feature plots for the two datasets, the differentiation stages of indicated OSNs have been clearly defined within the figures and figure legends. For the OE 1 dataset, iOSNs are differentiated from mOSNs by arrows (Figure 1–figure supplement 1C). For the OE 2 dataset (Figure 1D), only immature OSNs are shown for simplicity.
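
      The difference between the two denominators discussed above (all OSNs versus all cells) is simple arithmetic, sketched here with invented counts. The numbers and the helper name `percent` are hypothetical and are not taken from the OE 1 or OE 2 datasets.

```python
def percent(count, total):
    """Express count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Hypothetical counts for one side of one dataset (illustrative only).
total_cells = 10_000
total_osns = 7_700     # OSNs are a subset of all captured cells
iosns_of_subtype = 10  # Gap43+ cells expressing one specific OR

if __name__ == "__main__":
    print(f"% of all OSNs:  {percent(iosns_of_subtype, total_osns):.3f}")
    print(f"% of all cells: {percent(iosns_of_subtype, total_cells):.3f}")
```

      Because OSNs are a subset of all cells, the percentage relative to OSNs is always at least as large as the percentage relative to all cells, which is why reporting both avoids the ambiguity raised by the reviewer.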

      Technical details of the scRNASeq should be documented. In the feature plot of musk-response receptors (Figure. 1D), it is better to use the actual quantity of expression rather than binarized representation (with or without an OR). If one needs to use on/off to determine the number of cells for a given OR type, then the criteria of selection should be given.

      Technical details of generation of the scRNA-seq datasets have been documented in the “Method details” section (for the OE 2 dataset) and in the method section of our previous publication of the OE 1 dataset (C. J. van der Linden et al., 2020). Details of the scRNA-seq analyses, including the criteria used to define immature OSNs of specific subtypes, are documented within the “Quantification and statistical analysis” section.

      Within the feature plots, we have decided to show OSNs of a given subtype in a binary fashion using specific colors for the sake of simplicity (Figure 1D, Figure 1-figure supplement 1C). To address the reviewer’s concern, we have added a new figure that provides detailed information about OR transcript expression (levels and genes) within iOSNs and mOSNs of two different musk-responsive subtypes and a randomly chosen subtype (Figure 1-figure supplement 2).

      An in-depth analysis of the onset of OR expression in the GBC, INP, immature, and mature OSNs should be performed. It is also important to determine how many other receptors are detected in the cells that express the musk receptors. The current scRNASeq data may not be of sufficiently high quality and the experiment needs to be repeated. It is also important for the authors to take measures to eliminate ambient RNA contamination.

      The revised manuscript includes a second scRNA-seq dataset (OE 2; Figure 1). Details of how both the original (OE 1) and new datasets were generated have been documented within the Methods sections of the corresponding publications [(C. J. van der Linden et al., 2020); present study]. For both datasets, live/dead selection of cells was performed, which was expected to reduce ambient RNA.

      The revised manuscript also includes a new figure that provides detailed information about OR transcript expression within INP3, iOSN and mOSN cells that express one of two different musk responsive ORs or a randomly chosen OR (Figure 1-figure supplement 2). These data reveal, as reported previously (Bashkirova et al., 2023; Fletcher et al., 2017; Pourmorady et al., 2024), that low levels of multiple OR transcripts are detected in INP3 (Tex15+) cells. By contrast, iOSN (Gap43+) and mOSN (Omp+) cells robustly express a single OR, with little or no expression of other ORs.

      Quantification of cells for Figures 2-7 should be changed. Instead of using cell number per 1/2 section, the data should be calculated using density (using the area of the epithelium) or normalized to the total number of cells (based on DAPI staining). This is because multiple sections are taken from the same mouse along the A-P axis. These sections have different sizes and numbers of cells.

      As noted in response to a similar concern of Reviewer #2, this has been addressed in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
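
      The three normalizations compared above can be illustrated with a minimal sketch. It is not the authors' analysis code: the field names (`or_edu`, `edu_total`, `dapi_area`), the function names, and all numbers are hypothetical and chosen only for illustration. The sketch assumes that per-section values are averaged across at least 5 sections per mouse, as described in point (1).

```python
from statistics import mean

# Assumption taken from the response text: at least 5 sections are averaged per OE.
MIN_SECTIONS = 5


def normalized_count(section, method):
    """Return the newborn-OSN quantity for one half-section under one method.

    `section` is a dict with hypothetical keys:
      or_edu    - number of OR+/EdU+ cells counted in the half-section
      edu_total - total number of EdU+ cells in the half-section
      dapi_area - DAPI+ area of the half-section (arbitrary units)
    """
    newborn = section["or_edu"]
    if method == "half_section":      # method 1: raw count per half-section
        return float(newborn)
    if method == "per_edu":           # method 2: per total EdU+ cells
        return newborn / section["edu_total"]
    if method == "per_dapi_area":     # method 3: per unit of DAPI+ area
        return newborn / section["dapi_area"]
    raise ValueError(f"unknown method: {method}")


def per_mouse_value(sections, method):
    """Average the normalized quantity over all sections from one mouse."""
    if len(sections) < MIN_SECTIONS:
        raise ValueError(f"need at least {MIN_SECTIONS} sections")
    return mean(normalized_count(s, method) for s in sections)


# Made-up example: five half-sections of different sizes from one mouse.
example_sections = [
    {"or_edu": 1, "edu_total": 50, "dapi_area": 0.5},
    {"or_edu": 2, "edu_total": 100, "dapi_area": 1.0},
    {"or_edu": 3, "edu_total": 150, "dapi_area": 1.5},
    {"or_edu": 2, "edu_total": 100, "dapi_area": 1.0},
    {"or_edu": 2, "edu_total": 100, "dapi_area": 1.0},
]

for method in ("half_section", "per_edu", "per_dapi_area"):
    print(method, per_mouse_value(example_sections, method))
```

      In this made-up example the sections differ in size, but the per-mouse value is stable under each method because the sections sample the OE evenly. Whether the methods agree on real data is an empirical question, which the newly added figure supplements address.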

      References

      Bashkirova, E. V., Klimpert, N., Monahan, K., Campbell, C. E., Osinski, J., Tan, L., Schieren, I., Pourmorady, A., Stecky, B., Barnea, G., Xie, X. S., Abdus-Saboor, I., Shykind, B. M., Marlin, B. J., Gronostajski, R. M., Fleischmann, A., & Lomvardas, S. (2023). Opposing, spatially-determined epigenetic forces impose restrictions on stochastic olfactory receptor choice. eLife, 12, RP87445. https://doi.org/10.7554/eLife.87445

      Coppola, D. M. (2012). Studies of olfactory system neural plasticity: The contribution of the unilateral naris occlusion technique. Neural Plasticity, 2012, 351752. https://doi.org/10.1155/2012/351752

      Fletcher, R. B., Das, D., Gadye, L., Street, K. N., Baudhuin, A., Wagner, A., Cole, M. B., Flores, Q., Choi, Y. G., Yosef, N., Purdom, E., Dudoit, S., Risso, D., & Ngai, J. (2017). Deconstructing Olfactory Stem Cell Trajectories at Single-Cell Resolution. Cell Stem Cell, 20(6), 817-830.e8. https://doi.org/10.1016/j.stem.2017.04.003

      Han, X., Jiang, Y., Feng, N., Yang, P., Zhang, M., Jin, W., Zhang, T., Huang, Z., Zhao, H., Zhang, K., Liu, S., & Hu, D. (2022). Comparison of the Homology Between Muskrat Scented Gland and Mouse Preputial Gland. Journal of Mammalian Evolution, 29(2), 435–446. https://doi.org/10.1007/s10914-022-09604-w

      Hanchate, N. K., Kondoh, K., Lu, Z., Kuang, D., Ye, X., Qiu, X., Pachter, L., Trapnell, C., & Buck, L. B. (2015). Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis. Science (New York, N.Y.), 350(6265), 1251–1255. https://doi.org/10.1126/science.aad2456

      Hossain, K., Smith, M., & Santoro, S. W. (2023). A histological protocol for quantifying the birthrates of specific subtypes of olfactory sensory neurons in mice. STAR Protocols, 4(3), 102432. https://doi.org/10.1016/j.xpro.2023.102432

      Ibarra-Soria, X., Nakahara, T. S., Lilue, J., Jiang, Y., Trimmer, C., Souza, M. A., Netto, P. H., Ikegami, K., Murphy, N. R., Kusma, M., Kirton, A., Saraiva, L. R., Keane, T. M., Matsunami, H., Mainland, J., Papes, F., & Logan, D. W. (2017). Variation in olfactory neuron repertoires is genetically controlled and environmentally modulated. eLife, 6. https://doi.org/10.7554/eLife.21476

      Kelemen, G. (1947). The junction of the nasal cavity and the pharyngeal tube in the rat. Archives of Otolaryngology, 45(2), 159–168. https://doi.org/10.1001/archotol.1947.00690010168002

      Lin, D. Y., Zhang, S.-Z., Block, E., & Katz, L. C. (2005). Encoding social signals in the mouse main olfactory bulb. Nature, 434(7032), 470–477. https://doi.org/10.1038/nature03414

      McIntyre, J. C., Titlow, W. B., & McClintock, T. S. (2010). Axon growth and guidance genes identify nascent, immature, and mature olfactory sensory neurons. Journal of Neuroscience Research, 88(15), 3243–3256. https://doi.org/10.1002/jnr.22497

      Pourmorady, A. D., Bashkirova, E. V., Chiariello, A. M., Belagzhal, H., Kodra, A., Duffié, R., Kahiapo, J., Monahan, K., Pulupa, J., Schieren, I., Osterhoudt, A., Dekker, J., Nicodemi, M., & Lomvardas, S. (2024). RNA-mediated symmetry breaking enables singular olfactory receptor choice. Nature, 625(7993), 181–188. https://doi.org/10.1038/s41586-023-06845-4

      Saraiva, L. R., Ibarra-Soria, X., Khan, M., Omura, M., Scialdone, A., Mombaerts, P., Marioni, J. C., & Logan, D. W. (2015). Hierarchical deconstruction of mouse olfactory sensory neurons: From whole mucosa to single-cell RNA-seq. Scientific Reports, 5, 18178. https://doi.org/10.1038/srep18178

      Sato-Akuhara, N., Horio, N., Kato-Namba, A., Yoshikawa, K., Niimura, Y., Ihara, S., Shirasu, M., & Touhara, K. (2016). Ligand Specificity and Evolution of Mammalian Musk Odor Receptors: Effect of Single Receptor Deletion on Odor Detection. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 36(16), 4482–4491. https://doi.org/10.1523/JNEUROSCI.3259-15.2016

      Scholz, P., Kalbe, B., Jansen, F., Altmueller, J., Becker, C., Mohrhardt, J., Schreiner, B., Gisselmann, G., Hatt, H., & Osterloh, S. (2016). Transcriptome Analysis of Murine Olfactory Sensory Neurons during Development Using Single Cell RNA-Seq. Chemical Senses, 41(4), 313–323. https://doi.org/10.1093/chemse/bjw003

      Schwende, F. J., Wiesler, D., Jorgenson, J. W., Carmack, M., & Novotny, M. (1986). Urinary volatile constituents of the house mouse, Mus musculus, and their endocrine dependency. Journal of Chemical Ecology, 12(1), 277–296. https://doi.org/10.1007/BF01045611

      Shirasu, M., Yoshikawa, K., Takai, Y., Nakashima, A., Takeuchi, H., Sakano, H., & Touhara, K. (2014). Olfactory receptor and neural pathway responsible for highly selective sensing of musk odors. Neuron, 81(1), 165–178. https://doi.org/10.1016/j.neuron.2013.10.021

      Tan, L., Li, Q., & Xie, X. S. (2015). Olfactory sensory neurons transiently express multiple olfactory receptors during development. Molecular Systems Biology, 11(12), 844. https://doi.org/10.15252/msb.20156639

      van der Linden, C. J., Gupta, P., Bhuiya, A. I., Riddick, K. R., Hossain, K., & Santoro, S. W. (2020). Olfactory Stimulation Regulates the Birth of Neurons That Express Specific Odorant Receptors. Cell Reports, 33(1), 108210. https://doi.org/10.1016/j.celrep.2020.108210

      van der Linden, C., Jakob, S., Gupta, P., Dulac, C., & Santoro, S. W. (2018). Sex separation induces differences in the olfactory sensory receptor repertoires of male and female mice. Nature Communications, 9(1), 5081. https://doi.org/10.1038/s41467-018-07120-1

      Verhaagen, J., Oestreicher, A. B., Gispen, W. H., & Margolis, F. L. (1989). The expression of the growth associated protein B50/GAP43 in the olfactory system of neonatal and adult rats. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 9(2), 683–691.

      Vihani, A., Hu, X. S., Gundala, S., Koyama, S., Block, E., & Matsunami, H. (2020). Semiochemical responsive olfactory sensory neurons are sexually dimorphic and plastic. eLife, 9, e54501. https://doi.org/10.7554/eLife.54501

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Le et al. aimed to explore whether AAV-mediated overexpression of Oct4 could induce neurogenic competence in adult murine Müller glia, a cell type that, unlike its counterparts in cold-blooded vertebrates, lacks regenerative potential in mammals. The primary goal was to determine whether Oct4 alone, or in combination with Notch signaling inhibition, could drive Müller glia to transdifferentiate into bipolar neurons, offering a potential strategy for retinal regeneration.

      The authors demonstrated that Oct4 overexpression alone resulted in the conversion of 5.1% of Müller glia into Otx2+ bipolar-like neurons by five weeks post-injury, compared to 1.1% at two weeks. To further enhance the efficiency of this conversion, they investigated the synergistic effect of Notch signaling inhibition by genetically disrupting Rbpj, a key Notch effector. Under these conditions, the percentage of Müller glia-derived bipolar cells increased significantly to 24.3%, compared to 4.5% in Rbpj-deficient controls without Oct4 overexpression. Similarly, in Notch1/2 double-knockout Müller glia, Oct4 overexpression increased the proportion of GFP+ bipolar cells from 6.6% to 15.8%.

      To elucidate the molecular mechanisms driving this reprogramming, the authors performed single-cell RNA sequencing (scRNA-seq) and ATAC-seq, revealing that Oct4 overexpression significantly altered gene regulatory networks. They identified Rfx4, Sox2, and Klf4 as potential mediators of Oct4-induced neurogenic competence, suggesting that Oct4 cooperates with endogenously expressed neurogenic factors to reshape Müller glia identity.

      Overall, this study aimed to establish Oct4 overexpression as a novel and efficient strategy to reprogram mammalian Müller glia into retinal neurons, demonstrating both its independent and synergistic effects with Notch pathway inhibition. The findings have important implications for regenerative therapies as they suggest that manipulating pluripotency factors in vivo could unlock the neurogenic potential of Müller glia for treating retinal degenerative diseases.

      Strengths:

      (1) Novelty: The study provides compelling evidence that Oct4 overexpression alone can induce Müller glia-to-bipolar neuron conversion, challenging the conventional view that mammalian Müller glia lacks neurogenic potential.

      (2) Technological Advances: The combination of Muller glia-specific labeling and modifying mouse line, AAV-GFAP promoter-mediated gene expression, single-cell RNA-seq, and ATAC-seq provides a comprehensive mechanistic dissection of glial reprogramming.

      (3) Synergistic Effects: The finding that Oct4 overexpression enhances neurogenesis in the absence of Notch signaling introduces a new avenue for retinal repair strategies.

      Weaknesses:

      (1) In this study, the authors did not perform a comprehensive functional assessment of the bipolar cells derived from Müller glia to confirm their neuronal identity and functionality.

      (2) Demonstrating visual recovery in a bipolar cell-deficiency disease model would significantly enhance the translational impact of this work and further validate its therapeutic potential.

      Response: We thank the Reviewer for their evaluation. We agree that functional analysis of Müller glia-derived bipolar cells is indeed important, but is beyond the current scope of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors harness single-cell RNAseq data from zebrafish and mice to identify Oct4 as a candidate driver of neurogenesis. They then use adeno-associated virus vectors to show that while Oct4 overexpression alone converts rare adult Müller glia (MG) to bipolar cells, it synergizes with Notch pathway inhibition to cause this neurogenesis (achieved by Cre-mediated knockout of Rbpj floxed allele). Importantly, they genetically lineage-mark adult MG using a GLAST-CreER transgene and a Sun-GFP reporter, so that any non-MG cells that convert can be identified unambiguously. This is crucial because several high-profile papers made erroneous claims using short promoters in the viral delivery vector itself to mark MG, but those promoters are leaky and mark other non-MG cell types, making it impossible to definitively state whether manipulations studied were actually causing neurogenesis, or were merely the result of expression in pre-existing neurons. Once the authors establish Oct4 + RbpjKO synergy they use snRNAseq/ATACseq to identify known and novel transcription factors that could play a role in driving neurogenesis.

      Strengths:

      The system to mark MG is stringent, so the authors are studying transdifferentiation, not artifactual effects due to leaky viral promoters. The synergy between Oct4 and Notch pathway blockade is notable. The single-cell results add the potential involvement of new players such as Rfx4 in adult-MG-neurogenesis.

      Weaknesses:

      The existing version is difficult to read due to an unusually high number of text errors (e.g. references to the wrong figure panels etc.). A fuller explanation for the fraction of non-MG cells seen in control scRNAseq assays is required, particularly because the neurogenic trajectory which is enhanced in the Oct4/Rbpj-KO context is also evident in the control retina. Claims regarding the involvement of transcription factors in adult neurogenesis (such as Rfx4) need to be toned down unless they are backed up with functional data. It is possible that such factors are important, but equally, they may have no role or a redundant role, and without functional tests, it's impossible to say one way or the other.

      Overall, the authors achieved what they set out to do, and have made new insights into how neurogenesis can be stimulated in MG. Ultimately, a major long-term goal in the field is to replace lost photoreceptors as this is most relevant to many human visual disorders, and while this paper (like all others before it) does not generate rods or cones, it opens new strategies to coax MG to form a related neuronal cell type. Their approach underscores the benefits of using a gold-standard approach for lineage tracing.

      We thank the Reviewer for their evaluation. We have made extensive changes to the manuscript to correct errors and modify discussion as recommended. These are detailed below in our point-by-point responses to specific recommendations to the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor corrections:

      (1) In Figure 1C top GFAP-mCherry panel, two dim GFP + cells have colocalized with Otx2, is it caused by optic imaging thickness or some muller glia cells having the Otx2 expression?

      This indeed reflects the effects of optic imaging thickness. Colocalization of Sun1-GFP and Otx2 is not observed when Z-stack images are examined in GlastCreER;Sun1-GFP retinas. This can also be appreciated by the fact that, in cases of apparent overlap of nuclear envelope-targeted Sun1 and Otx2, the sizes of the labeled areas differ. In cases of true expression overlap, such as is seen following Oct4 overexpression, the labeled areas are the same size, or very nearly so.

      Whether the Glast-CreERT2 x Rosa26-LSL-Sun1-GFP mouse line has cross-labeling with the Otx2+ bipolar cells, the author should image the mCherry ctrl sample with a thin optical imaging layer with a small pinhole for Z-stack to verify the co-labeling the GFP and Otx2 in mCherry ctrl sample.

      Please see above. Since we first described this line (de Melo, et al. 2012), we have examined thousands of sections of GlastCreER;Sun1-GFP retinas, and have yet to see a single GFP-positive neuron. To avoid confusion, however, we have replaced these images with an additional image from a control mCherry-infected GlastCreER;Sun1-GFP retina processed for the same study.

      In the middle upper panel, Oct4-mCherry group, the white arrows indicate the GFP colocalized with Otx2 signal, but seems not mCherry positive, by contrast, the neighbor cells have significant mCherry expression but no colocalization with Otx2. The GFAP promoter-Oct4-mCherry may have stopped expression after the Müller Glia cells were converted into Otx2+ bipolar cells, but is there any middle stage in which the Oct4-mCherry and Otx2 co-expression? And after Müller glia to Bipolar conversion, why have Glast-CreERT2 driven GFP expressions not suppressed as GFAP promoter driven Oct4-mCherry? Could the author discuss this point?

      We observed a significant number of Muller glia-derived cells expressing both Otx2 and weak mCherry signal. GFP expression is driven by the ubiquitous CAG promoter following Cre-dependent excision of a transcriptional stop cassette. We have modified the text to make this point explicit.

      (2) In Figure S2b, the mouse is labeled with wild type; I assume it should be the same mouse line as Fig.1. Otherwise, the author should describe the source of the GFP signal.

      “Wildtype” in this case refers to GlastCreER;Sun1-GFP controls, which as the Reviewer correctly points out, are not truly wildtype. The genotype of these animals is specified in all figure legends, and is now referred to as “control” rather than “wildtype” in the figures and main text throughout.

      In Figure S2k and l, mCherry ctrl panel, the GFP+ cells looked co-labeling with Otx2, so again, is it the thicker optical imaging layer that caused overlapping vertically or the low specific of Müller Glia of the mouse line? Please describe the stars' meaning in Figure S2i,j in the figure legend. There are 2 figures labeled "n" of the quantification data.

      This is, again, an example of the thicker optical imaging layer causing apparent overlap. We have previously demonstrated that the Sun1-GFP+ cells do not co-label with Otx2 in GFAP-mCherry AAV-injected control retinas (Le et al., 2022; Fig. 2C). The asterisks (*) indicate mouse-on-mouse vascular staining, which is now clarified in the figure legend. The 2 figures labeled ‘n’ have been relabeled as ‘m’ and ‘n’.

      (3) In Figure 2c in the top panel, the Otx2 image was wrong; please replace it with the correct one.

      We thank the Reviewer for spotting this error. This is an inadvertent duplication of the single-channel Otx2 staining for mCherry control sample. We have replaced this with the correct image.

      (4) In Figure 3a, the Rbpj-cKO mouse line was used, but where was the GFP signal from? Please verify the mouse line you used in your work. The same question is also asked in Figure S3, S4b.

      GlastCreER;Rbpj<sup>lox/lox</sup>;Sun1-GFP mice were used in Figure 3a. As now specified in the Methods and all figure legends, all mice used in this study carry both the GlastCreER and Sun1-GFP transgenes.

      (5) In Figure S4c,d, and 5 wks time point, if the authors quantify the GFP+/Sox2- cells changing, it will be more helpful to understand the percentage of the Müller glia cells conversion to Bipolar cells compared to the Figure 2D, and can be as a supplement to the conclusion Müller to Bipolar conversion rather the Müller proliferation.

      Sox2-/GFP+ cells are a measure of Müller glia to bipolar cell conversion that complements that of GFP+/Otx2+ cells. This is now clarified in the text. We also include quantification of Sox2-/GFP+ neurons at 5 weeks post-injury in Fig. S5b.

      (6) In Figure S1b,c, there is a large portion of cells that are activated Müller glia after NMDA injury. Did the activated Müller glial cells lose their Müller glial identity? Between the loss of Müller glial identity and neuronal reprogramming, are there any markers that can be used to assess whether Müller glial cells are truly transdifferentiating into neurons rather than remaining in a reactive glial state or an intermediate phase?

      Wildtype Müller glia progressively revert to resting state, and by 72 hours post-injury have already lost expression of Klf4 and Myc (Hoang, et al. 2020), a point which is now specifically mentioned in the text. In GlastCreER;Sun1-GFP;Nfia/b/x<sup>lox/lox</sup>;Rbpj<sup>lox/lox</sup> Müller glia, reactive MG appear to largely convert to bipolar and amacrine-like cells, and it remains unclear if they eventually revert to a resting state (Le, et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      This work demonstrates that Oct4 (Pou5f3) can induce neurogenesis in murine Müller glia (MG). Le et al start by showing that murine and zebrafish MG lack expression of Oct4 (Pou5f3) and its target Nanog. To assess the effect of Oct4 they first label adult MG with Sun1-GFP using tamoxifen-treated GlastCreER;Sun1-GFP mice, then later transduce in vivo with AAV vectors expressing mCherry alone or Oct4 + mCherry. Subsequently, they damage the retina with NMDA and assess the effects several weeks later. In Oct4+ cells at 2 weeks there is rare induction of the neural determinant Ascl1, down-regulation of the MG marker Sox2, induction of bipolar markers (Otx2, Scgn, Cabp5) but not amacrine (HuC/D) or rod (Nrl) markers. Combining Oct4 with Notch inhibition (deleting floxed Rbpj) synergistically increases bipolar cell induction, with Otx2 staining rising to >20% of GFP-marked cells, and cells losing MG identity (loss of Sox2/9). EdU labeling was negligible suggesting direct trans-differentiation. Similar synergy was seen upon combining Oct4 expression with Notch1/2 double gene knockout. Attempts to combine Oct4 with Nfia, Nfib, and Nfix loss were unsuccessful as the GFAP promoter driving Oct4 in MG seems to require these three related transcription factors. scRNAseq confirmed the Oct4-overexpression/Rbpj-KO-driven increase in bipolar cells and decrease in MG cells and revealed that these manipulations may enhance bipolar cell genesis by repressing genes that define quiescent MG and enhancing expression of genes that define reactive MG and neurogenic cells. Finally, multiomic snRNA/scATAC-seq was performed to assess the effect of Oct4 in wt or Rbpj null MG. This approach revealed that, as anticipated, more genes were up and down-regulated in the context of both manipulations vs Oct4 OE alone. Moreover, Oct4 and Rbpj KO reduced chromatin accessibility at target motifs for transcription factors involved in MG identity/quiescence, while MGPCs showed elevated accessibility for neurogenic factors. The combination of Oct4 OE and Rbpj KO induces accessibility at various interesting TF sites that may contribute to the synergistic neurogenesis, including Rfx4, Klf4, Insm1, and others.

      This is an interesting paper that adds to the growing literature on how neurogenesis can be induced in mammalian MG. The focus on Oct4 is interesting and the synergistic effects are striking and analyzed in some detail with scRNAseq and multiomic snRNA/scATACseq. The latter results provide useful new insight into transcriptional programs that may be critical in driving neurogenesis. Functional insight into these new candidates is not explored in this manuscript, but that's beyond the scope of the current work and forms the basis for new studies. There are some overreaching statements in the Discussion that need to be toned down, but apart from that and a long list of textual errors that need to be fixed, this paper is a valuable contribution to the field.

      Major comments

      There are numerous textual errors (some, but not all, examples are detailed in minor comments). It was difficult to follow this paper given the unusually high number of textual errors and the abbreviated legends. Greater attention should be paid to harmonizing the text with the figures and ensuring that the legends are correct and complete.

      The manuscript has been proofread carefully and errors corrected.

      The opening section of the scRNAseq data should outline briefly why sorting for GFP labeled cells purifies a significant fraction of non-MG cell types, despite the earlier claim, (which agrees with other publications), that GLAST-CreER transgene expression is highly specific to MG. Presumably, it mainly/totally reflects the co-purification of cells, cell fragments, and/or cell-free mRNA from other lineages. Is it also possible that a fraction (however small) of these cells reflect low-level spurious/temporary activation of GLAST-CreER expression in non-MG? The "contamination" is present despite the addition of the GFP sequence to the reference genome (as explained in Methods). They mention: "a clear differentiation trajectory connecting Muller glia, neurogenic Muller glia-derived progenitor cells (MGPCs), and differentiating amacrine and bipolar cells (Fig. 3b)". However, the same trajectory is evident in control mCherry samples, so one could argue that this trajectory is active in normal retina at some low rate, but that would/should equate to rare sun-GFP+ non-MG in controls. Are there any such cells, even extremely rarely, or is it truly 0%? At any rate, the authors need to raise these concerns and offer some explanation(s) at the start of their scRNAseq Results section. If there are really no such sun-GFP+ cells, the authors should comment on the presence of the apparent inactive trajectory in the Discussion.

      Since we first described this line (de Melo, et al. 2012), we have examined thousands of sections of GlastCreER;Sun1-GFP retinas, and have yet to see a single GFP-positive neuron. We have also previously shown (Hoang, et al. 2020) that FACS-based isolation of GFP-positive cells from GlastCreER;Sun1-GFP yields a roughly thirty-fold enrichment of Muller glia, implying the presence of small numbers of contaminating neurons. We thereby conclude that the presence of small numbers of neurons (rods, cones, bipolar, and amacrine cells) in the control GlastCreER;Sun1-GFP represents contamination rather than low levels of glia-to-neuron conversion, particularly since we are unable to detect the expression of genes such as neurogenic bHLH factors or immature photoreceptor precursor-specific factors such as Prdm1 that indicate the presence of intermediate cell states. This is now addressed in the Results section related to both Figures 3 and 4.

      Discussion:

      In reference to other strategies to induce neurogenesis the authors make the claim that Oct4 is fundamentally different: "In these cases, Müller glia broadly upregulate proneural genes and/or downregulate Notch signaling. Oct4 instead induces expression of the neurogenic transcription factor Rfx4, which is not expressed in developing retina. It is likely that activation of this parallel pathway to neurogenic competence in part accounts for synergistic induction of neurogenesis seen in Rbpj-deficient Müller glia". First, all these strategies, including Oct4, seem to activate bHLH factors, so they have that in common and the authors should note that overlap. More seriously, without functional tests (e.g. KO Rfx4) the authors need to dial back the over-reaching statement that Rfx4 is the fundamental mechanism driving the Oct4 effect. They can certainly suggest that this is one possibility, but equally, Rfx4 may have very little or no effect on neurogenesis, or it could act redundantly with some of the other factors the authors uncovered. It's impossible to know without functional data, so they either need to add the functional data, or hold back on the strong one-sided and overreaching claim.

      Since both Rfx4 expression and motif accessibility are selectively observed following Oct4 overexpression, and Rfx4 also has known neurogenic activity, we stand by our conclusion that it is a particularly strong candidate for mediating the neurogenic effects of Oct4 overexpression. However, the Reviewer is correct that in the absence of functional data, speculation about its function should be qualified. We have done this in the revised manuscript.

      Minor comments

      This sentence in the Results is confusing: "While expression of neurogenic bHLH factors driven by the Gfap promoter was rapidly silenced in Muller glia and activated in amacrine and retinal ganglion cells, Gfap-Oct4-mCherry remained selectively expressed in Muller glia but did not induce detectable levels of Muller glia-derived neurogenesis in the uninjured retina (Le et al., 2022)". The cited reference is at the end so it sounds like the Oct4 assay was performed in Le et al 2022, and there is no reference to a Figure for the Oct4 data in the current paper.

      As stated here, in Le, et al. 2022, we did not observe any conversion of Sun1-GFP-positive Muller glia to neurons in the absence of injury. In the current study, we instead test whether NMDA-induced excitotoxicity induced glia to neuron conversion in Muller glia overexpressing Oct4. This is now made clear in the revised text.

      There are many errors and omissions regarding Figure S2:

      Figure S2a, b legend, and panels do not match. 2a should be a schematic of the strategy to label MG with Sun1-GFP using GLAST-Cre and a floxed Sun1-GFP allele, but that's missing and instead, the current 2a is a schematic of AAV vectors. It seems that the current 2b legend may describe the combination of the current 2a and 2b panels.

      This has been corrected.

      Figure S2: Asterisks label certain stained elements in the Oct4 labeled panels, but there is no explanation in the legend. Are these meant to indicate non-specific staining? If so, what is the evidence that the signal is non-specific?

      These asterisks represent non-specific mouse-on-mouse vascular staining observed with the mouse monoclonal anti-Oct4 used in this study. This is now indicated in the figure legend.

      The text refers to Ascl1 staining in Figure S2e,f, but it's S2g,h.

      This has been corrected.

      Re this: "While Sun1-GFP-positive cells infected with Oct4-mCherry mostly express the Muller glial marker Sox2 (Fig. S2a,b), from 2 weeks post-injury onwards a subset of GFP positive cells did not show detectable Sox2 expression (Fig. S2b, yellow arrows)". Figure S2a, b are schematic diagrams, not immunofluorescence. They probably mean Figure S2c, d.

      This has been corrected.

      Fig S2m is mislabeled "n".

      This has been corrected.

      There are probably other errors with this figure, but I mostly gave up at this point. The authors should go through the paper to find and correct any additional mistakes/omissions in the text and legends.

      The manuscript has been carefully proofread and errors corrected.

      The figure panels are not always mentioned in the order that they appear. There are many examples.

      Figure panels are now mentioned in the order that they appear.

      Several schematics use "d-18-14" to indicate "day -18 to -14". The former is at first uninterpretable or at best unclear (could mean day -18 to day 14), perhaps d -18 to -14, or d -18:-14 would be clearer.

      This has been corrected.

      Re: "AAV-infected wildtype Muller glia could be readily identified by selective expression of Oct4 (Fig. 4e). Wildtype Oct4-expressing Muller glia give rise to both small numbers of neurogenic MGPCs (Fig. 4b),". Figure 4E is labeled Pou5f1, but it would be helpful to avoid confusion by also indicating on the figure that Pou5f1 = Oct4; and Fig 4b does not indicate neurogenic MGPCs (perhaps they mean 4c).

      This has been corrected.

      Some parts of the Results are written in the present tense and should be in the past tense (for guidance: https://www.nature.com/scitable/topicpage/effective-writing13815989/).

      Past tense is now used throughout.

      Pit1 (Pou1f1) is referred to as a "close variant" of Oct4/Pou5f1, but this is unclear (e.g. variant could mean a splice variant from the same locus) and the term "paralogue" should be used.

      “Paralogue” is now used in this context.

      Re: "Infection with Oct4-mCherry vector induced both Oct4 (Fig. S5e) and Ascl1 (Fig. S5d) expression in Notch1/2-deficient Müller glia." Supplementary image 5d is the one depicting Oct4 and 5e is the one showing Ascl1. However, the reference is reversed.

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We deeply appreciate the reviewer’s careful review and critiques. These are excellent critiques that we are working to address, and doing so will probably require a few more years of work. We believe that these critiques, published together with our manuscript, add value to it.


      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of Secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result; (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is acknowledged but not experimentally addressed; (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not addressed in this study, making the inclusion of this observation in this manuscript incomplete; and (d) assessment of SLPI levels in healthy controls vs. Lyme disease patients is inadequate.

      We greatly appreciate the critiques, and we do agree. Even though the observation of NE level is predictable, we believe that it is important to actually demonstrate it in the context of murine Lyme arthritis. The function of SLPI goes beyond inhibiting NE. We believe that the current study, part of an ongoing project in our lab, serves as a good starting point to explore the pleiotropic effects of SLPI in the pathogenesis of murine Lyme arthritis and in patients. The critiques here are of great value to our research.

      Comments on revised version:

      Several of the points were addressed in the revised manuscript, but the following issues remain:

      Previous point that the relationship of SLPI binding to B. burgdorferi to the enhanced disease of SLPI-deficient mice is not investigated: The authors indicate that such investigations are ongoing. In the absence of any findings, I recommend that their interesting BASEHIT and subsequent studies be presented in a future study, which would have high impact.

      We thank the reviewer for the critique. We do agree that this part of the story is not complete. However, we would like to keep the BASEHIT and binding data in the paper, as we believe that it is an important finding. We confirmed the binding using ELISA, flow cytometry, and immunofluorescent microscopy. We showed that the binding is specific to infectious B. burgdorferi and is thus likely to contribute to the pathogenesis of murine Lyme arthritis. Our data suggest that SLPI can directly interact with a B. burgdorferi protein. We are exploring the biological significance of the binding, and this finding can be further explored by other labs too.

      Previous recommendation 1: (The authors added lines 267-68, not 287-68). This ambiguity is acknowledged but remains. In addition, in the revised manuscript, the authors state "However, these data also emphasize the importance of SLPI in controlling the development of inflammation in periarticular tissues of B. burgdorferi-infected mice." Given acknowledged limitations of interpretation, "suggest" would be more appropriate than "emphasize".

      We thank the reviewer for the careful reading, and we apologize for the mistake. The change has been made accordingly (line 268).

      Previous recommendation 5: The lack of clinical samples can be a challenge. Nevertheless, 4 of the 7 samples from LD patients are from individuals suffering from EM rather than arthritis (i.e., the manifestation that is the topic of the study) and some who are sampled multiple times, make an objective statistical comparison difficult. I don't have a suggestion as to how to address the difference in number of samples from a given subject. However, the authors could consider segregating EM vs. LA in their analysis (although it appears that limiting the comparison between HC and LA patients would not reveal a statistical difference).

      We thank the reviewer for the critique, and we agree that the patient data presented are not ideal. We believe that at this point combining the samples is most logical, as the number of samples we have from patients with Lyme arthritis is fairly limited. We stated this limitation in the discussion. We do believe that the finding of the correlation is important: it suggests a potential function of SLPI in patients, beyond murine infection.

      Moreover, other groups with larger numbers of diverse samples can elucidate the relationship further.

      Previous recommendation 6: Given that binding of SLPI to the bacterial surface is an essential aspect of the authors' model, and that the ELISA assay to indicate SLPI binding used cell lysates rather than intact bacteria, a control PI staining to validate the integrity of bacteria seems reasonable.

      We appreciate the suggestion and have provided the propidium iodide staining in Supplemental Figure 5 (lines 539-542, 568-569, 718-722).

      Previous recommendation 8: The inclusion of a no serum control (that presumably shows 100% viability) would validate the authors' assertion that 20% serum has bactericidal activity.

      We appreciate the suggestion. As stated in the manuscript (lines 583-584), the percent viability was normalized to the control spirochete culture without any treatment. Thus, the control spirochete culture, without serum and SLPI treatment, showed 100% viability. We have revised Supplemental Figure 3 accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous timepoints. However, my confidence in the findings and their interpretation is undermined by an incomplete justification for the expected outcomes for each of the proposed scenarios.

      As detailed below, the reviewer’s comment motivated us to conduct simulations to establish the relationship between the scenarios that we seek to adjudicate and the empirical outcomes.

      Main Concern

      Fig 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.:

      - Scenario1. Temporally convergent, same trajectories through connectome state space

      - Scenario2. Temporally divergent, different trajectories through connectome state space

      However, based on my understanding (and apologies if I am mistaken), I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in fig 2C, or the statements in the main text, i.e.:

      - For scenario1, "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"

      - For scenario2, "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in fig2C from? They do not seem to be fully justified via previous literature, theory, or simulations?

      In particular, I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points. It seems to me that you would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenario 1 and 2 (and under what assumptions it is valid).

      Thank you for raising this important point. In response, we have now included simulation results to complement our earlier authors’ response, which provided literature references and a theoretical explanation of the on-/off-diagonal ratio metric.

      In the absence of theory, the authors could use surrogate data for scenario 1 and 2. For example:

      a. For scenario 1, run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      Note: This simulation was included in the previous round of authors’ responses.

      b. For scenario 2, run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).
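
      The block-shuffling surrogate described in option b can be sketched as follows. This is a minimal illustration and not code from the manuscript: the function name `block_shuffle`, the block length, and the toy input are all assumptions made for the example. With real data, `block_len` would be the number of samples corresponding to roughly 10 s.

```python
import numpy as np

def block_shuffle(x, block_len, rng):
    """Return a surrogate of x in which contiguous blocks are reordered.

    x         : array with time along axis 0 (hypothetical input)
    block_len : number of samples per block (assumed, e.g. ~10 s of data)
    rng       : numpy random Generator, passed in for reproducibility
    """
    x = np.asarray(x)
    n_blocks = int(np.ceil(len(x) / block_len))
    # Cut the series into contiguous blocks; the last one may be shorter.
    blocks = [x[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]
    # Reorder the blocks while keeping the samples inside each block intact.
    order = rng.permutation(n_blocks)
    return np.concatenate([blocks[i] for i in order], axis=0)

rng = np.random.default_rng(0)
signal = np.arange(100)                 # toy stand-in for a real time series
surrogate = block_shuffle(signal, 10, rng)
```

      Shuffling whole blocks rather than single samples preserves the short-range temporal structure within each block while destroying the alignment between the original and surrogate time courses.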

      The authors have provided CRP plots for option a. It shows a CRP, as expected, consistent with scenario 1. This is a useful sanity check. However, as mentioned above, it does not ensure that all CRPs under this scenario will look like this.

      However, the authors have not shown a CRP as per option b. As such, there is an incomplete justification for the expected outcomes of the scenarios.

      Note that another option, which has not been carried out, is to use full simulations, with clearly specified assumptions, for scenario1 and 2. One way of doing this is to use a simplified (state-space) setup where you randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in a N-dimensional connectome state space.

      Using this, you can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities

      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities

      We followed the reviewer’s suggestion and have now included full simulations to address the concerns regarding the theory of the on-/off-diagonal ratio metric. As recommended, we defined a random quantized signal with N levels to represent the recurrent manifestation of N fixed connectome states. This setup was used to demonstrate the relationship between the two scenarios and the CRP observations used to adjudicate between the scenarios in our paper.

      The CRP matrices in Fig. S10 provide an example illustration of this simulation. In the case where the two state timeseries are identical, there are more co-occurrences of the same state (white entries) on the diagonal than off the diagonal (left subplot). This is in line with Scenario 1, where both spatial and temporal convergence are present. Conversely, in Scenario 2, where state time courses are shuffled, co-occurrences of the same states are more dispersed, and the diagonal prominence vanishes (right subplot). This difference illustrates how the CRP reflects the presence or absence of temporal alignment, dissociating scenarios 1 and 2.

      To quantitatively validate this observation, we calculated the on-/off-diagonal ratio across simulations with varying N values. For Scenario 2 (shuffled version), the ratio consistently remained close to 1, indicating the absence of temporal synchronization. In contrast, Scenario 1 (non-shuffled version) produced significantly higher ratios, exceeding 1, confirming the metric's ability to capture meaningful synchrony. These results demonstrate that the simulations successfully replicate the expected relationship between the two scenarios and the CRPs, and validate the theoretical foundation of the ratio metric under the defined assumptions.
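
      A simplified version of this kind of simulation can be sketched as below. It is a hedged illustration under stated assumptions, not the analysis code used for Fig. S10: the helper names (`simulate_states`, `cross_recurrence`, `on_off_ratio`), the dwell time, and the choice of an independently drawn sequence for Scenario 2 are all assumptions introduced for the example.

```python
import numpy as np

def simulate_states(n_states, n_time, dwell, rng):
    """Random state sequence in which each state is held for `dwell` samples."""
    n_epochs = int(np.ceil(n_time / dwell))
    epochs = rng.integers(0, n_states, size=n_epochs)
    return np.repeat(epochs, dwell)[:n_time]

def cross_recurrence(a, b):
    """Binary CRP: entry (i, j) is 1 when a[i] and b[j] are the same state."""
    return (a[:, None] == b[None, :]).astype(float)

def on_off_ratio(crp):
    """Mean of the main diagonal divided by the mean of all other entries."""
    n = crp.shape[0]
    on = np.trace(crp) / n
    off = (crp.sum() - np.trace(crp)) / (n * n - n)
    return on / off

rng = np.random.default_rng(0)
n_states, n_time, dwell = 5, 600, 10     # assumed toy parameters

a = simulate_states(n_states, n_time, dwell, rng)
b_same = a.copy()                                         # Scenario 1: shared time course
b_indep = simulate_states(n_states, n_time, dwell, rng)   # Scenario 2: independent time course

ratio_s1 = on_off_ratio(cross_recurrence(a, b_same))
ratio_s2 = on_off_ratio(cross_recurrence(a, b_indep))
print(f"Scenario 1 ratio: {ratio_s1:.2f}, Scenario 2 ratio: {ratio_s2:.2f}")
```

      In this toy setting, identical state sequences place a match on every diagonal entry, so the ratio is well above 1, whereas independent sequences match on the diagonal about as often as anywhere else, so the ratio should fluctuate around 1.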

      Minor Concern

      Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." It is great that the authors provide both. However, given that FC in EEG is almost totally dominated by spatial leakage (see Hipp paper), the main results/figures for the scalp EEG should be done using spatial leakage corrected EEG data.

      Thank you. We agree that source leakage is an important consideration, which is why the current work investigated the intracranial EEG-fMRI data as a primary approach and subsequently added the scalp EEG-fMRI approach. While source leakage correction is essential for addressing spurious connectivity, it can also risk removing genuine functional connectivity that includes zero-lag relationships. We are reassured by the observation that the scalp data both without and with leakage correction confirmed the findings of the intracranial data, i.e., the presence of spatial and a lack of temporal cross-modal convergence. As such we do not believe that source leakage had a considerable impact on the specific question at hand.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the timepoints were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involved a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      The authors also did a control analysis and verified the effect of temporal window size or different functional connectivity operationalizations. I also applaud their effort to make the analysis code open-sourced.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors answer my concerns and they are resolved.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study investigates alterations in the autophagic-lysosomal pathway in the Q175 HD knock-in model crossed with the TRGL autophagy reporter mouse. The findings provide valuable insights into autophagy dynamics in HD and the potential therapeutic benefits of modulating this pathway. The study suggests that autophagy stimulation may offer therapeutic benefits in the early stages of HD progression, with mTOR inhibition showing promise in ameliorating lysosomal pathology and reducing mutant huntingtin accumulation.

      However, the data raises concerns regarding the strength of the evidence. The observed changes in autophagic markers, such as autolysosome and lysosome numbers, are relatively modest, and the Western blot results do not fully match the quantitative results. These discrepancies highlight the need for further validation and more pronounced effects to strengthen the conclusions. While the study suggests the potential of autophagy regulation as a long-term therapeutic strategy, additional experiments and more reliable data are necessary to confirm the broader applicability of the TRGL/Q175 mouse model.

      Furthermore, the 2004 publication by Ravikumar et al. demonstrated that inhibition of mTOR by rapamycin or the rapamycin ester CCI-779 induces autophagy and reduces the toxicity of polyglutamine expansions in fly and mouse models of Huntington's disease. mTOR is a key regulator of autophagy, and its inhibition has been explored as a therapeutic strategy for various neurodegenerative diseases, including HD. Studies suggest that inhibiting mTOR enhances autophagy, leading to the clearance of mHTT aggregates. Given that dysfunction of the autophagic-lysosomal pathway and lysosomal function in HD is already well-established, and that mTOR inhibition as a therapeutic approach for HD is also known, this study does not present entirely novel findings.

      Major Concerns:

      (1) In Figure 3A1 and A2, delayed and/or deficient acidification of AL causes deficits in the reformation of LY to replenish the LY pool. However, in Figure S2D, there is no difference in AL formation or substrate degradation, as shown by the Western blotting results for CTSD and CTSB. How can these discrepancies be explained?

      We appreciate the reviewer raising this point, and we agree with the concern. Please note that the material used for our immunoblotting was hemibrain homogenates, containing not only neurons but also glial cells, so the results for any protein, e.g., CTSD or CTSB in Fig. S2D, represented combined signals from neurons and glial cells. Our longstanding experience with western blot analysis of autophagy pathway markers is that signals from glial cells significantly interfere with/dilute the signals from neurons. By contrast, the immunofluorescence (IF) results in Fig. 3A, obtained with the assistance of the tfLC3 probe and hue angle-based AV/LY subtype analysis, revealed the in situ conditions of the AL and LY selectively within neurons. This reflects the advantage of combining in vivo neuron-specific expression of the LC3 probe with IF for a LY marker, as in this study and our other related studies (Lee, Rao et al. 2019, Lee, Yang et al. 2022) and as explained in the Introduction of this paper. Please also refer to a similar discussion regarding the WB-detected protein levels of p-ATG14 in L542-547.

      (2) The results demonstrate that in the brain sections of 17-month-old TRGL/Q175 mice, there was an increase in the number of acidic autolysosomes (AL), including poorly acidified autolysosomes (pa-AL), alongside a decrease in lysosome (LY) numbers. These AL/pa-AL changes were not significant in 2-month-old or 7-month-old TRGL/Q175 mice, where only a reduction in lysosome numbers was observed. This indicates that these changes, representing damage to the autophagy-lysosome pathway (ALP), manifest only at later stages of the disease. Considering that the ALP is affected predominantly in the advanced stages of the disease (e.g., at 17 months), why were 6-month-old TRGL/Q175 mice selected for oral mTORi INK treatment, and why was the treatment duration restricted to just 3 weeks?

      We thank the reviewer for the comment. A key outcome measure in our evaluation of mTORi treatment was amelioration of mHTT pathology, i.e., mHTT aggregates/IBs. Before conducting the mTORi treatment experiments, we had learned from our assessments of age-associated progression of mHTT aggresomes/IBs in mice of different ages (e.g., 2-, 6-, 10- and 17-mo) that there were already severe mHTT accumulations in Q175 at 10-mo-old (e.g., Fig. 2A). This is consistent with a previous report (Carty, Berson et al. 2015) showing that striatal mHTT inclusions dynamically increase from 4 to 8 months. From a therapeutic point of view, more aggregates in the mouse brain would make it more difficult for the autophagy machinery to clear these aggregates. Thus, the high aggregate load at 10- or 17-mo may not be modifiable by the mTORi and/or may prevent reliable/sensitive measurement of mTORi-induced phenotype changes. We therefore preferred to apply the treatment to younger (i.e., 6-mo-old) mice, when the mHTT pathology was not so severe and ALP abnormality was detectable, albeit mild. Additionally, due to the 2-year funding limit for this project, there was insufficient time to generate a large set of old mice (e.g., ~18-mo) for another drug treatment experiment. In future studies, it might be worthwhile to conduct the treatment “in the advanced stages of the disease (e.g., ~18-mo)” to further examine the modification potential of the mTORi on the ALP as well as the HTT aggregations. As for the treatment duration, we were interested in an acute treatment schedule given that, in our dosing tests, we observed rapid responses to the treatment (e.g., target engagement) in a few days even with one dose, and that the 14-15-day treatments produced consistent responses (e.g., Fig. S3A).
Long-term treatment, however, would be worth testing in the future, although our current study informs a therapeutic approach suggested by others that involves intermittent/pulsatile administration of mTOR inhibitors to minimize the side effects of chronic long-term administration.

      (3) Is the extent of motor dysfunction in TRGL/Q175 mice comparable to that in Q175 mice? Does the administration of mTORi INK improve these symptoms?

      Unfortunately, we were unable to investigate motor functions experimentally with specific assays such as open field or rotarod tests in this study (partly because the funded research period coincided with the peak of the COVID-19 pandemic in 2020). Based on our experience in handling the mice, we did not notice any obvious differences between Q175 and TRGL/Q175, or any improvements after the acute mTORi INK treatment.

      (4) Why is eGFP expression not visible in Fig. 6A in TRGL-Veh mice? Additionally, why do normal (non-poly-Q) mice have fewer lysosomes (LY) than TRGL/Q175-INK mice? IHC results also show that CTSD levels are lower in TRGL mice compared to TRGL/Q175-INK mice. Does this suggest lysosome dysfunction in TRGL-Veh mice?

      We appreciate the reviewer raising this point, which has been corrected (by slightly increasing the eGFP signal in the green channel and the merged channels equally for all genotypes); the revised Fig. 6A now shows clearer eGFP signals. Regarding the higher LY numbers/CTSD levels in TRGL/Q175-INK compared to the control TRGL-Veh mice, this does not necessarily imply LY dysfunction in TRGL mice; rather, it likely suggests that mTORi treatment induces LY biogenesis. Our original characterization of TRGL mice of varying ages showed that low expression of the tgLC3 construct produces only a very small increment of total LC3, resulting in no discernible functional changes in the autophagy pathway (Lee, Rao et al. 2019). The underlying mechanism, e.g., TFEB activation following mTOR inhibition, remains to be investigated in future studies.

      (5) In Figure 5A, the phosphorylation of ATG14 (S29) shows minimal differences in Western blotting, which appears inconsistent with the quantitative results. A similar issue is observed in the quantification of Endo-LC3.

      We welcome the reviewer’s point, and therefore bands showing bigger differences of p-ATG14 (S29) have been used in the revised Fig. 5A, making the images and the quantitative results more consistent and representative. Similar changes have also been made to the Endo-LC3 data at the bottom of Fig. 5A.

      (6) In Figure S2A and Figure S2B, 17-month-old TRGL/Q175 mice show a decrease in pp70S6K and the p-ULK1/ULK1 ratio, but no changes are observed in autophagy-related markers. Do these results indicate only a slight change in autophagy at this stage in TRGL/Q175 mice? Since the mTOR pathway regulates multiple cellular mechanisms, could mTOR also influence other processes? Is it possible that additional mechanisms are involved?

      We completely agree with the reviewer. As mentioned in the text at multiple locations, ALP alterations in Q175 and TRGL/Q175 mice are mild even at a relatively old age (e.g., 17-mo), especially at the protein levels detected by immunoblotting. We agree that although the mild alterations in the levels of pp70S6K (T389) and the p-ULK1/ULK1 ratio may indicate “a slight change in autophagy”, they may also imply that other cell processes are involved, given that mTOR signaling regulates multiple cellular functions. In particular, p70S6K/p-p70S6K, an mTOR substrate used as a readout for mTOR activity in this study, is a key component of the protein synthesis pathway (Wang and Proud 2006, Magnuson, Ekim et al. 2012), so its changes may serve as readouts for alterations in not only the autophagy pathway but also the protein synthesis pathway. [A related discussion about mTOR/protein synthesis pathways, in response to a comment from Reviewer 2, has been incorporated into the text under Discussion, L633-640]

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have explored the beneficial effect of autophagy upregulation in the context of HD pathology in a disease stage-specific manner. The authors have observed functional autophagy lysosomal pathway (ALP) and its machineries at the early stage in the HD mouse model, whereas impairment of ALP has been documented at the later stages of the disease progression. Eventually, the authors took advantage of the operational ALP pathway at the early stage of HD pathology, in order to upregulate ALP and autophagy flux by inhibiting mTORC1 in vivo, which ultimately reverted back to multiple ALP-related abnormalities and phenotypes. Therefore, this manuscript is a promising effort to shed light on the therapeutic interventions with which HD pathology can be treated at the patient level in the future.

      Strengths:

      The study has shown the alteration of ALP in the HD mouse model in a very detailed manner. Such stage-dependent in vivo study will be informative and has not been done before. Also, this research provides possible therapeutic interventions for patients in the future.

      Weaknesses:

      Some constructive comments and suggestions in order to reflect the key aspects and concepts better in the manuscript :

      (1) The authors have observed lysosome number alteration in a temporally regulated disease stage-specific manner. In this scenario investigation of regulation, localization, and level of TFEB, the transcription factor required for lysosome biogenesis, would be interesting and informative.

      We thank the reviewer for this point and completely agree that exploring TFEB-related aspects would be interesting; these will be investigated in future studies.

      (2) For the general scientific community better clarification of the short forms will be useful. For example, in line 97, page 4, AP full form would be useful. Also 'metabolized via autophagy' can be replaced by 'degraded via autophagy'.

      We thank the reviewer for raising this point. We introduced each abbreviation at the location where the full term first appears; “AP” was introduced in (previous) Line 69, where “autophagosome” first appears. We agree with the reviewer about ease of reading for the general scientific community and have therefore added an Abbreviation section after the Key Words section, listing the abbreviations used in this manuscript.

      Also, the word “metabolized” has been replaced with “degraded” as suggested. 

      (3) The nuclear vs cytosolic localization of HTT aggregates shown in Figure 2, are very interesting. The increase in cytosolic HTT aggregate formation at 10 months compared to 6 months probably suggests spatio-temporal regulation of aggregate formation. The authors could comment in a more elaborate manner, on the reason and impact of this kind of regulation of aggregate formation in the context of HD pathology.

      We value the reviewer’s important point. Previous studies have well documented that mHTT aggregates exist in both intranuclear and extranuclear locations in the brains of both human HD and mouse models (DiFiglia, Sapp et al. 1997, Li, Li et al. 1999, Carty, Berson et al. 2015, Peng, Wu et al. 2016, Berg, Veeranna et al. 2024). HTT can travel between the nucleus and cytoplasm and the default location for HTT is cytoplasmic, and thus the occurrence of nuclear mHTT aggregates is considered as a result of dysfunction in the nuclear exporting system for proteins (DiFiglia, Sapp et al. 1995, Gutekunst, Levey et al. 1995, Sharp, Loev et al. 1995, Cornett, Cao et al. 2005) while other factors such as phosphorylation of HTT may also affect nuclear targeting (DeGuire, Ruggeri et al. 2018). Extranuclear aggregates of mHTT usually appear later than nuclear aggregates and develop more aggressively in terms of numbers and pace after their appearance (Li, Li et al. 1999, Carty, Berson et al. 2015, Landles, Milton et al. 2020). The fact that there are neurons containing extranuclear aggregates without having nuclear aggregates within the same cells (Carty, Berson et al. 2015) does not support a nuclear-cytoplasmic sequence for aggregate formation, implying different mechanisms controlling the formation of these two types of aggregates. It was reported that there were no significant differences in toxicity associated with the presence of nuclear compared with extranuclear aggregates (Hackam, Singaraja et al. 1999), while other studies have proposed that nuclear aggregates correlate with transcriptional dysfunction while extranuclear aggregates may impair neuronal communication and can track disease progression (Li, Li et al. 1999, Benn, Landles et al. 2005, Landles, Milton et al. 2020). Thus, the observation of a higher level of extranuclear mHTT aggregates at 10-mo compared to 6-mo from the present study is consistent with previous findings mentioned above. 
In addition, our EM observations of a homogeneous granular/short fine fibril ultrastructure in both nuclear and extranuclear aggregates are consistent with findings from mouse model studies (Davies, Turmaine et al. 1997, Scherzinger, Lurz et al. 1997). Interestingly, this differs from in vitro studies, where nuclear aggregates exhibited a core and shell structure but extranuclear aggregates did not possess the shell (Riguet, Mahul-Mellier et al. 2021), reflecting differences between in vivo and in vitro conditions. Taken together, although this and previous studies have tried to understand the differences between nuclear and extranuclear aggregates, the mechanisms underlying the spatio-temporal regulation of aggregate formation have not yet been fully revealed and will require additional investigation.

      (4) In this manuscript, the authors have convincingly shown that mTOR inhibition is inducing autophagy in the HD mouse model in vivo. On the other hand, mTOR inhibition would also reduce overall cellular protein translation. This aspect of mTOR inhibition can also potentially contribute to the alleviation of disease phenotype and disease symptoms by reducing protein overload in HD pathology. The authors' comments regarding this aspect would be appreciated.

      We recognize the value of the reviewer’s point, with which we completely agree. Lowering mHTT by interfering with protein translation (e.g., through RNAi or antisense oligonucleotides) has been an attractive strategy in HD therapeutic development (Kordasiewicz, Stanek et al. 2012, Tabrizi, Ghosh et al. 2019). As mentioned above, mTOR regulates multiple cellular pathways including protein synthesis, and inhibition of mTOR, as done in the present study, could potentially affect protein synthesis as well. While the decreases in mHTT signals in our results (Fig. 7) can be interpreted as a result of autophagy-mediated clearance of mHTT, we cannot exclude the possibility that mTOR inhibition reduces HTT production, which may also contribute to the observed results. Future studies should determine how significant such a contribution is. [The above description has been incorporated into the text under Discussion, L633-640]

      (5) The authors have shown nuclear inclusion formation and aggregation of mHTT and also commented on its potential removal with the UPS system (proteasomal degradation) in vivo. As there is also a reciprocal relationship present between autophagy and proteasomal machineries, upon upregulation of autophagy machinery by mTOR inhibition proteasomal activity may decrease. How nuclear proteasomal activity increases to tackle nuclear mHTT IBs, would be interesting to understand in the context of HD pathology. Comments from the authors in this aspect would clarify the role of multiple degradation pathways in handling mutant HTT protein in HD pathology.

      We appreciate the reviewer raising this point. We agree that there are reciprocal relationships between autophagy and the UPS (Korolchuk, Menzies et al. 2010, Park and Cuervo 2013). In general, failure in one pathway would lead to compensatory upregulation of the other pathway, and vice versa (Lee, Park et al. 2019). So, as the reviewer pointed out, “upon upregulation of autophagy machinery by mTOR inhibition proteasomal activity may decrease”. However, we proposed in the Discussion that “It is possible that stimulation of autophagy is reducing the mHTT in the cytoplasm and thereby partially relieves the burden of the proteasome both in the cytoplasm and in the nucleus so that the nuclear proteasome operates more effectively”, which is inconsistent with the general expectation of decreased UPS activity. Please note, however, that there are also instances where the two pathways act in the same direction, e.g., autophagy inhibition disturbs UPS degradative function (Korolchuk, Mansilla et al. 2009, Park and Cuervo 2013). In any case, our statement is speculative and requires verification with additional experiments in the future. One observation reported here that may support the above speculation is the reduction of the AV-non-associated form of mHTT/p62/Ub (Fig. 7B3): some of these signals might exist within the nucleus, and their reduced levels may reflect increased intranuclear UPS activity, besides the other possibility, already discussed in the text, that they travel from the nucleus to the cytosol for clearance. [The last sentence has been incorporated into the text under Discussion, L628-632]

      (6) For the treatment of neurodegenerative disorders taking the temporal regulation into consideration is extremely important, as that will determine the success rate of the treatments in patients. The authors in this manuscript have clearly discussed this scenario. However, for neurodegenerative disordered patients, in most cases, the symptom manifestation is a late onset scenario. In that case, it will be complicated to initiate an early treatment regime in HD patients. If the authors can comment on and discuss the practicality of the early treatment regime for therapeutic purposes that would be impactful.

      We appreciate the reviewer raising this point and we agree with the main concern that “for neurodegenerative disordered patients, in most cases, the symptom manifestation is a late onset scenario.” This is indeed a common challenge in therapeutics for neurodegenerative diseases. It should first be noted that the current study is an experimental therapeutic attempt in a mouse model which, consistent with previous reports (Ravikumar, Vacher et al. 2004), serves as a proof of concept for manipulating autophagy (i.e., via inhibiting mTOR in the current setting) as a potential therapeutic, whose clinical practicality requires further verification. Moreover, in our opinion, early diagnosis (e.g., genetic testing in individuals at higher risk for HD) may be key to overcoming the above challenge, i.e., if early diagnosis is enabled, earlier interventions would become possible. [The above description has been incorporated into the text under Discussion, L654-659]

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      Minor concerns:

      (1) Figures 1 and 2 should indicate the number of sections and mice/genotypes.

      Thanks for the suggestion. The information has been added to the figure legends.

      (2) Figure 3A2 should explain how AP, AL, pa-AL, and LY are quantified.

      Thanks for raising this point. Please note that the quantitation of AP, AL, pa-AL and LY was performed by the hue angle-based analysis which was described under “Confocal image collection and hue angle-based quantitative analysis for AV/LY subtypes” within the Materials and Methods. A phrase “(see the Materials and Methods)” has been added after the existing description “Hue angle-based analysis was performed for AV/LY subtype determination using the methods described in Lee et al., 2019” in the figure legend.

      References

      Benn, C. L., C. Landles, H. Li, A. D. Strand, B. Woodman, K. Sathasivam, S. H. Li, S. Ghazi-Noori, E. Hockly, S. M. Faruque, J. H. Cha, P. T. Sharpe, J. M. Olson, X. J. Li and G. P. Bates (2005). "Contribution of nuclear and extranuclear polyQ to neurological phenotypes in mouse models of Huntington's disease." Hum Mol Genet 14(20): 3065-3078.

      Berg, M. J., Veeranna, C. M. Rosa, A. Kumar, P. S. Mohan, P. Stavrides, D. M. Marchionini, D. S. Yang and R. A. Nixon (2024). "Pathobiology of the autophagy-lysosomal pathway in the Huntington’s disease brain." bioRxiv: 2024.05.29.596470.

      Carty, N., N. Berson, K. Tillack, C. Thiede, D. Scholz, K. Kottig, Y. Sedaghat, C. Gabrysiak, G. Yohrling, H. von der Kammer, A. Ebneth, V. Mack, I. Munoz-Sanjuan and S. Kwak (2015). "Characterization of HTT inclusion size, location, and timing in the zQ175 mouse model of Huntington's disease: an in vivo high-content imaging study." PLoS One 10(4): e0123527.

      Cornett, J., F. Cao, C. E. Wang, C. A. Ross, G. P. Bates, S. H. Li and X. J. Li (2005). "Polyglutamine expansion of huntingtin impairs its nuclear export." Nat Genet 37(2): 198-204.

      Davies, S. W., M. Turmaine, B. A. Cozens, M. DiFiglia, A. H. Sharp, C. A. Ross, E. Scherzinger, E. E. Wanker, L. Mangiarini and G. P. Bates (1997). "Formation of neuronal intranuclear inclusions underlies the neurological dysfunction in mice transgenic for the HD mutation." Cell 90(3): 537-548.

      DeGuire, S. M., F. S. Ruggeri, M. B. Fares, A. Chiki, U. Cendrowska, G. Dietler and H. A. Lashuel (2018). "N-terminal Huntingtin (Htt) phosphorylation is a molecular switch regulating Htt aggregation, helical conformation, internalization, and nuclear targeting." J Biol Chem 293(48): 18540-18558.

      DiFiglia, M., E. Sapp, K. Chase, C. Schwarz, A. Meloni, C. Young, E. Martin, J. P. Vonsattel, R. Carraway, S. A. Reeves and et al. (1995). "Huntingtin is a cytoplasmic protein associated with vesicles in human and rat brain neurons." Neuron 14(5): 1075-1081.

      DiFiglia, M., E. Sapp, K. O. Chase, S. W. Davies, G. P. Bates, J. P. Vonsattel and N. Aronin (1997). "Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain." Science 277(5334): 1990-1993.

      Gutekunst, C. A., A. I. Levey, C. J. Heilman, W. L. Whaley, H. Yi, N. R. Nash, H. D. Rees, J. J. Madden and S. M. Hersch (1995). "Identification and localization of huntingtin in brain and human lymphoblastoid cell lines with anti-fusion protein antibodies." Proc Natl Acad Sci U S A 92(19): 8710-8714.

      Hackam, A. S., R. Singaraja, T. Zhang, L. Gan and M. R. Hayden (1999). "In vitro evidence for both the nucleus and cytoplasm as subcellular sites of pathogenesis in Huntington's disease." Hum Mol Genet 8(1): 25-33.

      Kordasiewicz, H. B., L. M. Stanek, E. V. Wancewicz, C. Mazur, M. M. McAlonis, K. A. Pytel, J. W. Artates, A. Weiss, S. H. Cheng, L. S. Shihabuddin, G. Hung, C. F. Bennett and D. W. Cleveland (2012). "Sustained therapeutic reversal of Huntington's disease by transient repression of huntingtin synthesis." Neuron 74(6): 1031-1044.

      Korolchuk, V. I., A. Mansilla, F. M. Menzies and D. C. Rubinsztein (2009). "Autophagy inhibition compromises degradation of ubiquitin-proteasome pathway substrates." Mol Cell 33(4): 517-527.

      Korolchuk, V. I., F. M. Menzies and D. C. Rubinsztein (2010). "Mechanisms of cross-talk between the ubiquitin-proteasome and autophagy-lysosome systems." FEBS Lett 584(7): 1393-1398.

      Landles, C., R. E. Milton, N. Ali, R. Flomen, M. Flower, F. Schindler, C. Gomez-Paredes, M. K. Bondulich, G. F. Osborne, D. Goodwin, G. Salsbury, C. L. Benn, K. Sathasivam, E. J. Smith, S. J. Tabrizi, E. E. Wanker and G. P. Bates (2020). "Subcellular localization and formation of huntingtin aggregates correlates with symptom onset and progression in a Huntington's disease model." Brain Commun 2(2): fcaa066.

      Lee, J. H., S. Park, E. Kim and M. J. Lee (2019). "Negative-feedback coordination between proteasomal activity and autophagic flux." Autophagy 15(4): 726-728.

      Lee, J. H., M. V. Rao, D. S. Yang, P. Stavrides, E. Im, A. Pensalfini, C. Huo, P. Sarkar, T. Yoshimori and R. A. Nixon (2019). "Transgenic expression of a ratiometric autophagy probe specifically in neurons enables the interrogation of brain autophagy in vivo." Autophagy 15(3): 543-557.

      Lee, J. H., D. S. Yang, C. N. Goulbourne, E. Im, P. Stavrides, A. Pensalfini, H. Chan, C. Bouchet-Marquis, C. Bleiwas, M. J. Berg, C. Huo, J. Peddy, M. Pawlik, E. Levy, M. Rao, M. Staufenbiel and R. A. Nixon (2022). "Faulty autolysosome acidification in Alzheimer's disease mouse models induces autophagic build-up of Abeta in neurons, yielding senile plaques." Nat Neurosci 25(6): 688-701.

      Li, H., S. H. Li, A. L. Cheng, L. Mangiarini, G. P. Bates and X. J. Li (1999). "Ultrastructural localization and progressive formation of neuropil aggregates in Huntington's disease transgenic mice." Hum Mol Genet 8(7): 1227-1236.

      Magnuson, B., B. Ekim and D. C. Fingar (2012). "Regulation and function of ribosomal protein S6 kinase (S6K) within mTOR signalling networks." Biochem J 441(1): 1-21.

      Park, C. and A. M. Cuervo (2013). "Selective autophagy: talking with the UPS." Cell Biochem Biophys 67(1): 3-13.

      Peng, Q., B. Wu, M. Jiang, J. Jin, Z. Hou, J. Zheng, J. Zhang and W. Duan (2016). "Characterization of Behavioral, Neuropathological, Brain Metabolic and Key Molecular Changes in zQ175 Knock-In Mouse Model of Huntington's Disease." PLoS One 11(2): e0148839.

      Ravikumar, B., C. Vacher, Z. Berger, J. E. Davies, S. Luo, L. G. Oroz, F. Scaravilli, D. F. Easton, R. Duden, C. J. O'Kane and D. C. Rubinsztein (2004). "Inhibition of mTOR induces autophagy and reduces toxicity of polyglutamine expansions in fly and mouse models of Huntington disease." Nat Genet 36(6): 585-595.

      Riguet, N., A. L. Mahul-Mellier, N. Maharjan, J. Burtscher, M. Croisier, G. Knott, J. Hastings, A. Patin, V. Reiterer, H. Farhan, S. Nasarov and H. A. Lashuel (2021). "Nuclear and cytoplasmic huntingtin inclusions exhibit distinct biochemical composition, interactome and ultrastructural properties." Nat Commun 12(1): 6579.

      Scherzinger, E., R. Lurz, M. Turmaine, L. Mangiarini, B. Hollenbach, R. Hasenbank, G. P. Bates, S. W. Davies, H. Lehrach and E. E. Wanker (1997). "Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo." Cell 90(3): 549-558.

      Sharp, A. H., S. J. Loev, G. Schilling, S. H. Li, X. J. Li, J. Bao, M. V. Wagster, J. A. Kotzuk, J. P. Steiner, A. Lo and et al. (1995). "Widespread expression of Huntington's disease gene (IT15) protein product." Neuron 14(5): 1065-1074.

      Tabrizi, S. J., R. Ghosh and B. R. Leavitt (2019). "Huntingtin Lowering Strategies for Disease Modification in Huntington's Disease." Neuron 101(5): 801-819.

      Wang, X. and C. G. Proud (2006). "The mTOR pathway in the control of protein synthesis." Physiology (Bethesda) 21: 362-369.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study offers a valuable investigation into the role of cholecystokinin (CCK) in thalamocortical plasticity during early development and adulthood, employing a range of experimental techniques. The authors demonstrate that tetanic stimulation of the auditory thalamus induces cortical long-term potentiation (LTP), which can be evoked through either electrical or optical stimulation of the thalamus or by noise bursts. They further show that thalamocortical LTP is abolished when thalamic CCK is knocked down or when cortical CCK receptors are blocked. Interestingly, in 18-month-old mice, thalamocortical LTP was largely absent but could be restored through the cortical application of CCK. The authors conclude that CCK contributes to thalamocortical plasticity and may enhance thalamocortical plasticity in aged subjects.

      While the study presents compelling evidence, I would like to offer several suggestions for the authors' consideration:

      (1) Thalamocortical LTP and NMDA-Dependence:

      It is well established that thalamocortical LTP is NMDA receptor-dependent, and blocking cortical NMDA receptors can abolish LTP. This raises the question of why thalamocortical LTP is eliminated when thalamic CCK is knocked down or when cortical CCK receptors are blocked. If I correctly understand the authors' hypothesis, that CCK promotes LTP through CCKR-intracellular Ca2+-AMPAR, then this pathway should not directly interfere with the NMDA-dependent mechanism. A clearer explanation of this interaction would be beneficial.

      Thank you for your question regarding the role of CCK and NMDA receptors (NMDARs) in thalamocortical LTP. We propose that CCK receptor (CCKR) activation enhances intracellular calcium levels, which are crucial for thalamocortical LTP induction. Calcium influx through NMDARs is also essential to reach the threshold required for activating downstream signaling pathways that promote LTP (Heynen and Bear, 2001). Thus, CCKRs and NMDARs may function in a complementary manner to facilitate LTP, with both contributing to the elevation of intracellular calcium.

      However, it is important to note that the postsynaptic mechanisms of thalamocortical LTP in the auditory cortex (ACx) differ from those in other sensory cortices. Studies have shown that thalamocortical LTP in the ACx appears to be less dependent on NMDARs (Chun et al., 2013) than in the somatosensory or visual cortices. Our previous studies also found that while NMDAR antagonists can block HFS-induced LTP in the inner ACx, LTP can still be induced in the presence of CCK even after NMDAR blockade (Chen et al., 2019). These findings suggest that CCK may act through an alternative mechanism involving CCKR-mediated calcium signaling and AMPAR modulation, which partially compensates for the loss of NMDAR signaling. This distinction may reflect functional differences between the ACx and other sensory cortices, as highlighted in previous studies (King and Nelken, 2009).

      While our current study focuses on the role of CCKR-mediated plasticity in the auditory system, further investigations are needed to elucidate how CCKRs and NMDARs interact within the broader framework of thalamocortical neuroplasticity across different cortical regions. Understanding whether similar mechanisms operate in other sensory systems, such as the visual cortex, will be an important direction for future research.

      Heynen, A.J., and Bear, M.F. (2001). Long-term potentiation of thalamocortical transmission in the adult visual cortex in vivo. J Neurosci 21, 9801-9813. 10.1523/jneurosci.21-24-09801.2001.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      Chen, X., Li, X., Wong, Y.T., Zheng, X., Wang, H., Peng, Y., Feng, H., Feng, J., Baibado, J.T., Jesky, R., et al. (2019). Cholecystokinin release triggered by NMDA receptors produces LTP and sound-sound associative memory. Proc Natl Acad Sci U S A 116, 6397-6406. 10.1073/pnas.1816833116.

      King, A.J., and Nelken, I. (2009). Unraveling the principles of auditory cortical processing: can we learn from the visual system? Nat Neurosci 12, 698-701.

      (2) Complexity of the Thalamocortical System:

      The thalamocortical system is intricate, with different cortical and thalamic subdivisions serving distinct functions. In this study, it is not fully clear which subdivisions were targeted for stimulation and recording, which could significantly influence the interpretation of the findings. Clarifying this aspect would enhance the study's robustness.

      Thank you for your valuable feedback. We would like to clarify that stimulation was conducted in the medial geniculate nucleus ventral (MGv), and recording was performed in layer IV of the ACx. Targeting the MGv allows us to investigate the influence of thalamic inputs on auditory cortical responses. Layer IV of the ACx is known to receive direct thalamic projections, making it an ideal site for assessing how thalamic activity influences cortical processing. We will incorporate this clarification into the revised manuscript to enhance the robustness of our study.

      Results section:

      “Stimulation electrodes were placed in the MGB (specifically in the medial geniculate nucleus ventral subdivision, MGv), and recording electrodes were inserted into layer IV of ACx”

      “The recording electrodes were lowered into layer IV of ACx, while the stimulation electrodes were lowered into MGB (MGv subdivision). The final stimulating and recording positions were determined by maximizing the cortical fEPSP amplitude triggered by the ES in the MGB. The accuracy of electrode placement was verified through post-hoc histological examination and electrophysiological responses.”

      (3) Statistical Variability:

      Biological data, including field excitatory postsynaptic potentials (fEPSPs) and LTP, often exhibit significant variability between samples, sometimes resulting in a standard deviation that exceeds 50% of the mean value. The reported standard deviation of LTP in this study, however, appears unusually small, particularly given the relatively limited sample size. Further discussion of this observation might be warranted.

      Thank you for your question. In our experiments, the sample size N represents the number of animals used, while n refers to the number of recordings, with each recording corresponding to a distinct pair of stimulation and recording sites. To adhere to ethical guidelines and minimize animal usage, we often perform multiple recordings within a single animal, such as from different hemispheres of the brain. Although N may appear small, our statistical analyses are based on n, ensuring sufficient data points for reliable conclusions.

      Furthermore, as our experiments are conducted in vivo, we observe lower variability in the increase of fEPSP slopes following LTP induction compared to brain slice preparations, where standard deviations exceeding 50% of the mean are common. This reduced variability likely reflects the robustness of the physiologically intact conditions in the in vivo setup.

      (4) EYFP Expression and Virus Targeting:

      The authors indicate that AAV9-EFIa-ChETA-EYFP was injected into the medial geniculate body (MGB) and subsequently expressed in both the MGB and cortex. If I understand correctly, the authors assume that cortical expression represents thalamocortical terminals rather than cortical neurons. However, co-expression of CCK receptors does not necessarily imply that the virus selectively infected thalamocortical terminals. The physiological data regarding cortical activation of thalamocortical terminals could be questioned if the cortical expression represents cortical neurons or both cortical neurons and thalamocortical terminals.

      Thank you for your question. In Figure 2A, EYFP expression indicates thalamocortical projections, while the co-expression of EYFP with PSD95 confirms the identity of thalamocortical terminals. The CCK-B receptors (CCKBR) are located on postsynaptic cortical neurons. The observed co-labeling of thalamocortical terminals and postsynaptic CCKBR suggests that CCK-expressing neurons in the medial geniculate body (MGB) can release CCK, which subsequently acts on the postsynaptic CCKBR. This evidence supports our interpretation of the functional role of CCK modulating neural plasticity between thalamocortical inputs and cortical neurons. As shown in Figure 2A, we aim to demonstrate that the co-labeling of thalamocortical terminals with CCK receptors accounts for a substantial proportion of the thalamocortical terminals. We will ensure that this clarification is emphasized in the revised manuscript to address your concerns.

      Results section:

      “Cre-dependent AAV9-EFIa-DIO-ChETA-EYFP was injected into the MGB of CCK-Cre mice. EYFP labeling marked CCK-positive neurons in the MGB. The co-expression of EYFP thalamocortical projections with PSD95 confirms the identity of thalamocortical terminals (yellow), which primarily targeted layer IV of the ACx (Figure 2A, upper panel). Immunohistochemistry revealed that a substantial proportion (15 out of 19, Figure 2A lower right panel) of thalamocortical terminals (arrows) colocalize with CCK receptors (CCKBR) on postsynaptic cortical neurons in the ACx (Figure 2A lower panel), supporting the functional role of CCK in modulating thalamocortical plasticity.”

      (5) Consideration of Previous Literature:

      A number of studies have thoroughly characterized auditory thalamocortical LTP during early development and adulthood. It may be beneficial for the authors to integrate insights from this body of work, as reliance on data from the somatosensory thalamocortical system might not fully capture the nuances of the auditory pathway. A more comprehensive discussion of the relevant literature could enhance the study's context and impact.

      Thank you for your valuable feedback. We will enhance our discussion on auditory thalamocortical LTP during early development and adulthood to provide a more comprehensive context for our study.

      (6) Therapeutic Implications:

      While the authors suggest potential therapeutic applications of their findings, it may be somewhat premature to draw such conclusions based on the current evidence. Although speculative discussion is not harmful, it may not significantly add to the study's conclusions at this stage.

      Thank you for your thoughtful feedback. We agree that the therapeutic applications mentioned in our study are speculative at this stage and should be regarded as a forward-looking perspective rather than definitive conclusions. Our intention was to highlight the broader potential of our findings to inspire further research, rather than to propose immediate clinical applications.

      In light of your feedback, we have adjusted the language in the manuscript to reflect a more cautious interpretation. Speculative discussions are now explicitly framed as hypotheses or possibilities for future exploration. We emphasize that our findings provide a foundation for further investigations into CCK-based plasticity and its implications.

      We believe that appropriately framed forward-thinking discussions are valuable in guiding the direction of future research. We sincerely hope that our current and future work will contribute to a deeper understanding of thalamocortical plasticity and, over time, potentially lead to advancements in human health.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      The behavioral assessment is relatively limited but may be fleshed out in future work.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons and CCK has been found to shape cortical plasticity dynamics. In order to understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction in the auditory thalamocortical (MGB - layer IV Auditory Cortex) circuit in mice. In these physiology experiments that leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time and CCK expression, they go on to identify a window of time during which CCK is produced throughout early and middle adulthood in auditory thalamocortical neurons to establish a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex is also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on modulating PPI task performance, it also seems that in mice <1.5 years old CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with those from young adult mice.

      Strengths:

      (1) The clarity of the results along with the rigorous multi-angled approach provide significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation, and pharmacological blockade. Through the combination of these experimental configurations the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination were not performed, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      Thank you for this insightful comment. We agree that the differential roles of PV-interneurons and pyramidal neurons in CCK-dependent thalamocortical plasticity remain unclear and acknowledge this as an important limitation of our study. Our primary focus was on pyramidal neurons, as our in vivo electrophysiological recordings measured the fEPSP slope in layer IV of the auditory cortex, which primarily reflects excitatory synaptic activity. However, we recognize the critical role of the excitatory-inhibitory balance in cortical function and the potential contribution of PV-interneurons to this process. In future studies, we plan to utilize techniques such as optogenetics, two-photon calcium imaging and cell-type-specific recordings to investigate the distinct contributions of PV-interneurons and pyramidal neurons to CCK-dependent thalamocortical plasticity, thereby providing a more comprehensive understanding of how CCK modulates thalamocortical circuits.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      Thank you for this thoughtful comment. We acknowledge that our study did not directly address the fidelity of temporal processing, which is indeed a critical aspect of auditory function. Our behavioral experiments primarily focused on linking frequency discrimination to the role of CCK in synaptic strengthening within the auditory thalamocortical pathway. However, we agree that enhanced responsivity of the system could also impact temporal processing dynamics, such as the precise timing of auditory responses. Whether this modulation improves or reduces the fidelity of temporal processing remains an open and important question.

      As you noted, understanding these dynamics will require a deeper investigation into the interactions between different cell types, particularly the balance between excitatory and inhibitory neurons. Exploring how CCK modulation affects both the circuit and cellular levels in temporal processing is an important direction for future research, which we plan to pursue. Thank you again for raising this important point.

      Discussion section:

      “While we focused on homosynaptic plasticity at thalamocortical synapses by recording only fEPSPs in layer IV of ACx, it is essential to further explore heterosynaptic effects of CCK released from thalamocortical synapses on intracortical circuits, particularly its role in modulating the excitatory-inhibitory balance. PV-interneurons, as key regulators of cortical inhibition, may contribute to the temporal fidelity of sensory processing, which is critical for auditory perception (Nocon et al., 2023; Cai et al., 2018). Additionally, CCK may facilitate cross-modal plasticity by modulating heterosynaptic plasticity in interconnected cortical areas. Future studies would provide valuable insights into the broader role of CCK in shaping sensory processing and cortical network dynamics.”

      Nocon, J.C., Gritton, H.J., James, N.M., Mount, R.A., Qu, Z., Han, X., and Sen, K. (2023). Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Communications Biology 6, 751. 10.1038/s42003-023-05126-0.

      Cai, D., Han, R., Liu, M., Xie, F., You, L., Zheng, Y., Zhao, L., Yao, J., Wang, Y., Yue, Y., et al. (2018). A Critical Role of Inhibition in Temporal Processing Maturation in the Primary Auditory Cortex. Cereb Cortex 28, 1610-1624. 10.1093/cercor/bhx057.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single-neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      Thank you for your insightful comment. In our in vivo electrophysiological experiments on LTP induction, we recorded neural activity for over 1.5 hours to assess changes in neuronal responses over time, both prior to and following the induction. While single neuron firing data can provide valuable insights, such measurements are inherently more variable due to factors like cortical state fluctuations and the condition of nearby neurons, which makes them less reliable for long-term analysis. For this reason, we focused on fEPSP, as it offers a more stable and robust readout of synaptic activity over extended periods.

      We appreciate your suggestion and recognize the value of single-neuron data in understanding how CCK and HFS affect temporal processing and excitability. In future studies, we will consider incorporating single-neuron analyses to complement our synaptic-level findings and provide a more comprehensive understanding of these mechanisms.

      (4) The authors mention that CCK mRNA was absent in CCK-KO mice, but the data are not provided.

      Thank you for your comment. Data from the CCK-KO mice are presented in Figure 3A (far right) and in the upper panel of Figure 3B (far right). In the lower panel of Figure 3B, data from the CCK-KO group are not shown because the normalized values for this group were essentially zero, as expected due to the absence of CCK mRNA.

      (5) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      Thank you for raising this important point. Pre-pulse inhibition (PPI) of the acoustic startle response indeed involves multiple brain regions, with the ascending auditory pathway playing a key role (Gómez-Nieto et al., 2020). Within the auditory cortex, layer IV neurons receive tonotopically organized inputs from the medial geniculate nucleus and are critical for integrating thalamic inputs and shaping auditory processing.

      In our behavioral experiments, mice were required to discriminate pre-pulses of varying frequencies against a continuous background sound. Given the role of auditory cortical neurons in integrating thalamic inputs and shaping auditory processing, it is likely that synaptic plasticity in these neurons contributes to the enhanced discrimination of pre-pulses. Supporting this idea, our previous work demonstrated that local infusion of CCK, paired with weak acoustic stimuli, significantly increased auditory responses in the auditory cortex (Li et al., 2014). In the current study, we further showed that CCK release during high-frequency stimulation of the thalamocortical pathway induced LTP in layer IV of the auditory cortex. Together, these findings suggest that CCK-dependent synaptic plasticity in layer IV may amplify the cortical representation of weak auditory inputs, thereby improving pre-pulse detection and enhancing PPI performance.

      It is also worth noting that aged mice with hearing loss typically exhibit PPI deficits due to impaired auditory processing (Ouagazzal et al., 2006 and Young et al., 2010). We propose that enhanced plasticity in the thalamocortical pathway, mediated by CCK, might partially compensate for these deficits by amplifying residual auditory signals in aged mice. However, the precise mechanisms by which layer IV synaptic plasticity modulates PPI behavior remain to be fully understood. Given the complex dynamics of sensory processing, future studies could explore how layer IV neurons interact with other cortical and subcortical circuits involved in PPI, as well as the specific contributions of excitatory and inhibitory cell types. These investigations will help provide a more comprehensive understanding of the role of CCK in modulating sensory gating and auditory processing.

      Gómez-Nieto, R., Hormigo, S., & López, D. E. (2020). Prepulse inhibition of the auditory startle reflex assessment as a hallmark of brainstem sensorimotor gating mechanisms. Brain sciences, 10(9), 639.

      Li, X., Yu, K., Zhang, Z., Sun, W., Yang, Z., Feng, J., Chen, X., Liu, C.-H., Wang, H., Guo, Y.P., and He, J. (2014). Cholecystokinin from the entorhinal cortex enables neural plasticity in the auditory cortex. Cell Research 24, 307-330. 10.1038/cr.2013.164.

      Ouagazzal, A. M., Reiss, D., & Romand, R. (2006). Effects of age-related hearing loss on startle reflex and prepulse inhibition in mice on pure and mixed C57BL and 129 genetic background. Behavioural brain research, 172(2), 307-315.

      Young, J. W., Wallace, C. K., Geyer, M. A., & Risbrough, V. B. (2010). Age-associated improvements in cross-modal prepulse inhibition in mice. Behavioral neuroscience, 124(1), 133.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) In Figure 1, the authors used different metrics for fEPSP strength. In Figure 1D, the authors used the slope, while they used the amplitude in Figure 1G. It is known that the two metrics are different from each other. While the slope is calculated from the linear regression between the voltage change per time of the rising phase of the fEPSP, the amplitude represents the voltage value of the fEPSP's peak. Please clarify here and in the method what metric you used, because the two terms are not interchangeable.

      Thank you for pointing out this oversight in our manuscript. We confirm that we used the slope of the fEPSP as the metric for assessing synaptic strength throughout the study, including both Figure 1D and Figure 1G. We will make the necessary corrections to ensure clarity and consistency. Thank you for bringing this to our attention.
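
      The distinction the reviewer draws between the two metrics can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the fitting window, and the synthetic trace are hypothetical, and the baseline is assumed to be 0 mV.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def fepsp_slope(times_ms, volts_mv, t_start, t_end):
    """Slope (mV/ms) fitted to the samples inside [t_start, t_end].

    The window is assumed to cover the rising phase of the fEPSP;
    how that window is chosen is not specified in the response.
    """
    window = [(t, v) for t, v in zip(times_ms, volts_mv) if t_start <= t <= t_end]
    xs, ys = zip(*window)
    return linear_slope(xs, ys)


def fepsp_amplitude(volts_mv, baseline_mv=0.0):
    """Peak deflection (mV) from baseline, regardless of sign."""
    return max(abs(v - baseline_mv) for v in volts_mv)


# Hypothetical negative-going trace sampled every 1 ms.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
volts = [0.0, -0.1, -0.2, -0.3, -0.4, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0]

slope = fepsp_slope(times, volts, t_start=1, t_end=4)
amplitude = fepsp_amplitude(volts)
```

      The slope depends only on the chosen rising-phase window, whereas the amplitude depends on the single peak sample, which is why the two cannot be reported interchangeably.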

      (2) It is not mentioned in the details of the methods about the CCK-KO mice. Please give such details. Although the authors used the CCK-KO mouse model as a control, I think that it is not a good choice to test the hypothesis mentioned in lines 165 and 166. The experiment was supposed to monitor the CCK-BR activity after HFS of the MGB and answer whether the CCK-BR will get activated by thalamic stimulation, but the CCK-KO mouse does not have CCK to be released after the optogenetic activation of the Chrimson probe. Therefore, it is expected to give nothing as if the experimenter runs an experiment without intervention. I think that the appropriate way to examine the hypothesis is to compare mice that were either injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato or AAV9-Syn-FLEX-tdTomato. However, CCK-KO would be a perfect model to confirm that LTP can be only generated dependently on CCK, by simply running the HFS of the MGB that would be associated with the cortical recording of the fEPSP. This also will rule out the assumption that the authors mentioned in lines 191 and 192.

      Thank you for your valuable feedback. The rationale behind our experimental design was to validate the newly developed CCK sensor and confirm its specificity. We aimed to verify CCK release post-HFS by comparing the responses of the CCK sensor in CCK-KO mice and CCK-Cre mice. This comparison allowed us to determine that the observed increase in fluorescence intensity post-HFS was specifically due to CCK release, rather than other neurotransmitters induced by HFS.

      We appreciate your suggestion to compare mice injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato and AAV9-Syn-FLEX-tdTomato, as it is indeed a valuable approach for directly testing the hypothesis regarding CCK-BR activation. However, we prioritized using the CCK-KO model to validate the CCK sensor's efficacy and specificity. The validation can be inferred by comparing the CCK sensor activity before and after HFS.

      Regarding concerns mentioned in lines 191 and 192 about potential CCK release from other projections via indirect polysynaptic activation, CCK-KO mice were not suitable for this aspect due to their global knockout of CCK. To address this limitation, we utilized shRNA to specifically down-regulate Cck expression in MGB neurons. This approach focused on the necessity of CCK released from thalamocortical projections for the observed LTP and effectively ruled out the possibility of indirect polysynaptic activation.

      We also acknowledge that the methods section lacked sufficient details about the CCK-KO mice, which may have caused confusion. In the revised methods section, we will add the following details:

      (1) The genotype of the CCK-KO mice used in this study (CCK-ires-CreERT2, Jax#012710).

      (2) A brief description of the CCK-KO validation, emphasizing the absence of CCK mRNA in these mice (as shown in Figure 3A and 3B).

      (3) The experimental purpose of using CCK-KO mice to validate the specificity of the CCK sensor.

      We believe these additions will clarify the rationale for using CCK-KO mice and their role in this study. Thank you again for highlighting these important points.

      (3) Figure 3C: The authors should examine if there is a difference in the baseline of fEPSPs across different age groups as the dependence on the normalization in the analysis within each group would hide if there were any difference of the baseline slope of fEPSP between groups which could be related to any misleading difference after HFS. Also, I wonder about the absence of LTP in P20, which is a closer age to the critical period. Could the authors discuss that, please?

      Thank you for your insightful feedback. To address your concern regarding baseline differences in fEPSP slopes across age groups, we conducted an additional analysis. Baseline fEPSP slopes across the three groups (P20, 8w, 18m), normalized to the 8w group, were 64.8 ± 13.1%, 100.0 ± 20.4%, and 58.8 ± 10.3%, respectively. While there was a trend suggesting smaller fEPSP slopes in the P20 and 18m groups compared to the young adult group, these differences were not statistically significant due to data variability (P20 vs. 8w, P = 0.319; 8w vs. 18m, P = 0.147; P20 vs. 18m, P = 1.0, one-way ANOVA). These results suggest that baseline variability is unlikely to confound the observed differences in LTP after HFS. Furthermore, we ensured that normalization minimized any potential baseline effects.
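
      The between-group normalization described above can be sketched as follows. This is an illustration only: the raw slope values are invented, the function name is hypothetical, and the ± term is assumed to be the standard error of the mean, which the response does not state.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_reference(groups, reference):
    """Express each group's baseline slopes as a percentage of the
    reference group's mean, returning (mean %, SEM %) per group."""
    ref_mean = mean(groups[reference])
    summary = {}
    for name, values in groups.items():
        percent = [100.0 * v / ref_mean for v in values]
        sem = stdev(percent) / sqrt(len(percent))
        summary[name] = (mean(percent), sem)
    return summary


# Hypothetical raw baseline slopes (arbitrary units), not the study's data.
raw_slopes = {
    "P20": [0.10, 0.16, 0.13],
    "8w": [0.15, 0.25, 0.20],
    "18m": [0.09, 0.14, 0.13],
}

summary = normalize_to_reference(raw_slopes, reference="8w")
```

      By construction the reference group averages 100%, so the other groups read directly as a fraction of the young adult baseline.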

      Regarding the absence of LTP in P20, this likely reflects developmental regulation of CCKBR expression in the auditory cortex (ACx). The HFS-induced thalamocortical LTP observed in our study is CCK-dependent and mechanistically distinct from the NMDA-dependent thalamocortical LTP during the critical period. Specifically, correlated pre- and postsynaptic activity can induce NMDA-dependent thalamocortical LTP only during an early critical period corresponding to the first several postnatal days, after which this pairing becomes ineffective starting from the second postnatal week (Crair and Malenka, 1995; Isaac et al., 1997; Chun et al., 2013). In contrast, the CCK-dependent thalamocortical LTP induced by HFS is robust in adult mice but appears absent in P20, likely due to the lack of postsynaptic CCKBR expression in the ACx at this developmental stage.

      We will include these clarifications in the revised manuscript, particularly in the Discussion section, to provide a more comprehensive explanation of our findings. Thank you for your valuable comments and suggestions.

      Crair, M.C., and Malenka, R.C. (1995). A critical period for long-term potentiation at thalamocortical synapses. Nature 375, 325-328. 10.1038/375325a0.

      Isaac, J.T.R., Crair, M.C., Nicoll, R.A., and Malenka, R.C. (1997). Silent Synapses during Development of Thalamocortical Inputs. Neuron 18, 269-280. https://doi.org/10.1016/S0896-6273(00)80267-6.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      (4) Figure 4F: It is noticed that the baseline fEPSP of the CCK group and ACSF groups were different, which raises a concern about the baseline differences between treatment groups.

      Thank you for your valuable feedback and for pointing out this important detail. We apologize for any confusion caused by the presentation of the data. As noted in the figure legend, the scale bars for the fEPSPs were different between the left (0.1 mV) and right panels (20 µV). This difference in scale may have created the perception of baseline differences between the CCK and ACSF groups. To enhance clarity and avoid potential misunderstanding, we will unify the scale bar values in the revised figure. This adjustment will provide a clearer and more accurate comparison of fEPSPs between groups. Thank you again for bringing this issue to our attention.

      (5) From Figure S2D, it seems that different animals were injected with the drug and ACSF. Therefore, how the authors validate the position of the recording electrode to the cortical area of certain CF and relative EF. Also, there is not enough information about the basis of the selection of the EF. Should it be lower than the CF with a certain value? Was the EF determined after the initial tuning curve in each case? To mitigate this difference, it would be appropriate if the authors examined the presence of a significant difference in the tuning width and CFs between animals exposed to ACSF and CCK-4. This will give some validation of a balanced experiment between ACSF and CCK-4. I wonder also why the authors used rats here not mice, as it will be easier to interpret the results came from the same species.

      Thank you for your thoughtful comments. The effective frequency (EF) was determined after measuring the initial tuning curve for each case. The EF was selected to elicit a clear sound response while maintaining a sufficient distance from the characteristic frequency (CF) to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF. While there were individual differences in EF selection among animals, the methodology for determining EF was standardized and applied consistently across both the ACSF and CCK-4 groups.

      Regarding the use of rats in these experiments, these studies were conducted prior to our current work with mice. The findings in rats provide valuable insights that support our current results in mice. Since the rat data are supplementary to the primary findings, we included them as supplementary material to provide additional context and validation. Furthermore, in consideration of animal welfare, we chose not to replicate these experiments in mice, as the findings from rats were sufficient to support our conclusions.

      Methods section:

      “The tuning curve was determined by plotting the lowest intensity at which the neuron responded to different tones. The characteristic frequency (CF) is defined as the frequency corresponding to the lowest point on this curve. The effective frequency (EF) was determined to elicit a clear sound response while maintaining a sufficient distance from the CF to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF.”
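
      The CF definition and the octave step in the quoted methods text can be sketched as below. The function names, the tuning-curve values, and the starting frequency are hypothetical; identifying the onset of the fastest rising phase of the tuning peak is not specified in enough detail to reproduce, so the starting point is taken as a given input.

```python
def characteristic_frequency(thresholds_db):
    """CF: the frequency with the lowest response threshold.

    thresholds_db maps tone frequency (Hz) to the lowest intensity (dB)
    at which the neuron responded.
    """
    return min(thresholds_db, key=thresholds_db.get)


def shift_toward_cf(start_hz, cf_hz, octaves):
    """Move a frequency a given number of octaves toward the CF.

    One octave is a factor of two, so a step of `octaves` multiplies
    (or divides) the starting frequency by 2 ** octaves.
    """
    direction = 1 if cf_hz > start_hz else -1
    return start_hz * 2.0 ** (direction * octaves)


# Hypothetical tuning curve: frequency (Hz) -> threshold (dB SPL).
tuning_curve = {4000: 60, 5657: 45, 8000: 30, 11314: 20, 16000: 35}

cf = characteristic_frequency(tuning_curve)
# Assume the tuning peak starts rising at 5657 Hz (an invented value).
ef_small_step = shift_toward_cf(5657, cf, octaves=0.2)
ef_large_step = shift_toward_cf(5657, cf, octaves=0.4)
```

      Both steps keep the EF below the CF in this example, since the assumed starting point lies one octave under the CF.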

      (6) Lines 384-386: There are no figures named 5H and I.

      Thank you for pointing this out. The references to Figures 5H and 5I were incorrect and should have referred to Figures 5C and 5D. We sincerely apologize for this oversight and will correct these errors in the revised manuscript to ensure clarity and accuracy. Thank you again for bringing this to our attention.

      (7) The authors should mention the sex of the animals used.

      Thank you for your comment and for highlighting this important detail. The sex of the animals used in this study is specified in the Animals section of the Methods: "In the present study, male mice and rats were used to investigate thalamocortical LTP." We appreciate your careful attention to this point and will ensure that this detail remains clearly stated in the manuscript.

      (8) Lines 534 and 648: These coordinates are difficult to understand. Since the experiment was done on both mice and rats, we need a clear description of the coordinates in both. Also, I think that you should mention the lateral distance from the sagittal suture as the ventral coordinates should be calculated from the surface of the skull above the AC and not from the sagittal suture.

      Thank you for your valuable feedback and for pointing out this important issue. We apologize for any confusion caused by our description of the coordinates. The term “ventral” was deliberately used because the auditory cortex is located on the lateral side of the skull, which may have caused some misunderstanding.

      To provide a clearer and more accurate description of the coordinates, we will revise the text in the manuscript as follows: “A craniotomy was performed at the temporal bone (-2 to -4 mm posterior and -1.5 to -3 mm ventral to bregma for mice; -3.0 to -5.0 mm posterior and -2.5 to -6.5 mm ventral to bregma for rats) to access the auditory cortex.”

      We appreciate your attention to these details and will ensure that the revised manuscript includes this clarification to improve accuracy and eliminate potential confusion. Thank you again for bringing this to our attention.

      (9) Line 536: The author should specify that these coordinates are for the experiment done on mice.

      Thank you for your valuable feedback. We will revise the manuscript to explicitly specify that these coordinates refer to the experiments conducted on mice. This clarification will help improve the clarity and precision of the manuscript. We greatly appreciate your attention to this point and your effort to enhance the quality of our work.

      Methods section:

      “and a hole was drilled in the skull according to the coordinates of the ventral division of the MGB (MGv, AP: -3.2 mm, ML: 2.1 mm, DV: 3.0 mm) for experiments conducted on mice.”

      (10) Line 590: Please add the specifications of the stimulating electrode. Is it unipolar or bipolar? What is the cat.# provided by FHC?

      Thank you for your valuable feedback. The electrodes used in the experiments are unipolar. We will include the catalog number provided by FHC in the revised manuscript for clarity. The revised text will be updated as follows:

      “In HFS-induced thalamocortical LTP experiments, two customized microelectrode arrays with four tungsten unipolar electrodes each, impedance: 0.5-1.0 MΩ (recording: CAT.# UEWSFGSECNND, FHC, U.S.), and 200-500 kΩ (stimulating: CAT.# UEWSDGSEBNND, FHC, U.S.), were used for the auditory cortical neuronal activity recording and MGB ES, respectively.”

      We appreciate your attention to this detail, and we will ensure that the revised manuscript reflects this clarification accurately.

      (11) Lines 612-614: There are no details of how the optic fiber was inserted or post-examined. If there is a word limitation, the authors may reference another study showing these procedures.

      Thank you for your insightful comment and for highlighting this important aspect of the methodology. To address this, we will reference the study by Sun et al. (2024) in the revised manuscript, which provides detailed procedures for optic fiber insertion and post-examination. We believe that this reference will help enhance the clarity and completeness of the methods section.

      Sun, W., Wu, H., Peng, Y., Zheng, X., Li, J., Zeng, D., Tang, P., Zhao, M., Feng, H., Li, H., et al. (2024). Heterosynaptic plasticity of the visuo-auditory projection requires cholecystokinin released from entorhinal cortex afferents. eLife 13, e83356. 10.7554/eLife.83356.

      We appreciate your valuable suggestion, which will contribute to improving the quality of the manuscript.

      Minor concerns:

      (1) The definition of HFS was repeated many times throughout the manuscript. Please mention the defined name for the first time in the manuscript only followed by its abbreviation (HFS).

      Thank you for your suggestion and for pointing out this important detail. We will revise the manuscript to ensure that all abbreviations are defined only upon their first mention in the manuscript, with subsequent mentions using the abbreviations consistently. We appreciate your careful attention to detail and your effort to help improve the manuscript.

      (2) Line 173: There is a difference between here and the methods section (620 nm here and 635 nm there) please correct which wavelength the authors used.

      Thank you for your careful review and for bringing this discrepancy to our attention. We have corrected the inconsistency, and the wavelength has been unified throughout the manuscript to ensure accuracy and clarity. The revised text now reads as follows:

      “The fluorescent signal was monitored for 25s before and 60s after the HFLS (5~10 mW, 620 nm) or HFS application.”

      We appreciate your valuable feedback, which has helped us improve the precision and consistency of the manuscript.

      (3) Line 185: I think the authors should refer to Figure 2G before mentioning the statistical results.

      Thank you for your careful review and for pointing out this oversight. We have now added a reference to Figure 2G at the appropriate location to ensure clarity and logical flow in the manuscript, as recommended.

      (4) Line 202: I think the authors should refer to Figure 2J before mentioning the statistical results.

      Thank you again for your careful review and for highlighting this point. We have revised the manuscript to include a reference to Figure 2J before mentioning the statistical results.

      We appreciate your valuable feedback, which has helped us improve the accuracy and presentation of the results.

      (5) Line 260: Please add appropriate references at the end of the sentence to support the argument.

      Thank you for your valuable suggestion. To address this, we have added appropriate references to support the statement regarding the multiple steps involved between mRNA expression and neuropeptide release. Additionally, we have revised the statement to adopt a more cautious interpretation. The revised text is as follows:

      “It is widely recognized that mRNA levels do not always directly correlate with peptide levels due to multiple steps involved in peptide synthesis and processing, including translation, post-translational modifications, packaging, transportation, and proteolytic cleavage, all of which require various enzymes and regulatory mechanisms (38-41). A disruption at any stage in this process could lead to impaired CCK release, even when Cck mRNA is present.”

      We have included the following references to support this statement:

      38. Mierke, C.T. (2020). Translation and Post-translational Modifications in Protein Biosynthesis. In Cellular Mechanics and Biophysics: Structure and Function of Basic Cellular Components Regulating Cell Mechanics, C.T. Mierke, ed. (Springer International Publishing), pp. 595-665. 10.1007/978-3-030-58532-7_14.

      39. Gualillo, O., Lago, F., Casanueva, F.F., and Dieguez, C. (2006). One ancestor, several peptides post-translational modifications of preproghrelin generate several peptides with antithetical effects. Mol Cell Endocrinol 256, 1-8. 10.1016/j.mce.2006.05.007.

      40. Sossin, W.S., Fisher, J.M., and Scheller, R.H. (1989). Cellular and molecular biology of neuropeptide processing and packaging. Neuron 2, 1407-1417. https://doi.org/10.1016/0896-6273(89)90186-4.

      41. Hook, V., Funkelstein, L., Lu, D., Bark, S., Wegrzyn, J., and Hwang, S.R. (2008). Proteases for processing proneuropeptides into peptide neurotransmitters and hormones. Annu Rev Pharmacol Toxicol 48, 393-423. 10.1146/annurev.pharmtox.48.113006.094812.

      We greatly appreciate your helpful feedback, which has allowed us to improve both the accuracy and the depth of discussion in the manuscript.

      (6) Line 278: The authors mentioned "due to the absence of CCK in aged animals", which was not an appropriate description. It should be a reduction of CCK gene expression or a possible deficient CCK release.

      Thank you for your careful review and for pointing out the inaccuracy in our description. We agree with your suggestion and have revised the statement to more appropriately reflect the findings.

      “Our findings revealed that thalamocortical LTP cannot be induced in aged mice, likely due to insufficient CCK release, despite intact CCKBR expression.”

      This revision ensures a more accurate and precise description of the potential mechanisms underlying the observed phenomenon. We greatly appreciate your valuable feedback, which has helped us improve the clarity and accuracy of the manuscript.

      (7) Line 291: The authors mentioned that "without MGB stimulation", which is confusing. The MGB was stimulated with a single electrical pulse to evoke cortical fEPSPs. Therefore it should be "without HFS of MGB".

      Thank you for pointing this out and for highlighting the potential confusion caused by our original phrasing. Upon review, we recognize that our original phrasing "without MGB stimulation" may have been unclear and could have led to misinterpretation. To clarify, our intention was to describe the period during which CCK was present without any stimulation of the MGB.

      It is important to note that, in the presence of CCK, LTP can be induced even with low-frequency stimulation, including in aged mice. This observation underscores the potent effect of CCK in facilitating thalamocortical LTP, regardless of the specific stimulation protocol used.

      To address this issue, we have revised the sentence for improved clarity as follows:

      "To investigate whether CCK alone is sufficient to induce thalamocortical LTP without activating thalamocortical projections, we infused CCK-4 into the ACx of young adult mice immediately after baseline fEPSP recording. Stimulation was then paused for 15 min to allow for CCK degradation, after which recording resumed."

      We believe this revision resolves the misunderstanding and provides a clearer and more accurate description of the experimental context. We greatly appreciate your insightful feedback, which has helped us refine the manuscript for clarity and precision.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Line 99, 134, possibly other locations: "site" to "sites".

      Thank you for your careful review. We appreciate your attention to detail and have made the necessary corrections in the manuscript.

      (2) Throughout the manuscript there are some minor issues with language choice and subtle phrasing errors and I suggest English language editing.

      Thank you for your suggestion. In response, we have thoroughly reviewed the manuscript and addressed issues related to language choice and phrasing. The text has been carefully edited to ensure clarity, precision, and consistency. We believe these revisions have significantly enhanced the overall quality of the manuscript. We greatly appreciate your feedback, which has been invaluable in improving the presentation of our work.

      (3) Based on the experimental configurations, I do not think it is a problematic caveat, but authors should be aware of the high likelihood of AAV9 jumping synapses relative to other AAV serotypes.

      Thank you for bringing up the potential of AAV9 crossing synapses, a recognized characteristic of this serotype. We appreciate your observation regarding its relevance to our experimental design. In our study, we carefully considered the possibility of trans-synaptic transfer during both the experimental design and data interpretation phases. To minimize the likelihood of significant trans-synaptic spread, we implemented several measures, including controlling the injection volume, using a slow injection rate, and limiting the viral expression time. Post-hoc histological analyses confirmed that the expression of AAV9 was largely confined to the intended regions, with limited evidence of synaptic jumping under our experimental conditions.

      While we acknowledge the inherent potential for AAV9 to cross synapses, we believe this effect does not substantially confound the interpretation of our findings in the current study. To address this concern, we have added a brief discussion on this point in the revised manuscript to enhance clarity. We greatly appreciate your insightful comment, which has helped us further refine our work.

      Discussion section:

      “One potential limitation of our study is the trans-synaptic transfer property of AAV9. To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      (4) The trace identifiers (1-4) do not seem correctly placed/colored in Figure S1D. Please check others carefully.

      Thank you for your careful review and for bringing this issue to our attention. We have corrected the trace identifiers in Figure S1D. Additionally, we have carefully reviewed all other figures to ensure their accuracy and consistency. We greatly appreciate your attention to detail, which has helped improve the overall quality of the manuscript.

      (5) Please provide a value of the laser power range based on calibrated values.

      Thank you for your suggestion. We have included the calibrated laser power range in the revised manuscript as follows:

      “The laser stimulation was produced by a laser generator (5-20 mW(30), Wavelength: 473 nm, 620 nm; CNI laser, China) controlled by an RX6 system and delivered to the brain via an optic fiber (Thorlabs, U.S.) connected to the generator.”

      We appreciate your feedback, which has helped improve the clarity and precision of our methodological description.

      (6) It would be useful to annotate figures in a way that identifies in which transgenic mice experiments are being performed.

      Thank you for your valuable suggestion. We will add annotations to the figures to explicitly identify the type of mice used in each experiment. We believe this enhancement will improve the clarity and accessibility of our results. We greatly appreciate your input in making our manuscript more informative.

      (7) Please comment on the rigor you use to address the accuracy of viral injections. How often did they spread outside of the MGB/AC?

      Thank you for raising this important question regarding the accuracy of viral injections and the potential spread outside the MGB or AC. Below, we provide details for each set of experiments:

      shRNA Experiments:

      For the shRNA experiments targeting the MGB, our primary goal was to achieve comprehensive coverage of the entire MGB. To this end, we used larger injection volumes and multiple injection sites, which inevitably resulted in some viral spread beyond the MGB. However, this approach was necessary to ensure robust knockdown effects that were representative of the entire MGB. While strict confinement to specific subregions could not be guaranteed, this strategy allowed us to prioritize the effectiveness of the knockdown within the target region.

      Fiber photometry Experiments:

      For the fiber photometry experiments targeting the auditory cortex (AC), we used larger injection volumes and multiple injection sites to cover its relatively large size. Although this approach might have resulted in some CCK-sensor virus spread outside the AC, the placement of the optic fiber was guided by the location of the auditory cortex. Consequently, any minor viral expression outside the AC would not affect the experimental results, as recordings were confined to the intended area through precise fiber placement.  

      Optogenetic Experiments:

      For the optogenetic experiments targeting the MGB, we specifically injected virus into the MGv subregion. To minimize viral spread, we employed several strategies, including the use of fine injection needles, waiting for tissue stabilization (7 minutes post-needle insertion), delivering small volumes at a slow rate to prevent backflow, aspirating 5 nL of the solution post-injection, and raising the needle by 100 μm before waiting an additional 5 minutes prior to full retraction. These measures significantly reduced the risk of viral leakage to adjacent regions.

      Histological Validation:

      After the electrophysiological experiments, we systematically verified the accuracy of viral expression by examining histological sections to ensure that the expression was primarily localized within the intended regions.

      Terminology in the Manuscript:

      In the manuscript, we deliberately used the term "MGB" rather than specifically "MGv" to transparently acknowledge the potential for viral spread in some experiments.

      We hope this explanation clarifies the strategies we employed to address the accuracy of viral injections, as well as how we managed potential viral spread. We have also added brief information to the revised manuscript to reflect these points and acknowledge the inherent variability in viral delivery.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their constructive and helpful comments, which led us to make major changes in the model and manuscript, including adding the results of new experiments and analyses. We believe that the revised manuscript is much better than the previous version and that it addresses all issues raised by the reviewers.

      Summary of changes made in the revised manuscript:

      (1) We increased the training set size from 39 video clips to 97 video clips and the testing set size from 25 video clips to 60 video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.

      (2) We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy.

      (3) The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      (4) In addition, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      (5) Analyzing urination and defecation dynamics in an additional strain of mice revealed interesting strain-specific features, as discussed in the revised manuscript.

      (6) Overall, we found DeePosit accuracy to be stable with no significant bias across stages of the experiment, types of the experiment, gender of the mice, strain of mice, and across experimental conditions.

      (7) We also compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy over both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm training set used just a single click inside each spot, allowing faster generation of training sets. 

      (8) As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C (Figure 3—Figure Supplement 2).

      (9) We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      (10) In the revised paper, we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.
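      For readers less familiar with the metrics listed above, the following sketch shows how precision, recall, and F1 follow from counts of true positives, false positives, and false negatives. It is a minimal illustration and not the DeePosit evaluation code; the function name and the example counts are assumptions made for this sketch.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from detection counts.

    Hypothetical helper for illustration only; how detections are matched
    to ground-truth events is not modeled here.
    """
    predicted = true_pos + false_pos   # everything the detector reported
    actual = true_pos + false_neg      # everything the annotator marked
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts, chosen only to show the call.
precision, recall, f1 = precision_recall_f1(true_pos=8, false_pos=2, false_neg=2)
```

      Because F1 is the harmonic mean of precision and recall, it is high only when both are high, which makes it a useful single summary for detection tasks with few events.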

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript provides a novel method for the automated detection of scent marks from urine and feces in rodents. Given the importance of scent communication in these animals and their role as model organisms, this is a welcome tool.

      We thank the reviewer for the positive assessment of our tool.

      Strengths:

      The method uses a single video stream (thermal video) to allow for the distinction between urine and feces. It is automated.

      Weaknesses:

      The accuracy level shown is lower than may be practically useful for many studies. The accuracy of urine is 80%. 

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      This is understandable given the variability of urine in its deposition, but makes it challenging to know if the data is accurate. If the same kinds of mistakes are maintained across many conditions it may be reasonable to use the software (i.e., if everyone is under/over counted to the same extent). Differences in deposition on the scale of 20% would be challenging to be confident in with the current method, though differences of the magnitude may be of biological interest. Understanding how well the data maintain the same relative ranking of individuals across various timing and spatial deposition metrics may help provide further evidence for the utility of the method.

      The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      Reviewer #2 (Public Review):

      Summary:

      The authors built a tool to extract the timing and location of mouse urine and fecal deposits in their laboratory set up. They indicate that they are happy with the results they achieved in this effort.

      Yes, we are.

      The authors note urine is thought to be an important piece of an animal's behavioral repertoire and communication toolkit so methods that make studying these dynamics easier would be impactful.

      We thank the reviewer for the positive assessment of our work.

      Strengths:

      With the proposed method, the authors are able to detect 79% of the urine that is present and 84% of the feces that is present in a mostly automated way.

      Weaknesses:

      The method proposed has a large number of design choices across two detection steps that aren't investigated. I.e. do other design choices make the performance better, worse, or the same? 

      We chose to use a heuristic preliminary detection algorithm for the detection of warm blobs, since warm blobs can be robustly detected with heuristic algorithms without the need for a training set. This design selection might allow easier adaptation of our algorithm for different types of arenas. Another advantage of using a heuristic preliminary detection is the easy control of the preliminary detection parameters such as the minimum temperature difference for detecting a blob, size limits of the detected blob, cooldown rate and so on that may help in adopting it to new conditions. As for the classifier, we chose to feed it with a relatively small window surrounding each preliminary detection, and hence it is not affected by the arena’s appearance outside of its region of interest. This should allow lower sensitivity to the arena’s appearance.  

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C.

      We also checked if changing the input length of the video clip fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.
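      To make the role of the temperature threshold concrete, the sketch below shows one way a heuristic warm-blob detector of this kind could work: pixels that are warmer than a background frame by at least the threshold are grouped into connected blobs. This is not the DeePosit implementation. The function name, the use of a background frame, the 4-connectivity rule, and the `min_pixels` size limit are assumptions; only the 1.6°C default comes from the text above, and other criteria mentioned there (such as cooldown rate) are not modeled.

```python
from collections import deque


def detect_warm_blobs(frame, background, threshold_c=1.6, min_pixels=2):
    """Group pixels warmer than the background into 4-connected blobs.

    frame, background: 2D lists of temperatures in degrees Celsius.
    Returns a list of blobs, each a list of (row, col) pixel coordinates.
    Hypothetical sketch; not the authors' preliminary detection code.
    """
    rows, cols = len(frame), len(frame[0])
    warm = [
        [frame[r][c] - background[r][c] >= threshold_c for c in range(cols)]
        for r in range(rows)
    ]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not warm[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited warm pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and warm[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard blobs below the (assumed) minimum size.
            if len(pixels) >= min_pixels:
                blobs.append(pixels)
    return blobs
```

      Raising `threshold_c` makes the detector stricter, so fewer candidate blobs reach the classifier; the reported stability between 1.1°C and 3°C suggests that the classifier stage absorbs most of that variation.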

      Are these choices robust across a range of laboratory environments?

      We tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      How much better are the demonstrated results compared to a simple object detection pipeline (i.e. FasterRCNN or YOLO on the raw heat images)?

      We compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy over both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm training set used just a single click inside each spot, allowing faster generation of training sets.
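      The "YOLOv8 RGB" input described above can be pictured as three grayscale thermal frames, taken at t+0, t+10, and t+30 seconds, placed into the three channels of one image. The sketch below illustrates that idea only. The function name, the frame-rate argument, and the clamping of indices at the end of the clip are assumptions, and the authors' actual preprocessing may differ.

```python
def stack_temporal_channels(frames, event_index, fps, offsets_s=(0, 10, 30)):
    """Build a 3-channel image from grayscale frames at several time offsets.

    frames: list of 2D lists (one grayscale frame per video frame).
    event_index: frame index of the ground-truth event (t).
    fps: frames per second of the clip (an assumed parameter).
    Returns a 2D list whose pixels are tuples, one value per offset.
    """
    last = len(frames) - 1
    # Convert each offset in seconds to a frame index, clamped to the clip.
    indices = [min(event_index + round(off * fps), last) for off in offsets_s]
    picked = [frames[i] for i in indices]
    rows, cols = len(picked[0]), len(picked[0][0])
    return [
        [tuple(frame[r][c] for frame in picked) for c in range(cols)]
        for r in range(rows)
    ]
```

      Encoding time in the channel dimension lets a standard image detector see how a spot changes after deposition without any change to its architecture.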

      The method is implemented with a mix of MATLAB and Python.

      That is right.

      One proposed reason why this method is better than a human annotator is that it "is not biased." While they may mean it isn't influenced by what the researcher wants to see, the model they present is still statistically biased since each object class has a different recall score. This wasn't investigated. In general, there was little discussion of the quality of the model. 

      We tested the consistency of the algorithm performance across gender, space, time and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results. Specifically, the detection accuracy is similar between urine and feces, hence should not impose a bias between the various object classes.

      Precision scores were not reported.

      In the revised paper we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.

      Is a recall value of 78.6% good for the types of studies they and others want to carry out? What are the implications of using the resulting data in a study?

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      How do these results compare to the data that would be generated by a "biased human?"

      We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy (Figure 3).

      5 out of the 6 figures in the paper relate not to the method but to results from a study whose data was generated from the method. This makes a paper, which, based on the title, is about the method, much longer and more complicated than if it focused on the method.

      We appreciate the reviewer's comment, but the analysis of this new dataset by DeePosit demonstrates how the algorithm may be used to reveal novel and distinguishable dynamics of urination and defecation activities during social interactions, which were not yet reported. 

      Also, even in the context of the experiments, there is no discussion of the implications of analyzing data that was generated from a method with precision and recall values of only 70-80%. Surely this noise has an effect on how to correctly calculate p-values etc. Instead, the authors seem to proceed like the generated data is simply correct.

      As mentioned above, the increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.  

      Reviewer #3 (Public Review):

      Summary:

      The authors introduce a tool that employs thermal cameras to automatically detect urine and feces deposits in rodents. The detection process involves a heuristic to identify potential thermal regions of interest, followed by a transformer network-based classifier to differentiate between urine, feces, and background noise. The tool's effectiveness is demonstrated through experiments analyzing social preference, stress response, and temporal dynamics of deposits, revealing differences between male and female mice.

      Strengths:

      The method effectively automates the identification of deposits

      The application of the tool in various behavioral tests demonstrates its robustness and versatility.

      The results highlight notable differences in behavior between male and female mice

      We thank the reviewer for the positive assessment of our work.

      Weaknesses:

      The definition of 'start' and 'end' periods for statistical analysis is arbitrary. A robustness check with varying time windows would strengthen the conclusions.

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).

      The paper could better address the generalizability of the tool to different experimental setups, environments, and potentially other species.

      As mentioned above, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      The results are based on tests of individual animals, and there is no discussion of how this method could be generalized to experiments tracking multiple animals simultaneously in the same arena (e.g., pair or collective behavior tests, where multiple animals may deposit urine or feces).

      At the moment, the algorithm cannot be applied for multiple animals freely moving in the same arena. However, in the revised manuscript we explicitly discussed what is needed for adapting the algorithm to perform such analyses.

      Recommendations for the authors: 

      -  Add a note and/or perform additional calculations to show that the results do not depend on the specific definitions of 'start' and 'end' periods. For instance, vary the time window thresholds and recalculate the statistics using different windows (e.g., 1-5 minutes instead of 1-4 minutes).

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).

      - Condense Figures 4, 5, and 6 to simplify the presentation. Focus on demonstrating the effectiveness of the tool rather than detailed experimental outcomes, as the primary contribution of this paper is methodological.

      We have added to the revised manuscript one technical figure (Figure 3) comparing the accuracy of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP) as well as comparing its performance to a second human annotator and to YOLOv8. One more partially technical figure (Figure 5) compares the results of the algorithm between white ICR mice in the black arena and black C57BL/6 mice in the white arena. Thus, only Figures 4 and 6 show detailed experimental outcomes.

      - Provide more detail on how the preliminary detection procedure and parameters might need adjustment for different experimental setups or conditions. Discuss potential adaptations for field settings or more complex environments.

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C.

      We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.

      Editor's note:

      Should you choose to revise your manuscript, please ensure your manuscript includes full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We have deposited the detailed statistics of each figure in https://github.com/davidpl2/DeePosit/tree/main/FigStat/PostRevision

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates how hearing impairment affects neural encoding of speech, in particular the encoding of hierarchical linguistic information. The current analysis provides incomplete evidence that hearing impairment affects speech processing at multiple levels, since the novel analysis based on HM-LSTM needs further justification. The advantage of this method should also be further explained. The study can also benefit from building a stronger link between neural and behavioral data.

      We sincerely thank the editors and reviewers for their detailed and constructive feedback.

      We have revised the manuscript to address all of the reviewers’ comments and suggestions. The primary strength of our methods lies in the use of the HM-LSTM model, which simultaneously captures linguistic information at multiple levels, ranging from phonemes to sentences. As such, this model can be applied to other questions regarding hierarchical linguistic processing. We acknowledge that our current behavioral results from the intelligibility test may not fully differentiate between the perception of lower-level acoustic/phonetic information and higher-level meaning comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. We aim to explore this connection further in future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors are attempting to use the internal workings of a language hierarchy model, comprising phonemes, syllables, words, phrases, and sentences, as regressors to predict EEG recorded during listening to speech. They also use standard acoustic features as regressors, such as the overall envelope and the envelopes in log-spaced frequency bands. This is valuable and timely research, including the attempt to show differences between normal-hearing and hearing-impaired people in these regards. I will start with a couple of broader questions/points, and then focus my comments on three aspects of this study: The HM-LSTM language model and its usage, the time windows of relevant EEG analysis, and the usage of ridge regression.

      Firstly, as far as I can tell, the OSF repository of code, data, and stimuli is not accessible without requesting access. This needs to be changed so that reviewers and anybody who wants or needs to can access these materials. 

      It is my understanding that keeping the repository private during the review process and making it public after acceptance is standard practice. As far as I understand, although the OSF repository was private, anyone with the link should be able to access it. I have now made the repository public.

      What is the quantification of model fit? Does it mean that you generate predicted EEG time series from deconvolved TRFs, and then give the R2 coefficient of determination between the actual EEG and predicted EEG constructed from the convolution of TRFs and regressors? Whether or not this is exactly right, it should be made more explicit.

      Model fit was measured by spatiotemporal cluster permutation tests (Maris & Oostenveld, 2007) on the contrasts of the timecourses of the z-transformed coefficient of determination (R<sup>2</sup>). For instance, to assess whether words from the attended stimuli better predict EEG signals during the mixed speech compared to words from the unattended stimuli, we used the 150-dimensional vectors corresponding to the word layer from our LSTM model for the attended and unattended stimuli as regressors. We then fit these regressors to the EEG signals at 9 time points (spanning -100 ms to 300 ms around the sentence offsets, with 50 ms intervals). We then conducted one-tailed two-sample t-tests to determine whether the differences in the contrasts of the R<sup>2</sup> timecourses were statistically significant. Note that we did not perform TRF analyses. We have clarified this description in the “Spatiotemporal clustering analysis” section of the “Methods and Materials” on p.10 of the manuscript.
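      As a reminder of what the coefficient of determination measures, the sketch below computes R<sup>2</sup> between an observed signal and a model prediction using the standard definition, one minus the ratio of the residual to the total sum of squares. It is an illustration under that textbook definition only; the function name is hypothetical, and the z-transformation and cluster permutation steps described above are not shown.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    actual, predicted: equal-length sequences of numbers (for example,
    EEG amplitudes at one sensor and latency across sentences).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed signal is constant")
    return 1 - ss_res / ss_tot
```

      A value of 1 means the prediction reproduces the signal exactly, 0 means it does no better than predicting the mean, and negative values are possible when the prediction is worse than the mean.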

      About the HM-LSTM:

      • In the Methods paragraph about the HM-LSTM, a lot more detail is necessary to understand how you are using this model. Firstly, what do you mean that you "extended" it, and what was that procedure? 

      The original HM-LSTM model developed by Chung et al. (2017) consists of only two levels: the word level and the phrase level (Figure 1b from their paper). By “extending” the model, we mean that we expanded its architecture to include five levels: phoneme, syllable, word, phrase, and sentence. Since our input consists of phoneme embeddings, we cannot directly apply their model, so we trained our model on the WenetSpeech corpus (Zhang et al., 2021), which provides phoneme-level transcripts. We have added this clarification on p.4 of the manuscript.

      • And generally, this is the model that produces most of the "features", or regressors, whichever word we like, for the TRF deconvolution and EEG prediction, correct? 

      Yes, we extracted the 2048-dimensional hidden layer activity from the model to represent features for each sentence in our speech stimuli at the phoneme, syllable, word, phrase and sentence levels. However, we did not perform any TRF deconvolution; instead, we fit these features (downsampled to 150 dimensions using PCA) to the EEG signals at 9 timepoints around the offset of each sentence using ridge regression. We have now added a multivariate TRF (mTRF) analysis following Reviewer 3’s suggestions, and the results showed similar patterns to the current results (see Figure S2). We have added the clarification in the “Ridge regression at different time latencies” section of the “Methods and Materials” on p.10 of the manuscript.

      Results from the mTRF analyses were added on p.7 of the manuscript.

      • A lot more detail is necessary then, about what form these regressors take, and some example plots of the regressors alongside the sentences.

      The linguistic regressors are simply five 150-dimensional vectors, each corresponding to one linguistic level, as shown in Figure 1B.

      • Generally, it is necessary to know what these regressors look like compared to other similar language-related TRF and EEG/MEG prediction studies. Usually, in the case of e.g. Lalor lab papers or Simon lab papers, these regressors take the form of single-sample event markers, surrounded by zeros elsewhere. For example, a phoneme regressor might have a sample up at the onset of each phoneme, and a word onset regressor might have a sample up at the onset of each word, with zeros elsewhere in the regressor. A phoneme surprisal regressor might have a sample up at each phoneme onset, with the value of that sample corresponding to the rarity of that phoneme in common speech. Etc. Are these regressors like that? Or do they code for these 5 linguistic levels in some other way? Either way, much more description and plotting is necessary in order to compare the results here to others in the literature.

      No, these regressors were not like that. They were 150-dimensional vectors (after PCA dimension reduction) extracted from the hidden layers of the HM-LSTM model. After training the model on the WenetSpeech corpus, we ran it on our speech stimuli and extracted representations from the five hidden layers to correspond to the five linguistic levels. As mentioned earlier, we did not perform TRF analyses; instead, we used ridge regression to predict EEG signals around the offset of each sentence, a method commonly employed in the literature (e.g., Caucheteux & King, 2022; Goldstein et al., 2022; Schmitt et al., 2021; Schrimpf et al., 2021). For instance, Goldstein et al. (2022) used word embeddings from GPT-2 to predict ECoG activity surrounding the onset of each word during naturalistic listening. We have included these references on p.3 of the manuscript, and the method is illustrated in Figure 1B.

      • You say that the 5 regressors that are taken from the trained model's hidden layers do not have much correlation with each other. However, the highest correlations are between syllable and sentence (0.22), and syllable and word (0.17). It is necessary to give some reason and interpretation of these numbers. One would think the highest correlation might be between syllable and phoneme, but this one is almost zero. Why would the syllable and sentence regressors have such a relatively high correlation with each other, and what form do those regressors take such that this is the case?

      All the regressors are represented as 2048-dimensional vectors derived from the hidden layers of the trained HM-LSTM model. We applied the trained model to all 284 sentences in our stimulus text, generating a set of 284 × 2048-dimensional vectors. Next, we performed Principal Component Analysis (PCA) on the 2048 dimensions and extracted the first 100 principal components (PCs), resulting in 284 × 100-dimensional vectors for each regressor. These 284 × 100 matrices were then flattened into 28,400-dimensional vectors. Subsequently, we computed the correlation matrix for the z-transformed 28,400-dimensional vectors of our five linguistic regressors. The code for this analysis, lstm_corr.py, can be found in our OSF repository. We have added a section “Correlation among linguistic features” in “Materials and Methods” on p.10 of the manuscript.

      We consider the observed coefficients of 0.17 and 0.22 to be relatively low compared to prior model-brain alignment studies, which report correlation coefficients above 0.5 for linguistic regressors (e.g., Gao et al., 2024; Sugimoto et al., 2024). In Chinese, a single syllable can also function as a word, potentially leading to higher correlations between regressors for syllables and words. However, we refrained from overinterpreting the results to suggest a higher correlation between syllable and sentence compared to syllable and word. A paired t-test of the syllable-word coefficients versus syllable-sentence coefficients across the 284 sentences revealed no significant difference (t(28399)=-3.96, p=1). We have incorporated this information into p.5 of the manuscript.

      • If these regressors are something like the time series of zeros along with single sample event markers as described above, with the event marker samples indicating the onset of the relevant thing, then one would think e.g. the syllable regressor would be a subset of the phoneme regressor because the onset of every syllable is a phoneme. And the onset of every word is a syllable, etc.

      All the regressors are aligned to 9 time points surrounding sentence offsets (-100 ms to 300 ms with a 50 ms interval). This is because all our regressors are taken from the HM-LSTM model, where the input is the phoneme representation of a sentence (e.g., “zh ə_4 y ie_3 j iəu_4 x iaŋ_4 sh uei_3 y ii_2 y aŋ_4”). For each unit in the sentence, the model generates five 2048-dimensional vectors, one for each of the five linguistic levels of the entire sentence. We have added the clarification on p.11 of the manuscript.

      For the time windows of analysis:

      • I am very confused, because sometimes the times are relative to "sentence onset", which would mean the beginning of sentences, and sometimes they are relative to "sentence offset", which would mean the end of sentences. It seems to vary which is mentioned. Did you use sentence onsets, offsets, or both, and what is the motivation?

      • If you used onsets, then the results at negative times would not seem to mean anything, because that would be during silence unless the stimulus sentences were all back to back with no gaps, which would also make that difficult to interpret.

      • If you used offsets, then the results at positive times would not seem to mean anything, because that would be during silence after the sentence is done. Unless you want to interpret those as important brain activity after the stimuli are done, in which case a detailed discussion of this is warranted.

      Thank you very much for pointing this out. All instances of “sentence onset” were typos and should be corrected to “sentence offset.” We chose offset because the regressors are derived from the hidden layer activity of our HM-LSTM model, which processes the entire sentence before generating outputs. We have now corrected all the typos. In continuous speech, there is no distinct silence period following sentence offsets. Additionally, lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Therefore, we included a 300 ms interval after sentence offsets in our analysis, as our regressors encompass linguistic levels up to the sentence level. We have added this motivation on p.11 of the manuscript.

      • For the plots in the figures where the time windows and their regression outcomes are shown, it needs to be explicitly stated every time whether those time windows are relative to sentence onset, offset, or something else.

      We completely agree, and thank you very much for the suggestion. We have now added this information to Figures 4-6.

      • Whether the running correlations are relative to sentence onset or offset, the fact that you can have numbers outside of the time of the sentence (negative times for onset, or positive times for offset) is highly confusing. Why would the regressors have values outside of the sentence, meaning before or after the sentence/utterance? In order to get the running correlations, you presumably had the regressor convolved with the TRF/impulse response to get the predicted EEG first. In order to get running correlation values outside the sentence to correlate with the EEG, you would have to have regressor values at those time points, correct? How does this work?

      As mentioned earlier, we did not perform TRF analyses or convolve the regressors. Instead, we conducted regression analyses at each of the 9 time points surrounding the sentence offsets, following standard methods commonly used in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022). The time window of -100 to 300 ms was selected based on prior findings that lexical and phrasal processing typically occurs 200–300 ms after word offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (cf. Gwilliams et al., 2022). We have added the clarification on p. of the manuscript.

      • In general, it seems arbitrary to choose sentence onset or offset, especially if the comparison is the correlation between predicted and actual EEG over the course of a sentence, with each regressor. What is going on with these correlations during the middle of the sentences, for example? In ridge regression TRF techniques for EEG/MEG, the relevant measure is often the overall correlation between the predicted and actual, calculated over a longer period of time, maybe the entire experiment. Here, you have calculated a running comparison between predicted and actual, and thus the time windows you choose to actually analyze can seem highly cherry-picked, because this means that most of the data is not actually analyzed.

      The rationale for choosing sentence offsets instead of onsets is that we are aligning the HM-LSTM model’s activity with EEG responses, and the input to the model consists of phoneme representations of the entire sentence at one time. In other words, the model needs to process the whole sentence before generating representations at each linguistic level. Therefore, the corresponding EEG responses should also align with the sentence offsets, occurring after participants have heard the complete sentence. The ridge regression followed the common practice in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021), and the time window is not cherry-picked but based on prior literature reporting lexical and sublexical processing in these time periods (e.g., Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Gwilliams et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021).

      • In figures 5 and 6, some of the time window portions that are highlighted as significant between the two lines have the lines intersecting. This looks like, even though you have found that the two lines are significantly different during that period of time, the difference between those lines is not of a constant sign, even during that short period. For instance, in figure 5, for the syllable feature, the period of 0 - 200 ms is significantly different between the two populations, correct? But between 0 and 50, normal-hearing are higher, between 50 and 150, hearing-impaired are higher, and between 150 and 200, normal-hearing are higher again, correct? But somehow they still end up significantly different overall between 0 and 200 ms. More explanation of occurrences like these is needed.

      The intersecting lines in Figures 5 and 6 represent the significant time windows for within-group comparisons (i.e., significant model fit compared to 0). They do not depict between-group comparisons, as no significant contrasts were found between the groups. For example, in Figure 1, the significant time windows for the acoustic models are shown separately for the hearing-impaired and normal-hearing groups. No significant differences were observed, as indicated by the sensor topography. We have now clarified this point in the captions for Figures 5 and 6.

      Using ridge regression:

      • What software package(s) and procedure(s) were specifically done to accomplish this? If this is ridge regression and not just ordinary least squares, then there was at least one non-zero regularization parameter in the process. What was it, how did it figure in the modeling and analysis, etc.?

      The ridge regression was performed using custom Python code, making heavy use of the sklearn (v1.12.0) package. We used ridge regression instead of ordinary least squares regression because all our linguistic regressors are 150-dimensional dense vectors, and our acoustic regressors are 130-dimensional vectors (see “Acoustic features of the speech stimuli” in “Materials and Methods”). We kept the default regularization parameter (i.e., 1). This ridge regression method is commonly used in model-brain alignment studies, where the regressors are high-dimensional vectors taken from language models (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021). The code ridge_lstm.py can be found in our OSF repository, and we have added a more detailed description on p.11 of the manuscript.
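
      To make the procedure concrete, a per-latency ridge regression with a regularization parameter of 1 could look roughly like the sketch below. It is not ridge_lstm.py: it uses a NumPy closed-form solution rather than sklearn, random placeholder data for one hypothetical sensor, and a simple train/test split that is purely an assumption for illustration.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data (intercept unpenalized)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def r2_score(y_true, y_pred):
    """Coefficient of determination; can be negative for poor fits."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# 9 latencies around sentence offset: -100 ms to 300 ms in 50 ms steps.
latencies_ms = np.arange(-100, 301, 50)

rng = np.random.default_rng(0)
n_sent, n_feat = 284, 150
X = rng.standard_normal((n_sent, n_feat))                # placeholder regressor
eeg = rng.standard_normal((n_sent, latencies_ms.size))   # placeholder EEG, one sensor

# Hypothetical split; the evaluation scheme is not specified in this reply.
train, test = np.arange(0, 200), np.arange(200, n_sent)

scores = np.empty(latencies_ms.size)
for t in range(latencies_ms.size):
    coef, intercept = ridge_fit(X[train], eeg[train, t], alpha=1.0)
    scores[t] = r2_score(eeg[test, t], X[test] @ coef + intercept)
```

      The resulting array holds one R<sup>2</sup> value per latency, which is the kind of temporal profile that is then compared across conditions.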

      • It sounds like the regressors are the hidden layer activations, which you reduced from 2,048 to 150 non-acoustic, or linguistic, regressors, per linguistic level, correct? So you have 150 regressors, for each of 5 linguistic levels. These regressors collectively contribute to the deconvolution and EEG prediction from the resulting TRFs, correct? This sounds like a lot of overfitting. How much correlation is there from one of these 150 regressors to the next? Elsewhere, it sounds like you end up with only one regressor for each of the 5 linguistic levels. So these aspects need to be clarified.

      • For these regressors, you are comparing the "regression outcomes" for different conditions; "regression outcomes" are the R2 between predicted and actual EEG, which is the coefficient of determination, correct? If this is R2, how is it that you have some negative numbers in some of the plots? R2 should be only positive, between 0 and 1.

      Yes, we reduced the 2048-dimensional vectors for each of the 5 linguistic levels to 150 dimensions using PCA, mainly to save computational resources. We used ridge regression, following the standard practice in the field (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021).

      Yes, the regression outcomes are the R<sup>2</sup> values representing the fit between the predicted and actual EEG data. However, we reported normalized R<sup>2</sup> values, which are z-transformed in the plots. All our spatiotemporal cluster permutation analyses were conducted using the z-transformed R<sup>2</sup> values. We have added this clarification both in the figure captions and on p.11 of the manuscript. As a side note, R<sup>2</sup> values can be negative because they are not the square of a correlation coefficient. Rather, R<sup>2</sup> compares the fit of the chosen model to that of a horizontal straight line (the null hypothesis). If the chosen model fits the data worse than the horizontal line, the R<sup>2</sup> value becomes negative: https://www.graphpad.com/support/faq/how-can-rsup2sup-be-negative
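
      A small worked example may help illustrate the side note. Using the usual definition R<sup>2</sup> = 1 - SS_res / SS_tot, a prediction that is worse than the mean of the data yields a negative value. The numbers below are invented purely for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, relative to a horizontal line at the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


y = np.array([1.0, 2.0, 3.0, 4.0])

good = np.array([1.1, 1.9, 3.2, 3.8])   # close to the data
flat = np.full_like(y, y.mean())        # the horizontal-line baseline
bad = np.array([4.0, 3.0, 2.0, 1.0])    # worse than the baseline

r2_good = r_squared(y, good)   # about 0.98
r2_flat = r_squared(y, flat)   # 0.0
r2_bad = r_squared(y, bad)     # -3.0
```

      Here SS_tot is 5; the poor prediction has SS_res = 20, giving 1 - 20/5 = -3.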

      Reviewer #2 (Public Review):

      This study compares neural responses to speech in normal-hearing and hearing-impaired listeners, investigating how different levels of the linguistic hierarchy are impacted across the two cohorts, both in a single-talker and multi-talker listening scenario. It finds that, while normal-hearing listeners have a comparable cortical encoding of speech-in-quiet and attended speech from a multi-talker mixture, participants with hearing impairment instead show a reduced cortical encoding of speech when it is presented in a competing listening scenario. When looking across the different levels of the speech processing hierarchy in the multi-talker condition, normal-hearing participants show a greater cortical encoding of the attended compared to the unattended stream in all speech processing layers - from acoustics to sentence-level information. Hearing-impaired listeners, on the other hand, only have increased cortical responses to the attended stream for the word and phrase levels, while all other levels do not differ between attended and unattended streams.

      The methods for modelling the hierarchy of speech features (HM-LSTM) and the relationship between brain responses and specific speech features (ridge-regression) are appropriate for the research question, with some caveats on the experimental procedure. This work offers an interesting insight into the neural encoding of multi-talker speech in listeners with hearing impairment, and it represents a useful contribution towards understanding speech perception in cocktail-party scenarios across different hearing abilities. While the conclusions are overall supported by the data, there are limitations and certain aspects that require further clarification.

      (1) In the multi-talker section of the experiment, participants were instructed to selectively attend to the male or the female talker, and to rate the intelligibility, but they did not have to perform any behavioural task (e.g., comprehension questions, word detection or repetition), which could have demonstrated at least an attempt to comply with the task instructions. As such, it is difficult to determine whether the lack of increased cortical encoding of Attended vs. Unattended speech across many speech features in hearing-impaired listeners is due to a different attentional strategy, which might be more oriented at "getting the gist" of the story (as the increased tracking of only word and phrase levels might suggest), or instead it is due to hearing-impaired listeners completely disengaging from the task and tuning back in for selected key-words or word combinations. Especially the lack of Attended vs. Unattended cortical benefit at the level of acoustics is puzzling and might indicate difficulties in performing the task. I think this caveat is important and should be highlighted in the Discussion section.

      Thank you very much for the suggestion. We admit that the hearing-impaired listeners might adopt different attentional strategies or potentially disengage from the task due to comprehension difficulties. However, we would like to emphasize that our hearing-impaired participants have extended high-frequency (EHF) hearing loss, with impairment only at frequencies above 8 kHz. Their condition is likely not severe enough to cause them to adopt a markedly different attentional strategy for this task. Moreover, it is possible that our normal-hearing listeners may also adopt varying attentional strategies, yet the comparison still revealed notable differences. We have added the caveat in the Discussion section on p.8 of the manuscript.

      (2) In the EEG recording and preprocessing section, you state that the EEG was filtered between 0.1Hz and 45Hz. Why did you choose this very broadband frequency range? In the literature, speech responses are robustly identified between 0.5Hz/1Hz and 8Hz. Would these results emerge using a narrower and lower frequency band? Considering the goal of your study, it might also be interesting to run your analysis pipeline on conventional frequency bands, such as Delta and Theta, since you are looking into the processing of information at different temporal scales.

      Indeed, we have decomposed the epoched EEG time series for each section into six classic frequency-band components (delta 1–3 Hz, theta 4–7 Hz, alpha 8–12 Hz, beta 12–20 Hz, gamma 30–45 Hz) by convolving the data with complex Morlet wavelets as implemented in MNE-Python (version 0.24.0). The number of cycles in the Morlet wavelets was set to frequency/4 for each frequency bin. The power values for each time point and frequency bin were obtained by taking the square root of the resulting time-frequency coefficients. These power values were normalized to reflect relative changes (expressed in dB) with respect to the 500 ms pre-stimulus baseline. This yielded a power value for each time point and frequency bin for each section. We specifically examined the delta and theta bands, and computed the correlation between the regression outcomes for the five linguistic predictors obtained from these bands and those obtained using data from all frequency bands (the R<sup>2</sup> arrays, of shape subjects × sensors × time points, were flattened before computing the correlation). The results showed high correlation coefficients (see the correlation matrix in Supplementary Figures S2 for the attended and unattended speech). Therefore, we opted to use the epoched EEG data from all frequency bands for our analyses. We have added this clarification in the Results section on p.5 and the “EEG recording and preprocessing” section in “Materials and Methods” on p.11 of the manuscript.

      (3) A paragraph with more information on the HM-LSTM would be useful to understand the model used without relying on the Chung et al. (2017) paper. In particular, I think the updating mechanism of the model should be clarified. It would also be interesting to modify the updating factor of the model, along the lines of Schmitt et al. (2021), to assess whether a HM-LSTM with faster or slower updates can better describe the neural activity of hearing-impaired listeners. That is, perhaps the difference between hearing-impaired and normal-hearing participants lies in the temporal dynamics, and not necessarily in a completely different attentional strategy (or disengagement from the stimuli, as I mentioned above).

      Thank you for the suggestion. We have added more details on our HM-LSTM model on p.10 (“Hierarchical multiscale LSTM model” in “Materials and Methods”): Our HM-LSTM model consists of 4 layers. At each layer, the model implements a COPY or UPDATE operation at each time step t. The COPY operation maintains the current cell state without any changes until it receives a summarized input from the lower layer. The UPDATE operation occurs when a linguistic boundary is detected in the layer below, but no boundary was detected at the previous time step t-1. In this case, the cell updates its summary representation, similar to standard RNNs. We agree that exploring modifications to the model’s updating factor would be an interesting direction. However, since we have already observed contrasts between normal-hearing and hearing-impaired listeners using the current model’s update parameters, we believe discussing additional hypotheses would overextend the scope of this paper.
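
      The choice between COPY and UPDATE described above can be illustrated with a toy decision function. This sketch only shows how the two boundary flags select an operation; it does not implement the cell-state arithmetic, and the function and argument names are assumptions. The third branch, FLUSH, is not described in this reply; it is the operation that Chung et al. (2017) define for the case where the layer itself detected a boundary at the previous step.

```python
def select_operation(boundary_below_now: bool, boundary_here_prev: bool) -> str:
    """Pick the operation for one layer at time step t.

    boundary_below_now: boundary detected in the layer below at step t.
    boundary_here_prev: boundary detected in this layer at step t-1.
    """
    if boundary_here_prev:
        # Not covered in the reply; named FLUSH in Chung et al. (2017).
        return "FLUSH"
    if boundary_below_now:
        # New summary arrives from below: update the cell state.
        return "UPDATE"
    # Nothing new from below: carry the cell state forward unchanged.
    return "COPY"


# Toy trace for one layer: boundary flags from the layer below over 5 steps,
# assuming this layer itself never emits a boundary.
below = [False, False, True, False, True]
trace = [select_operation(b, False) for b in below]
```

      In this toy trace the layer copies its state until the layer below signals a boundary, at which point it updates.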

      (4) When explaining how you extracted phoneme information, you mention that "the inputs to the model were the vector representations of the phonemes". It is not clear to me whether you extracted specific phonetic features (e.g., "p" sound vs. "b" sound), or simply the phoneme onsets. Could you clarify this point in the text, please?

      The model inputs were individual phonemes from two sentences, each transformed into a 1024-dimensional vector using a simple lookup table. This lookup table stores embeddings for a fixed dictionary of all unique phonemes in Chinese. This approach is a foundational technique in many advanced NLP models, enabling the representation of discrete input symbols in a continuous vector space. We have added this clarification on p.10 of the manuscript.
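
      As a minimal sketch of such a lookup table, the example below maps a handful of phoneme symbols to 1024-dimensional vectors. The inventory is only a small excerpt taken from the example sentence given earlier in this response, and the embedding values are random placeholders; in a trained model they would be learned parameters. All names are hypothetical.

```python
import numpy as np

EMBED_DIM = 1024
rng = np.random.default_rng(0)

# Small excerpt of a phoneme inventory (illustrative, not the full dictionary).
phoneme_inventory = ["zh", "ə_4", "y", "ie_3", "j", "iəu_4"]
phoneme_to_index = {p: i for i, p in enumerate(phoneme_inventory)}

# One row per phoneme; random here, learned during training in practice.
embedding_table = rng.standard_normal((len(phoneme_inventory), EMBED_DIM))


def embed(phonemes):
    """Look up the embedding vector for each phoneme in the sequence."""
    indices = [phoneme_to_index[p] for p in phonemes]
    return embedding_table[indices]


vectors = embed(["zh", "ə_4", "y", "ie_3"])  # shape: (4, 1024)
```

      Each phoneme is thus represented by its own dense vector rather than by an onset marker.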

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments.

      The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain.

      The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. To address this limitation, the authors should consider evaluating alternative models and methods. For example, directly using spectrograms, discrete phoneme/syllable/word coding as features, and performing feature-based temporal response function (TRF) analysis could serve as valuable baseline models. This approach would provide a more comprehensive evaluation of the neural encoding of linguistic information.

      Our acoustic features are indeed the broadband envelopes and the log-mel spectrograms of the speech streams. The amplitude envelope of the speech signal was extracted using the Hilbert transform. The 129-dimension spectrogram and 1-dimension envelope were concatenated to form a 130-dimension acoustic feature at every 10 ms of the speech stimuli. Given the duration of our EEG recordings, which span over 10 minutes, conducting multivariate TRF (mTRF) analysis with such high-dimensional predictors was not feasible. Instead, we used ridge regression to predict EEG responses at 9 temporal latencies surrounding sentence offsets, ranging from -100 ms to +300 ms in 50 ms steps. To evaluate the model's performance, we extracted the R<sup>2</sup> values at each latency, providing a temporal profile of regression performance over the analyzed time period. This approach is conceptually similar to TRF analysis.

      We agree that including baseline models for the linguistic features is important, and we have now added results from mTRF analysis using phoneme, syllable, word, phrase, and sentence rates as discrete predictors (i.e., marking a value of 1 at each unit boundary offset). Our EEG data spans the entire 10-minute duration for each condition, sampled at 10-ms intervals. The TRF results for our main comparison, attended versus unattended conditions, showed similar patterns to those observed using features from our HM-LSTM model. At the phoneme and syllable levels, normal-hearing listeners showed marginally significantly higher TRF weights for attended speech compared to unattended speech at approximately -80 to 150 ms after phoneme offsets (t=2.75, Cohen’s d=0.87, p=0.057), and 120 to 210 ms after syllable offsets (t=3.96, Cohen’s d=0.73, p=0.083). At the word and phrase levels, normal-hearing listeners exhibited significantly higher TRF weights for attended speech compared to unattended speech at 190 to 290 ms after word offsets (t=4, Cohen’s d=1.13, p=0.049), and around 120 to 290 ms after phrase offsets (t=5.27, Cohen’s d=1.09, p=0.045). For hearing-impaired listeners, marginally significant effects were observed at 190 to 290 ms after word offsets (t=1.54, Cohen’s d=0.6, p=0.059), and 180 to 290 ms after phrase offsets (t=3.63, Cohen’s d=0.89, p=0.09). These results have been added on p.7 of the manuscript, and the corresponding figure is included as Supplementary F2.
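
      For readers unfamiliar with rate models, a discrete predictor of this kind is simply a time series of zeros with a 1 at each unit offset. The sketch below builds one at a 10-ms sampling step; the offset times and the function name are invented for illustration and do not come from our stimuli.

```python
import numpy as np

STEP_MS = 10  # sampling interval of the predictor


def impulse_train(offsets_ms, duration_ms, step_ms=STEP_MS):
    """Return a 0/1 time series with a 1 at each unit offset."""
    n_samples = duration_ms // step_ms
    series = np.zeros(n_samples)
    idx = np.round(np.asarray(offsets_ms) / step_ms).astype(int)
    idx = idx[(idx >= 0) & (idx < n_samples)]  # drop offsets outside the range
    series[idx] = 1.0
    return series


# Hypothetical word offsets (ms) within a 1.5 s stretch of speech.
word_offsets_ms = [320, 610, 1050]
word_rate = impulse_train(word_offsets_ms, duration_ms=1500)
```

      One such series per linguistic level (phoneme, syllable, word, phrase, sentence) would then serve as the predictors in an mTRF analysis.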

      It is not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. Specifically, the results presented in Figure 3C are somewhat confusing. While the phonemes are labeled, the syllables, words, phrases, and sentences are not, making it difficult to interpret how the model distinguishes between these levels of linguistic information. The claim that "Hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels compared to those at the word, phrase and sentence levels" is not convincingly supported by the provided visualizations. To strengthen their argument, the authors should use more quantified metrics to demonstrate that the model indeed captures phrase, word, syllable, and phoneme information at different layers. This is a crucial prerequisite for the subsequent analyses and claims about the hierarchical processing of linguistic information in the brain.

      Quantitative measures such as mutual information, clustering metrics, or decoding accuracy for each linguistic level could provide clearer evidence of the model's effectiveness in this regard.

      In Figure 3C, we used color-coding to represent the activity of five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The results demonstrate that the phoneme layer effectively distinguishes different phonemes, while the higher linguistic layers do not. We believe these findings provide evidence that different layers capture distinct linguistic information. Additionally, we computed the correlation coefficients between each pair of linguistic predictors, as shown in Figure 3B. We think this analysis serves a similar purpose to computing the mutual information between pairs of hidden-layer activities for our constructed sentences. Furthermore, the mTRF results based on rate models of the linguistic features we presented earlier align closely with the regression results using the hidden-layer activity from our HM-LSTM model. This further supports the conclusion that our model successfully captures relevant information across these linguistic levels. We have added the clarification on p.5 of the manuscript.

      The formulation of the regression analysis is somewhat unclear. The choice of sentence offsets as the anchor point for the temporal analysis, and the focus on the [-100ms, +300ms] interval, needs further justification. Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time. Additionally, explaining the rationale behind choosing this specific time window and how it aligns with the temporal dynamics of speech processing would enhance the clarity and validity of the regression analysis.

      Thank you for pointing this out. We chose this time window as lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (e.g., Gwilliams et al., 2022). Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. We have added this clarification on p.12 of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As I mentioned, I think the OSF repo needs to be changed to give anyone access. I would recommend pursuing the lines of thought I mentioned in the public review to make this study complete and to allow it to fit into the already existing literature to facilitate comparisons.

      Yes, the OSF folder is now public. We have made revisions following all reviewers’ suggestions.

      There are some typos in figure labels, e.g. 2B.

      Thank you for pointing it out! We have now revised the typo in Figure 2B.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was able to access all of the audio files and code for the study, but no EEG data was shared in the OSF repository. Unless there is some ethical and/or legal constraint, my understanding of eLife's policy is that the neural data should be made publicly available as well.

      The preprocessed EEG data are available in .npy format in the OSF repository.

      (2) The line-plots in Figures 4B, 5B, and 6B have very similar colours. They would be easier to interpret if you changed the line appearance as well as the colours. E.g., dotted line for hearing-impaired listeners, thick line for normal-hearing.

      Thank you for the suggestion! We have now used thicker lines for normal-hearing listeners in all our line plots.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors may consider presenting raw event-related potentials (ERPs) or spatiotemporal response profiles before delving into the more complex regression encoding analysis. This would provide a clearer foundational understanding of the neural activity patterns. For example, it is not clear if the main claims, such as the neural activity in the normal-hearing group encoding phonetic information in attended speech better than in unattended speech, are directly observable. Showing ERP differences or spatiotemporal response pattern differences could support these claims more straightforwardly. Additionally, training pattern classifiers to test if different levels of information can be decoded from EEG activity in specific groups could provide further validation of the findings.

      We have now included results from more traditional mTRF analyses using phoneme, syllable, word, phrase, and sentence rates as baseline models (see p.7 of the manuscript and Figure S3). The results show similar patterns to those observed in our current analyses. While we agree that classification analyses would be very interesting, our regression analyses have already demonstrated distinct EEG patterns for each linguistic level. Consequently, classification analyses would likely yield similar results unless a different method for representing linguistic information at these levels is employed. To the best of our knowledge, no other computational model currently exists that can simultaneously represent these linguistic levels.

      (2) Is there any behavioral metric suggesting that these hearing-impaired participants do have deficits in comprehending long sentences? The self-rated intelligibility is useful, but cannot fully distinguish between perceiving lower-level phonetic information vs longer sentence comprehension.

      In the current study, we included only self-rated intelligibility tests. We acknowledge that this approach might not fully distinguish between the perception of lower-level phonetic information and higher-level sentence comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. Furthermore, our primary aim was to use the behavioral results to demonstrate that our hearing-impaired listeners experienced speech comprehension difficulties in multi-talker environments, while relying on the EEG data to investigate comprehension challenges at various linguistic levels.

      Minor:

      (1) Page 2, second line in Introduction, "Phonemes occur over ..." should be lowercase.

      According to APA format, the first word after the colon is capitalized if it begins a complete sentence (https://blog.apastyle.org/apastyle/2011/06/capitalization-after-colons.html). Here the sentence is a complete sentence, so we used uppercase for “phonemes”.

      (2) Page 8, second paragraph "...-100ms to 100ms relative to sentence onsets", should it be onsets or offsets?

      This is a typo and it should be offsets. We have now revised it.

      References

      Bemis, D. K., & Pylkkänen, L. (2011). Simple composition: An MEG investigation into the comprehension of minimal linguistic phrases. Journal of Neuroscience, 31(8), 2801–2814.

      Gao, C., Li, J., Chen, J., & Huang, S. (2024). Measuring meaning composition in the human brain with composition scores from large language models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 11295–11308). Association for Computational Linguistics.

      Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D., … Hasson, U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), Article 3.

      Gwilliams, L., King, J.-R., Marantz, A., & Poeppel, D. (2022). Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nature Communications, 13(1), Article 1.

      Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600), 453–458.

      Li, J., Lai, M., & Pylkkänen, L. (2024). Semantic composition in experimental and naturalistic paradigms. Imaging Neuroscience, 2, 1–17.

      Li, J., & Pylkkänen, L. (2021). Disentangling semantic composition and semantic association in the left temporal lobe. Journal of Neuroscience, 41(30), 6526–6538.

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.

      Schmitt, L.-M., Erb, J., Tune, S., Rysop, A. U., Hartwigsen, G., & Obleser, J. (2021). Predicting speech from a cortical hierarchy of event-based time scales. Science Advances, 7(49), eabi6070.

      Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118.

      Sugimoto, Y., Yoshida, R., Jeong, H., Koizumi, M., Brennan, J. R., & Oseki, Y. (2024). Localizing Syntactic Composition with Left-Corner Recurrent Neural Network Grammars. Neurobiology of Language, 5(1), 201–224.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 1 (Public Review):

      I want to reiterate my comment from the first round of reviews: that I am insufficiently familiar with the intricacies of Maxwell’s equations to assess the validity of the assumptions and the equations being used by WETCOW. The work ideally needs assessing by someone more versed in that area, especially given the potential impact of this method if valid.

      We appreciate the reviewer’s candor. Unfortunately, familiarity with Maxwell’s equations is an essential prerequisite for assessing the veracity of our approach and our claims.

      Effort has been made in these revisions to improve explanations of the proposed approach (a lot of new text has been added) and to add new simulations. However, the authors have still not compared their method on real data with existing standard approaches for reconstructing data from sensor to physical space. Refusing to do so because existing approaches are deemed inappropriate (i.e. they “are solving a different problem”) is illogical.

      Without understanding the importance of our model for brain wave activity (cited in the paper) derived from Maxwell’s equations in inhomogeneous and anisotropic brain tissue, it is not possible to critically evaluate the fundamental difference between our method and the standard so-called “source localization” method which the Reviewer feels it is important to compare our results with. Our method is not “source localization” which is a class of techniques based on an inappropriate model for static brain activity (static dipoles sprinkled sparsely in user-defined areas of interest). Just because a method is “standard” does not make it correct. Rather, we are reconstructing a whole brain, time dependent electric field potential based upon a model for brain wave activity derived from first principles. It is comparing two methods that are “solving different problems” that is, by definition, illogical.

      Similarly, refusing to compare their method with existing standard approaches for spatio-temporally describing brain activity, just because existing approaches are deemed inappropriate, is illogical.

      Contrary to the Reviewer’s assertion, we do compare our results with three existing methods for describing spatiotemporal variations of brain activity.

      First, Figures 1, 2, and 6 compare the spatiotemporal variations in brain activity between our method and fMRI, the recognized standard for spatiotemporal localization of brain activity. The statistical comparison in Fig 3 is a quantitative demonstration of the similarity of the activation patterns. It is important to note that these data are simultaneous EEG/fMRI in order to eliminate a variety of potential confounds related to differences in experimental conditions.

      Second, Fig 4 (A-D) compares our method with the most reasonable “standard” spatiotemporal localization method for EEG: mapping of fields in the outer cortical regions of the brain detected at the surface electrodes to the surface of the skull. The consistency of both the location and sign of the activity changes detected by both methods in a “standard” attention paradigm is clearly evident. Further confirmation is provided by comparison of our results with simultaneous EEG/fMRI spatial reconstructions (E-F) where the consistency of our reconstructions between subjects is shown in Fig 5.

      Third, measurements from intra-cranial electrodes, the most direct method for validation, are compared with spatiotemporal estimates derived from surface electrodes and shown to be highly correlated.

      For example, the authors say that “it’s not even clear what one would compare [between the new method and standard approaches]”. How about:

      (1) Qualitatively: compare EEG activation maps. I.e. compare what you would report to a researcher about the brain activity found in a standard experimental task dataset (e.g. their gambling task). People simply want to be able to judge, at least qualitatively on the same data, what the most equivalent output would be from the two approaches. Note, both approaches do not need to be done at the same spatial resolution if there are constraints on this for the comparison to be useful.

      (2) Quantitatively: compare the correlation scores between EEG activation maps and fMRI activation maps

      These comparisons were performed and are already in the paper.

      (1) Fig 4 compares the results with a standard attention paradigm (data and interpretation from Co-author Dr Martinez, who is an expert in both EEG and attention). Additionally, Fig 12 shows detected regions of increased activity in a well-known brain circuit from an experimental task (’reward’) with data provided by Co-author Dr Krigolson, an expert in reward circuitry.

      (2) Correlation scores between EEG and fMRI are shown in Fig 3.

      (3) Very high correlation between the directly measured field from intra-cranial electrodes in an epilepsy patient and those estimated from only the surface electrodes is shown in Fig 9.

      There are an awful lot of typos in the new text in the paper. I would expect a paper to have been proof read before submitting.

      We have cleaned up the typos.

      The abstract claims that there is a “direct comparison with standard state-of-the-art EEG analysis in a well-established attention paradigm”, but no actual comparison appears to have been completed in the paper.

      On the contrary, as mentioned above, Fig 4 compares the results of our method with the state-of-the-art surface spatial mapping analysis, with the state-of-the-art time-frequency analysis, and with the state-of-the-art fMRI analysis.

      Reviewer 2 (Public Review):

      This is a major rewrite of the paper. The authors have improved the discourse vastly.

      There is now a lot of didactics included but they are not always relevant to the paper.

      The technique described in the paper does in fact leverage several novel methods we have developed over the years for analyzing multimodal space-time imaging data. Each of these techniques has been described in detail in separate publications cited in the current paper. However, the Reviewers’ criticisms stated that the methods were non-standard and they were unfamiliar with them. In lieu of the Reviewers reading the original publications, we added a significant amount of text that was indeed intended to be didactic. However, we can assure the Reviewer that nothing presented was irrelevant to the paper. We certainly had no desire to make the paper any longer than it needed to be.

      The section on Maxwell’s equation does a disservice to the literature in prior work in bioelectromagnetism and does not even address the issues raised in classic text books by Plonsey et al. There is no logical “backwardness” in the literature. They are based on the relative values of constants in biological tissues.

      This criticism highlights the crux of our paper. Contrary to the assertion that we have ignored the work of Plonsey, we have referenced it in the new additional text detailing how we have constructed Maxwell’s Equations appropriate for brain tissue, based on the model suggested by Plonsey that allows the temporal variations of the magnetic field to be ignored but not the time dependence of the electric fields.

      However, the assumption ubiquitous in the vast prior literature of bioelectricity in the brain that the electric field dynamics can be “based on the relative values of constants in biological tissues”, as the Reviewer correctly summarizes, is precisely the problem. Using relative average tissue properties does not take into account the tissue anisotropy necessary to properly account for correct expressions for the electric fields. As our prior publications have demonstrated in detail, taking into account the inhomogeneity and anisotropy of brain tissue in the solution to Maxwell’s Equations is necessary for properly characterizing brain electrical fields, and serves as the foundation of our brain wave theory. This led to the discovery of a new class of brain waves (weakly evanescent transverse cortical waves, WETCOW).

      It is this brain wave model that is used to estimate the dynamic electric field potential from the measurements made by the EEG electrode array. The standard model that ignores these tissue details leads to the ubiquitous “quasi-static approximation”, which in turn leads to the conclusion that the EEG signal cannot be spatially reconstructed. It is indeed this critical gap in the existing literature that is the central new idea in the paper.

      There are reinventions of many standard ideas in terms of physics discourses, like Bayesian theory or PCA etc.

      The discussion of Bayesian theory and PCA is in response to the Reviewer’s complaint that they were unfamiliar with our entropy field decomposition (EFD) method and the request that we compare it with other “standard” methods. Again, we have published extensively on this method (as referenced in the manuscript) and therefore felt that extensive elaboration was unnecessary. Having been asked to provide such elaboration and then being pilloried for it therefore feels somewhat inappropriate in our view. This is particularly disappointing as the Reviewer claims we are presenting “standard” ideas when in fact the EFD is a new general framework we developed to overcome the deficiencies in standard “statistical” and probabilistic data analysis methods that are insufficient for characterizing non-linear, nonperiodic, interacting fields that are the rule, rather than the exception, in complex dynamical systems, such as brain electric fields (or weather, or oceans, or ....).

      The EFD is indeed a Bayesian framework, as this is the fundamental starting point for probability theory, but it is developed in a unique and more general fashion than previous data analysis methods. (Again, this is detailed in several references in the paper’s bibliography. The Reviewers requested that an explanation be included in the present paper, however, so we did so.) First, Bayes Theorem is expressed in terms of a field theory that allows an arbitrary number of field orders and coupling terms. This generality comes with a penalty, which is that it’s unclear how to assess the significance of the essentially infinite number of terms. The second feature is the introduction of a method by which to determine the significant number of terms automatically from the data itself, via our theory of entropy spectrum pathways (ESP), which is also detailed in a cited publication, and which produces ranked spatiotemporal modes from the data. Rather than being “reinventions of many standard ideas”, these are novel theoretical and computational methods that are central to the EEG reconstruction method presented in the paper.

      I think that the paper remains quite opaque and many of the original criticisms remain, especially as they relate to multimodal datasets. The overall algorithm still remains poorly described.

      It’s not clear how to assess the criticisms that the algorithm is poorly described yet there is too much detail provided that is mistakenly assessed as “standard”. Certainly the central wave equations that are estimated from the data are precisely described, so it’s not clear exactly what the Reviewer is referring to.

      The comparisons to benchmark remain unaddressed and the authors state that they couldn’t get Loreta to work and so aborted that. The figures are largely unaltered, although they have added a few more, and do not clearly depict the ideas. Again, no benchmark comparisons are provided to evaluate the results and the performance in comparison to other benchmarks.

      As we have tried to emphasize in the paper, and in the Response to Reviewers, the standard so-called “source localization” methods are NOT a benchmark, as they are solving an inappropriate model for brain activity. Once again, static dipole “sources” arbitrarily sprinkled on pre-defined regions of interest bear little resemblance to observed brain waves, nor to the dynamic electric field wave equations produced by our brain wave theory derived from a proper solution to Maxwell’s equations in the anisotropic and inhomogeneous complex morphology of the brain.

      The comparison with Loreta was not abandoned because we couldn’t get it to work, but because we could not get it to run under conditions that were remotely similar to whole brain activity described by our theory, or, more importantly, by any rational theory of dynamic brain activity that might reproduce the exceedingly complex electric field activity observed in numerous neuroscience experiments.

      We take issue with the rather dismissive mention of “a few more” figures that “do not clearly depict the idea” when in fact the figures that have been added have demonstrated additional quantitative validation of the method.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      The paper proposes a new source reconstruction method for electroencephalography (EEG) data and claims that it can provide far superior spatial resolution than existing approaches and also superior spatial resolution to fMRI. This primarily stems from abandoning the established quasi-static approximation to Maxwell’s equations.

      The proposed method brings together some very interesting ideas, and the potential impact is high. However, the work does not provide the evaluations expected when validating a new source reconstruction approach. I cannot judge the success or impact of the approach based on the current set of results. This is very important to rectify, especially given that the work is challenging some long-standing and fundamental assumptions made in the field.

      We appreciate the Reviewer’s efforts in reviewing this paper and have included a significant amount of new text to address their concerns.

      I also find that the clarity of the description of the methods, and how they link to what is shown in the main results hard to follow.

      We have added significantly more detail on the methods, including more accessible explanations of the technical details, and schematic diagrams to visualize the key processing components.

      I am insufficiently familiar with the intricacies of Maxwell’s equations to assess the validity of the assumptions and the equations being used by WETCOW. The work therefore needs assessing by someone more versed in that area. That said, how do we know that the new terms in Maxwell’s equations, i.e. the time-dependent terms that are normally missing from established quasi-static-based approaches, are large enough to need to be considered? Where is the evidence for this?

      The fact that the time-dependent terms are large enough to be considered is essentially the entire focus of the original papers [7,8]. Time-dependent terms in Maxwell’s equations are generally not important for brain electrodynamics at physiological frequencies for homogeneous tissues, but this is not true for areas with strong inhomogeneity and anisotropy.

      I have not come across EFD, and I am not sure many in the EEG field will have. To require the reader to appreciate the contributions of WETCOW only through the lens of the unfamiliar (and far from trivial) approach of EFD is frustrating. In particular, what impact do the assumptions of WETCOW make compared to the assumptions of EFD on the overall performance of SPECTRE?

      We have added an entire new section in the Appendix that provides a very basic introduction to EFD and relates it to more commonly known methods, such as Fourier and Independent Components Analyses.

      The paper needs to provide results showing the improvements obtained when WETCOW or EFD are combined with more established and familiar approaches. For example, EFD can be replaced by a first-order vector autoregressive (VAR) model, i.e. y<sub>t</sub> = Ay<sub>t−1</sub> + e<sub>t</sub> (where y<sub>t</sub> is [num<sub>gridpoints</sub> ∗ 1] and A is [num<sub>gridpoints</sub> ∗ num<sub>gridpoints</sub>] of autoregressive parameters).

      The development of EFD, which is independent of WETCOW, stemmed from the necessity of developing a general method for the probabilistic analysis of finitely sampled non-linear interacting fields, which are ubiquitous in measurements of physical systems, of which functional neuroimaging data (fMRI, EEG) are excellent examples. Standard methods (such as VAR) are inadequate in such cases, as discussed in great detail in our EFD publications (e.g., [12,37]). The new appendix on EFD reviews these arguments. It does not make sense to compare EFD with methods which are inappropriate for the data.
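      For readers unfamiliar with the first-order VAR model that the Reviewer proposes as a stand-in for EFD, the following is a minimal sketch of what fitting y<sub>t</sub> = Ay<sub>t−1</sub> + e<sub>t</sub> by ordinary least squares looks like. It is an illustration only and not part of the SPECTRE pipeline: the function names, the three-point "grid", the noise level, and the matrix `A_true` are all hypothetical, and the data are synthetic.

```python
import numpy as np


def simulate_var1(A, n_steps, noise_std=0.1, seed=0):
    """Simulate y[t] = A @ y[t-1] + e[t] with Gaussian noise (synthetic data)."""
    rng = np.random.default_rng(seed)
    n_points = A.shape[0]
    y = np.zeros((n_steps, n_points))
    for t in range(1, n_steps):
        y[t] = A @ y[t - 1] + noise_std * rng.standard_normal(n_points)
    return y


def fit_var1(y):
    """Least-squares estimate of A from a (time x gridpoints) array.

    Stacking the time steps as rows gives future = past @ A.T,
    so the least-squares solution is transposed to recover A.
    """
    past, future = y[:-1], y[1:]
    coef, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coef.T


# Hypothetical 3-gridpoint example with a stable coupling matrix.
A_true = np.array([
    [0.5, 0.2, 0.0],
    [-0.1, 0.6, 0.1],
    [0.0, 0.3, 0.4],
])
y = simulate_var1(A_true, n_steps=5000)
A_hat = fit_var1(y)
print(np.round(A_hat, 2))
```

      The sketch makes the model's assumptions explicit: the dynamics are linear, the coupling matrix is constant in time, and the residuals are independent noise. These are the assumptions the response above argues are not met by non-linear interacting fields.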

      The authors’ decision not to include any comparisons with established source reconstruction approaches does not make sense to me. They attempt to justify this by saying that the spatial resolution of LORETA would need to be very low compared to the resolution being used in SPECTRE, to avoid compute problems. But how does this stop them from using a spatial resolution typically used by the field that has no compute problems, and comparing with that? This would be very informative. There are also more computationally efficient methods than LORETA that are very popular, such as beamforming or minimum norm.

      The primary reason for not comparing with ’source reconstruction’ (SR) methods is that we are not doing source reconstruction. Our view of brain activity is that it involves continuous dynamical non-linear interacting fields throughout the entire brain. Formulating EEG analysis in terms of reconstructing sources is, in our view, like asking ’what are the point sources of a sea of ocean waves’. It’s just not an appropriate physical model. A pre-chosen limited distribution of static dipoles is just a very bad model for brain activity, so much so that it’s not even clear what one would compare. In our view, as manifest in our computational implementation, one needs to have a very high density of computational locations throughout the entire brain, including white matter, and the reconstructed modes are waves whose extent can be across the entire brain. Our comments about the low resolution of computational methods for SR techniques really express the more overarching concern that they are not capable of, or even designed for, detecting time-dependent fields of non-linear interacting waves that exist everywhere throughout the brain. Moreover, the SR methods always give some answer, but in our view the initial conditions upon which those methods are based (pre-selected regions of activity with a pre-selected number of ’sources’) are a highly influential but artificial set of strong computational constraints that will almost always provide an answer consistent with (i.e., biased toward) the expectations of the person formulating the problem, and are therefore potentially misleading.

      In short, something like the following methods needs to be compared:

      (1) Full SPECTRE (EFD plus WETCOW)

      (2) WETCOW + VAR or standard (“simple regression”) techniques

      (3) Beamformer/min norm plus EFD

      (4) Beamformer/min norm plus VAR or standard (“simple regression”) techniques

      The reason that no one has previously ever been able to solve the EEG inverse problem is due to the ubiquitous use of methods that are too ’simple’, i.e., are poor physical models of brain activity. We have spent a decade carefully elucidating the details of this statement in numerous highly technical and careful publications. It therefore serves no purpose to return to the use of these ’simple’ methods for comparison. We do agree, however, that a clearer overview of the advantages of our methods is warranted and have added significant additional text in this revision towards that purpose.

      This would also allow for more illuminating and quantitative comparisons of the real data. For example, a metric of similarity between EEG maps and fMRI can be computed to compare the performance of these methods. At the moment, the fMRI-EEG analysis amounts to just showing fairly similar maps.

      We disagree with this assessment. The correlation coefficient between the spatially localized activation maps is a conservative sufficient statistic for the measure of statistically significant similarity. These numbers were/are reported in the caption to Figure 5, and have now also been moved to, and highlighted in, the main text.
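      As a concrete illustration of the kind of statistic referred to here, the sketch below computes a Pearson correlation coefficient between two activation maps defined on the same voxel grid. It is a hedged example rather than the analysis used in the paper: the function name `map_correlation`, the optional mask argument, and the synthetic 8 × 8 × 8 maps are assumptions made for illustration.

```python
import numpy as np


def map_correlation(map_a, map_b, mask=None):
    """Pearson correlation between two activation maps on the same grid.

    Voxels outside the optional boolean mask, and voxels that are not
    finite in either map, are excluded before correlating.
    """
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("maps must be defined on the same grid")
    if mask is None:
        mask = np.ones(a.shape, dtype=bool)
    valid = np.asarray(mask, dtype=bool) & np.isfinite(a) & np.isfinite(b)
    return float(np.corrcoef(a[valid], b[valid])[0, 1])


# Synthetic stand-ins for an fMRI map and an EEG-derived map.
rng = np.random.default_rng(1)
fmri_map = rng.standard_normal((8, 8, 8))
eeg_map = 0.8 * fmri_map + 0.2 * rng.standard_normal((8, 8, 8))

r = map_correlation(eeg_map, fmri_map)
print(f"spatial correlation r = {r:.3f}")
```

      In practice the two maps would first need to be resampled to a common space and restricted to the region of interest; both steps are outside the scope of this sketch.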

      There are no results provided on simulated data. Simulations are needed to provide quantitative comparisons of the different methods, to show face validity, and to demonstrate unequivocally the new information that SPECTRE can ’potentially’ provide on real data compared to established methods. The paper ideally needs at least 3 types of simulations, where one thing is changed at a time, e.g.:

      (1) Data simulated using WETCOW plus EFD assumptions

      (2) Data simulated using WETCOW plus e.g. VAR assumptions

      (3) Data simulated using standard lead fields (based on the quasi-static Maxwell solutions) plus e.g. VAR assumptions

      These should be assessed with the multiple methods specified earlier. Crucially the assessment should be quantitative showing the ability to recover the ground truth over multiple realisations of realistic noise. This type of assessment of a new source reconstruction method is the expected standard.

      We have now provided results on simulated data, along with a discussion on what entails a meaningful simulation comparison. In short, our original paper on the WETCOW theory included a significant number of simulations of predicted results on several spatial and temporal scales. The most relevant simulation data to compare with the SPECTRE imaging results are the cortical wave loop predicted by WETCOW theory and demonstrated via numerical simulation in a realistic brain model derived from high resolution anatomical (HRA) MRI data. The most relevant data with which to compare these simulations are the SPECTRE reconstruction from the data that provides the closest approximation to a “Gold Standard” - reconstructions from intra-cranial EEG (iEEG). We have now included results (new Fig 8) that demonstrate the ability of SPECTRE to reconstruct dynamically evolving cortical wave loops in iEEG data acquired in an epilepsy patient that match the loop predicted theoretically by WETCOW and demonstrated in realistic numerical simulations.

      The suggested comparison with simple regression techniques serves no purpose, as stated above, since that class of analysis techniques was not designed for non-linear, non-Gaussian, coupled interacting fields predicted by the WETCOW model. The explication of this statement is provided in great detail in our publications on the EFD approach and in the new appendix material provided in this revision. The suggested simulation of the dipole (i.e., quasi-static) model of brain activity also serves no purpose, as our WETCOW papers have demonstrated in great detail that it is not a reasonable model for dynamic brain activity.

      Reviewer 2 (Public Review):

      Strengths:

      If true and convincing, the proposed theoretical framework and reconstruction algorithm can revolutionize the use of EEG source reconstructions.

      Weaknesses:

      There is very little actual information in the paper about either the forward model or the novel method of reconstruction. Only citations to prior work by the authors are cited with absolutely no benchmark comparisons, making the manuscript difficult to read and interpret in isolation from their prior body of work.

      We have now added a significant amount of material detailing the forward model, our solution to the inverse problem, and the method of reconstruction, in order to remedy this deficit in the previous version of the paper.

      Recommendations for the authors:

      Reviewer 1 (Recommendations):

      It is not at all clear from the main text (section 3.1) and the caption, what is being shown in the activity patterns in Figures 1 and 2. What frequency bands and time points etc? How are the values shown in the figures calculated from the equations in the methods?

      We have added detailed information on the frequency bands reconstructed and the activity pattern generation and meaning. Additional information on the simultaneous EEG/fMRI acquisition details has been added to the Appendix.

      How have the activity maps been thresholded? Where are the color bars in Figures 1 and 2?

      We have now included that information in new versions of the figures. In addition, the quantitative comparison between fMRI and EEG is now presented in a new Figure 2 (now Figure 3).

      P30 “This term is ignored in the current paper”. Why is this term ignored, but other (time-dependent) terms are not?

      These terms are ignored because they represent higher order terms that complicate the processing (and interpretation) but do not substantially change the main results. A note to this effect has been added to the text.

      The concepts and equations in the EFD section are not very accessible (e.g. to someone unfamiliar with IFT).

      We have added a lengthy general and more accessible description of the EFD method in the Appendix.

      Variables in equation 1, and the following equation, are not always defined in a clear, accessible manner. What is ?

      We have added additional information on how Eqn 1 (now Eqn 3) is derived, and the variables therein.

      In the EFD section, what do you mean conceptually by α, i.e. “the coupled parameters α”?

      This sentence has been eliminated, as it was superfluous and confusing.

      How are the EFD and WETCOW sections linked mathematically? What is ψ (in eqn 2) linked to in the WETCOW section (presumably ϕ<sub>ω</sub>?) ?

      We have added more introductory detail at the beginning of the Results to describe the WETCOW theory and how this is related to the inverse problem for EEG.

      What is the difference between data d and signal s in section 6.1.3? How are they related?

      We have added a much more detailed Appendix A where this (and other) details are provided.

      What assumptions have been made to get the form for the information Hamiltonian in eqn3?

      Eq 3 (now Eqn A.5) is actually very general. The approximations come in when constructing the interaction Hamiltonian H<sub>i</sub>.
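      For orientation, the generality referred to here can be seen from the standard information field theory definitions, written below for a signal s and data d. This is a sketch in the conventional notation of that literature, which is an assumption on our part and may differ from the symbols used in Eqn A.5; the split into a free part H<sub>0</sub> and an interaction part H<sub>i</sub> follows the wording of the response above.

```latex
% Bayes' theorem rewritten in terms of an information Hamiltonian
H(d, s) \equiv -\ln P(d, s), \qquad
P(s \mid d) = \frac{P(d, s)}{P(d)} = \frac{e^{-H(d, s)}}{Z(d)}, \qquad
Z(d) = \int \mathcal{D}s \; e^{-H(d, s)}

% Generic decomposition: the free part is exact by definition,
% and modelling choices enter only through the interaction part
H(d, s) = H_0(d, s) + H_i(d, s)
```

      The first line is an identity that holds for any joint distribution, which is why no approximation is needed at that stage.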

      P33 “using coupling between different spatio-temporal points that is available from the data itself” I do not understand what is meant by this.

      This was a poorly worded sentence, but this section has now been replaced by Appendix A, which now contains the sentence that prior information “is contained within the data itself”. This refers to the fact that the prior information consists of correlations in the data, rather than some other measurements independent of the original data. This point is emphasized because in many Bayesian applications, prior information consists of knowledge of some quantities that were acquired independently from the data at hand (e.g., mean values from previous experiments).

      Reviewer 2 (Recommendations):

      Abstract

      The first part presents validation from simultaneous EEG/fMRI data, iEEG data, and comparisons with standard EEG analyses of an attention paradigm. Exactly what constitutes adequate validation or what metrics were used to assess performance is surprisingly absent.

      Subsequently, the manuscript examines a large cohort of subjects performing a gambling task and engaging in reward circuits. The claim is that this method offers an alternative to fMRI.

      Introduction

      Provocative statements require strong backing and evidence. In the first paragraph, the “quasi-static” assumption which is dominant in the field of EEG and MEG imaging is questioned with some classic citations that support this assumption. Instead of delving into why exactly the assumption cannot be relaxed, the authors claim that because the assumption was proved with average tissue properties rather than exact, it is wrong. This does not make sense. Citations to the WETCOW papers are insufficient to question the quasi-static assumption.

      The introduction purports to validate a novel theory and inverse modeling method but poorly outlines the exact foundations of both the theory (WETCOW) and the inverse modeling (SPECTRE) work.

      We have added a new introductory subsection (“A physical theory of brain waves”) to the Results section that provides a brief overview of the foundations of the WETCOW theory and an explicit description of why the quasi-static approximation can be abandoned. We have expanded the subsequent subsection (“Solution to the inverse EEG problem”) to more clearly detail the inverse modeling (SPECTRE) method.

      Section 3.2 Validation with fMRI

      Figure 1 supposedly is a validation of this promising novel theoretical approach that defies the existing body of literature in this field. Shockingly, a single subject data is shown in a qualitative manner with absolutely no quantitative comparison anywhere to be found in the manuscript. While there are similarities, there are also differences in reconstructions. What to make out of these discrepancies? Are there distortions that may occur with SPECTRE reconstructions? What are its tradeoffs? How does it deal with noise in the data?

      It is certainly not the case that there are no quantitative comparisons. Correlation coefficients, which are the sufficient statistics for comparison of activation regions, are given in Figure 5 for very specific activation regions. Figure 9 (now Figure 11) shows a t-statistic demonstrating the very high significance of the comparison between multiple subjects. And we have now added a new Figure 7 demonstrating the strongly correlated estimates for full vs surface intra-cranial EEG reconstructions. To make this more clear, we have added a new section “Statistical Significance of the Results”.
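
      As an illustration of the kind of region-wise correlation mentioned above, the following is a minimal sketch, not the authors' pipeline. It assumes two co-registered maps flattened to lists of voxel values and a matching list of atlas region labels; the function names `region_means`, `pearson_r`, and `regional_correlation` are hypothetical.

```python
import math


def region_means(values, labels):
    """Average voxel values within each labelled region."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def regional_correlation(values_a, values_b, labels):
    """Correlate region-averaged values of two maps (assumed co-registered)."""
    means_a = region_means(values_a, labels)
    means_b = region_means(values_b, labels)
    regions = sorted(means_a)
    return pearson_r([means_a[r] for r in regions],
                     [means_b[r] for r in regions])
```

      Averaging within regions before correlating is one possible design choice; a voxel-wise correlation inside each region would be an equally plausible reading of the text.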

      We note that a discussion of the discrepancies between fMRI and EEG was already presented in the Supplementary Material. Therein we discuss the main point that fMRI and EEG are measuring different physical quantities and so should not be expected to be identical. We also highlight the fact that fMRI is prone to significant geometrical distortions from magnetic field inhomogeneities, and to physiological noise. To provide more visibility for this important issue, we have moved this text into the Discussion section.

      We do note that geometric distortions in fMRI data due to suboptimal acquisitions and corrections are all too common. This, coupled with the paucity of open source simultaneous fMRI-EEG data, made it difficult to find good data for comparison. The data on which we performed the quantitative statistical comparison between fMRI and EEG (Fig 5) were collected by co-author Dr Martinez, and were of the highest quality and therefore sufficient for comparison. The data used in Fig 1 and 2 came from a well publicized open source dataset but had significant fMRI distortions that made quantitative comparison (i.e., correlation coefficients between subregions in the Harvard-Oxford atlas) suboptimal. Nevertheless, we wanted to demonstrate the method on more than one data source, and feel that visual similarity is a reasonable measure for these data.

      Section 3.2 Validation with fMRI

      Figure 2 Are the sample slices being shown? How to address discrepancies? How to assume that these are validations when there are such a level of discrepancies?

      It’s not clear what “sample slices” means. The issue of discrepancies is addressed in the response to the previous query.

      Section 3.2 Validation with fMRI

      Figure 3 Similar arguments can be made for Figure 3. Here too, a comparison with source localization benchmarks is warranted because many papers have examined similar attention data.

      Regarding the fMRI/EEG comparison, these data are compared quantitatively in the text and in Figure 5.

      Regarding the suggestion to perform standard ‘source localization’ analysis, see responses to Reviewer 1.

      Section 3.2 Validation with fMRI

      Figure 4 While there is consistency across 5 subjects, there are also subtle and not-so-subtle differences.

      What to make out of them?

      Discrepancies in activation patterns between individuals are a complex neuroscience question that we feel is well beyond the scope of this paper.

      Section 3.2 Validation with fMRI

      Figures 5 & 6 Figure 5 is also a qualitative figure from two subjects with no appropriate quantification of results across subjects. The same is true for Figure 6.

      On the contrary, Figure 5 contains a quantitative comparison, which is now also described in the text. A quantitative comparison for the epilepsy data in Fig 6 (and C.4-C.6) is now shown in Fig 7.

      Section 3.2 Validation with fMRI

      Given the absence of appropriate “validation” of the proposed model and method, it is unclear how much one can trust results in Section 4.

      We believe that the quantitative comparisons extant in the original text (and apparently missed by the Reviewer) along with the additional quantitative comparisons are sufficient to merit trust in Section 4.

      Section 3.2 Validation with fMRI

      What are the thresholds used in maps for Figure 7? Was correction for multiple comparisons performed? The final arguments at the end of section 4 do not make sense. Is the claim that all results of reconstructions from SPECTRE shown here are significant with no reason for multiple comparison corrections to control for false positives? Why so?

      We agree that the last line in Section 4 is misleading and have removed it.

      Section 3.2 Validation with fMRI

      Discussion is woefully inadequate in addition to the inconclusive findings presented here.

      We have added a significant amount of text to the Discussion to address the points brought up by the Reviewer. And, contrary to the comments of this Reviewer, we believe the statistically significant results presented are not “inconclusive”.

      Supplementary Materials

      This reviewer had an incredibly difficult time understanding the inverse model solution. Even though this has been described in a prior publication by the authors, it is important and imperative that all details be provided here to make the current manuscript complete. The notation itself is so nonstandard. What is Σ<sup>ij</sup>, δ<sup>ij</sup>? Where is the reference for equation (1)? What about the equation for <sup>ˆ</sup>(R)? There are very few details provided on the exact implementation details for the Fourier-space pseudo-spectral approach. What are the dimensions of the problem involved? How were different tissue compartments etc. handled? Equation 1 holds for the entire volume but the measurements are only made on the surface. How was this handled? What is the WETCOW brain wave model? I don’t see any entropy term defined anywhere - where is it?

      We have added more detail on the theoretical and numerical aspects of the inverse problem in two new subsections “Theory” and “Numerical Implementation” in the new section “Solution to the inverse EEG problem”.

      Supplementary Materials

      So, how can one understand even at a high conceptual level what is being done with SPECTRE?

      We have added a new subsection “Summary of SPECTRE” that provides a high conceptual level overview of the SPECTRE method outlined in the preceding sections.

      Supplementary Materials

      In order to understand what was being presented here, it required the reader to go on a tour of the many publications by the authors where the difficulty in understanding what they actually did in terms of inverse modeling remains highly obscure and presents a huge problem for replicability or reproducibility of the current work.

      We have now included more basic material from our previous papers, and simplified the presentation to be more accessible. In particular, we have now moved the key aspects of the theoretic and numerical methods, in a more readable form, from the Supplementary Material to the main text, and added a new Appendix that provides a more intuitive and accessible overview of our estimation procedures.

      Supplementary Materials

      How were conductivity values for different tissue types assigned? Is there an assumption that the conductivity tensor is the same as the diffusion tensor? What does it mean that “in the present study only HRA data were used in the estimation procedure?” Does that mean that diffusion MRI data was not used? What is SYMREG? If this refers to the MRM paper from the authors in 2018, that paper does not include EEG data at all. So, things are unclear here.

      The conductivity tensor is not exactly the same as the diffusion tensor in brain tissues, but they are closely related. While both tensors describe transport properties in brain tissue, they represent different physical processes. The conductivity tensor is often assumed to share the same eigenvectors as the diffusion tensor. There is a strong linear relationship between the conductivity and diffusion tensor eigenvalues, as supported by theoretical models and experimental measurements. For the current study we only used the anatomical data for estimation and assignment of different tissue types and no diffusion MRI data was used. To register between different modalities, including MNI, HRA, functional MRI, etc., and to transform the tissue assignment into an appropriate space we used the SYMREG registration method. A comment to this effect has been added to the text.
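
      The general relationship described above (shared eigenvectors, linearly related eigenvalues) can be sketched as follows. This is purely illustrative and is not a step of the present study, which used no diffusion MRI data. The function name and the `slope` and `intercept` parameters are hypothetical, and no numerical values for them are implied. Mapping every eigenvalue d of a symmetric tensor D to slope·d + intercept while leaving the eigenvectors unchanged is equivalent to forming slope·D + intercept·I, which is what the code does.

```python
def conductivity_from_diffusion(diffusion, slope, intercept=0.0):
    """Return slope * D + intercept * I for a square tensor D (nested lists).

    The result has the same eigenvectors as D, and each eigenvalue d of D
    becomes slope * d + intercept. Both parameters are placeholders.
    """
    n = len(diffusion)
    if any(len(row) != n for row in diffusion):
        raise ValueError("tensor must be square")
    return [
        [slope * diffusion[i][j] + (intercept if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
```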

      Supplementary Materials

      How can reconstructed volumetric time-series of potential be thought of as the EM equivalent of an fMRI dataset? This sentence doesn’t make sense.

      This sentence indeed did not make sense and has been removed.

      Supplementary Materials

      Typical Bayesian inference does not include entropy terms, and entropy estimation doesn’t always lend to computing full posterior distributions. What is an “entropy spectrum pathway”? What is µ∗? Why can’t things be made clear to the reader, instead of incredible jargon used here? How does section 6.1.2 relate back to the previous section?

      It is correct that Bayesian inference typically does not include entropy terms. We believe that their introduction via the theory of entropy spectrum pathways (ESP) is a significant advance in Bayesian estimation, as it provides highly relevant prior information from within the data itself (and therefore always available in spatiotemporal data) that facilitates a practical methodology for the analysis of complex non-linear dynamical systems, as contained in the entropy field decomposition (EFD).

      Section 6.1.3 has now been replaced by a new Appendix A that discusses ESP in a much more intuitive and conceptual manner.

      Supplementary Materials

      Section 6.1.3 describes entropy field decomposition in very general terms. What is “non-period”? This section is incomprehensible. Without reference to exactly where in the process this procedure is deployed it is extremely difficult to follow. There seems to be an abuse of notation of using ϕ for eigenvectors in equation (5) and potentials earlier. How do equations 9-11 relate back to the original problem being solved in section 6.1.1? What are multiple modalities being described here that require JESTER?

      Section 6.1.3 has now been replaced by a new Appendix A that covers this material in a much more intuitive and conceptual manner.

      Supplementary Materials

      Section 6.3 discusses source localization methods. While most forward lead-field models assume quasistatic approximations to Maxwell’s equations, these are perfectly valid for the frequency content of brain activity being measured with EEG or MEG. Even with quasi-static lead fields, the solutions can have frequency dependence due to the data having frequency dependence. Solutions do not have to be insensitive to detailed spatially variable electrical properties of the tissues. For instance, if a FEM model was used to compute the forward model, this model will indeed be sensitive to the spatially variable and anisotropic electrical properties. This issue is not even acknowledged.

      The frequency dependence of the tissue properties is not the issue. Our theoretical work demonstrates that taking into account the anisotropy and inhomogeneity of the tissue is necessary in order to derive the existence of the weakly evanescent transverse cortical waves (WETCOW) that SPECTRE is detecting. We have added more details about the WETCOW model in the new Section “A physical theory of brain waves” to emphasize this point.

      Supplementary Materials

      Arguments to disambiguate deep vs shallow sources can be achieved with some but not all source localization algorithms and do not require a non-quasi-static formulation. LORETA is not even the main standard algorithm for comparison. It is disappointing that there are no comparisons to source localization and that this is dismissed away due to some coding issues.

      Again, we are not doing ‘source localization’. The concept of localized dipole sources is anathema to our brain wave model, and so in our view comparing SPECTRE to such methods only propagates the misleading idea that they are doing the same thing. So they are definitely not dismissed due to coding issues. However, because of repeated requests to compare SPECTRE with such methods, we attempted to run a standard source localization method with parameters that would at least provide the closest approximation to what we were doing. This attempt highlighted a serious computational issue in source localization methods that is a direct consequence of the fact that they are not attempting to do what SPECTRE is doing: describing a time-varying wave field, in the technical definition of a ‘field’ as an object that has a value at every point in space-time.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The study presents some useful findings on Mendelian randomization-phenome-wide association, with BMI associated with health outcomes, and there is a focus on sex differences. Although there are some solid phenotype and genotype data, some of the data are incomplete and could be better presented, perhaps benefiting from more rigorous approaches. Confirmation and further assessment of the observed sex differences will add further value.

      Thank you for your positive comments. We have revised the analysis based on your feedback and that from the two reviewers. Specifically, we implemented a stricter multiple testing correction approach, improved the figures, included additional figures in the Supplementary Materials, considered the sex differences more rigorously and reported them in more detail. A comprehensive description of the revisions is provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uses information from the UK Biobank and aims to investigate the role of BMI on various health outcomes, with a focus on differences by sex. They confirm the relevance of many of the well-known associations between BMI and health outcomes for males and females and suggest that associations for some endpoints may differ by sex. Overall their conclusions appear supported by the data. The significance of the observed sex variations will require confirmation and further assessment.

      Strengths:

      This is one of the first systematic evaluations of sex differences between BMI and health outcomes. The hypothesis that BMI may be associated with health differentially based on sex is relevant and even expected. As muscle is heavier than adipose tissue, and as men typically have more muscle than women, as a body composition measure BMI is sometimes prone to classifying even normal weight/muscular men as obese, while this measure is more lenient when used in women. Confirmation of the many well-known associations is as expected and attests to the validity of their approach. Demonstration of the possible sex differences is interesting, with this work raising the need for further study.

      Thank you for your valuable comments. We are grateful for the time and effort you have devoted to reviewing our manuscript. We have strengthened our paper by adding your insightful comment about the rationale for sex-specific analysis to the introduction:

      Weaknesses:

      (1) Many of the statistical decisions appeared to target power at the expense of quality/accuracy. For example, they chose to use self-reported information rather than doctor diagnoses for disease outcomes for which both types of data were available.

      Thank you for your valuable comments. We apologize for the lack of clarity in our original description of the phenotypes. Information about health in the UK Biobank was obtained at baseline from tests, measurements and self-reports. Subsequently, comprehensive data linkage to hospital admissions, death registries and cancer registries was implemented. However, data linkage to primary care data, such as doctor diagnoses, has not been comprehensively implemented for the UK Biobank, possibly for logistic reasons. Doctor diagnoses are only available for about half the cohort (https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/health-related-outcomes-data). So, we used self-reported diagnoses because they are substantially more comprehensive than the doctor diagnoses. We have explained this point by making the following change to the Methods:

      “Where attributes were available from both self-report and doctor diagnosis, we used self-reports. This is because comprehensive record linkage to doctor diagnoses has not yet been fully implemented for the UK Biobank, so information from doctor diagnoses may not fully represent the broader UK Biobank cohort.”

      (2) Despite known problems and bias arising from the use of one sample approach, they chose to use instruments from the UK Biobank instead of those available from the independent GIANT GWAS, despite the difference in sample size being only marginally greater for UKB for the context. With the way the data is presented, it is difficult to assess the extent to which results are compatible across approaches.

      Thank you for your comments. We agree completely about the issues with a one sample approach; please accept our apologies for not explaining our rationale. The sex-specific GIANT GWAS study is similar in size to the UK Biobank GWAS. However, the sex-specific GIANT GWAS is much less densely genotyped (~2.5 million variants) than the sex-specific UK Biobank GWAS (~10 million variants), so has less power, hence our use of the UK Biobank. To make this clear, we have added the number of variants in each study to the method section. Nevertheless, we also repeated the analysis using sex-specific GIANT, as now given in the methods by making the following change

      We amended the description in the first paragraph of the results section:

      “Initial analysis using sex-specific BMI from GIANT yielded similar estimates as when using sex-specific BMI from the UK Biobank but had fewer SNPs resulting in wider confidence intervals (S Table 1) and fewer significant associations (S Table 1). Analysis using sex-combined GIANT yielded more significant associations but lacks granularity, so we presented the results obtained using sex-specific BMI from the UK Biobank.”

      In the discussion we also made the following changes:

      “Tenth, although this study primarily utilized sex-specific BMI, we also conducted analyses using overall BMI from GIANT including the UK Biobank, which gave a generally similar interpretation (S Table 1). Using sex-specific BMI from the UK Biobank and GIANT may lead to lower statistical power than using overall population BMI but allows for the detection of traits that are affected differently by BMI by sex. Including findings from the overall population BMI from sex-combined GIANT (S Table 1) makes the results more comparable to previous similar studies.”

      (3) The approach to multiple testing correction appears very lenient, although the lack of accuracy in the reporting makes it difficult to know what was done exactly. The way it reads, FDR correction was done separately for men, and then for women (assuming that the duplication in tests following stratification does not affect the number of tests). In the second stage, they compared differences by sex using Z-test, apparently without accounting for multiple testing.

      Thank you, we have accounted for multiple comparisons when considering differences by sex and have made corresponding changes. Specifically, in the methods, we changed:

      “We obtained differences by sex using a z-test (Paternoster et al., 1998), which as recommended was on a linear scale for dichotomous outcomes (Knol et al., 2007; Rothman, 2008), then we identified which ones remained after allowing for false discovery”
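
      The two steps described in this passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the sex-specific estimates are independent and already on the intended (linear) scale, and it uses the Benjamini-Hochberg procedure as one common way of "allowing for false discovery", since the text does not name the procedure used. The function names are hypothetical.

```python
import math


def sex_difference_z(beta_men, se_men, beta_women, se_women):
    """Two-sided z-test for the difference between two independent estimates."""
    z = (beta_men - beta_women) / math.sqrt(se_men ** 2 + se_women ** 2)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which tests survive Benjamini-Hochberg FDR control at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            keep[idx] = True
    return keep
```

      In use, one would compute a p-value per phenotype with `sex_difference_z` and pass the full list to `benjamini_hochberg`, so that the correction is applied across all sex-difference tests at once.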

      We have made the following changes to the results section:

      “We found significant differences by sex in the associations of BMI with 105 health-related attributes (p-value<0.05); 46 phenotypes remained after allowing for false discovery (Table 1). Of these 46 differences most (35) were in magnitude but not direction, such as for SHBG, ischemic heart disease, heart attack, and facial aging, while 11 were directionally different.

      Notably, BMI was more strongly positively associated with myocardial infarction, major coronary heart disease events, ischemic heart disease, heart attack, and facial aging in men than in women. BMI was more strongly positively associated with diastolic blood pressure, and hypothyroidism/myxoedema in women than men. BMI was more strongly inversely associated with LDL-c, hay fever and allergic rhinitis in men than women. BMI was more strongly inversely associated with SHBG in women than men.

      BMI was inversely associated with ApoB, iron deficiency anemia, hernia, and total testosterone in men, while positively associated with these traits in women (Table 1). BMI was inversely associated with sensitivity/hurt feelings, and ever seeking medical advice for nerves, anxiety, tension, or depression in men. However, BMI was positively associated with sensitivity/hurt feelings and ever seeking medical advice for these same issues in women. BMI was positively associated with muscle or soft tissue injuries and haemorrhage from respiratory passages in men, whilst inversely associated with these traits in women.”

      We have correspondingly amended the discussion to reflect these changes by adding:

      “Whether the difference in ischemic heart disease rates between men and women that emerged in the US and the UK in the late 19th century (Nikiforov & Mamaev, 1998) is explained by rising BMI remains to be determined.”

      (4) Presentation lacks accuracy in a few places, hence assessment of the accuracy of the statements made by the authors is difficult.

      Thank you, we have revised the whole manuscript in order to improve clarity.

      (5) Conclusion (Abstract) "These findings highlight the importance of retaining a healthy BMI" is rather uninformative, especially as they claim that for some attributes the effects of BMI may be opposite depending on sex/gender.

      Thank you for your comments. We have changed the conclusion of the abstract, as given below:

      “Our study revealed that BMI might affect a wide range of health-related attributes and also highlights notable sex differences in its impact, including opposite associations for certain attributes, such as ApoB; and stronger effects in men, such as for cardiovascular diseases. Our findings underscore the need for nuanced, sex-specific policy related to BMI to address inequities in health.”

      We have changed the Impact statement, as given below:

      “BMI may affect a wide range of health-related attributes and there are notable sex differences in its impact, including opposite associations for certain attributes, such as ApoB; and stronger effects in men, such as for cardiovascular diseases. Our findings underscore the need for nuanced, sex-specific policy related to BMI.”

      We have changed the conclusion of the paper, as given below:

      “Our contemporary systematic examination found BMI associated with a broad range of health-related attributes. We also found significant sex differences in many traits, such as for cardiovascular diseases, underscoring the importance of addressing higher BMI in both men and women, possibly as a means of redressing differences in life expectancy. Ultimately, our study emphasizes the harmful effects of obesity and the importance of nuanced, sex-specific policy related to BMI to address inequities in health.”

      Reviewer #2 (Public review):

      Summary:

      In this present Mendelian randomization-phenome-wide association study, the authors found BMI to be positively associated with many health-related conditions, such as heart disease, heart failure, and hypertensive heart disease. They also found sex differences in some traits such as cancer, psychological disorders, and ApoB.

      Strengths:

      The use of the UK-biobank study with detailed phenotype and genotype information.

      Thank you for your valuable comments. We are grateful for the time and effort you have devoted to reviewing our manuscript.

      Weaknesses:

      (1) Previous studies have performed this analysis using the same cohort, with in-depth analysis. See this paper: Searching for the causal effects of body mass index in over 300,000 participants in UK Biobank, using Mendelian randomization. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007951

      Thank you for your valuable comments. We checked the paper carefully. It gives sex-specific estimates when the outcome was assessed in different ways in men and women, for example the question about number of children was asked in terms of live births in women and number of children fathered in men. In addition, for some significant findings, the authors investigated differences by sex. However, the paper did not use sex-specific BMI or sex-specific outcomes systematically. We have added this paper to our introduction and amended the text to explain the novelty of our study compared to previous studies.

      “Previous phenome-wide association studies using MR (MR-PheWASs) have identified impacts of sex-combined BMI on endocrine disorders, circulatory diseases, inflammatory and dermatological conditions, some biomarkers and feelings of nervousness (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019), but did not systematically use sex-specific BMI for the exposure or sex-specific outcomes.”

      (2) I believe that the authors' claim, "To our knowledge, no sex-specific PheWAS has investigated the effects of BMI on health outcomes," is not well supported. They have not cited a relevant paper that conducted both overall and sex-stratified PheWAS using UK Biobank data with a detailed analysis. Given the prior study linked above, I am uncertain about the additional contributions of the present research.

      Thank you for your valuable comments; please accept our apologies for this oversight. As explained above, we have checked very carefully. There are three previous PheWAS for BMI: Hyppönen et al., 2019, Millard et al., 2015 and Millard et al. 2019. Hyppönen et al., 2019 and Millard et al., 2015 are not sex-specific. Millard et al. 2019 used sex-combined instruments, but some sex-specific outcomes, when the questions were asked sex-specifically, such as age at puberty asked as “age when periods started (menarche)” in women and “relative age of first facial hair” and “relative age voice broke” in men. When they found a factor significantly associated with BMI, they sometimes analyzed it further, including sex-specific analysis, but they did not do the analysis systematically for men and women with sex-specific BMI and sex-specific outcomes. We have amended the introduction to clarify this point.

      “To our knowledge, no sex-specific PheWAS has investigated the effects of BMI on health outcomes (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019). To address this gap, we conducted a sex-specific PheWAS, using the largest available sex-specific GWAS of BMI, to explore the impact of sex-specific BMI on sex-specific health-related attributes”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Presentation, accuracy, and referencing:

      (1) The quality of the English language needs to be checked, including that all sentences carry all components required (including verbs).

      We thank the reviewer for this suggestion. The manuscript has undergone language editing by a native English-speaker, with particular attention to grammatical completeness (including verb consistency and sentence structure). We have also clarified ambiguities and inconsistencies in terms pointed out by the native English speakers. All revisions have been implemented in the updated manuscript.

      (2) The accuracy of statements needs to be checked. For example, in lines 82-83 it is not true that 2015/2019 was "before the advent of large-scale GWAS studies". In the context of the above in lines 83-85, how can reference be made to a study published in 2020 calling that "previous" MR studies, and how is a trial published in 2016 "recent"? Please revise, and please also check the manuscript for any other issues with accuracy of this kind.

      We thank the reviewer for this suggestion. We have checked the manuscript and revised these sentences to be clearer, by making the following change.

      “Previous phenome-wide association studies using MR (MR-PheWASs) have identified impacts of sex-combined BMI on endocrine disorders, circulatory diseases, inflammatory and dermatological conditions, some biomarkers and feelings of nervousness (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019), but did not systematically use sex-specific BMI for the exposure or sex-specific outcomes. Previous MR studies and trials of incretins have expanded our knowledge about a broad range of effects of BMI (Larsson et al., 2020; Marso et al., 2016).”

      (3) The adequacy of referencing will need to be checked, e.g. line 136 "as recommended by UK biobank" is vague and needs to be referenced.

      We thank the reviewer for this suggestion. We have added citations.

      “We categorized attributes as age at recruitment, physical measures, lifestyle and environmental, medical conditions, operations, physiological factors, cognitive function, health and medical history, sex-specific factors, blood assays and urine assays, based on the UK Biobank categories (https://biobank.ndph.ox.ac.uk/ukb/cats.cgi).”

      (4) The accurate use of terminology needs to be checked. For example, BMI is a measure of adiposity, while high BMI (typically >30) is used to index obesity.

      We thank you for your comments. We have changed the descriptions into “overweight/obesity” throughout.

      (5) Figure 1, Please check that complete information is given for 'selection criteria' and that the rationale for all information included is clear. For example, it is currently unclear what is the distinction between the bottom two sections which both present a number of features included in the analyses? Also, the Box detailing exclusion of 3585 variables does not give the criteria for these exclusions. Please add.

      Thank you for your comments. We have redrawn and revised Figure 1. Specifically, we have revised the bottom two sections to give each reason for exclusion and the number excluded for that reason. The updated “Excluded: 3,572 phenotypes, for the reason listed below:” box now contains bullet points giving each reason for exclusion (e.g. age of certain diseases/disorders onset: 26, alcohol: 56).

      (6) Figure 4, does not look to be of typical publication quality.

      We thank you for your comments. We have used different colors to make it smaller and more readable. Please see Table 1.

      Analyses:

      (1) As it stands, it is very difficult for a reader to confirm the conclusion that similar findings are obtained both when using instruments from the UKB and GIANT based on the data presented (S Table 1 and 2). I suggest two things.

      a) Organise S Table 1 and 2 by significance and category, with separation by highlighting for those which are significant under correction. I would consider merging these two tables, such that it would be easy for the reader to make the comparisons side by side. Consider presenting separate tables for the analyses for women and men.

      We thank you for your comments. We have followed your helpful advice and merged S Table 1 and S Table 2 into S Table 1. Furthermore, we have also merged S Table 5 to S Table 1.

      b) In S Table 3, please add information from related comparisons using the GIANT instruments. To support the authors' claim that associations are similar, but only the precision of estimation differed, you could consider adding information for numbers of associations for those that are directionally consistent and which have an association at least under nominal significance. For associations where this does not hold, I would refrain from making a claim that the results are not affected by the choice of instrument (or biases relating to the analysis conducted).

      We thank you for your comments. Among 42 significant sex-specific associations identified in both the UK Biobank and the sex-specific GIANT consortium for men, all showed consistent directions of effect. Similarly, for women, all of the 45 significant associations exhibited consistent directions for UK Biobank compared with GIANT instruments.

      In the sex-specific UK Biobank, there are 203 significant associations in men, and 232 significant associations in women. We have added: in the sex-specific GIANT, there are 46 significant associations in men, and 84 significant associations in women. In the sex-combined GIANT, there are 246 significant associations in men, and 276 significant associations in women. We have provided all this information in S Table 2.

      We added the following descriptions at the end of the results section:

      “Of the 42 significant sex-specific associations identified in both the UK Biobank and the sex-specific GIANT consortium for men, all were directionally consistent. Similarly, for women, all 45 such significant associations were directionally consistent.”

      We amended the following descriptions in the first paragraph of the results section:

      “Initial analysis using sex-specific BMI from the GIANT yielded similar estimates as when using sex-specific BMI from the UK Biobank but had fewer SNPs resulting in wider confidence intervals (S Table 1) and fewer significant associations (S Table 2). Analysis using sex-combined GIANT yielded more significant associations but lacks granularity, so we presented the results obtained using sex-specific BMI from the UK Biobank.”

      In the methods, we changed:

      “We obtained differences by sex using a z-test (Paternoster et al., 1998), which as recommended was on a linear scale for dichotomous outcomes (Knol et al., 2007; Rothman, 2008), then we identified which ones remained after allowing for false discovery.”
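
      As an illustration of the two steps named in this sentence, the following is a minimal sketch, not the authors' code. It assumes the sex-specific estimates and standard errors are already on a common (linear) scale, and it assumes the Benjamini-Hochberg procedure for the false-discovery step, which the response does not specify. The function names and the example numbers are hypothetical.

```python
import math


def z_test_difference(beta_men, se_men, beta_women, se_women):
    """Two-sided z-test for the equality of two independent estimates."""
    z = (beta_men - beta_women) / math.sqrt(se_men ** 2 + se_women ** 2)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True if it survives FDR control (assumed BH)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_rank = rank
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            keep[idx] = True
    return keep


if __name__ == "__main__":
    # Hypothetical estimates for one phenotype in men and in women.
    z, p = z_test_difference(0.30, 0.05, 0.10, 0.05)
    print(f"z = {z:.3f}, p = {p:.4f}")
    # Hypothetical p-values for sex differences across four phenotypes.
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

      The z statistic divides the difference between the two estimates by the square root of the sum of their squared standard errors, which treats the estimates in men and women as independent.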

      We have made the following changes to the results section:

      “We found significant differences by sex in the associations of BMI with 105 health-related attributes (p-value<0.05); 46 phenotypes remained after allowing for false discovery (Table 1). Of these 46 differences most (35) were in magnitude but not direction, such as for SHBG, ischemic heart disease, heart attack, and facial aging, while 11 were directionally different.

      Notably, BMI was more strongly positively associated with myocardial infarction, major coronary heart disease events, ischemic heart disease, heart attack, and facial aging in men than in women. BMI was more strongly positively associated with diastolic blood pressure, and hypothyroidism/myxoedema in women than men. BMI was more strongly inversely associated with LDL-c, hay fever and allergic rhinitis in men than women. BMI was more strongly inversely associated with SHBG in women than men.

      BMI was inversely associated with ApoB, iron deficiency anemia, hernia, and total testosterone in men, while positively associated with these traits in women (Table 1). BMI was inversely associated with sensitivity/hurt feelings, and ever seeking medical advice for nerves, anxiety, tension, or depression in men. However, BMI was positively associated with sensitivity/hurt feelings and ever seeking medical advice for these same issues in women. BMI was positively associated with muscle or soft tissue injuries and haemorrhage from respiratory passages in men, whilst inversely associated with these traits in women.”

      (2) It is not clear what statistical criteria were used to determine sex differences, and the strategy/presentation should be clarified. In lines 229-231, it is implied that the 'significance' in one gender, but not in the other is used to indicate a difference. However, 'comparison of p-values' is not a valid statistical approach, and a more formal test (accounting for multiple testing would be warranted). It may be that a systematic approach has been implemented, but please check that it is adequately and accurately described to the reader.

      Please accept our apologies for being unclear. Corrections for multiple comparisons assume independent phenotypes; here, however, some phenotypes are not independent, so applying such a correction separately in men and women is quite strict. We have added a multiple-comparison correction to the assessment of sex differences, which is now given in Table 1. Initially, there were 105 significant associations (p-value for sex difference < 0.05), of which 46 remained after FDR correction (Table 1).

      Furthermore, we have made additional minor changes to clarify the wording.

      Knol, M. J., van der Tweel, I., Grobbee, D. E., Numans, M. F., & Geerlings, M. I. (2007). Estimating interaction on an additive scale between continuous determinants in a logistic regression model. Int J Epidemiol, 36(5), 1111-1118.

      Nikiforov, S. V., & Mamaev, V. B. (1998). The development of sex differences in cardiovascular disease mortality: a historical perspective. Am J Public Health, 88(9), 1348-1353. https://doi.org/10.2105/ajph.88.9.1348

      Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998). Using the correct statistical test for the equality of regression coefficients. Criminology, 36(4), 859-866.

      Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:

      The study identifies two types of activation: one that is cue-triggered and nonspecific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.

      Evaluations:

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures and analyses are solid. The findings are interesting and novel.

      In the original submission, it was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be a little better clarified. However, this concern has been satisfactorily addressed in the revision.

      We thank the reviewer for his/her positive evaluation and thoughtful comments. 

      Reviewer #2 (Public review):

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, dots moving in one direction, followed by another direction, etc.), showing either the starting movement direction or the ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity actually does not carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly compressed time scale, which is elicited during the period of elevated activity, and activated by the first and last elements of the sequence, but not others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies.

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. The statistical methodology seems valid, but due to its complexity it is not easy to understand. The methods especially those described in figures 3 and 4 should be explained better.  

      We thank the reviewer for the detailed evaluation. As suggested, we have further revised the Methods and Results sections, particularly the descriptions related to Figures 3 and 4, to enhance clarity. Please see the revisions highlighted in red in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The most important results here are in Figure 4, and they rely on methods explained in Figure 3. Figure 4 and the results in the figure are confusing.

      What is the red bar in 4B,E? What are the units of the Y axis in figure 4B,E?

      Does sequenceness have units? How do we interpret these magnitudes apart from the line of statistical significance? Shouldn't there be two lines, one for forward replay and the other for backward replay, rather than a single line with positive and negative values? The term sequenceness is defined in figure 3, and is key. The replayed sequence in figure 4A,D seems to last about 120 ms.

      What is the meaning of having significance only within a window of 28-36 ms?

      We thank the reviewer for the careful reading and insightful comments. We apologize for the lack of clarity regarding these details in the previous version. As mentioned above, we have revised the Methods and Results sections to enhance clarity throughout the manuscript. For convenience, we provide detailed explanations addressing the specific points raised by the reviewer below.

      First, the red bars in Figures 4B and 4E indicate the lags when the evidence of sequenceness surpassed the statistical significance threshold, as determined by permutation testing. We have now explicitly clarified this in the revised figure captions.

      Second, sequenceness doesn’t have units. It corresponds to the regression coefficient (β) obtained from the second-level GLM in the TDLM framework. Specifically, in the first step of TDLM, we constructed an empirical transition matrix that quantifies the evidence for all possible transitions (e.g., 0° → 90°) at each time lag (Δt). In the second step, we evaluated the extent to which each model transition matrix (e.g., forward or backward transitions) predicts the empirical transition matrix at each Δt, yielding second-level β values. Sequenceness is defined as the difference between the β values for the forward and backward transition models, reflecting the relative strength and directionality of sequential replay. As it is derived from regression coefficients, sequenceness is inherently a unitless measure.

      Regarding the interpretation of sequenceness magnitudes beyond statistical significance, the β values reflect the extent to which the model transition matrix explains variance in the empirical transition matrix. While larger β values suggest stronger sequenceness, absolute magnitudes are influenced by various factors, such as between-participant noise. Therefore, the key criterion for interpreting these values is whether they surpass permutation-based significance thresholds, which indicate that the observed sequenceness is unlikely to have occurred by chance.

      Third, as the reviewer correctly pointed out, we initially computed two separate regression lines, one for forward replay and the other for backward replay. We then defined sequenceness as the contrast between the forward and backward replay (forward minus backward). This contrast approach is commonly used in previous studies to remove between-participant variance in the sequential replay per se, which may arise due to variability in task engagement or measurement sensitivity (Liu et al., 2021; Nour et al., 2021).

      Finally, regarding the duration of replay events, the example sequences shown in Figures 4A and 4D indeed span about 120 ms in total. However, the time lag (Δt) between successive reactivation peaks within these sequences is about 30 ms. This is in line with the findings shown in Figures 4B and 4E, where statistical significance is observed at a time lag window of 28 – 36 ms on the x-axis. It is important to note that the x-axis in these plots represents the time lag (Δt) between sequential reactivations, rather than absolute time.
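
      The two-step computation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline. It assumes NumPy, a `probs` array of decoded state time courses with shape (time, states), and a 0/1 `forward_model` matrix whose entry [i, j] marks the transition from state i to state j. The self-transition and constant regressors at the second level are assumptions of this sketch, and all names are hypothetical.

```python
import numpy as np


def sequenceness(probs, forward_model, lag):
    """Forward-minus-backward second-level beta at a single time lag (in samples)."""
    probs = np.asarray(probs, dtype=float)
    forward_model = np.asarray(forward_model, dtype=float)
    n_states = probs.shape[1]

    # First level: regress state evidence at t + lag on state evidence at t.
    # empirical[i, j] quantifies the evidence for the transition i -> j.
    x = probs[:-lag]
    y = probs[lag:]
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    empirical = coef[:n_states]

    # Second level: how well do the forward and backward model matrices
    # predict the empirical transition matrix?
    backward_model = forward_model.T
    regressors = np.column_stack([
        forward_model.ravel(),
        backward_model.ravel(),
        np.eye(n_states).ravel(),      # self-transitions (assumed nuisance term)
        np.ones(n_states * n_states),  # constant (assumed nuisance term)
    ])
    betas, *_ = np.linalg.lstsq(regressors, empirical.ravel(), rcond=None)

    # Sequenceness: forward beta minus backward beta, a unitless contrast.
    return betas[0] - betas[1]
```

      Calling this function once per lag gives one unitless value per Δt, positive when forward transitions dominate and negative when backward transitions dominate, which is why a single line with both signs can summarise both replay directions.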

      We hope these clarifications address the reviewer’s concerns, and we have revised the manuscript accordingly to make these points clearer to readers.

      The methods here are not simple and not simple to explain. The new version is easier to understand. From the new version it seems that the methodology is sound. It should be still clarified and better explained.

      We have carefully revised the manuscript to better explain the methodology. We appreciate the reviewer’s feedback, which is valuable in improving the clarity of our work.

      Now that I understand what they mean by decoding probability, I think that this term is confusing or even misleading. The decoding accuracy is the probability that the direction of motion classification was correct. It seems the so-called decoding probability is value of the logistic regression after normalizing the sum to 1. If this is a standard term it can probably be kept, if not another term would be better.

      We thank the reviewer for this comment. We agree that the term decoding probability may initially seem confusing. However, decoding probability is a commonly used term in the neural decoding literature, particularly in human studies (e.g., Liu et al., 2019; Nour et al., 2021; Turner et al., 2023). To maintain consistency with previous work, we have kept this term in the manuscript. We appreciate the opportunity to clarify this point.
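
      To make the distinction concrete, the following minimal sketch follows the reviewer's description of decoding probability: the logistic output of each class decoder, rescaled so the values sum to 1 at a given time point. The per-class `scores` (log-odds from one-vs-rest decoders) and the function name are assumptions for illustration; the authors' exact classifier is not specified in this response.

```python
import math


def decoding_probabilities(scores):
    """Map per-class decoder outputs (log-odds) to values that sum to 1."""
    # Logistic (sigmoid) output of each class decoder.
    sigmoid = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoid)
    # Normalise across classes so the values form a distribution.
    return [value / total for value in sigmoid]


if __name__ == "__main__":
    # Hypothetical decoder outputs for four motion directions at one time point.
    print(decoding_probabilities([2.0, -1.0, -1.0, -1.0]))
```

      Decoding accuracy, by contrast, is computed across trials as the fraction in which the most probable class matches the true label.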

      References

      Liu, Y., Dolan, R. J., Higgins, C., Penagos, H., Woolrich, M. W., Ólafsdóttir, H. F., Barry, C., Kurth-Nelson, Z., & Behrens, T. E. (2021). Temporally delayed linear modelling (TDLM) measures replay in both animals and humans. eLife, 10, e66917. https://doi.org/10.7554/eLife.66917

      Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14. https://doi.org/10.1016/j.cell.2019.06.012

      Nour, M. M., Liu, Y., Arumuham, A., Kurth-Nelson, Z., & Dolan, R. J. (2021). Impaired neural replay of inferred relationships in schizophrenia. Cell, 184(16), 4315-4328.e17. https://doi.org/10.1016/j.cell.2021.06.012

      Turner, W., Blom, T., & Hogendoorn, H. (2023). Visual Information Is Predictively Encoded in Occipital Alpha/Low-Beta Oscillations. Journal of Neuroscience, 43(30), 5537–5545. https://doi.org/10.1523/JNEUROSCI.0135-23.2023

    1. Author response:

      We thank the editors and the reviewers for their valuable comments and for taking the time to evaluate our manuscript.

      Answers to Reviewer 1:

      (1) The core contribution of our method is that it learns meaningful spatiotemporal embeddings directly from image data without requiring pose estimation or eigenworm-based features as input. The learned embedding space can serve as a foundation for downstream tasks such as behavioral classification, clustering, or anomaly detection, further supporting its utility beyond visualization through eigenworm-derived features. Here we use the Tierpsy-derived features for latent space interpretation and for validation that our approach does indeed encode meaningful postural information. Additionally, without any Tierpsy-calculated features users can still color embeddings by known metadata like mutation or age and compare different strains to each other. 

      (2) The numbers shown in Fig. 2.3 are illustrative placeholders intended to conceptually represent a vector of behavioral features. They do not correspond to any specific measurements or carry intrinsic meaning. We agree that this may lead to confusion, and we will clarify this in the revised manuscript.

      (3) The visualizations in Figs. 4 (b) and (c) show the embeddings of sequences of behavior, rather than individual poses. Therefore, motion-related features such as speed are related to temporal patterns in those sequences rather than static postures. The color overlays reflect average motion characteristics (e.g., speed) of short behavior clips projected into the embedding space, rather than being directly linked to any single frame or pose.

      Answers to Reviewer 2:

      (1) In the abstract, our use of the term "unbiased" refers specifically to the avoidance of human-generated bias through feature engineering; that is, the model does not rely on handcrafted features or predefined pose representations, and the representations are based on data only. However, we agree that the model is still subject to dataset biases and will rectify this in the revised manuscript.

      (2) The worm images are rotated to a common vertical orientation to remove orientation as a source of variability in the input. This ensures that the model focuses on learning pose and behavioral dynamics rather than arbitrary head-tail or angular positioning. While data augmentation could in theory account for this variability, we found in our preliminary experiments that applying this preprocessing step led to more stable and interpretable embeddings.

      (3) We agree that simplifying the technical explanations would enhance the manuscript’s accessibility. In the revised version, we will briefly introduce contrastive learning in a less technical language.

      (4) The gray points in Fig. 3a represent frames that Tierpsy could not resolve, primarily due to coiled, self-intersecting, or overlapping worm postures, as Tierpsy uses skeletonization to estimate the centerline. This approach can fail when such challenging elements are part of the image.

      (5) We appreciate this suggestion and will consider it for a revised version of the manuscript.

      (6) Although it may seem intuitive for highly bent (red) poses to lie near coiled (gray) ones in the embedding space, the clustering pattern observed reflects how the network organizes pose information. The red/orange cluster consists of distinguishable bent poses that are visually distinct and consistently separable from other postures. In contrast, the greenish and blueish poses are less strongly bent and may share more visual overlap with the unresolved (gray) images.

      (7) The overlap occurs because some highly bent or coiled worms can still be (partially) resolved by Tierpsy, depending on specific pose conditions (e.g., head and tail not touching, not self-overlapping). However, Tierpsy fails to consistently resolve such frames. We will describe these cases in more detail in the revised manuscript.

      (8) Thank you, we agree this claim needs to be better supported and will develop it in the revision.

      (9) To support this statement we mainly visualized the respective sequences embedded in this area of the embedding space and found that it mostly consists of common behaviors such as forward locomotion. 

      (10) We agree that interpretability is important and plan to include additional figures and quantifications of the embedding space using more basic Tierpsy features.

      (11) Fig. 5a is indeed based solely on N2 animals. In the revised manuscript we will include quantitative measures of behavioral variability and its change with age.

      (12) We appreciate this suggestion and will consider it for a revised version.

      (13) We agree this would be a valuable analysis. However, our current dataset primarily includes aging data for N2 animals. We acknowledge this limitation and will consider adding more strains in future work.

      (14) We will include links to our source code in the revised manuscript.

      Answers to Reviewer 3:

      (1-2) Our current method is agnostic to head-tail orientation, which indeed restricts the ability to distinguish behaviors that rely on directional cues. We made this design choice as we believe that correctly identifying head/tail orientation can be a challenging task that may introduce additional biases or fail in difficult imaging conditions. However, we fully agree that integrating directional information would improve behavioral resolution, and this is a natural extension of our current framework. In future work, we aim to incorporate head-tail disambiguation.

      (3) We explicitly designed our preprocessing and training pipeline to encourage size invariance, for example by resizing individuals to a consistent scale, as the focus of our work is to encode posture and movement only. However, we acknowledge that absolute size information is lost in this process, which can be informative for distinguishing genotypes or age-related changes.

      (4) We agree that a direct quantitative comparison between our embedding-based representations and skeleton-based feature sets would strengthen the paper. Our current focus was to assess whether meaningful behavioral features could be learned from a skeleton-free representation.

    1. Author response:

      Reviewer 1:

      (1) In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is most likely going to be related to the contrast of the dots, as opposed to representing coherent motion energy, which is the actual target. These may well be linked (e.g., greater attention to the coherent motion task might increase SSVEP amplitude), but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (2) Comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and probably reflect different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)?

      Traditionally, the SSVEP amplitude at the distractor frequency is used to quantify distractor processing. Given that the target SSVEP amplitude is stronger than that for the distractor, it is possible that the distractor SSVEP amplitude is contaminated by the target SSVEP amplitude due to spectral power leakage; see Figure S4 for a demonstration of this. Because of this issue, we introduce decoding accuracy as an index of distractor processing, which has not been done before in the SSVEP literature. The lack of correlation between the distractor SSVEP amplitude and the distractor decoding accuracy, although somewhat like comparing apples with oranges as the reviewer points out, shows that these two measures do not co-vary; decoding accuracy is therefore free from the influence of the distractor SSVEP amplitude and thereby of the target SSVEP amplitude. This is an important point, and we will discuss it more thoroughly in the revised manuscript.

      Reviewer 2:

      (1) Incomplete Evidence for Rhythmicity at 1 Hz: The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      This is an important point. We plan to follow the reviewer’s suggestion and repeat our analysis using different window sizes to test the robustness of the observed 1Hz rhythmicity. In addition, we plan to also apply the Hilbert transform to extract time-point-by-time-point amplitude envelopes, which will provide a window-free estimation of the distractor strength and further validate the presence of the low-frequency 1Hz dynamics.
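
      As an illustration of a window-free envelope estimate, the following is a minimal sketch, not the authors' planned analysis. It assumes NumPy and a signal that has already been band-limited around the frequency of interest; the 10 Hz carrier, 200 Hz sampling rate and function names are hypothetical. The analytic signal is built with the standard FFT construction rather than a library call.

```python
import numpy as np


def amplitude_envelope(signal):
    """Sample-by-sample amplitude envelope via the analytic signal."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    spectrum = np.fft.fft(signal)
    # Keep DC, double positive frequencies, zero negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)
    return np.abs(analytic)


def envelope_spectrum(envelope, sample_rate_hz):
    """Power spectrum of the demeaned envelope and its frequency axis."""
    envelope = np.asarray(envelope, dtype=float)
    power = np.abs(np.fft.rfft(envelope - envelope.mean())) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / sample_rate_hz)
    return freqs, power


if __name__ == "__main__":
    fs = 200.0  # hypothetical sampling rate
    t = np.arange(2000) / fs
    # Hypothetical 10 Hz carrier whose amplitude fluctuates at 1 Hz.
    sig = (1.0 + 0.5 * np.cos(2 * np.pi * 1.0 * t)) * np.cos(2 * np.pi * 10.0 * t)
    freqs, power = envelope_spectrum(amplitude_envelope(sig), fs)
    print("dominant envelope frequency (Hz):", freqs[np.argmax(power)])
```

      Because the envelope is defined at every sample, its spectrum is not constrained by a window length or step size, so a 1 Hz peak found this way would not be attributable to the 0.5 s windowing.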

      (2) No-Distractor Control Condition: The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (3) Decoding Near Chance Levels: The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      This is a good point. In addition to acknowledging this in the revised manuscript, we will carry out two additional analyses to test this issue further. First, we will implement a random permutation procedure, in which the trial labels are randomly shuffled and the null-hypothesis distribution for decoding accuracy is built, and compare the decoding accuracy from the actual data to this distribution. Second, we will perform a temporal generalization analysis to examine whether the neural representations of the distractor drift over the course of an entire trial, which is 11 seconds long. Recent studies suggest that even when the stimulus stays the same, their neural representations may drift over time.
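
      The label-shuffling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `score_fn` stands for whatever cross-validated decoding pipeline is used, and the nearest-class-mean decoder on one-dimensional features is a toy stand-in, not the authors' classifier. All names and data are hypothetical.

```python
import random


def permutation_test(features, labels, score_fn, n_permutations=1000, seed=0):
    """Compare an observed decoding score with a label-shuffled null distribution."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between trials and labels
        null.append(score_fn(features, shuffled))
    # Add-one correction keeps the p-value strictly above zero.
    exceed = sum(1 for score in null if score >= observed)
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, null, p_value


def nearest_mean_loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-class-mean decoder (toy stand-in)."""
    correct = 0
    classes = sorted(set(labels))
    for i, (x, y) in enumerate(zip(features, labels)):
        means = {}
        for c in classes:
            values = [f for j, (f, l) in enumerate(zip(features, labels))
                      if l == c and j != i]
            means[c] = sum(values) / len(values)
        predicted = min(means, key=lambda c: abs(x - means[c]))
        correct += predicted == y
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical one-dimensional features for two distractor categories.
    feats = [i * 0.1 for i in range(10)] + [5 + i * 0.1 for i in range(10)]
    labs = [0] * 10 + [1] * 10
    obs, null, p = permutation_test(feats, labs, nearest_mean_loo_accuracy, 200)
    print(f"observed accuracy = {obs:.2f}, permutation p = {p:.4f}")
```

      The decoder is re-scored on every shuffle, so the null distribution reflects the whole pipeline rather than a theoretical 50% chance level, which matters when observed accuracies are close to chance.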

      (4) No Clear Correlation Between SSVEP and Behavior: Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.

      We feel that what the reviewer points out is in fact the main point of our study: it is not the overall target or distractor strength that matters for behavior, but their temporal relationship. This reveals a novel neuroscience principle that has not been reported in the past. We will stress this point further in the revised manuscript.

      (5) Phase-analysis: phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).

      The time-resolved SSVEP amplitude is used to index the temporal dynamics of target processing whereas the time-resolved decoding accuracy is used to index the temporal dynamics of distractor processing. As such, they can be compared, using relative phase for example, to examine how temporal relations between the two types of processes impact behavior. This said, we do recognize the reviewer’s concern that these two processes are indexed by two different types of signals. We plan to normalize each time course, make them dimensionless, and then compute the temporal relations between them.   
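
      One way to carry out the normalization and phase comparison described above is sketched below. It is an illustration under stated assumptions, not the authors' method: each time course is z-scored to make it dimensionless, and the relative phase is taken from the Fourier component at a single frequency. The function names, the 4 Hz sampling rate (one value per 0.25 s step) and the example time courses are hypothetical.

```python
import cmath
import math


def zscore(series):
    """Remove the mean and scale to unit variance (dimensionless)."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]


def phase_at(series, freq_hz, sample_rate_hz):
    """Phase of the Fourier component of `series` at `freq_hz`."""
    coef = sum(v * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
               for k, v in enumerate(series))
    return cmath.phase(coef)


def relative_phase(target, distractor, freq_hz, sample_rate_hz):
    """Target phase minus distractor phase, wrapped to [-pi, pi)."""
    diff = (phase_at(zscore(target), freq_hz, sample_rate_hz)
            - phase_at(zscore(distractor), freq_hz, sample_rate_hz))
    return (diff + math.pi) % (2 * math.pi) - math.pi


if __name__ == "__main__":
    fs = 4.0  # one value per 0.25 s step
    t = [k / fs for k in range(40)]
    # Hypothetical time courses in different units: amplitude vs. accuracy.
    target = [3.0 + 2.0 * math.cos(2 * math.pi * ti) for ti in t]
    distractor = [0.55 + 0.02 * math.cos(2 * math.pi * ti - math.pi / 2) for ti in t]
    print(relative_phase(target, distractor, 1.0, fs))
```

      A positive value here means the target time course leads the distractor time course at that frequency. Because phase does not depend on amplitude scale, the z-scoring mainly guards against scale differences when the two time courses are otherwise combined or compared.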

      Appraisal of Aims and Conclusions:

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      Impact and Utility to the Field:

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Thanks for these comments and positive assessment of our work’s potential implications and impact. We will try our best in the revision process to address the concerns.

      Additional Context and Considerations:

      (1) The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.

      Indeed, leveraging fMRI data in EEG studies would be very beneficial, as has been demonstrated in our previous work. However, given that this study concerns the temporal relationship between target and distractor processing, we feel that fMRI, given its well-known limitation in temporal resolution, has limited potential to contribute. We will be exploring this rich dataset in other ways where the two modalities are integrated to gain more insights not possible with either modality used alone.

      (2) In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      We have done extensive work in the area of simultaneous EEG-fMRI and have not encountered artifacts with a 1Hz rhythmicity. Also, the fact that the temporal relations between target processing and distractor processing at 1Hz predict behavior is another indication that the 1Hz rhythmicity is a neuroscientific effect, not an artifact. However, we will look into this carefully and address it in the revision process.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." The proposed mechanisms result in moderate performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Given the high level of complexity of all components of the model, it is not clear which features of which components are most important for its performance. There is also room for improvement in the narrative structure of the manuscript and the organization of concepts and data.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation. That said, the fundamental concepts behind nonlinear feature binding in neurons with compartmentalized dendrites have been explored in previous work, so it is not clear how this study represents a significant conceptual advance. Finally, the presentation of the model, the motivation and justification of each design choice, and the interpretation of each result could be restructured for clarity to be better received by a wider audience.

      Thank you for the feedback! We agree that the complexity of our model can make it challenging to intuitively understand the underlying mechanisms. To address this, we have revised the manuscript to include additional simulations and clearer explanations of the mechanisms at play.

      In the revised introduction, we now explicitly state our primary aim: to assess to what extent a biophysically detailed neuron model can support the theory proposed by Tran-Van-Minh et al. and explore whether such computations can be learned by a single neuron, specifically a projection neuron in the striatum. To achieve this, we focus on several key mechanisms:

      (1) A local learning rule: We develop a learning rule driven by local calcium dynamics in the synapse and by reward signals from the neuromodulator dopamine. This plasticity rule is based on the known synaptic machinery for triggering LTP or LTD in the corticostriatal synapse onto dSPNs (Shen et al., 2008). Importantly, the rule does not rely on supervised learning paradigms, nor does it require separate training and testing phases.

      (2) Robust dendritic nonlinearities: According to Tran-Van-Minh et al. (2015), sufficient supralinear integration is needed to ensure that, e.g., two inputs (i.e., one feature combination in the NFBP, Figure 1A) on the same dendrite generate greater somatic depolarization than if those inputs were distributed across different dendrites. To accomplish this, we generate sufficiently robust dendritic plateau potentials using the approach in Trpevski et al. (2023).

      (3) Metaplasticity: Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights. This mechanism ensures that synaptic strengths remain within biologically plausible ranges during training, regardless of initial synaptic weights.
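
      The role of mechanism (2) can be illustrated with a toy example that is not part of the manuscript. The Python sketch below assumes that "red strawberry" and "yellow banana" are the relevant feature combinations, that every synapse has unit weight, and that each of two branches produces a hypothetical thresholded plateau (`BRANCH_THRESHOLD`, `PLATEAU_GAIN`, and `SOMA_THRESHOLD` are illustrative values, not model parameters). It checks by brute force over a small weight grid that a point neuron with a purely linear sum cannot separate the four stimuli, whereas the two-branch unit can.

```python
from itertools import product

FEATURES = ("red", "yellow", "strawberry", "banana")

# Assumed labelling: 1 = relevant combination, 0 = irrelevant combination.
STIMULI = {
    ("red", "strawberry"): 1,
    ("yellow", "banana"): 1,
    ("red", "banana"): 0,
    ("yellow", "strawberry"): 0,
}

# Hypothetical wiring: each branch receives one relevant feature pair.
BRANCHES = (("red", "strawberry"), ("yellow", "banana"))
BRANCH_THRESHOLD = 1.5  # drive needed to evoke a plateau (illustrative)
PLATEAU_GAIN = 3.0      # extra depolarization added by a plateau (illustrative)
SOMA_THRESHOLD = 3.0    # somatic firing threshold (illustrative)


def encode(stimulus):
    """Binary feature vector in the order given by FEATURES."""
    return [1.0 if f in stimulus else 0.0 for f in FEATURES]


def point_neuron(x, w, theta):
    """Linear weighted sum followed by a single somatic threshold."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) > theta)


def linear_solution_exists(grid):
    """Search the grid for weights and a threshold that classify all stimuli."""
    for w in product(grid, repeat=len(FEATURES)):
        for theta in grid:
            if all(point_neuron(encode(s), w, theta) == y
                   for s, y in STIMULI.items()):
                return True
    return False


def branch_output(drive):
    """Supralinear branch: add a plateau once the local drive is large enough."""
    return drive + PLATEAU_GAIN if drive >= BRANCH_THRESHOLD else drive


def dendritic_neuron(stimulus):
    """Sum the branch outputs at the soma and apply the somatic threshold."""
    soma = 0.0
    for branch in BRANCHES:
        drive = sum(1.0 for f in branch if f in stimulus)
        soma += branch_output(drive)
    return int(soma > SOMA_THRESHOLD)


GRID = [x / 2 for x in range(-4, 5)]  # candidate weights/thresholds in [-2, 2]

if __name__ == "__main__":
    print("linear solution found:", linear_solution_exists(GRID))
    for stim, label in STIMULI.items():
        print(stim, "target", label, "output", dendritic_neuron(stim))
```

      The failure of the point neuron does not depend on the grid: the two relevant stimuli require w_red + w_strawberry and w_yellow + w_banana to both exceed the threshold, the two irrelevant stimuli require w_red + w_banana and w_yellow + w_strawberry to both stay at or below it, and the two pairs of sums add up to the same total, which is a contradiction.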

      We have also clarified our design choices and the rationale behind them, as well as restructured the interpretation of our results for greater accessibility. We hope these revisions make our approach and findings more transparent and easier to engage with for a broader audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study extends three previous lines of work:  

      (1) Prior computational/phenomenological work has shown that the presence of dendritic nonlinearities can enable single neurons to perform linearly non-separable tasks like XOR and feature binding (e.g. Tran-Van-Minh et al., Front. Cell. Neurosci., 2015).

      Prior computational and phenomenological work, such as Tran-Van-Minh et al. (Front. Cell. Neurosci., 2015), directly inspired our study, as we now explicitly state in the introduction (page 4, lines 19-22). While Tran-Van-Minh theoretically demonstrated that these principles could solve the NFBP, it remains untested to what extent this can be achieved quantitatively in biophysically detailed neuron models using biologically plausible learning rules - which is what we test here.

      (2) This study and a previous biophysical modeling study (Trpevski et al., Front. Cell. Neurosci., 2023) rely heavily on the finding from Chalifoux & Carter, J. Neurosci., 2011 that blocking glutamate transporters with TBOA increases dendritic calcium signals. The proposed model thus depends on a specific biophysical mechanism for dendritic plateau potential generation, where spatiotemporally clustered inputs must be co-activated on a single branch, and the voltage compartmentalization of the branch and the voltage-dependence of NMDARs is not enough, but additionally glutamate spillover from neighboring synapses must activate extrasynaptic NMDARs. If this specific biophysical implementation of dendritic plateau potentials is essential to the findings in this study, the authors have not made that connection clear. If it is a simple threshold nonlinearity in dendrites that is important for the model, and not the specific underlying biophysical mechanisms, then the study does not appear to provide a conceptual advance over previous studies demonstrating nonlinear feature binding with simpler implementations of dendritic nonlinearities.

      We appreciate the feedback on the hypothesized role of glutamate spillover in our model. The current manuscript and Trpevski et al. (2023) emphasize glutamate spillover as a plausible biophysical mechanism for producing sufficiently robust and supralinear plateau potentials, but we acknowledge that supralinear dendritic integration might not depend solely on this specific mechanism in other types of neurons. In Trpevski et al. (2023) we found, however, that if we allow dendritic plateaus that are too ‘graded’, using the quite shallow Mg-block reported in experiments, it was difficult to solve the NFBP. The conceptual advance of our study lies in demonstrating that sufficiently nonlinear dendritic integration is needed, and that in SPNs this can be accounted for by assuming spillover. Regardless of its biophysical source (e.g. NMDA spillover, steeper NMDA Mg-block activation curves, or other voltage-dependent conductances that cause supralinear dendritic integration), such nonlinearity enables biophysically detailed neurons to solve the nonlinear feature binding problem. To address this point and clarify the generality of our conclusions, we have revised the relevant sections in the manuscript to state this explicitly.

      (3) Prior work has utilized "sliding-threshold," BCM-like plasticity rules to achieve neuronal selectivity and stability in synaptic weights. Other work has shown coordinated excitatory and inhibitory plasticity. The current manuscript combines "metaplasticity" at excitatory synapses with suppression of inhibitory strength onto strongly activated branches. This resembles the lateral inhibition scheme proposed by Olshausen (Christopher J. Rozell, Don H. Johnson, Richard G. Baraniuk, Bruno A. Olshausen; Sparse Coding via Thresholding and Local Competition in Neural Circuits. Neural Comput 2008; 20 (10): 2526-2563. doi: https://doi.org/10.1162/neco.2008.03-07-486). However, the complexity of the biophysical model makes it difficult to evaluate the relative importance of the additional complexity of the learning scheme.

      We initially tried solving the NFBP with only excitatory plasticity, which worked reasonably well, especially if we assume a small population of neurons collaborates under physiological conditions. However, we observed that plateau potentials from distally located inputs were less effective, and we now explain this limitation in the revised manuscript (page 14, lines 23-37).

      To address this, we added inhibitory plasticity inspired by mechanisms discussed in Castillo et al. (2011), Ravasenga et al., and Chapman et al. (2022), as now explicitly stated in the text (page 32, lines 23-26). While our GABA plasticity rule is speculative, it demonstrates that distal GABAergic plasticity can enhance nonlinear computations. These results are particularly encouraging, as they show that implementing these mechanisms at the single-neuron level produces behavior consistent with network-level models such as BCM-like plasticity rules and those proposed by Rozell et al. We hope this will inspire further experimental work on inhibitory plasticity mechanisms.

      P2, paragraph 2: Grammar: "multiple dendritic regions, preferentially responsive to different input values or features, are known to form with close dendritic proximity." The meaning is not clear. "Dendritic regions" do not "form with close dendritic proximity."

      Rewritten (current page 2, line 35)

      P5, paragraph 3: Grammar: I think you mean "strengthened synapses" not "synapses strengthened".

      Rewritten (current page 14, line 36)

      P8, paragraph 1: Grammar: "equally often" not "equally much".

      Updated (current page 10, line 2)

      P8, paragraph 2: "This is because of the learning rule that successively slides the LTP NMDA Ca-dependent plasticity kernel over training." It is not clear what is meant by "sliding," either here or in the Methods. Please clarify.

      We have updated the text and removed the word “sliding” throughout the manuscript to clarify that the calcium dependence of the kernels is in fact updated.

      P10, Figure 3C (left): After reading the accompanying text on P8, para 2, I am left not understanding what makes the difference between the two groups of synapses that both encode "yellow," on the same dendritic branch (d1) (so both see the same plateau potentials and dopamine) but one potentiates and one depresses. Please clarify.

      Some "yellow" and "banana" synapses are initialized with weak conductances, limiting their ability to learn due to the relatively slow dynamics of the LTP kernel. These weak synapses fail to reach the calcium thresholds necessary for potentiation during a dopamine peak, yet they remain susceptible to depression under LTD conditions. Initially, the dynamics of the LTP kernel do not allow significant potentiation, even in the presence of appropriate signals such as plateau potentials and dopamine (page 10, lines 22–26). We have added a more detailed explanation of how the learning rule operates in the section “Characterization of the Synaptic Plasticity Rule” on page 9 and have clarified the specific reason why the weaker yellow synapses undergo LTD (page 11, lines 1–7).

      As shown in Supplementary Figure 6, during subthreshold learning, the initial conductance is also low, which similarly hinders the synapses' ability to potentiate. However, with sufficient dopamine, the LTP kernel adapts by shifting closer to the observed calcium levels, allowing these synapses to eventually strengthen. This dynamic highlights how the model enables initially weak synapses to "catch up" under consistent activation and favorable dopaminergic conditions.
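
      The "catch up" behavior described above can be made concrete with a deliberately simplified sketch, which is not the authors' rule. It assumes a Gaussian LTP kernel over calcium, calcium proportional to synaptic weight, a dopamine peak on every trial, and a midpoint that relaxes toward the observed calcium level; the function names and all parameter values (`width`, `ca_per_weight`, `lr_w`, `lr_mid`) are hypothetical. LTD and dopamine pauses are left out, so the sketch shows only why a fixed kernel can leave a weak synapse stuck while an adapting kernel lets it potentiate.

```python
import math


def ltp_kernel(ca, midpoint, width=0.5):
    """Bell-shaped calcium dependence of LTP, centred on `midpoint` (assumed form)."""
    return math.exp(-((ca - midpoint) / width) ** 2)


def train(weight, midpoint, n_trials, metaplasticity,
          ca_per_weight=4.0, lr_w=0.05, lr_mid=0.1):
    """Run rewarded trials and return the final (weight, midpoint).

    Every trial is assumed to end with a dopamine peak, so only LTP is modelled.
    """
    for _ in range(n_trials):
        # Assumption: synaptic calcium scales linearly with synaptic weight.
        ca = ca_per_weight * weight
        # LTP is gated by how close the calcium level is to the kernel midpoint.
        weight += lr_w * ltp_kernel(ca, midpoint)
        if metaplasticity:
            # Dopamine peak: shift the kernel toward the observed calcium level.
            midpoint += lr_mid * (ca - midpoint)
    return weight, midpoint


W0, MID0, TRIALS = 0.25, 3.0, 100  # weak synapse, kernel far above its calcium

w_fixed, _ = train(W0, MID0, TRIALS, metaplasticity=False)
w_meta, mid_meta = train(W0, MID0, TRIALS, metaplasticity=True)

if __name__ == "__main__":
    print(f"fixed kernel:    weight {W0:.2f} -> {w_fixed:.4f}")
    print(f"adapting kernel: weight {W0:.2f} -> {w_meta:.4f}")
```

      With these illustrative values the weak synapse starts two kernel-width units below the midpoint, so the fixed kernel yields almost no potentiation, while the adapting kernel reaches the synapse's calcium range within roughly ten trials and potentiation follows.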

      P9, paragraph 1: The phrase "the metaplasticity kernel" is introduced here without prior explanation or motivation for including this level of complexity in the model. Please set it up before you use it.

      A sentence introducing metaplasticity has been added to the introduction (page 3, lines 36-42) as well as on page 9, where the kernel is introduced (page 9, lines 26-35).

      P10, Figure 3D: "kernel midline" is not explained.

      We have replotted Figure 3 to make it easier to understand what is shown. An explanation of the kernel midpoint has also been added to the legend (current page 12, line 19).

      P11, paragraph 1; P13, Fig. 4C: My interpretation of these data is that clustered connectivity with specific branches is essential for the performance of the model. Randomly distributing input features onto branches (allowing all 4 features to innervate single branches) results in poor performance. This is bad, right? The model can't learn unless a specific pre-wiring is assumed. There is not much interpretation provided at this stage of the manuscript, just a flat description of the result. Tell the reader what you think the implications of this are here.

      Thanks for the suggestion - we have updated this section of the manuscript, adding an interpretation of the results that the model often fails to learn both relevant stimuli if all four features are clustered onto the same dendrite (page 13, lines 31-42). 

      In summary, when multiple feature combinations are encoded in the same dendrite with similar conductances, the ability to determine which combination to store depends on the dynamics of the other dendrite. Small variations in conductance, training order, or other stochastic factors can influence the outcome. This challenge, known as the symmetry-breaking problem, has been previously acknowledged in abstract neuron models (Legenstein and Maass, 2011). To address this, additional mechanisms such as branch plasticity—amplifying or attenuating the plateau potential as it propagates from the dendrite to the soma—can be employed (Legenstein and Maass, 2011). 

      P12, paragraph 2; P13, Figure 4E: This result seems suboptimal, that only synapses at a very specific distance from the soma can be used to effectively learn to solve a NFBP. It is not clear to what extent details of the biophysical and morphological model are contributing to this narrow distance-dependence, or whether it matches physiological data.

      We have added Figure 5—figure supplement 1A to clarify why distal synapses may not optimally contribute to learning. This figure illustrates how inhibitory plasticity improves performance by reducing excessive LTD at distal dendrites, thereby enhancing stimulus discrimination. Relevant explanations have been integrated into Page 18, Lines 25-39 in the revised manuscript.

      P14, paragraph 2: Now the authors are assuming that inhibitory synapses are highly tuned to stimulus features. The tuning of inhibitory cells in the hippocampus and cortex is controversial but seems generally weaker than excitatory cells, commensurate with their reduced number relative to excitatory cells. The model has accumulated a lot of assumptions at this point, many without strong experimental support, which again might make more sense when proposing a new theory, but this stitching together of complex mechanisms does not provide a strong intuition for whether the scheme is either biologically plausible or performant for a general class of problem.

      We acknowledge that it is not currently known whether inhibitory synapses in the striatum are tuned to stimulus features. However, given that the striatum is a purely inhibitory structure, it is plausible that lateral inhibition from other projection neurons could be tuned to features, even if feedforward inhibition from interneurons is not. Therefore, we believe this assumption is reasonable in the context of our model. As noted earlier, the GABA plasticity rule in our study is speculative. However, we hope that our work will encourage further experimental investigations, as we demonstrate that if GABAergic inputs are sufficiently specific, they can significantly enhance computations (This is discussed on page 17, lines 8-15.).

      P16, Figure 5E legend: The explanation of the meaning of T_max and T_min in the legend and text needs clarification.

      The abbreviations T_min and T_max have been updated to CTL and CTH to better reflect their role in calcium threshold tracking. The Figure 5E legend and relevant text have been revised for clarity. Additionally, the Methods section has been reorganized for better readability.

      P16, Figure 5B, C: When the reader reaches this paper, the conundrums presented in Figure 4 are resolved. The "winner-takes-all" inhibitory plasticity both increases the performance when all features are presented to a single branch and increases the range of somatodendritic distances where synapses can effectively be used for stimulus discrimination. The problem, then, is in the narrative. A lot more setup needs to be provided for the question related to whether or not dendritic nonlinearity and synaptic inhibition can be used to perform the NFBP. The authors may consider consolidating the results of Fig. 4 and 5 so that the comparison is made directly, rather than presenting them serially without much foreshadowing.

      In order to facilitate readability, we have updated the following sections of the manuscript to clarify how inhibitory plasticity resolves challenges from Figure 4:

      Figure 5B and Figure 5–figure supplement 1B: Two new panels illustrate the role of inhibitory plasticity in addressing symmetry problems.

      Figure 5–figure supplement 1A: Shows how inhibitory plasticity extends the effective range of somatodendritic distances.

      P18, Figure 6: This should be the most important figure, finally tying in all the previous complexity to show that NFBP can be partially solved with E and I plasticity even when features are distributed randomly across branches without clustering. However, now bringing in the comparison across spillover models is distracting and not necessary. Just show us the same plateau generation model used throughout the paper, with and without inhibition.

      Figure updated. Accumulative spillover and no-spillover conditions have been removed.

      P18, paragraph 2: "In Fig. 6C, we report that a subset of neurons (5 out of 31) successfully solved the NFBP." This study could be significantly strengthened if this phenomenon could (perhaps in parallel) be shown to occur in a simpler model with a simpler plateau generation mechanism. Furthermore, it could be significantly strengthened if the authors could show that, even if features are randomly distributed at initialization, a pruning mechanism could gradually transition the neuron into the state where fewer features are present on each branch, and the performance could approach the results presented in Figure 5 through dynamic connectivity.

      Modeling structural plasticity is a good suggestion that should be investigated in later work; however, we feel that it goes beyond what we can do in the current manuscript. We now acknowledge that structural plasticity might play a role. For example, we show that if we assume ‘branch-specific’ spillover, which leads to sufficient development of local dendritic nonlinearities, learning is also possible with distributed inputs. In reality, structural plasticity is likely important here, as we now state (current page 22, lines 35-42).

      P17, paragraph 2: "As shown in Fig. 6B, adding the hypothetical nonlinearities to the model increases the performance towards solving part of the NFBP, i.e. learning to respond to one relevant feature combination only. The performance increases with the amount of nonlinearity." This is not shown in Figure 6B.

      Sentence removed. We have added a Figure 6 - figure supplement 1 to better explain the limitations.

      P22, paragraph 1: The "w" parameter here is used to determine whether spatially localized synapses are co-active enough to generate a plateau potential. However, this is the same w learned through synaptic plasticity. Typically LTP and LTD are thought of as changing the number of postsynaptic AMPARs. Does this "w" also change the AMPAR weight in the model? Do the authors envision this as a presynaptic release probability quantity? If so, please state that and provide experimental justification. If not, please justify modifying the activation of postsynaptic NMDARs through plasticity.

      This is an important remark. Our plasticity model differs from classical LTP models as it depends on the link between LTP and increased spillover as described by Henneberger et al., (2020).

      We have updated the Methods section (page 27, lines 6-11). We acknowledge, however, that in a real cell, learning might first strengthen the AMPA component, while after learning the NMDA/AMPA ratio is unchanged (Watt et al., 2004). This re-balancing between NMDA and AMPA might be a slower process.

      Reviewer #2 (Public Review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Weaknesses:

      I am concerned that the manuscript was submitted too hastily, as evidenced by the quality and logic of the writing and the presentation of the figures. These issues may compromise the integrity of the work. I would recommend a substantial revision of the manuscript to improve the clarity of the writing, incorporate more experiments, and better define the goals of the study.

      Thanks for the valuable feedback. We have now gone through the whole manuscript and updated the text, improved the figures, and added supplementary figures to better explain the model mechanisms. In particular, we now state our goal more clearly in the introduction.

      Major Points:

      (1) Quality of Scientific Writing: The current draft does not meet the expected standards. Key issues include:

      i. Mathematical and Implementation Details: The manuscript lacks comprehensive mathematical descriptions and implementation details for the plasticity models (LTP/LTD/Meta) and the SPN model. Given the complexity of the biophysically detailed multicompartment model and the associated learning rules, the inclusion of only nine abstract equations (Eq. 1-9) in the Methods section is insufficient. I was surprised to find no supplementary material providing these crucial details. What parameters were used for the SPN model? What are the mathematical specifics for the extra-synaptic NMDA receptors utilized in this study? For instance, Eq. 3 references [Ca2+]-does this refer to calcium ions influenced by extra-synaptic NMDARs, or does it apply to other standard NMDARs? I also suggest the authors provide pseudocodes for the entire learning process to further clarify the learning rules.

      The model is quite detailed but builds on previous work. For this reason, for model components used in earlier published work (and where models are already available via model repositories, such as ModelDB), we refer the reader to these resources in order to improve readability and to highlight what is novel in this paper: the learning rule itself. The learning rule is now explained in detail. For modelers who want to run the model, we have also provided a GitHub link to the simulation code. We hope this is a reasonable compromise for all readers, i.e., those who only want to understand what is new here (the learning rule) and those who also want to test the model code. We explain this to the readers at the beginning of the Methods section.

      ii. Figure quality. The authors seem not to carefully typeset the images, resulting in overcrowding and varying font sizes in the figures. Some of the fonts are too small and hard to read. The text in many of the diagrams is confusing. For example, in Panel A of Figure 3, two flattened images are combined, leading to small, distorted font sizes. In Panels C and D of Figure 7, the inconsistent use of terminology such as "kernels" further complicates the clarity of the presentation. I recommend that the authors thoroughly review all figures and accompanying text to ensure they meet the expected standards of clarity and quality.

      Thanks for directing our attention to these oversights. We have gone through the entire manuscript, updating the figures where needed, and we are making sure that the text and the figure descriptions are clear and adequate and use consistent terminology for all quantities.

      iii. Writing clarity. The manuscript often includes excessive and irrelevant details, particularly in the mathematical discussions. On page 24, within the "Metaplasticity" section, the authors introduce the biological background to support the proposed metaplasticity equation (Eq. 5). However, much of this biological detail is hypothesized rather than experimentally verified. For instance, the claim that "a pause in dopamine triggers a shift towards higher calcium concentrations while a peak in dopamine pushes the LTP kernel in the opposite direction" lacks cited experimental evidence. If evidence exists, it should be clearly referenced; otherwise, these assertions should be presented as theoretical hypotheses. Generally, Eq. 5 and related discussions should be described more concisely, with only a loose connection to dopamine effects until more experimental findings are available.

      The “Metaplasticity” section (pages 30-32) has been updated to be more concise, and the abundant references to dopamine have been removed.

      (2) Goals of the Study: The authors need to clearly define the primary objective of their research. Is it to showcase the computational advantages of the local learning rule, or to elucidate biological functions?

      We have explicitly stated our goal in the introduction (page 4, lines 19-22). Please also see the response to reviewer 1.

      i. Computational Advantage: If the intent is to demonstrate computational advantages, the current experimental results appear inadequate. The learning rule introduced in this work can only solve for four features, whereas previous research (e.g., Bicknell and Hausser, 2021) has shown capability with over 100 features. It is crucial for the authors to extend their demonstrations to prove that their learning rule can handle more than just three features. Furthermore, the requirement to fine-tune the midpoint of the synapse function indicates that the rule modifies the "activation function" of the synapses, as opposed to merely adjusting synaptic weights. In machine learning, modifying weights directly is typically more efficient than altering activation functions during learning tasks. This might account for why the current learning rule is restricted to a limited number of tasks. The authors should critically evaluate whether the proposed local learning rule, including meta-plasticity, actually offers any computational advantage. This evaluation is essential to understand the practical implications and effectiveness of the proposed learning rule.

      Thank you for your feedback. To address the concern regarding feature complexity, we extended our simulations to include learning with 9 and 25 features, achieving accuracies of 80% and 75%, respectively (Figure 6—figure supplement 1A). While our results demonstrate effective performance, the absence of external stabilizers—such as error-modulated functions used in prior studies like Bicknell and Hausser (2021)—means that the model's performance can be more sensitive to occasional incorrect outcomes. For instance, while accuracy might reach 90%, a few errors can significantly affect overall performance due to the lack of mechanisms to stabilize learning.

      In order to clarify the setup of the rule, we have added pseudocode in the revised manuscript (Pages 31-32) detailing how the learning rule and metaplasticity update synaptic weights based on calcium and dopamine signals. Additionally, we have included pseudocode for the inhibitory learning rule on Pages 34-35. In future work, we also aim to incorporate biologically plausible mechanisms, such as dopamine desensitization, to enhance stability.

      ii. Biological Significance: If the goal is to interpret biological functions, the authors should dig deeper into the model behaviors to uncover their biological significance. This exploration should aim to link the observed computational features of the model more directly with biological mechanisms and outcomes.

      As now clearly stated in the introduction, the goal of the study is to see whether and to what quantitative extent the theoretical solution of the NFBP proposed in Tran-Van-Minh et al. (2015) can be achieved with biophysically detailed neuron models and with a biologically inspired learning rule. The problem has so far been solved with abstract and phenomenological neuron models (Schiess et al., 2014; Legenstein and Maass, 2011) and also with a detailed neuron model but with a precalculated voltage-dependent learning rule (Bicknell and Häusser, 2021).

      We have also tried to better explain the model mechanisms by adding supplementary figures.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The elevated [Ca²⁺]NMDA with minimal synaptic activation arises from high spine input resistance, small spine volume, and NMDA receptor conductance, which scales calcium influx with synaptic strength. Physiological studies report spine calcium transients typically up to ~1 μM (Franks and Sejnowski 2002, DOI: 10.1002/bies.10193), while our model shows ~7 μM for 0.625 nS and ~3 μM for 0.5 nS, exceeding this range. The calcium levels in the model might therefore be somewhat high compared to biologically measured levels; however, this does not impact the learning rule, as the functional dynamics of the rule remain robust across calcium variations.

      (2) In the distributed synapses session, the study introduces two new mechanisms "Threshold spillover" and "Accumulative spillover". Both mechanisms are not basic concepts but quantitative descriptions of them are missing.

      Thank you for your feedback. Based on the recommendations from Reviewer 1, we have simplified the paper by removing the "Accumulative spillover" and focusing solely on the "Thresholded spillover" mechanism. In the updated version of the paper, we refer to it only as glutamate spillover. However, we acknowledge (page 22, lines 40-42) that to create sufficient non-linearities, other mechanisms, like structural plasticity, might also be involved (although testing this in the model will have to be postponed to future work).

      (3) The learning rule achieves moderate performance when feature-relevant synapses are organized in pre-designed clusters, but for more general distributed synaptic inputs, the model fails to faithfully solve the simple task (with its performance of ~ 75%). Performance results indicate the learning rule proposed, despite its delicate design, is still inefficient when the spatial distribution of synapses grows complex, which is often the case on biological neurons. Moreover, this inefficiency is not carefully analyzed in this paper (e.g. why the performance drops significantly and the possible computation mechanism underlying it).

      The drop in performance when using distributed inputs (to a mean performance of 80%) is similar to the mean performance in the same situation in Bicknell and Hausser (2021), see their Fig. 3C. The drop occurs because i) the relevant feature combinations are not often colocalized on the same dendrite, where they could be strengthened together, and ii) even if they are, there may not be enough synapses to trigger the supralinear response from the branch spillover mechanism, i.e. the inputs are not summed supralinearly (Fig. 6B, most input configurations only reach 75%).

      Because of this, at most one relevant feature combination can be learned. In the several cases when the random distribution of synapses is favorable for both relevant feature combinations to be learned, the NFBP is solved (Fig. 6B, some performance lines reach 100%; Fig. 6C, an example of such a case). We have extended the relevant sections of the paper to highlight the above-mentioned mechanisms.

      Further, the theoretical results in Tran-Van-Minh et al. 2015 already show that solving the NFBP with supralinear dendrites requires features to be pre-clustered in order to evoke the supralinear dendritic response, which would activate the soma. The same number of synapses distributed across the dendrites i) would not excite the soma as strongly, and ii) would summate in the soma as in a point neuron, i.e. no supralinear events can be activated, which are necessary to solve the NFBP. Hence, one doesn’t expect distributed synaptic inputs to solve the NFBP with any kind of learning rule.

      (4) Figure 5B demonstrates that on average adding inhibitory synapses can enhance the learning capabilities to solve the NFBP for different pattern configurations (2, 3, or 4 features), but since the performance for excitatory-only setup varies greatly between different configurations (Figure 4B, using 2 or 3 features can solve while 4 cannot), can the results be more precise about whether adding inhibitory synapses can help improve the learning with 4 features?

      In response to the question, we added a panel to Figure 5B showing that without inhibitory synapses, 5 out of 13 configurations with four features successfully learn, while with inhibitory synapses, this improves to 7 out of 13. Figure 5—figure supplement 1B provides an explanation for this improvement (page 18, lines 10-24).

      (5) Also, in terms of the possible role of inhibitory plasticity in learning, as only on-site inhibition is studied here, can other types of inhibition be considered, like on-path or off-path? Do they have similar or different effects?

      This is an interesting suggestion for future work. We observed relevant dynamics in Figure 6A, where inhibitory synapses increased their weights on-site when randomly distributed. Previous work by Gidon and Segev (2012) examined the effects of different inhibitory types on NMDA clusters, highlighting the role of on-site and off-path inhibition in shunting. In our context, on-site inhibition in the same branch appears more relevant for maintaining compartmentalized dendritic processing.

      (6) Figure 6A is mentioned in the context of excitatory-only setup, but it depicts the setup when both excitatory and inhibitory synapses are included, which is discussed later in the paper. A correction should be made to ensure consistency.

      We have updated the figure and the text to make it clearer that simulations are run both with and without inhibition in this context (page 21, lines 4-13).

      (7) In the "Ca and kernel dynamics" plots (Fig 3,5), some of the kernel midlines (solid line) are overlapped by dots, e.g. the yellow line in Fig 3D, and some kernel midlines look like dots, which leads to confusion. Suggest to separate plots of Ca and kernel dynamics for clarity. 

      The design of the figures has been updated to improve the visibility of the calcium and kernel dynamics during training.

      (8) The formulations of the learning rule are not well-organized, and the naming of parameters is kind of confusing, e.g. T_min, T_max, which by default represent time, means "Ca concentration threshold" here.

      The abbreviations of the thresholds (T<sub>min</sub>, T<sub>max</sub> in the initial version) have been updated to CTL and CTH, respectively, to better reflect their role in tracking calcium levels. The mathematical formulations have further been reorganized for better readability. The revised Methods section now follows a more structured flow, first explaining the learning mechanisms, followed by the equations and their dependencies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Thank you for the helpful comments. Below, we have quoted the relevant sections from the revised manuscript as we respond to the reviewer’s comments item-by-item.

      Weaknesses:

      While the task design in this study is intentionally stimulus-rich and places a minimal constraint on the animal to preserve naturalistic behavior, this is, unfortunately, a double-edged sword, as it also introduces additional variables that confound some of the neural analysis. Because of this, a general weakness of the study is a lack of clear interpretability of the task variable neural correlates. This is a limitation of the task, which includes many naturally correlated variables - however, I think with some additional analyses, the authors could strengthen some of their core arguments and significantly improve clarity.

      We acknowledge the weakness and have included additional analyses to compensate for it. Details are given in our replies to the subsequent comments.

      For example, the authors argue, based on an ANN decoding analysis (Figure 2b), that PFC neurons encode spatial information - but the spatial coordinate that they decode (the distance to the active foraging zone) is itself confounded by the fact that animals exhibit different behavior in different sections of the arena. From the way the data are presented, it is difficult to tell whether the decoder performance reflects a true neural correlate of distance, or whether it is driven by behavior-associated activity that is evoked by different behaviors in different parts of the arena. The author's claim that PFC neurons encode spatial information could be substantiated with a more careful analysis of single-neuron responses to supplement the decoder analysis. For example, 1) They could show examples of single neurons that are active at some constant distance away from the foraging site, regardless of animal behavior, and 2) They could quantify how many neurons are significantly spatially modulated, controlling for correlates of behavior events. One possible approach to disambiguate this confound could be to use regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both.

      First of all, we would like to point out that while the recording was made during naturalistic foraging with minimal behavioral constraints, a well-trained rat displayed an almost fixed sequence of actions within each zone. The behavioral repertoire differed markedly between zones: exploratory behaviors in the N-zone, navigating back and forth in the F-zone, and licking sucrose while avoiding attacks in the E-zone. Therefore, the arena is divided not only by geographical features but also by the distinct set of behaviors performed in each zone. This is evident in the data showing a higher decoding accuracy of spatial distance in the F-zone than in the N- or E-zone. In this sense, the heterogeneous encoding reflects the heterogeneous distribution of dominant behaviors (navigation in the F-zone and attack avoidance while foraging in the E-zone) and hence corroborates the reviewer’s comment at a macroscopic scale encompassing the entire arena.

      Having said that, the more critical question is whether the neural activity is more correlated with microscopic behaviors at every moment rather than the location decoded in the F-zone. As the reviewer suggested, the first step is to analyze single-neuron activity to identify whether direct neural correlates of location exist. To this end, traditional place maps were constructed for individual neurons. Most neurons did not show cohesive place fields across different regions, indicating little-to-no direct place coding by individual neurons. Only a few neurons displayed recognizable place fields in a consistent manner. However, even these place fields were irregular and patchy, and therefore not comparable to the place cells or grid cells found in the hippocampus or entorhinal cortex. Some example firing maps have been added to Figure 2 and characterized in the text as below.

      “To determine whether location-specific neural activity exists at the single-cell level in our mPFC data, a traditional place map was constructed for individual neurons. Although most neurons did not show cohesive place fields across different regions in the arena, a few neurons modulated their firing rates based on the rat’s current location. However, even these neurons were not comparable to place cells in the hippocampus (O’Keefe & Dostrovsky, 1971) or grid cells in the entorhinal cortex (Hafting et al., 2005) as the place fields were patchy and irregular in some cases (Figure 2B; Units 66 and 125) or too large, spanning the entire zone rather than a discrete location within it (Units 26 and 56). The latter type of neuron has been identified in other studies (e.g., Kaefer et al., 2020).”

      Next, to verify whether the location decoding reflects neuronal activity due to external features or a particular type of action, the predicted location was compared between the opposite directions within the F-zone, inbound and outbound in reference to the goal area (Lobsterbot). If the encoding were specifically tied to a particular action or environmental stimulus, there should be a discrepancy when the ANN decoder trained on the outbound trajectory is tested for predictions on the inbound path, and vice versa. However, the results showed no significant difference between the two trajectories, suggesting that the decoded distance was not simply reflecting neural responses to location-specific activities or environmental cues during navigation.

      “To determine whether the accuracy of the regressor varied depending on the direction of movement, we compared the decoding accuracy of the regressor for outbound (from the N- to E-zone) vs. inbound (from the E- to N- zone) navigation within the F-zone. There was no significant difference in decoding accuracy between outbound vs. inbound trips (paired t-test; t(39) = 1.52, p =.136), indicating that the stability of spatial encoding was maintained regardless of the moving direction or perceived context (Figure 2E).”

      Additionally, we applied the same regression analysis on a subset of data that were recorded while the door to the robot compartment was closed during the Lobsterbot sessions. This way, it is possible to test the decoding accuracy when the most salient spatial feature, the Lobsterbot, is blocked out of sight. The subset represents an average of 38.92% of the entire session. Interestingly, the decoding accuracy with the subset of data was higher than that with the entire dataset, indicating that the neural activities were not driven by a single salient landmark. This finding supports our conclusion that the location information can be decoded from a population of neurons rather than from individual neurons that are associated with environmental or proprioceptive cues. We have added the following description of results in the manuscript.

      “Previous analyses indicated that the distance regressor performed robustly regardless of movement direction, but there is a possibility that the decoder detects visual cues or behaviors specific to the E-zone. For example, neural activity related to Lobsterbot confrontation or licking behavior might be used by the regressor to decode distance. To rule out this possibility, we analyzed a subset of data collected when the compartment door was closed, preventing visual access to the Lobsterbot and sucrose port and limiting active foraging behavior. The regressor trained on this subset still decoded distance with a MAE of 12.14 (± 3.046) cm (paired t-test; t(39) = 12.17, p <.001). Notably, the regressor's performance was significantly higher with this subset than with the full dataset (paired t-test; t(39) = 9.895, p <.001).”

      As for the comment on “using regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both”, it is difficult to isolate a particular behavioral event, let alone timestamp it, since the rat’s location was being monitored in a constantly moving, naturalistic stream of behaviors. However, as mentioned above, a new section entitled “Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision” argues against a single-neuron-based account by performing a feature importance analysis. The results showed that even when the top 20% of the most informative neurons were excluded, the remaining neural population could still decode both distance and events. This analysis supports the idea of a population-wide mode shift rather than distinct subgroups of neurons specialized in processing different sensory or motor events. This idea is also expressed in the schematic diagrams featured in Figure 8 of the revision.

      To substantiate the claim that PFC neurons really switch between different coding "modes," the authors could include a version of this analysis where they have regressed out, or otherwise controlled for, these confounds. Otherwise, the claim that the authors have identified "distinctively different states of ensemble activity," as opposed to simple coding of salient task features, seems premature.

      A key argument in our study is that the mPFC neurons encode different abstract internal representations (distance and avoidance decision) at the level of the population. This has been emphasized in the revision with additional analyses and discussions. Most of all, we performed single neuron-based analysis for both spatial encoding (place fields for individual neurons) and avoidance decision (PETHs for head entry and head withdrawal) and contrasted the results with the population analysis. Although some individual neurons displayed a fractured “place cell-like” activity, and some others showed modulated firing at the head-entry and the head-withdrawal events, the ensemble decoding extracted distance information for the current location of the animal at a much higher accuracy. Furthermore, the PCA identified abstract feature dimensions, especially regarding the activity in the E-zone, that cannot be attributed to a small number of sensory- or motor-related neurons.

      To mitigate the possibility that the PCA is driven primarily by a small subset of units responsive to salient behavioral events, we also applied PCA to the dataset excluding the activity in the 2-second time window surrounding the head entry and withdrawal. While this approach does not eliminate all cue- or behavior-related activity within the E-zone, it does remove the neural activity associated with emotionally significant events, such as entry into the E-zone, the first drop of sucrose, head withdrawal, and the attack. Even without these events, the PC identified in the E-zone was still separated from those in the F-zone and N-zone. This result again argues in support of distinct states of ensemble activity formed in accordance with different categories of behaviors performed in different zones. Finally, the Naïve Bayesian classifier trained with ensemble activity in the E-zone was able to predict the success and failure of avoidance that occur a few seconds later, indicating that the same population of neurons are encoding the avoidance decision rather than the location of the animal.

      Reviewer 1 (Recommendations):

      The authors include an analysis (Figure 4) of population responses using PCA on session-wide data, which they use to support the claim that PFC neurons encode distinctive neural states, particularly between the encounter zone and nesting/foraging zones. However, because the encounter zone contains unique stimulus and task events (sucrose, threat, etc.), and the samples for PCA are drawn from the entire dataset (including during these events), it seems likely that the Euclidean distance measures analyzed in Figure 4b are driven mostly by the neural correlates of these events rather than some more general change in "state" of PFC dynamics. This does not invalidate this analysis but renders it potentially redundant with the single neuron results shown in Figure 5 - and I think the interpretation of this as supporting a state transition in the coding scheme is somewhat misleading. The authors may consider performing a PCA/population vector analysis on the subset of timepoints that do not contain unique behavior events, rather than on session-wide data, or otherwise equalizing samples that correspond to behavioral events in different zones. Observing a difference in PC-projected population vectors drawn from samples that are not contaminated by unique encounter-related events would substantiate the idea that there is a general shift in neural activity that is more related to the change in context or goal state, and less directly to the distinguishing events themselves.

      Thank you for the comments. Indeed, this is a recurring theme where the reviewers expressed concerns and doubts about heterogeneous encoding of different functional modes. Based on the systematic presentation of the results in the manuscript, from PETH to ANN to Bayesian classifier, we argue that the activity of the mPFC neurons is better represented by the population than by a loose collection of stimulus- or event-related neurons.

      We recognize that the PCA results that we included as evidence of distinct functional separation might reflect activities driven by a small number of event-coding neurons in different zones. As mentioned in the public review, we conducted the same analysis on a subset of data that excluded neural activity potentially influenced by significant events in the E-zone. The critical times were defined as ± 1 second from these events and excluded from the neural data. Despite these exclusions, the results continued to show populational differences between zones, reinforcing the notion that neurons encode abstract behavioral states (decision to avoid or stay) without the sensory- or motor-related activity. Although this analysis does not completely eliminate all possible confounding factors emerging in different external and internal contexts, it provides extra support for the population-level switch occurring in different zones.
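
      The exclusion step described above can be illustrated with a minimal sketch. It assumes that neural activity is binned and that each bin has a timestamp; the function name `outside_event_windows`, the 0.5 s bin width, and the example event times are hypothetical and are not taken from the manuscript. Only the ± 1 second window comes from the text.

```python
from bisect import bisect_left

def outside_event_windows(bin_times, event_times, half_width=1.0):
    """Return one keep-flag per time bin: True when the bin lies farther
    than half_width seconds from every event (hypothetical helper)."""
    events = sorted(event_times)
    keep = []
    for t in bin_times:
        i = bisect_left(events, t)
        # Only the nearest event on either side can fall inside the window.
        near_left = i > 0 and t - events[i - 1] <= half_width
        near_right = i < len(events) and events[i] - t <= half_width
        keep.append(not (near_left or near_right))
    return keep

# Hypothetical example: 0.5 s bins over 10 s, with a head entry at 3.0 s
# and a head withdrawal at 7.0 s.
bin_times = [0.5 * k for k in range(20)]
keep = outside_event_windows(bin_times, [3.0, 7.0])
kept_times = [t for t, flag in zip(bin_times, keep) if flag]
```

      Under these assumptions, only the bins flagged `True` would be passed on to the PCA, so any remaining separation between zones cannot come from activity inside the excluded windows.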

      In Figure 7, the authors include a schematic that suggests that the number of neurons representing spatial information increases in the foraging zone, and that they overlap substantially with neurons representing behaviors in the encounter zone, such as withdrawal. They show in Figure 3 that location decoding is better in the foraging zone, but I could not find any explicit analysis of single-neuron correlates of spatial information as suggested in the schematic. Is there a formal analysis that lends support to this idea? It would be simple, and informative, to include a quantification of the fraction of spatial- and behavior-modulated neurons in each zone to see if changes in location coding are really driven by "larger" population representations. Also, the authors could quantify the overlap between spatial- and behavior-modulated neurons in the encounter zone to explicitly test whether neurons "switch" their coding scheme.

      Figure 7 (now Figure 8) has been completely revised. The schematic diagram is modified to show spatial and avoidance decision encoding by the overlapping population of mPFC neurons (Figure 8a). Most notably, there are very few neurons that encode location but not the avoidance decision or vice versa. This is indicated by the differently colored units in F-zone vs. E-zone. The model also includes units that are “not” engaged in any type of encoding or engaged in only one type of encoding, although they are not the majority.

      We have also added a schematic for hypothetical switching mechanisms (Figure 8b) to describe the conceptual scheme for the initiation of encoding-mode switching (sensory-driven vs. arbitrator-driven process).

      “Two main hypotheses could explain this switch. A bottom-up hypothesis suggests sensory inputs or upstream signals dictate encoding priorities, while a top-down hypothesis proposes that an internal or external “arbitrator” selects the encoding mode and coordinates the relevant information (Figure 8B). Although the current study is only a first step toward finding the regulatory mechanism behind this switch, our control experiment, where rats reverted to a simple shuttling task, provides evidence that might favor the top-down hypothesis. The absence of the Lobsterbot degraded spatial encoding rather than enhancing it, indicating that simply reducing the task demand is not sufficient to activate one particular type of encoding mode over another. The arbitrator hypothesis asserts that the mPFC neurons are called on to encode heterogeneous information when the task demand is high and requires behavioral coordination beyond automatic, stimulus-driven execution. Future studies incorporating multiple simultaneous tasks and carefully controlling contextual variables could help determine whether these functional shifts are governed by top-down processes involving specific neural arbitrators or by bottom-up signals.”

      Related to this difference in location coding throughout the environment, the authors suggest in Figure 3a-b that location coding is better in the foraging zone compared to the nest or encounter zones, evidenced by better decoder performance (smaller error) in the foraging zone (Figure 3b). The authors use the same proportion of data from the three zones for setting up training/test sets for cross-validation, but it seems likely that overall, there are substantially more samples from the foraging zone compared to the other two zones, as the animal traverses this section frequently, and whenever it moves from the nest into the encounter zone (based on the video). What does the actual heatmap of animal location look like? And, if the data are down-sampled such that each section contributes the same proportion of samples to decoder training, does the error landscape still show better performance in the foraging zone? It is important to disambiguate the effects of uneven sampling from true biological differences in neural activity.

      Thank you for the comment. We agree with the concern regarding uneven data size from different sections of the arena. Indeed, as the heatmap below indicates, the rats spent most of their time in two critical locations, one being a transition area between the N- and F-zones and the other near the sucrose port. This imbalance needs to be corrected. In fact, we have included a procedure to correct for this biased sampling. In the results section “Non-navigational behavior reduces the accuracy of decoded location”, we report the following.

      Author response image 1.

      Heatmap of the animal’s position during one example session. (Left) Unprocessed occupancy plot. Each dot represents 0.2 seconds. (Right) Smoothed occupancy plot using a Gaussian filter (sigma: 10 pixels, filter size: 1001 pixels). The white line indicates a 10 cm length.

      “To correct for the unequal distribution of location visits (more visits to the F- than to other zones), the regressor was trained using a subset of the original data, which was equalized for the data size per distance range (see Materials and Methods). Despite the correction, there was a significant main effect of the zone (F(1.16, 45.43) = 119.2, p <.001) and the post hoc results showed that the MAEs in the N-zone (19.52 ± 4.46 cm; t(39) = 10.45; p <.001) and the E-zone (26.13 ± 7.57 cm; t(39) = 11.40; p <.001) had significantly higher errors when compared to the F-zone (14.10 ± 1.64 cm).”

      Also in the method section, we have stated that:

      “In the dataset adjusted for uneven location visits, we divided distance values into five equally sized bins. Then, a sub-dataset was created that contains an equal number of data points for each of these bins.”
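
      The equalization procedure quoted above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: "equally sized bins" is interpreted here as equal-width bins over the observed distance range, each bin is randomly down-sampled to the size of the smallest bin, and the function name `balance_by_distance`, the seed, and the example distances are hypothetical.

```python
import random

def balance_by_distance(distances, n_bins=5, seed=0):
    """Return sorted sample indices with an equal count per distance bin.

    Assumption: bins are equal-width over [min, max] of the distances.
    If any bin is empty, the result is an empty selection.
    """
    lo, hi = min(distances), max(distances)
    if hi == lo:
        return list(range(len(distances)))  # nothing to balance
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for idx, d in enumerate(distances):
        # Clamp so that the maximum distance lands in the last bin.
        b = min(int((d - lo) / width), n_bins - 1)
        bins[b].append(idx)
    n_keep = min(len(members) for members in bins)
    rng = random.Random(seed)
    selected = []
    for members in bins:
        selected.extend(rng.sample(members, n_keep))
    return sorted(selected)

# Hypothetical distances (cm) with heavy oversampling of the near range.
distances = ([5.0] * 50 + [25.0] * 10 + [45.0] * 20
             + [65.0] * 10 + [85.0] * 30 + [0.0, 100.0])
balanced_idx = balance_by_distance(distances)
```

      In this sketch the returned indices would select the rows of the neural data and the matching distance labels used to train the regressor, so that every distance range contributes the same number of samples.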

      Why do the authors choose to use a multi-layer neural network (Figure 2b-c) to decode the animal's distance to the encounter zone?(…) The authors may consider also showing an analysis using simple regression, or maybe something like an SVM, in addition to the ANN approach.

      We began with a simple linear regression model and progressed to more advanced methods, including SVM and multi-layer neural networks. As shown below, simpler methods could decode distance to some extent, but neural networks and random forest regressors outperformed others (Neural Network: 16.61 cm ± 3.673; Linear Regression: 19.85 cm ± 2.528; Quadratic Regression: 18.68 cm ± 4.674; SVM: 18.88 cm ± 2.676; Random Forest: 13.59 cm ± 3.174).

      We chose the neural network model for two main reasons: (1) previous studies demonstrated its superior performance compared to Bayesian regressors commonly used for decoding neural ensembles, and (2) its generalizability and robustness against noisy data. Although the random forest regressor achieved the lowest decoding error, we avoided using it due to its tendency to overfit and its limited generalization to unseen data.

      Overall, we expect similar results with other regressors but with different statistical power for decoding accuracy. We further speculate that the neural network’s use of multiple nodes contributes to robustness against noise from single-unit recordings and enables the network to capture distributed processing within neural ensembles.

      In Figure 6c, the authors show a prediction of withdrawal behavior based on neural activity seconds before the behavior occurs. This is potentially very interesting, as it suggests that something about the state of neural dynamics in PFC is potentially related to the propensity to withdraw, or to the preparation of this behavior. However, another possibility is that the animal behaves differently, in more subtle ways, while it is anticipating threat and preparing withdrawal behavior - since PFC neurons are correlated with behavior, this could explain decoder performance before the withdrawal behavior occurs. To rule out this possibility, it would be useful to analyze how well, and how early, withdrawal success can be decoded only on the basis of behavioral features from the video, and then to compare this with the time course of the neural decoder. Another approach might be to decode the behavior on the basis of video data as well as neural data, and using a model comparison, measure whether inclusion of neural features significantly increases decoder performance.

      We appreciate this important point, as mPFC activity might indeed reflect motor preparation preceding withdrawal behavior. Another reviewer raised a similar concern regarding potential micro-behavioral influences on mPFC activity prior to withdrawal responses. However, our behavioral analysis suggests that highly trained rats engage in sucrose licking that shows little variability regardless of the subsequent behavioral decision. In support of this, 95% of inter-lick intervals were less than 0.25 seconds, which is not enough time to perform any additional behavior during encounters.
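
      For concreteness, the inter-lick interval statistic can be computed as in the short sketch below. The function name `fraction_short_intervals` and the example timestamps are hypothetical; only the 0.25 second threshold comes from the text.

```python
def fraction_short_intervals(lick_times, threshold=0.25):
    """Fraction of inter-lick intervals shorter than `threshold` seconds."""
    times = sorted(lick_times)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if not intervals:
        raise ValueError("need at least two lick timestamps")
    return sum(iv < threshold for iv in intervals) / len(intervals)

# Hypothetical lick timestamps (s): three short intervals and one long pause.
example_licks = [0.0, 0.125, 0.25, 0.375, 1.0]
short_fraction = fraction_short_intervals(example_licks)
```

      In practice the intervals would presumably be computed within each encounter, so that pauses between separate visits to the sucrose port are not counted as inter-lick intervals.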

      Author response image 2.

      To further clarify this, we included an additional video showing both avoidance and escape withdrawals at close range. This video was recorded during the development of the behavioral paradigm, though we did not routinely collect this view, as animals consistently exhibited stable licking behavior in the E-zone. As demonstrated in the video, the rat remains highly focused on the lick port with minimal body movement during encounters. Therefore, we believe that the neural ensemble dynamics observed in the mPFC are unlikely to be driven by micro-behavioral changes.

      Reviewer 2 (Public Review):

      Thank you for the positive comment on our behavior paradigm and constructive suggestions on additional analysis. We came to think that the role of mPFC could be better portrayed as representing and switching between different encoding targets under different contexts, which in part, was more clearly manifested by the naturalistic behavioral paradigm. In the revision we tried to convey this message more explicitly and provide a new perspective for this important aspect of mPFC function.

      It is not clear what proportion of each of the ensembles recorded is necessary for decoding distance from the threat, and whether it is these same neurons that directly 'switch' to responding to head entry or withdrawal in the encounter phase within the total population. The PCA gets closest to answering this question by demonstrating that activity during the encounter is different from activity in the nesting or foraging zones, but in principle this could be achieved by neurons or ensembles that did not encode spatial parameters. The population analyses are focused on neurons sensitive to behaviours relating to the threat encounter, but even before dividing into subtypes etc., this is at most half of the recorded population.

      In our study, the key idea we aim to convey is that mPFC neurons adapt their encoding schemes based on the context or functional needs of the ongoing task. Other reviewers also suggested strengthening the evidence that the same neurons directly switch between encoding two different tasks. The counteracting hypothesis to "switching functions within the same neurons" posits that there are dedicated subsets of neurons that modulate behavior—either by driving decisions/behaviors themselves or being driven by computations from other brain regions.

      To test this idea, we included an additional analysis chapter in the results section titled “Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision”. In this section, we directly tested this hypothesis by examining each neuron's contribution to the distance regressor and the event classifier. The results showed that the histogram of feature importance (the contribution to each task) is highly skewed towards zero for both decoders, and removing neurons with high feature importance does not impair the decoder’s performance. These findings suggest that 1) there is no direct division among neurons involved in the two tasks, and 2) information about spatial/defensive behavior is distributed across neurons.
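
      The removal step can be sketched as below. This is a minimal illustration that assumes one importance score per neuron is already available from some feature importance method (the method itself is not specified here); the function name `drop_most_informative` and the example scores are hypothetical. The default fraction of 20% follows the exclusion level mentioned in the reply to Reviewer 1.

```python
def drop_most_informative(importance, fraction=0.2):
    """Return indices of the neurons kept after removing the top
    `fraction` of neurons ranked by importance score."""
    n_drop = int(len(importance) * fraction)
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(importance)) if i not in dropped]

# Hypothetical scores for ten neurons, skewed towards zero.
scores = [0.02, 0.40, 0.01, 0.05, 0.30, 0.00, 0.03, 0.01, 0.02, 0.04]
kept_neurons = drop_most_informative(scores)
```

      The decoder would then be retrained and evaluated using only the kept neurons; similar accuracy before and after removal is what would indicate that the information is distributed rather than concentrated in a few units.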

      Furthermore, we tested whether there is a negative correlation between the feature importance of spatial encoding and avoidance encoding. Even if there were no “key neurons” that transmit a significant amount of information about either spatial or defensive behavior, it is still possible that neurons with higher information in the navigation context might carry less information in the active-foraging context, or vice versa. However, we did not observe such a trend, suggesting that mPFC neurons do not exhibit a preference for encoding one type of information over the other.

      Lastly, another reviewer raised the concern that the PCA results, which we used as evidence of functional separation of different ensemble functions, might be driven by a small number of event-coding neurons. To address this, we conducted the same analysis on a subset of data that excluded neural activity potentially influenced by significant events in the E-zone. In the Peri-Event Time Histogram (PETH) analysis, we observed that some neurons exhibit highly-modulated activity upon arrival at the E-zone (head entry; HE) and immediately following voluntary departure or attack (head withdrawal; HW). We defined 'critical event times' as ± one second from these events and excluded neural data from these periods to determine if PCA could still differentiate neural activities across zones. Despite these exclusions, the results continued to show populational differences between zones, reinforcing the notion that neurons adapt their activity according to the context. We acknowledge that this analysis still cannot eliminate all of the confounding factors due to the context change, but we confirmed that excluding two significant events (delivery onset of sucrose and withdrawal movement) does not alter our result.

      To summarize, these additional results further support the conclusion that spatial and avoidance information is distributed across the neural population rather than being handled by distinct subsets. The analyses revealed no negative correlation between spatial and avoidance encoding, and excluding event-driven neural activity did not alter the observed functional separation, confirming that mPFC neurons dynamically adjust their activity to meet contextual demands.

      A second concern is also illustrated by Fig. 7: in the data presented, separate reward and threat encoding neurons were not shown - in the current study design, it is not possible to dissociate reward and threat responses as the data without the threat present were only used to study spatial encoding integrity.

      Thank you for this valuable feedback. Other reviewers have also noted that Figure 7 (now Figure 8) is misleading and contains assertions not supported by our experiments. In response, we have revised the model to more accurately reflect our findings. We have eliminated the distinction between reward-coding and threat-coding neurons, simplifying the model to focus on spatial encoding and avoidance encoding. The updated figure aligns more closely with our findings and claims.

      A. Distinct functional states (spatial vs. avoidance decision) encoded by the same population of neurons are separable by region (F- vs. E-zone). B. Hypothetical control models by which mPFC neurons assume different functional states.

      Thirdly, the findings of this work are not mechanistic or functional but are purely correlational. For example, it is claimed that analyzing activity around the withdrawal period allows for ascertaining their functional contributions to decisions. But without a direct manipulation of this activity, it is difficult to make such a claim. The authors later discuss whether the elevated response of Type 2 neurons might simply represent fear or anxiety motivation or threat level, or whether they directly contribute to the decision-making process. As is implicit in the discussion, the current study cannot differentiate between these possibilities. However, the language used throughout does not reflect this. 

      We acknowledge that our experiments are purely correlational and that this is a weakness. Although we tried to choose our words carefully to avoid deterministic claims, we agree that some of the language might mislead readers into thinking that we found a direct functional contribution. We therefore changed the following expressions.

      “We then further analyzed the (functional contribution ->)correlation between neural activity and success and failure of avoidance behavior. If the mPFC neurons (encode ->)participate in the avoidance decisions, avoidance withdrawal (AW; withdrawal before the attack) and escape withdrawal (EW; withdrawal after the attack) may be distinguishable from decoded population activity even prior to motor execution.”

      We also added the passage below to the discussion section to clarify the limitations of the study.

      “Despite this interesting conjecture, any analysis based on recording data is only correlational, mandating further studies with direct manipulation of the subpopulation to confirm its functional specificity.”

      Fourthly, the authors mention the representation of different functions in 'distinct spatiotemporal regions' but the bulk of the analyses, particularly in terms of response to the threat, do not compare recordings from PL and IL although - as the authors mention in the introduction - there is prior evidence of functional separation between these regions.

      Thank you for bringing this part to our attention. As we mentioned in the introduction, we acknowledge the functional differences between the PL and IL regions. Although differences in spatial encoding between these two areas were not deeply explored, we anticipated finding differences in event encoding, given the distinct roles of the PL and IL in fear and threat processing. However, our initial analysis revealed no significant differences in event encoding between the regions, and as a result, we did not emphasize these differences in the manuscript. To address this point, we have reanalyzed the data separately and included the following findings in the manuscript.

      “However, we did not observe a difference in decoding accuracy between the PL and IL ensembles, and there were no significant interactions between regressor type (shuffled vs. original) and regions (mixed-effects model; regions: p=.996; interaction: p=.782). These results indicate that the population activity in both the PL and IL contains spatial information (Figure 2D, Video 3).

      […]

      Furthermore, we analyzed whether there is a difference in prediction accuracy between sessions with different recorded regions, the PL and the IL. A two-way repeated-measures ANOVA revealed no significant difference between recorded regions, nor any interaction (regions: F(1, 38) = 0.1828, p = 0.671; interaction: F(1, 38) = 0.1614, p = 0.690).

      […]

      We also examined whether there is a significant difference between the PL and IL in the proportion of Type 1 and Type 2 neurons. In the PL, among 379 recorded units, 143 units (37.73%) were labeled as Type 1, and 75 units (19.79%) were labeled as Type 2. In contrast, in the IL, 156 units (61.66%) and 19 units (7.51%) of 253 recorded units were labeled as Type 1 and Type 2, respectively. A Chi-square analysis revealed that the PL contains a significantly higher proportion of Type 2 neurons (χ²(1, 632) = 18.07, p < .001), while the IL contains a significantly higher proportion of Type 1 neurons compared to the other region (χ²(1, 632) = 34.85, p < .001).”
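
      As a minimal sketch of the Chi-square comparison above, the statistic for each 2 × 2 table (region × labeled or not labeled) can be recomputed directly from the reported unit counts. The helper name `chi_square_2x2` is hypothetical, and the sketch assumes a Pearson Chi-square test without continuity correction. Under that assumption, the counts give roughly 34.85 for the Type 1 table and 18.07 for the Type 2 table.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Unit counts as reported in the text.
PL_TOTAL, IL_TOTAL = 379, 253
PL_TYPE1, IL_TYPE1 = 143, 156
PL_TYPE2, IL_TYPE2 = 75, 19

chi2_type1, p_type1 = chi_square_2x2(
    PL_TYPE1, PL_TOTAL - PL_TYPE1, IL_TYPE1, IL_TOTAL - IL_TYPE1
)
chi2_type2, p_type2 = chi_square_2x2(
    PL_TYPE2, PL_TOTAL - PL_TYPE2, IL_TYPE2, IL_TOTAL - IL_TYPE2
)

print(f"Type 1: chi2 = {chi2_type1:.2f}, p = {p_type1:.2g}")
print(f"Type 2: chi2 = {chi2_type2:.2f}, p = {p_type2:.2g}")
```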

      To summarize our additional results, we did not observe performance differences in distance decoding or event decoding. The only difference we observed was the proportional variation of Type 1 and Type 2 neurons when we separated the analysis by brain region. These results are somewhat counterintuitive, considering the distinct roles of the two regions—particularly the PL in fear expression and the IL in extinction learning. However, since the studies mentioned in the introduction primarily used lesion and infusion methods, this discrepancy may be due to the different approach taken in this study. Considering this, we have added the following section to the discussion.

      “Interestingly, we found no difference between the PL and IL in the decoding accuracy of distance or avoidance decision. This is somewhat surprising considering the distinct roles of these regions in the long line of fear conditioning and extinction studies, where the PL has been linked to fear expression and the IL to fear extinction learning (Burgos-Robles et al., 2009; Dejean et al., 2016; Kim et al., 2013; Quirk et al., 2006; Sierra-Mercado et al., 2011; Vidal-Gonzalez et al., 2006). On the other hand, more Type 2 neurons were found in the PL and more Type 1 neurons were found in the IL. To recap, typical Type 1 neurons briefly increased their activity after head entry and then remained inhibited, while Type 2 neurons showed a burst of activity during head entry followed by sustained increased activity. One study employing a context-dependent fear discrimination task (Kim et al., 2013) also identified two distinct types of PL units: short-latency CS-responsive units, which increased firing during the initial 150 ms of tone presentation, and persistently firing units, which maintained firing for up to 30 seconds. Given the temporal dynamics of Type 2 neurons, it is possible that our unsupervised clustering method merged the two types of neurons found in Kim et al.’s study.

      While we did not observe decreased IL activity during dynamic foraging, prior studies have shown that IL excitability decreases after fear conditioning (Santini et al., 2008), and increased IL activity is necessary for fear extinction learning. In our paradigm, extinction learning was unlikely, as the threat persisted throughout the experiment. Future studies with direct manipulation of these subpopulations, particularly examining head withdrawal timing after such interventions, could provide insight into how these subpopulations guide behavior.”

      Additionally, we made some changes in the introduction, mainly replacing the PL/IL with mPFC to be consistent with the main body of results and conclusion and also specifying the correlational nature of the recording study.

      “Machine learning-based populational decoding methods, alongside single-cell analyses, were employed to investigate the correlations between neuronal activity and a range of behavioral indices across different sections within the foraging arena.”

      Reviewer 2 (Recommendations):

      The authors consistently use parametric statistical tests throughout the manuscript. Can they please provide evidence that they have checked whether the data are normally distributed? Otherwise, non-parametric alternatives are more appropriate.

      Thank you for raising this important issue. We tested all of our data for normality using the Shapiro-Wilk test at a significance level of .05 and found that the data sets summarized in Author response table 1 below require non-parametric tests. For the analyses that did not pass the normality test, we used a non-parametric alternative instead. We also updated the methods section. For instance, the repeated measures ANOVA for supplementary figure S1 and the PCA results was replaced by the Friedman test with Dunn’s multiple comparison test.

      Author response table 1.

      Line 107: it is not clear here or in the methods whether a single drop of sucrose solution is delivered per lick or at some rate during the encounter, both during the habituation or in the final task. This is important information in order to understand how animals might make decisions about whether to stay or leave and how to interpret neural responses during this time period. Or is it a large drop, such that it takes multiple licks to consume? Please clarify.

      The apparatus we used incorporated an IR-beam sensor-controlled solenoid valve. As the beam sensor was located right in front of the pipe, the rat’s tongue activated the sensor. As a result, each lick opened the valve for a brief period, releasing a small amount of liquid, and the rat had to continuously lick to gain access to the sucrose. We carefully regulated the flow of the liquid and installed a small sink connected to a vacuum pump, so any remaining sucrose not consumed by the rat was instantly removed from the port. We clarified how sucrose was delivered in the methods section and also in the results section.

      Method:

      “The sucrose port has an IR sensor that is activated by a single lick. The rat usually stays in front of the lick port and licks continuously, at a rate of up to 6.3 licks per second, to obtain sucrose. Any sucrose droplets that fell into the bottom sink were immediately removed by negative pressure so that the rat’s behavior remained focused on licking.”

      Result:

      “The lick port was activated by an IR-beam sensor, triggering the solenoid valve when the beam was interrupted. The rat gradually learned to obtain rewards by continuously licking the port.”

      However, I'm not sure I understand the authors' logic in the interpretation: does the S-phase not also consist of goal-directed behaviour? To me, the core difference is that one is mediated by threat and the other by reward. In addition, it would be helpful to visualize the behaviour in the S-phase, particularly the number of approaches. This difference in the amount of 'experience' so to speak might drive some of the decrease in spatial decoding accuracy, even if travel distance is similar (it is also not clear how travel distance is calculated - is this total distance?) Ideally, this would also be included as a predictor in the GLM.

      We agree that the behaviors observed during the shuttling phase can also be considered goal-directed, as the rat moves purposefully toward explicit goals (the sucrose port and the N-zone during the return trip). However, we argue that there is a significant difference in the level of complexity of these goals.

      During the L-phase, the rat not only has to successfully navigate to the E-zone for sucrose but also pay attention to the robots, either to avoid an attack from the robot's forehead or escape the fast-striking motion of the claw. When the rat runs toward the E-zone, it typically takes a side-approaching path, similar to Kim and Choi (2018), and exhibits defensive behaviors such as a stretched posture, which were not observed in the S-phase. This behavioral characteristic differs from the S-phase, where the rat adopted a highly stereotyped navigation pattern fairly quickly (within 3 sessions), evidenced by more than 50 shuttling trajectories per session. In this phase, the rat exhibited more stimulus-response behavior, simply repeating the same actions over time without deliberate optimization.

      In our additional experiment with two different levels of goal complexity (reward-only vs. reward/threat conflict), we used a between-subject design in which both groups experienced both the S-phase and L-phase before surgery and underwent only one type of session afterward. This approach ruled out the possibility of differences in contextual experience. Additionally, since we initially designed the S-phase as extended training, behaviors in the apparatus tended to stabilize after rats completed both the S-phase and L-phase before surgery. As a result, we compared the post-surgery Lobsterbot phase to the post-surgery shuttling phase to investigate how different levels of goal complexity shape spatial encoding strength.

      To clarify our claim, we edited the paragraph below.

      “This absence of spatial correlates may result from a lack of complex goal-oriented navigation behavior, which requires deliberate planning to acquire more rewards and avoid potential threats.

      […]

      After the surgery, unlike the Lob-Exp group, the Ctrl-Exp group returned to the shuttling phase, during which the Lobsterbot was removed. With this protocol, both groups experienced sessions with the Lobsterbot, but the Ctrl-Exp group's task became less complex, as it was reduced to mere reward collection.

      Given these observations, along with the mPFC’s lack of consistency in spatial encoding, it is plausible that the mPFC operates in multiple functional modes, and the spatial encoding mode is preempted when the complexity of the task requires deliberate spatial navigation.”

      Additionally, we added behavioral data from the initial S-phase to Supplementary Figure 1.

      It is a good point that the amount of experience might drive the decrease in spatial decoding accuracy. To test this hypothesis, we added a new variable, the number of Lobsterbot sessions after surgery, to the previous GLM analysis. The updated model predicted the outcome variable with significant accuracy (F(4,44) = 10.31, p < .001), with an R-squared value of 0.4838. The regression coefficients were as follows: presence of the Lobsterbot (2.76, standard error [SE] = 1.11, t = 2.42, p = .020), number of recorded cells (-0.43, SE = .08, t = -5.22, p < .001), recording location (0.90, SE = 1.11, p = .424), and number of L sessions (0.002, SE = 0.11, p = .981). These results indicate that the number of exposures to the Lobsterbot sessions, as a measure of experience, did not affect spatial decoding accuracy.
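
      As a quick consistency check on the model summary above, the overall F statistic of a linear model can be recovered from its R-squared and degrees of freedom. The sketch below assumes an ordinary least-squares fit with an intercept; the function name `f_from_r_squared` is hypothetical.

```python
def f_from_r_squared(r_squared, df_model, df_resid):
    """Overall F statistic of a linear model from R-squared and its degrees of freedom."""
    explained_per_df = r_squared / df_model
    unexplained_per_df = (1.0 - r_squared) / df_resid
    return explained_per_df / unexplained_per_df


# Values reported in the text: R-squared = 0.4838 with F(4, 44).
f_stat = f_from_r_squared(0.4838, df_model=4, df_resid=44)
print(f"F(4, 44) = {f_stat:.2f}")
```

      With the reported values this gives approximately 10.31, in agreement with the F statistic stated above.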

      As a minor edit, we changed the term to “total travel distance”.

      Relating to the previous point, it should be emphasized in both sections on removing the Lobsterbot and on non-navigational behaviours that the spatial decoding is all in reference to distance from the threat (or reward location). The language in these sections differs from the previous section where 'distance from the goal' is mentioned. If the authors wish to discuss spatial decoding per se, it would be helpful to perform the same analysis but relative to the animals' own location which might have equal accuracy across locations in the arena. Otherwise, it is worth altering the language in e.g. line 258 onwards to state the fact that distance to the goal is only decodable when animals are actively engaged in the task.

      Thank you for this comment. We changed the term to “distance from the conflict zone” or “distance of the rat to the center of the E-zone” to clarify our experimental setup.

      In Fig. 5, why is the number of neurons shown in the PETHs less than the numbers shown in the pie charts?

      The number of neurons differs between the PETHs and the pie charts in Figure 5 because the PETHs are drawn only for 'event-responsive' units. For visualization, we included only those neurons that met the criteria described in the Methods section (Behavior-responsive unit analysis). We have updated the caption for Figure 5 as follows to minimize confusion.

      “Multiple subpopulations in the mPFC react differently to head entry and head withdrawal.

      (A) Top: The PETH of head entry-responsive units is color-coded based on the Z-score of activity.

      (C) The PETH of head withdrawal-responsive units is color-coded based on the Z-score of activity.”

      I appreciate the amount of relatively unprocessed data plotted in Figure 5, but it would be great to visualize something similar for AW vs. EW responses within the HW2 population. In other words, what is there that's discernably different within these responses that results in the findings of Fig. 6?

      To visualize the difference in neural activity between AW and EW, we included an additional supplementary figure (Supplementary Figure 5). We divided the neurons into Type 1 and Type 2 and plotted PETH during Avoidance Withdrawal (AW) and Escape Withdrawal (EW). Consistent with the results shown in Figure 6d, we could visually observe increased activity in Type 2 neurons before the execution of AW compared to EW. However, we couldn’t find a similar pattern in Type 1 neurons.

      On a related note, it would add explanatory power if the authors were able to more tightly link the prediction accuracy of the ensemble (particularly the Type 2 neurons) to the timing of the behaviour. Earlier in the manuscript it would be helpful to show latency to withdraw in AW trials; are animals leaving many seconds before the attack happens, or are they just about anticipating the timing of the attack? And therefore when using ensemble activity to predict the success of the AW, is the degree to which this can be done in advance (as the authors say, up to 6 seconds before withdrawal) also related to how long the animal has been engaged with the threat?

      We agree that the timing of head withdrawal, particularly in AW trials, is a critical factor in describing the rat's strategy toward the task. To test whether the rat uses a precise timing strategy (for instance, leaving several seconds before the attack or exploiting the discrete 3- and 6-second attack durations), we plotted all head withdrawal timepoints during the 6-second trials. The distribution was fairly even, without distinguishable peaks (e.g., at the very initial period or at the 3- or 6-second mark). This indicates that the rats did not adopt a precise temporal strategy. We included the additional data in a supplementary figure (Supplementary Figure 6) and added the following to the results section.

      “We monitored all head withdrawal timepoints to assess whether rats developed a temporal strategy to differentiate between the 3-second and 6-second attacks. We found no evidence of such a strategy, as the timings of premature head withdrawals during the 6-second attack trials were evenly distributed (see Supplementary Figure S1).”

      As depicted in the new supplementary figure, head withdrawal times during avoidance behavior vary from sub-seconds to the 3- or 6-second attack timepoints. After receiving the reviewer’s comment, we became curious whether there is a decoding accuracy difference depending on how long the animal engaged with the threat. We selected all 6-second attack and avoidance withdrawal trials and checked if correctly classified trials (AW trials classified as AW) had different head withdrawal times—perhaps shorter durations—compared to misclassified trials (AW trials classified as EW). As shown in Author response image 3 below, there was no significant difference between these two types, indicating that the latency of head withdrawal does not affect prediction accuracy.

      Author response image 3.

      Finally, there remain some open questions. One is how much encoding strength - of either space or the decision to leave during the encounter - relates to individual differences in animal performance or behaviour, particularly because this seems so variable at baseline. A second is how stable this encoding is. The authors mention that the distance encoding must be stable to an extent for their regressor to work; I am curious whether this stability is also found during the encounter coding, and also whether it is stable across experience. For example, in a session when an individual has a high proportion of anticipatory withdrawals, is the proportion of Type 2 neurons higher?

      Thank you for these questions. To recap, we used five rats in the Lobsterbot experiments and three rats in the control experiment, in which the Lobsterbot was removed after training. Indeed, there were individual differences in performance (i.e., avoidance success rate), number of recorded units (related to recording quality), and baseline behaviors. These differences are shown in Author response image 4 below.

      Author response image 4.

      We used a GLM to measure how much of the decoder’s accuracy was explained by individual differences. The results showed that individual differences explained 38.96% of the distance regressor’s performance and 12.14% of the event classifier’s performance. Since recording quality was highly dependent on the animal, the high subject variability detected in the distance regression might be attributed to the number of recorded cells. Rat00, which had the lowest mean absolute error on average, also had the highest number of recorded cells, averaging 18. Compared to the distance regression, there was less subject variability in event classification. Indeed, the GLM results showed that the number of cells explained only 0.62% of the variability in event classification.
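
      To illustrate what "variance explained by individual differences" means here: when animal identity is the only categorical predictor in a linear model, the model's R-squared equals the between-animal sum of squares divided by the total sum of squares. The sketch below shows that decomposition. The function name `variance_explained_by_group` and the per-session error values are hypothetical placeholders, not our data, and our actual GLM specification may differ.

```python
def variance_explained_by_group(groups):
    """Fraction of total variance accounted for by group (animal) means.

    `groups` maps a group label to a list of per-session scores.
    Returns SS_between / SS_total, which equals R-squared for a linear
    model with group identity as the only predictor.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Hypothetical per-session decoding errors for three animals (placeholder numbers).
example_sessions = {
    "ratA": [10.0, 11.0, 9.5],
    "ratB": [14.0, 13.5, 15.0],
    "ratC": [12.0, 12.5, 11.0],
}
fraction = variance_explained_by_group(example_sessions)
print(f"Explained by animal identity: {100 * fraction:.1f}%")
```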

      The reason we mentioned that "distance encoding must be stable for our regressor to work" is entirely based on the population-level analysis. Because we used neural data and behaviors from entire trials within a session, the regressor or classifier would have low accuracy if encoding dynamics changed within the session. In other words, if the way neurons encode avoidance/escape predictive patterns changed within a training set, the classifier would fail to generate an optimized separation function that works well across all datasets.

      To further investigate whether changes in experience affect event classification results over time, we plotted an additional graph below. Although there are individual and daily fluctuations in decoding accuracy, there was no observable trend throughout the experiments.

      Author response image 5.

      Regarding the correlation between the ratio of avoidance withdrawal and the proportion of Type 2 neurons, we were also curious and analyzed the data. Across 40 sessions, the correlation was -0.0716. For Type 1 neurons, it was slightly higher at 0.1459. We believe this indicates no significant relationship between the two variables.
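
      A minimal sketch of why correlations of this size are not significant with 40 sessions: under the null hypothesis of no association, r can be converted to a t statistic with n - 2 degrees of freedom. The sketch assumes the reported values are Pearson correlation coefficients; the function name `t_from_r` is hypothetical, and the two-sided 5% critical value for 38 degrees of freedom is taken as approximately 2.02.

```python
import math


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a Pearson correlation r over n samples."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N_SESSIONS = 40
T_CRITICAL = 2.02  # approximate two-sided 5% critical value for 38 degrees of freedom

t_type2 = t_from_r(-0.0716, N_SESSIONS)
t_type1 = t_from_r(0.1459, N_SESSIONS)

for label, t_value in (("Type 2", t_type2), ("Type 1", t_type1)):
    verdict = "significant" if abs(t_value) > T_CRITICAL else "not significant"
    print(f"{label}: t(38) = {t_value:.2f} ({verdict})")
```

      Both t values fall well below the critical value, consistent with the interpretation given above.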

      Minor points:

      I struggled with the overuse of acronyms in the paper. Some might be helpful but F-zone/N-zone, for example, or HE/HW, AW/EW are a bit of a struggle. After reading the paper a few times I learned them but a naive reader might need to often refer back to when they were first defined (as I frequently had to).

      To increase readability, we removed acronyms that are not often used and changed HE/HW to head-entry/head-withdrawal.

      I have a few questions about Figure 1F: in the text (line 150) it says that 'surgery was performed after three L sessions when the rats displayed a range of 30% to 60% AW'. This doesn't seem consistent with what is plotted, which shows greater variability in the proportion of AW behaviours both before and after surgery. It also appears that several rats only experienced two days of the L1 phase; please make clear if so. And finally, what is the line at 50% indicating? Neither the text nor the legend discuss any sort of thresholding at 50%. Instead, it would be best to make the distinction between pre- and post-surgery behaviour visually clearer.

      Thank you for pointing out this issue. We acknowledge there was an error in the text description. As noted in the Methods section, we proceeded with surgery after three Lobsterbot sessions. We have removed the incorrect part from the Results section and revised the Methods section for clarity.

      “After three days of Lobsterbot sessions, the rats underwent microdrive implant surgery, and recording data were collected from subsequent sessions, either Lobsterbot or shuttling sessions, depending on the experiment. For all post-surgery sessions, those with fewer than 20 approaches in 30 minutes were excluded from further analysis.”

      Among the five rats, Rat2 and Rat3 did not approach the robot during the entire Lob2 session, which is why these two rats do not have Lob2 data points. We updated the caption to address this issue.

      Initially, we added a 50% reference line, but we agree it is unnecessary as we do not discuss this reference. We have updated the figure to include the surgery point, as shown in Supplementary Figure 1.

      Fig. 2C: each dot is an ensemble of simultaneously recorded neurons, i.e. a subset of the total 800-odd units if I understand correctly. How many ensembles does each rat contribute? Similarly, is this evenly distributed across PL and IL?

      Yes, each dot represents a single session, with a total of 40 sessions. Five rats contributed 11, 9, 8, 7, and 5 sessions, respectively. Although each rat initially had more than 10 sessions, we discarded some sessions with a low unit count (fewer than 10 units; as detailed in Materials and Methods - Data Collection). We collected 25 sessions from the PL and 15 sessions from the IL. Our goal was to collect more than 200 units per region.

      Please show individual data points for Fig. 2D.

      We updated the figure with individual data points.

      Is there a reason why the section on removing the Lobsterbot (lines 200 - 215) does not have associated MAE plots? Particularly the critical comparison between Lob-Exp and Ctl-Exp.

      We intentionally removed some graphs to create a more compact figure, but we appreciate your suggestion and have included the graph in Figure 2.

      Some references to supplementary materials are not working, e.g. line 333.

      The submitted version of our manuscript contained reference errors. In the current version, we used plain text, and the references are fixed.

      The legend for Supp. Fig. 2B is incorrect.

      We greatly appreciate this point. We changed the caption to match the figure.

      Reviewer 3 (Public Review):

      Thank you for recognizing our efforts in designing an ethologically relevant foraging task to uncover the multiple roles of the mPFC. While we acknowledge certain limitations in our methodology—particularly that we only observed correlations between neural activity and behavior without direct manipulation—we have conducted additional analyses to further strengthen our findings.

      Weakness:

      The primary concern with this study is the absence of direct evidence regarding the role of the mPFC in the foraging behavior of the rats. The ability to predict heterogeneous variables from the population activity of a specific brain area does not necessarily imply that this brain area is computing or using this information. In light of recent reports revealing the distributed nature of neural coding, conducting direct causal experiments would be essential to draw conclusions about the role of the mPFC in spatial encoding and/or threat evaluation. Alternatively, a comparison with the activity from a different brain region could provide valuable insights (or at the very least, a comparison between PL and IL within the mPFC).

      Thank you for the comment. Indeed, the fundamental limitation of a recording study is that it is only correlational, and any causal relationship between neural activity and behavioral indices is speculative. We made this clearer in the revision and refrained from suggesting causality throughout. While we did not provide direct evidence that the mPFC is computing or utilizing spatial/foraging information, we based our assertion on previous studies that have directly demonstrated the mPFC's role in complex decision-making tasks (Martin-Fernandez et al., 2023; Orsini et al., 2018; Zeeb et al., 2015) and in certain types of spatial tasks (De Bruin et al., 1994; Sapiurka et al., 2016). We would like to emphasize that, to the best of our knowledge, no previous study has investigated mPFC function while an animal is solving multiple heterogeneous problems in a semi-naturalistic environment. Therefore, although our recording study supports only speculative causal inference, it provides a foundation for investigating mPFC function. Future studies employing more sophisticated, cell-type-specific manipulations could test the hypotheses from the current study.

      One of the key questions of this manuscript is how multiple pieces of information are represented in the recorded population of neurons. Most of the studies mentioned above use highly structured experimental designs, which allow researchers to study only one function of the mPFC. In the current study, the semi-naturalistic environment allows rats to freely switch between multiple behavioral sets, and our decoding analysis quantitatively assesses the extent to which spatial/foraging information is embedded during these sets. Our goal is to demonstrate that two different task hyperspaces are co-expressed in the same region and that the degree of this expression varies according to the rat’s current behavior (See Figure 8(b) in the revised manuscript).

      Alternatively, we added multiple analyses. First, we included a single unit-level analysis looking at the place cell-like property to contrast with the ensemble decoding. Most neurons did not show well-defined place fields although there were some indications for place cell-like property. For example, some neurons displayed fragmented place fields or unusually large place fields only at particular spots in the arena (mostly around the gates). The accuracy from this place information at the single-neuron level is much lower than that acquired from population decoding. Likewise, although there were neurons with modulated firing around the time of particular behavior (head entry and withdrawal), overall prediction accuracy of avoidance decision was much higher when the ensemble-based classifier was applied.

      Moreover, given that high-dimensional movement has been shown to be reflected in the neural activity across the entire dorsal cortex, more thorough comparisons between the neural encoding of task variables and movement would help rule out the possibility that the heterogeneous encoding observed in the mPFC is merely a reflection of the rats' movements in different behavioral modes.

      Thanks for the comment. We acknowledge that the neural activity may reflect various movement components across different zones in the arena. We performed several analyses to test this idea. First, we want to recap that our run-and-stop event analysis may provide insight into whether the mPFC neurons encode location despite significant motor events. The rats typically move across the F-zone fairly routinely and swiftly (as if they are “running”) to reach the E-zone, where they reduce their moving speed to almost a halt (“stopping”). The PETHs around these critical motor events, however, did not show any significant modulation of neural activity, indicating that most neurons we recorded from the mPFC did not respond to movement.

      We added this analysis to demonstrate that these sudden stops did not evoke the characteristic activation of Type 1 and Type 2 neurons observed during head entry into the E-zone. When we isolated these sudden stops outside the E-zone, we did not observe this neural signature (Supplementary Figure 2).

      Second, our PCA results showed that population activity in the E-zone during dynamic foraging behavior was distinct from the activity observed in the N- and F-zones during navigation. However, there is a possibility that the two behaviorally significant events—entry into the E-zone and voluntary or sudden exit—might be driving the differences observed in the PCA results. To account for this, we designated ±1 second from head entry and head withdrawal as "critical event times," excluded the corresponding neural data, and reanalyzed the data. This method removed neural activity associated with sudden movements in specific zones. Despite this exclusion, the PCA still revealed distinct population activity in the E-zone, different from the other zones (Supplementary Figure 4). This result reduces the likelihood that the observed heterogeneous neural activity is merely a reflection of zone-specific movements.
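
      The exclusion step described above can be sketched as follows. This is an illustrative sketch rather than the analysis code used in the study; the function name, the sampling rate, and the example event times are hypothetical. It builds a mask that drops every sample within ±1 second of a head entry or head withdrawal, so that only the remaining samples are passed on to PCA.

```python
import numpy as np

def mask_critical_windows(sample_times, event_times, half_width=1.0):
    """Return a boolean array that is True for samples to KEEP.

    A sample is dropped when it lies within `half_width` seconds of any
    event (head entry or head withdrawal in the text above).
    """
    sample_times = np.asarray(sample_times, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    if event_times.size == 0:
        return np.ones(sample_times.shape, dtype=bool)
    # Distance from every sample to its nearest event.
    nearest = np.min(np.abs(sample_times[:, None] - event_times[None, :]), axis=1)
    return nearest > half_width

# Hypothetical example: 10 s sampled at 2 Hz, with events at 3 s and 7 s.
times = np.arange(0, 10, 0.5)
keep = mask_critical_windows(times, [3.0, 7.0])
```

      The resulting mask would then be applied to the rows of the population activity matrix before the dimensionality reduction is rerun.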

      Lastly, the main claim of the paper is that the mPFC population switches between different functional modes depending on the context. However, no dynamic analysis or switching model has been employed to directly support this hypothesis.

      Thank you for this comment. Since we did not conduct a manipulation experiment, there is a clear limitation in uncovering how switching occurs between the two task contexts. To make the most of our population recording data, we added an additional results section that examines how individual neurons contribute to both the distance regressor and the event classifier. Our findings support the idea that distance and dynamic foraging information are distributed across neurons, with no distinct subpopulations dedicated to each context. This suggests that mPFC neurons adjust their coding schemes based on the current task context, aligning with Duncan’s (2001) adaptive coding model, which posits that mPFC neurons adapt their coding to meet the task's current demands.

      Reviewer 3 (Recommendations):

      The evidence for spatial encoding is relatively weak. In the F-zone (50 x 48 cm), the average error was approximately 17 cm, constituting about a third of the box's width and likely not significantly smaller than the size of a rat's body. The errors in the shuffled data are also not substantially greater than those in the original data. An essential test indicates that spatial decoding accuracy decreases when the Lobsterbot is removed. However, assessing the validity of the results is difficult in the current state. There is no figure illustrating the results, and no statistics are provided regarding the test for matching the number of neurons.

      We acknowledge that the average error (~17 cm) measured in our study is relatively large, even though it is significantly smaller than that of the shuffled control model (22.6 cm). Previous studies reported smaller prediction errors but under different experimental conditions: 16 cm in Kaefer et al. (2020) and less than 10 cm in Ma et al. (2023) and Mashhoori et al. (2018). Most notably, the average number of units used in our study (15.8 units per session) is considerably smaller than in the previous works, which used 63, 49, and 40 units, respectively. As our GLM results demonstrated, the number of recorded cells significantly influenced decoding accuracy (β = -0.43 cm/neuron). With a similar number of recorded cells, we would expect to have achieved comparable decoding accuracy. In addition, unlike other studies that have employed a dedicated maze such as the virtual track or the 8-shaped maze, we exposed rats to a semi-naturalistic environment where they exhibited a variety of behaviors beyond simple navigation. As argued throughout the manuscript, we believe that the spatial information represented in the mPFC is susceptible to disruption when the animal engages in other activities. A similar phenomenon was reported by Mashhoori et al. (2018), where the decoder, which typically showed a median error of less than 10 cm, exhibited a much higher error, nearly 100 cm, near the feeder location.

      As for the reviewer’s request for comparing spatial decoding without the Lobsterbot, we added a new figure to illustrate the spatial decoding results, including statistical details. We also applied a Generalized Linear Model to regress out the effect of the number of recorded neurons and statistically assess the impact of Lobsterbot removal. This adjustment directly addresses the reviewer's request for a clearer presentation of the results and helps contextualize the decoding performance in relation to the number of recorded neurons.
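
      The idea of regressing out the number of recorded neurons while testing the effect of Lobsterbot removal can be illustrated with a small sketch. It uses ordinary least squares, the simplest Gaussian linear model, as a stand-in, because the exact model specification is not given here. All names and numbers below are invented for illustration and are not data from the study.

```python
import numpy as np

def fit_error_model(n_neurons, removed, decoding_error):
    """Fit decoding_error ~ intercept + per_neuron * n_neurons + removal * removed."""
    n_neurons = np.asarray(n_neurons, dtype=float)
    removed = np.asarray(removed, dtype=float)
    y = np.asarray(decoding_error, dtype=float)
    # Design matrix: constant term, neuron count, and condition indicator.
    design = np.column_stack([np.ones_like(n_neurons), n_neurons, removed])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"intercept": coef[0], "per_neuron": coef[1], "removal": coef[2]}

# Synthetic, noise-free sessions (hypothetical values, not study data).
n = np.array([8, 12, 16, 20, 10, 14, 18, 22])
removed = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = session without the robot
error = 25.0 - 0.4 * n + 3.0 * removed        # decoding error in cm
fit = fit_error_model(n, removed, error)
```

      In such a model, the `removal` coefficient estimates the change in decoding error attributable to the condition after the per-neuron slope has been accounted for.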

      As indicated in the public review, drawing conclusions about the role of the mPFC in navigation and avoidance behavior during the foraging task is challenging due to the exclusively correlational nature of the results. The accuracy in AW/EW discrimination increases a few seconds before the response, implying that changes in mPFC activity precede the avoidance/escape response. However, one must question whether this truly reflects the case. Could this phenomenon be attributed to rats modifying their "micro-behavior" (as evidenced by changes in movement observed in the video) before executing the escape response, and subsequently influencing mPFC activity?

      We appreciate the reviewer's thoughtful observation regarding the correlational nature of our results and the potential influence of pre-escape micro-behaviors on mPFC activity. We acknowledge that the increased accuracy in AW/EW discrimination preceding the response could also be correlated with micro-behaviors. However, there is very little room for extraneous behavior other than licking the sucrose delivery port within the E-zone, as the rats are highly trained to perform this stereotypical behavior. To support this, we measured the time delays between licking events (inter-lick intervals). The results show a sharp distribution, with 95% of the intervals falling within a quarter second, indicating that the rats were stable in the E-zone, consistently licking without altering their posture.
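
      The inter-lick interval summary above amounts to a short computation, sketched here under stated assumptions. The function name and the lick timestamps are hypothetical; only the quarter-second threshold comes from the text.

```python
import numpy as np

def fraction_of_short_intervals(lick_times, threshold=0.25):
    """Fraction of inter-lick intervals that are at most `threshold` seconds."""
    intervals = np.diff(np.sort(np.asarray(lick_times, dtype=float)))
    if intervals.size == 0:
        return float("nan")  # fewer than two licks: no interval to measure
    return float(np.mean(intervals <= threshold))

# Hypothetical lick timestamps in seconds, with one long pause.
licks = [0.0, 0.15, 0.30, 0.45, 1.50, 1.65]
frac = fraction_of_short_intervals(licks)
```

      A value close to 1 would correspond to the sharp distribution described above, in which nearly all licks follow the previous one within a quarter second.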

      To complement the data presented in Author response image 2, a video clip showing a rat engaged in licking behavior was included. We carefully designed the robot compartment and adjusted the distance between the Lobsterbot and the sucrose port to ensure that rats could exhibit only limited behaviors inside the E-zone. The video confirms that no significant micro-behaviors were observed during the rat’s activity in the E-zone.

      If mPFC activity indeed switches mode, the results do not clearly indicate whether individual cells are specifically dedicated to spatial representation and avoidance or if they adapt their function based on the current goal. Figure 7, presented as a schematic illustration, suggests the latter option. However, the proportion of cells in the HE and HW categories that also encode spatial location has not been demonstrated. It has also not been shown how the switch is manifested at the level of the population.

      Thank you for this comment. As the reviewer pointed out, we suggest that mPFC neurons do not diverge based on their functions, but rather adapt their roles according to the current goal. To support this assertion, we added an additional results section that calculates the feature importance of decoders. This analysis allows us to quantitatively measure each neuron’s contribution to both the distance regressor and the event decoder. Our results indicate that distance and defensive behavior are not encoded by a small subset of neurons; instead, the information is distributed across the population. Shuffling the neural data of a single neuron resulted in a median increase in decoding error of 0.73 cm for the distance regressor and 0.01% for the event decoder, demonstrating that the decoders do not rely on a specific subset of neurons that exclusively encode spatial and/or defensive behavior.
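
      The shuffling procedure described above is a form of permutation feature importance, which can be sketched as follows. This is a generic illustration and not the decoders used in the study. The toy linear decoder, the synthetic data, and all names are assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, error_fn, rng):
    """Increase in decoding error when each column (neuron) is shuffled alone."""
    baseline = error_fn(y, predict(X))
    increases = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        # Break the link between neuron j and the target, keep all others intact.
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        increases[j] = error_fn(y, predict(X_shuffled)) - baseline
    return increases

def mean_abs_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic population: three informative "neurons" and one uninformative one.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
weights = np.array([1.0, 1.0, 1.0, 0.0])

def predict(data):
    # Toy linear decoder with fixed, hypothetical weights.
    return data @ weights

y = predict(X)
importance = permutation_importance(predict, X, y, mean_abs_error, rng)
```

      When information is spread across many units, each single-neuron increase is small relative to the baseline error, which is the pattern reported above.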

      Although we found supporting evidence that mPFC neurons encode two different types of information depending on the current context, we acknowledge that we could not go further in answering how this switch is manifested. One simple explanation is that the function is driven by current contextual information and goals—in other words, a bottom-up mechanism. However, in our control experiment, simplifying the navigation task worsened the encoding of spatial information in the mPFC. Therefore, we speculate that an external or internal arbitrator circuit determines what information to encode. A precise temporal analysis of the timepoint when the switch occurs in more controlled experiments might answer these questions. We have added this discussion to the discussion section.

      PL and IL are two distinct regions; however, there is no comparison between the two areas regarding their functional properties or the representations of the cells. Are the proportions of cell categories (HE vs HW or HE1 vs HE2, spatial encoding vs no spatial encoding) different in IL and PL? Are areas differentially active during the different behaviors?

      Thank you for bringing up this issue. As mentioned in our response to the public review, we included a comparison between the PL and IL regions. While we did not observe any differences in spatial encoding (feature importance scores), the only distinction was in the proportion of Type 1 and Type 2 neurons, as the reviewer suggested. We have incorporated our interpretation of these results into the discussion section.

      The results and interpretations of the cluster analysis appear to be highly dependent on the parameters used to define a cluster. For example, the HE2 category includes cells with activity that precedes events and gradually decreases afterward, as well as cells with activity that only follows the events.

      We strongly agree that dependency on hyperparameters is a crucial point when using unsupervised clustering methods. To eliminate any subjective criteria in defining clusters, we carefully selected our clustering approach, which requires only two hyperparameters: the number of initial clusters (set to 8) and the minimum number of cells required to be considered a valid cluster (cutoff limit, 50). The rationale behind these choices was that 1) a higher number of initial clusters would fail to generalize neural activity, 2) clusters with fewer than 50 neurons would be difficult to analyze, and 3) we wanted to prevent the separation of clusters that show noisy responses to the event.

      Author response table 2 shows the differences in the number of cell clusters when we varied these two parameters. As demonstrated, changing these two variables does result in different numbers of clusters. However, when we plotted each cluster type’s activity around head entry (HE) and head withdrawal (HW), an increased number of clusters resulted in the addition of small subsets with low variation in activity around the event, without affecting the general activity patterns of the major clusters.

      The example mentioned by the reviewer, a possible separation of HE2, appears when using a hyperparameter set that results in 4 clusters, not 3. In this result, 83 units, which were labeled as HE2 in the 3-cluster hyperparameter set, form a new group, HE3 (Group 3). This group of units shows increased activity after head entry and exhibits characteristics similar to HE2, with most of the units classified as HW2, maintaining high activity until head withdrawal. Among the 83 HE3 units, 36 were further classified as HW2, 44 as non-significant, and 3 as HW1. Therefore, we believe this does not affect our analysis, as we observed the separation of two major groups, Type 1 (HE1-HW1) and Type 2 (HE2-HW2), and focused our analysis on these groups afterward.

      Despite this validation, there remains a strong possibility that our method might not fully capture small yet significant subpopulations of mPFC units. As a result, we have included a sentence in the methods section addressing the rationale and stability of our approach.

      “(Materials and Methods) To compensate for the limited number of neurons recorded per session, the hyperparameter set was chosen to generalize their activity and categorize them into major types, allowing us to focus on neurons that appeared across multiple recording sessions. Although changes in the hyperparameter sets resulted in different numbers of clusters, the major activity types remained consistent (Supplementary Figure S8). However, there is a chance that this method may not differentiate smaller subsets of neurons, particularly those with fewer than 50 recorded neurons.”

      Author response table 2.

      Minor points:

      Line 333: Error! Reference source not found. This was probably the place for citing Figure S2?

      Lines 339, 343: Error! Reference source not found.

      Thank you for mentioning these comments. In the new version, all reference functions from Word have been replaced with plain text.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with a rich dataset and solid methodology.

      The revisions made by the authors in this version have greatly improved the validity and clarity of the statistical techniques, and as a result the paper's findings are more convincing.

      This paper's primary strengths are: 1) its comprehensive dataset that allows for a snapshot of the dynamics of several related fields; 2) its thorough exploration of how self-citation behavior relates to characteristics of research and researchers.

      Thank you for your positive view of our paper and for your previous comments.

      Its primary weakness is that the study stops short of digging into potential mechanisms in areas where it is potentially feasible to do so - for example, studying international dynamics by identifying and studying researchers who move between countries, or quantifying more or less 'appropriate' self-citations via measures of abstract text similarity.

      We agree that these are limitations of the existing study. We updated the limitations section as follows (page 15, line 539):

      “Similarly, this study falls short in several potential mechanistic insights, such as by investigating citation appropriateness via text similarity or international dynamics in authors who move between countries.”

      Yet while these types of questions were not determined to be in scope for this paper, the study is quite effective at laying the important groundwork for further study of mechanisms and motivations, and will be a highly valuable resource for both scientists within the field and those studying it.

      Reviewer #2 (Public review):

      The study presents valuable findings on self-citation rates in the field of Neuroscience, shedding light on potential strategic manipulation of citation metrics by first authors, regional variations in citation practices across continents, gender differences in early-career self-citation rates, and the influence of research specialization on self-citation rates in different subfields of Neuroscience. While some of the evidence supporting the claims of the authors is solid, some of the analysis seems incomplete and would benefit from more rigorous approaches.

      Thank you for your comments. We have addressed your suggestions presented in the “Recommendations for the authors” section by performing your recommended sensitivity analysis that specifically identifies authors who could be considered neurologists, neuroscientists, and psychiatrists (as opposed to just papers that are published in these fields). Please see the “Recommendations for the authors” section for more details.

      Reviewer #3 (Public review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. The interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated.

      This issue of interpretability was already raised in my review of the previous revision, where I argued that the authors should take a more explicit causal framework. The authors have now revised some of the language in this revision, in order to downplay causal language. Although this is perfectly fine, this misses the broader point, namely that it is not clear what is being estimated. Perhaps it is best to refer to Lundberg et al. (2021) and ask the authors to clarify "What is your Estimand?" In my view, the theoretical estimands the authors are interested in are causal in nature. Perhaps the authors would argue that their estimands are descriptive. In either case, it would be good if the authors could clarify that theoretical estimand.

      Thank you for your comment and for highlighting this insightful paper. After reading this paper, we believe that our theoretical estimand is descriptive in nature. For example, in the abstract of our paper, we state: “This work characterizes self-citation rates in basic, translational, and clinical Neuroscience literature by collating 100,347 articles from 63 journals between the years 2000-2020.” This goal seems consistent with the idea of a descriptive estimand, as we are not interested in any particular intervention or counterfactual at this stage. Instead, we seek to provide a broad characterization of subgroup differences in self-citations such that future work can ask more focused questions with causal estimands.

      Our analysis included subgroup means and generalized additive models, both of which were described as empirical estimands for a theoretical descriptive estimand in Lundberg et al. We added the following text to the paper (page 3, line 112):

      “Throughout this work, we characterized self-citation rates with descriptive, not causal, analyses. Our analyses included several theoretical estimands that are descriptive 17, such as the mean self-citation rates among published articles as a function of field, year, seniority, country, and gender. We adopted two forms of empirical estimands. First, we showed subgroup means in self-citation rates. We then developed smooth curves with generalized additive models (GAMs) to describe trends in self-citation rates across several variables.”

      In addition, we added to the limitations section as follows (page 15, line 539):

      “Yet, this study may lay the groundwork for future works to explore causal estimands.”

      Finally, in my previous review, I raised the issue of when self-citations become "problematic". The authors have addressed this issue satisfactorily, I believe, and now formulate their conclusions more carefully.

      Thank you for your previous comments. We agree that they improved the paper.

      Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory. American Sociological Review, 86(3), 532-565. https://doi.org/10.1177/00031224211004187

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough revisions and responses to the reviews.

      Reviewer #2 (Recommendations for the authors):

      I appreciate the authors' responses and am satisfied with all their replies except for my second comment. I still find the message conveyed slightly misleading, as the results seem to be generalized to neurologists, neuroscientists, and psychiatrists. It is important to refine the analysis to focus specifically on neuroscientists, identified as first or last authors based on their publication history. This approach is common in the science of science literature and would provide a more accurate representation of the findings specific to neuroscientists, avoiding the conflation with other related fields. This refinement could serve as a robustness check in the supplementary. I think adding this sub-analysis is essential to the validity of the results claimed in this paper.

      Thank you for your comment. We added a sensitivity analysis where fields are defined by an author’s publication history, not by the journal of each article.

      In the main text, we added the following:

      (Page 3, line 129) “When determining fields by each author’s publication history instead of the journal of each article, we observed similar rates of self-citation (Table S7). The 95% confidence intervals for each field definition overlapped in most cases, except for Last Author self-citation rates in Neuroscience (7.54% defined by journal vs. 8.32% defined by author) and Psychiatry (8.41% defined by journal vs. 7.92% defined by author).”

      Further details are provided in the methods section (page 21, line 801):

      “4.11 Journal-based vs. author-based field sensitivity analyses

      We refined our field-based analysis to focus only on authors who could be considered neuroscientists, neurologists, and psychiatrists. For each author, we looked at the number of articles they had in each subfield, as defined by Scopus. We considered 12 subfields that fell within Neurology, Neuroscience, and Psychiatry. These subfields are presented in Table S12. For each First Author and Last Author, we excluded them if any of their three most frequently published subfields did not include one of the 12 subfields of interest. If an author’s top three subfields included multiple broader fields (e.g., both Neuroscience and Psychiatry), then that author was categorized according to the field in which they published the most articles. Among First Authors, there were 86,220 remaining papers, split between 33,054 (38.33%) in Neurology, 23,216 (26.93%) in Neuroscience, and 29,950 (34.73%) in Psychiatry. Among Last Authors, there were 85,954 remaining papers, split between 31,793 (36.98%) in Neurology, 25,438 (29.59%) in Neuroscience, and 28,723 (33.42%) in Psychiatry.”

      Reviewer #3 (Recommendations for the authors):

      I would like to thank the authors for their responses to the points that I raised. I do not have any new comments or further responses.

    1. Author response:

      We appreciate that the reviewers recognize the conceptual novelty of our work and find our work interesting.

      Reviewer #1:

      We thank Reviewer #1 for making us aware that the image presentation of some of what we see as very clear phenotypes in our work might not have been optimal in the reviewed pdf file, presumably due to the relatively low resolution and lack of appropriately magnified images in the merged pdf file. This issue, if not caught and corrected now, might have caused future readers to similarly not appreciate these clear phenotypes. We will carefully revise the figures and ensure maintenance of appropriate pdf resolution in the merged file so that image presentation is optimal and our findings are appropriately represented.

      We appreciate that Reviewer #1 carefully and critically assessed the growth cone transcriptomic data. We agree that future additional validation is warranted, and this will be clearly stated in our revised paper. Because we judge that these data – even in their current form – will be of potential interest to other investigators sooner rather than later, we respectfully offer and request that we should share them in this paper as our attempt so far to identify elements of the relevant growth cone biology, rather than waiting for years before completing additional validation.

      Even upon repeated reflection, we judge and respectfully submit that our CRISPR in utero electroporation experiments are, indeed, conducted with appropriate controls. We thought through the potential controls deeply prior to completing these complex experiments. We will describe our reasoning in detail in our point-by-point response.

      Reviewer #2:

      We thank Reviewer #2 for encouraging us to elaborate on the direction and cross-repressive interplay between Bcl11a and Bcl11b, which we previously identified (Woodworth*, Greig* et al., Cell Rep, 2016). We omitted deep discussion because we had already published this result, cited that work, and did not want to seem overly self-referential, as well as for reasons of length. Though we know and have reported that Bcl11a and Bcl11b are cross-repressive in SCPN development, we currently do not know whether increased Bcl11a expression in Bcl11b-null SCPN contributes to reduced Cdh13 expression. Also, we do not know if there is a similar Bcl11a-Bcl11b cross repression in striatal medium spiny neurons. This will be clarified in our revised paper.

      We agree fully with the reviewer that “the common practice of picking from a list of differentially expressed genes the most likely ones” has been useful for and has substantially contributed to the elucidation of molecular mechanisms in many systems, including in CNS development. Indeed, the current paper identifies Cdh13 as a newly recognized functional molecule in SCPN axon development in part by using this approach. Cdh13 belongs to a well-known gene family, and its expression by SCPN was already reported by us (Arlotta*, Molyneaux* et al., Neuron, 2005). Despite these two facts, we newly identify its function in SCPN development, which has never been investigated or reported. We appreciate the reviewer encouraging us to elaborate on this here.

      Recent technical advances allow functional screening of a larger list of genes in vivo (Jin et al., Science, 2020; Ramani et al., bioRxiv, 2024; Zheng et al., Cell, 2024). That said, it is still a challenge to specifically access SCPN in vivo and apply such a high-throughput screening assay for axon development. We agree and predict that future work of this type might lead to identification of other new and unknown molecular regulators. We respectfully submit that our work reported here will provide a useful foundation for many such future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region of the operon. The authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, caused by the presence of some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because the presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. Even though the model is appealing and several of the experimental data support some aspects of it, several inconsistencies remain to be solved. In addition, even though TopAI was shown to be an inhibitor of topoisomerase I (Yamaguchi & Inouye, 2015, NAR 43:10387), the authors suggest, without offering any experimental support, that, because ribosome-targeting antibiotics act as inducers, expression of the topAI/yjhQ/yjhP operon may confer resistance to these drugs.

      Strengths:

      - There is good experimental support of the transcriptional repression/activation switch aspect of the model, derived from well-designed transcriptional reporters and ChIP-qPCR approaches.

      - There is a clever use of the topAI-lacZ reporter to find the 23S rRNA mutants where expression of topAI was upregulated. This eventually led the authors to identify that translation events occurring at toiL are important to regulate the topAI/yjhQ/yjhP operon. Is there any published evidence that ribosomes with the identified mutations translate slowly (decreased fidelity does not necessarily mean slow translation, does it?)?

      G2253 is in helix 80 of the 23S rRNA, which has been proposed to be involved in correct positioning of the tRNA. Mutations in helix 80 have been reported to cause defects in peptidyl transferase center activity, which could reduce the rate of ribosome movement along the mRNA. If ribosomes are sufficiently slowed when translating toiL, this could induce expression of topAI. G1911 and Ψ1917 are in helix 69 of the 23S rRNA, which is involved in forming the inter-subunit bridge, as well as interactions with release factors. Mutations in helix 69 cause a decrease in the processivity of translation, suggesting that the mutations we identified may increase the occupancy of ribosomes within toiL, thereby inducing expression of topAI. We have added text to the Discussion section to include this speculation.

      - Authors incorporate relevant links to the antibiotic-mediated expression regulation of bacterial resistance genes. Authors can also mention the tryptophan-mediated ribosome stalling at the tnaC leader ORF that activates the expression of tryptophan metabolism genes through blockage of Rho-mediated transcriptional attenuation.

      We have added a citation to a recent structural study of ribosomes translating the tnaC uORF. Specifically, we speculate in the Discussion that toiL may have evolved to sense a ribosome-targeting antibiotic, or another ribosome-targeting small molecule such as an amino acid.

      Weaknesses:

      The main weaknesses of the work are related to several experimental results that are not consistent with the model, or related to a lack of data that needs to be included to support the model.

      The following are a few examples:

      - It is surprising that the authors do not mention that several published Ribo-seq data from E. coli cells show active translation of toiL (for example Li et al., 2014, Cell 157: 624). Therefore, it is hard to reconcile with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression (Figure 2C, bar graphs of the no antibiotic control samples).

      These data are for a topAI-lux reporter construct rather than toiL-lux. In our model, ribosome stalling within toiL is required to induce expression of the downstream genes; preventing translation of toiL by mutating the start codon or Shine-Dalgarno sequence would not cause ribosome stalling, consistent with the lack of an effect on topAI expression.

      - The SHAPE reactivity data shown in Figure 5A are not consistent with the toiL ORF being translated. In addition, it is difficult to visualize the effect of tetracycline on mRNA conformation with the representation used in Figure 5B. It would be better to show SHAPE reactivity without/with Tet (as shown in panel A of the figure).

      We have modified this figure (now Figure 6) so that we no longer show the SHAPE-seq data +/- tetracycline overlaid on the predicted RNA structure, since, at best, the predicted structure likely represents only the uninduced state. We have included the predicted structure together with the SHAPE-seq data for untreated cells as a separate panel because it is part of the basis for our model. We have also added a supplementary figure showing a similar RNA structure prediction based on conservation of the topAI upstream region across species (Figure 6 – figure supplement 1), and we describe this in the text.

      - The "increased coverage" of topAI/yjhP/yjhQ in the presence of tetracycline from the Ribo-seq data shown in Figure 6A can be due to activation of translation, transcription, or both. For readers to know which of these possibilities apply, authors need to provide RNA-seq data and show the profiles of the topAI/yjhQ/yjhP genes in control/Tet-treated cells.

      A previous study (Li et al., 2014, PMID 24766808) compared RNA-seq and Ribo-seq data for E. coli to measure normalized ribosome occupancy for each gene. However, sequence coverage for topAI was too low to confidently quantify either the RNA-seq or the Ribo-seq data. Presumably RNA levels were low because of Rho termination. Hence, we were not confident that RNA-seq would provide information on the regulation of topAI-yjhQP. Other data in our study provide strong evidence that regulation is primarily at the level of translation. And the key conclusion from Figure 6 (now Figure 7) is that tetracycline stalls ribosomes on start codons.

      - Similarly, to support the data of increased ribosomal footprints at the toiL start codon in the presence of Tet (Figure 6B), authors should show the profile of the toiL gene from control and Tet-treated cells.

      Figure 6B shows data for both treated and untreated cells. The overall ribosome occupancy is much lower for untreated cells, making it difficult to draw strong conclusions about the relative distribution of ribosomes across toiL.

      - Representation of the mRNA structures in the model shown in Figure 5, does not help with visualizing 1) how ribosomes translate toiL since the ORF is trapped in double-stranded mRNA, and 2) how ribosome stalling on toiL would lead to the release of the initiation region of topAI to achieve expression activation.

      We now show the predicted structure with only SHAPE-seq data for untreated cells. The comparison of SHAPE-seq +/- tetracycline is shown without reference to the predicted structure.

      - The authors speculate that, because ribosome-targeting antibiotics act as expression inducers [by the way, authors should mention and comment that, more than a decade ago, it had been reported that kanamycin (PMID: 12736533) and gentamycin (PMID: 19013277) are inducers of topAI and yjhQ], the genes of the topAI/yjhQ/yjhP operon may confer resistance to these antibiotics. Such a suggestion can be experimentally checked by simply testing whether strains lacking these genes have increased sensitivity to the antibiotic inducers.

      We thank the reviewer for pointing out these references, which we now cite. The fact that another group found that gentamycin induces topAI expression – it is one of the most highly induced genes in that paper – strongly suggests that we missed the key inducing concentrations for one or more antibiotics, meaning that topAI is induced by even more ribosome-targeting antibiotics than we realized.

      We did some preliminary experiments to look for effects of TopAI, YjhQ, and/or YjhP on antibiotic sensitivity, but generated only negative results. Since these experiments were preliminary and far from exhaustive, we have chosen not to include them in the manuscript. Other studies of genes regulated by ribosome stalling in a uORF have looked at genes whose functions in responding to translation stress were already known, so the environmental triggers were more obvious. With so many possible triggers for topAI-yjhQP, it will likely require considerable effort to find the relevant trigger(s). Hence, we consider this an important question, but beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this important study, Baniulyte and Wade describe how the translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      I appreciate that the authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation. The results are convincing and clearly described.

      Weaknesses:

      I have relatively minor suggestions for improving the manuscript. These mainly relate to the figures.

      Reviewer #3 (Public Review):

      Summary:

      The authors nicely show that the translation and ribosome stalling within the ToiL uORF upstream of the co-transcribed topAI-yjhQ toxin-antitoxin genes unmask the topAI translational initiation site, thereby allowing ribosome loading and preventing premature Rho-dependent transcription termination in the topAI region. Although similar translational/transcriptional attenuation has been reported in other systems, the base pairing between the leader sequence and the repressed region by the long RNA looping is somewhat unique in toiL-topAI-yjhQP. The experiments are solidly executed, and the manuscript is clear in most parts with areas that could be improved or better explained. The real impact of such a study is not easy to appreciate due to a lack of investigation on the physiological consequences of topAI-yjhQP activation upon antibiotic exposure (see details below).

      Strengths:

      Conclusion/model is supported by the integrated approaches consisting of genetics, in vivo SHAPE-seq and Ribo-Seq.

      Provide an elegant example of cis-acting regulatory peptides to a growing list of functional small proteins in bacterial proteomes.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Examine the consequences of mutations impeding translation of the topAI/yjhQ/yjhP operon on cell growth in the presence and absence of antibiotics.

      See response to Reviewer 1’s comment.

      (2) Resolve discrepancies between the SHAPE data indicating constitutive sequestration of the toiL Shine Dalgarno sequence with antibiotic-regulated translation of the toiL ORF.

      See response to Reviewer 1’s comment.

      (3) Reconcile published Ribo-Seq data with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression in the absence of antibiotics.

      See response to Reviewer 1’s comment.

      (4) Clarify whether antibiotic MIC values were employed to select antibiotic concentrations for different experiments.

      The antibiotic concentrations we used are in line with reported MICs for E. coli. We now list the reported ECOFFs/MICs and include relevant citations.

      (5) Provide RNA-seq data to complement the Ribo-Seq data for the topAI/yjhQ/yjhP genes in control vs. Tet-treated cells.

      See response to Reviewer 1’s comment.

      (6) Revise the text to address as many of the reviewers' suggestions as reasonably possible.

      Changes to the text have been made as indicated in the responses to the reviewers’ comments.

      Reviewer #2 (Recommendations for the Authors):

      (1) Page 6: I would have liked to have more information about the 39 suppressor mutations in rho. Do any of the cis-acting mutations give support for the model proposed in Figure 8?

      We only know the specific mutation for some of the strains, and we now list those mutations in the Methods section. For other mutants, we mapped the mutation to either the rho gene or to Rho activity, but we did not sequence the rho gene. Most of the specific mutations we did identify fall within the primary RNA-binding site of Rho and hence should be considered partial-loss-of-function mutations (complete loss of function would be lethal).

      We identified cis-acting mutations by re-transforming the lacZ reporter plasmid into a wild-type strain. We did not sequence any of these plasmids.

      (2) Page 12-13, Section entitled "Mapping ribosome stalling sites induced by different antibiotics": This section should start with a better transition regarding the logic of why the experiments were carried out and should end with an interpretation of the results.

      We have added a few sentences at the start of this section to explain the rationale. We have also added two sentences at the end of this section to summarize the interpretation of the data.

      (3) Page 15: The authors should discuss under what conditions the expression of TopAI (and YjhQ/YjhP) might be induced. Is expression also elevated upon amino acid starvation?

      We have looked through public RNA-seq data but have not identified growth conditions other than antibiotic treatment that induce expression of topAI, yjhQ or yjhP.

      (4) References: The authors should be consistent about capitalization, italics, and abbreviations in the references.

      These formatting errors will be fixed in the proofing stage.

      (5) All graph figures: There should be more uniformity in the sizes of individual data points (some are almost impossible to see) and error bars across the figures.

      We have tried to make the data points and error bars more visible for figures where they were smaller.

      (6) Figure 1B: I do not think the left arrow labeling is very intuitive and suggest renaming these constructs.

      We have removed the arrows to improve clarity.

      (7) Figure 2A: toiL should be introduced at the first mention of Figure 2A.

      We have added a schematic of the topAI-yjhQ-yjhP region as Figure 1A, including the toiL ORF, which we briefly mention in the text. We have opted to split Figure 2C into two panels. In Figure 2C we now only show data for the wild-type construct. Data for the mutant constructs are now shown in a new figure (Figure 5), alongside data for the wild-type constructs. We have simplified Figure 2A, since the mutations are not relevant to this revised figure, and we now show the schematic with the mutations as Figure 5A.

      (8) Figure 3C and 3D: I suggest giving these graphs headings (or changing the color of the bars in Figure 3D) to make it more obvious that different things are measured in the two panels.

      We have added headers to panels B-D to make it clear which graphs show ChIP-qPCR data and which graph shows qRT-PCR data.

      (9) Figure 6: It might be nice to show the topAI-yjhPQ operon here.

      We now show the operon in Figure 1A.

      (10) Figure 8: This figure could be optimized by adding 5' and 3' end labels and having more similarity with the model in Figure 7.

      The constructs shown in Figure 7 lack most of the topAI upstream region, so they aren’t readily comparable to the schematic in Figure 8. However, we have changed the color of the ribosome in Figure 7 to match that in Figure 8. We also indicate the 5’ end of the RNA in Figure 8.

      Reviewer #3 (Recommendations for the Authors):

      Areas to improve:

      (1) While it's important to learn about ToiL-dependent regulation of the downstream topAI-yjhQ toxin-antitoxin genes, the physiological consequence of topAI-yjhQ activation seems to be lost in the manuscript. Everything was done with a lacZ/lux reporter. In the absence of toiL translation (i.e. SD mutant) and/or ribosome stalling, does premature transcription termination result in non-stoichiometric synthesis of toxin vs. antitoxin, leading to growth arrest or other measurable phenotype? Knowing the impact of ToiL in the native topAI-yjhQ context will be valuable.

      See response to Reviewer 1’s comment.

      (2) Figure 4-figure supplement 1 indicates that toiL homologs are found in many other proteobacteria. Do the UR sequences in those species also form a similar inhibitory RNA loop? The nt sequence identity of toiL is likely to be constrained by the base pairing of the topAI 5' region.

      We have added a supplementary figure panel showing an RNA structure prediction for the topAI upstream region based on sequence alignment of homologous regions from other species (Figure 6 – figure supplement 1).

      What is the genome-wide frequency of the MLENVII heptapeptide in E. coli? Is the sequence disfavored to avoid spurious multi-antibiotic sensing?

      LENVII is not found in any annotated E. coli K-12 protein. However, this is a sufficiently long sequence that we would expect few to no instances in the E. coli proteome.
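
      As a rough illustration of why few to no chance matches would be expected, the estimate can be sketched as below. This is a back-of-the-envelope calculation, not one taken from the manuscript: the proteome size (about 4,300 proteins averaging about 300 residues) and the uniform usage of the 20 amino acids are assumptions, and the function name is hypothetical.

```python
def expected_occurrences(motif_length: int,
                         n_proteins: int,
                         mean_protein_length: float,
                         alphabet_size: int = 20) -> float:
    """Expected number of exact matches to one fixed motif in a random proteome."""
    # Number of start positions per protein at which the motif could fit.
    windows_per_protein = max(mean_protein_length - motif_length + 1, 0)
    total_windows = n_proteins * windows_per_protein
    # Probability that one random window matches, assuming uniform residue usage.
    p_match = (1 / alphabet_size) ** motif_length
    return total_windows * p_match


# Assumed round numbers for E. coli K-12 (not values from the manuscript).
hexapeptide_estimate = expected_occurrences(6, n_proteins=4300, mean_protein_length=300)
heptapeptide_estimate = expected_occurrences(7, n_proteins=4300, mean_protein_length=300)
print(f"6-mer: {hexapeptide_estimate:.4f}, 7-mer: {heptapeptide_estimate:.4f}")
```

      Under these assumptions the expected count for a fixed six-residue sequence is on the order of 0.02, so the absence of LENVII from annotated proteins is what chance alone would predict. Real amino acid frequencies are not uniform, so the true figure would differ somewhat.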

      (3) Figure 1A, it would be helpful to indicate the location of the toiL (red arrow as in Figure 2A) relative to the putative rut site early in the beginning of the results. Does TSS mark the transcription start site? There is no annotation of TSS in the figure legend. Was TSS previously mapped experimentally? Please include relevant citations.

      We now indicate the position of the TSS relative to the topAI start codon. Similarly, we indicate the position of the start of toiL relative to the topAI start codon in Figure 2A. We now explain “TSS” in the figure legend. There is a reference in the text for the TSS (Thomason et al., 2015).

      (4) Please consider rearranging the results section, perhaps more helpful to introduce the toiL in Figure 1 or earlier. The current format requires readers to switch back-and-forth between Figure 4 and Figure 2.

      We have added a schematic of the topAI upstream region as Figure 1A, and we have separated Figure 2C as described in a response to a comment from Reviewer 2.

      (5) Figure 2A and Figure 2-Figure Suppl 1A, for clarity, please mark the rut site upstream of the red arrow.

      Rather than mark the rut on Figure 2A, which would make for a busy schematic, readers can compare the positions of the rut to those of toiL, which we have now added to Figures 1B (formerly Figure 1A) and 2A.

      (6) The following conclusion seems speculative: "...but does not trigger termination until RNAP ..., >180 nt further downstream…". Shouldn't the authors already know where the termination site is based on their previous Term-seq data (see Ref 1, Adams PP et al 2021)?

      Sites of Rho-dependent transcription termination cannot be mapped precisely from Term-seq data because exoribonucleases rapidly process the unstructured RNA 3’ ends.

      (7) Genetic screen: Please discuss why the 23S rRNA mutations that cause translational infidelity could promote topAI translation. Wouldn't the mutant ribosome be affected in translating toiL?

      See response to Reviewer 1’s comment.

      (8) Although antibiotic concentrations were provided in Figure 2 legend, please provide the MIC values of each antibiotic, e.g., in Table S2, for the tested E. coli strain, to inform readers how specific subinhibitory concentrations were chosen.

      See response to Reviewing Editor.

      (9) Please clarify the calculation of luciferase units on the y-axis of Figure 2A. Why is the scale drastically higher than that of Figure 7C, which uses the same antibiotics?

      These reporter assays use different constructs. The reporter construct used for experiments in Figure 7 includes a portion of the ermCL gene and associated downstream sequence. We have enlarged Figure 7A to highlight the difference in reporter constructs.

      (10) Table S4 needs a few more details. It is unclear how those numbers in columns G-H were generated. Do those numbers correspond to ribosome density per nt/ORF?

      We have added footnotes to Table S4 to indicate that the numbers in columns G and H represent sequence read coverage normalized by region length and by the upper quartile of gene expression.
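
      The normalization described in the footnotes can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes that the upper quartile is taken over per-gene read densities, and the function and argument names are hypothetical.

```python
from statistics import quantiles


def normalize_coverage(region_reads: float,
                       region_length: int,
                       gene_densities: list[float]) -> float:
    """Length-normalized read density, scaled by the upper quartile of gene densities."""
    # Step 1: normalize by region length (reads per nucleotide).
    density = region_reads / region_length
    # Step 2: quantiles(n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile (upper quartile).
    upper_quartile = quantiles(gene_densities, n=4)[2]
    return density / upper_quartile


# Hypothetical per-gene densities for one library (assumed values).
library_densities = [0.5, 1.0, 2.0, 4.0, 8.0]
value = normalize_coverage(region_reads=120, region_length=60,
                           gene_densities=library_densities)
print(value)
```

      Scaling by the upper quartile rather than by total read count makes values comparable between libraries even when a few very highly expressed genes dominate the reads.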

      (11) Figure 5, if the SHAPE results were true, the Shine Dalgarno sequence of toiL is sequestered in the hairpin structure with and without tetracycline treatment. It is inconceivable that translational initiation will occur efficiently, please discuss.

      Our representation of the SHAPE-seq data was confusing since we overlaid the SHAPE-seq changes on a predicted structure that likely corresponds to the uninduced state. We hope that the new version of Figure 5 is clearer.

      We presume the reviewer is referring to the Shine-Dalgarno sequence of topAI rather than toiL, since the Shine-Dalgarno sequence of toiL is predicted to be unstructured even in the absence of tetracycline treatment. The ribosome-binding site of topAI is more accessible in cells treated with tetracycline, although the SHAPE-seq data suggest that this is a transient event. The binding of the initiating ribosome may also reduce reactivity in this region under inducing conditions. We now discuss this briefly in the text.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long read sequencing on a subset of isolates (ST10 and ST74), and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophages compared to ST10, but both STs induced comparable cytotoxicity levels. Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors are likely associated with the observed differences. The study provides a comprehensive and novel understanding of the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. The methodology used in both approaches was sound and written in sufficient detail, and data analyses were performed with rigour. Source data were fully presented and accessible to readers.

      Comments on revised version: 

      The authors have addressed all the points raised by the reviewer. The manuscript is now much enhanced in clarity and accuracy. The re-written Discussion is more relevant and brings in comparison with other invasive Salmonella serotypes. 

      Comments: 

      In light of the metadata supplied in this revision, for Australian isolates, all human cases of ST74 (n=7) were from faeces (presumably from gastroenteritis) while 18/40 of ST10 were from invasive specimens (blood and abscess). This may contradict the manuscript's findings and discussion on the different experimental phenotypes of the two STs, with ST74 showing more replication in macrophages and being potentially more invasive. Thus, the reviewer suggests that the authors mention this disparity in the Discussion, and discuss possible reasons underlying it. This can strengthen the authors' rationale for further in vivo studies.

      We thank the reviewer for pointing out this important observation. We have amended the text in the Discussion to address the differences in source of human cases as suggested by the Reviewer (lines 392-430). We have also included text highlighting the important knowledge gaps in understanding the drivers for emerging iNTS with broad host ranges and identify future avenues of research that could be explored to better understand the observed differences in the host-pathogen interactions.  

      Reviewer #2 (Public review): 

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understand its evolution. The phenotyping of isolates of ST10 and ST74 also offers insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high-income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences.

      We thank the reviewer for their comments and acknowledge the limitations of this study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors repeatedly assert that an individual's behavior in the foraging assay depends on its prior history (particularly cultivation conditions). While this seems like a reasonable expectation, it is not fully fleshed out. The work would benefit from studies in which animals are raised on more or less abundant food before the behavioral task.

      Cultivation density: While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is an interesting experiment, it is not feasible at this time. We previously attempted this experiment but found it nontrivial to maintain stable bacterial density conditions over long timescales as this requires matching the rate of bacterial growth with the rate of bacterial consumption. Despite our best efforts, we have not been able to identify conditions that satisfy these requirements. Thus, we focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction (lines 618-624).

      (2) The authors convincingly show that the probability of particular behavioral outcomes occurring upon patch encounter depends on time-associated parameters (time since last patch encounter, time since last patch exploitation). There are two concerns here. First, it is not clear how these values are initialized - i.e., what values are used for the first occurrence of each behavioral state? More importantly, the authors don't seem to consider the simplest time parameter, the time since the start of the assay (or time since worm transfer). Transferring animals to a new environment can be associated with significant mechanical stimulus, and it seems quite possible that transferring animals causes them to enter a state of arousal. This arousal, which certainly could alter sensory function or decision-making, would likely decay with time. It would be interesting to know how well the model performs using time since assay starts as the only time-dependent parameter.

      Parameter Initialization: We thank the reviewer for pointing out an oversight in our methods section regarding the model parameter values used for the first encounter. We clarified the initialization of parameters in the manuscript (lines 1162-1179). In short, for the first patch encounter where k = 1:

      ρ<sub>k</sub> is the relative density of the first patch.

      τ<sub>s</sub> is the duration of time spent off food since the beginning of the recorded experiment. For the first patch, this is equivalent to the total time elapsed.

      ρ<sub>h</sub> is the approximated relative density of the bacterial patch on the acclimation plates (see Assay preparation and recording in Methods). Acclimation plates contained one large 200 µL patch seeded with OD<sub>600</sub> = 1 and grown for a total of ~48 hours. As with all patches, the relative density was estimated from experiments using fluorescent bacteria OP50-GFP as described in Bacterial patch density estimation in Methods.

      ρ<sub>e</sub> is equivalent to ρ<sub>h</sub>.
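
      The first-encounter initialization listed above can be sketched as a small helper. This is an illustrative sketch only: the class and function names are hypothetical, and the field comments restate just what the response says about the k = 1 case.

```python
from dataclasses import dataclass


@dataclass
class EncounterCovariates:
    rho_k: float  # relative density of the patch being encountered
    tau_s: float  # time off food; for k = 1, total time elapsed in the recording
    rho_h: float  # for k = 1, estimated relative density of the acclimation-plate patch
    rho_e: float  # for k = 1, set equal to rho_h


def first_encounter_covariates(first_patch_density: float,
                               time_elapsed: float,
                               acclimation_density: float) -> EncounterCovariates:
    """Covariates for the first patch encounter (k = 1), as described in the response."""
    return EncounterCovariates(
        rho_k=first_patch_density,
        tau_s=time_elapsed,
        rho_h=acclimation_density,
        rho_e=acclimation_density,  # no patch encountered yet in the assay, so reuse rho_h
    )


# Example with assumed values (arbitrary units).
covariates = first_encounter_covariates(first_patch_density=0.5,
                                        time_elapsed=120.0,
                                        acclimation_density=1.0)
print(covariates)
```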

      Transfer Method: We thank the reviewer for their thoughtful comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We anticipated this possibility and, in order to mitigate the stress of moving, we used an agar plug method where animals were transferred using the flat surface of small cylinders of agar. Importantly, the use of agar as a medium to transfer animals provides minimal disruption to their environment as all physical properties (e.g. temperature, humidity, surface tension) are maintained. Qualitatively, we observed no marked change in behavior from before to after transfer with the agar plug method, especially as compared to the often drastic changes observed when using a metal or eyelash pick. We added these additional methodological details to the methods (lines 791-796).

      Time Parameter: However, the reviewer’s concern that the simplest time parameter (time since start of the assay) might better predict animal behavior is valid. We thank the reviewer for pointing out the need to specifically test whether the time-dependent change in explore-exploit decision-making corresponds better with satiety (time off patch) or arousal (time since transfer/start of assay) state. To test this hypothesis, we ran our model with varying combinations of the satiety term τ<sub>s</sub> and a transfer term τ<sub>t</sub>. We found that when both terms were included in the model, the coefficient of the transfer term was non-significant. This result suggests that the relevant time-dependent term is more likely related to satiety than transfer-induced stress (lines 343-358; Figure 4 - supplement 4D).
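
      One way to write down the comparison described above is shown here. The response does not spell out the model equation, so the logistic link, the linear combination of covariates, and the coefficient names are all assumptions made for illustration.

```latex
\[
\Pr(\text{exploit at encounter } k)
  = \sigma\!\left(\beta_0 + \beta_k \rho_k + \beta_h \rho_h + \beta_e \rho_e
      + \beta_s \tau_s + \beta_t \tau_t\right),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
\]
% Comparison described in the response: fit with \tau_s only, with \tau_t only, and with both.
% Reported outcome: with both terms included, \beta_t is not significantly different from 0.
```

      In this form, the reported result corresponds to failing to reject the null hypothesis that the transfer coefficient is zero once the satiety term is in the model.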

      (3) Similarly, Figures 2L and M clearly show that the probability of a search event occurring upon a patch encounter decreases markedly with time. Because search events are interpreted as a failure to detect a patch, this implies that the detection of (dilute) patches becomes more efficient with time. It would be useful for the authors to consider this possibility as well as potential explanations, which might be related to the point above.

      Time-dependent changes in sensing: We agree with the reviewer that we observe increased responsiveness to dilute patches with time. Although this is interesting, our primary focus was on what decision an animal made given that they clearly sensed the presence of the bacterial patch. Nonetheless, we added this observation to the discussion as an area of future work to investigate the sensory mechanisms behind this effect (lines 563-568).

      (4) Based on their results with mec-4 and osm-6 mutants, the authors assert that chemosensation, rather than mechanosensation, likely accounts for animals' ability to measure patch density. This argument is not well-supported: mec-4 is required only for the function of the six non-ciliated light-touch neurons (AVM, PVM, ALML/R, PLML/R). In contrast, osm-6 is expected to disrupt the function of the ciliated dopaminergic mechanosensory neurons CEP, ADE, and PDE, which have previously been shown to detect the presence of bacteria (Sawin et al 2000). Thus, the paper's results are entirely consistent with an important role of mechanosensation in detecting bacterial abundance. Along these lines, it would be useful for the authors to speculate on why osm-6 mutants are more, rather than less, likely to "accept" when encountering a patch.

      Sensory mutant behavior: We thank the reviewer for pointing out the error in our interpretation of the behavior of osm-6 and mec-4 animals. We further elaborated on our findings and edited the text to better reflect that osm-6 mutants lack both chemosensory and mechanosensory ciliated sensory neurons (lines 406-448; lines 567-577). Specifically, we provided some commentary on the finding that osm-6 mutants show an augmented ability to detect the presence of bacterial patches but a reduced ability to assess their bacterial density. While this finding seems contradictory, it suggests that in the absence of the ability to assess bacterial density, animals must prioritize exploiting food resources when available.

      (5) While the evidence for the accept-reject framework is strong, it would be useful for the authors to provide a bit more discussion about the null hypothesis and associated expectations. In other words, what would worm behavior in this assay look like if animals were not able to make accept-reject decisions, relying only on exploit-explore decisions that depend on modulation of food-leaving probability?

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to elaborate on our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      Reviewer #3 (Public review):

      (1) Sensing vs. non-sensing

      The authors claim that when animals encounter dilute food patches, they do not sense them, as evidenced by the shallow deceleration that occurs when animals encounter these patches. This seems ethologically inaccurate. There is a critical difference between not sensing a stimulus, and not reacting to it. Animals sense numerous stimuli from their environment, but often only behaviorally respond to a fraction of them, depending on their attention and arousal state. With regard to C. elegans, it is well-established that their amphid chemosensory neurons are capable of detecting very dilute concentrations of odors. In addition, the authors provide evidence that osm-6 animals have altered exploit behaviors, further supporting the importance of amphid chemosensory neurons in this behavior.

      Interpretation of “non-sensing” encounters: We thank the reviewer for their comment and agree that we do not know for certain whether the animals sensed these patches or were merely non-responsive to them. We are, however, confident that these encounters lack evidence of sensing. Specifically, we note that our analyses used to classify events as sensing or non-sensing examined whether an animal’s slow-down upon patch entry could be distinguished from either that of events where animals exploited or that of encounters with patches lacking bacteria. We found that “non-sensing” encounters are indeed indistinguishable from encounters with bacteria-free patches where there are no bacteria to be sensed (see Figure 2 - Supplement 8A-C and Patch encounter classification as sensing or non-responding in Methods). Regardless, we agree with the reviewer that all that can be asserted about these events is that animals do not appear to respond to the bacterial patch in any way that we measured. Therefore, we have replaced the term “non-sensing” with “non-responding” to better indicate the ethological interpretation of these events and clarified the text to reflect this change (lines 193-200; lines 211-212).

      (2) Search vs. sample & sensing vs. non-sensing

      In Figures 2H and 2I, the authors claim that there are three behavioral states based on quantifying average velocity, encounter duration, and acceleration, but I only see two. Based on density distributions alone, there really only seem to be 2 distributions, not 3. The authors claim there are three, but to come to this conclusion, they used a QDA, which inherently is based on the authors training the model to detect three states based on prior annotations. Did the authors perform a model test, such as the Bayesian Information Criterion, to confirm whether 2 vs. 3 Gaussians is statistically significant? It seems like the authors are trying to impose two states on a phenomenon with a broad distribution. This seems very similar to the results observed for roaming vs. dwelling experiments, which again, are essentially two behavioral states.

      Validation of sensing clusters: We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters and the need for additional clarity in explaining the semi-supervised QDA approach. We added additional visualizations and methods to validate the clusters we have discovered. Specifically, we used Silverman’s test to show that the sensing vs. non-responding data were bi-modal (i.e. a two-cluster classification method fits best) and accompanied this statistical test with heat maps which better illustrate the clusters (lines 171-173; lines 190-191; lines 948-972; lines 1003-1005; Figure 2 - supplement 6A-C; Figure 2 - supplement 7C-F).

      Further, it seems that there may be some confusion as to how we arrived at 3 encounter types (i.e. search, sample, exploit). It’s important to note that two methods were used on two different (albeit related) sets of parameters. We first used a two-cluster GMM to classify encounters as explore or exploit. We then used a two-cluster semi-supervised QDA to classify encounters as sensing or non-sensing (now changed to “non-responding”, see above response) using a different set of parameters. We thus separated the explore cluster into two (sensing and non-responding exploratory events) resulting in three total encounter types: exploit, sample (explore/sensing), and search (explore/non-sensing).
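      The way the two binary classifications combine into three encounter types can be sketched as follows. This is a minimal illustration only: the function name and the boolean inputs are hypothetical, and it assumes the outputs of the GMM (explore vs. exploit) and the semi-supervised QDA (sensing vs. non-responding) are already available for each encounter.

```python
def encounter_type(is_exploit: bool, is_sensing: bool) -> str:
    """Combine two binary labels into one of three encounter types.

    is_exploit: hypothetical flag from the two-cluster GMM (explore vs. exploit).
    is_sensing: hypothetical flag from the two-cluster semi-supervised QDA.
    """
    if is_exploit:
        # Exploitation is treated as its own type regardless of the QDA label.
        return "exploit"
    # Exploratory encounters are split by the sensing classification.
    return "sample" if is_sensing else "search"
```

      Under this sketch, only the explore cluster is subdivided, which is why two two-cluster methods yield three types rather than four.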

      (4) History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      Model design: We thank the reviewer for their thoughtful comments on the model. We completed a number of analyses involving model selection including model selection criteria (AIC, BIC) and optimization with regularization techniques (LASSO and elastic nets) and found that the problem of model selection was compounded by the enormous array of highly-correlated variables we had to choose from. Additionally, we found that both interaction terms and non-linear terms of our task variables could be predictive of accept-reject decisions but that the precise set of terms selected depended sensitively on which model selection technique was used and generally made rather small contributions to prediction. The diverse array of results and combinatorial number of predictors to possibly include failed to add anything of interpretable value. We therefore chose to take a different approach to this problem. Rather than trying to determine what the “best” model was we instead asked whether a minimal model could be used to answer a set of core questions. Indeed, our goal was not maximal predictive performance but rather to distinguish between the effects of different influences enough to determine if encounter history had a significant, independent effect on decision making. We thus chose to only include task variables that spanned the most basic components of behavioral mechanisms to ask very specific questions. For example, we selected a time variable that we thought best encapsulated satiety. While we could have included many additional terms, or made different choices about which terms to include, based on our analyses these choices would not have qualitatively changed our results. Further, we sought to validate the parameters we chose with additional studies (i.e. food-deprived and sensory mutant animals). We regard our study as an initial foray into demonstrating accept-reject decision-making in nematodes. The exact mechanisms and, consequently, the best model design are therefore beyond the scope of this study.

      Lastly, with regard to the use of only sensed patches in the model: while we acknowledge that we are not certain as to whether the “non-responding” encounters are truly not sensed, we find qualitatively similar results when including all exploratory patches in our analyses. However, we take the position that sensation is necessary for decision-making and thus believe that while our model’s predictive performance may be better using all encounters, the interpretation of our findings is stronger when we only include sensing events. We have added additional commentary about our model to the discussion section (lines 667-695).

      (5) osm-6

      The osm-6 results are interesting. This seems to indicate that the worms are still sensing the food, but are unable to assess quality, therefore the default response is to exploit. How do you think the worms are sensing the food? Clearly, they sense it, but without the amphid sensory neurons, and not mechanosensation. Perhaps feeding is important? Could you speculate on this?

      We thank the reviewer for their thoughtful remarks. We have added additional commentary about the result of our sensory mutant experiments as described above in response to Reviewer #1 under Sensory mutant behavior.

      (7) Impact:

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think the strength of this research could be strengthened by a reassessment of their model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors title the work as an "ethological study" and emphasize the theme of "foraging in naturalistic environments" in contrast to typical laboratory conditions. The only difference in this study relative to typical laboratory conditions is that the food bacteria is distributed in many small patches as compared to one large patch. First, it is not clear to the reviewer that the size of the food patches in these experiments is more relevant to C. elegans in its natural context than the standard sizes of food patches. Furthermore, all the other highly unnatural conditions typical of laboratory cultivation still apply: the use of a 2D agar substrate, a single food bacteria that is not a component of a naturalistic diet, and the use of a laboratory-adapted strain of C. elegans with behavior quite distinct from that of natural isolates. The reviewer is not suggesting that the authors need to make their experiments more naturalistic, only that the experiments as described here should not be described as naturalistic or ethological as there is no support for such claims.

      Ethological interpretation: We thank the reviewer for their comments about the use of the term ethological to describe this study. We chose to develop a patchy bacterial assay to mimic the naturalistic “boom-or-bust” environment. While we agree with the reviewer that we do not know if the size and distribution of the food patches in these experiments is more relevant to C. elegans, we maintain that these experiments were ecologically-inspired and revealed behavior that is difficult to observe in environments with large, densely-seeded bacterial patches. We have updated our text to better reflect that this study was “ecologically-inspired” rather than truly “ethological” in nature (lines 94, 693).

      The main finding of the paper is that worms explore and then exploit, i.e. they frequently reject several bacterial patches before accepting one. This result requires additional scrutiny to reject other possible interpretations. In particular, when worms are transferred to a new plate we would expect some period of increased arousal due to the stressful handling process. A high arousal state might cause rejection of food patches. Could the measured accept/reject decisions be influenced by this effect? One approach to addressing this concern would be to allow the animals to acclimate to the new plate on a bare region before encountering the new food patches.

      We thank the reviewer for their comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We addressed this above in response to Reviewer #1 under Transfer Method and Time Parameter. In brief, we used a worm picking method that mitigated stress and added additional analyses showing that a transfer-related term was less predictive than a satiety-related term.

      Related to the above, in what circumstances exactly are the authors claiming that worms first explore and then exploit? After being briefly deprived of food? After being handled?

      Explore-then-exploit: All animals were well-fed and handled gently as described above under Transfer Method (lines 787-795). Our results suggest that the appearance of an explore-then-exploit strategy is a byproduct of being transferred from an environment with high bacterial density to an environment with low bacterial density as described in the manuscript (lines 461-466).

      The authors emphasize their analysis of the accept/reject decision as a critical innovation. However, the accept/reject decision does not strike me as substantially different from the previously described stay/switch decision. When a worm encounters a new patch of bacteria, accepting this bacteria is equivalent to staying on it and rejecting (leaving) it is equivalent to switching away from it. The authors should explain how these concepts are significantly distinct.

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to elaborate on our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      During patch encounter classification, the authors computed three of the animals' behavioral metrics (Line 801-804) and claimed that the combination of these three metrics reveals two non-Gaussian clusters representing encounters where animals sensed the patch or did not appear to sense the patch. The authors also refer to a video to demonstrate the two clusters by rotating the 3-dimension scatter plot. However, the supposed clusters, if any, are difficult to see in a 3D (Video 5) or in a 2D scatter plot (Figure 3I). The authors need to clearly demonstrate the distinct clustering as claimed in the paper as this feature is fundamental and necessary for the model implementation and interpretation of results.

      We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters. We added additional visualizations and methods to validate the clusters we have discovered as described in our above response to Reviewer #3 under Validation of sensing clusters.

      When selecting parameters (covariates) for their model, it is critical to avoid overfitting. Therefore, the authors used AIC and BIC (Figure 4- supplement 1) to demonstrate that the full GLM model has a better model performance than the other models which contain only a subset of the full covariates (in a total of 5). However, the authors compare the full set with only 4 other models whereas the total number of models that need to be compared with is 2^5-2. The authors at least need to include the AIC and BIC scores of all possible models in order to draw the conclusion about the performance of the full model.

      Model selection criterion: We thank the reviewer for pointing out this gap in our methodology. We have now run the model with all combinations of subsets of model parameters and have confirmed that the model with all 5 covariates outperforms all other models even when using BIC, the strictest criterion for overfitting (Figure 4 - supplement 1A). The only other model that performs well (though not as often as the 5-term model) is the 4-term model lacking ρ<sub>h</sub>. This result is not surprising as ρ<sub>h</sub> only changes substantially once in an animal’s encounter history for the single-density, multi-patch data that this model was fit to. For example, for an animal foraging on patches of density 10, on the first encounter ρ<sub>h</sub> = ~200 (see Parameter initialization above), but on every subsequent encounter ρ<sub>h</sub> = ~10. As a result, the effect of ρ<sub>h</sub> on the probability of exploiting is somewhat binary on the single-density, multi-patch data set. Nevertheless, we see significantly improved prediction of behavior in the novel multi-density, multi-patch data (Figure 4F) as we observe an effect of the most recently encountered patch. Additionally, we observe a similar impact (i.e., significant coefficient of negative sign) of the ρ<sub>h</sub> term when the model is fit to the multi-density, multi-patch data set (Figure 4 - supplement 4D).
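      For readers who want to see what an all-subsets comparison involves, the enumeration and the two criteria can be sketched as below. This is not the authors' code: the covariate names are placeholders (the response only states that there are 5), and the log-likelihood of each fitted model is assumed to come from whatever GLM routine is used.

```python
from itertools import combinations
from math import log

# Placeholder covariate names; the source only states that there are 5.
COVARIATES = ["x1", "x2", "x3", "x4", "x5"]


def candidate_subsets(covariates):
    """Yield every non-empty subset of the covariates, including the full set."""
    for size in range(1, len(covariates) + 1):
        for subset in combinations(covariates, size):
            yield subset


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * log(n_obs) - 2.0 * log_likelihood
```

      With 5 covariates there are 31 non-empty subsets, i.e. the full model plus the 2^5-2 reduced models the reviewer refers to. Because the BIC penalty per parameter is ln(n) rather than 2, it penalizes extra terms more heavily than AIC once the number of observations exceeds about 7.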

      In any bacterial patch, the edges have a higher density of bacteria than the patch center. Thus, it is possible that a worm scans the patch edge density, on the basis of which it decides to accept or reject the patch whose average density is smaller. This could potentially cause an underestimate of the bacteria density used in the model. Furthermore, the potential inhomogeneity of the patch may further complicate the worm's decision-making, and the discrepancy between the reality and the model assumption will reduce the validity of the model. The authors need to estimate the inhomogeneity of the bacterial patches used in their assays and discuss how the edge effects may affect their results and conclusions.

      Bacterial patch inhomogeneity: We extensively tested the landscape of the bacterial patches by imaging fluorescently-labeled bacteria OP50-GFP (Bacterial Patch Density in Methods; Figure 2 - supplement 1-3). As the reviewer mentions, we observe significantly greater bacterial density at the patch edge. This within-patch spatial inhomogeneity results from areas of active proliferation of bacteria and likely complicates an animal’s ability to accurately assess the quantity of bacteria within a patch and, consequently, our ability to accurately compute a metric related to our assumptions of what the animal is sensing. In our study, we used the relative density of the patch edge where bacterial density is highest as a proxy for an animal’s assessment of bacterial patch density (Figure 2 – supplement 1). This decision was based on a previous finding that the time spent on the edge of a bacterial patch affected the dynamics of subsequent area-restricted search. While within-patch spatial inhomogeneity likely affects an animal’s ability to assess patch density, we do not believe that this qualitatively affects the results of our study. Both the patch densities tested (Figure 2 – supplement 3A) as well as our observations of time-dependent changes in exploitation (Figure 2E,N-O; Figure 3H-I) maintained a monotonic relationship. Therefore, alternative methods of patch density estimation should yield similar results. We have added additional discussion on this topic to our manuscript (lines 578-593).

      The authors claim that their methods (GMM and semi-supervised QDA) are unbiased. This seems unlikely as the QDA involves supervision. The authors need to provide additional explanation on this point.

      Semi-supervised QDA labelling: We have removed the term “unbiased” to avoid any misinterpretation of the methodology and clarified our method of labelling used for “supervising” QDA. Specifically, we made two simple assumptions: 1) animals must have sensed the patch if they exploited it and 2) animals must not have sensed the patch if there were no bacteria to sense. Thus, we labeled encounters as sensing if they were found to be exploitatory as we assume that sensation is prerequisite to exploitation; and we labeled encounters as non-sensing for events where animals encountered patches lacking bacteria (OD<sub>600</sub> = 0). All other points were non-labeled prior to learning the model. In this way, our labels were based on the experimental design and results of the GMM, an unsupervised method, rather than any expectations we had about what sensing should look like. The semi-supervised QDA method then used these initial labels to iteratively fit a paraboloid that best separated these clusters by minimizing the posterior variance of classification (lines 1012-1021). See Figure 2 - supplement 8A-B for a visualization showing the labelled data.
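      The two labelling assumptions described above amount to a simple rule for seeding the semi-supervised classifier. The sketch below is illustrative only; the function name and argument names are hypothetical, and the iterative QDA fit itself is not shown.

```python
from typing import Optional


def initial_label(is_exploit: bool, od600: float) -> Optional[str]:
    """Return the seed label for one encounter, or None if it starts unlabeled.

    is_exploit: hypothetical flag from the unsupervised GMM step.
    od600: optical density of the bacterial suspension used to seed the patch.
    """
    if is_exploit:
        # Assumption 1: an exploited patch must have been sensed.
        return "sensing"
    if od600 == 0:
        # Assumption 2: a bacteria-free patch cannot have been sensed.
        return "non-sensing"
    # Everything else is left for the classifier to assign.
    return None
```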

      Based on the authors' result, worms behaviorally exhibit their preferences toward food abundance (density), which results in a preference scale for a range of densities. Does this scale vary with the worms' initial cultivation states? The author partially verified that by observing starved worms. This hypothesis could be better tested if the authors could analyze the decision-making of the worms that were initially cultivated with different densities of bacterial food.

      While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is a very interesting experiment, it is not feasible at this time. We focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction as described above in our response to Reviewer #1 under Cultivation density.

      It would be helpful to elaborate more on how the framework developed in this paper can be applied more broadly to other behaviors and/or organisms and how it may influence our understanding of decision-making across species.

      We thank the reviewer for alerting us to this gap in our discussion. We have added additional commentary about our model and its utility to the discussion section (lines 667-695).

      Reviewer #3 (Recommendations for the authors):

      Sensing vs. non-sensing

      Perhaps a more ethologically accurate term to describe this behavior would be "ignoring" rather than "not sensing". If the authors feel strongly about using the term "not sensing", then they should provide experimental evidence supporting this claim. However, I think simply changing the terminology negates these experiments.

      We thank the reviewer for their thoughtful comments. While we agree with the reviewer that the term “non-sensing” may not be ethologically accurate (see response to Public Review above under Interpretation of “non-sensing” encounters), we interpret the term “ignoring” to mean that the animal sensed the patches but decided not to react. We have chosen to replace the term “non-sensing” with “non-responding” to best indicate the ethological interpretation of our observation. Nonetheless, we believe that it remains possible that animals are truly not sensing the bacterial patches as our method of classification compared the behavior against encounters with patches lacking bacteria (as described above in response to Reviewer #2 under Semi-supervised QDA labelling).

      History-dependence of the GLM

      Perhaps a simpler approach would be to say the worm senses everything, and this accumulative memory affects the decision to exploit. For example, the animal essentially experiences two feeding states: feeding on patches, and starvation off of patches.

      The level of satiety could be modeled linearly:

      Satiety(t_enter:t_leave) = k_feed*patch_density*delta_t

      Where k_feed is some model parameter for rate of satiety signal accumulation, t_enter is the time the animal entered the patch, t_leave is the time the animal left the patch, and delta_t is the difference between the two. Perhaps you could add a saturation limit to this, but given your data, I doubt that is the case.

      Starvation could be modeled as simply a decay from the last satiety signal:

      Starvation(t_leave:t_enter) = Satiety(t_leave)*exp(-k_starve*delta_t).

      Where k_starve is the rate constant for the decay of the satiety signal.

      For the logistic model, the logistic parameter is simply the difference between the current patch density and the current satiety signal.

      A nice thing about this approach is that it negates the need to categorize your patches. All patch encounters matter. Brief patch encounters (categorized as non-sensing and not used in the prior GLM) naturally produce a very small satiety signal and contribute very little to the exploit decision. Another nice thing about this approach is that it gives you memory timescales, that are testable. There is a rate of satiety accumulation and a rate of satiety loss. You should be able to predict behavior with lower patch density, assuming the rate constants hold. (I am not advocating you do more experiments here, just pointing out a nice feature of this approach).
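      The model proposed in this comment can be written out as a short sketch. The three functions follow the formulas given above; the logistic intercept and slope (beta0, beta1) are assumptions added here for illustration, since the comment only specifies that the logistic predictor is the difference between current patch density and current satiety. How satiety from successive patches should be combined is also left unspecified in the comment, so the functions are kept separate.

```python
from math import exp


def satiety_gain(k_feed, patch_density, t_enter, t_leave):
    """Satiety accumulated on a patch: k_feed * patch_density * delta_t."""
    return k_feed * patch_density * (t_leave - t_enter)


def decayed_satiety(satiety_at_leave, k_starve, t_leave, t_enter_next):
    """Satiety remaining at the next patch entry after exponential decay."""
    return satiety_at_leave * exp(-k_starve * (t_enter_next - t_leave))


def p_exploit(patch_density, satiety, beta0=0.0, beta1=1.0):
    """Logistic probability of exploiting; beta0 and beta1 are assumed parameters."""
    x = patch_density - satiety
    return 1.0 / (1.0 + exp(-(beta0 + beta1 * x)))
```

      In this form a brief encounter contributes little satiety because delta_t is small, and k_feed and k_starve are the two timescales the comment describes as testable.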

      You could possibly apply this to a GLM for velocity on a non-exploited patch as well, though I assume this would be a linear GLM, given the velocity distributions you provided.

      We thank the reviewer for their time and thoughtfulness in thinking about our model. The reviewer’s proposed model seems entirely reasonable and could aid in elucidating the time component of how prior experience affects decision-making. However, we decided to keep our paper focused on using a minimal model to answer a set of core questions (e.g., Does encounter history or satiety influence decision-making?) (see above under Model design for a more detailed response). Future studies investigating the mechanisms of these foraging decisions should open the door for more mechanistically accurate models. We have expanded our discussion of the model to include this assertion (lines 667-695).

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sample size: If the sample size of the study is increased, more confidence and new insights can be inferred about myometrial enhancer-mediated gene regulation in term pregnancy. Such a small sample size (N = 3) limits the statistical power of the study. As mentioned in the manuscript, the failure to identify chromatin loops in the second subject's biopsy was due to a limited sample.

      We agree with the reviewer’s comment about the sample size. We sincerely hope the results of this study will increase stakeholders’ interest in funding future projects on a larger scale.

      (2) Figure quality: There is a lack of good representations of the results (e.g., screenshots of tables as figure panels!) as well as missing interpretations that might add value to the manuscript.

      Figures 1B and 2B have been converted to pie charts.

      (3) Definition of super-enhancer: The definition of super-enhancer is not clear. Also, the computational merging of enhancers to define super-enhancers should be described better.

      We have added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
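      The merging rule quoted above can be illustrated with a small sketch. It is not the authors' pipeline, which uses bedtools; it only mimics the described logic for intervals on a single chromosome. The 12.5 kb gap and 15 kb size cutoff come from the quoted Methods text; the function names and the treatment of a gap exactly equal to the cutoff are assumptions.

```python
def merge_enhancers(intervals, max_gap=12_500):
    """Merge (start, end) intervals on one chromosome whose gap is <= max_gap bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def super_enhancers(intervals, max_gap=12_500, min_size=15_000):
    """Keep merged regions strictly larger than min_size bp."""
    return [(s, e) for s, e in merge_enhancers(intervals, max_gap) if e - s > min_size]
```

      For example, three peaks spaced 8 kb apart collapse into one 23 kb region that passes the size filter, while an isolated 1 kb peak does not.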

      (4) Assay-Specific Limitations: Each assay employed in the study, such as ChIP-Seq and CRISPRa-based Perturb-Seq, has its limitations, including potential biases, sensitivity issues, and technical challenges, which could impact the accuracy and reliability of the results. These limitations should be addressed properly to avoid false-positive results and improve the interpretability of the results.

      The major limitations of the CRISPRa-based Perturb-Seq protocol in this study are the use of hTERT-HM cells and the two-vector system for transduction. While hTERT-HM cells are a much easier platform in terms of technical operation, primary human myometrial cells are generally considered to retain a molecular context that is closer to that of in vivo tissues. Because of the limited efficiency of having two vectors simultaneously present in the same cell, hTERT-HM cells are a much more affordable and operationally feasible platform for the experiment. Future advances in viral vector payload capacity may overcome this challenge and open an avenue to perform the assay on primary human myometrial cells.

      (5) Sample collection and comparison: There is mention of matched gravid term and non-gravid samples whereas no description or use of control samples was found in the results. Also, the comparison of non-labor samples with labor samples would provide a better understanding of epigenomic and transcriptomic events of myometrium leading to laboring events.

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Lack of clarity:

      (6a) It is written as 'Chromatin Conformation Capture (Hi-C)'. I think Hi-C is Histone Capture and 3C is Chromosome Conformation Capture! This needs clear writing.

      As the reviewer suggested, to make it clear, we have changed the text “A high throughput chromatin conformation capture (Hi-C) assay” to “A High-throughput Chromosome Conformation Capture (Hi-C) assay”.

      (6b) In multiple places, 'PLCL2' gene is written as 'PCLC2'.

      Corrected as suggested.

      (6c) What is the biological relevance of considering 'active' genes with FPKM {greater than or equal to} 1? This needs clarification.

      In RNA-seq analysis, gene expression levels are often quantified using FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Setting an FPKM threshold for defining "active" genes is biologically relevant because it helps to distinguish genuinely expressed genes from background noise and lets researchers focus on genes that are more likely to have a significant biological impact. A common threshold for defining "active" genes is FPKM ≥ 1. Genes with FPKM values below this threshold may be transcribed at very low levels or could be background noise.
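      As a worked illustration of the threshold, the standard FPKM definition can be computed directly. The function and argument names below are hypothetical, and the numbers in the usage note are made up for the example rather than taken from the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_active(fpkm_value, threshold=1.0):
    """Apply the FPKM >= 1 cutoff described in the response."""
    return fpkm_value >= threshold
```

      For instance, 200 fragments on a 2 kb transcript in a library of 20 million mapped fragments gives an FPKM of 5, which would count as active under this cutoff.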

      (6d) The understanding of differentially methylated genes at promoters is underrated as per the authors. But, why leaving DNA methylation apart, they selected histone modification as the basis of epigenetic reprogramming in terms of myometrium is unclear.

      DNA methylation indeed plays a crucial role in evaluating the impact of cis-acting elements on gene regulation. Large-scale studies, such as the comprehensive analysis of the myometrial methylome landscape in human biopsies (Paul et al., JCI Insight, 2022, PMID: 36066972), have provided valuable insights. When integrated with histone modification and chromatin looping data, contributed by our group and collaborators, future secondary analyses leveraging machine learning are poised to further elucidate the mechanisms underlying myometrial transcriptional regulation.

      (6e) How does the identification of PGR as an upstream regulator of PLCL2 gene expression in human myometrial cells contribute to our understanding of progesterone signaling in myometrial function?

      In a previous study, we demonstrated a positive correlation between PLCL2 and PGR expression in a mouse model and identified PLCL2's role in negatively modulating oxytocin-induced myometrial cell contraction (Peavy et al., PNAS, 2021, PMID: 33707208). The present study builds on this by providing evidence for a direct regulatory mechanism in which PGR influences PLCL2 transcription, likely through a cis-acting element located 35 kb upstream. These findings suggest that PLCL2 acts as a mediator of PGR-dependent myometrial quiescence prior to labor, rather than merely participating in a parallel pathway. Further in vivo studies are necessary to delineate the extent to which PLCL2 mediates PGR activity, particularly the contraction-dampening function of the PGR-B isoform.

      (7) Grammatical error: The manuscript has numerous grammatical errors. Please correct them.

      Corrections have been made as suggested.

      (8) Use of single-cell data: Though from the Methods section, it can be understood that single-cell RNA-seq was done to identify CRISPRa gRNA expressing cells to characterize the effect of gene activation, some results from single-cell data e.g., cell clustering, cell types, gRNA expression across clusters could be added for better elucidation.

      As the reviewer suggested, we have prepared a file “PerturbSeq_summary.xlsx” (Dataset S9) to provide additional results of the Perturb-seq data analysis. It includes two spreadsheets, “Cell_per_gRNA” for clustering and “Protospacer_calls_per_cell” for gRNA expression across clusters.

      Reviewer #2 (Recommendations For The Authors):

      (1) The following are a number of grammatical issues in the abstract. I suggest having a careful read of the entire manuscript to identify additional grammatical issues as I may not be able to highlight all of these issues.

      (1a) "The myometrium plays a critical component during pregnancy." change component to role.

      (1b) "It is responsible for the uterus' structural integrity and force generation at term," → replace "," with "."

      (1c) Also, I suggest rephrasing the first 2 sentences to: The myometrium plays a critical role during pregnancy as it is responsible for both the structural integrity of the uterus and force generation at term.

      (1d) "Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping." Remove "the", and modify to "Here we investigated human term pregnant".

      (1e) Missing period and sentence fragment, "PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2.

      Corrections have been made as suggested.

      (2) Sentence fragment: Studies on the role of steroid hormone receptors in myometrial remodeling have provided evidence that the withdrawal of functional progesterone signaling at term is due to a stoichiometric increase of progesterone receptor (PGR) A to B isoform-related estrogen receptor (ESR) alpha expression activation at term. (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).

      The statement has been updated:

      “Studies on the role of steroid hormone receptors in myometrial remodeling suggest that the withdrawal of functional progesterone signaling at term results from a stoichiometric shift favoring the PGR-A isoform over PGR-B. This shift is associated with increased activation of estrogen receptor alpha (ESR1) expression at term (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).”

      (3) FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as Cx43 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993).

      Use Gja1 (Gap junction alpha 1) as the current correct gene, not Cx43.

      Also, several references predate Nadeem, Farine et al. 2018 and are more appropriate to use as references for the role of Ap-1 proteins in regulating Gja1; PMID: 15618352 and PMID: 12064606 were the first to show this relationship in myometrial cells.

      The statement has been updated as suggested:

      “FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as GJA1 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993)”

      (4) Define PLCL2 on first use.

      Updated as suggested.

      (5) There are a number of issues with this section, "Matched sSpecimens of gravid myometrium were collected at the margin of hysterotomy from women undergoing clinically indicated cesarean section at term (>38 weeks estimated gestation age) without evidence of labor. Specimens of healthy, non-gravid myometrium were also pecimens were collected from uteri removed from pre-menopausal women undergoing hysterectomy for benign clinical indications."

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 (Heinz, Benner et al. 2010).

      Please clarify what background is used for motif enrichment.

      We used the default background sequences generated by HOMER from a set of random genomic sequences matching the input sequences in terms of basic properties, such as GC content and length. We have added more details to the Methods section:

      “DNA-binding factor motif enrichment analysis

      Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 with default background sequences matching the input sequences (Heinz, Benner et al. 2010).”

      (7) "Six of the seven regions are also co-localized with previously published genome occupancy of transcription regulators curated by the ReMap Atlas"

      Please clarify if this Atlas includes myometrial tissues or not and clarify the cell types included in the atlas.

      According to the UCSC Genome Browser and the reference by Hammal et al. (2022), the current ReMap database includes PGR ChIP-seq data from human myometrial biopsies, available under NCBI GEO accession number GSE137550, alongside data from various other cell and tissue types. ReMap provides valuable insights into potential functional cis-acting elements in the genome from a systems biology perspective. However, tissue specificity requires independent validation.

      (8) "Notably, 76% of the putative super-enhancers are co-localized with known PGR-occupied regions in the human myometrial tissue (Figure S2). This is significantly higher than the 20% co-localization in the regular enhancer group (Figure S2)."

      Because there is a huge difference in the size of the putative super enhancer regions and the isolated enhancers this comparison is not appropriate as conducted. The comparison needs to account for the difference in size of the regions. Please provide P values for significance statements.

      We acknowledge the reviewer's concern that our initial statement was overstated and potentially misleading, given the substantial difference in size between putative super-enhancer regions and regular enhancers. Rather than emphasizing the enrichment, it would be more accurate to simply describe our observation that super-enhancers encompass more PGR-occupied regions.

      Here is the updated version:

      “Notably, 76% of the putative super-enhancers co-localize with known PGR-occupied regions in human myometrial tissue, compared to 20% co-localization observed in regular enhancers (Figure S2).”

      Reviewer #3 (Recommendations For The Authors):

      (1) Title is extremely misleading, as here we do not get a view of the epigenomic landscape, but rather sparse data related to H3K27ac and H3K4me (focusing on enhancers) and chromatin conformation associated with the PLCL2 transcription start site (TSS).

      As suggested, the title is modified to “Assessment of the Histone Mark-based Epigenomic Landscape in Human Myometrium at Term Pregnancy”.

      (2) Improve the first result paragraph by providing a clear rationale for the experiments and their objectives, as well as introducing the samples used. Rather than simply listing approaches and end results in Table 1, offer concise explanations for the experiments alongside the supporting data presented in detailed figures. Using appropriate figures/graphs to effectively contextualize these datasets would be greatly appreciated by readers and would add more value to this research. Currently, it is difficult for us to assess and appreciate the quality of the data.

      The following statement has been added at the beginning of the Results section:

      "To better understand the regulatory network shaping the myometrial transcriptome before labor, we analyzed transcriptome and putative enhancers in individual human myometrial specimens. Using RNA-seq, we identified actively expressed RNAs, while ChIP-seq for H3K27ac and H3K4me1 was used to map putative enhancers. Active genes were associated with nearby putative enhancers based on their genomic proximity. Additionally, chromatin looping patterns were mapped using Hi-C to further link active genes and putative enhancers within the same chromatin loops."

      (3) The statistics for every sequencing approach need to be provided for each sample (e.g., RNA-seq: number of total reads, number of mapped reads, % of mapped reads; ChIP-Seq: number of mapped reads, % of mapped reads, % of duplicates).

      We have generated the summary table of each dataset included in this study (Dataset S7) [NGS-summary.xls].

      (4) Figure S1: The rationale behind comparing the Dotts study and yours regarding H3K27ac-positive regions needs to be better defined. Why is this performed if the data will not be used afterwards? What are the conserved regions associated with vs the ones that are variable? Is this biologically relevant? Why not use only the regions conserved between the 6 samples, to have more robust conclusions?

      The purpose of comparing our data with the Dotts dataset is to highlight the degree of variation across studies. In this study, we focused on addressing specific biological questions using our own dataset rather than developing methodologies for meta-analysis. Future advancements in meta-analysis techniques could leverage the combined power of multiple datasets to provide deeper insights.

      (5) Perhaps due to a lack of details, I am unable to ascertain how the putative myometrial enhancers were defined. In Dataset S1, it is stated, "we define the regions that have overlapping H3K27ac and H3K4me1 marks as putative myometrial enhancers at the term pregnant nonlabor stage (Dataset S1)". Within Dataset S1, for subjects 1, 2, and 3, H3K27ac and H3K4me1 double-positive enhancers are shown in term pregnant, non-labor human myometrial specimens, with approximately 100 regions corresponding to 131 (sample 1), 127 (sample 2), and 140 (sample 3) common peaks. However, in Figure 1a, reference is made to the 13114 putative enhancers commonly present across the three specimens. Is Dataset S1 intended to represent only a small fraction of the 13114 putative enhancers? Detailed analyses need to be conducted and better showcased.

      Dataset S1 has been updated to list all 13,114 putative enhancers.

      (6) For the gene expression analyses of RNA-seq data, FPKM values were utilized. However, it is unclear why the gene expression count matrix was normalized based on the ratio of total mapped read pairs in each sample to 56.5 million for the term myometrial specimens. I would recommend exercising caution regarding the use of FPKM expression units, as samples are normalized only within themselves, lacking cross-sample normalization. Consequently, due to external factors unaccounted for by this normalization method, a value of 10 in one sample may not equate to 10 in another.

      We value the reviewer’s input. This question will be addressed in future secondary data analyses with suitable methodologies, as it is beyond the scope of this study.

      (7) In Figure 1b, the authors have categorized their 12157 active genes into 3 bins based on FPKM values: >5 FPKM >1, >15 FPKM >5, and >15 FPKM. However, in the text, they describe these as 'actively high-expressing genes (FPKM >= 15)'. I would advise caution regarding the interpretation of these values, as an FPKM of 15 is not typically associated with highly expressed genes. According to literature and resources such as the Expression Atlas, an FPKM of 15 is generally considered to represent a low to medium expression level.

      We appreciate the reviewer’s feedback. This question will be revisited during secondary data analyses using appropriate methodologies, as it falls outside the scope of the present study.

      To increase readability and clarity, we modified the sentence as follows: More than 40% of the 540 putative super enhancers are located within a 100-kilobase distance to high-expressing genes (FPKM >= 15), while only 7.3% of putative myometrial super enhancers are found near low-expressing genes (5 > FPKM >=1) (Figure 2B).

      (8) Out of the 12157 active genes, approximately two-thirds have an FPKM >15. Was this expected? How does this correspond to what is observed in the literature, particularly in other similar studies (https://pubmed.ncbi.nlm.nih.gov/30988671/ ; https://pubmed.ncbi.nlm.nih.gov/35260533/ ) .

      This is indeed an intriguing question that merits further exploration in future secondary analyses.

      (9) It is also surprising to see that for the motif enrichment analysis (Fig. 1C), the P-values are small. This is probably because the percentage of target sequences with the motif is very similar to the percentage of background sequences with the motif. For instance, for selected genes in Figure 1C: AP-1 (50.68% vs. 46.50%), STAT5 (28.08% vs. 25.04%), PGR (17.90% vs. 16.12%), etc. Can one really say that you have a biologically relevant enrichment for values that are so close between target sequences and background sequences?

      The reviewer’s comment is noted. Biological relevance will be examined experimentally through wet-lab assays in future studies.

      (10) For Figure 2, again not convinced that FPKM >= 15 can be used to say: Compared with the regular putative enhancers, the putative myometrial super-enhancers are found more frequently near active genes that are expressed at relatively higher levels (Figure 1B and Figure 2B). A higher threshold should be used if they want to say this.

      To compare the association of putative enhancers with active genes expressed at different levels, we categorized the active genes into three groups based on their FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values. These groups are defined as follows: the top third active genes (FPKM ≥ 15), the middle third active genes (5 ≤ FPKM < 15), and the bottom third active genes (1 ≤ FPKM < 5). By "active genes expressed at relatively higher levels," we refer specifically to the top third active genes with FPKM values of 15 or higher, indicating their relatively higher expression levels compared to the other groups of active genes.
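
      The three-tier grouping described above can be sketched as a simple threshold function. This is only an illustration of the stated cutoffs; the function name and the tier labels, including the "inactive" label for genes below FPKM 1, are hypothetical and do not come from the manuscript.

```python
def expression_tier(fpkm: float) -> str:
    """Assign a gene to an expression tier using the FPKM cutoffs quoted above.

    The labels are illustrative; "inactive" is an assumed name for genes
    that fall below the FPKM >= 1 cutoff used for the bottom tier.
    """
    if fpkm >= 15:
        return "top"       # top third of active genes: FPKM >= 15
    if fpkm >= 5:
        return "middle"    # middle third: 5 <= FPKM < 15
    if fpkm >= 1:
        return "bottom"    # bottom third: 1 <= FPKM < 5
    return "inactive"      # assumed label, not part of the three tiers


if __name__ == "__main__":
    for value in (0.5, 1.0, 4.9, 5.0, 14.9, 15.0):
        print(value, expression_tier(value))
```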

      (11) More detailed explanations and methods are needed regarding how the data for Figure S2 was obtained.

      The following details were added to the methods section:

      “Colocalization of super enhancers and PGR genome occupancy was compared by calling peaks from previously published PGR ChIP-seq data (GSM4081683 and GSM4081684). The percentages of enhancers and super enhancers that manifest PGR occupancy were calculated by overlapping the genomic regions in each category with PGR occupancy regions.”
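
      As a rough illustration of the overlap calculation described in the added Methods text, the sketch below computes the percentage of regions that share at least one base pair with any PGR peak. The one-base-pair criterion, the half-open BED-style coordinates, and the function name are assumptions made for this example, and the coordinates are toy values rather than data from the study.

```python
def percent_overlapping(regions, peaks):
    """Percentage of regions that overlap at least one peak.

    Both inputs are lists of (chrom, start, end) tuples in half-open,
    BED-style coordinates. A region counts as occupied if it shares at
    least 1 bp with any peak on the same chromosome (assumed criterion).
    """
    if not regions:
        return 0.0
    hits = 0
    for chrom, start, end in regions:
        if any(c == chrom and s < end and start < e for c, s, e in peaks):
            hits += 1
    return 100.0 * hits / len(regions)


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    toy_regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    toy_peaks = [("chr1", 150, 160), ("chr2", 200, 300)]
    print(round(percent_overlapping(toy_regions, toy_peaks), 1))
```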

      (12) In Figure 2C, there is no information provided on the genes used to obtain the results. It would be helpful to include examples of these genes, along with their expression values, for instance.

      The expression levels of the 346 active genes that are associated with myometrial super enhancers are included in Dataset S4, along with the results of the updated gene ontology enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID), Knowledgebase v2024q4. Selected pathways of interest are listed in the updated Figure 2C.

      (13) The linking of PLCL2-related data to the first part of the story is lacking, and the rationale behind it is missing. This entire section should be more detailed, and the data should be expanded to better reflect the context.

      As suggested, we included the following statement at the beginning of the section “Cis-acting elements for the control of the contractile gene PLCL2”:

      “We previously demonstrated the positive correlation of PLCL2 and PGR expression in a mouse model and PLCL2’s function on negatively modulating oxytocin-induced myometrial cell contraction (Peavy et al., 2021). However, the mechanism underlies the PGR regulation of PLCL2 remains unclear. Taking advantage of the mapped myometrial cis-acting elements, we aimed to identify the cis-acting elements that may contribute to the PLCL2 transcriptional regulation with a special interest on the PGR-related enhancers.”

      The context is that our results provide additional evidence supporting direct regulation of PLCL2 transcription by PGR, likely through the cis-acting element 35 kb upstream. This finding suggests that PLCL2 likely acts as a mediator of PGR-dependent myometrial quiescence before labor rather than as a mere passenger on a parallel pathway. Further studies using in vivo models are needed to determine the extent to which PLCL2 mediates PGR function, especially the contraction-dampening function of the PGR-B isoform.

      (14) The entire Hi-C data should be presented to allow for the assessment of its quality and further value.

      The revised manuscript has included the Hi-C quality control summary in Dataset S8 [HiC-QC-Summary.xlsx].

      (15) The authors state: "For the purpose of functional screening, we focus on H3K27ac signals instead of using H3K27ac/H3K4me1 double positive criterium to cast a wider net." However, it is unclear how many of the targeted regions contained H3K27ac/H3K4me1 peaks. Were enhancers or super-enhancers targeted, and if so, how did they compare to H3K27ac sites?

      The numbers of H3K27ac/H3K4me1 double positive peaks are recorded in Figure 1A. Compared to the numbers of H3K27ac intervals (Table 1), the H3K27ac/H3K4me1 double positive peaks are 62.9%, 70.7%, and 61.2% of corresponding H3K27ac intervals in each individual specimen.

      (16) For the first set of data (Table 1), the authors state, "Together, these results reveal an epigenomic landscape in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition." While it is acknowledged that an epigenetic landscape exists in all tissues, there is a lack of clarity regarding this landscape in the current manuscript, as we are only presented with a table containing numbers.

      This sentence has been revised to: “Together, these results delineate a map of H3K27ac and H3K4me1 positive signals in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition.”

      (17) For S1, the authors conclude: These data together highlight the degree of variation in mapping the epigenome among specimens and datasets. This conclusion seems somewhat perplexing, and I find myself in partial disagreement. Firstly, providing a clear rationale for this section would strengthen the conclusions. It's important to consider what factors may contribute to this variability. It could simply be attributed to differences in experimental settings, such as variations in samples, protocols used, antibodies, sequencing departments, or overall data quality. Deeper analyses of the data could have provided more information.

      We agree with the reviewer that deeper analyses are needed in order to extract more information among studies. However, appropriate methods for meta-analyses should be carefully evaluated and employed for this purpose. We humbly believe that such a task should belong to future studies that may combine available datasets for secondary analyses, leveraging the collective contribution of the reproductive biology community.

      (18) In the methods section, please include an explanation of how enhancers and super-enhancers were defined or add appropriate citations for reference.

      We have added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”.

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
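
      The merge-and-threshold rule in the quoted Methods text can be sketched as follows. This is a minimal pure-Python illustration for intervals on a single chromosome, not the authors' pipeline; the function names are hypothetical, and treating a gap of exactly 12,500 bp as mergeable is an assumption about how the distance parameter is applied.

```python
def merge_within(intervals, max_gap=12_500):
    """Merge (start, end) intervals on one chromosome separated by <= max_gap bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous cluster: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def call_super_enhancers(intervals, max_gap=12_500, min_size=15_000):
    """Keep merged regions larger than min_size bp, as in the quoted description."""
    return [(s, e) for s, e in merge_within(intervals, max_gap) if e - s > min_size]


if __name__ == "__main__":
    # Toy H3K27ac peak coordinates for illustration only.
    toy_peaks = [(1_000, 3_000), (10_000, 12_000), (20_000, 30_000), (100_000, 101_000)]
    print(merge_within(toy_peaks))
    print(call_super_enhancers(toy_peaks))
```

      In this toy input, the first three peaks merge into one 29-kb region that passes the size cutoff, while the isolated 1-kb peak does not.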

      (19) Additional description on the "Inferred myometrial PGR activities and the correlation analysis" method section should be included to enhance clarity and understanding.

      The description has been updated:

      “The inferred PGR activities were represented by the T-score, which was derived by inputting the mouse myometrial Pgr gene signature, based on the differentially expressed genes between control and myometrial Pgr knockout groups at mid-pregnancy (Wu, Wang et al., 2022), into the SEMIPs application (Li, Bushel et al., 2021). The T-scores were computed using this signature alongside the normalized gene expression counts (FPKM) from 43 human myometrial biopsy specimens.”

      (20) How was the qPCR analysis performed? Was the ddCT method utilized, and was a reference gene used for control? Additional information would be beneficial.

      Relative mRNA levels were quantified using the standard curve method.

      The following details were added: “Relative levels of genes of interest were normalized to the 18S rRNA.”

      (21) Regarding the RNA-Seq analysis of Provera-treated human Myometrial Specimens, the continued use of FPKM is not ideal due to potential differences in RNA composition between libraries. Additionally, clarification is needed on why Cufflinks 2.0.2 was used, considering it is no longer supported.

      FPKM (Fragments Per Kilobase of transcript per Million mapped reads) is used in RNA-Seq analysis, because it allows for the normalization of gene expression data, accounting for differences in gene length and sequencing depth, and facilitates comparability across different genes and libraries. This makes it one of the essential tools for accurately measuring and comparing gene expression levels in various biological and clinical research contexts.
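
      For reference, the standard FPKM definition can be written as a one-line calculation. It scales a gene's fragment count by transcript length and by the library's total mapped fragments only. The function and argument names below are hypothetical, and the numbers are toy values.

```python
def fpkm(fragments: float, transcript_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Equivalent to fragments / (length in kb) / (library size in millions).
    """
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Toy example: 500 fragments on a 2 kb transcript in a library of 25 million fragments.
    print(fpkm(500, 2_000, 25_000_000))
```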

      Cufflinks was once a popular tool for RNA-seq data analysis, transcriptome assembly, and identification of differentially expressed genes (DEGs), although its usage has declined in recent years with the emergence of newer and more advanced tools. We used it because the RNA-seq analysis at the early stage of this study, a few years ago, was performed with Cufflinks, and we continued using it for the later RNA-seq analyses for comparability and consistency. If we were starting a new project now, we would choose newer tools such as HISAT2, Salmon, and DESeq2.

      (22) Overall, sentence structure and typos need to be corrected across the text. Here are some examples:

      Line 17: at term, emerging studies.

      Line 20-22: Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping.

      Line 30-32: PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2.

      Line 66-70: However, the role of differential myometrial DNA methylation at contractility-driving gene promoter CpG islands in preterm birth is not thought to be major (Mitsuya, Singh et al. 2014), but given that DNA methylation-mediated gene regulation often occurs outside of CpG islands (Irizarry, Ladd-Acosta et al. 2009), there is still work to be done at this interface.

      Line 80-83: Putative enhancers upstream of the PLCL2, a gene encoding for the protein PLCL2 which has been implicated in the modulation of calcium signaling (Uji, Matsuda et al. 2002) and maintenance of myometrial quiescence (Peavey, Wu et al. 2021), transcriptional start site were subject to functional assessment using CRISPR activation based assays.

      Line 290 : sSpecimens

      We appreciate the reviewer’s kind efforts and have made changes accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) The section on page 20 describing the proteomic analysis of EVs is poorly written and confusing, with a lot of data in the supplement. It is not clear what the proteomics data actually means.

      We appreciate your feedback on the clarity of the proteomic analysis section. We have rewritten the section on page 20 with more detailed information to provide a clearer explanation of the proteomics data and its biological significance. Additionally, we have incorporated a comparative analysis of the EV and total cell lysate proteomes (Fig. 8E, Supplementary Fig. S7A, Supplementary Tables 3 and 4) to support interpretation of the supplemental data.

      (2) The order of the data could be improved.

      We appreciate your feedback regarding the data organization. We have reorganized the order and position of some data in a more structured and coherent manner, as suggested by the reviewers.

      - Reorganization of the qPCR data (previously Fig. 1C) as Fig. 3A

      - Removal of the data on the growth analysis on raffinose media (previously Fig. 7H).

      - Reorganization of the spotting data of the double mutant (previously Fig. 3B) to Supplementary Fig. S3B

      - Reorganization of the subcellular localization data (previously Fig. 3E) to Supplementary Fig. S4A

      (3) The discussion is repetitive with the introduction and merely summarizes the results and speculates on the mechanism of how the absence of UGGT, leading to ERQC defects, results in defective EV biogenesis/cargo loading in C. neoformans.

      We removed several repetitive sentences in the discussion and provided additional information on proteome analysis.

      Other questions and comments

      (1) Instead of comprehensively analyzing EVs from the UGG1 mutant, a more informative approach to better understanding how defects in N-linked glycosylation impact secretion, would be to do a proteomic analysis on the total secretions (including beta glucanase-treated cells to release classically secreted proteins from the cell wall) and EVs.

      We agree that a comprehensive proteomic analysis of total secretions and classically secreted proteins would provide deeper insights into how defects in N-glycosylation impact secretion in C. neoformans. To address this concern, we performed an additional set of proteomic analyses covering the total cell lysates and the secretome of C. neoformans cultivated in SD broth, and we present the results in Supplementary Table S5 and Supplementary Fig. S7B. These additional analyses provide further insights into the impact of UGG1 deletion on both conventional and unconventional secretion pathways, supporting a more pronounced effect of the UGG1 defect on EV-mediated trafficking. The discussion has been updated accordingly (Page 22, lines 509-514).

      (2) The melanization defect in Ugg1 mutant is not strong. Could the reduction be due to partially compromised Ugg1 mutant growth at 30°C, as indicated in the spot tests? Were photos of the spot dilution assays taken at 1 and 2 days to investigate slower growth? Or alternatively were growth curves taken in a liquid culture?

      To assess the melanin synthesis defect more accurately, in addition to the analysis on L-DOPA plates, we measured melanin production in liquid L-DOPA medium after a 3-day incubation and normalized it to cell density (OD<sub>600</sub>). The normalized melanin production data are now included as Fig. 4B in the revised manuscript. The defective laccase activity of the ugg1Δ mutant (Fig. 7C) further corroborates our melanization assay results, as now noted in the text (Page 18, lines 393-395).

      (3) Is it accurate to say that some virulence factors (i.e. melanin, capsule and phosphatases) are predominantly trafficked through EV's in C. neoformans? Have studies been done to determine the proportion of virulence factors trafficked via EV's versus traditional secretion?

      We thank you for the thoughtful comments. Some virulence factors, such as urease, melanin, and capsule polysaccharides, lack the signal peptide required for targeting to the conventional ER/Golgi secretion pathway. It is generally assumed that the trafficking of these factors in C. neoformans is predominantly mediated by non-conventional secretion via EVs. In addition, some virulence factors with signal peptides, such as laccase and phosphatases, are transported via EVs as well as by conventional secretion. A quantitative comparison of the proportion of virulence factors secreted via EVs versus the conventional pathway has not yet been reported, although genetic evidence suggests that conventional secretion also plays a significant role in the export of capsule polysaccharides. We were therefore careful not to present EVs as the main route of virulence factor secretion in the manuscript.

      (4) There is insufficient background in the introduction linking what is known about the ERQC process to secretion in general. The topic changes from the ERQC process to fungal virulence factor, with a primary focus on non-classical (EV-based) secretion. Classical secretion should also be discussed without assuming that non classical (EV) secretion is the major pathway contributing to fungal virulence.

      We appreciate your insightful comments highlighting the need for more background on the ERQC process and its relationship with secretion. To address the reviewer’s concerns, we have added sentences to describe the key roles of ERQC in conventional protein secretion in the Introduction (Page 5, lines 102-106).

      (5) Figure 1A. What does the blue filled circle with the red outline signify? Fig 1 A legend is not well explained. A summary using material provided in the intro/discussion should be included to briefly explain the process and the differences between fungal species. Please also be aware that the intro starts describing the human ERQC process and then switches to what happens in S. cerevisiae.

      We have revised Figure 1A by removing the red circle and updated the figure legend in the revised manuscript to include more detailed information about the ERQC differences across higher eukaryotes and fungal species.

      (6) Figure 2A. There are no units on the Y-axis. Presumably, the scale is the same for all 3 strains.

      Thank you for your comments. The Y-axis scale is the same for all three strains and, as in Fig. 2C, represents the relative fluorescence intensity obtained from the HPLC analysis. We have added the units to the Y-axis in Fig. 2A.

      (7) If Mnl1 and 2 have proposed roles in proteasomal degradation, wouldn't they be expected to have ER retention signals, like Ugg1?

      We appreciate your valuable insights regarding the absence of ER retention signals in Mnl1 and Mnl2. Previous studies have shown that Saccharomyces cerevisiae Mnl1/Htm1 does not possess canonical KDEL/HDEL-like ER retention signals. Instead, its retention in the ER lumen is facilitated through its interaction with protein disulfide isomerase Pdi1, which contains an HDEL sequence (Gauss et al. 2011). Thus, it is expected that non-canonical retention mechanisms—such as interactions with other ER proteins—could contribute to the retention of Mnl1 and Mnl2 within the ER. We added this information to the revised manuscript (Page 8, lines 154-159).

      (8) Figure 1 C qPCR showing change in mRNA in response to ER stress should not be grouped in this figure. It could be standalone or discussed when the spot dilution assays are performed. Anyway, spots tests are more convincing of a role in stress response than qPCR as the ugg1 mutant is sensitive to tunicamycin, DTT and cell wall stressing agents.

      As suggested by the reviewer, we have reorganized the qPCR data as a part of Figure 3 (Figure 3A) in the revised manuscript.

      (9) It is odd that mns1/101 mutants are not sensitive to ER and CW stress given their proposed differing location/function in the pathway (Figure 1) determined from the N-linked profiling. Any explanation? Could there be redundancy?

      We appreciate the reviewer’s observation regarding the lack of ER and CW stress sensitivity in the mns1Δ and mns101Δ mutants, despite their proposed roles in N-glycan processing. We had previously reported that the C. neoformans alg3Δ mutant, lacking a critical enzyme responsible for the synthesis of Dol-PP-Man<sub>6</sub>GlcNAc<sub>2</sub> in the N-glycosylation pathway, exhibited clearly impaired N-glycan elongation, but showed no detectable growth defects even under stress conditions in vitro. However, alg3Δ is avirulent in vivo (Thak et al., 2020). Similarly, the mns1Δ101Δ double mutant shows glycan-processing defects that do not compromise cellular fitness under stress conditions but result in attenuated virulence in animal models. These findings suggest that some glycosylation-related defects may affect in vivo pathogenicity more severely than in vitro stress sensitivity.

      (10) Although the Silver-stained gels of the ugg1 mutant are not particularly informative, why weren't they (and Con A blots) performed for the other mutants?

      The overall decrease of hypermannosylated glycans observed in the ugg1Δ mutant allowed us to detect clear alterations in protein glycosylation patterns in the lectin blot using Galanthus nivalis agglutinin, which recognizes terminal α1,2-, α1,3-, and α1,6-linked mannose residues. In contrast, the limited changes in a few glycan species in the other mutants, including mns1Δ, mns101Δ, and mns1Δ101Δ, are too subtle to be detected in the lectin blot, because the average lengths of their N-glycans differ only slightly from those of the WT. Therefore, we presented the lectin blotting data only for the ugg1Δ mutant.

      (11) If there is ER stress under normal conditions in the Ugg1 mutant then technically this mutant should be growing more slowly under normal conditions. This is difficult to predict in a spot dilution assay where growth is only visualized at day three when any growth defect may have been corrected. The slower growth rather than the reduced secretion of GXM specifically is therefore more likely to be responsible for the reduced virulence.

      We appreciate the reviewer’s insightful comment regarding the interplay between ER stress, growth defects, and virulence attenuation in the ugg1Δ mutant. While retarded growth in C. neoformans is often associated with reduced virulence, there are a few exceptions. For instance, disruptions in cell cycle progression in C. neoformans have been reported to result in larger capsule sizes, which instead enhance in vivo virulence when analyzed in Galleria mellonella infection models (García-Rodas et al., 2014). This highlights that a growth defect alone is not sufficient for virulence attenuation. In the case of the ugg1Δ mutant, we speculate that the almost complete loss of virulence is attributable not only to its growth retardation but also to its impaired secretion of key virulence factors, including the polysaccharide capsule.

      (12) The rationale for using leucine analogue 5',5',5'-trifluoroleucine (TFL), in a growth assay (Fig. 3C) to determine whether the defective ugg1Δ phenotypes are induced by ER stress caused by misfolded protein accumulation is not explained.

      The leucine analogue 5',5',5'-trifluoroleucine (TFL) can be incorporated into newly synthesized proteins, disrupting normal folding and thus leading to the generation of misfolded proteins (Trotter et al., 2002; Cowie et al., 1959). In the context of a defective ERQC pathway, these misfolded proteins cannot be adequately repaired, resulting in their accumulation and triggering ER stress. Excessive ER stress may ultimately inhibit cell growth in the presence of TFL. This explanation has been incorporated into the revised manuscript (Page 11, lines 236–241).

      (13) I would argue that only the Ugg1 and double Mns mutant were defective in virulence. For the single mutants, it looks like no difference was found relative to WT. The longer median survival of these mutants (if significant) is most likely due to poor infection technique.

      We agree with the reviewer’s opinion that the mns1Δ and mns101Δ single mutants show no significant difference in in vivo virulence compared to the WT strain, unlike the mns1Δ101Δ double mutant, which showed significantly attenuated virulence. We had already addressed this point in the manuscript (Page 13, lines 267-269).

      (14) The authors conclude that the ugg1Δ strain specifically is impaired in extracellular secretion of capsular polysaccharides but is this via classical (SAV1) secretion or EVs?

      In addition to EV-mediated transport, capsular polysaccharide secretion can occur via the Sav1 (Sec4p)-mediated classical secretion pathway. However, our proteome data of total cell lysates indicated that the protein levels of Sav1 were comparable between the WT and ugg1Δ strains, suggesting that Sav1p function itself might not be impaired. Given that the ugg1Δ mutant exhibits altered vesicular structures (Supplementary Fig. S6) and loss of microvesicles (Fig. 8A), we speculate that a defect might occur at a post-Sav1p step, such as vesicle fusion with the plasma membrane. Such a defect likely contributes to the complete loss of capsular polysaccharide secretion in the ugg1Δ strain, in which EV biogenesis and cargo loading are also severely impaired, producing EVs that lack capsular polysaccharides (Figure 8F). However, further studies are needed to define the contribution of SAV1 to the secretion of capsular polysaccharides in the ugg1Δ strain.

      (15) The rationale for doing 7 H is very confusing.

      The experiment assessing raffinose utilization as a carbon source was inspired by the previous work of Garcia-Rivera et al., who reported that the cap59Δ mutant is unable to utilize raffinose due to a defect in the secretion of raffinose-hydrolyzing enzymes. As another way to probe potential defects in the conventional secretion pathway, we examined the growth of the ugg1Δ mutant in the presence of raffinose. Given the already extensive amount of data, we have decided to remove this complementary result from the manuscript.

      (16) It is speculated in the discussion that ER stress impacts lipid/sterol synthesis and that LDs (lipid droplets?) aid the UPR and ERAD in degrading misfolded proteins during ER stress in S. cerevisiae. The authors mention that they observed a drastic increase in LDs in the ugg1Δ mutant. Where is this data? Even with the data, this is all speculation. The authors also speculate that increased numbers of vacuoles in ugg1 (where is the data?) could be the cause of the altered vesicular structures observed in the mutants, which may indicate abnormal lipid homeostasis caused by the ERQC defects, which could, in turn, affect EV biogenesis. Again, this is speculative.

      The data on lipid droplets (LDs) and vacuole staining are presented in Supplementary Figure S6, showing a drastic increase in LDs and an increase in vacuolar size in the ugg1Δ mutant compared to the wild-type strain, especially in capsule-inducing conditions. In addition to such changes in vesicular structures, our preliminary data on sphingolipids and sterol analysis in the surface lipid fraction of the ugg1Δ mutant led us to propose the hypothesis that ERQC defects may impact lipid metabolism, which in turn could influence EV biogenesis and membrane properties. We expect that these findings will provide a strong foundation for future studies exploring the link between ERQC, lipid homeostasis, and EV biogenesis. We have revised our speculation on the association between EV biogenesis and abnormal lipid homeostasis caused by ERQC defects, by adding information on our preliminary lipid-profile data and mentioning that the ugg1Δ mutant lacks microvesicles, which are derived from the plasma membrane (Page 24, lines 554-559).

      Reviewer #2 (Recommendations for the authors):

      (1) My suggestions for the authors are the same as those presented in the public review: (1) reducing the text in certain sections of the paper to improve readability for the audience, and (2) reconsidering the figures to reduce the amount of information in each one, moving some of the content to the supplementary material.

      We thank the reviewer for their constructive suggestions regarding the organization and readability of the manuscript. As suggested, we addressed your concerns as follows:

      (1) Reducing the text in the Introduction, Results, and Discussion sections by removing repetitive statements and simplifying complex descriptions where possible.

      (2) Changing the presentation of figures: we have also reorganized the presentation of some data by moving non-essential data to the supplementary material. The updated figures and supplementary materials have been clearly referenced in the text to guide readers.

      (3) Reorganization of materials and methods: some parts of methods were moved to Supplementary Information

      (4) Removal of Figure 7H and the sentences describing the result

      More detailed explanations on the reduction and reorganization are also described in the response to the major comments (2) and (3) made by Reviewer #1.

      (2) Figure 3, for example, shows no difference in fungal growth under different cultivation conditions. This information is valuable but could be mentioned in the text, with the image provided as supplementary material, focusing the figure only on images that show significant growth differences among the strains. I suggest a similar approach for other figures so that the authors can include only the most relevant results in the main body of the article and move some figures to the supplementary materials.

      For Fig. 3, the spotting data of the double mutant (previously Fig. 3B) are now presented in the supplementary information (Supplementary Fig. S3B). Additionally, the subcellular localization data (previously Fig. 3E) were also moved to the supplementary material (Supplementary Fig. S4A).

      Reviewer #3 (Recommendations for the authors):

      (1) Line 43 "EV-mediated transport of virulence bags" doesn't make sense. EVs have been described as "virulence bags" (and are in this work later in the introduction) but this should here be "transport of virulence factors" or "compounds associated with virulence" but only if you have confirmed that the "cargo" is consistent with this- which is not evident in the abstract.

      Thank you for your insightful comment. We have revised this to "EV-mediated transport of virulence factors" in line with your suggestion.

      (2) Line 49 "secretory pathway" - is there not more than one secretion pathway?

      Thank you for pointing this out. The term "secretory pathway" has been updated to "secretory pathways" to acknowledge the presence of both conventional and unconventional secretion mechanisms.

      (3) Line 53 "recognizes folding defects, repairs them, and ensures the translocation of irreparable misfolded proteins" should be "recognizes folding defects and repairs them or ensures the translocation of irreparable misfolded proteins".

      Thank you for pointing this out. We have revised the sentence as you suggested.

      (4) Lines 88-90 ALG needs to be written out the first time - Asn-linked glycans. Also, consider adding that ALG genes are present in most eukaryotes as it is unclear what you are comparing C. neoformans to.

      Thank you for your helpful comment. We have revised the text to write out "ALG" as "Asn-linked glycosylation" and added the sentence “ALG genes are evolutionary conserved in most eukaryotes” in the revised manuscript (Page 4, line 84).

      (5) Line 99 Cryptococcus has already been abbreviated to C. so don't write it out again.

      We have corrected "Cryptococcus" to “C.” throughout the manuscript after its first mention.

      (6) Line 152- tunicamycin and DTT are not described yet, which may make it challenging for some readers to understand what these drugs are doing/why they were used. What is on lines 156 and 157 for these drugs should go up with the first mention of these drugs.

      Thank you for your helpful suggestion. We have revised the manuscript to include the descriptions and purpose of using tunicamycin (TM) and dithiothreitol (DTT) immediately following their first mention, as recommended (Page 10, lines 208-210).

      (7) The text for Figure 1 C is inaccurate. High temperature also induced KAR2, as noted above, but inaccurately stated in line 160. There is no comment on the significant UGG1 increase with tunicamycin or that KAR2 was highest in this condition.

      Thank you for your thoughtful comment. We have better clarified the significant increase of UGG1 expression following tunicamycin treatment and KAR2 induction upon heat stress in the revised manuscript (Page 10, lines 216-217). Please note that Fig. 1C was revised and is now referred to as Fig. 3A.

      (8) Figure 2B is not well explored/explained. There appears to be more protein in the mutant, including of higher weight in the intracellular compartment. It is difficult to ascertain if there is more too in the secretion phase with this gel. The methods do not specifically describe the concentration of protein added - just volume. Is what we are seeing a loading issue vs real differences?

      Thank you for your insightful comments regarding Figure 2B. We added information on amounts of protein (30 µg per lane) in the legend of Figure 2B.

      The main purpose of Fig. 2B is to examine the altered glycosylation pattern of ERQC by detecting glycoproteins using the Galanthus nivalis agglutinin, which specifically binds terminal α1,2-, α1,3-, and α1,6-linked mannose residues. The result of lectin blotting indicated that glycoproteins are more abundantly detected in the secretion fraction than in the soluble intracellular fraction, consistent with the general notion that more than 50% of secretory proteins are glycoproteins. Also, the more abundant proteins with decreased molecular weight in the secretion fraction of the ugg1Δ mutant supported the N-glycan profiles with decreased hypermannosylation in the ugg1Δ mutant. We added the purpose and a more detailed interpretation of Figure 2B in the revised manuscript (Page 9, lines 174-179).

      (9) Line 242 "melanin pigment" is redundant as melanin is a pigment.

      We thank the reviewer for pointing out the redundancy in the phrase. We revised the text to simply state "melanin".

      (10) Line 250 drops "completely" especially as the mutant did colonize the lungs of mice.

      To avoid any possible misunderstanding, we removed the term "completely" in the revised manuscript.

      (11) Line 275- need to reference 18B7 as it is first introduced here.

      We added the reference on the antibody 18B7 in the revised manuscript.

      (12) Line 308- there are specific techniques to measure GXM size that could validate or refute the statement on "incomplete" polysaccharides. For example, DOI:10.1128/EC.00268-09.

      We appreciate the valuable suggestion on specific techniques to measure GXM size, which will be one of the key experiments in our future study. In the revised manuscript we cited the suggested reference to indicate the need for validation of our statement (Page 14, lines 316-318).

      (13) Line 496 "mammals" - why is this used when the study is on a fungus, not a mammal? The structure of the first 2 paragraphs can be clearer to focus more on fungal biology.

      We have compared mammals and fungi to emphasize that the ERQC system is conserved among eukaryotes but has diverged with a few species-specific features. This comparison is relevant in the context of understanding the evolutionarily unique features of ERQC pathways in C. neoformans. We modified the first 2 paragraphs to clarify the main issue of our present study (Page 21, lines 472-483).

      (14) Line 525- the ugg mutant was not avirulent as CFU was present and histopathology in the supplementary figures shows the tissue with ugg1 deletion was not normal (although the images are not especially easy to review). Yes, the mutant did not kill under your test conditions, but it was not avirulent (incapable of causing disease). Significantly attenuated or other descriptors should be utilized. Line 548 is also thus incorrect "complete loss of virulence").

      We appreciate the reviewer’s concern regarding the description of the ugg1Δ mutant as avirulent. We agree that the term "avirulent" alone may not fully capture the observed phenotypes in the CFU and histopathological data, since we cannot exclude the possibility that the ugg1Δ mutant retains the ability to establish an infection. Thus, we have revised the text by describing the ugg1Δ mutant as "almost avirulent".

      (15) Line 597- the study by Fukuoka used kidney cells. It is misleading to not clearly state that this finding of ER stress was NOT done in fungi as the way it is presented makes it read as if this work was performed in C. neoformans. This should be clarified. This should also be double-checked and clarified for other statements, such as the reference to Harada in line 606, as this study used melanoma cells. These cell types are very different from cryptococcus- though I absolutely concur that lessons can be learned from comparative assessments.

      We thank the reviewer for pointing out the need to clarify the experimental context of the cited studies. We explicitly stated the host cell types used in the referenced studies by Fukuoka et al. and by Harada et al., respectively, in the revised manuscript (Page 25, lines 560 and 568).

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife assessment 

      This valuable study aims to present a mathematical theory for why the periodicity of the hexagonal pattern of grid cell firing would be helpful for encoding 2D spatial trajectories. The idea is supported by solid evidence, but some of the comparisons of theory to the experimental data seem incomplete, and the reasoning supporting some of the assumptions made should be strengthened. The work would be of interest to neuroscientists studying neural mechanisms of spatial navigation. 

      We thank the reviewers for this assessment. We have addressed the comments made by reviewers and believe that the revised manuscript has theoretical and practical implications beyond the subfield of neuroscience concerned with mechanisms underpinning spatial memory and spatial navigation. Specifically, the demonstration that four simple axioms beget the spatial firing pattern of grid cells is highly relevant for the field of artificial intelligence and neuromorphic computing. This relevance stems from the fact that the four axioms define a set of four simple computational algorithms that can be implemented in future work in grid cell-inspired computational algorithms. Such algorithms will be impactful because they can perform path integration, a function that is independent of an animal’s or agent’s location and therefore generalizable. Moreover, because of the functional organization of grid cells into modules, the algorithm is also scalable. Generalizability and scalability are two highly sought-after properties of brain-inspired computational frameworks. We also believe that the question why grid cells emerge in the brain is a fundamental one. This manuscript is, to our knowledge, the first one that provides an interpretable and intuitive answer to why grid cells are observed in the brain. 

      Before addressing each comment, we would like to point out that the first sentence of the assessment appears misphrased. The study does not aim to present a theory for why the periodicity in grid cell firing would be helpful for encoding 2D spatial trajectories. To present a theory “for why grid cell firing would be helpful for encoding 2D trajectories”, one assumes the existence of grid cells a priori. Instead of assuming the existence of grid cells and deriving a computational function from grid cells, our study derives grid cells from a computational function, as correctly summarized by reviewers #1 and #3 in their individual statements. In contrast to previous normative models, we prove mathematically that spatial periodicity in grid cell firing is implied by a sequence code of trajectories. If the brain uses cell sequences to code for trajectories, spatially periodic firing must emerge. As correctly pointed out by reviewer #1, the underlying assumptions of this study are that the brain codes for trajectories and that it does so using cell sequences. In response to comments by reviewer #1, we now discuss these two assumptions more rigorously.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Rebecca R.G. et al. set to determine the function of grid cells. They present an interesting case claiming that the spatial periodicity seen in the grid pattern provides a parsimonious solution to the task of coding 2D trajectories using sequential cell activation. Thus, this work defines a probable function grid cells may serve (here, the function is coding 2D trajectories), and proves that the grid pattern is a solution to that function. This approach is somewhat reminiscent in concept to previous works that defined a probable function of grid cells (e.g., path integration) and constructed normative models for that function that yield a grid pattern. However, the model presented here gives clear geometric reasoning to its case. 

      Stemming from 4 axioms, the authors present a concise demonstration of the mathematical reasoning underlying their case. The argument is interesting and the reasoning is valid, and this work is a valuable addition to the ongoing body of work discussing the function of grid cells. 

      However, the case uses several assumptions that need to be clearly stated as assumptions, clarified, and elaborated on: Most importantly, the choice of grid function is grounded in two assumptions: 

      (1) that the grid function relies on the activation of cell sequences, and 

      (2) that the grid function is related to the coding of trajectories. While these are interesting and valid suggestions, since they are used as the basis of the argument, the current justification could be strengthened (references 28-30 deal with the hippocampus, reference 31 is interesting but cannot hold the whole case). 

      We thank this reviewer for the overall positive and constructive criticism. We agree with this reviewer that our study rests on two premises, namely that 1) a code for trajectories exists, and 2) this code is implemented by cell sequences. We now discuss and elaborate on the data in the literature supporting the two premises.

      In addition to the work by Zutshi et al. (reference 31 in the original manuscript), we have now cited additional work presenting experimental evidence for sequential activity of neurons in the medial entorhinal cortex, including sequential activity of grid cells.

      We have added the following paragraph to the Discussion section:

      “Recent studies provided compelling evidence for sequential activity of neurons representing spatial trajectories. In particular, Gardner et al. (2022) demonstrated that the sequential activity of hundreds of simultaneously recorded grid cells in freely foraging rats represented spatial trajectories. Complementary preliminary results indicate that grid cells exhibit left-right-alternating “theta sweeps,” characterized by temporally compressed sequences of spiking activity that encode outwardly oriented trajectories from the current location (Vollan et al., 2024).

      The concept of sequential grid cell activity extends beyond spatial coding. In various experimental contexts, grid cells have been shown to encode non-spatial variables. For instance, in a stationary auditory task, grid cells fired at specific sounds along a continuous frequency axis (Aronov et al., 2017). Further studies revealed that grid cell sequences also represent elapsed time and distance traversed, such as during a delay period in a spatial alternation task (Kraus et al., 2015). Similar findings were reported for elapsed time encoded by grid cell sequences in mice performing a virtual “Door Stop” task (Heys and Dombeck, 2018).

      Additionally, spatial trajectories represented by temporally compressed grid cell sequences have been observed during sleep as replay events (Ólafsdóttir et al., 2016; O’Neill et al., 2017). Collectively, these studies demonstrate that sequential activity of neurons within the MEC, particularly grid cells, consistently encodes ordered experiences, suggesting a fundamental role for temporal structure in neuronal representations.

      The theoretical underpinnings of grid cell activity coding for ordered experiences have been explored previously by Rueckemann et al. (2021) who argued that the temporal order in grid cell activation allows for the construction of topologically meaningful representations, or neural codes, grounded in the sequential experience of events or spatial locations. However, while Rueckemann et al. argue that the MEC supports temporally ordered representations through grid cell activity, our findings suggest an inverse relationship: namely, that grid cell activity emerges from temporally ordered spatial experiences. Additional studies demonstrate that hippocampal place cells may derive their spatial coding properties from higher-order sequence learning that integrates sensory and motor inputs (Raju et al., 2024) and that hexagonal grids, if assumed a priori, optimally encode transitions in spatiotemporal sequences (Waniek, 2018).

      Together, experimental and theoretical evidence demonstrate the significance of sequential neuronal activity within the hippocampus and entorhinal cortex as a core mechanism for representing both spatial and temporal information and experiences.”

      The work further leans on the assumption that sequences in the same direction should be similar regardless of their position in space, it is not clear why that should necessarily be the case, and how the position is extracted for similar sequences in different positions. 

      We thank this reviewer for giving us the opportunity to clarify this point. We define a trajectory as a path taken in space (Definition 6). By this definition, a code for trajectories is independent of the animal’s spatial location. This is consistent with the definition of path integration, which is also independent of an animal’s spatial location. If the number of neurons is finite (Axiom #4) and the space is large, sequences must eventually repeat in different locations. This results in neural sequences coding for the same directions being identical at different locations. We have clarified this point under new Remark 6.1. in the Results section of the revised manuscript:

      “Remark 6.1. Note that a code for trajectories is independent of the animal’s spatial location, consistent with the definition of path integration. This implies that, if the number of neurons is finite (Axiom #4) and the space is large, sequences must eventually repeat in different locations, resulting in neural sequences coding for the same trajectories at different locations.”

      The formal proof was already included in the original manuscript: “Generally speaking, starting in a firing field of element i and going along any set of firing fields, some element must eventually become active again since the total number of elements is finite by axiom 4. Once there is a repeat of one element’s firing field, the whole sequence of firing fields of all elements must repeat by axiom 1. More specifically, if we had a sequence 1,2, … , k, 1, t of elements, then 1,2 and 1, t both would code for traveling in the same direction from element 1, contradicting axiom 1.”

      Further: “More explicitly, assuming axioms 1 and 4, the firing fields of trajectory-coding elements must be spatially periodic, in the sense that starting at any point and continuing in a single direction, the initial sequence of locally active elements must eventually repeat with a repeat length of at least 3”.
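      The pigeonhole step in the quoted proof can be illustrated with a toy example. The sketch below is not the authors' model: it assumes, as one reading of axiom 1, that for a fixed direction of travel each element has exactly one successor, and it encodes this as a hypothetical `successor` dictionary over a finite set of elements (axiom 4). Walking along that direction must then revisit an element after at most as many steps as there are elements.

```python
def first_repeat(successor, start):
    """Follow start -> successor[start] -> ... until an element recurs.

    `successor` is a hypothetical map giving, for one fixed direction of
    travel, the unique next element after each element (toy stand-in for
    axiom 1). Returns the elements visited before the first repeat and
    the index in that list at which the repeated element first appeared.
    """
    seen = {}          # element -> position at which it was first visited
    path = []
    current = start
    while current not in seen:
        seen[current] = len(path)
        path.append(current)
        current = successor[current]
    return path, seen[current]


# Toy example: four elements visited cyclically along one direction.
successor = {1: 2, 2: 3, 3: 4, 4: 1}
path, loop_start = first_repeat(successor, start=1)
period = len(path) - loop_start  # repeat length of the sequence
```

      Because the set of elements is finite, the loop always terminates, and `len(path)` can never exceed the number of elements; the repeat length in this toy case is 4.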

      Regarding the question how an animal’s position is extracted for similar sequences in different positions, we agree with this reviewer that this is an important question when investigating the contributions of grid cells to the coding of space. However, since a code for trajectories is independent of spatial location, the question of how to extract an animal’s position from a trajectory code is irrelevant for this study.

      While a trajectory code by neural sequences begets grid cells, a spatial code by neural sequences does not. Nevertheless, grid cells could contribute to the coding of space (in addition to providing a trajectory code). However, while experimental evidence from studies with rodents and human subjects and theoretical work demonstrated the importance of grid cells for path integration (Fuhs and Touretzky, 2006; McNaughton et al., 2006; Moser et al., 2017), experimental studies have shown that grid cells contribute little to the coding of space by place cells (Hales et al., 2014). Yet, theoretical work (Mathis et al., 2012) showed that coherent activity of grid cells across different modules can provide a code for spatial location that is more accurate than spatial coding by place cells in the hippocampus. Importantly, such a spatial code by coherent activity across grid cell modules does not require location-dependent differences in neural sequences.

      The authors also strengthen their model with the requirement that grid cells should code for infinite space. However, the grid pattern anchors to borders and might be used to code navigated areas locally. Finally, referencing ref. 14, the authors claim that no existing theory for the emergence of grid cell firing that unifies the experimental observations on periodic firing patterns and their distortions under a single framework. However, that same reference presents exactly that - a mathematical model of pairwise interactions that unifies experimental observations. The authors should clarify this point. 

      We thank this reviewer for this valuable feedback. We agree that grid cells anchor to borders and may be used to code navigated areas locally. In fact, the trajectory code performs a local function, namely path integration, and the global grid pattern can only emerge from performing this local computation if the activity of at least one grid unit or element (we changed the wording from unit to element based on feedback from reviewer #3) is anchored to either a spatial location or a border. Yet, the trajectory code itself does not require anchoring to a reference frame to perform local path integration. Because of the local nature of the trajectory code, path integration can be performed locally without the emergence of a global grid pattern. This has been shown experimentally in mice performing a path integration task where changes in the location of a task-relevant object resulted in translations of grid patterns in single trials. Although no global grid pattern was observed, grid cells performed path integration locally within the multiple reference frames defined by the task-relevant object, and grid patterns were visible when the changes in the reference frames were accounted for in computing the rate maps (Peng et al., 2023). The data by Peng et al. (2023) confirm that the anchoring of the grid pattern to borders and the emergence of the global pattern are not required for local coding of trajectories. The global pattern emerges only when the reference frame does not change. However, this global pattern itself might not serve any function. According to the trajectory code model, the beguiling grid pattern is merely a byproduct of a local path integration function that is independent of the animal’s current location (which makes the code generalizable across space).
The reviewer is correct that, if the reference frame used to anchor the grid pattern did not change in infinite space, the trajectory code model of grid cell firing would predict an infinite global pattern. But does the proof implicitly assume that space is infinite? The trajectory code model makes the quantitative prediction that the field size increases linearly with an increase in grid spacing (the distance between two fields). If the field size remains fixed, periodicity will emerge in finite spaces that are larger than the grid spacing. We have clarified these points in the revised manuscript:

      “Notably, the trajectory code itself does not require anchoring to a reference frame to perform local path integration. Because of the local nature of the trajectory code, path integration can be performed locally without the emergence of a global grid pattern. This has been shown experimentally in mice performing a path integration task where changes in the location of a task-relevant object resulted in translations of grid patterns in single trials (Peng et al., 2023). Although no global grid pattern was observed because the reference frame was not fixed in space, grid cells performed path integration locally within the reference frame defined by the moving task-relevant object, and grid patterns were visible when the changes in the reference frames were accounted for in computing the rate maps.”

      Regarding how the emergence of grid cells from a trajectory code relates to the theory of a local code by grid cells brought forward by Ginosar et al. (ref. 14), we argue that the local computational function suggested by Ginosar et al. is to provide a code for trajectories. The perspective article by Ginosar et al. provides an excellent review of the experimental data on grid cells that point to grid cells performing a local function (see also Kate Jeffery’s excellent review article (Jeffery, 2024) on the mosaic structure of the mammalian cognitive map.) Assuming the existence of grid cells a priori, Ginosar et al. then propose three possible functions of grid cells, all of which are consistent with the trajectory code model of grid cell firing. Yet, the perspective article remains agnostic, in our opinion, on the exact nature of the local computation that is carried out by grid cells. But without knowing the local computation underlying grid cell function, a unifying theory explaining the emergence of grid cells cannot be considered complete. In contrast, our manuscript identifies the local computational function as a trajectory code by cell sequences. We have clarified these points in the revised manuscript:

      “The influential hypothesis that grid cells provide a universal map for space is challenged by experimental data suggesting a yet to be identified local computational function of grid cells (Ginosar et al., 2023; Jeffery, 2024). Here, we identify this local computational function as a trajectory code.”

      The mathematical model of pairwise interactions described by Ginosar et al. is fundamentally different from the mathematical framework developed in our manuscript. The mathematical model by Ginosar et al. describes how pairwise interactions between already existent grid fields can explain distortions in the grid pattern caused by the environment’s geometry, reward zones, and dimensionality. However, the model does not explain why there is a grid pattern in the first place. In contrast, our trajectory model provides an explanation for why grid cells may exist by demonstrating that a grid pattern emerges from a trajectory code by cell sequences. We stand by our assessment that a unifying theory of grid cells is not complete if it takes the existence of the grid pattern for granted.

      Reviewer #2 (Public Review): 

      Summary: 

      In this work, the authors consider why grid cells might exhibit hexagonal symmetry - i.e., for what behavioral function might this hexagonal pattern be uniquely suited? The authors propose that this function is the encoding of spatial trajectories in 2D space. To support their argument, the authors first introduce a set of definitions and axioms, which then lead to their conclusion that a hexagonal pattern is the most efficient or parsimonious pattern one could use to uniquely label different 2D trajectories using sequences of cells. The authors then go through a set of classic experimental results in the grid cell literature - e.g. that the grid modules exhibit a multiplicative scaling, that the grid pattern expands with novelty or is warped by reward, etc. - and describe how these results are either consistent with or predicted by their theory. Overall, this paper asks a very interesting question and provides an intriguing answer. However, the theory appears to be extremely flexible and very similar to ideas that have been previously proposed regarding grid cell function. 

      We thank this reviewer for carefully reading the manuscript and their valuable feedback which helps us clarify major points of the study. One major clarification is that the theoretical/axiomatic framework we put forward does not assume grid cells a priori. In contrast, we start by hypothesizing a computational function that a brain region shown to be important for path integration likely needs to solve, namely coding for spatial trajectories. We go on to show that this computational function begets spatially periodic firing (grid maps). By doing so, we provide mathematical proof that grid maps emerge from solving a local computational function, namely spatial coding of trajectories. Showing the emergence of grid maps from solving a local computational function is fundamentally different from many previous studies on grid cell function, which assign potential functions to the existing grid pattern. As we discuss in the manuscript, our work is similar to using normative models of grid cell function. However, in contrast to normative models, we provide a rigorous and interpretable mathematical framework which provides geometric reasoning to its case.

      Major strengths: 

      The general idea behind the paper is very interesting - why *does* the grid pattern take the form of a hexagonal grid? This is a question that has been raised many times; finding a truly satisfying answer is difficult but of great interest to many in the field. The authors' main assertion that the answer to this question has to do with the ability of a hexagonal arrangement of neurons to uniquely encode 2D trajectories is an intriguing suggestion. It is also impressive that the authors considered such a wide range of experimental results in relation to their theory.  

      We thank this reviewer for pointing out the significance of the question addressed by our manuscript.

      Major weaknesses: 

      One major weakness I perceive is that the paper overstates what it delivers, to an extent that I think it can be a bit confusing to determine what the contributions of the paper are. In the introduction, the authors claim to provide "mathematical proof that ... the nature of the problem being solved by grid cells is coding of trajectories in 2-D space using cell sequences. By doing so, we offer a specific answer to the question of why grid cell firing patterns are observed in the mammalian brain." This paper does not provide proof of what grid cells are doing to support behavior or provide the true answer as to why grid patterns are found in the brain. The authors offer some intriguing suggestions or proposals as to why this might be based on what hexagonal patterns could be good for, but I believe that the language should be clarified to be more in line with what the authors present and what the strength of their evidence is. 

      We thank this reviewer for this assessment. While there is ample experimental evidence demonstrating the importance of grid cells for path integration, we agree with this reviewer that there may be other computational functions that may require or largely benefit from the existence of grid cells. We now acknowledge the fact that we have provided a likely teleological cause for the emergence of grid cells and that there might be other causes for the emergence of grid cells. We have changed the wording in the abstract and discussion sections to acknowledge that our study does provide a likely teleological cause. We choose “likely” because the computational function – trajectory coding – from which grid maps emerge is very closely associated to path integration, which numerous experimental and theoretical studies associate with grid cell function.

      Relatedly, the authors claim that they find a teleological reason for the existence of grid cells - that is, discover the function that they are used for. However, in the paper, they seem to instead assume a function based on what is known and generally predicted for grid cells (encode position), and then show that for this specific function, grid cells have several attractive properties. 

      We agree with this reviewer that we leveraged what is known about grid cells, in particular their importance for path integration, in finding a likely teleological cause. However, the major significance of our work is that we demonstrate that coding for spatial trajectories requires spatially periodic firing (grid cells). This is very different from assuming the existence of grid cells a priori and then showing that grid cells have attractive, if not optimal, properties for this function. If we had shown that grid cells optimized a code for trajectories, this reviewer would be correct: we would have suggested just another potential function of grid cells. Instead, we provide both proof and intuition that trajectory coding by cell sequences begets grid cells (not the other way around), thereby providing a likely teleological cause for the emergence of grid cells. As stated above, we clarified in the revised manuscript that we provide a likely teleological cause which requires additional experimental verification.

      There is also some other work that seems very relevant, as it discusses specific computational advantages of a grid cell code but was not cited here: https://www.nature.com/articles/nn.2901

      We thank this reviewer for pointing us toward this article by Sreenivasan and Fiete (2011). The revised manuscript now cites this article in the Introduction and Discussion sections. We agree that the article by Sreenivasan and Fiete (2011) discusses a specific computational advantage of a population code by grid cells, namely unprecedented robustness to noise in estimating the location from the spiking information of noisy neurons. However, the work by Sreenivasan and Fiete (2011) differs from our work in that the authors assume the existence of grid cells a priori.

      In addition, we now discuss other relevant work, namely work on the conformal isometry hypothesis by Schøyen et al. (2024) and Xu et al. (2024), published as pre-prints after publication of the first version of our manuscript, as well as work on transition scale-spaces by Nicolai Waniek. Xu et al. (2024) and Schøyen et al. (2024) investigate conformal isometry in the coding of space by grid cells. Conformal isometry means that trajectories in neural space map trajectories in physical space. Xu et al. (2024) show that the conformal isometry hypothesis can explain the spatially periodic firing pattern of grid cells. Schøyen et al. (2024) further show that a module of seven grid cells emerges if space is encoded as a conformal isometry, ensuring equal representation in all directions. While the work by Xu et al. (2024) and Schøyen et al. (2024) arrives at conclusions very similar to those stated in the current manuscript, the conformal isometry hypothesis provides only a partial answer to why grid cells exist because it doesn’t explain why conformal isometry is important or required. In contrast, a sequence code of trajectories provides an intuitive answer to why such a code is important for animal behavior. Furthermore, we included in the Discussion the work by Nicolai Waniek (2018, 2020), who demonstrated that the hexagonal arrangement of grid fields is optimal for coding transitions in space.

      The paragraph added to the Discussion reads as follows:

      “As part of the proof that a trajectory code by cell sequences begets spatially periodic firing fields, we proved that the centers of the firing fields must be arranged in a hexagonal lattice. This arrangement implies that the neural space is a conformally isometric embedding of physical space, so that local displacements in neural space are proportional to local displacements of an animal or agent in physical space, as illustrated in Figure 5. This property has recently been introduced in the grid cell literature as the conformal isometry hypothesis (Schøyen et al., 2024; Xu et al., 2024). Strikingly, Schøyen et al. (2024) arrive at similar if not identical conclusions regarding the geometric principles in the neural representations of space by grid cells.”

      A second major weakness was that some of the claims in the section in which they compared their theory to data seemed either confusing or a bit weak. I am not a mathematician, so I was not able to follow all of the logic of the various axioms, remarks, or definitions to understand how the authors got to their final conclusion, so perhaps that is part of the problem. But below I list some specific examples where I could not follow why their theory predicted the experimental result, or how their theory ultimately operated any differently from the conventional understanding of grid cell coding. In some cases, it also seemed that the general idea was so flexible that it perhaps didn't hold much predictive power, as extra details seemed to be added as necessary to make the theory fit with the data. 

      I don't quite follow how, for at least some of their model predictions, the 'sequence code of trajectories' theory differs from the general attractor network theory. It seems from the introduction that these theories are meant to serve different purposes, but the section of the paper in which the authors claim that various experimental results are predicted by their theory makes this comparison difficult for me to understand. For example, in the section describing the effect of environmental manipulations in a familiar environment, the authors state that the experimental results make sense if one assumes that sequences are anchored to landmarks. But this sounds just like the classic attractor network interpretation of grid cell activity - that it's a spatial metric that becomes anchored to landmarks.

      We thank this reviewer for giving us the opportunity to clarify in what aspects the ‘sequence code of trajectories’ theory of grid cell firing differs from the classic attractor network models, in particular the continuous attractor network (CAN) model. First of all, the CAN model is a mechanistic model of grid cell firing that is specifically designed to simulate spatially periodic firing of grid cells in response to velocity inputs. In contrast, the sequence code of trajectories theory of grid cell firing resembles a normative model showing that grid cells emerge from performing a specific function. However, in contrast to previous normative models, the sequence code of trajectories model grounds the emergence of grid cell firing in a mathematical proof and both geometric reasoning and intuition. The proof demonstrates that the emergence of grid cells is the only solution to coding for trajectories using cell sequences. The sequence code of trajectories model of grid cell firing is agnostic about the neural mechanisms that implement the sequence code in a population of neurons. One plausible implementation of the sequence code of trajectories is a CAN. In fact, the sequence code of trajectories theory predicts conformal isometry in the CAN, i.e., a trajectory in neural space is proportional to a trajectory of an animal in physical space. However, other mechanistic implementations are possible. We have clarified how the sequence code of trajectories theory of grid cells relates to the mechanistic CAN models of grid cells.

      We added the following text to the Discussion section:

      “While the sequence code of trajectories model of grid cell firing is agnostic about the neural mechanisms that implement the sequence code, one plausible implementation is a continuous attractor network (McNaughton et al., 2006; Burak and Fiete, 2009). Interestingly, a sequence code of trajectories begets conformal isometry in the attractor network, i.e., a trajectory in neural space is proportional to a trajectory of an animal in physical space.”

      It was not clear to me why their theory predicted the field size/spacing ratio or the orientation of the grid pattern to the wall. 

      We thank this reviewer for bringing to our attention that we lacked a proper explanation for why the sequence code of trajectories theory predicts the field size/spacing ratio in grid maps. We have modified/added the following text to the Results section of the manuscript to clarify this point:

      “Because the sequence code of trajectories model of grid cell firing implies a dense packing of firing fields, the spacing between two adjacent grid fields must change linearly with a change in field size. It follows that the ratio between grid spacing and field size is fixed. When using the distance between the centers of two adjacent grid fields to measure grid spacing and a diameter-like metric to measure grid field size, we can compute the ratio of grid spacing to grid field size as √7≈2.65 (see Methods).”
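      One way to see where a ratio of √7 could come from is to label a hexagonal lattice with seven elements and measure the distance between the nearest fields that share a label. The sketch below is illustrative only: the axial coordinates, the labeling rule `(q + 3 * r) % 7`, and the assumption that the field diameter equals the distance between neighboring field centers (dense packing) are assumptions made for this example and are not taken from the Methods.

```python
import math

N_ELEMENTS = 7  # assumed number of elements in one module (the n = 7 case)


def label(q: int, r: int) -> int:
    """Hypothetical labeling of lattice site (q, r) in axial coordinates."""
    return (q + 3 * r) % N_ELEMENTS


def distance(q: int, r: int) -> float:
    """Euclidean length of the axial vector (q, r); nearest neighbors are at 1."""
    return math.sqrt(q * q + q * r + r * r)


def grid_spacing(search_radius: int = 5) -> float:
    """Shortest distance from the origin to another site carrying the same label."""
    candidates = [
        distance(q, r)
        for q in range(-search_radius, search_radius + 1)
        for r in range(-search_radius, search_radius + 1)
        if (q, r) != (0, 0) and label(q, r) == label(0, 0)
    ]
    return min(candidates)


# Assumed: fields of neighboring elements touch, so diameter = neighbor distance.
FIELD_DIAMETER = 1.0
ratio = grid_spacing() / FIELD_DIAMETER
print(f"spacing / field size = {ratio:.3f}")
```

      Under these assumptions the nearest same-label site lies at an axial offset such as (1, 2), whose squared length is 1 + 2 + 4 = 7, which reproduces the √7 ratio quoted above.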

      We are also grateful for this reviewer’s correctly pointing out that the explanation as to why the sequence code of trajectories predicts a rotation of the grid pattern relative to a set of parallel walls in a rectangular environment was unclear. We have now made explicit the underlying premise that a sequence of firing fields from multiple grid cells is aligned in parallel to a nearby wall of the environment. We cite additional experimental evidence supporting this premise. Concretely, we quote Stensola and Moser summarizing results reported in (Stensola et al., 2015): “A surprising observation, however, was that modules typically assumed one of only four distinct orientation configurations relative to the environment” (Stensola and Moser, 2016). Importantly, all of the four distinct orientations show the characteristic angular rotation. Intriguingly, this is predicted by the sequence code of trajectories model under the premise that a sequence of firing fields aligns with one of the geometric boundaries of the environment, as shown in Author response image 1 below.

      Author response image 1.

      Under the premise that a sequence of firing fields aligns with one of the geometric boundaries (walls) of a square arena, there are precisely four possible distinct configurations of orientations. This is precisely what has been observed in experiments (Stensola et al., 2015; Stensola and Moser, 2016).

      We added clarifying language to the Results section: “Under the premise that a sequence of firing fields aligns with one of the geometric boundaries of the environment, the sequence code model explains that the grid pattern typically assumes one of only four distinct orientation configurations relative to the environment (refs. 41, 46). Concretely, the four orientation configurations arise when one row of grid fields aligns with one of the two sets of parallel walls in a rectangular environment, and each arrangement can result in two distinct orientations (Figure 3B).”

      I don't understand how repeated advancement of one unit to the next, as shown in Figure 4E, would cause the change in grid spacing near a reward. 

      In familiar environments, spatial firing fields of place cells in hippocampal CA1 and CA3 tend to shift backwards with experience (Mehta et al., 2000; Lee et al., 2004; Roth et al., 2012; Geiller et al., 2017; Dong et al., 2021). This implies that the centers of place fields move closer to each other. A potential mechanism has been suggested, namely NMDA receptor-dependent long-term synaptic plasticity (Ekstrom et al., 2001). When we apply the same principle observed for place fields on a linear track to grid fields anchored to a reward zone, grid fields will “gravitate” towards the reward side. A similar idea has been presented by Ginosar et al. (2023), who use the analogy of reward locations as “black holes”. In contrast to Ginosar et al. (2023), who we cite multiple times, our idea unifies observations on place cells and grid cells in 1-D and 2-D environments and suggests a potential mechanism. We changed the wording in the revised manuscript and clarified the underlying premises.

      I don't follow how this theory predicts the finding that the grid pattern expands with novelty. The authors propose that this occurs because the animals are not paying attention to fine spatial details, and thus only need a low-resolution spatial map that eventually turns into a higher-resolution one. But it's not clear to me why one needs to invoke the sequence coding hypothesis to make this point. 

      We agree with this reviewer that this point needs clarification. The sequence code model adds explanatory power to the hypothesis that the grid pattern in a novel environment reflects a low-resolution mapping of space or spatial trajectories because it directly links spatial resolution to both field size and spacing of a grid map. Concretely, the spatial resolution of the trajectory code is equivalent to the spacing between two adjacent spatial fields, and the spatial resolution is directly proportional to the grid spacing and field size. If one did not invoke the sequence coding hypothesis, one would need to explain how and why both spacing and field size are related to the spatial resolution of the grid map. Lastly, as written in the manuscript text, we point out that, while the experimentally observed expansion of grid maps is consistent with the sequence code of trajectories, it is not predicted by the theory without making further assumptions.

      The last section, which describes that the grid spacing of different modules is scaled by the square root of 2, says that this is predicted if the resolution is doubled or halved. I am not sure if this is specifically a prediction of the sequence coding theory the authors put forth though since it's unclear why the resolution should be doubled or halved across modules (as opposed to changed by another factor). 

      We agree with reviewer #2 that the exact value of the scaling factor is not predicted by the sequence coding theory. E.g., the sequence code theory does not explain why the spatial resolution doesn’t change by a factor 3 or 1.5 (resulting in changes in grid spacing by square root of 3 or square root of 1.5, respectively). We have changed the wording to reflect this important point. We further clarified in the revised manuscript that future work on multiscale representations using modules of grid cells needs to show why changing the spatial resolution across modules by a factor of 2 is optimal. Interestingly, a scale ratio of 2 is commonly used in computer vision, specifically in the context of mipmapping and Gaussian pyramids, to render images across different scales. Literature in the computer vision field describes why a scaling factor of 2 and the use of Gaussian filter kernels (compare with Gaussian firing fields) is useful in allowing a smooth and balanced transition between successive levels of an image pyramid (Burt and Adelson, 1983; Lindeberg, 2008). Briefly, larger factors (like 3) could result in excessive loss of detail between levels, while smaller factors (like 1.5) would not reduce the image size enough to justify additional levels of computation (that would come with the structural cost of having more grid cell modules in the brain). We have clarified these points in the Discussion section.
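      The relation between a change in spatial resolution and the resulting change in grid spacing used in this reply can be written out as a short worked equation. This is a sketch under one assumption that is not spelled out above: "spatial resolution" is read as the number of fields per unit area, denoted here by the hypothetical symbol $\rho$, with $s$ the grid spacing and $k$ the factor by which the resolution changes between modules.

```latex
\rho \propto \frac{1}{s^{2}}
\quad\Longrightarrow\quad
\frac{s_{\text{coarse}}}{s_{\text{fine}}}
  = \sqrt{\frac{\rho_{\text{fine}}}{\rho_{\text{coarse}}}}
  = \sqrt{k},
\qquad
k = 2:\ \sqrt{2} \approx 1.41,\quad
k = 3:\ \sqrt{3} \approx 1.73,\quad
k = 1.5:\ \sqrt{1.5} \approx 1.22 .
```

      Read this way, the observed module ratio of about 1.4 corresponds to $k = 2$, while the theory by itself leaves $k$ free.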

      Reviewer #3 (Public Review): 

      The manuscript presents an intriguing explanation for why grid cell firing fields do not lie on a lattice whose axes are aligned to the walls of a square arena. This observation, by itself, merits the manuscript's dissemination to eLife's audience.

      We thank this reviewer for their positive assessment.

      The presentation is quirky (but keep the quirkiness!). 

      We kept the quirkiness.

      But let me recast the problem presented by the authors as one of combinatorics. Given repeating, spatially separated firing fields across cells, one obtains temporal sequences of grid cells firing. Label these cells by integers from $[n]$. Any two cells firing in succession should uniquely identify one of six directions (from the hexagonal lattice) in which the agent is currently moving. 
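      The requirement that two successively firing cells identify one of six directions can be illustrated with a small sketch for seven cells. The labeling rule `(q + 3 * r) % 7`, the axial coordinate convention, and all function names below are assumptions chosen for illustration; they are not taken from the manuscript or from the review.

```python
N_ELEMENTS = 7  # assumed number of cells, labeled 0..6 here

# The six nearest-neighbor steps on a hexagonal lattice in axial coordinates (q, r).
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def label(q, r):
    """Hypothetical labeling of the lattice site (q, r)."""
    return (q + 3 * r) % N_ELEMENTS


def direction_from_pair(first, second):
    """Decode the movement step from two labels that fired in succession."""
    diff = (second - first) % N_ELEMENTS
    matches = [d for d in DIRECTIONS if label(*d) == diff]
    if len(matches) != 1:
        raise ValueError("pair does not identify a unique direction")
    return matches[0]


# Each of the six steps changes the label by a different nonzero amount.
step_diffs = {d: label(*d) for d in DIRECTIONS}

# Decoding check from an arbitrary starting site.
start = (2, 5)
decoded = []
for d in DIRECTIONS:
    nxt = (start[0] + d[0], start[1] + d[1])
    decoded.append(direction_from_pair(label(*start), label(*nxt)))
```

      With this labeling the six steps map to the six nonzero residues modulo 7, so every ordered pair of distinct labels corresponds to exactly one direction.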

      Now, take the symmetric group $\Sigma$ of cyclic permutations on $n$ elements.  We ask whether there are cyclic permutations of $[n]$ such that 

      $$\left(\pi_{i+1} - \pi_i \right) \bmod n \neq \pm 1 \bmod n, \; \forall i.$$

      So, for instance, $(4,2,3,1)$ would not be counted as a valid permutation of $(1,2,3,4)$, as $(2,3)$ and $(1,4)$ are adjacent. 
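      The adjacency condition above can be checked mechanically. The following is a minimal sketch, assuming the sequence is read cyclically (the last entry is followed by the first) and that labels are the integers 1 to n; the function name is hypothetical.

```python
def has_adjacent_successors(cycle):
    """Return True if any two successive entries differ by +1 or -1 modulo n."""
    n = len(cycle)
    for i in range(n):
        # Compare each entry with its successor, wrapping around at the end.
        diff = (cycle[(i + 1) % n] - cycle[i]) % n
        if diff in (1, n - 1):
            return True
    return False


# The example from the text: (2, 3) and the wrap-around pair (1, 4) are adjacent.
rejected = has_adjacent_successors((4, 2, 3, 1))

# A sequence for n = 5 in which every successive difference is 2 modulo 5.
accepted = not has_adjacent_successors((1, 3, 5, 2, 4))
```

      Enumerating all cyclic orderings with such a check gives a brute-force count for small n, which could serve as a reference for any closed-form expression.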

      Furthermore, given $[n]$, are there two distinct cyclic permutations such that {\em no} adjacencies are preserved when considering any pair of permutations (among the triple of the original ordered sequence and the two permutations)? In other words, if we consider the permutation required to take the first permutation into the second, that permutation should not preserve any adjacencies. 

      {\bf Key question}: is there any difference between the solution to the combinatorics problem sketched above and the result in the manuscript? Specifically, the text argues that for $n=7$ there is only {\em one} solution. 

      Ideally, one would strive to obtain a closed-form solution for the number of such permutations as a function of $n$.  

      This is a great question! We currently have a student working on describing all possible arrangements of firing fields (essentially labelings of the hexagonal lattice) that satisfy the axioms in 2D, and we expect that results on the number of such arrangements will come out of his work. We plan to publish those results separately, possibly targeting a more mathematical audience.   

      The argument above appears to only apply in the case that every row (and every diagonal) contains all of the elements 1,...,n. However, when n is not prime, there are often arrangements where rows and/or diagonals do not contain every element from 1,...,n. For example, some admissible patterns with 9 neurons have a repeat length of 3 in all directions (horizontally and both diagonals). As a result the construction listed here will not give a full count of all possible arrangements. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      I think the concise style of mathematical proof is both a curse and a blessing. While it delivers the message, I think the fluency and readability of the mathematical proof could be improved with longer paragraphs and some more editing. 

      We have added some clarifications in the text that we hope improve the readability.

      Reviewer #3 (Recommendations For The Authors): 

      A minor qualm I have with the nomenclature: 

      On page 7: 

      “To prove this statement, suppose that row A consists of units $1, \dots , k$ repeating in this order. Then any row that contains any unit from $1, \dots, k$ must contain the full repeat $1, \dots , k$ by axiom 1. So any row containing any unit from $1,\dots , k$ is a translation of row A, and any unit that does not contain them is disjoint from row A.”

      The last use of `unit' at the end of this paragraph instead of `row' is confusing. Technically, the authors have given themselves license to use this term by defining a unit to be “either to a single cell or a cell assembly”. Yet modern algebra tends to use `unit' as meaning a ring element that has an inverse.  

      We have renamed “unit” to “element” to avoid confusion with the terminology in modern algebra.

    1. Author response:

      Joint Public Review:

      Summary:

      In this study, Daniel et al. used three cognitive tasks to investigate behavioral signatures of cerebellar degeneration. In the first two tasks, the authors found that if an equation was incorrect, reaction times slowed significantly more for cerebellar patients than for healthy controls. In comparison, the slowing in the reaction times when the task required more operations was comparable to normal controls. In the third task, the authors show increased errors in cerebellar patients when they had to judge whether a letter string corresponded to an artificial grammar.

      Strengths:

      Overall, the work is methodologically sound and the manuscript well written. The data do show some evidence for specific cognitive deficits in cerebellar degeneration patients.

      Thank you for the thoughtful summary and constructive feedback. We are pleased that the methodological rigor and clarity of the manuscript were appreciated, and that the data were recognized as providing meaningful evidence regarding cognitive deficits in cerebellar degeneration.

      Weaknesses:

      The current version has some weaknesses in the visual presentation of results. Overall, the study lacks a more precise discussion on how the patterns of deficits relate to the hypothesized cerebellar function. The reviewers and the editor agreed that the data are interesting and point to a specific cognitive deficit in cerebellar patients. However, in the discussion, we were somewhat confused about the interpretation of the result: If the cerebellum (as proposed in the introduction) is involved in forming expectations in a cognitive task, should they not show problems both in the expected (1+3 =4) and unexpected (1+3=2) conditions? Without having formed the correct expectation, how can you correctly say "yes" in the expected condition? Yet no increase in error rate was observed, just slowing in the unexpected condition. If the patients make up for the lack of prediction by using some other strategy, why are they only slowing in the unexpected case? If the cerebellum is NOT involved in making the prediction, but only involved in detecting the mismatch between predicted and real outcome, why would the patients not show specifically more errors in the unexpected condition?

      Thank you for asking these important questions and initiating an interesting discussion. While decision errors and processing efficiency are not fully orthogonal and are likely related, they are not necessarily the same internal construct. The data from Experiments 1 and 2 suggest impaired processing efficiency rather than increased decision error. Reaction time slowing without increased error rates suggests that the CA group can form expectations but respond more slowly, possibly due to reduced processing efficiency. Thus, this analysis of our data can indicate that the cerebellum is not essential for forming expectations, but it plays a critical role in processing their violations.

      Relatedly, two important questions remain open in the literature concerning the cerebellum’s role in expectation-related processes. The first is whether the cerebellum contributes to the formation of expectations or the processing of their violations. In Experiments 1 and 2, the CA group did not show impairments in the complexity manipulation. As mentioned by the editors, solving these problems requires the formation of expectations during the reasoning process. Given the intact performance of the CA group, these results suggest that they are not impaired in forming expectations. However, in both Experiments 1 and 2, patients exhibited selective impairments in solving incorrect problems compared to correct problems. Since expectation formation is required in both conditions, but only incorrect problems involve a violation of expectation (VE), we hypothesize that the cerebellum is involved in VE processes. We suggest that the CA group can form expectations in familiar tasks, but are impaired in processing unexpected compared to expected outcomes. This supports the notion that the cerebellum contributes to VE, rather than to forming expectations.

      Importantly, while previous experimental manipulations(1–6) have provided important insights, some may have confounded these two internal constructs due to task design limitations (e.g., lack of baseline conditions). Notably, some of these previous studies did not include control conditions (e.g., correct trials) where there was no VE. In addition, other studies did not include a control measure (e.g., complexity effect), which limits their ability to infer the specific cerebellar role in expectation manipulation.

      In addition to the editors’ question, we would like to raise a second important question regarding cerebellar contributions to expectation-related processes. While our findings point to a unique and consistent cerebellar role in VE processes in sequential tasks, we do not aim to generalize this role to all forms of expectations(2,7,8). Another interesting question is how expectations are formed. Expectations can be formed by different processes(2,7,8), and this should be taken into account when defining cerebellar function. For instance, previous experimental paradigms(1–6), aiming to assess VE, utilized tasks that manipulated rule-based errors or probability-based errors, but did not fully dissociate these constructs. In our Experiments 1 and 2, we specifically manipulated error signals derived from previous top-down effects. However, in Experiment 3, the participant’s VE was derived from within-task processes. In Experiment 3, expectations were formed either by statistical learning or by rule-based learning. During the test stage, when evaluating sensitivity to correct and incorrect problems, the CA group showed deficits only when expectations were formed based on rules. These findings suggest that cerebellar patients may retain a general ability to form expectations. However, their deficit appears to be specific to processing rule-based VE, but not statistically derived VE. This pattern of results aligns with the results of Experiments 1 and 2 where the rules are known and based on pre-task knowledge.

      We suggest that these two key questions are relevant to both motor and non-motor domains and were not fully addressed even in the previous, well-studied motor domain. Thus, the current experimental design used in three different experiments provides a valuable novel experimental perspective, allowing us to distinguish between some, but not all, of the processes involved in the formation of expectations and their violations. For instance, to our knowledge, this is the first study to demonstrate a selective impairment in rule-based VE processing in cerebellar patients across both numerical reasoning and artificial grammar tasks.

      If feasible, we propose that future studies should disentangle different forms of VE by operationalizing them in experimental tasks in an orthogonal manner. This will allow us, as a scientific community, to achieve a more detailed, well-defined cerebellar motor and non-motor mechanistic account.

      References

      (1) Butcher, P. A. et al. The cerebellum does more than sensory prediction error-based learning in sensorimotor adaptation tasks. J. Neurophysiol. 118, 1622–1636 (2017).

      (2) Moberget, T., Gullesen, E. H., Andersson, S., Ivry, R. B. & Endestad, T. Generalized role for the cerebellum in encoding internal models: Evidence from semantic processing. J. Neurosci. 34, 2871–2878 (2014).

      (3) Riva, D. The cerebellar contribution to language and sequential functions: evidence from a child with cerebellitis. Cortex. 34, 279–287 (1998).

      (4) Sokolov, A. A., Miall, R. C. & Ivry, R. B. The Cerebellum: Adaptive Prediction for Movement and Cognition. Trends Cogn. Sci. 21, 313–332 (2017).

      (5) Fiez, J. A., Petersen, S. E., Cheney, M. K. & Raichle, M. E. Impaired non-motor learning and error detection associated with cerebellar damage. A single case study. Brain 115 Pt 1, 155–178 (1992).

      (6) Taylor, J. A., Krakauer, J. W. & Ivry, R. B. Explicit and Implicit Contributions to Learning in a Sensorimotor Adaptation Task. J. Neurosci. 34, 3023–3032 (2014).

      (7) Sokolov, A. A., Miall, R. C. & Ivry, R. B. The Cerebellum: Adaptive Prediction for Movement and Cognition. Trends Cogn. Sci. 21, 313–332 (2017).

      (8) Fiez, J. A., Petersen, S. E., Cheney, M. K. & Raichle, M. E. Impaired non-motor learning and error detection associated with cerebellar damage. A single case study. Brain 115, 155–178 (1992).

      (9) Picciotto, Y. De, Algon, A. L., Amit, I., Vakil, E. & Saban, W. Large-scale evidence for the validity of remote MoCA administration among people with cerebellar ataxia. Clin. Neuropsychol. 0, 1–17 (2024).

      (10) Binoy, S., Montaser-Kouhsari, L., Ponger, P. & Saban, W. Remote Assessment of Cognition in Parkinson’s Disease and Cerebellar Ataxia: The MoCA Test in English and Hebrew. Front. Hum. Neurosci. 17 (2023).

      (11) Saban, W. & Ivry, R. B. Pont: A protocol for online neuropsychological testing. J. Cogn. Neurosci. 33, 2413–2425 (2021).

      (12) Algon, A. L. et al. Scale for the assessment and rating of ataxia: a live e-version. J. Neurol. (2025). doi:10.1007/s00415-025-13071-7

      (13) McDougle, S. D. et al. Continuous manipulation of mental representations is compromised in cerebellar degeneration. Brain 145, 4246–4263 (2022).

    1. Author response:

      eLife Assessment

      This important study uses an innovative task design combined with eye tracking and fMRI to distinguish brain regions that encode the value of individual items from those that accumulate those values for value-based choices. It shows that distinct brain regions carry signals for currently evaluated and previously accumulated evidence. The study provides solid evidence in support of most of its claims, albeit with current minor weaknesses concerning the evidence in favour of gaze-modulation of the fMRI signal. The work will be of interest to neuroscientists working on attention and decision-making.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We plan to undertake some additional analyses suggested by the Reviewers to bolster the evidence in favor of gaze-modulation of the fMRI signal.

      Reviewer #1 (Public review):

      Summary:

      This study builds upon a major theoretical account of value-based choice, the 'attentional drift diffusion model' (aDDM), and examines whether and how this might be implemented in the human brain using functional magnetic resonance imaging (fMRI). The aDDM states that the process of internal evidence accumulation across time should be weighted by the decision maker's gaze, with more weight being assigned to the currently fixated item. The present study aims to test whether there are (a) regions of the brain where signals related to the currently presented value are affected by the participant's gaze; (b) regions of the brain where previously accumulated information is weighted by gaze.
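
The gaze weighting described above can be made concrete with a minimal sketch. It assumes the commonly used two-option form of the aDDM, in which the value of the currently unattended option is discounted by a factor `theta`; the function name, parameter names, and default values are illustrative assumptions and are not taken from the study under review.

```python
import random


def addm_step(rdv, v_left, v_right, looking_left,
              drift=0.002, theta=0.3, noise_sd=0.02, rng=random):
    """Advance the relative decision value (RDV) by one time step.

    Positive RDV favours the left option. The unattended option's value is
    multiplied by theta (0 <= theta <= 1), so the fixated option receives
    more weight. All parameter values here are hypothetical.
    """
    if looking_left:
        mean_change = drift * (v_left - theta * v_right)
    else:
        mean_change = drift * (theta * v_left - v_right)
    # Gaussian noise is added on every step; noise_sd=0 gives the noiseless drift.
    return rdv + mean_change + rng.gauss(0.0, noise_sd)
```

With equal item values, the sign of the drift in this sketch depends only on which option is fixated, which is the signature the two proposed classes of brain region are tested against.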

      To examine this, the authors developed a novel paradigm that allowed them to dissociate currently and previously presented evidence, at a timescale amenable to measuring neural responses with fMRI. They asked participants to choose between bundles or 'lotteries' of food items, which they revealed sequentially and slowly to the participant across time. This allowed modelling of the haemodynamic response to each new observation in the lottery, separately for previously accumulated and currently presented evidence.

      Using this approach, they find that regions of the brain supporting valuation (vmPFC and ventral striatum) have responses reflecting gaze-weighted valuation of the currently presented item, whereas regions previously associated with evidence accumulation (preSMA and IPS) have responses reflecting gaze-weighted modulation of previously accumulated evidence.

      Strengths:

      A major strength of the current paper is the design of the task, nicely allowing the researchers to examine evidence accumulation across time despite using a technique with poor temporal resolution. The dissociation between currently presented and previously accumulated evidence in different brain regions in GLM1 (before gaze-weighting), as presented in Figure 5, is already compelling. The result that regions such as preSMA respond positively to |AV| (absolute difference in accumulated value) is particularly interesting, as it would seem that the 'decision conflict' account of this region's activity might predict the exact opposite result. Additionally, the behaviour has been well modelled at the end of the paper when examining temporal weighting functions across the multiple samples.

      Thank you!

      Weaknesses:

      The results relating to gaze-weighting in the fMRI signal could do with some further explication to become more complete. A major concern with GLM2, which looks at the same effects as GLM1 but now with gaze-weighting, is that these gaze-weighted regressors may be (at least partially) correlated with their non-gaze-weighted counterparts (e.g., SVgaze will correlate with SV). But the non-gaze-weighted regressors have been excluded from this model. In other words, the authors are not testing for effects of gaze-weighting of value signals *over and above* the base effects of value in this model. In my mind, this means that the GLM2 results could simply be a replication of the findings from GLM1 at present. GLM3 is potentially a stronger test, as it includes the value signals and the interaction with gaze in the same model. But here, while the link to the currently attended item is quite clear (and a replication of Lim et al, 2011), the link to previously accumulated evidence is a bit contorted, depending upon the interpretation of a behavioural regression to interpret the fMRI evidence. The results from GLM3 are also, by the authors' own admission, marginal in places.

      We thank the Reviewer for their thoughtful critique. We acknowledge that our formulation of GLM2 does not test for the effects of gaze-weighted value signals beyond the base effects of value, only in place of the base effects of value. In our revision, we plan to examine alternative ways of quantifying the relative importance of gaze in these results.  
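
The collinearity concern raised here can be illustrated with a small simulation. This is only a sketch under stated assumptions: the values and gaze weights are synthetic, the names `sv`, `gaze`, and `sv_gaze` are hypothetical, and residualizing one regressor against another is one common way to isolate unique variance, not necessarily the analysis the authors will adopt.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(target, base):
    """Remove from `target` the part linearly predicted by `base` (with intercept)."""
    mt, mb = mean(target), mean(base)
    slope = (sum((b - mb) * (t - mt) for b, t in zip(base, target))
             / sum((b - mb) ** 2 for b in base))
    return [(t - mt) - slope * (b - mb) for t, b in zip(target, base)]


rng = random.Random(0)
sv = [rng.uniform(1, 10) for _ in range(387)]    # synthetic sample values
gaze = [rng.uniform(0.3, 1.0) for _ in sv]       # synthetic gaze weights
sv_gaze = [v * g for v, g in zip(sv, gaze)]      # gaze-weighted regressor

r_raw = pearson(sv, sv_gaze)                     # strongly positive by construction
sv_gaze_unique = residualize(sv_gaze, sv)        # variance not shared with sv
r_resid = pearson(sv, sv_gaze_unique)            # zero up to rounding error
```

Entering `sv` together with `sv_gaze_unique` in one model would attribute shared variance to the base value regressor, so any remaining effect of the second regressor reflects gaze weighting over and above value.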

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors seek to disentangle brain areas that encode the subjective value of individual stimuli/items (input regions) from those that accumulate those values into decision variables (integrators) for value-based choice. The authors used a novel task in which stimulus presentation was slowed down to ensure that such a dissociation was possible using fMRI despite its relatively low temporal resolution. In addition, the authors leveraged the fact that gaze increases item value, providing a means of distinguishing brain regions that encode decision variables from those that encode other quantities such as conflict or time-on-task. The authors adopt a region-of-interest approach based on an extensive previous literature and found that the ventral striatum and vmPFC correlated with the item values and not their accumulation, whereas the pre-SMA, IPS, and dlPFC correlated more strongly with their accumulation. Further analysis revealed that the pre-SMA was the only one of the three integrator regions to also exhibit gaze modulation.

      Strengths:

      The study uses a highly innovative design and addresses an important and timely topic. The manuscript is well-written and engaging, while the data analysis appears highly rigorous.

      Weaknesses:

      With 23 subjects, the study has relatively low statistical power for fMRI.

      We thank the Reviewer for their comments on the strengths of the manuscript, and for highlighting an important limitation. We agree that the number of participants in the study, after exclusions, was lower than in a typical fMRI study. However, it is important to note that we do have a lot of data for each subject. Due to our relatively fast, event-related design, we have on average 65 trials per subject (SD = 18) and 5.95 samples per trial (SD = 4.03), for an average of 387 observations per subject (SD = 18). Our model-based analysis looks for very specific neural time courses across these ~387 observations, giving us substantial power to detect our effects of interest. Still, we acknowledge that our small number of subjects limits our power and our ability to generalize to other subjects. We plan to add the following disclaimer to the Discussion section:

      “Together with our limited sample size (n = 23), we may not have had adequate statistical power required to observe consistent effects. Additional research with larger sample sizes is needed to resolve this issue.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Summary

      The manuscript uses state-of-the-art analysis technology to document the spatio-temporal dynamics of brain activity during the processing of threats. The authors offer convincing evidence that complex spatio-temporal aspects of brain dynamics are essential to describe brain operations during threat processing.

      Strengths

      Rigorous complex analyses well suited to the data.

      Weaknesses

      Lack of a simple take-home message about discovery of a new brain operation.

      We have addressed the concern under response to item 1 in Recommendations for the authors of Reviewer 2 below.

      Reviewer 1 (Recommendations for the authors):

      The paper presents sophisticated analyses of how the spatiotemporal activity of the brain processes threats. While the study is elegant and relevant to the threat processing literature, it could be improved by better clarification of novelty, scope, assumptions and implications. Suggestions are reported below.

      (1) Introduction: It is difficult to understand what is unsatisfactory in the present literature and why we need this study. For example, lines 57-64 report what works well in the work of Anderson and Fincham but do not really describe what this approach lacks, either in failing to explain real data in conceptual terms.

      We have edited the corresponding lines to better describe what such approaches generally lack:

      Introduction; Lines 63-66: However, the mapping between brain signals and putative mental states (e.g., “encoding”) remained speculative. More generally, state-based modeling of fMRI data would benefit from evaluation in contexts where the experimental paradigm affords a clearer mapping between discovered states and experimental manipulation.

      (2) Also, based on the introduction it is unclear if the focus is on understanding the processing of threat or in the methodological development of experimental design and analysis paradigms for more ecologically valid situations.

      In our present work, we tried to focus on understanding dynamics of threat processing while also contributing to methodological development of analysis of dynamic/ecologically inspired experiments. To that end, we have added a new paragraph at the end of Introduction to clarify the principal focus of our work:

      Introduction; Lines 111-118: Is the present contribution focused on threat processing or methodological developments for the analysis of more continuous/ecologically valid paradigms? Our answer is “both”. One goal was to contribute to the development of a framework that considers brain processing to be inherently dynamic and multivariate. In particular, our goal was to provide the formal basis for conceptualizing threat processing as a dynamic process (see (Fanselow and Lester, 1987)) subject to endogenous and exogenous contributions. At the same time, our study revealed how regions studied individually in the past (e.g., anterior insula, cingulate cortex) contribute to brain states with multi-region dynamics.

      (3) The repeated statement, based on the Fiete paper, that most analyses or models of brain activity do not include an exogenous drive seems an overstatement. There is plenty of literature that not only includes exogenous drives but also studies and documents them in detail. There are many examples, but a prominent one is the study of auditory processing. Essentially all human brain areas related to hearing (not only the activity of individual areas but also their communication) are entrained by the exogenous drive of speech (e.g. J. Gross et al, PLoS Biology 11 e1001752, 2013).

      We have altered the original phrasing, which now reads as:

      Introduction; Lines 93-95: Importantly, we estimated both endogenous and exogenous components of the dynamics, whereas some past work has not modeled both contributions (see discussion in (Khona and Fiete, 2022)).

      Discussion; Lines 454-455: Work on dynamics of neural circuits in systems neuroscience at times assumes that the target circuit is driven only by endogenous processes (Khona and Fiete, 2022).

      (4) Attractor dynamics is used as a prominent descriptor of fMRI activity, yet the discussion of how this may emerge from the interaction between areas is limited. Is it related to the way attractors emerge from physical systems or neural networks (e.g. Hopfield?).

      This is an important question that we believe will benefit from computational and mathematical modeling, but we consider it beyond the scope of the present paper.

      (5) Fig 4 shows activity of 4 regions, not 2 as stated in lines 201-202. Correct?

      Fig. 4 shows activity of two regions and also the average activity of regions belonging to two resting-state networks engaged during threat processing (discussed shortly after lines 201-202). To clarify the above concern, we have changed the following line:

      Results; Lines 228-230: In Fig. 4, we probed the average signals from two resting-state networks engaged during threat-related processing, the salience network which is particularly engaged during higher threat, and the default network which is engaged during conditions of relative safety.

      (6) It would be useful to state more clearly how Fig 7B, C differs from Fig 2A, B (my understanding is that in the former they are isolating the stimulus-driven processes).

      We have clarified this by adding the following line in the Results:

      Results; Lines 290-292: Note that in Fig. 7B/C we evaluated exogenous contributions only for stimuli associated with each state/state transition reported in Fig. 2A/B (see also Methods).

      Reviewer 2 (Public Review):

      Summary

      This paper by Misra and Pessoa uses switching linear dynamical systems (SLDS) to investigate the neural network dynamics underlying threat processing at varying levels of proximity. Using an existing dataset from a threat-of-shock paradigm in which threat proximity is manipulated in a continuous fashion, the authors first show that they can identify states that each have their own linear dynamical system and are consistently associated with distinct phases of the threat-of-shock task (e.g., “peri-shock”, “not near”, etc). They then show how activity maps associated with these states are in agreement with existing literature on neural mechanisms of threat processing, and how activity in underlying brain regions alters around state transitions. The central novelty of the paper lies in its analyses of how intrinsic and extrinsic factors contribute to within-state trajectories and between-state transitions. A final set of analyses shows how the findings generalize to another (related) threat paradigm.

      Strengths

      The analyses for this study are conducted at a very high level of mathematical and theoretical sophistication. The paper is very well written and effectively communicates complex concepts from dynamical systems. I am enthusiastic about this paper, but I think the authors have not yet exploited the full potential of their analyses in making this work meaningful toward increasing our neuroscientific understanding of threat processing, as explained below.

      Weaknesses

      (1) I appreciate the sophistication of the analyses applied and/or developed by the authors. These methods have many potential use cases for investigating the network dynamics underlying various cognitive and affective processes. However, I am somewhat disappointed by the level of inferences made by the authors based on these analyses at the level of systems neuroscience. As an illustration consider the following citations from the abstract: “The results revealed that threat processing benefits from being viewed in terms of dynamic multivariate patterns whose trajectories are a combination of intrinsic and extrinsic factors that jointly determine how the brain temporally evolves during dynamic threat” and “We propose that viewing threat processing through the lens of dynamical systems offers important avenues to uncover properties of the dynamics of threat that are not unveiled with standard experimental designs and analyses”. I can agree to the claim that we may be able to better describe the intrinsic and extrinsic dynamics of threat processing using this method, but what is now the contribution that this makes toward understanding these processes?

      We have addressed the concern under response to item 1 in Recommendations for the authors below.

      (2) How sure can we be that it is possible to separate extrinsically and intrinsically driven dynamics?

      We have addressed the concern under response to item 2 in Recommendations for the authors below.

      Reviewer 2 (Recommendations for the authors):

      (1) To address the first point under weaknesses above: I would challenge the authors to make their results more biologically/neuroscientifically meaningful, in particular in the sections (in results and/or discussion) on how intrinsic and extrinsic factors contribute to within-state trajectories and between-state transitions, and make those explicit in both the abstract and the discussion (what exactly are the properties of the dynamics of threat that are uncovered?). The authors may also argue that the current approach lays the groundwork for such efforts, but does not currently provide such insights. If they would take this position, that should be made explicit throughout (which would make it more of a methodological paper).

      The SLDS approach provides, we believe, a powerful framework to describe system-level dynamics (of threat processing in the present case). A complementary type of information can be obtained by studying the contribution of individual components (brain regions) within the larger system (brain), an approach that helps connect our approach to studies that typically focus on the contributions of individual regions, and contributes to providing more neurobiological interpretability to the results. Accordingly, we developed a new measure of region importance that captured the extent to which individual brain regions contributed to driving system dynamics during a given state.

      Abstract; Lines 22-25: Furthermore, we developed a measure of region importance that quantifies the contributions of an individual brain region to system dynamics, which complements the system-level characterization that is obtained with the state-space SLDS formalism.

      Introduction; Lines 95-99: A considerable challenge in state-based modeling, including SLDS, is linking estimated states and dynamics to interpretable processes. Here, we developed a measure of region importance that provides a biologically meaningful way to bridge this gap, as it quantifies how individual brain regions contribute to steering state trajectories.

      Results; Lines 302-321: Region importance and steering of dynamics: Based on time series data and input information, the SLDS approach identifies a set of states and their dynamics. While these states are determined in the latent space, they can be readily mapped back to the brain, allowing for the characterization of spatiotemporal properties across the entire brain. Since not all regions contribute equally to state properties, we propose that a region’s impact on state dynamics serves as a measure of its importance.

      We illustrate the concept for STATE 5 (“near miss”) in Fig. 8 (see Fig. S17 for all states). Fig. 8A shows importance in the top row and activity below as a function of time from state entry. The dynamics of importance and activity can be further visualized (Fig. 8B), where some regions of particularly high importance are illustrated together with the ventromedial PFC, a region that is typically not engaged during high-threat conditions. Notably, the importance of the dorsal anterior insula increased quickly in the first time points, and later decreased. In contrast, the importance of the periaqueductal gray was relatively high from the beginning of the state and decreased moderately later.

      Fig. 8C depicts the correlation between these measures as a function of time. For all but STATE 1, the correlation increased over time. Interestingly, for STATES 4-5, the correlation was low at the first and second time points of the state (and for STATE 2 at the first time point), and for STATE 3 the measures were actually anticorrelated; both cases indicate a dissociation between activity and importance. In summary, our results illustrate that univariate region activity can differ from multivariate importance, providing a fruitful path to understand how individual brain regions contribute to collective dynamic properties.

      Discussion; Lines 466-487: In the Introduction, we motivated our study in terms of determining multivariate and distributed patterns of activity with shared dynamics. At one end of the spectrum, it is possible to conceptualize the whole brain as dynamically evolving during a state; at the other end, we could focus on just a few “key” regions, or possibly a single one (at which point the description would be univariate). Here, we addressed this gap by studying the importance of regions to state dynamics: To what extent does a region steer the trajectory of the system? From a mathematical standpoint, our proposed measure is not merely a function of activity of a region but also of the coefficients of the dynamics matrix capturing its effect on across-region dynamics (Eichler, 2005; Smith et al., 2010).

      How distributed should the dynamics of threat be considered? One answer to this question is to consider the distribution of importance values for all states. For STATE 1 (“post shock”), a few regions displayed the highest importance values for a few time points. However, for the other states the distribution of importance values tended to be more uniform at each time point. Thus, based on our proposed importance measure, we conclude that threat-related processing is profitably viewed as substantially distributed. Furthermore, we found that while activity and importance were relatively correlated, they could also diverge substantially. Together, we believe that the proposed importance measure provides a valuable tool for understanding the rich dynamics of threat processing. For example, we discovered that the dorsal anterior insula is important not only during high-anxiety states (such as STATE 5; “near miss”) but also, surprisingly, for a state that followed the aversive shock event (STATE 1; “post shock”). Additionally, we noted that the posterior cingulate cortex, widely known to play a central role in the default mode network, had the highest importance among all regions in driving the dynamics of low-anxiety states (such as STATE 3 and STATE 4; “not near”).

      Methods; Lines 840-866: Region importance: We performed a “lesion study”, where we quantified how brain regions contribute to state dynamics by eliminating (zeroing) model parameters corresponding to a given region, and observing the resulting changes in system dynamics. According to our approach, the most important regions are those that cause the greatest change in system dynamics when eliminated.

      The SLDS model represents dynamics in a low-dimensional latent space, and model parameters are not readily available at the level of individual regions. Thus, the first step was to project the dynamics equation onto the brain data prior to computing importance values. Specifically, the linear dynamics equation in the latent space (Eq. 2) was mapped to the original data space of N = 85 ROIs using the emissions model (Eq. 1):

      where C<sup>†</sup> represents the Moore-Penrose pseudoinverse of C, and the remaining symbols denote the corresponding dynamics matrix, input matrix, and bias term in the original data space.

      Based on the above, we defined the importance of the i<sup>th</sup> ROI at time t by quantifying the impact of “lesioning” the i<sup>th</sup> ROI, i.e., by setting to 0 the i<sup>th</sup> column of the data-space dynamics matrix, the i<sup>th</sup> row of the data-space input matrix, and the i<sup>th</sup> element of the data-space bias term. Formally, the importance of the i<sup>th</sup> ROI was defined as:

      where ‘∗’ indicates element-wise multiplication of a scalar with a vector. The quantities entering the expression are: the activity of the i<sup>th</sup> ROI; the i<sup>th</sup> column of the data-space dynamics matrix; the inner product between the i<sup>th</sup> row of the data-space input matrix and the input; the i<sup>th</sup> element of the data-space bias term; and an indicator vector corresponding to the i<sup>th</sup> ROI. Note that the product of the i<sup>th</sup> ROI’s activity and its column of the dynamics matrix is a function of both the ROI’s activity and the coefficients of the dynamics matrix capturing the effect of region i on the one-step dynamics of the entire system (Eichler, 2005; Smith et al., 2010); the remaining terms capture the effect of the external inputs and the bias term on the one-step dynamics of the i<sup>th</sup> ROI.

      After computing importance for a given run, the resultant importance time series was normalized to zero mean and unit variance.
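
The lesion-based importance described above can be sketched as follows. The code is one possible reading of the verbal definition, not the authors' implementation: the names `A_hat`, `B_hat`, and `b_hat` for the data-space dynamics matrix, input matrix, and bias are assumptions, as are the use of a Euclidean norm to summarize the change and the choice of which time step's activity enters the one-step prediction.

```python
import math
from statistics import mean, pstdev


def one_step(A_hat, B_hat, b_hat, y, u):
    """One-step data-space prediction: A_hat @ y + B_hat @ u + b_hat."""
    return [sum(a * v for a, v in zip(A_hat[r], y))
            + sum(b * x for b, x in zip(B_hat[r], u))
            + b_hat[r]
            for r in range(len(y))]


def region_importance(A_hat, B_hat, b_hat, y, u, i):
    """Size of the change in the one-step prediction when ROI i is lesioned.

    Lesioning zeroes column i of A_hat, row i of B_hat, and element i of b_hat.
    The Euclidean norm used here is an assumption.
    """
    n = len(A_hat)
    # Zeroing column i of A_hat removes y[i] * A_hat[:, i] from every ROI's prediction.
    delta = [A_hat[r][i] * y[i] for r in range(n)]
    # Zeroing row i of B_hat and b_hat[i] removes the input and bias drive on ROI i only.
    delta[i] += sum(b * x for b, x in zip(B_hat[i], u)) + b_hat[i]
    return math.sqrt(sum(d * d for d in delta))


def zscore(series):
    """Normalize an importance time series to zero mean and unit variance."""
    m, s = mean(series), pstdev(series)
    return [(v - m) / s for v in series]
```

Computing `region_importance` at every time point of a run and passing the result through `zscore` would give a normalized importance time series per ROI. The closed-form `delta` equals the difference between the full and lesioned outputs of `one_step`, which is why activity and importance can dissociate: a region with modest activity but large outgoing coefficients can still produce a large change.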

      (2) To address the second point under the weaknesses above: Given that the distinction between intrinsic and extrinsic dynamics appears central to the novelty of the paper, I would suggest the authors explicitly address this issue in the introduction and/or discussion sections.

      The distinction between intrinsic and extrinsic dynamics is a modeling assumption of SLDS. We used such an assumption because in experimental designs with experimenter manipulated inputs one can profitably investigate both types of contribution to dynamics. While we should not reify the model’s assumption, we can gain confidence in our separation of extrinsically and intrinsically driven dynamics through controlled experiments where we can manipulate external inputs, or by demonstrating time-scale separation of intrinsic and extrinsic dynamics and that they operate at different frequencies. This is an important question that requires additional computational/mathematical modeling, but we consider it beyond the scope of the current paper. We have added the following lines in the discussion section:

      Discussion; Lines 521-528: A further issue that we wish to discuss is related to the distinction between intrinsic and extrinsic dynamics, which is explicitly modeled in our SLDS approach (see Methods, equation 2). We believe this is a powerful approach because in experimental designs with experimenter manipulated inputs, one can profitably investigate both types of contribution to dynamics. However, complete separation between intrinsic and extrinsic dynamics is challenging to ascertain. More generally, one can gain confidence in their separation through controlled experiments where external inputs are manipulated, or by demonstrating timescale separation of intrinsic and extrinsic dynamics.

      (3) In the abstract, the statement “.. studies in systems neuroscience that frequently assume that systems are decoupled from external inputs” sounds paradoxical after first introducing how threat processing is almost exclusively studied using blocked and event-related task designs (which obviously rely on external inputs only). Please clarify this.

      In this work, we wished to state that the SLDS framework characterizes both endogenous and exogenous contributions to dynamics, whereas some past work has not modeled both contributions. To clarify, we have changed the corresponding line:

      Abstract; Lines 19-20: Importantly, we characterized both endogenous and exogenous contributions to dynamics.

      (4) In the abstract, the first mention of circles comes out of the blue; the paradigm needs to be introduced first to make this understandable.

      We have rephrased the corresponding text:

      Abstract; Lines 14-17: First, we demonstrated that the SLDS model learned the regularities of the experimental paradigm, such that states and state transitions estimated from fMRI time series data from 85 regions of interest reflected threat proximity and threat approach vs. retreat.

      (5) In Figure 3, the legend shows z-scores representing BOLD changes associated with states. However, the z-scores are extremely low (ranging between -.4 and .4). Can this be correct, given that maps are thresholded at p < .001 (i.e., z > 3.09)? A similar small range of z-scores is shown in the legend of Fig 5. Please check the z-score ranges.

      The p-value threshold used in Fig. 3 is based on the voxelwise t-test conducted between the participant-based bootstrapped maps and null maps (see Methods: State spatial maps: “To identify statistically significant voxels, we performed a paired t-test between the participant-based bootstrapped maps and the null maps.”). Thus, the p-value threshold in the figure does not correspond to the z-scores of the group-averaged state-activation maps. Similarly, in Fig. 5, we only visualized the state-wise attractors on a brain surface map without any thresholding. The purpose of using a z-score color bar was to provide a scale comparable to that of BOLD activity.
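
      As a quick check on the correspondence quoted in the comment above, p < .001 maps to z > 3.09 for a one-tailed test on a standard normal distribution. The following is a minimal sketch using only the Python standard library; it is an illustration of that conversion and not part of the authors' analysis pipeline, and the function name `z_threshold` is hypothetical.

```python
from statistics import NormalDist


def z_threshold(p: float, two_tailed: bool = False) -> float:
    """Return the critical z-value for a p-value under a standard normal."""
    tail = p / 2 if two_tailed else p
    return NormalDist().inv_cdf(1 - tail)


one_tailed = z_threshold(0.001)
two_tailed = z_threshold(0.001, two_tailed=True)
print(f"one-tailed: z > {one_tailed:.2f}")    # one-tailed: z > 3.09
print(f"two-tailed: |z| > {two_tailed:.2f}")  # two-tailed: |z| > 3.29
```

      Such a threshold applies to the test statistic itself, so it need not match the range of a separately computed, group-averaged activation map.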

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      A cortico-centric view is dominant in the study of the neural mechanisms of consciousness. This investigation represents the growing interest in understanding how subcortical regions are involved in conscious perception. To achieve this, the authors engaged in an ambitious and rare procedure in humans of directly recording from neurons in the subthalamic nucleus and thalamus. While participants were in surgery for the placement of deep brain stimulation devices for the treatment of essential tremor and Parkinson's disease, they were awakened and completed a perceptual-threshold tactile detection task. The authors identified individual neurons and analyzed single-unit activity corresponding with the task phases and tactile detection/perception. Among the neurons that were perception-responsive, the authors report changes in firing rate beginning ~150 milliseconds from the onset of the tactile stimulation. Curiously, the majority of the perception-responsive neurons had a higher firing rate for missed/not perceived trials. In summary, this investigation is a valuable addition to the growing literature on the role of subcortical regions in conscious perception.

      Strengths:

      The authors achieved the challenging task of recording human single-unit activity while participants performed a tactile perception task. The methods and statistics are clearly explained and rigorous, particularly for managing false positives and non-normal distributions. The results offer new detail at the level of individual neurons in the emerging recognition of the role of subcortical regions in conscious perception.

      We thank the reviewer for their positive comments.

      Weaknesses:

      "Nonetheless, it remains unknown how the firing rate of subcortical neurons changes when a stimulus is consciously perceived." (lines 76-77) The authors could be more specific about what exactly single-unit recordings offer for interrogating the role of subcortical regions in conscious perception that is unique from alternative neural activity recordings (e.g., local field potential) or recordings that are used as proxies of neural activity (e.g., fMRI).

      We agree with the reviewer that the contribution of micro-electrode recordings was not sufficiently put forward in our manuscript. We added the following sentences to the discussion, when discussing the multiple types of neurons we found:

      Single-unit recordings provide a much higher temporal resolution than functional imaging, which helps assess how the neural correlates of consciousness unfold over time. Contrary to local field potentials, single-unit recordings can expose the variety of functional roles of neurons within subcortical regions, thereby offering a potential for a better mechanistic understanding of perceptual consciousness.

      Related comment for the following excerpts:

      "After a random delay ranging from 0.5 to 1 s, a "respond" cue was played, prompting participants to verbally report whether they felt a vibration or not. Therefore, none of the reported analyses are confounded by motor responses." (lines 97-99).

      "These results show that subthalamic and thalamic neurons are modulated by stimulus onset, irrespective of whether it was reported or not, even though no immediate motor response was required." (lines 188190).

      "By imposing a delay between the end of the tactile stimulation window and the subjective report, we ensured that neuronal responses reflected stimulus detection and not mere motor responses." (lines 245247).

      It is a valuable feature of the paradigm that the reporting period was initiated hundreds of milliseconds after the stimulus presentation so that the neural responses should not represent "mere motor responses". However, verbal report of having perceived or not perceived a stimulus is a motor response and because the participants anticipate having to make these reports before the onset of the response period, there may be motor preparatory activity from the time of the perceived stimulus that is absent for the not perceived stimulus. The authors show sensitivity to this issue by identifying task-selective neurons and their discussion of the results that refer to the confound of post-perceptual processing. Still, direct treatment of this possible confound would help the rigor of the interpretation of the results.

      We agree with the reviewer that direct treatment would have provided the best control. One way to avoid motor preparation is to only provide the stimulus-effector mapping after the stimulus presentation (Bennur & Gold, 2011; Twomey et al., 2016; Fang et al., 2024). Other controls to avoid post-perceptual processing used in consciousness research consist of using no-report paradigms (Tsuchiya et al., 2015) as we did in previous studies (Pereira et al., 2021; Stockart et al., 2024). Unfortunately, neither of these procedures was feasible during the 10 minutes allotted for the research task in an intraoperative setting with auditory cues and vocal responses. We would like to highlight nonetheless that the effects we report are short-lived and incompatible with sustained motor preparation activity.

      We added the following sentence to the discussion:

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      "When analyzing tactile perception, we ensured that our results were not contaminated with spurious behavior (e.g. fluctuation of attention and arousal due to the surgical procedure)." (lines 118-117).

      Confidence in the results would be improved if the authors clarified exactly what behaviors were considered as contaminating the results (e.g., eye closure, saccades, and bodily movements) and how they were determined.

      This sentence was indeed unclear. It introduced the trial selection procedure we used to compensate for drifts in the perceptual threshold, which can result from fluctuations in attention or arousal. We modified the sentence, which now reads:

      When analyzing tactile perception, we ensured that our results were not contaminated by fluctuating attention and arousal due to the surgical procedure. Based on objective criteria, we excluded specific series of trials from analyses and focused on time windows for which hits and misses occurred in commensurate proportions (see methods).

      During the recordings, the experimenter stood next to the patients and monitored their bodily movements, ensuring they did not close their eyes or produce any other bodily movements synchronous with stimulus presentation.

      The authors' discussion of the thalamic neurons could be more precise. The authors show that only certain areas of the thalamus were recorded (in or near the ventral lateral nucleus, according to Figure S3C). The ventral lateral nucleus has a unique relationship to tactile and motor systems, so do the authors hypothesize these same perception-selective neurons would be active in the same way for visual, auditory, olfactory, and taste perception? Moreover, the authors minimally interpret the location of the task, sensory, and perception-responsive neurons. Figure S3 suggests these neurons are overlapping. Did the authors expect this overlap and what does it mean for the functional organization of the ventral lateral nucleus and subthalamic nucleus in conscious perception?

      These are excellent questions, the answers to which we can only speculate. In rodents, the LT is known as a hub for multisensory processing, as over 90% of LT neurons respond to at least two sensory modalities (for a review, see Yang et al., 2024). Yet, no study has compared how LT neurons in rodents encode perceived and nonperceived stimuli across modalities. Evidence in humans is scarce, with only a few studies documenting supramodal neural correlates of consciousness at the cortical level with noninvasive methods (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). We now refer to these studies in the revised discussion: Moreover, given the prominent role of the thalamus in multisensory processing, it will be interesting to assess if it is specifically involved in tactile consciousness or if it has a supramodal contribution, akin to what is found in the cortex (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022).

      Concerning the anatomical overlap of neurons, we could not reconstruct the exact locations of the DBS tracts for all participants. Because of the limited number of recorded neurons, we preferred to refrain from drawing strong conclusions about the functional organization of the ventral lateral nucleus.

      "We note that, 6 out of 8 neurons had higher firing rates for missed trials than hit trials, although this proportion was not significant (binomial test: p = 0.145)." (lines 215-216).

      It appears that in the three example neurons shown in Figure 4, 2 out of 3 (#001 and #068) show a change in firing rate predominantly for the missed stimulations. Meanwhile, #034 shows a clear hit response (although there is an early missed response - decreased firing rate - around 150 ms that is not statistically significant). This is a counterintuitive finding when compared to previous results from the thalamus (e.g., local field potentials and fMRI) that show the opposite response profile (i.e., missed/not perceived trials display no change or reduced response relative to hit/perceived trials). The discussion of the results should address this, including if these seemingly competing findings can be rectified.

      We thank the reviewer for pointing out this limitation of the discussion. We avoided putting too much emphasis on these aspects due to the limited number of perception-selective neurons. Although subcortical connectivity models would predict that neurons in the thalamus should increase their firing rate for perceived stimuli, we were not surprised to see this heterogeneity as we had previously found neurons decreasing their firing rates for missed stimuli in the posterior parietal cortex (Pereira et al., 2021). We answer these points in response to the reviewer’s last comment below on the latencies of the effects.
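
      For reference, the binomial test quoted in the comment above (6 of 8 neurons, p = 0.145) can be reproduced with a few lines of Python. This is a minimal sketch that assumes a chance level of 0.5; the reported value appears consistent with a one-sided test, although the excerpt does not state the sidedness. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 6 of 8 perception-selective neurons fired more for misses than for hits.
one_sided = binomial_tail(6, 8)
# Doubling the tail is valid here because the null (p = 0.5) is symmetric.
two_sided = min(1.0, 2 * one_sided)

print(f"one-sided p = {one_sided:.3f}")  # one-sided p = 0.145
print(f"two-sided p = {two_sided:.3f}")  # two-sided p = 0.289
```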

      The authors report 8 perception-responsive neurons, but there are only 5 recording sites highlighted (i.e., filled-in squares and circles) in Figures S3C and 4D. Was this an omission or were three neurons removed from the perception-responsive analysis?

      Unfortunately, we could not obtain anatomical images for all participants. This information was present in the methods section, although not clearly enough:

      For 34 / 50 neurons, preoperative MRI and postoperative CT scans (co-registered in patient native space using CranialSuite) were available to precisely reconstruct surgical trajectories and recording locations (for the remaining 16 neurons, localizations were based on neurosurgical planning and confirmed by electrophysiological recordings at various depths).

      Therefore, we added the following sentence in Figures 2, 3, 4 and S3.

      [...] for patients for which we could obtain anatomical images.

      Could the authors speak to the timing of the responses reported in Figure 4? The statistically significant intervals suggested both early (~160-200ms) to late responses (~300ms). Some have hypothesized that subcortical regions are early - ahead of cortical activation that may be linked with conscious perception. Do these results say anything about this temporal model for when subcortical regions are active in conscious perception?

      We agree that response timing could have been better described. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of the two clusters mentioned by the reviewer very clearly. We now include this analysis in a new Figure 5 in the revised manuscript.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; cf. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.
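
      The fitting procedure described above can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the latencies are synthetic, the 25 ms histogram binning and the parameter bounds are arbitrary choices, and a simple greedy coordinate search stands in for the interior-point method so that the example runs on the Python standard library alone. All function names are hypothetical.

```python
import math
import random


def gauss(x, mu, sigma):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture(x, params):
    """Two-component density; params = (w, mu1, sigma1, mu2, sigma2)."""
    w, mu1, s1, mu2, s2 = params
    return w * gauss(x, mu1, s1) + (1 - w) * gauss(x, mu2, s2)


def mse(params, xs, ys):
    """Mean square error between the model density and the histogram."""
    return sum((mixture(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def clip(params):
    """Keep the weight in [0, 1] and both widths at or above 1 ms."""
    w, mu1, s1, mu2, s2 = params
    return (min(max(w, 0.0), 1.0), mu1, max(s1, 1.0), mu2, max(s2, 1.0))


def pattern_search(start, xs, ys, frozen=(), sweeps=200):
    """Greedy coordinate search; a step is kept only if it lowers the error."""
    best = clip(start)
    best_err = mse(best, xs, ys)
    steps = [0.2, 50.0, 20.0, 50.0, 20.0]
    for _ in range(sweeps):
        improved = False
        for i in range(5):
            if i in frozen:
                continue
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[i] += sign * steps[i]
                trial = clip(trial)
                err = mse(trial, xs, ys)
                if err < best_err:
                    best, best_err, improved = trial, err, True
        if not improved:
            steps = [s / 2 for s in steps]  # refine the search scale
    return best, best_err


def best_of(n_runs, xs, ys, rng, single=False):
    """Best fit over n_runs random initializations."""
    results = []
    for _ in range(n_runs):
        start = (1.0 if single else rng.uniform(0.2, 0.8),
                 rng.uniform(0, 500), rng.uniform(10, 80),
                 rng.uniform(0, 500), rng.uniform(10, 80))
        # With w fixed at 1, the second component drops out of the model.
        frozen = (0, 3, 4) if single else ()
        results.append(pattern_search(start, xs, ys, frozen))
    return min(results, key=lambda r: r[1])


# Synthetic latencies (ms): NOT the study's data, only a bimodal stand-in.
rng = random.Random(0)
latencies = ([rng.gauss(150, 25) for _ in range(30)] +
             [rng.gauss(320, 30) for _ in range(30)])

# Histogram density on 25 ms bins between 0 and 500 ms.
width = 25.0
edges = [i * width for i in range(21)]
xs = [lo + width / 2 for lo in edges[:-1]]
counts = [sum(lo <= t < lo + width for t in latencies) for lo in edges[:-1]]
ys = [c / (len(latencies) * width) for c in counts]

one_params, one_err = best_of(20, xs, ys, rng, single=True)
two_params, two_err = best_of(20, xs, ys, rng)
print(f"single-component MSE / two-component MSE = {one_err / two_err:.1f}")
```

      The printed ratio plays the role of the "> 4 times" criterion in the text: a large ratio indicates that two components describe the latency histogram markedly better than one.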

      We updated the discussion, including the points made in the comment about higher activity for missed stimuli (above):

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modalities. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well-known electroencephalographic marker, the P3b (Polich, 2007), whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021), although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      Reviewer #2 (Public Review):

      The authors have studied subpopulations of individual neurons recorded in the thalamus and subthalamic nucleus (STN) of awake humans performing a simple cognitive task. They have carefully designed their task structure to eliminate motor components that could confound their analyses in these subcortical structures, given that the data was recorded in patients with Parkinson's Disease (PD) and diagnosed with an Essential Tremor (ET). The recorded data represents a promising addition to the field. The analyses that the authors have applied can serve as a strong starting point for exploring the kinds of complex signals that can emerge within a single neuron's activity. Pereira et al. conclude that their results from single neurons indicate that task-related activity occurs, purportedly separate from previously identified sensory signals. These conclusions are a promising and novel perspective for how the field thinks about the emergence of decisions and sensory perception across the entire brain as a unit.

      We thank the reviewer for these positive comments.

      Despite the strength of the data that was obtained and the relevant nature of the conclusions that were drawn, there are certain limitations that must be taken into consideration:

      (1) The authors make several claims that their findings are direct representations of consciousness identifiable in subcortical structures. The current context for consciousness does not sufficiently define how the consciousness is related to the perceptual task.

      This is indeed a complex issue in all studies concerned with perceptual consciousness and we were careful not to make such “direct” claims. Instead, we used the state-of-the-art tools available to study consciousness (see below) and only interpreted our findings with respect to consciousness in the discussion. For example, in the abstract, our claim is that “Our results provide direct neurophysiological evidence of the involvement of the subthalamic nucleus and the thalamus for the detection of vibrotactile stimuli, thereby calling for a less cortico-centric view of the neural correlates of consciousness.”

      In brief, first, we used near-threshold stimuli which allowed us to contrast reported vs. unreported trials while keeping the physical properties of the stimulus comparable. Second, we used subjective reports without incentive for participants to be more conservative or liberal in their response (e.g. through reward). Third, we introduced a random delay before the responses to limit confounding effects due to the report. We also acknowledged that “... it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015)”. This last sentence now reads (to address a point made by Reviewer 1 about motor preparation):

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      (2) The current work would benefit greatly from a description and clarification of what all the neurons that have been recorded are doing. The authors' criteria for selecting subpopulations with task-relevant activity are appropriate, but understanding the heterogeneity in a population of single neurons is important for broader considerations that are being studied within the field.

      We followed the reviewer’s suggestions and added new results regarding the latencies of the reported effects (new Figure 5). We also now show firing rates for hits, misses and overall sensory activity (hits and misses combined) for all perception-selective or sensory-selective neurons (when behavior was good enough; Figure S5). Although a more detailed characterization of the heterogeneity of the neurons identified would have been relevant, it seems beyond the scope of the present study, especially given the relatively small number of neurons we identified, as well as the relative simplicity of the paradigm imposed by the clinical context in which we worked.

      (3) The authors have omitted a proper set of controls for comparison against the active trials, for example, where a response was not necessary. Please explain why this choice was made and what implications are necessary to consider.

      We had mentioned this limitation in the discussion: Nevertheless, it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015). We agree that such a control would have been relevant, but this was not feasible during the 10 minutes allotted for the research task in an intraoperative setting. These constraints are both clinical, to minimize discomfort for patients, and practical, as it is difficult to track neurons in an intraoperative setting for more than 10 minutes.

      We added a sentence to this effect in the discussion.

      Reviewer #3 (Public Review):

      Summary:

      This important study relies on a rare dataset: intracranial recordings within the thalamus and the subthalamic nucleus in awake humans, while they were performing a tactile detection task. This procedure allowed the authors to identify a small but significant proportion of individual neurons, in both structures, whose activity correlated with the task (e.g. their firing rate changed following the audio cue signalling the start of a trial) and/or with the stimulus presentation (change in firing rate around 200 ms following tactile stimulation) and/or with participant's reported subjective perception of the stimulus (difference between hits and misses around 200 ms following tactile stimulation). Whereas most studies interested in the neural underpinnings of conscious perception focus on cortical areas, these results suggest that subcortical structures might also play a role in conscious perception, notably tactile detection.

      Strengths:

      There are two strongly valuable aspects in this study that make the evidence convincing and even compelling. First, these types of data are exceptional, the authors could have access to subcortical recordings in awake and behaving humans during surgery. Additionally, the methods are solid. The behavioral study meets the best standards of the domain, with a careful calibration of the stimulation levels (staircase) to maintain them around the detection threshold, and an additional selection of time intervals where the behavior was stable. The authors also checked that stimulus intensity was the same on average for hits and misses within these selected periods, which warrants that the effects of detection that are observed here are not confounded by stimulus intensity. The neural data analysis is also very sound and well-conducted. The statistical approach complies with current best practices, although I found that, in some instances, it was not entirely clear which type of permutations had been performed, and I would advocate for more clarity in these instances. Globally the figures are nice, clear, and well presented. I appreciated the fact that the precise anatomical location of the neurons was directly shown in each figure.

      We thank the reviewer for this positive evaluation.

      Weaknesses:

      Some clarification is needed for interpreting Figure 3, top rows: in my understanding the black curve is already the result of a subtraction between stimulus present trials and catch trials, to remove potential drifts; if so, it does not make sense to compare it with the firing rate recorded for catch trials.

      The black curve represents the firing rate without any subtraction. We only subtracted the firing rates of catch trials in the statistical procedure, as the reviewer noted, to remove potential drift. We added “(before baseline correction)” to the legend of Figure 3.

      I also think that the article could benefit from a more thorough presentation of the data and that this could help refine the interpretation which seems to be a bit incomplete in the current version. There are 8 stimulus-responsive neurons and 8 perception-selective neurons, with only one showing both effects, resulting in a total of 15 individual neurons being in either category or 13 neurons if we exclude those in which the behavior is not good enough for the hit versus miss analysis (Figure S4A). In my opinion, it should be feasible to show the data for all of them (either in a main figure, or at least in supplementary), but in the present version, we get to see the data for only 3 neurons for each analysis. This very small selection includes the only neuron that shows both effects (neuron #001; which is also cue selective), but this is not highlighted in the text. It would be interesting to see both the stimulus-response data and the hit versus miss data for all 13 neurons as it could help develop the interpretation of exactly how these neurons might be involved in stimulus processing and conscious perception. This should give rise to distinct interpretations for the three possible categories. Neurons that are stimulus-responsive but not perception-selective should show the same response for both hits and misses and hence carry out indifferently conscious and unconscious responses. The fact that some neurons show the opposite pattern is particularly intriguing and might give rise to a very specific interpretation: if the neuron really doesn't tend to respond to the stimulus when hits and misses are put together, it might be a neuron that does not directly respond to the stimulus, but whose spontaneous fluctuations across trials affect how the stimulus is perceived when they occur in a specific time window after the stimulus. 
      Finally, neuron #001 responds with what looks like a real burst of evoked activity to stimulation and also shows a difference between hits and misses, but intriguingly, the response is strongest for misses. In the discussion, the interesting interpretation in terms of a specific gating of information by subcortical structures seems to apply well to this last example, but not necessarily to the other categories.

      We now provide a supplementary Figure showing firing rates for hits, misses and the combination of both. The reviewer’s analysis about whether a perception-selective neuron also has to respond to the stimulus to be involved in gating is interesting. With more data, a finer characterization of these neurons would have been possible. In our study, it is possible that more neurons have similar characteristics as #001 (e.g. #032, #062, #068) but do not show a significant difference with respect to baseline when both hits and misses are considered. We now avoid interpreting null effects, especially considering the low number of trials with near-threshold detection behavior we could collect in 10 minutes. 

      We also realized that we had not updated Figure S7 after the last revision in which we had corrected for possible drifts to obtain sensory-selective neurons. The corrected panel A is provided below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It appears that the correct rejection was low for most participants. It would improve interpretation of the behavioral results if correct rejection was shown as a rate (i.e., # of correct rejection trials / total number of no stimulus/blank trials) rather than or in addition to reporting the number of correct rejection trials (Figure 1C).

      We added the following figure to the supplementary information.
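
      For reference, the rate requested in the comment above is the number of correct rejections divided by the total number of no-stimulus (blank) trials. A minimal sketch follows; the trial counts are hypothetical and are not taken from the study, and the function name is illustrative.

```python
def correct_rejection_rate(n_correct_rejections: int, n_blank_trials: int) -> float:
    """Return correct rejections as a fraction of all blank (no-stimulus) trials."""
    if n_blank_trials <= 0:
        raise ValueError("n_blank_trials must be positive")
    if not 0 <= n_correct_rejections <= n_blank_trials:
        raise ValueError("n_correct_rejections must lie between 0 and n_blank_trials")
    return n_correct_rejections / n_blank_trials


# Hypothetical participant: 9 correct rejections out of 12 blank trials.
rate = correct_rejection_rate(9, 12)
false_alarm_rate = 1 - rate  # the remaining blank trials were false alarms
print(f"correct rejection rate = {rate:.2f}, false alarm rate = {false_alarm_rate:.2f}")
```

      Reporting the rate alongside the raw count makes participants with different numbers of blank trials directly comparable.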

      The axis tick marks in Figure 5A late versus early are incorrect (appears the axis was duplicated).

      Thank you for spotting this; it has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      We would like to congratulate the authors on this strongly supported contribution to the field. The manuscript is well-written, although a little bit too concise in sections. See the following comments for the methods that could benefit the present conclusions:

      Thank you for these suggestions that we believe improved our interpretations.

      Major Points

      (1) The subpopulations of neurons that are considered are small, but it is not a confounding issue for the conclusions drawn. However, the behavior of the neurons that were excluded should be considered by calculating the percentage of neurons that are selective for the distinct parameters, as a function of time. This would greatly strengthen the understanding of what can be observed in the two subcortical structures.

      We thank the reviewer for this suggestion. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of two clusters, as shown in the new Figure 5 copied below.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; cf. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.

      We also updated the discussion:

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modalities. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well-known electroencephalographic marker, the P3b (Polich, 2007), whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021), although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      (2) We highly recommend that the authors consider employing some analysis that decodes the representations observable in the activity of individual neurons as a function of time (e.g. Shannon's Mutual Information). This would reinforce and emphasize the most relevant conclusions.

      We thank the reviewers for this suggestion. Unfortunately, such methods would require many more trials than what we were able to collect in the 10-minute slots available in the operating room.

      (3) Although there are small populations recorded in each of the two subcortical structures, they are sufficient to attempt a study using population dynamics (primarily, PCA can still work with smaller populations). Given the broad range of dynamics that are observed in a population of single units typically involved in decision-making, it would be interesting to consider whether heterogeneity is a hallmark of decision-making, and trying to summarize the variance in the activity of the entire population should provide a certain understanding of the cue-selective versus the perception-selective qualities, as an example.

      We now present all 13 neurons that were sensory- or perception-selective for which we had good enough behavior to show hit vs. miss differences in Supplementary Figure S5. Although population-level analyses would be relevant, they are not compatible with the number of neurons we identified.

      (4) A stronger presentation of what the expectations are for the results would also benefit the interpretability of the manuscript when added to the introduction and discussion sections.

      Due to the scarcity of single-neuron data related to perceptual consciousness, especially in the subcortical structures we explored, our prior expectations did not exceed finding perception-selective neurons. We would prefer to avoid refining these expectations post-hoc. 

      Minor Comments

      (1) Add the shared overlap between differently selective neurons explicitly in the manuscript.

      We added this information at the end of the results section.

      (2) Add a consideration in the methods of why the Wilcoxon test or permutation test was selected for separate uses. How do the results compare?

      Sorry for this misunderstanding. We clarified this in the revised methods:

      To deal with possibly non-parametric distributions, we used the Wilcoxon rank sum test or the sign test instead of t-tests to test differences between distributions. We used permutation tests instead of binomial tests to test whether a reported number of neurons could have been obtained by chance.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analysis:

      As suggested already in the public review, it might be worth showing all 13 neurons with either stimulus-responsive or perception-selective behaviour and, based on that, deepen the potential interpretation of the results for the different categories.

      We agree that this information improves the understanding of the underlying data and this addition was also proposed by reviewer 2. We added it in a new supplementary Figure S5.

      Recommendations for improving the writing and presentation

      As mentioned in the public review, I think Figure 3 needs clarification. I found that, in some instances, it was not entirely clear which type of analyses or permutation tests had been performed, and I would advocate for more clarity in these instances. For example:

      Page 6 line 146 "permuting trial labels 1000 times": do you mean randomly attributing a trial to a neuron? Or something else?

      We agree that this was somewhat unclear. We modified the sentence to:

      permuting the sign of the trial-wise differences

      We now define a sign permutation test for paired tests and a trial permutation test for two-sample tests in the methods and specify which test was used in the main text.
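      To make the distinction between the two tests concrete, here is a minimal Python sketch under stated assumptions; it is not the analysis code of the study. The function names are hypothetical, and the mean difference is assumed as the test statistic purely for illustration. The sign permutation test randomly flips the sign of each paired difference, whereas the trial permutation test reshuffles trial labels between two independent samples.

```python
import random
from statistics import mean


def sign_permutation_test(diffs, n_perm=1000, seed=0):
    """Paired test: randomly flip the sign of each trial-wise difference."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            count += 1
    # The +1 terms count the observed data as one of the permutations.
    return (count + 1) / (n_perm + 1)


def trial_permutation_test(a, b, n_perm=1000, seed=0):
    """Two-sample test: randomly reassign trials to the two groups."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Made-up numbers, for illustration only.
paired_diffs = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.1, 0.9, 1.0]
hits = [10.0, 11.0, 12.0, 10.5, 11.5, 12.5, 10.2, 11.8]
misses = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 1.2, 2.8]
print("sign permutation p:", sign_permutation_test(paired_diffs))
print("trial permutation p:", trial_permutation_test(hits, misses))
```

      Both functions return a two-sided p-value; with 1000 permutations the smallest attainable value is 1/1001.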

      Page 7, neurons which have their firing rate modulated by the stimulus: I think you ought to be more explicit about the analysis so that we grasp it on the first read. To understand what is shown in Figure 3 I had to go back and forth between the main text and the method, and I am still not sure I completely understood. You compare the firing rate in sliding windows following stimulus onset with the mean firing rate during the 300 ms baseline. Sliding windows are between 0 and 400 ms post-stim (according to methods ?) and a neuron is deemed responsive if you find at least one temporal cluster that shows a significant difference with baseline activity (using cluster permutation). Is that correct? Either way, I would recommend being a bit more precise about the analysis that was carried out in the main text, so that we only need to refer to methods when we need specialized information.

      We agree that the methods section was unclear. We re-wrote the following two paragraphs:

      To identify sensory-selective neurons, we assumed that subcortical signatures of stimulus detection ought to be found early following its onset and looked for differences in the firing rates during the first 400 ms post-stimulus onset compared to a 300 ms pre-stimulus baseline. To correct for possible drifts occurring during the trial, we subtracted the average cue-locked activity of catch trials from the cue-locked activity of each stimulus-present trial before realigning to stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses, as assessed by a non-parametric sign rank test. A putative neuron was considered sensory-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate. Whether for the shuffled data or the observed data, if more than one cluster was obtained, we discarded all but the longest cluster. This permutation test allowed us to control for multiple comparisons across time and participants.

      For perception-selective neurons, we looked for differences in the firing rates between hit and miss trials during the first 400 ms post-stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses, as assessed by a nonparametric Wilcoxon rank sum test. As for sensory-selective neurons, a putative neuron was considered perception-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate, and we discarded all but the longest cluster.
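      The cluster-length criterion in the two paragraphs above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the 10 ms time step, and the 0.05 significance level are hypothetical, and the way the shuffled datasets enter the p-value is one plausible reading of the description.

```python
def longest_cluster_ms(significant, dt_ms):
    """Duration (ms) of the longest run of adjacent significant time points."""
    longest = current = 0
    for flag in significant:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest * dt_ms


def is_selective(p_values, dt_ms=10, alpha=0.05, min_length_ms=80):
    """True when the longest significant cluster is longer than min_length_ms."""
    significant = [p < alpha for p in p_values]
    return longest_cluster_ms(significant, dt_ms) > min_length_ms


def cluster_p_value(observed_ms, shuffled_ms):
    """Fraction of shuffled datasets whose longest cluster is at least as long."""
    exceed = sum(1 for s in shuffled_ms if s >= observed_ms)
    return (exceed + 1) / (len(shuffled_ms) + 1)


# Made-up point-wise p-values on an assumed 10 ms grid, for illustration only.
p_values = [0.5] * 3 + [0.01] * 9 + [0.3] * 2
print(is_selective(p_values))
```

      Keeping only the longest cluster, for both observed and shuffled data, reduces each dataset to a single number, which is what allows one comparison per neuron instead of one per time point.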

      Minor points:

      Figure 3: inset showing action potentials, please also provide the time scale (in the legend for example), so that it's clear that it is not commensurate with the firing rate curve below, but rather corresponds to the dots of the raster plot.

      We added the text “[...], duration: 2.5 ms” to Figures 2, 3, and 4.

      Line 210: I recommend: “we found 8 neurons [...] showing a significant difference *between hits and misses* after stimulus onset."

      We made the change.

      Top of page 9, the following sentence is misleading: “This result suggests that neurons in these two subcortical structures have mostly different functional roles”; this could read as meaning that functional roles are different between the two structures. Probably what you mean is rather something along these lines: “these two subcortical structures both contain neurons displaying several different functional roles”

      Changed.

      Line 329: remove double “when”

      We made the change, thank you for spotting this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you for your valuable comments and suggestions, which have greatly contributed to improving our manuscript.

      We have carefully addressed all the reviewers' suggestions, and detailed responses for each Reviewer are provided at the end of this letter. In summary:

      • The Introduction has been revised to provide a more focused discussion on results, toning down the speculative discussion on seasonal host shifts.

      • The methodology section has been clarified, particularly the power analysis, which now includes a clearer explanation. The random effects in the models have been better described to ensure transparency.

      • The Results section was reorganized to highlight the key findings more effectively.

      • The Discussion has been restructured for clarity and conciseness, ensuring the interpretation of the results is clearer and better aligned with the study objectives.

      • Minor edits throughout the manuscript were made to improve readability and accuracy.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision:

      Overall, the manuscript is much improved. However, the introduction and parts of the discussion that talk about addressing the question of seasonal shift in host use pattern of Cx. quin are still way too strong and must be toned down. There is no strong evidence to show this host shift in Argentinian mosquito populations. Therefore, it is just misleading. I suggest removing all this and sticking to discussing only the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quin.

      The Introduction and Discussion have been modified and toned down, and now stick to discussing the results, as suggested.

      Reviewer #1 (Recommendations for the authors):

      Some more minor comments are mentioned below.

      Line 51: Because 'of' this,

      Changed as suggested.

      Line 56: specialists 'or' generalists

      Changed as suggested.

      Line 56: primarily

      Changed as suggested.

      Line 98: Because 'of' this,

      Changed as suggested.

      Reviewer #2 (Public review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates. Several minor concerns remain, especially regarding unclear statements in the discussion.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) The methods would be improved by some additional details. For example, clarifying the number of generations for which mosquitoes were maintained in colony (which was changed from 20 to several) and whether replicates were conducted at different time points.

      Changed as suggested.

      (2) The statistical analysis requires some additional explanation. For example, you suggest that the power analysis was conducted a priori, but this was not mentioned in your first two drafts, so I wonder if it was actually conducted after the first replicate. It would be helpful to include further detail, such as how the parameters were estimated. Also, it would be helpful to clarify why replicate was included as a random effect for fecundity and fertility but as a fixed effect for hatchability. This might explain why there were no significant differences for hatchability given that you were estimating for more parameters.

      The power analysis was conducted a posteriori, as you correctly inferred. While I did not indicate that it was performed a priori, you are right in noting that this was not explicitly mentioned. As you suggested, the methodology for the power analysis has been revised to clarify any potential doubts.

      Regarding the model for hatchability, a model without a random effect variable was used, as all attempts to fit models with random effects resulted in poor validation. These points have now been clarified and explained in the corresponding section.

      (3) A number of statements in the discussion are not clear. For example, what do you mean by a mixed perspective in the first paragraph? Also, why is the expectation mentioned in the second paragraph different from the hypothesis you described in your introduction?

      Changed as suggested.

      (4) According to eLife policy, data must be made freely available (not just upon request).

      Data and code will be publicly available. The corresponding section was modified.

      Reviewer #2 (Recommendations for the authors):

      Your manuscript is much improved by the inclusion of two additional replicates! The results are much more robust when we can see that the trends that you report are replicable across 3 iterations of the experiment. Congratulations on a greatly improved study and paper! I have several minor concerns and suggestions, listed below:

      38-39: I think it is clearer to say "no statistically significant effect of season on hatchability of eggs" ... or specify if you are referring to blood or the interaction of blood and season. It isn't clear which treatment you are referring to here.

      Changed as suggested.

      54-57: This could be stated more succinctly. Instead of citing papers that deal with specific examples of patterns, I would suggest citing a review paper that defines these terms.

      Changed as suggested.

      83-84: What if another migratory bird is the preferred host in Argentina? I would state this more cautiously (e.g. "may not be applicable...").

      Changed as suggested.

      95-96: I don't understand what you mean by this. These hypotheses are specifically meant to understand mosquitoes that DO have a distinct seasonal phenology, so I'm not sure why this caveat is relevant. And naturally this hypothesis is host dependent, since it is based on specific host reproductive investments. I think that the strongest caveat to this hypothesis is simply that it hasn't been proven.

      Changed as suggested.

      97-115: This is a great paragraph! Very clear and compelling.

      Thanks for your words!

      118: Do you have an exact or estimated number of rafts collected?

      Sorry, I do not have the exact number of rafts, but there were more than 20-30.

      135: "over twenty" was changed to "several"; several would imply about 3 generations, so this is misleading. If the colony was actually maintained for over twenty generations, then you should keep that wording.

      Changed as suggested.

      163-164: Can you please clarify whether the replicates were conducted at separate time points?

      Changed as suggested.

      Note: the track changes did not capture all of the changes made; e.g. 163-164 should show as new text but does not.

      You are absolutely right; when I uploaded the last version, I unfortunately deleted all tracked changes and cannot recover them. In this new version, I will ensure that all minimal changes are included as tracked changes.

      186 - 189: the terms should be "fixed effect" and "random effect"

      Changed as suggested.

      191: Edit: linear

      Changed as suggested.

      194: why was replicate not included as a random effect here when it was above? Also, can you please clarify "interaction effects"? Which interactions did you include?

      Changed as suggested. Explained above and in the methodology. Hatchability models with a random-effect variable were poorly fitted and validated. The hatchability model included a four-way interaction (season, blood source, cycle, and replicate).

      207-208: I'm not sure what you mean by "aimed to achieve"? Weren't you doing this after you conducted the experiments, so wouldn't this be determining the power of your model (post-hoc power analysis)? Also, I think you should provide the parameter estimates that were used (e.g. effect size - did you use the effect size you estimated across the 3 replicates?).

      Changed as suggested.

      214-215: this should be reworded to acknowledge that this is estimated for the given effect size; for example, something like "This sample size was sufficient to detect the observed effect with a statistical power of 0.8" or something along those lines (unless I am misunderstanding how you conducted this test).

      Changed as suggested.

      246. Abbreviate Culex

      Changed as suggested.

      253-255: This sentence isn't clear. What do you mean by mixed? Also, the season really seemed to mainly impact the fitness of mosquitoes fed on mouse blood and here the way it is phrased seems to indicate that season has an impact on the fitness of those fed with chicken blood.

      Changed as suggested.

      258-260: You stated your hypothesis as the relative fitness shifting between seasons, but this statement about the expectation is different from your hypothesis stated earlier. Please clarify.

      You are right. Thank you for noting this. It was changed as suggested.  

      263-266: I also don't understand this sentence; what does the first half of the sentence have to do with the second?

      Changed as suggested.

      269-270: This doesn't align with your observation exactly; you say first AND second are generally most productive, but you observed a drop in the second. Please clarify this.

      Changed as suggested.

      280: I suggest removing "as same as other studies"; your caveats are distinct because your experimental design was unique

      Changed as suggested.

      287: you shouldn't be looking for a "desired" effect; I suggest removing this word

      Changed as suggested.

      288: It wasn't really a priori though, since you conducted it after your first replicate (unless you didn't use the results from the first replicate you reported in the original drafts?)

      It was a posteriori. Changed as suggested.

      290: Why is 290 written here?

      It was a mistype. Deleted as suggested.

      291-298: The meaning of this section of your paragraph is not clear.

      Improved as suggested.

      304-313: This list of 3 explanations are directed at different underlying questions. Explanations 1 and 2 are alternative explanations for why host switching occurs if not due to differences in fitness. This isn't really an explanation of your results so much as alternative explanations for a previously reported phenomenon. And the third is an explanation for why you may not have observed the expected effect. I suggest restructuring this to include the fact that Argentinian quinqs may not host switch as part of your previous list of caveats. Then you can include your two alternative explanations for host switching as a possible future direction (although I would say that it is really just one explanation because "vector biology" is too broad of a statement to be testable). Also, you haven't discussed possible explanations for your actual result, which showed that mosquito fitness decreased when feeding on mouse blood in autumn conditions and in the second gonotrophic, while those that fed on chicken did not experience these changes. Why might that be?

      The discussion was restructured to include all these suggested changes. Additionally, some possible explanations for our results are now discussed.

      315-317: This statement is vague without a direct explanation of how this will provide insight. I suggest removing or providing an explanation of how this provides insight to transmission and forecasting.

      Changed as suggested.

      319-320: According to eLife policy, all data should be publicly available. From guidelines: "Media Policy FAQs Data Availability Purpose and General Principles To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all data associated with an article to be made freely and widely available. These must be in the most useful formats and according to the relevant reporting standards, unless there are compelling legal or ethical reasons to restrict access. The provision of data should comply with FAIR principles (Findable, Accessible, Interoperable, Reusable). Specifically, authors must make all original data used to support the claims of the paper, or that is required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). This must include all variables, treatment conditions, and observations described in the manuscript. The authors must also provide a full account of the materials and procedures used to collect, pre-process, clean, generate and analyze the data that would enable it to be independently reproduced by other researchers."

      - so you need to make your data available online; I also understand the last sentence to indicate that code should be made available.  

      Data and code will be publicly available.

      Table 1: it is notable that in replicate 2, the autumn:mouse:gonotrophic cycle II fecundity and fertility are actually higher than in the summer, which is the opposite of reps 1 and 3 and the overall effect you reported from the model. This might be worth mentioning in the discussion.

      Mentioned in the discussion as suggested.

      Tables 1 and 2: shouldn't this just be 8 treatments? You included replicate as a random effect, so it isn't really a separate set of treatments.

      This table reflects the output of the whole experiment, which is why it presents all 24 experiments.

      Figure 3: Can you please clarify if this is showing raw data?

      Changed as suggested.

      Note: grammatical copy editing would be beneficial throughout

      Grammar was improved as suggested.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Tian et al. explore the role of ubiquitination of non-structural protein 16 (nsp16) in the SARS-CoV-2 life cycle. nsp16, in conjunction with nsp10, performs the final step of viral mRNA capping through its 2'-O-methylase activity. This modification allows the virus to evade host immune responses and protects its mRNA from degradation. The authors demonstrate that nsp16 undergoes ubiquitination and subsequent degradation by the host E3 ubiquitin ligases UBR5 and MARCHF7 via the ubiquitin-proteasome system (UPS). Specifically, UBR5 and MARCHF7 mediate nsp16 degradation through K48- and K27-linked ubiquitination, respectively. Notably, degradation of nsp16 by either UBR5 or MARCHF7 operates independently, with both mechanisms effectively inhibiting SARS-CoV-2 replication in vitro and in vivo. Furthermore, UBR5 and MARCHF7 exhibit broad-spectrum antiviral activity by targeting nsp16 variants from various SARS-CoV-2 strains. This research advances our understanding of how nsp16 ubiquitination impacts viral replication and highlights potential targets for developing broadly effective antiviral therapies.

      Strengths:

      The proposed study is of significant interest to the virology community because it aims to elucidate the biological role of ubiquitination in coronavirus proteins and its impact on the viral life cycle. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our overall knowledge of ubiquitination's diverse functions in cell biology. Employing in vivo studies is a strength.

      Weaknesses:

      Minor comments:

      Figure 5A- The authors should ensure that the figure is properly labeled to clearly distinguish between the IP (Immunoprecipitation) panel and the input panel.

      Thank you for your suggestion. We have replaced Figure 5 in this version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "SARS-CoV-2 nsp16 is regulated by host E3 ubiquitin ligases, UBR5 and MARCHF7" is an interesting work by Tian et al. describing the degradation/stability of NSP16 of SARS CoV2 via K48- and K27-linked ubiquitination and proteasomal degradation. The authors have demonstrated that UBR5 and MARCHF7, both E3 ubiquitin ligases, bring about the ubiquitination of NSP16. The concept and experimental approach to prove the hypothesis look ok. The in vivo data looks ok with the controls. Overall, the manuscript is good.

      Strengths:

      The study identified important E3 ligases (MARCHF7 and UBR5) that can ubiquitinate NSP16, an important viral factor.

      Comments on revisions:

      I have gone through the revised form of the manuscript thoroughly. The authors have addressed all of my concerns. To me, the experimental approach looks convincing that the host E3 ubiquitin ligases (UBR5 and MARCHF7) ubiquitinate NSP16 and mark it for proteasomal degradation via K48- and K27-linkage. The authors have represented the final figure (Fig.8) in a convincing manner, opening a new window to explore the mechanism of capping the vRNA by NSP16.

      Thank you for your recognition.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a substantial mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have executed several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We believe that these additions significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally-useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      We thank the reviewer for this important recommendation. Although the initial version showed that CNO does not produce degeneration of DA neuron terminals, it did not exclude a contribution to the behavioral changes. To address this, we now include a cohort of DREADD-free, non-injected mice treated with either vehicle or CNO (Figure S1C). We found that, on its own, CNO did not significantly impact either light-cycle or dark-cycle running. Together with the lack of degeneration observed with CNO treatment in non-DREADD mice (Figure 2D), these results support the conclusion that our behavioral and histological findings result from dopamine neuron activation.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and have completed these experiments in the revision (Figure 1, Figure S2). We now show that in vivo treatment with CNO causes some of the same physiological changes in VTA dopamine neurons as we found in SNc dopamine neurons, including an increased spontaneous firing rate, and a similar decrease in responsiveness to CNO in the slice recordings. Together these observations support the conclusion that SNc axons are intrinsically more vulnerable to increased activity than VTA dopamine axons. 

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We have clarified which mice had access to a running wheel in the methods of our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps prevented mice from having access to a running wheel in their home cage. Mice used for non-responder and non-hM3Dq (CNO alone) experiments also had access to a running wheel during their treatment. Mice used for the isradipine experiment did not have access to a running wheel, because the number of mice was too large and because, while unilateral hM3Dq expression allows for within-animal controls, it does not lend itself to clear interpretation of running wheel data.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not detect an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than VTA DA neurons, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel. In addition, we are not aware of prior studies that have chronically activated DREADDs over several weeks to produce neurodegeneration.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      Thank you for this comment. As discussed in greater detail in the “comments on results section” below, our data suggest that this is not a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and have expanded on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking, and the little data that exists is difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). In addition to the human and rodent data already discussed in the manuscript, additional support for increased activity in PD models includes:

      • Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      • Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      • Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We have included citation of these important examples in our revision. In our model, we have found that chronic hyperactivity causes a substantial loss of nigral DA terminals while mesolimbic terminals are relatively spared (Figure 2), and that striatal DA levels are markedly decreased (Figure S6), phenomena that are hallmarks of Parkinson’s disease.

      There are additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models a form of increased intrinsic activity, and interpretation of our results will be facilitated as we learn more about how the activity of DA neurons changes in humans in PD. Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether dopamine neuron loss can also be an initial driver of activity. Importantly, while significant changes to burst firing were not seen until almost complete loss of dopamine neurons, these recordings were made in anesthetized rats which may not be representative of neural activity in awake animals. We adjusted the text so that this is no longer referred to as ‘partial’ loss. At the same time, we point out that the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al., Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al., Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al., J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we also don’t know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al., Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. Accordingly, we have expanded on our citation of this literature in both the introduction and discussion sections. However, we believe that the novelty of our study lies in: 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Figure S1C (now Figure S2A, S2B), which was unchanged in CNO-treated animals compared to controls. We did not previously report the resting membrane potential because many of the DA neurons were spontaneously firing. In the revision, we now report the initial membrane potential on first breaking into the cell for the whole cell recordings, which did not vary between groups (Figure S2). This is still influenced by action potential activity, but is the timepoint in the recording least impacted by dialyzing the neuron with the internal solution, which might alter the intracellular concentrations of ions. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D), thus at least under these conditions these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole cell recordings (e.g. Figure S2). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S4B). This finding is also consistent with increased activity of the DA neurons. We have added discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, coexpressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD, and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with a floxed fluorescent reporter mouse line, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Our control animals give us an indicator of injection variability, which is likely substantial and prevents us from detecting more subtle changes. Nonetheless, we believe that it conveys useful complementary data. We discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. A "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals, such that only robust effects would be detected. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. Given this small effect size, we would indeed need much larger groups to better discern these changes. Stereology is an intensive technique, and we have therefore elected to focus on terminal loss. We have also replaced panel 2G with a more representative CNO image.
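
      The link between a small effect size and the need for much larger groups can be illustrated with a textbook power calculation. The sketch below uses the normal approximation for a two-sided, two-sample comparison; the function name `n_per_group` and the effect sizes are hypothetical values chosen for illustration, not estimates from the stereology data, and the approximation is slightly optimistic for very small groups.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate animals per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (difference in means / pooled SD).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardized effect sizes, from modest to very large.
for d in (0.5, 1.0, 2.0):
    print(f"d = {d}: about {n_per_group(d)} animals per group")
```

      Under these assumptions, groups of 4 or 5 animals are adequate only for very large standardized effects, which is consistent with only robust changes being detectable in an experiment of this size.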

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We have included a caveat of this in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic corrected traces for the same recording. In panel E, the traces follow time in ascending order. We have also included frequency and amplitude data for these recordings (Figure S4A), along with discussion of the significance of these findings.

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      While levels of DARPP32 mRNA were unchanged, our additional HPLC data show strong decreases in striatal dopamine in hyperactivated mice. We do not see strong changes in classic activity-related genes (data not shown); however, these genes may behave differently in the context of chronic hyperactivity and ongoing degeneration. Instead, we employed NEUROeSTIMator (Bahl et al., Nature Comm. 2024, PMID: 38278804), a deep learning method to predict neural activation based on transcriptomic data. We found that predicted activity scores were significantly higher in GqCNO dopaminergic regions compared to controls (Figure X). Indeed, some of the genes used within the model to predict activity are immediate early genes, e.g., c-fos.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing our mouse model to early PD samples when there is more limited SNc DA neuron loss (see the proportion of DA neurons within the areas of human tissues we selected for sampling in Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration to those in patients where degeneration is ongoing.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV). Control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by activating Gq pathways that are classically expected to increase intracellular calcium to increase neuronal excitability. Indeed, in slices from mice that were not treated with CNO, acute CNO application caused depolarizations (Figure 1E) that can both result from an increase in intracellular calcium and cause further increases in intracellular calcium. Additionally, our results show increased calcium by fiber photometry and changes to calcium-related genes, suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point. Indeed, a small preliminary experiment with chronic isradipine failed to show protection, although it lacked power to detect a partial effect. We have acknowledged this in the text, and also briefly consider other mechanisms such as increased dopamine levels that could also mediate the toxicity.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we sampled SN DA neurons in early PD (see Author response image 1), and in our view there is great value for such comparisons.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis, therefore we have included additional electrophysiology experiments and have added discussion of this important consideration.  

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150), while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We have amended our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature which has focused on shorter timescales. Thus, while we have revised the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the data that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that as with all preclinical models of PD, we cannot draw definitive conclusions about PD with this data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The temporal design of the experiments is quite confusing. For instance, Figures 1 and 3 illustrate the daily changes of the mice and suggest some critical time points within 2 weeks of CNO administration, whereas Figure 2 presents data at 2 and 4 weeks, which are much later than the proposed critical time points. Furthermore, Figure 4 includes only 1 week data, and lacks subsequent data from 2 and 4 weeks, at which significant changes such as calcium levels and neuronal/axonal degeneration are observed.

      While interesting behavior and calcium phenotypes were detected within 2 and 4 weeks of CNO administration (Figures 1 and 3), we only collected tissues for histology at the 2 and 4 week time points (Figure 2). Observing degeneration of DA neuron axons but not cell bodies at 2 weeks served as a rationale to extend to the 4 week time point to determine whether degeneration was progressive. At the same time, our primary focus is on identifying early changes that may drive or contribute to the degeneration. As such, we recorded calcium changes over a 2-week treatment period, capturing the period during which almost all of the dopamine axons are lost. Similarly, we had the capacity to perform spatial transcriptomics at only one time point, and the 1 week time point was selected to capture transcriptomic changes that precede and potentially contribute to the mild and severe degeneration that occurs at 2 and 4 weeks, respectively. We have added text clarifying the rationale for the time points chosen.

      (2) The authors showed the changes in neuronal firing in dopamine neurons by the administration of CNO. However, one of the most important features of dopaminergic neuronal activity is dopamine release at its axon terminals in the striatum. Thus, the claims raised in this paper would be better supported if the authors further show any alterations in dopamine release (by FSCV or fluorescent dopamine sensors) at some critical time points during or after CNO application.

      While we are confident that DA release is altered due to the significant changes in behavior when hM3Dq DREADDs are activated specifically in DA neurons, the current manuscript does not quantify this, or distinguish between axonal and somatodendritic DA release. Interestingly, we did find significantly decreased striatal dopamine by HPLC after chronic activation (Figure S6). We believe that resolving these questions is beyond the scope of this manuscript, but have added text indicating the importance of these experiments.

      (3) The authors used 2% sucrose as a vehicle via drinking water. Please explain the rationale behind this choice.

      We used 2% sucrose as the vehicle because it is also added to the CNO water to counteract the bitterness of CNO (Kumar et al., J Neurotrauma 2024, PMID: 37905504). We have clarified this in the manuscript.

      (4) As we know, mRNA levels of some genes do not always predict their protein levels; there is sometimes a huge discrepancy between mRNA and protein abundance. In this paper, the mechanistic interpretation of the results by the authors heavily relies on the spatial transcriptomics of the midbrain and striatum. Thus, the authors need to provide additional data proving that the gene expression of some genes in the CNO group is also changed at the level of protein.

      We agree that validating hits at the protein level is valuable, but we were limited in our ability to assess these changes for the revision. We have, however, done additional transcriptomics with the high-resolution Xenium platform to increase confidence in a subset of hits of interest for follow-up in future work, and we included data on genes related to DA metabolism and markers of DA neurons.

      (5) The authors provided spatial transcriptomics data only for mice with one week of chronic activation. However, other data also indicate significant differences when the activation period extends beyond 10 to 12 days (Figure 1C, Figure 3D-F). While a 7-day chronic activation time point might be crucial, additional transcriptomics data from later time points would be beneficial to confirm the persistence of these changes in gene expression. Furthermore, differential gene expression (DEG) analysis at these later time points could identify novel pathways or genes influenced by the chronic activation of dopamine neurons.

      This is an interesting point and would provide valuable data as to how chronic activity influences gene expression, however additional transcriptomics at later timepoints is beyond the scope of this paper. In future studies we will assess changes observed in this manuscript at other time points.

      (6) Figure 1D, Figure S1C:

      The authors should present the sample recording traces to demonstrate that the electrophysiological recordings were appropriately made.

      These data have been provided in Figure S2.

      (7) Figure S1C:

      AP thresholds in SNc dopamine neurons from both groups look quite high. In addition, considering the data from the previous reports, AP peak amplitudes in SNc dopamine neurons from both groups seem to be very low. Are these values correct? 

      The thresholds and peaks are correct, including the AP amplitude (threshold to peak), which is typical in our (Dr. Margolis’s) experience. AP thresholds are measured from an average of at least 10 APs, as the voltage at which the derivative of the trace first exceeds 10 V/s. As mentioned in the methods section, junction potentials were not corrected, which can result in values that are a bit depolarized from ground truth. This junction potential would be consistent across all recordings and thus would not impede detection of a difference in AP thresholds between groups of animals.
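      As an illustration of the threshold criterion described above, the following is a minimal sketch, not the authors' analysis code. It assumes each AP is available as a short voltage snippet in mV sampled at a known rate, that the threshold is read at the sample just before the slope first exceeds the cutoff, and that "an average of at least 10 APs" means averaging the per-AP thresholds. The function names `ap_threshold` and `mean_ap_threshold` are hypothetical.

```python
def ap_threshold(voltage_mv, sample_rate_hz, dvdt_cutoff_v_per_s=10.0):
    """Voltage (mV) at the start of the first interval where dV/dt exceeds the cutoff.

    Returns None if the slope never exceeds the cutoff.
    """
    dt_s = 1.0 / sample_rate_hz
    for i in range(1, len(voltage_mv)):
        # Forward difference, converted from mV/s to V/s.
        dvdt_v_per_s = (voltage_mv[i] - voltage_mv[i - 1]) / 1000.0 / dt_s
        if dvdt_v_per_s > dvdt_cutoff_v_per_s:
            return voltage_mv[i - 1]
    return None


def mean_ap_threshold(ap_snippets_mv, sample_rate_hz, min_aps=10):
    """Average the per-AP thresholds (assumed reading of 'average of at least 10 APs')."""
    thresholds = [ap_threshold(s, sample_rate_hz) for s in ap_snippets_mv]
    thresholds = [t for t in thresholds if t is not None]
    if len(thresholds) < min_aps:
        raise ValueError("fewer than min_aps action potentials with a detectable threshold")
    return sum(thresholds) / len(thresholds)


# Synthetic snippet sampled at 10 kHz: slopes of 5, 5, 30, 160, 400 V/s.
example_trace_mv = [-60.0, -59.5, -59.0, -56.0, -40.0, 0.0]
example_threshold_mv = ap_threshold(example_trace_mv, 10_000)
```

      Because an uncorrected junction potential shifts every sample by the same offset, it leaves the derivative unchanged and moves all thresholds equally, which is why it would not affect a between-group comparison.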

      (8) Figure 1E:

      It would be better if the statistical significance is depicted in the graph.

      We don’t perform repeated measures statistics across data like these, as the data are continuous, collected at 10 kHz. For ease of displaying the data, the data for each neuron is binned and then these traces are averaged together. We display SEM to give a sense of the variance across neurons. We have provided sample traces of individual neurons to better demonstrate the variability and significance of this data (Figure S2).
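      The binning and averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: bins are non-overlapping and of fixed size, a trailing partial bin is dropped, and SEM is the sample standard deviation across neurons divided by the square root of the number of neurons. The names `bin_trace` and `mean_and_sem` are hypothetical.

```python
import math


def bin_trace(samples, bin_size):
    """Average consecutive samples in non-overlapping bins; a trailing partial bin is dropped."""
    n_bins = len(samples) // bin_size
    return [
        sum(samples[b * bin_size:(b + 1) * bin_size]) / bin_size
        for b in range(n_bins)
    ]


def mean_and_sem(binned_traces):
    """Per-bin mean and SEM across neurons (one binned trace per neuron)."""
    n = len(binned_traces)
    if n < 2:
        raise ValueError("SEM needs at least two neurons")
    n_bins = min(len(trace) for trace in binned_traces)
    means, sems = [], []
    for b in range(n_bins):
        values = [trace[b] for trace in binned_traces]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
        means.append(mean)
        sems.append(math.sqrt(variance / n))
    return means, sems
```

      For a 10 kHz recording, a `bin_size` of 10,000 samples would correspond to 1 s bins; the bin width actually used is not stated in the response.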

      (9) Figure 2C:

      The representative staining images appear to be taken from coronal slices at anatomically different positions along the rostral-to-caudal axis. Although the total numbers of TH+ cells are comparable between vehicle and CNO groups in the graph, the sample images do not reflect this result. The authors should replace the current images with the better ones.

      We have replaced this image in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor concerns:

      (1) The authors claim that their transcriptomics experiments are conducted 'before any degeneration has occurred'. And they do not see significant differences in the TH expression in the striatum. However, the n for these mice at 1 week is lower than the n used at 2 weeks (n=5 vs n=8-9) and the images used to show 'no degeneration' really look like there is some degeneration going on. Also, throughout the paper, there is a stronger effect when degeneration is measured with mCherry compared to when it is measured with TH. The 'no change' claim is made only with the TH comparison. It seems possible (and almost likely) that there would be significant axonal degeneration at one week with either a higher sample size or using the mCherry comparison. The authors should simply claim that their transcriptomics data is collected before any 'somatic' degeneration occurs.

      Thank you, we have included data that shows partial terminal loss after one week of activation (Figure S3B, Figure S5A) and have corrected this language in the manuscript to reflect transcriptomics occurring before somatic degeneration.

      (2) While selective degeneration is one of the most interesting findings in the paper, that finding is not emphasized and why it would be interesting to compare the VTA vs SNc is not discussed in the introduction.

      Emphasis for comparing the VTA vs the SNc has been added to the introduction, along with additional electrophysiology data in VTA dopamine neurons in Figure 1 and Figure S2.

      (3) In a similar direction, the vulnerability of dopaminergic neurons has been shown to be differential even within the SNc, with the ventral tier neurons degenerating more severely and the dorsal tier neurons remaining resilient. Is there any evidence for a ventral-dorsal degeneration gradient in the SNc in these experiments?

      This is a really interesting point and changes to dopamine neuron subtypes along the ventral-dorsal axis may be occurring in this model, particularly as there is more selective loss of SNc neurons. However, the cell type involved would be difficult to determine at this stage, since single-cell transcriptomic resolution is necessary across the entire SNc to identify cell subtypes. Transcriptomic identification is further complicated given that transcriptome change has recently been shown with genetic manipulation (Gaertner et al., bioRxiv 2024, PMID: 38895448), and we would think it could similarly change with increased activity. Assessing these issues is beyond the scope of this paper.

      (4) The running data is very interesting and the circadian rhythm alterations are compelling. However, it is unclear whether the CNO mice run more total compared with the vehicle mice. The authors should show the combined total running data to evaluate this.

      We now show total running data in Figure 1C.

      (5) The finding that acute CNO has no effect on the membrane potential of SNc neurons after chronic CNO exposure is very peculiar! Especially because the fiber photometry data suggests that CNO continues to have an effect in vivo. Is there any explanation for this?

      While there is no acute electrophysiological response to CNO detected in this group, there may be intracellular pathways activated by the DREADD that do not acutely impact membrane potential in current clamp (I = 0 pA) mode.

      (6) The terminology of chronic CNO is sometimes confusing as it refers to both 2-week and 4-week administration. Using additional terminology such as 'early' and 'late' might help with clarity.

      We have decreased usage of ‘chronic,’ and increased usage of more specific treatment times in order to increase clarity throughout the manuscript.

      (7) In Figure 2C, the SNc image looks binarized.

      This image has been updated.

      (8) Also in Figure 2, why are TH and mCherry measured for the 4-week time point, but only TH measured for the 2-week time point?

      mCherry quantification was performed to further support the finding of DA neuron death, and was therefore not assessed at 2 weeks given that there was no change in the TH stereology.

      (9) Additional scale bars and labeling are needed in Figure 3. In addition, there is such a strong reduction in noise after chronic CNO in the fiber photometry recordings, and the noise does not return upon CNO washout. What is the explanation for this?

      Additional scale bars were added to Figure 3. Traces are not getting less noisy with chronic CNO treatment, rather, there is less bursting activity in the dopamine cells. Our interpretation is that the baseline activity is rescued during washout but this bursting activity is not.

      (10) While not necessary to support the claims in this paper, it would be very interesting to see if chronic inhibition of dopaminergic neurons had a similar or different effect, as too little dopaminergic activity may also cause degeneration in some cases.

      We agree that assessing chronic inhibition is valuable, and this is an important area for future research.

      Reviewer #3 (Recommendations For The Authors):

      All the mice used in the study are not listed in the methods section. For example, the GCaMP6f floxed mice discussed in the results section are not listed in the methods. Also, the breeding scheme used for the different mouse lines needs to be described. For example, did the DAT-Cre mice carry one or two alleles?

      Both the DAT<sup>IRES</sup>Cre and GCaMP6f floxed (Ai148) Jax mouse line numbers and RRIDs are included in the methods. DAT<sup>IRES</sup>Cre mice carried two alleles.

      In the methods section, the amount of virus injected needs to be mentioned.

      This information has been added to the methods section.

      In all result graphs, please include the individual data points so that the readers can see the distribution of the data and quickly see the sample size.

      Graphs have been updated to include all individual data points. For line graphs, the distribution is communicated by the error bars, while the n is in the legends.

      The authors provide running wheel data in supplementary figure 1A to validate that chemogenetic activation of dopamine neurons leads to increased locomotor activity. The results shown in the figure appear to be qualitative as no average data is presented. The authors should provide average data from all mice tested.

      Average IP response data for all mice assessed for running wheel activity has been included in Figure S1.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary, and Strengths:

      The authors and their team have investigated the role of Vimentin Cysteine 328 in epithelial-mesenchymal transition (EMT) and tumorigenesis. Vimentin is a type III intermediate filament, and cysteine 328 is a crucial site for interactions between vimentin and actin. These interactions can significantly influence cell movement, proliferation, and invasion. The team has specifically examined how Vimentin Cysteine 328 affects cancer cell proliferation, the acquisition of stemness markers, and the upregulation of the non-coding RNA XIST. Additionally, functional assays were conducted using both wild-type (WT) and Vimentin Cysteine 328 mutant cells to demonstrate their effects on invasion, EMT, and cancer progression. Overall, the data supports the essential role of Vimentin Cysteine 328 in regulating EMT, cancer stemness, and tumor progression. Overall, the data and its interpretation are on point and support the hypothesis. I believe the manuscript has great potential.

      The authors are thankful to the reviewers for carefully reading the manuscript and evaluating the data to make positive comments and supporting our conclusions.

      Weaknesses:

      Minor issues are related to the visibility and data representation in Figures 2E and 3 A-F

      We have revised the figures (Figure 2E and Figure 3A-F) to increase the data visibility.

      Reviewer #2 (Public review):

      The aim of the investigation was to find out more about the mechanism(s) by which the structural protein vimentin can facilitate the epithelial-mesenchymal transition in breast cancer cells.

      The authors focussed on a key amino acid of vimentin, C238, its role in the interaction between vimentin and actin microfilaments, and the downstream molecular and cellular consequences. They model the binding between vimentin and actin in silico to demonstrate the potential involvement of C238, but the outcome is described vaguely.

      We have expanded the discussion of these results in the manuscript to more explicitly describe the critical role of C238 in the vimentin-actin interaction. Specifically, we highlight that C238 lies within a region of the vimentin rod domain known to mediate key protein-protein interactions. Our modeling shows that the thiol group of C238 enables specific hydrogen bonding and potential disulfide-mediated interactions with actin, which are disrupted upon mutation to serine. These findings provide mechanistic insight into the functional importance of this residue.

      The phenotype of a non-metastatic breast cancer cell line MCF7, which doesn't express vimentin, could be changed to a metastatic phenotype when mutant C238S vimentin, but not wild-type vimentin, was expressed in the cells. Expression of vimentin was confirmed at the level of mRNA, protein, and microscopically. Patterns of expression of vimentin and actin reflected the distinct morphology of the two cell lines. Phenotypic changes were assessed through assay of cell adhesion, proliferation, migration, and morphology and were consistent with greater metastatic potential in the C238S MCF7 cells. Changes in the transcriptome of MCF7 cells expressing wild-type and C238S vimentins were compared and expression of Xist long ncRNA was found to be the transcript most markedly increased in the metastatic cells expressing C238S vimentin. Moreover changes in expression of many other genes in the C238S cells are consistent with an epithelial mesenchymal transition. Tumourigenic potential of MCF7 cells carrying C238S but not wild-type, vimentin was confirmed by inoculation of cells into nude mice. This assay is a measure of the stem-cell quality of the cells and not a measure of metastasis. It does demonstrate phenotypic changes that could be linked to metastasis.

      shRNA was used to down-regulate vimentin or Xist in the MCF7 C238S cells. The description of the data is limited in parts and data sets require careful scrutiny to understand the full picture. Down-regulation of vimentin reversed the morphological changes to some degree, but down-regulation of Xist didn't.

      This is understandable given the fact that vimentin interacts with actin which is known to determine cell shape. XIST being a non-coding RNA will not have the same effect.

      Conversely, down-regulation of XIST inhibited cell growth, a sign of reversing metastatic potential, but down-regulation of vimentin had no effect on growth.

      XIST is known to get induced in a number of cancers (see Figure 3E) which is consistent with our observation that its downregulation will inhibit cell growth. However, downregulation of vimentin had no effect on growth which is consistent with our previously published observation that ectopic expression of wildtype vimentin in MCF-7 cells did not influence cell growth (Usman et al Cells 2022, 11(24), 4035; https://doi.org/10.3390/cells11244035).

      Down-regulation of either did inhibit cell migration, another sign of metastatic reversal.

      We have previously shown that ectopic expression of wildtype vimentin in MCF-7 stimulates cell migration due to downregulation of CDH5 (endothelial cadherins) (Usman et al Cells 2022, 11(24), 4035). Therefore, downregulation of vimentin is expected to inhibit cell migration, which is what we observed in this study. Why downregulation of XIST inhibited cell migration is not clear. It is conceivable that XIST downregulation affects Lamin expression which may suppress intercellular interactions to increase cell migration. This hypothesis is supported by the fact that vimentin expression in MCF-7 affects Lamin expression (Usman et al Cells 2022, 11(24), 4035).

      The interpretation of this type of experiment is handicapped when full reversal of expression is not achieved, as was the case in this study.

      Full reversal of any biological effect is almost impossible to achieve because shRNAs by nature are not 100% effective. This can, however, be tested using CRISPR-Cas9 gene editing to completely knock out a protein (this can’t be used for XIST as it is a non-coding RNA). In that case one has to assume that it will have no off-target effect.

      Overall the study describes an intriguing model of metastasis that is worthy of further investigation, especially at the molecular level to unravel the connection between vimentin and metastasis. The identification of a potential role for Xist in metastasis, beyond its normal role in female cells to inactivate one of the X chromosomes, corroborates the work of others demonstrating increased levels in a variety of tumours in women and even in some tumours in men. It would be of great interest to see where in metastatic cells Xist is expressed and what it binds to.

      The authors fully agree that it is an interesting model of metastasis/oncogenesis that requires further investigation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Thank you for your valuable comments. We would like to extend our appreciation for your efforts. Your constructive suggestion would help improve our research.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      Thank you for your valuable comments. In the transcriptomic analysis, we included the data of HER2 positive breast cancer patients who received trastuzumab in I-SPY2 trial (GSE181574). Tumor tissues were used in this dataset. We highlighted the usage of “pre-treatment breast cancer tumors” in Line 309 and included the overview of transcriptomic data analysis in I-SPY2 trial in Figure S1F.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      Thank you for your valuable comments. We agree with your opinion, and the exploration of multiple cell lines would make our research findings more comprehensive. This is a limitation of our study, and we would continue to improve our design and methods in future experiments.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Thank you for your valuable comments. We agree with your suggestions. The exploration of metastatic disease would make our research more meaningful and help better address clinical key issues. In our future studies, we will continue to investigate the association between the invasive and metastatic capabilities of trastuzumab resistant HER2 positive breast cancer and cysteine metabolism.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      Thank you for your valuable comments. We added statistical information in our figure legends, including Line 849-850, Line 865-867, Line 881-882, Line 898-900, Line 910-911 and Line 923-924.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Thank you for your valuable comments. We have modified this figure to demonstrate it more clearly. In Figure 3K, the significance was determined by one-way ANOVA and the comparison presented was relative to the DMSO control. It was indicated that the combination of erastin or cysteine starvation and trastuzumab could increase lipid peroxidation, although trastuzumab monotherapy did not induce ferroptosis.

      Additionally, the combination of erastin and trastuzumab could result in more lipid peroxidation than erastin alone. Similar results were also found in the combination of cysteine starvation and trastuzumab. These results showed that targeting cysteine metabolism plus trastuzumab could have synergistic effects to induce ferroptosis in trastuzumab resistant HER2 positive breast cancer.

      Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines - JIMT1 as a representative of trastuzumab-resistant cell line, and SKBR3 as a representative of trastuzumab sensitive cell line. As such, these findings could be cell-line specific while irrelevant to trastuzumab sensitivity at all. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

      Thank you for your valuable comments. We agree with your suggestions. Your feedback would help enhance the quality of our research.

      (1) Our research was mainly conducted in JIMT1 (trastuzumab resistant) and SKBR3 (trastuzumab sensitive), and this is a limitation of our study. The experimental validation using different cell lines will make our research findings more persuasive. In our future research, we will continuously optimize experimental design and methods to make our findings more comprehensive.

      (2) The detection of ferroptosis in our research was mainly performed by evaluating the lipid peroxidation. Experiments measuring cell viability and rescuing effects would help provide more evidence.

      We utilized CCK8 tests to compare cell viabilities of JIMT1 and SKBR3 at different erastin and RSL3 concentrations, as well as at different exposure times of cysteine starvation. It was shown that JIMT1 was more sensitive to erastin and RSL3, but tolerant to cysteine starvation, which was consistent with the previous lipid peroxidation tests. This data was included in Figure S5C-E. We added the description in Line 375-379.

      In addition, we also performed experiments to explore the rescuing effects of ferroptosis inhibitor Fer-1. It was indicated that Fer-1 could suppress the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provided more evidence that cysteine metabolism played a vital role in modulating HER2 positive breast cancer ferroptosis. This data was included in Figure S5G and S5H. We added the description to Line 387-391.

      (3) In xenograft experiments, the cysteine starvation was performed by feeding cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Line 236-237 in Methods.

      We agree with your opinion on the role of erastin in experiments in vivo. We have tried to optimize drug dissolution and other conditions by referring to previous relevant literature. We would continue to improve our experimental design and methods.

      (4) Epigenetic modifications have been recognized as crucial factors in drug resistance formation. An increasing number of studies have emphasized the importance of epigenetic changes in regulating the abnormal expression of oncogenes and tumor suppressor genes related to drug resistance. Currently, the role of epigenetic changes in the development of trastuzumab resistance in HER2 positive breast cancer is still in exploration. We tried to investigate the dysregulation of histone modifications and DNA methylation in trastuzumab resistant HER2 positive breast cancer. Our findings indicated that targeting H3K4me3 and DNA methylation could decrease SLC7A11 expression and induce ferroptosis. This would provide more evidence in exploring trastuzumab resistance mechanisms. We have provided a detailed discussion on Line 598-607.

      We would like to extend our appreciation for your constructive suggestions and continue to improve our research in future experiments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Line 334: it would be helpful to clarify that JIMT1 cells are trastuzumab-resistant while SKBR3 cells are trastuzumab sensitive, especially for those not familiar with breast cancer cell lines.

      Thank you for your valuable recommendations. We added the description of trastuzumab sensitive SKBR3 and trastuzumab resistant JIMT1 on Line 334-335.

      (2) Figure 3: the concentrations of erastin and RSL3 should be indicated.

      Thank you for your valuable recommendations. In Figure 3, the concentration of erastin was 10 μM and RSL3 was 1 μM. We added these details in the figure legends on Line 872-873.

      (3) Figure 3: lipid peroxidation does not necessarily mean ferroptosis. Cell viability data and rescuing effects of ferroptosis inhibitors should be shown.

      Thank you for your valuable recommendations. As we mentioned above, we utilized CCK8 tests to compare cell viabilities of JIMT1 and SKBR3 at different erastin and RSL3 concentrations, as well as at different exposure times of cysteine starvation. It was consistent with lipid peroxidation tests that JIMT1 was more sensitive to erastin and RSL3, but tolerant to cysteine starvation. This data was included in Figure S5C-E. We added the description in Line 375-379.

      As described above, we also performed experiments to explore the rescuing effects of ferroptosis inhibitor Fer-1. It was indicated that Fer-1 could suppress the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provided more evidence that cysteine metabolism played a vital role in modulating HER2 positive breast cancer ferroptosis. This data was included in Figure S5G and S5H. We added the description to Line 387-391.

      (4) Figure 3H: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. We performed cell culture with cysteine starvation by utilizing cystine/cysteine-deficient DMEM (BIOTREE) and 1% penicillin-streptomycin at 37℃ with 5% CO2. We added details of this medium on Line 141-143 in Methods.

      (5) Figure 4: the meaning of "H" should be clarified.

      Thank you for your valuable recommendations. H was indicated as trastuzumab. We clarified the meaning of “H” in the figure legends on Line 898.

      (6) Figure 4B & 4C: the data of "H" group and "Erastin" group are inconsistent.

      Thank you for your valuable recommendations. In the in vivo experiments, the tumor volume changes were analyzed using a paired approach, comparing the tumor size of each individual mouse before and after treatment. We noticed the confusion caused and added more details about our in vivo experiments on Line 240 in Methods and Line 892-893 in figure legends.
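      To make the paired approach concrete, the following is a minimal sketch, not the authors' analysis. The response does not name the statistical test, so the paired t statistic used here is an assumption, and the function name `paired_change` and the example volumes are hypothetical. The key point is that each mouse serves as its own baseline, so the analysis runs on per-mouse differences rather than on group means.

```python
import math


def paired_change(before, after):
    """Per-mouse change (after - before), its mean, and a paired t statistic.

    The choice of a paired t statistic is an assumption for illustration.
    """
    if len(before) != len(after):
        raise ValueError("each mouse needs one 'before' and one 'after' measurement")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two mice")
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # sample variance
    if variance == 0:
        raise ValueError("all differences are identical; t statistic is undefined")
    t_stat = mean_diff / math.sqrt(variance / n)
    return diffs, mean_diff, t_stat


# Hypothetical tumor volumes (mm^3) for three mice, before and after treatment.
example_diffs, example_mean, example_t = paired_change(
    [100.0, 120.0, 90.0], [150.0, 160.0, 150.0]
)
```

      A paired analysis of this kind can give a different picture from an unpaired comparison of group means, which may explain an apparent inconsistency between panels that summarize the same animals in different ways.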

      (7) Figure 4: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. We performed cysteine starvation by utilizing cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Line 236-237 in Methods.

      We have also corrected some grammatical errors in the manuscript. We would like to extend our great appreciation to all editors and reviewers for their invaluable contributions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Summary of revisions:

      Thanks to the careful review and comments from the reviewers, we restructured the introduction and the discussion to improve clarity and better contextualise findings. We notably discuss further the f<sub>sphere</sub> decrease observations in the cerebellum and the Tau-specific findings (Tau being a possible marker for Purkinje cells development and Tau switching compartment in the thalamus). We added material in Supplementary Information to support these discussion points. We added a figure to show the metabolic profiles normalised by water or by macromolecules and a figure and table related to a rough approximation of f<sub>sphere</sub>, leaning on existing literature. We report the DTI results for thoroughness.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Ligneul and coauthors implemented diffusion-weighted MRS in young rats to follow longitudinally and in vivo the microstructural changes occurring during brain development. Diffusion-weighted MRS is here instrumental in assessing microstructure in a cell-specific manner, as opposed to the claimed gold-standard (manganese-enhanced MRI) that can only probe changes in brain volume. Differential microstructure and complexification of the cerebellum and the thalamus during rat brain development were observed noninvasively. In particular, lower metabolite ADC with increasing age were measured in both brain regions, reflecting increasing cellular restriction with brain maturation. Higher sphere (representing cell bodies) fraction for neuronal metabolites (total NAA, glutamate) and total creatine and taurine in the cerebellum compared to the thalamus were estimated, reflecting the unique structure of the cerebellar granular layer with a high density of cell bodies. Decreasing sphere fraction with age was observed in the cerebellum, reflecting the development of the dendritic tree of Purkinje cells and Bergmann glia. From morphometric analyses, the authors could probe non-monotonic branching evolution in the cerebellum, matching 3D representations of Purkinje cells expansion and complexification with age. Finally, the authors highlighted taurine as a potential new marker of cerebellar development.

      From a technical standpoint, this work clearly demonstrates the potential of diffusion-weighted MRS at probing microstructure changes of the developing brain non-invasively, paving the way for its application in pathological cases. Ligneul and coauthors also show that diffusion-weighted MRS acquisitions in neonates are feasible, despite the known technical challenges of such measurements, even in adult rats. They also provide all necessary resources to reproduce and build upon their work, which is highly valuable for the community.

      From a biological standpoint, claims are well supported by the microstructure parameters derived from advanced biophysical modelling of the diffusion MRS data. The assumption of metabolite compartmentation, forming the basis of cell-specific microstructure interpretation of dMRS data, remains debated and should be considered with care (Rae, Neurochem Res, 2014, https://doi.org/10.1007/s11064-013-1199-5). External cross-validation of some of the authors' claims, in particular taurine in the thalamus switching from neurons to astrocytes during brain development, would be a highly valuable addition to this study.

      R1.1: We understand the reviewer's concerns. Metabolic compartmentation is not a one-to-one correspondence. Although we interpret the results in the light of metabolic compartmentation, our results are not driven by this assumption. We could not perform a direct cross-validation of the taurine switch in the thalamus, but we now clarify in the discussion why the dMRS results themselves indicate a switch, and we integrate our results better with existing literature on taurine. We now discuss this in more detail for the cerebellar results too.

      Specific strengths:

      (1) The interpretation of dMRS data in terms of cell-specific microstructure through advanced biophysical modelling (e.g. the sphere fraction, modelling the fraction of cell bodies versus neuronal or astrocytic processes) is a strong asset of the study, going beyond the more commonly used signal representation metrics such as the apparent diffusion coefficient, which lacks specificity to biological phenomena.

      (2) The fairly good data quality despite the complexity of the experimental framework should be praised: diffusion-weighted MRS was acquired in two brain regions (although not in the same animals) and longitudinally, in neonates, including data at high b-values and multiple diffusion times, which altogether constitutes a large-scale dataset of high value for the diffusion-weighted MRS community.

      (3) The authors have shared publicly data and codes used for processing and fitting, which will allow one to reproduce or extend the scope of this work to disease populations, and which goes in line with the current effort of the MR(S) community for data sharing.

      Specific weaknesses:

      (1) This work lacks an introduction and a discussion about diffusion MRI, which is already a validated technique to assess brain development non-invasively. Although water lacks cell-specificity compared to metabolites, several studies have reported a decrease in water ADC and increased fractional anisotropy with brain maturation, associated with the myelination process and decreased water content (overview in Hüppi, Chapt. 30 of "Diffusion MRI: Theory, Methods, and Applications", Oxford University Press, 2010). Interestingly, the same observations are found in this work (decreased ADC with age for most metabolites in both brain regions), which should have been commented on. Moreover, the authors could have reported water diffusion properties in addition to metabolites', as I believe the water signal, used for coil combination and/or eddy current corrections, is usually naturally acquired during diffusion-weighted MRS scans.

      R1.2: Thank you for these helpful suggestions. We have now improved our introduction of the various modalities, and we contextualise the study in light of previous DTI findings, as suggested by the reviewer. We agree with the reviewer that the comparison with previous human DTI is relevant, and we now mention it at the beginning of the discussion. However, the very different nature of the dMRS signal compared to dMRI (intracellular and absence of exchange for metabolites) prevents us from drawing any strong conclusions.

      (2) It is unclear why the authors have normalized metabolite concentrations (measured from low b-values diffusion-weighted MRS spectra) to the macromolecule concentrations. First, it is not specified whether in vivo macromolecules were acquired at each age or just at one time point. Second, such ratios are not standard practice in the MRS community so this choice should have been explained. Third, the macromolecule content was reported to change with age (Tkac et al., Magn Reson Med, 2003), therefore a change in metabolite to macromolecule ratio with age cannot be interpreted unequivocally.

      R1.3: We agree with the reviewer that this needed further explanations. We now clarify in the Results section “Metabolic profile changes with age” the reasoning behind choosing macromolecules for normalisation. We also added in the Supplementary Information the metabolite concentrations change with age when normalising by water, and a direct comparison with MM normalisation (Figure S2).

      (3) Some discussion is missing about the choice of the analytical biophysical model (although a few are compared in Supplementary Materials), in particular: is a model of macroscopic anisotropy relevant in the cerebellum, made of a large fraction of oriented white matter tracts, and does the model remain valid at different ages given white matter maturation and the ongoing myelination process?

      R1.4: We agree with the reviewer that this is a valid concern. We actually acquired some standard DTI at the end of the acquisition sessions (where possible) having in mind the fibre dispersion estimation. However, data could not be acquired in all animals, and the data quality was poor (see Figure S8, the experimental conditions would have required further optimisation). We now add a couple of sentences at the beginning and in the end of discussion to address this limitation, and we include the DTI data in Supplementary Information.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to non-invasively track neuronal development in rat neonates, which they achieved with notable success. However, the direct relationship between the results and broader conclusions regarding developmental biology and potential human implications is somewhat overstretched without further validation.

      Strengths:

      If adequately revised and validated, this work could have a significant impact on the field, providing a non-invasive tool for longitudinal studies of brain development and neurodevelopmental disorders in preclinical settings.

      Weaknesses:

      (1) Consistency and Logical Flow:

      The manuscript suffers from a lack of strategic flow in some sections. Specifically, transitions between major findings and methodological discussions need refinement to ensure a logical progression of ideas. For example, the jump from the introduction of developmental trajectories and the technicalities of MRS (Magnetic Resonance Spectroscopy) processing on page 3 could benefit from a bridging paragraph that explicitly states the study's hypotheses based on existing literature gaps.

      R2.1: Thank you for this general feedback (along with your point (3)) that helped us restructure the introduction and the discussion to improve the clarity and flow.

      (2)  Scientific Rigour:

      While the novel application of diffusion-weighted MRS is commendable, there's a notable gap in the rigorous validation of this approach against gold-standard histological or molecular techniques. Particularly, the assertions regarding the sphere fraction and morphological changes inferred from biophysical modelling mandate direct validation to solidify the claims made. A study comparing these in vivo findings with ex vivo confirmation in at least a subset of samples would significantly enhance the reliability of these conclusions.

      R2.2: We agree with the reviewer that this would have been a great addition to the manuscript. Although we could not run new experiments to address these flaws, we now discuss the results more quantitatively, leaning on existing literature (addition of Figure S11 and Table S2). This helps us understand the results around Tau in both regions better, and illustrate the R<sub>sphere</sub> trend.

      (3) Clarity and Novelty:

      - The manuscript often delves deeply into technical specifics at the expense of accessibility to readers not deeply familiar with MRS technology. The introduction and discussions would benefit from a clearer elucidation of why these specific metabolite markers were chosen and their known relevance to neuronal and glial cells, placing this in the context of what is novel compared to existing literature.

      - The novelty aspect could be reinforced by a more structured discussion on how this method could change the current understanding or practices within neurodevelopmental research, compared to the current state of the art.

      R2.3: See answer to (1). By restructuring the introduction and the discussion, we hope to have addressed this point. We now discuss how these findings compare to the state of the art (notably added comparison with dMRI research). Along with the next comment, we better discuss potential implications of these findings for neurodevelopmental research.

      (4) Completeness:

      - The Discussion section requires expansion to offer a more comprehensive interpretation of how these findings impact the broader field of neurodevelopment and psychiatric disorders. Specifically, the implications for human studies or clinical translation are touched upon but not fully explored.

      - Further, while supplementary material provides necessary detail on methodology, key findings from these analyses should be summarized and discussed in the main text to ensure the manuscript stands complete on its own.

      R2.4: Thank you for these helpful suggestions. We now integrate the findings better into the existing literature. We notably discuss how the results might translate to humans.

      (5) Grammar, Style, Orthography:

      There are sporadic grammatical and typographical errors throughout the text which, while minor, detract from the overall readability. For example, inconsistencies in metabolite abbreviations (e.g., tCr vs Cr+PCr) should be standardized.

      R2.5: Thank you for the careful review. This has been corrected.

      (6) References and Additional Context:

      The current reference list is extensive but lacks integration into the narrative. Direct comparisons with existing studies, especially those with conflicting or supportive findings, are scant. More dedicated effort to contextualize this work within the existing body of knowledge would be beneficial.

      R2.6: Because the nature of this work is novel, it is difficult to find directly conflicting/similar works. However, we now integrate the findings into the broader literature.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Thank you for the careful review, we have addressed most of the minor comments, except for the last one, which we discuss below.

      - Some figures appear blurred in the printed PDF

      - Introduction: "constrained and hindered by cell membranes," - maybe use "restricted" instead of "constrained", like everywhere else in the text

      - Introduction: "(typically ~8cm3 vs ~8mm3 in dMRI in humans)" - here I suggest to put the rat brain sizes instead to help the reader understand how small the voxel was at P5 in this study, thus explaining the challenges

      - Fig 1 - numbers 1 and 2 on panel A,B should be clarified and they do not match 1 and 2 on panel C, which is confusing

      - Fig 2 - I am guessing the large dots are the mean and small are individual data points? Please clarify

      - Please specify "Relative CRLB" rather than just "CRLB", in supp. mat as well

      - Fig 3 - title of panel B, I would change "signal" into "concentration"

      - Fig 3 - end of caption: "and levelled to get Signal(tCr,P30)/Signal(MM,P30)=8", I think "in the thalamus" is missing

      - The results section "Biophysical modelling underlines different developmental trajectories of cell microstructure between the cerebellum and the thalamus" is sometimes imprecise, e.g.: "Cerebellum: The sphere fraction and the radius estimated from tNAA diffusion properties vary with age." but the tNAA sphere fraction seems to vary more with age in the thalamus according to Table 1; "Cerebellum: fsphere decreases from 0.63 (P10) to 0.41 (P30), but R is stable" - this is for tCr, I presume

      - Table 1 - "pvalues" please add "before multiple comparison correction"

      - Figure 5 - Panel B, the L-segment subpanel is unclear -which metabolites is it referring to? Why does Tau have a * in panel A?

      - Update Ref 37 to the journal version

      - Methods: "A STELASER (Ligneul et al., MRM 2017) sequence", add numbered reference instead

      - Please specify that the DIVE toolbox uses Gaussian phase distribution approximation, it is important for the dMRS reader given that your diffusion gradient length is long and cannot be neglected, and that the SGP approximation does not apply.

      The Gaussian phase distribution approximation and the SGP approximation are two different concepts. The gradient duration δ (7 ms) is short compared to the gradient separation Δ (100 ms), but it could still be considered too long for the SGP approximation to hold. However, the gradient duration is accounted for in DIVE in any case.
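
      To make the role of the finite gradient duration concrete, the textbook Stejskal-Tanner expression for free (Gaussian) diffusion with rectangular gradient pulses can be compared with its short-gradient-pulse (SGP) limit. This is offered only as an illustration under that simplifying assumption; it is not the restricted-diffusion expression implemented in DIVE, and the symbols γ (gyromagnetic ratio) and G (gradient amplitude) are introduced here for the example.

```latex
% Finite-duration rectangular pulses, free diffusion (Stejskal-Tanner):
b = \gamma^{2} G^{2} \delta^{2} \left( \Delta - \frac{\delta}{3} \right)

% SGP limit (\delta \to 0 at fixed G\delta):
b_{\mathrm{SGP}} = \gamma^{2} G^{2} \delta^{2} \Delta

% Ratio for \delta = 7\,\mathrm{ms}, \Delta = 100\,\mathrm{ms}:
\frac{b}{b_{\mathrm{SGP}}} = 1 - \frac{\delta}{3\Delta}
  = 1 - \frac{7}{300} \approx 0.977
```

      For free diffusion the correction is therefore about 2%. For restricted compartments, whether the SGP approximation holds depends on how δ compares with the time needed to diffuse across the restriction rather than on the ratio δ/Δ alone, which is why modelling the gradient duration explicitly matters.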

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C<sup>Cdc20</sup> ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation, both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility; we have clarified this statement (page 11) to say that we reverted to Ala, as it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section has been clarified (page 16). The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      The section now reads:

      “We therefore assume that this is the reason for the lack of observed density in this region of the peptides D20 and D21 (Fig. S3E and S3F, respectively). We believe that it causes a reduction in binding affinities of all peptides in crystallo, given the evidence from SPR highlighting a role of position 7 in the interaction (Table 1). Interestingly, the observed electron density of the peptide correlates with Cdc20 binding affinity: D21 and D20, having the highest affinities, display the clearest electron density allowing six amino acids to be modeled, whereas D7 shows relatively poor density permitting modelling of only four residues. For D19, the lack of density observed likely reflects its intrinsically weaker affinity compared to the other peptides, in addition to losing the interactions from position 7 due to crystal packing.”

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by honing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      We have added the following text to the Results section “Design of D-box peptides” (page 10):

      “We focused on D-box peptides, as there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study that tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated (Qin et al. 2017). They observed that, whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study (Hartooni et al. 2022) of binding affinities of different degron peptides concluded that KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.”

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells: these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation.  Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008,  Enquist-Newman et al. 2008,  Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells: these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation.  Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008,  Enquist-Newman et al. 2008,  Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) On page 12 (towards the end), the author stated D10 contained an A3P mutation, they meant P3A right? 'To test this hypothesis, we proceeded to synthesise D10, a derivative of D4 containing an A3P single point mutation.'

      We thank the reviewer for spotting this typo, which we have corrected.

      (2) Have the authors considered other orthogonal approaches to cross-examine/validate binding affinities? That said, I do not think extra experiments are necessary.

      We did not explore further orthogonal approaches due to the challenges of producing sufficient amounts of the Cdc20 protein. Due to the low affinities of many peptides for Cdc20, many techniques would have required more protein than we were able to produce. We believe that the qualitative TSA combined with the SPR is sufficient to convince the readers; indeed, there is a correlation between SPR-determined binding affinities and the thermal shifts. For the natural amino acid-containing peptides (Table 1), D19 has the highest affinity and causes the largest thermal shift in the Cdc20 melting temperature, D10 has the lowest affinity and causes the smallest thermal shift, and D1, D3, D4, and D5 all rank in the middle by both techniques. For those peptides containing unnatural amino acids (Table 2), again higher affinities are reflected in larger thermal shifts.

      Reviewer #2 (Recommendations for the authors):

      The data seem fine to me. I would appreciate a little more detail on the points mentioned in the public review. Also a thorough reread, maybe by a disinterested party as there are various typos that could be corrected - all in all an excellent clear paper that encompasses a lot of work.

      A colleague has carefully checked the manuscript, and typos have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via various deletions such as lro1, dga1, and pah1 deletion impacts DAG and stress signaling. Overall this is an interesting study that adds new data to how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      We thank the reviewer for finding this study of interest and appreciating our multi-pronged approach to prove our hypothesis that a distinct pool of DAGs regulated by Dip2 activate PKC signalling.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deleting Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction), or by Lro1 or Dga1 deletion, would broaden the scope of the study.

      We thank the reviewer for the suggestion to trace the localization of Dip2 in the absence of various DAG-acting enzymes. To address this, we generated Dip2-GFP knock-in (KI) in Δpah1, Δlro1 and Δdga1 strains, confirming successful integration by western blotting using an anti-GFP antibody. We then performed microscopy to examine the localization of Dip2. Since Dip2 is a mitochondria-vacuole contact site protein that predominantly localizes to mitochondria (approximately 60% puncta of Dip2 localize to mitochondria) (Mondal et al. 2022), we co-stained the cells with MitoTracker red to visualize mitochondria.

      Consistent with our previous findings, Dip2 colocalizes with the MitoTracker red in WT (Figure 3-figure supplement 2A). As suggested by the reviewer, we deleted PAH1, which converts phosphatidic acid to DAGs and is also known to work at the nucleus-vacuole junction. On examining whether absence of PAH1 influences the localization of Dip2, we found that there is no change in Dip2’s spatial organization. This could also be due to no observable change in the DAG species on deleting PAH1, as noted in our lipidomic studies (Figure 4-figure supplement 2A). These observations suggest that in a homeostatic condition, Pah1 does not affect the DAG pool acted upon by Dip2 and therefore has no influence on Dip2’s subcellular localization. This data has been incorporated in the revised manuscript (line no. 286-289) and Figure 4-figure supplement 2D-E.

      Similarly, we probed for the localization of Dip2 in LRO1 and DGA1 knockout strains. These enzymes are responsible for converting bulk DAGs to TAGs. We have previously shown that Dip2 is selective for only C36:0 and C36:1 and does not act on the bulk DAGs (Mondal et al. 2022). Both Lro1 and Dga1 are endoplasmic reticulum (ER) resident proteins and the bulk DAG accumulation in their knockouts is shown to be in the ER (Li et al. 2020), not influencing the mitochondrial DAG pool. On tracing Dip2’s localization in these knockouts, we found that Dip2 remains in the mitochondria (Figure 3-figure supplement 2, Figure 4-figure supplement 2D-E). These results suggest that Dip2 localization is not influenced by bulk DAG accumulation, reinforcing its specificity toward selective DAGs, which are likely to be present at mitochondria and mitochondria-vacuole contact sites. We have added this data in the revised manuscript (line no. 240-246) with Figure 3-figure supplement 2.

      Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      We would like to thank the reviewer for the positive comments on our work and finding the study novel and interesting.

      Weaknesses:

      More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      We thank the reviewer for the insightful comments. We were unable to include C36:1 DAG in our in vitro DAG binding assays because it is not commercially available. We have now explicitly mentioned it in the revised manuscript (Line no. 186).

      We agree with the reviewer that activation of PKC by C36:0 and C36:1 DAGs is a critical conclusion of our work. While we understand that there is no obvious structural explanation as to how the DAG-binding C1 domain of PKC attains the acyl chain specificity for DAGs, our conclusion that yeast Pkc1 is selective for C36:0 and C36:1 DAGs is supported by a combination of robust in vitro and in vivo data:

      (1) In Vitro Evidence: The liposome binding assays demonstrate that the Pkc1 C1 domain binds only to the selective DAG and does not interact with bulk DAGs.

      (2) In Vivo Evidence: Lipidomic analyses of wild-type cells subjected to cell wall stress reveal increased levels of C36:0 and C36:1 DAGs, while levels of bulk DAGs remain unaffected.

      These findings collectively indicate that Pkc1 neither binds nor is activated by bulk DAGs, reinforcing its specificity for C36:0 and C36:1 DAGs.

      Moreover, establishing the structural basis of this selectivity would require a specific DAG-bound C1 domain structure of Pkc1, which is difficult to obtain owing to the flexibility of the longer acyl chains present in C36:0 and C36:1 DAGs. In addition, capturing the full-length Pkc1 structure that might provide deeper insights has been challenging for several other groups. Also, we hypothesize that the DAG selectivity of Pkc1 is more of a membrane phenomenon wherein these DAGs might create a specific microdomain or form a particular curvature that is sensed by Pkc1. Investigating this would require extensive structural and biophysical studies that are beyond the scope of the current work but are planned for future research.

      (2) Does Dip2 colocalize with Plc1 or Pkc1?

      As shown in our previous study (Mondal et al. 2022) and in the above section (Figure 3-figure supplement 2A-B), Dip2 predominantly localizes to the mitochondria. Pkc1, on the other hand, is known to be found in the cytosol, plasma membrane and bud site (Andrews and Stark 2000). We also checked the localization of Pkc1 in cells co-stained with MitoTracker red and observed no significant overlap between the two, confirming that Pkc1 does not colocalize with Dip2 (Author response image 1).

      Author response image 1.

      Live cell microscopy for tracing Pkc1 localization. (A) Microscopy image panel showing DIC image (left), fluorescence for Pkc1 tagged with GFP, MitoTracker red for staining mitochondria, and the merged image for both fluorophores (right). Scale bar represents 5 µm. (B) Line scan plotted for the fluorescence intensity of Pkc1-GFP along with MitoTracker red across the line shown in the merged panel.

      Moreover, as suggested by the reviewer, we also checked the localization of Plc1 and found that Plc1 is present in the cytosol and shows partial colocalization with the mitochondria (Figure 4-figure supplement 3A-B). As some puncta of Dip2 also colocalize with the vacuoles, we checked whether Plc1 follows a similar localization pattern. We co-stained Plc1-GFP with FM4-64, a vacuolar membrane dye, and observed that Plc1 partially localizes to vacuoles as well (Figure 4-figure supplement 3C-D). This was also observed in a previous study in which Plc1 was found in a subcellular fractionation of isolated yeast vacuoles and total cell lysate (Jun, Fratti, and Wickner 2004). We also checked whether Plc1, like Dip2, localizes to the mitochondria-vacuole contact site by using tri-colour imaging with FM4-64 for vacuoles, DAPI for mitochondria and GFP-tagged Plc1. We were not able to trace Dip2 and Plc1 simultaneously as we could not generate a strain endogenously tagged with two different colours even after several attempts. However, from our observations, we can conclude that Plc1 partially localizes to mitochondria and vacuoles and might be locally producing the selective DAGs to be acted upon by Dip2. We have incorporated this data in the revised manuscript (line no. 301-304) with Figure 4-figure supplement 3.

      For probing the localization of Dip2 upon Plc1 activation, we used cell wall stress, a condition inducing Plc1 activation for selective DAG production (this study). Under this condition, we probed the localization of Dip2 by fluorescence microscopy and found that Dip2 does not move to the plasma membrane but remains localized to mitochondria (Figure 1-figure supplement 3). This result has been added in the revised manuscript (line no. 153-160) with Figure 1-figure supplement 3.

      This raises intriguing questions regarding the spatial regulation of Pkc1 by Dip2. Since Dip2’s localization remains unaffected, whether the selective DAGs, presumably at the mitochondria, move to the plasma membrane for Pkc1 activation or whether Pkc1 translocates to the mitochondria needs further exploration. Addressing these possibilities will require a combination of genetic approaches, organellar lipidomics, and advanced microscopy, which we aim to explore in future studies.

      References:

      Andrews, P. D., and M. J. Stark. 2000. “Dynamic, Rho1p-Dependent Localization of Pkc1p to Sites of Polarized Growth.” Journal of Cell Science 113 (Pt 15): 2685–93. doi:10.1242/jcs.113.15.2685.

      Jun, Youngsoo, Rutilio A. Fratti, and William Wickner. 2004. “Diacylglycerol and Its Formation by Phospholipase C Regulate Rab- and SNARE-Dependent Yeast Vacuole Fusion.” Journal of Biological Chemistry 279(51): 53186–95. doi:10.1074/jbc.M411363200.

      Li, Dan, Shu-Gao Yang, Cheng-Wen He, Zheng-Tan Zhang, Yongheng Liang, Hui Li, Jing Zhu, et al. 2020. “Excess Diacylglycerol at the Endoplasmic Reticulum Disrupts Endomembrane Homeostasis and Autophagy.” BMC Biology 18(1): 107. doi:10.1186/s12915-020-00837-w.

      Mondal, Sudipta, Priyadarshan Kinatukara, Shubham Singh, Sakshi Shambhavi, Gajanan S Patil, Noopur Dubey, Salam Herojeet Singh, et al. 2022. “DIP2 Is a Unique Regulator of Diacylglycerol Lipid Homeostasis in Eukaryotes.” eLife 11: e77665. doi:10.7554/eLife.77665.

    1. Author response:

      We wish to express our gratitude to the reviewers for their insightful and constructive comments on the initial version of our manuscript. We greatly value their observations and have every intention of addressing their remarks in a thorough and constructive manner. Based on the editors’ and reviewers’ feedback, we realize that it was not entirely clear that we intended this work primarily to be a resource and not yield strong insights into DNN-human alignment. Since our method also covers the broad range of natural objects - as used in the vast majority of studies on object processing - we also feel we did not sufficiently highlight the breadth of the tool. Based on the editors’ assessment, our explorations into the limits of the method - which we saw as a strength, not a weakness of our work - perhaps overshadowed the otherwise broad applicability somewhat. We hope to clarify this in the revised manuscript. Beyond these general points, we would like to address the following four points:

      • Where feasible, we intend to undertake additional analyses and refine existing ones. For instance, we plan to provide noise ceilings for all datasets where such calculations are possible, and we plan to give careful consideration to implementing a permutation or label-shuffling test to explore some of the ideas shared by the reviewers.

      • We plan to discuss more thoroughly several topics raised by the reviewers (e.g., how our approach might contend with different experimental situations, such as when line drawings are used as stimuli).

      • We aim to enhance the clarity of our manuscript throughout. This will include refining the wording of our abstract and offering a more detailed explanation of the methods employed in the fMRI analyses.

      • We plan to elaborate further on our line of reasoning by addressing potential sources of misunderstanding—such as clarifying what we mean by a “lack of data” and providing greater detail regarding the nature of the 49-dimensional embedding.
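
      The label-shuffling test mentioned in the first point above could take roughly the following form. This is only a minimal sketch under stated assumptions, not the analysis planned for the revision: the choice of Pearson correlation as the test statistic, the function names `pearson` and `label_shuffle_test`, and the default number of permutations are all hypothetical and chosen for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def label_shuffle_test(x, y, n_perm=999, seed=0):
    """Two-sided permutation test for the correlation between x and y.

    The pairing between x and y is broken by shuffling y; the p-value is
    the fraction of shuffles whose |r| is at least the observed |r|,
    with an add-one correction so that p is never exactly zero.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

      In such a sketch, `x` and `y` would stand for any two paired sets of values whose correspondence is to be tested; which quantities are actually paired would depend on the analysis in question.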

    1. Author response:

      The evidence supporting this mechanism is incomplete, with additional work needed to clarify SHP-1's role, the contribution of Fc receptor crosslinking, and the biological relevance across normal and malignant B cells. 

      We will address these points by:

      - including SHP-1 inhibitors in the DuoHexaBody-CD37 cytotoxicity experiments to address the role of SHP-1

      - investigating which Fc receptors are involved in the crosslinking by using FcR blocking antibodies and/or purified fixed effector cells that express different Fc receptors in the DuoHexaBody-CD37 cytotoxicity experiments

      - studying the effect of DuoHexaBody-CD37 on normal B cells

      As the findings are based primarily on in vitro models, further validation would be required to support broader translational conclusions.

      We would like to refer to previous studies that showed potent cytotoxicity of DuoHexaBody-CD37 in vivo, including xenograft and PDX lymphoma models supporting broader translational conclusions:

      Oostindie et al. Blood Cancer Journal (2020) 10:30 https://doi.org/10.1038/s41408-020-0292-7

    1. Author response:

      We thank the reviewers for their comments and for their constructive suggestions. We intend to submit a revised manuscript where we address the comments made in the Public Reviews as well as in the Recommendations for the Authors.

      One of our most interesting findings, as noted by the reviewers, was the discovery of a small subpopulation of cells likely arrested in G2 that accounts for a disproportionate amount of radiation-induced gene expression. In addition to the responses indicated below, we are planning to include additional “wet lab” experiments in the revised manuscript that address the properties of this seemingly important subpopulation of cells.

      Reviewer 1:

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Thank you for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      (2) Some of the figures are unclear.

      In the revised manuscript, we will provide a more detailed quantitative analysis. For each condition, we analyzed 4-9 discs.

      We assume that the reviewer is referring to panels in Figure 1. We will review these images and, if necessary, repeat the experiments or choose alternative images that appear clearer.

      Reviewer 2:

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      We intend to include more “wet lab” experiments in our revised manuscript to address the identity and properties of the high-trbl cells that we have identified using the clustering approach based on cell-cycle gene expression.

      Reviewer 3:

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Thank you.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. The generation of single-cell RNAseq data from irradiated cells is tricky. Many cells have already died. Even among those that do not incorporate propidium iodide, some are likely in early stages of apoptosis or are physiologically unhealthy and likely made it through our FACS filters. Indeed, in irradiated samples up to 57% of sequenced cells were not included in our analysis since their RNA content seemed to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues as well as the time course of events and how our analysis captures RNA levels only at a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Clusters 0, 1, and 2 likely contain cells in other stages of the cell cycle, including early G1. Other studies indicate that more than 70% of cells are expected to have a 4C DNA content 4 h after irradiation at 4000 Rad. The high-trbl cluster only accounts for 18% of cells. Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs. We are mining the gene expression patterns in these clusters with the goal of estimating their location in the cell cycle and will include those data in the revised manuscript.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread. We intend to present additional data that address this point and also a more thorough discussion in the revised manuscript.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways: by location in the disc and by cell-cycle state. We will attempt to look for co-localization as suggested by the reviewer.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. We are investigating other possibilities such as the maturity of discs.
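
      The separation of cells by sex described above could be sketched along the following lines. This is a hedged illustration only, not the pipeline used in the study: the per-cell count dictionaries, the function names `assign_sex` and `mean_expression_by_sex`, and the `min_counts` threshold are all hypothetical, and the sketch relies only on the fact that the roX RNAs are male-specific.

```python
def assign_sex(cells, rox_genes=("lncRNA:roX1", "lncRNA:roX2"), min_counts=1):
    """Label each cell 'male' or 'female' from its summed roX counts.

    `cells` maps a cell ID to a dict of gene -> count. The threshold
    `min_counts` is an assumption; a male cell with no captured roX
    reads (dropout) would be mislabelled 'female' by this simple rule.
    """
    labels = {}
    for cell_id, counts in cells.items():
        rox_total = sum(counts.get(gene, 0) for gene in rox_genes)
        labels[cell_id] = "male" if rox_total >= min_counts else "female"
    return labels


def mean_expression_by_sex(cells, labels, gene):
    """Mean count of `gene` within the male and female groups."""
    sums = {"male": 0.0, "female": 0.0}
    sizes = {"male": 0, "female": 0}
    for cell_id, counts in cells.items():
        sex = labels[cell_id]
        sums[sex] += counts.get(gene, 0)
        sizes[sex] += 1
    # An empty group yields NaN rather than raising a division error.
    return {
        sex: (sums[sex] / sizes[sex] if sizes[sex] else float("nan"))
        for sex in sums
    }
```

      Comparing the two group means for a gene such as dysf, before and after irradiation, would then indicate whether its induction differs between the sexes.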

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper investigates how isoform II of transcription factor RUNX2 promotes cell survival and proliferation in oral squamous cell carcinoma cell lines. The authors used gain and loss of function techniques to provide incomplete evidence showing that RUNX2 isoform silencing led to cell death via several mechanisms including ferroptosis that was partially suppressed through RUNX2 regulation of PRDX2 expression. The study provides useful insight into the underlying mechanism by which RUNX2 acts in oral squamous cell carcinoma, but the conclusions of the authors should be revised to acknowledge that ferroptosis is not the only cause of cell death.

      We appreciate the editor’s positive comments on our work and the valuable suggestions provided by the reviewers. We did find that RUNX2 isoform II knockdown or HOXA10 knockdown could also lead to apoptosis. We have revised our title as follows: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”. In addition, we have also revised our conclusions in the abstract as follows: “OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.” We have added more experiments to better support our conclusions. Please see the following responses to reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors investigated the role of RUNT-related transcription factor 2 (RUNX2) in oral squamous carcinoma (OSCC) growth and resistance to ferroptosis. They found that RUNX2 suppresses ferroptosis through transcriptional regulation of peroxiredoxin-2. They further explored the upstream positive regulator of RUNX2, HOXA10, and found that the HOXA10/RUNX2/PRDX2 axis protects OSCC from ferroptosis.

      Strengths:

      The study is well designed and provides a novel mechanism of HOXA10/RUNX2/PRDX2 control of ferroptosis in OSCC.

      Weaknesses:

      According to the data presented in (Figure 2F, Figure 3F and G, Figure 5D and Figure 6E and F), apoptosis seems to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis, which raises questions on the authors' specific focus on ferroptosis in this study. Reasonably, authors should adapt the title and the abstract in a way that recapitulates the whole data, which is HOXA10/RUNX2/PRDX2 axis control of cell death, including ferroptosis and apoptosis in OSCC.

      We are really grateful for your comments. We agree that these figures do show that isoform II-knockdown or HOXA10-knockdown could induce apoptosis. We have adapted the title and abstract as follows:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”

      In addition, we have performed the rescue experiment showing that PRDX2 overexpression rescues the apoptosis induced by isoform II-knockdown (Figure 4-figure supplement 4) or HOXA10-knockdown (Figure 7-figure supplement 2).

      We have added descriptions of these experiments to the Results sections “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis and apoptosis through RUNX2 isoform II” as follows: “In addition, we found that PRDX2 overexpression could partially reduce the increased apoptosis caused by isoform II-knockdown. (Figure 4-figure supplement 4).” “PRDX2 overexpression also could rescue the increased cellular apoptosis caused by HOXA10 knockdown (Figure 7-figure supplement 2).”

      Comments:

      In the description of the result section related to Figure 3E, the author wrote "In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). These results suggest that RUNX2 isoform II may suppress ferroptosis." The interpretation provided here is not clear to the reviewer. How shrunken mitochondria and vanished cristae can be linked to ferroptosis?

      We apologize for the inaccurate description. Ferroptotic cells usually exhibit shrunken mitochondria, reduced or absent cristae, and increased membrane density (Dixon et al., 2012). However, the presence of shrunken mitochondria or vanished cristae does not guarantee that ferroptosis has occurred in the cells. Other evidence, such as the increased ROS production and lipid peroxidation accumulation in cells with RUNX2 isoform II-knockdown, must also be evaluated, as we show in Figure 3A and 3B. Furthermore, isoform II overexpression suppressed ROS production (Figure 3C) and lipid peroxidation (Figure 3D). We have revised our interpretation as follows: “In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). This phenomenon along with the above results of ROS production and lipid peroxidation accumulation assays suggests that RUNX2 isoform II may suppress ferroptosis.”

      Dixon, S. J., Lemberg, K. M., Lamprecht, M. R., Skouta, R., Zaitsev, E. M., Gleason, C. E., . . . Stockwell, B. R. (2012). Ferroptosis: an iron-dependent form of nonapoptotic cell death. Cell, 149(5), 1060-1072. doi:10.1016/j.cell.2012.03.042 PMID:22632970

      The electron microscopy images show more elongated mitochondria in the RUNX2 isoform II-KO cells than in RUNX2 isoform II positive cells, which might result from the fusion of mitochondria. These images should be complemented with fluorescent mitochondrial staining of these cells.

      We do find that the TEM images of RUNX2 isoform II-knockdown cells show more elongated mitochondria. The mitochondria undergo cycles of fission and fusion, known as mitochondrial dynamics, which in turn leads to changes in mitochondrial length. Through examining factors related to mitochondrial dynamics, we find that isoform II knockdown could decrease the expression levels of FIS1 (Fission, Mitochondrial 1) (Figure 3-figure supplement 2B) which mediates the fission of mitochondria. Therefore, we speculate that the elongated mitochondria in the isoform II-knockdown cells may be due to the decrease in mitochondrial fission through inhibiting FIS1 expression.

      In addition, we tried our best to perform fluorescent staining of mitochondria to observe mitochondrial morphology. However, owing to the quality of the probes and the fluorescence microscope, our mitochondrial fluorescence images were not satisfactory. We therefore captured additional electron microscopy images, measured mitochondrial length, and performed statistical analyses. We find that isoform II-knockdown cells show significantly more mitochondrial elongation than the control cells (Author response image 1 and Figure 3-figure supplement 2A). Therefore, we believe the conclusion that isoform II knockdown promotes mitochondrial elongation is relatively reliable.

      Author response image 1.

      The new electron microscopy images in RUNX2 isoform II-knockdown cells. RSL3 (a ferroptosis activator) served as a positive control. Scale bar: 1 μm. The calculation and statistical analysis of mitochondrial elongation were added in Figure 3-figure supplement 2A.

      What is the oxygen consumption rate in RUNX2 KO cells?

      We have performed a new mitochondrial stress assay to analyze the oxygen consumption rate (OCR). We find that RUNX2 isoform II-knockdown decreases OCR in OSCC cells. This result has been added to Figure 3-figure supplement 3A and B. It is consistent with our observation of damaged mitochondrial morphology in cells with RUNX2 isoform II knockdown.

      The increase in cell proliferation after RUNX2 overexpression in Figure 2A is not convincing, is there any differences in their migration or invasion capacity?

      We agree that overexpression of isoform II didn’t dramatically enhance OSCC cell proliferation. We consider that it may be due to the existing high level of isoform II in OSCC cells. We have performed wound-healing assay and transwell assay to analyze the migration or invasion capacity of cells with RUNX2 isoform II or isoform I overexpression. We find that isoform II overexpression has no effect on the migration and invasion in OSCC cells (Figure 2-figure supplement 2). This phenomenon suggests that further increasing isoform II cannot improve the migration or invasion capacity of OSCC cells. However, isoform I overexpression suppresses the migration and invasion of cancer cells (Figure 2-figure supplement 2), indicating that the upregulation of isoform I, which is downregulated in OSCC cells, may inhibit tumorigenesis. In addition, we found that the expression level of isoform I was lower in TCGA OSCC patients than that in normal controls (Figure 1D), and patients with higher isoform I showed longer overall survival (Figure 1-figure supplement 1). These results support that isoform I may inhibit tumorigenesis in OSCC cells.

      The in vivo study shows 50% reduction in primary tumor growth after RUNX2 inhibition by shRNA in CAL 27 xenografts, but only one shRNA is shown. Is this one shRNA clone? At least 2 shRNA clones should be used.

      In this in vivo primary tumor growth experiment, we used a CAL 27 stable cell line transfected with an shRNA against RUNX2 isoform II (shisoform II-1). We agree that at least two shRNAs should be used. In this revision, we performed another tumor growth experiment with CAL 27 cells stably transfected with a new shRNA targeting a different region of isoform II (shisoform II-2). As in the previous experiment, CAL 27 cells stably transfected with this new shRNA showed significantly reduced tumor growth and weight compared with those transfected with non-specific control shRNA in nude mice (Figure 2-figure supplement 4A-D).

      Apoptosis and necroptosis seem to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis. This is evident from experiments in Figure 3E, F and from Figure 6E, F and Figure 3G. Either Fer-1, Z-VAD, or Nec-1 used alone, were not able to fully restore cell proliferation to control cell level, which implies an additive effect of ferroptosis, apoptosis and necrosis. The author should verify potential additive or synergistic effect of the combination of Fer-1 and Z-VAD in these assays after si-RUNX2 in Figure 3 F and G and after si-HOX assays.

      We sincerely appreciate your valuable comments. We have performed a new assay to analyze the potential additive or synergistic effect of combining Fer-1 and Z-VAD after RUNX2 isoform II (si-II) or HOXA10 (si-HOX) knockdown. We find that the combination of Fer-1 and Z-VAD rescues cell proliferation more effectively than Fer-1 or Z-VAD alone (Figure 3-figure supplement 6 and Figure 6-figure supplement 4).

      What is the effect of PRDX2 or HOXA10 depletion on tumor growth?

      We have performed a new xenograft tumor formation assay in nude mice to analyze the effect of PRDX2-knockdown on tumor growth. We found that CAL 27 cells stably transfected with shRNAs against PRDX2 showed significantly reduced tumor growth and weight compared with those transfected with non-specific control shRNA in nude mice (Figure 4-figure supplement 2A-D). Regarding the effect of HOXA10 depletion on tumor growth, please allow us to cite a study (Guo et al., 2018) which demonstrated that HOXA10 knockout in Fadu cells (a cell line of pharyngeal squamous cell carcinoma) could inhibit tumor growth.

      We have added these results to the section of “RUNX2 isoform II promotes the expression of PRDX2” as follows: “In line with the inhibitory effect of isoform II-knockdown on tumor growth, CAL 27 cells stably transfected with anti-PRDX2 shRNAs showed notably reduced tumor growth and weight than those transfected with non-specific control shRNA in nude mice (Figure 4-figure supplement 2A-D).”.

      Guo, L. M., Ding, G. F., Xu, W., Ge, H., Jiang, Y., Chen, X. J., & Lu, Y. (2018). MiR-135a-5p represses proliferation of HNSCC by targeting HOXA10. Cancer Biol Ther, 19(11), 973-983. doi:10.1080/15384047.2018.1450112 PMID:29580143

      What is the clinical relevance of HOXA10 in OSCC patients?

      In Figure 5-figure supplement 1B, we have shown that the expression levels of HOXA10 in TCGA OSCC patients were also significantly higher than those in normal controls. In this revision, we further find that patients with higher HOXA10 show significantly shorter overall survival in the TCGA OSCC dataset (Figure 5-figure supplement 2C). In addition, we have analyzed the expression of HOXA10 in our clinical OSCC and adjacent normal tissues, and found that the HOXA10 expression level in OSCC tissues is significantly higher than that in normal controls (Figure 5-figure supplement 2A and B), which is consistent with the results from the TCGA OSCC dataset.

      We have revised our writing in the result “HOXA10 is required for RUNX2 isoform II expression and cell proliferation in OSCC” as follows: “Similarly, HOXA10 expression level of our clinical OSCC tissues is significantly higher than that of adjacent normal tissues (Figure 5-figure supplement 2A and B). Moreover, TCGA OSCC patients with higher expression levels of HOXA10 showed shorter overall survival (Figure 5-figure supplement 2C).”

      Reviewing editor (Public Review):

      This paper reports the role of the Isoform II of RUNX2 in activating PRDX2 expression to suppress ferroptosis in oral squamous cell carcinoma (OSCC).

      The following major issues should be addressed.

      A major postulate of this study is the specific role of RUNX2 isoform II compared to isoform I.

      Figure 1F shows association between patient survival and Iso II expression, but nothing is shown for Iso I, this should be added, in addition the number of patients at risk in each category should be shown.

      We sincerely appreciate your valuable comments. We have added the survival curve of isoform I (exon 2.1) in the new Figure 1-figure supplement 1. In contrast to isoform II, patients with higher isoform I showed longer overall survival. The numbers of patients at risk in each category in the Figure 1F and Figure 1-figure supplement 1 are added.

      The authors test Iso I and Iso II overexpression in CAL27 or SCC-9 model cell lines. In Fig. 2A in CAL27, the overexpression of Iso II is much stronger than Iso I so it seems premature to draw any conclusions. More importantly, however, no Iso l silencing is shown in either of the cell lines nor the xenografted tumours. This is absolutely essential for the authors hypothesis and should be tested using shRNA in cells and xenografted tumours.

      Thank you for your valuable comments. We agree that the overexpression of isoform I is much stronger than isoform II in CAL 27 cells in Fig. 2A-B. We have performed a repeat experiment which shows similar overexpression of isoforms II and I in Figure 2A-figure supplement 1. This repeat experiment also shows that overexpression of FLAG-tagged isoform II significantly promoted the proliferation of OSCC cells. We tried our best to knock down isoform I. However, the sequence specific to isoform I is 317 nt. We designed four anti-isoform I siRNAs, and unfortunately found that none of them could knock down isoform I efficiently. Please see Author response image 2 below. Therefore, we currently cannot knock down isoform I. However, we have tested the overexpression of isoform I. We find that isoform I overexpression inhibits the migration and invasion of cancer cells (Figure 2-figure supplement 2). In addition, we have shown that isoform II overexpression enhanced cell proliferation compared with isoform I overexpression in OSCC cells (Figure 2A). Therefore, we consider that isoform I is not essential for OSCC cell proliferation and tumorigenesis, and we focus mainly on isoform II in this study.

      Author response image 2.

      The knockdown efficiency of RUNX2 isoform I (anti-isoform I, si-I-1, si-I-2, si-I-3, si-I-4) in OSCC cells was analyzed by RT-PCR. 18S rRNA served as a loading control. The sequences of siRNAs are as follows: 5’ GGCCACUUCGCUAACUUGU 3’ (si-I-1), 5’ GUUCCAAAGACUCCGGCAA 3’ (si-I-2), 5’ UGGCUGUUGUGAUGCGUAU 3’ (si-I-3), and 5’ CGGCAGUCGGCCUCAUCAA 3’ (si-I-4).

      A major conclusion of this study is that Iso II expression suppresses ferroptosis. To support this idea, the authors use the inhibitor Ferrostatin-1 (Fer-1). While Fer-1 typically does not lead to a 100% rescue, here the effect is only marginal and, as shown in Figures 3F and G, only marginally better than Z-VAD or Necrostatin 1. These data do not support the idea that the major cause of cell death is ferroptosis. Instead, Iso II silencing leads to cell death through different pathways. The authors should acknowledge this and rephrase the conclusion of the paper accordingly. Moreover, the authors consistently confound cell proliferation with cell death.

      We agree that RUNX2 isoform II-knockdown could also induce apoptosis. We have revised the title, abstract, and conclusion as follows:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”.

      We apologize for confusing cell proliferation with cell death. We have checked the whole manuscript and corrected the mistakes.

      In Fig. 4A the authors investigate GPX1 expression, whereas GPX4 is often the key ferroptosis regulator, this has to be tested. This is important as the authors also test the effect of the GPX4 inhibitor RSL3, however, the authors do not determine IC<sub>50</sub> values of the different cell lines with or without Iso II overexpression or silencing or compared to other RSL3 sensitive or resistant cells. Without this information, no conclusions can be drawn.

      We greatly appreciate the reviewer’s comments. We have performed a new experiment to analyze the effect of isoform II on GPX4 expression. We find that isoform II knockdown decreases the expression of GPX4 mRNA and protein (Figure 4-figure supplement 1A and B), and conversely isoform II overexpression promotes GPX4 expression (Figure 4-figure supplement 1C and D), which is consistent with the inhibition of ferroptosis by RUNX2 isoform II. Knockdown of HOXA10, an upstream positive regulator of RUNX2 isoform II, also inhibited the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).

      We also performed a new experiment to determine the IC<sub>50</sub> values of the cells with or without isoform II overexpression or silencing. We find that isoform II overexpression elevates the IC<sub>50</sub> value of RSL3 (Figure 3-figure supplement 8A), whereas isoform II-knockdown decreases the IC<sub>50</sub> value of RSL3 (Figure 3-figure supplement 8B).
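
      As an illustration only, the following minimal sketch shows one simple way an IC<sub>50</sub> can be read off a dose-response series: linear interpolation on log10(dose) between the two tested doses that bracket 50% viability. This is an assumption made for illustration, not the fitting procedure used in the study; the function name and the example values in the usage note are hypothetical.

```python
import math


def ic50_by_interpolation(doses, viability):
    """Estimate IC50 by interpolating on log10(dose).

    doses: positive drug concentrations (any consistent unit).
    viability: percent viability measured at each dose (100 = untreated).
    Returns the interpolated dose at 50% viability, or None if the
    measured curve never crosses 50%.
    """
    # Sort the measurements by increasing dose.
    pairs = sorted(zip(doses, viability))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(pairs, pairs[1:]):
        # Look for the pair of adjacent doses that brackets 50% viability.
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (
                math.log10(d_hi) - math.log10(d_lo)
            )
            return 10 ** log_ic50
    return None
```

      For example, with hypothetical doses of 1 and 100 and viabilities of 75% and 25%, the estimate falls midway on the log scale, at a dose of 10. A four-parameter logistic fit is the more common choice when enough dose points are available.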

      We have added descriptions of these experiments to the Results sections “RUNX2 isoform II suppresses ferroptosis”, “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis through RUNX2 isoform II” as follows:

      RUNX2 isoform II suppresses ferroptosis: “Isoform II overexpression could elevate the IC<sub>50</sub> values of RSL3 (Figure 3-figure supplement 8A), in contrast, isoform II-knockdown decreased the IC<sub>50</sub> values of RSL3 (Figure 3-figure supplement 8B).”

      RUNX2 isoform II promotes the expression of PRDX2: “Firstly, we found that RUNX2 isoform II-knockdown or overexpression could downregulate or upregulate the expression of GPX4 mRNA and protein, respectively (Figure 4-figure supplement 1A-D). In addition to the GPX4, we found that PRDX2 is the most significantly down-regulated gene upon isoform II-knockdown in CAL 27 (Figure 4A).”.

      HOXA10 inhibits ferroptosis through RUNX2 isoform II: “In addition, HOXA10-knockdown could suppress the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).”.

      In summary, while the authors show that RUNX2 Iso II expression enhances cell survival, the idea that cell death is principally via ferroptosis is not fully established by the data. The authors should modify their conclusions accordingly.

      We agree that RUNX2 isoform II could enhance cell survival by suppressing both ferroptosis and apoptosis. We have revised the abstract and conclusion as follows:

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glia clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glia and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Thanks for the clear and accurate summary of our findings.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Thank you for the positive comments.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, a mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neurons and glial clusters is quite interesting, however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

      We agree that we do not provide mechanistic or functional insights. Our goal was to produce hypothesis generating datasets for our lab and others to use to direct functional or mechanistic studies.

      Reviewer #2 (Public review):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      Thank you for the positive comments.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

      Thank you for the positive comments. We too hope that future research will benefit from this dataset.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) In Figures 1E and 4A, the T1 and T2 glia subsets reveal sub-clusters for several cell types as seen by the distribution of points on the UMAP. This observation is never validated or discussed. Do these sub-clusters represent true differences in identities or are they artifacts of the single-nucleus preparation? For Figure 1E, it is not clear whether specific sub-clusters (see Ensheathing-4 vs Ensheathing-5 and Astrocyte-2 vs. Astrocyte-6) are differentially enriched between the T1 and T2 lineages. The existence of these sub-clusters must be discussed or dismissed.  

      We agree that this needs to be addressed more clearly in the manuscript and have made text changes in the Results and Discussion sections to clarify. We note that a recent glial cell atlas (Lago-Baldaia et al., 2023: PMID: 37862379) of the developing fly VNC and optic lobes found sub-clusters that mapped to the same subtype annotations. Interestingly, Lago-Baldaia and colleagues found that the transcriptional diversity of glia cell types did not match the morphological diversity of glia validated in vivo. See text changes below.

      Lines 131-133: “Similar to a previous glial cell atlas (Lago-Baldaia et al., 2023) we found some glial subtypes (astrocytes, ensheathing, and subperineurial) mapped to multiple clusters (Figure 1E, 1F).”

      Lines 206-208: “In line with our T1+T2 atlas and previous glia cell atlas (Lago-Baldaia et al., 2023), some subtypes mapped to several subclusters including ensheathing, astrocytes, and chiasm (Figure 4A-B).”

      Lines 397-401: “Similar to a recent glial cell atlas (Lago-Baldaia et al., 2023), we found glial subtypes like astrocytes, ensheathing, and subperineurial glia mapped to several sub-clusters (Figure 1E-F). It remains unclear if these sub-clusters with the same cell type annotation represent distinct glial identities or different transcriptional states within these populations.”

      (2) The authors present evidence for sex-specific neuronal and glia subtypes and find differential expression of specific yolk proteins and long non-coding RNAs. However, whether any of these differences are driven by other canonical sex-specific genes such as Fruitless (Fru) or Double-sex (Dbx) has not been reported or discussed. The authors must re-analyze their data for these genes and claim whether they have any contribution to sex-specific sub-clusters.

      Thank you for pointing this out. We have made text changes and clarifications to highlight the expression of other canonical sex-specific genes. Fru was enriched in male nuclei as expected. Interestingly, dbx was enriched in female nuclei. It remains to be determined whether these genes drive the sex-specific differences.

      Lines 224-226: “Additionally, female nuclei were enriched for dbx (Supp Table 8). Male glial nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5C; Supp Table 8) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 237-239: “Male nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5G; Supp Table 9) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 428-431: “We found the expected differential expression of yolk proteins (yp1, yp2, yp3) enriched in female nuclei and the long non-coding RNAs rox1/2 and fru enriched in male neuronal nuclei (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997; Warren et al., 1979). Interestingly, we found dbx to be enriched in both glial and neuronal female nuclei.”

      Lines 433-435: “It remains to be determined if these genes are driving these sex-specific differences in glia and neurons.”

      (3) In Figure 6C, it is unclear whether the Ms-2A-LexA-expressing neurons of clusters 157 and 160 project to two different neuropils or share projections to both neuropils. However, it is not explicitly shown in the immunostaining data whether indeed there are two populations to begin with. The authors must check for cluster 157 and 160 specific markers (such as Dh44 and ple) and test whether they appear mutually exclusively in the Ms-2A-LexA-expressing neurons. The same reasoning would apply to the data shown in Figures 6D and 6E, where the authors must test whether the NPF and AstA expressing cells are indeed neurons from clusters 100 and 128, using orthogonal cluster markers to conclude that they are similar (or the same) neurons.

      We changed the focus of the paragraph to confirm that these neurons indeed come from type II and that they target the central complex. Although due to the lack of reagents we cannot test the identity of each one of these neurons, we could make meaningful interpretations of the staining to validate our ideas about neuropeptidergic cells in the central complex. We made sure to mention the limitation of our experiment to avoid any wrong conclusions.

      Minor points

      (1) Line 115 - "cluster that represents optic lobe neurons". How was this cluster identified?

      We reexamined the most significant genes enriched in this cluster 124, and found they are Rh2, ninaC, trpl, and phototransduction related genes (Supplemental table 1). We reassigned the identity of this cluster as ocelli, which also express photoreceptor genes but can’t be easily removed during dissection. We modified the text as follows:

      "We used known markers (Croset et al., 2018; Davie et al., 2018; Supp Table 2) to identify distinct cell types in the central brain, including glia, mushroom body neurons, olfactory projection neurons, clock neurons, Poxn+ neurons, serotonergic neurons, dopaminergic neurons, octopaminergic neurons, corazonergic neurons, hemocytes, and ocelli (Figure 1B, Supp. Table 1)."

      (2) As the separation in Figure 1B is not obvious, annotated cell type clusters must be re-colored instead of being labelled as the exact dots are indistinguishable. This would especially be helpful for OCTY, SER, OPN, and CLK clusters.

      (3) Cluster labels in Figure 1C are barely visible and the font size must be increased for the reader. Recoloring the cluster identities and attaching a legend would again help in this case.

      We recolored the atlas in Figure 1B, 1C and 1C’ and increased the font size in Figure 1C’.

      (4) For Figure 4A, clusters should be labelled on the UMAP along with the legend as it is difficult for the reader to match identities using Seurat colors. The same is true for the UMAPs in Figure 5A.

      Yes, we agree that labeling would improve readability and have done so for UMAPs in Figure 4A and 5A-A’’.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of adult central brain neurons and glia. Through the use of an ingenious permanent labeling technique, they are able to trace the progeny of T2 neuroblasts, which contribute significantly to the formation of the central complex. This transcriptomic dataset is the first of its kind and will likely serve as a valuable resource for future studies.

      The authors further explore this dataset through several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. However, the approach used to identify the identity of each neuron cluster could be more clearly articulated, and some of the authors' conclusions are more generalized - either already well-established or lacking sufficient support.

      Detailed comments:

      Abstract - "Our data support the hypothesis that each transcriptional cluster represents one or a few closely related neuron subtypes." - Is this a novel finding? If so, it would be helpful if the authors could explain why this is the case more clearly.

      Our results are not generally novel, and many single cell/single nuclei RNA-seq papers have been published (more citations added to Introduction). Our work is novel in that we analyze Type 1 and Type 2 neuroblasts in the central brain.

      Line 53 - In the introduction the authors should also reference other single-cell studies done in the Drosophila brain.

      Done.

      Line 59 - There are some typos here. The authors could also mention type zero.

      Both done.

      Figure 1 and Sup Table 1 - Authors show in sup table 1 the top cell markers by cluster but there is no correspondence between cluster number and identity. The authors do not say which known markers were used to give the identity to each cluster.

      We have added the cell identity in the Supplemental Table 1. For the unknown cells, we left the column blank. We have also added a Supplemental Table 2 to show the markers we used to give identity to the clusters.

      Supplementary Tables - For each table, more detailed information should be provided regarding what is being compared and the methods used for these comparisons.

      We have added the methods we used in Seurat to generate each individual table.

      Line 138 - Differential gene expression analysis between T1 and T2 glial progeny did not show differences across any glial cell types (Supp Table 4). - Was this comparison done per cluster? Is differential gene expression of top markers, which are anyway the genes that define each glial cell type, enough for this type of analysis?

      Yes, we performed the differential expression analysis using all genes (i.e., not just marker defining) at a cluster-by-cluster resolution with results in Supplemental Table 4. We have edited the text to make this clarification.

      Lines 139-141: “Differential gene expression analysis for all genes between T1 and T2 glial progeny did not show differences across any glial cell types or clusters (Supp Table 4).”

      Line 146 - "We identified T1-derived neurons by excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters." - Bioinformatically, correct?

      Yes. We clarified the sentence as follows:

      "We identified T1-derived neurons by bioinformatically excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters."

      Line 156 - We found that each cluster strongly expressed a unique combination of genes. - As they are grouped by seurat in different clusters, why is this surprising?

      Line 175 - "top 10 significantly enriched genes gathered from each T2 neuron cluster" - can these lists be included?

      Yes, they are grouped by Seurat. We toned down the sentence and now refer to each combination of genes as cluster markers. We modified the sentence as follows:

      Each unique combination of enriched genes could be referred to as cluster markers.

      Line 211 - How did the authors identify sex-biased clusters? How did the authors separate the samples/cells by sex? Was it done bioinformatically by the expression of certain genes? If so, which?

      We collected male and female nuclei separately. We have added text in the methods section as follows:

      "Equal amounts of male and female central brains (excluding optic lobes) were dissected at room temperature within 1 hour. The samples were flash-frozen in liquid nitrogen and stored separately at -80°.

      In the first round, we pooled male and female brains together to select GFP+ nuclei and used particle-templated instant partitions to capture single nuclei to generate a cDNA library (Fluent BioSciences, Watertown, MA). In the second and third rounds, RFP+ nuclei from male and female brains were collected separately. The split-pool method was then used to generate barcoded cDNA libraries from each individual nucleus."

      Are there sex-specific differences in genes in glia other than genes that were previously known to be sex-specific?

      We report the comprehensive list of sex-specific differences in gene expression for both glia and neurons in Supp tables 8 and 9.

      Line 237 - When the authors mention "We conclude that male and female adult T2 neurons have sex-specific differences in gene expression within the same neuronal subtype" does this mean that these neurons are the same in male and in female brains, but they additionally specifically express sex-specific genes?

      Yes, we report that males and females contain the same neurons, as defined by their transcriptional profiles. It remains to be seen whether these sex-specific differences change how the same neuronal subtypes function in males and females. We have added text to the discussion to expand on this thought.

      Lines 437-441: “It remains to be determined if these genes are driving sex-specific differences within glial and neuronal subtypes. These genes may reflect sex-specific differences in the adult central brain and may provide insight into how behavioral circuits are linked to sex-specific behaviors. Future work should aim to characterize and test these genes.”

      Line 250 - The idea behind these sections "What is the relationship between neuropeptide expression and cluster identity?" "relation between cluster and morphology" lacks clarity. As clusters are defined based on principal component analysis, and the genes used to define a cluster are dependent on this method, there is no assumption that each cluster represents only one type of neuron or that it should include only neurons expressing the same neurotransmitter genes. Even if some clusters consist of a single neuron type, this should not be generalized to all clusters (and vice-versa).

      Correct, we cannot determine from the transcriptome data whether distinct clusters will have different morphology. We have changed the focus of the question to address that we are confirming they come from type 2 and that they target the central complex while comparing to known cells that express the neuropeptide.

      Line 265 - We first assayed the neuronal morphology of Ms+ neurons - why did the authors choose these neurons?

      Resolved in main text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 268 - "Currently we can't determine whether Ms+ neurons in clusters 157 and 160 project to different CX neuropils, or whether neurons from both clusters share projections into both neuropils. " - The purpose of this point is unclear.

      Resolved in text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 279 - This analysis could be more explored.

      Thank you for your feedback. As the comment was somewhat broad, we were unsure of the specific revisions needed and have therefore left the text unchanged.

      Line 301 - The text regarding this section, and the description and details of respective figures should be proofread to ensure clarity.

      Done.

      Line 386 - Alternatively, co-expression may be due to background from RNAs released during dissociation. - RNA in soup could be bioinformatically analysed.

      Correct. We opted to delete this sentence since our split-pool based method does not create background RNA expression. Additionally, the analysis is performed on scaled expression >2, and any background RNA is unlikely to yield such high expression.

      Discussion - Some of the conclusions are a bit too general, suggesting that the results might be meaningful, but also acknowledging the possibility of artifacts. If the authors could refine this, it would strengthen the manuscript.

      We are sorry but we are uncertain what you are asking; we don't know what you want us to refine. Our apologies for the misunderstanding.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      This review evaluates the SCellBOW framework, which applies phenotype algebra to obtain vectors from cancer subclusters or user-defined subclusters.

      Strengths:

      SCellBOW employs an innovative application of NLP-inspired techniques to analyze scRNA-seq data, facilitating the identification and visualization of phenotypically divergent cell subpopulations. The framework demonstrates robustness in accurately representing various cell types across multiple datasets, highlighting its versatility and utility in different biological contexts. By simulating the impact of specific malignant subpopulations on disease prognosis, SCellBOW provides valuable insights into the relative risk and aggressiveness of cancer subpopulations, which is crucial for personalized therapeutic strategies. The identification of a previously unknown and aggressive AR−/NElow subpopulation in metastatic prostate cancer underscores the potential of SCellBOW in uncovering clinically significant findings.

      Major concerns:

      The reliance on bulk RNA-seq data as a reference raises concerns about potentially misleading results due to the presence of RNA expression from immune cells in the TME. It is unclear if SCellBOW adequately addresses this issue, which could affect the accuracy of the cancer subcluster vectors.

      We appreciate the reviewer's concerns. To address the concern about potentially misleading results due to the TME when using bulk RNA-seq data as a reference:

      a. We account for systematic biases between the single-cell and bulk transcriptomics readouts by creating pseudo-bulk profiles for single-cell clusters, enabling more accurate comparisons [Section Materials and methods, Data preparation for phenotype algebra].

      b. We encode expressions into word vectors and co-embed them together. By doing this, we mitigate any possibility of systematic differences in the embedding. It is imperative that we subject both single-cell and bulk data through the same treatments because otherwise, it will be difficult to perform algebraic operations on them [Section Materials and methods, Generating vectors for phenotype algebra].

      c. In our new analysis of the tumor microenvironment, we have shown that SCellBOW effectively differentiates between malignant and non-malignant cells, confirming that it is not biased by the immune cell composition in the bulk RNA-seq data [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].

      The method of extracting vectors in phenotype algebra appears to be a straightforward subtraction operation. This simplicity might limit its efficiency in excluding associations with phenotypes from specific subpopulations, potentially leading to inaccurate interpretations of the data.

      Thanks for this excellent query. Vector algebra operations are not done in the gene expression space (i.e., gene expression vectors associated with tumor samples), rather we process the single cell and bulk expression profiles through multiple steps (pseudo-bulk vector generation for single cell clusters, mapping gene expression values to word frequencies as better understood by the Doc2vec neural networks etc.) to ensure their embeddings are consistent and capture intricate phenotypic information. We have demonstrated this through rigorous validation of the clusters yielded on various types of healthy and diseased samples. Furthermore, we have demonstrated the consistency of the vector algebra operations on known cancer subtypes in breast cancer, glioblastoma, and prostate cancer. We have clarified this further in text. [Section Materials and methods, ‘Generating vectors for phenotype algebra’, ‘Survival risk attribution’].

      The review would benefit from additional validation studies to assess the effectiveness of SCellBOW in distinguishing between cancerous and non-cancerous signals, particularly in heterogeneous tumor environments.

      We thank the reviewer for advising this additional validation. While our study primarily focused on signals from malignant cells, we have now considered the impact of the tumor microenvironment. We observed that the predicted risk score increases when the immune component is subtracted from the tumor, suggesting that tumor aggressiveness increases in the absence of immune components. Importantly, the aggressiveness ranking of tumor subtypes (NE > ARAL > ARAH) remained consistent, confirming that SCellBOW effectively preserves subtype-specific risk stratification [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].

      Further clarification on how SCellBOW handles mixed-cell populations within bulk RNA-seq data would strengthen the evaluation of its applicability and reliability in diverse research settings.

      We really appreciate the reviewer’s observation. We clarify that rather than relying on absolute gene expression values, SCellBOW maps bulk RNA-seq data into an embedding space, where we extract the latent representation of the tumor. This process effectively masks the influence of mixed-cell populations, reducing biases introduced by immune or stromal components. Furthermore, phenotype algebra operates within this embedding space by comparing cosine similarities between latent representations of bulk and pseudo-bulk datasets, rather than using direct gene expression values. This allows SCellBOW to capture biologically meaningful relationships and infer tumor-specific signals effectively, even in the presence of heterogeneous cell populations. Our benchmarking across diverse cancer types confirms its effectiveness [Section Results, ‘SCellBOW enables pseudo-grading of metastatic prostate cancer tumor microenvironment’, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].
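
      To make the embedding-space operations above concrete, the following is a minimal sketch in plain Python. It is not the SCellBOW implementation: the three-dimensional vectors and the names `tumor`, `cluster`, and `reference` are hypothetical stand-ins for learned embeddings of a bulk tumor, a pseudo-bulk cluster, and a reference bulk sample. It shows only the two primitives named in the response, element-wise subtraction and cosine similarity.

```python
import math


def subtract(u, v):
    """Element-wise difference u - v of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a - b for a, b in zip(u, v)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (undefined for zero vectors)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumed values, for illustration only).
tumor = [0.8, 0.1, 0.5]
cluster = [0.6, 0.0, 0.1]
reference = [0.2, 0.2, 0.9]

# "Tumor minus cluster" computed in the embedding space, not on raw expression.
residual = subtract(tumor, cluster)

print(cosine_similarity(tumor, reference))
print(cosine_similarity(residual, reference))
```

      Comparing the two printed similarities indicates whether removing the cluster moves the tumor vector toward or away from the reference. How such comparisons are turned into a risk score is described in the Materials and methods and is not reproduced here.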

      Reviewer #2 (Public Review):

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single-cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Major concerns:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissues.

      (1) We completely agree with the reviewer’s view. Our selection of glioblastoma and breast cancer for this study was primarily driven by the focus on extensively studied and well-defined cancer types. To demonstrate the effectiveness of our model, we tested it on advanced prostate cancer, which currently lacks clinically relevant and well-recognized stratification methods. This application to metastatic prostate cancer serves as a proof of concept, illustrating our model's potential to provide valuable insights into cancer types where established stratification approaches are limited or absent.

      (2) Regarding the application of our tool to spatial transcriptomics, we have already analyzed data from Digital Spatial Profiling (DSP). The article is already quite complex and involved, and we are afraid the inclusion of spatial transcriptomics may amount to a significant extension of the method. To this end, although we will discuss the future possibilities, we will skip the method validity check on spatial transcriptomics data.

      Reviewer #2 (Recommendations For The Authors):

      (1) "SCellBOW adapts the popular document-embedding model Doc2vec for single-cell latent representation learning, which can be used for downstream analysis...": Using only simple gene frequency might overlook the dependent relationships between genes, potentially compromising the biological significance. This could be discussed further.

      This is an excellent point raised by the reviewer. We acknowledge that using only simple gene frequency may overlook dependent relationships between genes, potentially compromising biological significance. To address this, we have now compared SCellBOW with raw gene expression on the specific task of phenotype algebra and demonstrated that SCellBOW captures meaningful biological relationships that simple gene frequency overlooks. We have added the results of this comparison, which show that gene expression data alone were insufficient for accurate risk stratification [Section Overall discussion, Supplementary Note 7, Supplementary Fig. 8i-k].

      (2) "While existing methods effectively reveal the subpopulations, they are insufficient in associating malignant risk with specific cellular subpopulations identified from scRNA-seq data....": Perhaps I missed it in the methods section, but how does SCellBOW compare to simply performing pseudobulk analysis on separate cell clusters, treating them as bulk RNA-seq, and then associating the signatures with disease prognosis?

      This is an insightful point, and we appreciate the opportunity to clarify it.

      (1) While pseudobulk analysis on separate cell clusters, followed by associating their signatures with disease prognosis, is a common approach, SCellBOW achieves this without requiring a priori knowledge of prognostic biomarkers to determine whether a subpopulation is aggressive.

      (2) Moreover, pseudobulk analysis aggregates gene expression across cells, which can potentially mask intra-cluster heterogeneity, thereby obscuring important signatures associated with disease prognosis. In contrast, the latent representation in SCellBOW captures the semantic meaning of disease aggressiveness, allowing for a more nuanced and biologically meaningful risk assessment.
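
      The masking effect described in point (2) can be seen in a toy example. The sketch below uses invented expression values for a single gene in two hypothetical clusters; it illustrates only the averaging argument and says nothing about how SCellBOW's latent representation is computed.

```python
from statistics import mean, pvariance


def pseudobulk(expression_values):
    """Collapse per-cell expression of one gene into a single cluster mean."""
    return mean(expression_values)


# Invented per-cell values for one gene (assumed, for illustration only).
homogeneous_cluster = [5.0, 5.0, 5.0, 5.0]   # every cell expresses the gene alike
mixed_cluster = [0.0, 0.0, 10.0, 10.0]       # two distinct sub-states

# Both clusters yield the same pseudobulk value...
print(pseudobulk(homogeneous_cluster), pseudobulk(mixed_cluster))

# ...even though their cell-to-cell variance differs completely.
print(pvariance(homogeneous_cluster), pvariance(mixed_cluster))
```

      The two clusters are indistinguishable after aggregation, which is the sense in which pseudobulk profiles can hide intra-cluster heterogeneity.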

      (3) "The proposed approach, SCellBOW, can effectively capture the heterogeneity and risk associated with each phenotype, enabling the identification and assessment of malignant cell subtypes in tumors directly from scRNA-seq gene expression profiles, thereby eliminating the need for marker genes...": Have the authors compared the resulting groups with well-known markers, and do they overlap?

      We appreciate this thoughtful question. While SCellBOW does not rely on predefined marker genes for clustering or risk stratification, we have systematically evaluated whether the resulting subpopulations align with well-known markers. To assess this, we compared SCellBOW-derived clusters with established marker-based annotations across multiple datasets. We observed a significant overlap between SCellBOW clusters and canonical marker-defined cell types in various cancers, including GBM, BRCA, and mCRPC.

      (4) "We constructed three use cases leveraging publicly available scRNA-seq datasets...": The three training and testing datasets are all from healthy tissue. How about in tumor tissue? i.e., Could SCellBOW also identify better cell clusters in tumor datasets?

      We appreciate the reviewer’s inquiry. For benchmarking and method validation, we primarily selected normal tissue datasets as they are heavily annotated and well-characterized. Our goal was to extensively evaluate SCellBOW across different clustering metrics, including ARI, NMI, and SI, which required datasets with reliable ground truth. Tumor datasets, in contrast, often lack confirmatory ground truth, making direct benchmarking more challenging. However, to assess SCellBOW’s applicability in tumor settings, we performed downstream analyses on tumor scRNA-seq datasets using phenotype algebra. Our results demonstrate that SCellBOW effectively identifies distinct cell clusters, including malignant and non-malignant populations, reinforcing its applicability in tumor settings [Section Results, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].
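
      For readers unfamiliar with the metrics named above, the adjusted Rand index (ARI) illustrates why a reliable ground truth is required: it scores agreement between predicted clusters and reference labels, corrected for chance. The following is a self-contained sketch of the standard ARI formula with invented labels; it is not the benchmarking code used in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Chance-corrected agreement between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # fewer than two cells: no pairs to disagree on

    # Pairs of cells placed together in both, in truth only, in predicted only.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = together_truth * together_pred / comb(n, 2)
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (together_both - expected) / (maximum - expected)


# Invented labels: cluster names differ, but the partition is identical.
print(adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"]))
```

      The score depends only on which cells are grouped together, not on the cluster names, so it cannot be computed meaningfully when the reference annotation itself is uncertain.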

      Minor issues:

      (1) Labels of subplots within the manuscript/figures should be revised to ensure correct order (missing Figures 3a-d, 4b before 4a, etc).

      We thank the reviewer for pointing this out. We have corrected the figure labels and ensured that all subplots follow the correct order, aligning with the manuscript.

      (2) "reaffirmed the clinically known aggressiveness order, i.e., CLA >-MES >-PRO, where CLA succeeds the rest of the subtypes in aggressiveness48 (Figures 4c, d)...": "Fig. 4c, d" should be "Fig. 4e, f". Also please put Figure 4a before 4b. Overall the order of Figure 4 needs to be revised to match the order in the manuscript. The same applies to Figure 6.

      We have corrected the figure reference to Fig. 4e, f and revised the order of Figure 4 to maintain consistency with the manuscript.

      (3) "Our results showed that SCellBOW learned latent representation of single-cells accurately captures the 'semantics' associated with cellular phenotypes and allows algebraic operations such as '+' and '-'." Figure 5f (SCellBOW performance on mCRPC) should also be cited here, since Supplementary Figure 6 contains three datasets (GBM, BRCA, mCRPC) while Figure 4 shows only GBM and BRCA.

      We thank the reviewer for this suggestion. We have now cited Figure 5f in this section to ensure that all datasets, including mCRPC, are appropriately referenced.

      (4) Under the subheading "SCellBOW facilitates survival-risk attribution of tumor subpopulations", the lines starting with "We refer to this as phenotype algebra. We utilized this ability to find an association between the embedding vectors, representing total tumor - a specific malignant cell cluster with tumor aggressiveness..." could be shortened, especially the re-introduction of phenotype algebra, since the authors have already discussed it (under "overview of SCellBOW").

      We appreciate the feedback and have condensed this section to avoid redundancy while maintaining clarity in connecting phenotype algebra to survival-risk attribution.

      (5) "Most CD4+ T cells map to CL0 and CL9 (here, CL is used as an abbreviation for cluster) (Figure 3f)..." "(here, CL is used as an abbreviation for cluster)" this note could be moved forward to SF2 since CL is first introduced in SF2.

      We thank the reviewer for the suggestion. We have moved the definition of CL (cluster) to Supplementary Figure 2 (SF2), where it is first introduced, for improved clarity.

    1. Author response:

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have addressed several of the concerns in the responses below and are currently working on additional analyses to further strengthen the study. These results will be incorporated into the final version of the research paper.

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.
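
      As a rough illustration of how quickly selfing produces inbred lines: under complete self-fertilization, and ignoring mutation and selection, expected heterozygosity halves each generation. The sketch below tabulates that decay. The starting heterozygosity of 0.5 is an assumed value, not an estimate for Lantana camara.

```python
def heterozygosity_after_selfing(initial_heterozygosity, generations):
    """Expected heterozygosity after complete selfing: H_t = H_0 * (1/2)**t."""
    if not 0.0 <= initial_heterozygosity <= 1.0:
        raise ValueError("heterozygosity must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return initial_heterozygosity * 0.5 ** generations


# Assumed starting point: a fully heterozygous F1 hybrid at a biallelic locus
# has heterozygosity 1.0; 0.5 is used here as a generic population value.
for t in range(0, 8):
    print(t, heterozygosity_after_selfing(0.5, t))
```

      Under these simplifying assumptions, fewer than ten generations reduce heterozygosity by more than two orders of magnitude, consistent with the argument that a few generations of selfing suffice to fix distinct lines.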

      While a recent bottleneck may have increased inbreeding, the strong and consistent genetic structuring we observe within populations is more indicative of predominant self-fertilization. To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts its most prominent influence at the microgeographic level, while in others it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has experienced other evolutionary forces, such as a genetic bottleneck and expansion due to its invasive nature, the influence of the mating system is remarkably strong, resulting in a clear and consistent genetic structure across populations.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the output of the modelling analysis is largely qualitative and cannot be directly compared with the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-established results, and such findings are not novel in themselves, although it would have been interesting had they been quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNP data would be more interesting. For example, with ABC-based methods, the authors could infer the split time between the subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result would support the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location—similar to what we observed in Lantana camara—can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation.

      We thank the reviewer for suggesting simulations based on genomic data from Lantana and for explaining their importance. We are currently conducting demographic simulations using genomic data from Lantana to estimate divergence times between the different flower colour variants. We believe this analysis will offer deeper insights and provide further clarity on the points raised by the reviewers.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Hardy-Weinberg Equilibrium (HWE) filtering is a commonly used step in SNP filtering analysis to exclude loci potentially under selection, thereby enriching for neutral variants and minimizing bias in downstream analyses. To ensure that our results are not influenced by selection-driven SNPs, we conducted the analysis both with and without applying the HWE filter. Notably, the number of SNPs retained did not drop significantly after filtering, and the overall patterns observed remained consistent across both approaches.
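
      The interaction between HWE filtering and inbreeding raised by the reviewer can be illustrated with a small sketch. Under random mating, the expected heterozygote frequency at a biallelic locus is 2pq, and the inbreeding coefficient F = 1 - H_obs / H_exp measures the deficit relative to that expectation. The genotype counts below are invented for illustration, and the code is not the filtering pipeline used in the study.

```python
def inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt):
    """F = 1 - H_obs / H_exp from genotype counts at one biallelic locus."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)   # frequency of the reference allele
    expected_het = 2 * p * (1 - p)          # Hardy-Weinberg expectation, 2pq
    if expected_het == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    observed_het = n_het / n
    return 1 - observed_het / expected_het


# Invented counts (hom-ref, het, hom-alt) for three hypothetical loci.
print(inbreeding_coefficient(25, 50, 25))   # matches Hardy-Weinberg proportions
print(inbreeding_coefficient(45, 10, 45))   # strong heterozygote deficit
print(inbreeding_coefficient(50, 0, 50))    # no heterozygotes at all
```

      In a highly selfing population, most polymorphic loci would resemble the second or third case, so a strict HWE filter could discard many neutral loci. This is why comparing results with and without the filter, as described above, is informative.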

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate. Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      The aim of the SLiM simulation was to demonstrate that the extreme genetic structuring observed in Lantana camara can plausibly arise in natural systems under predominant self-fertilization. For the simulation, we used mutation and recombination rates estimated for Arabidopsis thaliana, as these parameters are currently unknown for Lantana. We thank the reviewer for pointing this out and will add these details in the revised version. While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, its goal was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilization alone. The impact of selfing on the mutation rate is not currently incorporated in the simulations; we will examine this in detail.

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We recognize that one of the key improvements needed for the manuscript is to provide experimental evidence supporting self-fertilization. To address this, we conducted a bagging experiment on Lantana camara inflorescences to prevent insect visitation and eliminate insect-mediated cross-fertilization. The results showed no significant difference in seed set between bagged and open-pollinated inflorescences, indicating that Lantana is predominantly self-fertilizing in India. This finding is consistent with our genetic data and will be included in the revised version of the manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively? I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The different flower colour variants are visually distinguishable. Our classification of these variants is not based on the colour of individual flowers at a single time point, but rather on the overall colour change pattern across the inflorescence over time. In other words, the temporal aspect of colour change has been considered in our grouping. For example, in the “yellow-pink” variant, flowers begin as yellow when young and gradually turn pink as they age. Importantly, variants that follow this pattern do not transition to an orange type at any stage, which distinguishes them from other colour types. Variants that do not change colour are named after their single flower colour, for example “orange”.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. Both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although this was not sufficiently highlighted in the main text, which has now been improved.

      Our manuscript contains benchmarking against a challenging synthetic dataset in Figure 1; furthermore, both the synthetic dataset and the real-world thymus dataset have been analyzed in parallel using currently available TI tools (as detailed in the Supplementary Note). Other single-cell datasets (single-cell RNA-seq) were added in response to the reviewers' comments.

      One of the reviewers correctly points out that tviblindi goes against the philosophy of automated trajectory inference. This is correct; we believe that a new class of methods, complementary to fully automated approaches, is needed to explore datasets with unknown biology. tviblindi is meant to be a representative of this class of methods—a semi-automated framework that builds on features inferred from the data in an unbiased and mathematically well-founded fashion (pseudotime, homology classes, suitable low-dimensional representation), which can be used in concert with expert knowledge to generate hypotheses about the underlying dynamics at an appropriate level of detail for the particular trajectory or biological process.

      We would also like to mention that the algorithm and the workflow are not the sole results of the paper. We have thoroughly characterized human thymocyte development, where, in addition to expected biological endpoints, we found and characterized an unexpected activated thymic T-reg endpoint.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point-by-point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary Note section 8.2: Comparison to state-of-the-art methods), namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Also, in the meantime we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (Bakardjieva M, et al. Tviblindi algorithm identifies branching developmental trajectories of human B-cell development and describes abnormalities in RAG-1 and WAS patients. Eur J Immunol. 2024 Dec;54(12):e2451004. doi: 10.1002/eji.202451004.).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      To serve both types of reader, the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We improved the Results text to better point the reader to the mathematical foundations in the Supplementary Note.

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al. 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we applied the framework successfully to imputed 2000-dimensional data. Furthermore, we successfully used tviblindi to investigate the bone marrow atlas scRNA-seq dataset of Zhang et al. (2024) and the atlas of mouse gastrulation of Pijuan-Sala et al. (2019). The idea behind tviblindi is to be able to work without non-linear dimensionality reduction techniques, which reduce the data to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand, the use of (linear) dimensionality reduction techniques that effectively suppress noise in the data, such as PCA, is good practice (see also the response to reviewer 2). We have emphasized this in the revised version and added the results of the corresponding analysis (see Supplementary Note, section 9).

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of this topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k does, however, affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We have expanded the “sensitivity to hyperparameters” section 8.1, also in response to reviewer 2.
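
      As an illustration of the kind of check described here, the sketch below computes a Spearman rank correlation between two pseudotime vectors obtained with different values of k. It is a minimal, dependency-free sketch rather than the code used for the Supplementary Note; the two pseudotime vectors are made-up toy values, and the helper names `average_ranks` and `spearman` are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy pseudotime values for the same six cells (illustrative only).
pseudotime_k10 = [0.0, 1.2, 2.9, 4.1, 7.5, 9.0]
pseudotime_k50 = [0.0, 1.0, 3.5, 3.1, 8.0, 9.4]

rho = spearman(pseudotime_k10, pseudotime_k50)
print(round(rho, 4))
```

      Because only ranks enter the statistic, a value close to 1 indicates that the ordering of cells along pseudotime is stable across the two choices of k, even if the absolute pseudotime values differ.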

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
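
      The first steps summarized above can be sketched on a toy graph. This is a hedged illustration and not the tviblindi implementation: it assumes a simple unweighted random walk, defines pseudotime as the expected number of steps needed to reach the origin from each node (the authors' exact definition, given in their Supplementary Note, may differ), and omits the persistent-homology clustering entirely. The graph and all function names are hypothetical.

```python
import random


def expected_hitting_times(neighbors, origin, tol=1e-10, max_iter=100000):
    """Expected steps for a simple random walk started at each node to first reach origin.

    Solves h(origin) = 0, h(v) = 1 + mean of h over the neighbours of v,
    by Gauss-Seidel iteration.
    """
    h = {v: 0.0 for v in neighbors}
    for _ in range(max_iter):
        delta = 0.0
        for v, nbrs in neighbors.items():
            if v == origin:
                continue
            new = 1.0 + sum(h[u] for u in nbrs) / len(nbrs)
            delta = max(delta, abs(new - h[v]))
            h[v] = new
        if delta < tol:
            break
    return h


def orient_by_pseudotime(neighbors, pseudotime):
    """Keep only edges pointing towards strictly larger pseudotime (this yields a DAG)."""
    return {
        v: [u for u in nbrs if pseudotime[u] > pseudotime[v]]
        for v, nbrs in neighbors.items()
    }


def sample_walk(dag, origin, rng):
    """Follow random outgoing edges from origin until a node without successors."""
    walk = [origin]
    while dag[walk[-1]]:
        walk.append(rng.choice(dag[walk[-1]]))
    return walk


# Toy undirected graph: a stem 0-1-2 that branches into two endpoints, 3 and 4.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}

hitting = expected_hitting_times(neighbors, origin=0)
dag = orient_by_pseudotime(neighbors, hitting)
walk = sample_walk(dag, origin=0, rng=random.Random(0))

print({v: round(t, 3) for v, t in hitting.items()})
print(dag)
print(walk)
```

      In this toy example every sampled walk leaves the origin along the stem and terminates in one of the two endpoints; grouping many such walks by the route they take is the step that the method delegates to persistent homology.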

      We thank the reviewer for the feedback and suggestions, which we have accommodated. We respond point-by-point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data. This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, by allowing for user bias leading to possible overfitting for a specific data.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models. 

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution at a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high-dimensional point cloud. At all times tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the KNN-graph, etc.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data, and comparison than the many trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical in real-world data (as was even demonstrated in the small amount of application to real-world data provided).

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare them to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Beyond the general lack of benchmarking, there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as of how to ensure data-driven inference and avoid over-fitting, would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved. And we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but it is an (unbiased) representation of what the data look like based on the diffusion model. It is a property of the data (or of the panel design), which has to be interpreted properly rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality) other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data driven (although not fully automated). Anecdotally, in this particular case the branching was discovered by a bioinformatician who knew nothing about the presence of beta-selection in the data.

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory), is a challenge for TI algorithms, and none of the tested methods gave a correct result. More importantly, there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default). In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We have improved the discussion of the robustness in the current version.  

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data, where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as the high degree of noise (e.g. ambient RNA), introduces a very large degree of noise, so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason; otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and to provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can operate directly in high-dimensional space (thousands of dimensions), and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We have added this analysis to the study (Supplementary Note, section 9).

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We have added a corresponding comment into the Supplementary note.  

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline").

      We thank the reviewer for the suggestion and we have accommodated it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare them to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      (2) Regarding the discussion of Figure 4, where the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality: it can be a checkpoint, as the authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe the authors can consider running other algorithms on those cells to see which tracks they identify and whether the directionality matches that of tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle 3 they failed to discover the negative selection branch (Monocle 3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms. 

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses:

      -  Benchmark against existing trajectory inference methods.

      -  Benchmark on scRNA-seq data or an explicit statement that, unlike existing methods, tviblindi is not designed for such data.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare them to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      -  Systematic evaluation of the effects of hyper-parameters on the performance of tviblindi (as mentioned above, there is at least one hyper-parameter, the number k to construct the k-NN graphs).

      This is described in the Supplementary Note, section 8.1.

      Recommendations for improving the writing and presentation:

      -  The GitHub link to the algorithm which is currently hidden in the Methods should be moved to the abstract and/or a dedicated section on code availability.

      -  The presentation of the persistent homology approach used for random walk clustering should be improved (see public comment above).

      This is described extensively in the Supplementary Note.

      -  A very minor point (can be ignored by the authors): consider renaming the algorithm. At least for me, it's extremely difficult to remember.

      We chose to keep the original name.

      Minor corrections to the text and figures:

      -  Labels and legend texts are too small in almost all figures.

      Reviewer #2 (Recommendations For The Authors):  

      (1) On page 3: "(2) Analysis is performed in the original high-dimensional space avoiding artifacts of dimensionality reduction." In mass cytometry data where there is no issue of dropouts, one may choose proteins such that they are not correlated with each other, making dimensionality reduction techniques less relevant. But in the context of an unbiased assay such as single-cell RNA-sequencing (scRNA-seq), one measures all the genes in a cell, so dimensionality reduction can help resolve the redundancy in the feature space due to correlated/co-regulated gene expression patterns. This assumption forms the basis of most methods in scRNA-seq. More importantly, in scRNA-seq data the dropouts and ambient molecules in mRNA counts result in so much noise that modeling cells in the full gene expression space is highly problematic. So the authors are requested to discuss in detail how they would propose to deal with noise in scRNA-seq data.

      On this note, the authors mention in Supplementary Note 9 (Analysis of human thymus single-cell RNA-seq data): "Imputed data are used as the input for the trajectory inference, scaled counts (no imputation) are shown in line plots". The line plots indicate the gene expression trends along the obtained pseudotime. The authors use MAGIC to impute the data, and we request the authors to mention this in the Methods section (currently one must look through the code on Supplementary Note 1.3 to find this). Data imputation in single-cell RNA-seq data are intended to enable quantification of individual gene expression distribution or pairwise gene associations. But when all the genes in an imputed data are used for visualization, clustering or trajectory inference, the averaging effect will compound and result in severely smoothed data that misses important differences between cell states. Especially, in the case of MAGIC, which uses a transition matrix raised to a power, it is over-smoothing of the data to use a transition matrix smoothed data to obtain another transition matrix to calculate the hitting time (or simulate random walks). Second, the authors' proposal to use scaled counts to study gene trends cannot be generalized to other settings due to drop out issue. Given the few genes (and only one branch) that are highlighted in Figure 7D-G and Figure 31 in Supplementary Note, it is hard to say if scaling raw values would pick up meaningful biology robustly here for other branches.

      We recommend that this data be reanalyzed with non-imputed data used for trajectory inference and imputed gene expression used for line plots.

      As stated above in the public review, we reanalyzed the scRNA-seq data using a more standard approach (first 50 principal components). We have also analyzed two additional scRNA-seq datasets (Section 1 and section 10 of the Supplementary Note).

      On the same note, the authors use Seurat's CellCycleScoring to obtain the cell cycle phase of each cell and later use ScaleData to regress them out. While we agree that it is valuable to remove the cell cycle effect from the data for trajectory inference (and this has been done previously in other methods), the regression approach employed in Seurat's ScaleData is not appropriate. It is an aggressive approach that severely changes the expression pattern of many genes and can result in new artifacts (false positives) in the data. We recommend that the authors explore this further and consider more principled alternatives such as fscLVM (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1334-8).

      Cell cycle correction is an open problem (Heumos, Nat Rev Genetics, 2023).

      Here we use an (arguably aggressive) approach to make the presentation more straightforward. The cells we are interested here (end #6) are not dividing and the regression does not change the conclusion drawn in the paper

      (2) The figures provided are so low in resolution that it is practically impossible to correctly interpret many of the conclusions and references made in the figures (especially Figure 3 in the main text).

      The resolution of the figures was improved.

      (3) There are many aspects of the method that enable easy user biases and can lead to substantial overfitting of the data.

      a. On page 7: "The topology of the point cloud representing human T-cell development is more complex ... and does not offer a clear cutoff for the choice of significant sparse regions. Interactive selection allows the user to vary the resolution and to investigate specific sparse regions in the data iteratively." This implies that the method enables user biases to be introduced into the data analysis. While perhaps useful for exploration, quantitative trajectory assessment using such approach can be faulty when the user (A) may not know the underlying dynamics (B) forces preconceived notion of trajectory.

      The authors should consider making the trajectory inference approach less dependent on interactive user input and show that the trajectory results are robust to any choices the user may make. It may also help if the authors provide an effective guide and mention clearly what issues could result due to the use of such thresholds.

      As explained in the response in public reviews, tviblindi is not designed as a fully automated TI tool, but as a data driven framework for exploratory analysis of unknown data. 

      There is always a risk of bias in this type of analysis, starting with experimental design, the choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge, which is difficult to include a priori in the computational models. To specifically address the points raised by the reviewer:

      “(A) may not know the underlying dynamics” - tviblindi is designed to perform exploratory analysis of the unknown underlying dynamics. We showcase in the study how this can be performed and we highlight possible cases which can be resolved expertly (spurious connections (doublets), different scales of resolution (beta selection)). Crucially, compared to other TI methods, tviblindi offers a clear mechanism to discover, focus on, and resolve these issues, which would (and do) contaminate the trajectories discovered fully automatically by the tested methods (cf. the beta selection, or the development of plasmacytoid dendritic cells (PDCs); Supplementary Note, section 10.1).

      “(B) forces preconceived notion of trajectory” - user interaction in tviblindi does not force a preconceived notion of the trajectory. The random walks are simulated before the interactive step in an unbiased manner. During the interactive step the user adjusts trajectory specific resolution - incorrect choice of the resolution may result in either merging distinct trajectories into one or over separating the trajectories (which is arguably much less serious). However the interactive step is designed to deal with exactly this kind of challenge. We showcase (e.g. beta selection, or PDCs development) how to address the issue - tviblindi allows us to investigate deeper structure in any considered trajectory.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools. It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner, including pseudotime, homology classes, and appropriate low-dimensional representations. These can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      b. In Figure 4, the authors discuss the trajectory of cells emanating from CD3 negative double positive stage and entering apoptotic phase and mention tviblindi may give "the false impression that cells may pass through an apoptotic phase into a later developmental stage" and propose that the interactive version of tviblindi can help the user zoom into (increase resolution) this phenomenon and identify that there are in fact two trajectories in one. Given this, how do the other trajectories in the data change if a user manually adjusts the resolution? A quantification of the robustness is important. Also, it appears that a more careful data clean up could avoid such pitfalls where the algorithm infers trajectory based on mixed phenotype and the user would not have to manually adjust the resolution to obtain clear biological conclusion. We note that the original publication of this data did such "data clean up" using simple diffusion-map-based dimensionality reduction, which the authors boast they avoid. There is a reason for this dimensionality reduction (distinguishing signal from noise), even in CyTOF data, let alone its importance in single cell data.

      The reviewer is concerned about two different but intertwined issues, which we wish to untangle here. First, data clean-up is typically done on the premise that dead cells are irrelevant and are a source of false signals. In the case of thymocytes in the human thymus, this premise is not true. Apoptotic cells are a legitimate (actually dominant) fate of the development and thus need to be represented in the TI dataset. Their biological behavior is, however, complex, as they stop expressing proteins and thus lose their surface markers gradually, as dictated by the particular protein degradation kinetics. So can we clean up dead and dying cells better? Yes, but we don't want to do it, since we would lose cells we want to analyze. Second, do trajectories change when we zoom into the data? No, only the level of detail presented visually changes. Since we calculate 5000 trajectories in the dataset, we need to aggregate them already for the hierarchical clustering visualization. Note that Figure 4, panel A highlights 159 trajectories selected in group V. Zooming in means that the hierarchy of trajectories within group V is revealed (panel D, groups V.a and V.b) and can be interpreted on the vaevictis and line plot graphs (panels E, F).

      c. In the discussion, the authors write "[tviblindi] allows the selection and grouping of similar random walks into trajectories based on visual interaction with the data". This counters the idea of automated trajectory inference and can lead to severe overfitting.

      As explained in reply to Q3, our aim was NOT to create a fully automated trajectory inference tool. Moreover, in our experience, all current tools take this fully automated approach and search for an “ideal” set of hyperparameters. This leads to a “blackbox” tool that is difficult for an expert in the biological field to interpret. To respond to this need, we designed a modular approach where the results of the TI are presented and the expert can interact with them to focus the visualization and to derive an interpretation. Our interactive concept is based on 15 years of experience with data analysis in flow cytometry, where neither manual gating nor full automation is the ultimate solution, but smart integration of both approaches eventually wins the game.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools.  It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner. These features include pseudotime, homology classes, and appropriate low-dimensional representations. These features can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      d. The authors provide some comment on the robustness to the relaxation parameter for witness complex construction in Supplementary Note Section 8.1.2 but it is limited given the importance of this parameter and a more thorough investigation is recommended. We request the authors to provide concrete examples with figures of how changing alpha2 parameter leads to simplicial complexes of different sizes and an assessment of contexts in which the parameter is robust and when not (in both simulated and publicly available real data). Of note, giving the users a proper guide for parameter choice based on these examples and offering them ways to quantify robustness of their results may also be valuable.

      Section 8 in Supplementary Note was extended as requested.

      e. The authors are requested for an assessment of possible short-circuits (e.g. cells of two distantly related phenotypes that get connected erroneously in the trajectory) in the data, and how their approach based on persistent homology deals with it.

      If a short circuit results in a (spurious) alternative trajectory, the persistent homology approach allows us to distinguish it from genuine trajectories that do not follow the short circuit. This prevents contamination of the inferred evolution by erroneous connections. The ability to distinguish and separate distinct trajectories with the same fate is a major strength of this approach (e.g., the trajectory through doublets or the trajectories around checkpoints in thymocytes’ evolution).

      (4) The authors propose vaevictis as a new visualization tool and show its performance compared to the standard UMAP algorithm on a simulated data set (Figure 1 in Supplementary Notes). We recommend a more comprehensive comparison between the two algorithms on a wide array of publicly available single-cell datasets, as well as a comparison to other popular dimensionality reduction approaches like force-directed layouts, which are the most widely used tool specifically to visualize trajectories.

      We added Section 10 to Supplementary Note that presents multiple comparisons of this kind. It is important to note that tviblindi works independently of visualization and any preferred visualization can be used in the interactive phase (multiple visualisation methods are implemented).

      (5) In Supplementary Note 8.2, the authors compare tviblindi against the other methods. We recommend that the authors quantify the comparison or expand on their assessments in real biological data. For example, in comparison against Palantir and VIA the authors mention "... discovers candidate endpoints in the biological dataset but lacks toolbox to interrogate subtle features such as complex branching" and "fails to discover subtle features (such as Beta selection)" respectively. We recommend that the authors make these comparisons more precise or provide quantification. While the added benefit of interactive sessions of tviblindi may make it more user friendly, the way tviblindi appears to enable analysis of subtle features (e.g. Figure 1H) should be possible in Palantir or VIA as well.

      We extended the comparisons and presented them in Sections 8 and 10 of the Supplementary Note.

      (6) The notion of using random walk simulations to identify terminal (and initial states) has been previously used in single-cell data (CellRank algorithm: https://www.nature.com/articles/s41592-021-01346-6). We request the authors to compare their approach to CellRank.

      We compared our algorithm to the CellRank successor, CellRank 2 (see section 8.2, Supplementary Note).

      (7) The notion of using persistent homology to discover trajectories has been previously used in single-cell data (https://pubmed.ncbi.nlm.nih.gov/28459448/). We request a comparison to this approach.

      The proposed algorithm was not able to accommodate the large datasets we used.

      scTDA (Rizvi, Camara et al. Nat. Biotechnol. 2017) has not been updated for 6 years. It is not suited for complex atlas-sized datasets both in terms of performance and utility, with its limited visualization tools. It also lacks capabilities to analyze individual trajectories.

      (8) In Figure 3B, the authors visualize the endpoints and simulated random walks using the connectome. There is no edge from start to the apoptotic cells here. It is not clear why? If they are not relevant based on random walks, can the user remove them from analysis? Same for the small group of pink cells below initial point.

      The connectome is a fully automated approach (similar to PAGA) which gives a basic overview of the data. It is not expected to be able to compete with the interactive pipeline of tviblindi for the same reasons as the fully automated methods (difficult to predict the effect of hyperparameters).

      (9) In Supplementary Figure 3, in relation to "Variants of trajectories including selection processes", the authors mention that there is a spurious connection between CD4 single positive cells and the doublet set of cells. The authors mention that the presence of dividing cells makes it difficult to remove the doublets. We request that the authors discuss why. For example, the authors seem to have cell cycle markers (e.g. Ki67, pH3, Cyclin), and one would think that, coupled with the DNA intercalator 191/193Ir, one could further clean up the data. Can the authors employ alternative toolkits such as doublet detection methods?

      To address this issue, we do remove doublets with illegitimate cell barcodes (e.g., we remove any two cells from two samples with different barcodes that present with a double barcode). Computational doublet removal approaches exist for mass cytometry (Bagwell, Cytometry A 2020), but they are mostly applied to peripheral blood samples (where cell division is not present under steady-state immune system conditions) and are not well suited to samples containing dividing cells (Rybakowska P, Comput Struct Biotechnol J. 2021), which is the case for our thymocyte samples. Furthermore, there are other situations where doublet formation is not an accident, but rather a biological response (Burel JG, Cytometry A 2020). Thus, the doublet cell problem is similar to the apoptotic cell problem discussed earlier.

      We could remove cells with the double DNA signal, but this would remove not only accidental doublets but also the legitimate (dividing) cells. So the question is how to remove the illegitimate doublets but not the legitimate?

      Of note, the trajectory going through doublets does not affect the interpretation of other trajectories as it is readily discriminated by persistent homology and thus random walks passing through this (spurious) trajectory do not contaminate the markers’ evolution inferred for legitimate trajectories.

      We therefore prefer to remove only the barcode-illegitimate doublets and keep all other cells in the analysis, using the expert analysis step also to identify (using the cell cycle markers plus other features) the artificially formed doublets and thus the spurious connections.

      (10) The authors should discuss how the gene expression trend plots are made (e.g. how are the expression averaged? Rolling mean?).

      The development of those markers is shown as a line plot connecting the average values of a specific marker within each pseudotime segment. By default, the pseudotime values are divided into uniform segments (each containing the same number of points), whose number can be changed in the GUI. To focus on either early or late stages of the development, the segment division can be adjusted in the GUI. See section 6 of the Supplementary Note.
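
      The averaging described above can be sketched as follows. This is only an illustration under stated assumptions, not the tviblindi implementation: the function name `segment_means`, its parameters, and the use of NumPy are hypothetical, and "uniform segments" is read as equal-count bins along sorted pseudotime.

```python
import numpy as np

def segment_means(pseudotime, marker, n_segments=10):
    """Average a marker within equal-count pseudotime segments.

    Hypothetical helper: cells are sorted by pseudotime, split into
    n_segments groups whose sizes differ by at most one cell, and the
    marker is averaged within each group.
    """
    pseudotime = np.asarray(pseudotime, dtype=float)
    marker = np.asarray(marker, dtype=float)
    if pseudotime.shape != marker.shape:
        raise ValueError("pseudotime and marker must have the same shape")
    if not 1 <= n_segments <= pseudotime.size:
        raise ValueError("n_segments must be between 1 and the number of cells")

    # Sort cells along pseudotime, then cut the ordering into equal-count bins.
    order = np.argsort(pseudotime, kind="stable")
    bins = np.array_split(order, n_segments)

    # x-coordinate: mean pseudotime of the segment; y-coordinate: mean marker.
    centers = np.array([pseudotime[idx].mean() for idx in bins])
    means = np.array([marker[idx].mean() for idx in bins])
    return centers, means
```

      Connecting the returned `(centers, means)` pairs gives the line plot; choosing a larger `n_segments` corresponds to a finer segment division.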

      Reviewer #3 (Recommendations For The Authors):

      The overall figures quality needs to be improved. For example, I can barely see the text in Figure 3c.

      The resolution of the figures was improved.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcription factors of Pseudomonas aeruginosa PAO1. The authors utilized ChIP-seq to profile TF binding site information across the genome, demonstrating diverse regulatory relationships among them via hierarchical networks with three levels. They further constructed thirteen ternary regulatory motifs in small subs and a co-association atlas with 7 core associated clusters. The study also uncovered 24 virulence-related master regulators. The pan-genome analysis uncovered both the conservation and evolution of TFs within the P. aeruginosa complex and related species. Furthermore, they established a web-based database combining both existing and novel data from HT-SELEX and ChIP-seq to provide TF binding site information. This study offered valuable insights into studying transcription regulatory networks in P. aeruginosa and other microbes.

      Strengths:

      The results are presented with clarity, supported by well-organized figures and tables that not only illustrate the study's findings but also enhance the understanding of complex data patterns.

      Thank you for your valuable feedback on our paper exploring the transcription regulatory networks in P. aeruginosa.

      Weaknesses:

      The results of this manuscript are mainly presented in systematic figures and tables. Some of the results need to be discussed as an illustration how readers can utilize these datasets.

      We appreciate the valuable suggestion about enhancing the practical aspects of our manuscript. We have expanded the discussion section to include more detailed explanations of how these datasets can be utilized in practical applications. 

      Reviewer #2 (Public review):

      In this work, the authors comprehensively describe the transcriptional regulatory network of Pseudomonas aeruginosa through the analysis of transcription factor binding characteristics. They reveal the hierarchical structure of the network through ChIP-seq, categorizing transcription factors into top-, middle-, and bottom-level, and reveal a diverse set of relationships among the transcription factors. Additionally, the authors conduct a pangenome analysis across the Pseudomonas aeruginosa species complex as well as other species to study the evolution of transcription factors. Moreover, the authors present a database with new and existing data to enable the storage and search of transcription factor binding sites. The findings of this study broaden our knowledge on the transcriptome of P. aeruginosa. This study sheds light on the complex interconnections between various cellular functions that contribute to the pathogenicity of P. aeruginosa, along with the associated regulatory mechanisms. Certain findings, such as the regulatory tendencies of DNA-binding domain-types, provides valuable insights on the possible functions of uncharacterized transcription factors and new functions of those that have already been characterized. The techniques used hold great potential for discovery of transcription factor functions in understudied organisms as well.

      The study would benefit from a more clear discussion on the implications of various findings, such as binding preferences, regulatory preferences, and the link between regulatory crosstalk and virulence. Additionally, the pangenome analysis would be furthered through a discussion of the divergence of the transcription factors of P. aeruginosa PAO1 across species in relation to the findings on the hierarchical structure of the transcriptional regulatory network.

      Thank you for your positive feedback and suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      (1) It appears that many TFs are conserved among bacteria, archaebacteria, fungi, plants, and animals. Does this mean these TFs in bacteria could be the ancestors of TFs in fungi, plants, and animals? If we fetch these TFs out and build an evolutionary tree, can we visualize the three kingdoms as well?

      Thank you for this comment. While many TFs are conserved across bacteria, archaea, fungi, plants, and animals, this conservation does not necessarily imply a direct ancestral relationship. Instead, it may reflect the fundamental importance of certain domains and regulatory mechanisms, which could have arisen from a common ancestral system or through convergent evolution. If we fetch TF PA2032 out to build an evolutionary tree by setting PAO1 as the root, we can visualize these kingdoms in a tree. We added this content in the revised manuscript. Please see Figure S7D and Lines 404-411.

      “The phylogenetic tree of PA2032 across bacteria, archaea, fungi, plants, and animals, with PAO1 as the root revealed that the bacterial TFs (purple) indicates a high degree of conservation within prokaryotes, suggesting a fundamental role in core regulatory processes. In contrast, eukaryotic TFs (fungi, plants, and animals) form distinct clades with longer branch lengths, indicating significant divergence and specialization during eukaryotic evolution. These findings suggest that while TF is conserved across domains of life, its functional roles and regulatory mechanisms have undergone substantial diversification in eukaryotes.”

      (2) Can the authors give an indication how could we employ the findings of this study in designing next generation of antimicrobial agents?

      Thank you for this important suggestion. We have provided this content in the discussion part. Please see Lines 481-492.

      “The extensive datasets generated in this study offer valuable insights into understanding and targeting P. aeruginosa pathogenicity. The genome-wide binding profiles can be systematically analyzed through our hierarchical regulatory network framework to decode complex virulence mechanisms. The virulence-related master regulators and core regulatory clusters identified in this study highlighted key nodes of transcriptional control. Understanding these regulatory relationships is particularly valuable for identifying targets whose modulation would significantly impact virulence while accounting for potential compensatory mechanisms. This knowledge base thus provides a foundation for developing targeted approaches to combat P. aeruginosa infections, moving beyond traditional antibiotic strategies toward more sophisticated interventions based on regulatory network manipulation.”

      Minor:

      (1) Lines 178-180: It would strengthen the discussion to include a few additional references that support the claims made in this section, providing a more comprehensive context for the readers.

      Yes. We have added more citations (Nos. 1-5 in the references at the end of the rebuttal) to support the claims. Please see Line 182.

      (2) Line 198: You mention 'seven' motifs containing toggle switches, but Fig.3 actually displays eight motifs. Please revise this discrepancy to ensure consistency between the text and the figure.

      Yes. We have revised the wording to “eight”. Please see Line 200.

      (3) Figure 3A: Consider adding a diagram or legend that represents the colors associated with each DNA-binding domain (DBD) family.

      Thank you for your suggestion. The colors of DBD were aligned with the legend in Figure S3. We have added it in Figure 3A.

      Reviewer #2 (Recommendations for the authors):

      Line 21: The use of the abbreviation 'TF' should be done at the first instance of 'transcription factor'.

      Yes. We have revised it. Please see Line 21.

      Line 74: The purpose of this paragraph is slightly unclear. It is recommended that appropriate modifications are made.

      We are sorry for the confusion. The purpose of this paragraph was to introduce the major virulence pathways in P. aeruginosa and mention the important role of TRN in these pathways. We have modified it to make it clearer. Please see Lines 74-75.

      “P. aeruginosa employs diverse virulence pathways to establish successful infection, with QS being one of the major mechanisms involving the expression of many virulence genes.”

      Line 113: How were these 172 TFs selected?

      Thank you for raising this question. In a previous study, we performed HT-SELEX to characterize the DNA-binding motifs of all TFs in P. aeruginosa PAO1, successfully identifying binding sequences for 182 TFs. To further elucidate the binding landscapes of the rest, we performed ChIP-seq on the remaining TFs (172 TFs in total with high-quality ChIP-seq libraries). Please see Lines 100-101 in the revised manuscript.

      Line 119: Defining other features, namely downstream and include Feature, would be helpful.

      Thank you for your suggestion. We have added the definition for all peak annotation in the legend. Please see Lines 569-574.

      “Annotation heatmap of all peak distribution with 6 locations: Upstream, where the peak is located entirely upstream of the gene; Downstream, where the peak is positioned completely downstream of the gene; Inside, where the peak is entirely contained within the gene body; OverlapStart, where the peak overlaps with the 5' end of the gene; OverlapEnd, where the peak overlaps with the 3' end of the gene; and IncludeFeature, where the peak completely encompasses the gene.”
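
      The six locations in this legend can be illustrated with a small classifier. This is a sketch under assumptions, not the annotation code used in the study: the function `classify_peak`, the inclusive coordinate convention, the strand handling, and the tie-breaking (a peak exactly matching the gene is reported as IncludeFeature) are all hypothetical choices.

```python
def classify_peak(peak_start, peak_end, gene_start, gene_end, strand="+"):
    """Assign one of the six legend categories to a peak relative to a gene.

    Assumptions: coordinates are inclusive with start <= end, and the 5' end
    of the gene is gene_start on the "+" strand and gene_end on the "-" strand.
    """
    if peak_start > peak_end or gene_start > gene_end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    plus = strand == "+"

    # Peak spans the whole gene (ties with an identical interval land here).
    if peak_start <= gene_start and peak_end >= gene_end:
        return "IncludeFeature"
    # Peak lies entirely within the gene body.
    if peak_start >= gene_start and peak_end <= gene_end:
        return "Inside"
    # Peak is disjoint from the gene, on the low-coordinate side.
    if peak_end < gene_start:
        return "Upstream" if plus else "Downstream"
    # Peak is disjoint from the gene, on the high-coordinate side.
    if peak_start > gene_end:
        return "Downstream" if plus else "Upstream"
    # Remaining cases are partial overlaps with one end of the gene.
    if peak_start < gene_start:
        return "OverlapStart" if plus else "OverlapEnd"
    return "OverlapEnd" if plus else "OverlapStart"
```

      For a gene on the minus strand the same genomic interval swaps Upstream with Downstream and OverlapStart with OverlapEnd, which is why the strand argument is needed.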

      Line 129: The distribution type of AraC-type TFs is unclear - it is mentioned that AraC has a 'broad distribution', but it is later stated that it has a 'narrow distribution'.

      We are sorry for this mistake, and we have revised the example for “broad distribution”, which is Cor_CI instead of AraC. Please see Lines 132-135.

      Line 161: 'h value' here may need to be modified to 'absolute h value'.

      Yes. We have revised it. Please see Line 164.

      Line 502: "s The DNA" needs to be corrected.

      Yes. We have revised it. Please see Line 514.

      Line 515: It would be helpful to readers if the reference used for these pathways was cited.

      Yes. We have added the review reference (Shao et al, 2023) related to these pathways(6) (the 6th reference at the end of the rebuttal). Please see Line 527.

      Line 558: "Translation start site" needs to be corrected to "Transcription start site"

      The “TSS” here does indeed indicate “Translation start site”.

      Line 593. "Virulent" pathways needs to be corrected to "virulence" pathways.

      Yes. We have revised it. Please see Line 609.

      Line 604: The type of categorization based on which the proportion of genes is displayed needs to be mentioned.

      Yes, we agree. We have added the type of categorization in the legend. Please see Lines 621-627.

      “Figure 6. Conservation and variability of TFs in PAO1. (A). The pie chart shows the proportions of genes categorized by their presence across P. aeruginosa strains for all genes. (B). The pie chart shows the distribution of TFs identified from PAO1 across different conservation categories. (C). The bar plot of the proportion for non-core TFs. Genes are categorized based on their presence frequency across P. aeruginosa strains: Core genes (present in 99% ~ 100% strains), Soft core genes (present in 95% ~ 99% strains), Shell genes (present in 15% ~ 95% strains), and Cloud genes (present in 0% ~ 15% strains).”
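
      The four presence-frequency categories in this legend can be expressed as a simple threshold rule. The sketch below is illustrative only: the function `pangenome_category` is hypothetical, and because the legend's ranges share their boundary values, the assignment of exact boundaries (99%, 95%, and 15% go to the higher category) is an assumption rather than something stated in the manuscript.

```python
def pangenome_category(n_present, n_strains):
    """Classify a gene by the fraction of strains in which it is present.

    Assumed boundary handling: a gene exactly at a threshold is placed in
    the more conserved category.
    """
    if n_strains <= 0 or not 0 <= n_present <= n_strains:
        raise ValueError("need 0 <= n_present <= n_strains and n_strains > 0")
    # Compare integer products instead of a float fraction so that values
    # exactly at 99%, 95%, or 15% are not misclassified by rounding.
    if 100 * n_present >= 99 * n_strains:
        return "Core"
    if 100 * n_present >= 95 * n_strains:
        return "Soft core"
    if 100 * n_present >= 15 * n_strains:
        return "Shell"
    return "Cloud"
```

      For example, under these assumptions a TF found in 970 of 1,000 strains would be counted as a soft core gene, and therefore as a non-core TF in panel C.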

      Reference:

      (1) Liang H, Deng X, Li X, Ye Y, Wu M. 2014. Molecular mechanisms of master regulator VqsM mediating quorum-sensing and antibiotic resistance in Pseudomonas aeruginosa. Nucleic acids research 42:10307-10320.

      (2) Jones CJ, Ryder CR, Mann EE, Wozniak DJ. 2013. AmrZ modulates Pseudomonas aeruginosa biofilm architecture by directly repressing transcription of the psl operon. Journal of bacteriology 195:1637-1644.

      (3) Hickman JW, Harwood CS. 2008. Identification of FleQ from Pseudomonas aeruginosa as ac‐di‐GMP‐responsive transcription factor. Molecular microbiology 69:376-389.

      (4) Déziel E, Gopalan S, Tampakaki AP, Lépine F, Padfield KE, Saucier M, Xiao G, Rahme LG. 2005. The contribution of MvfR to Pseudomonas aeruginosa pathogenesis and quorum sensing circuitry regulation: multiple quorum sensing‐regulated genes are modulated without affecting lasRI, rhlRI or the production of N‐acyl‐L‐homoserine lactones. Molecular microbiology 55:998-1014.

      (5) Lizewski SE, Lundberg DS, Schurr MJ. 2002. The transcriptional regulator AlgR is essential for Pseudomonas aeruginosa pathogenesis. Infection and immunity 70:6083-6093.

      (6) Shao X, Yao C, Ding Y, Hu H, Qian G, He M, Deng X. 2023. The transcriptional regulators of virulence for Pseudomonas aeruginosa: Therapeutic opportunity and preventive potential of its clinical infections. Genes & Diseases 10:2049-2063.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Previous studies in mammals and other vertebrates have shown that a noninvasive measure of cochlear tuning, based on the latency derived from stimulus-frequency otoacoustic emissions, provides a reasonable, and non-invasive, estimate of cochlear tuning. This valuable study confirms that finding in a new species, the budgerigar, and provides convincing support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored primarily in mammals. The study's remaining claims of a mismatch between behavioral frequency selectivity and cochlear tuning are based on old behavioral data, and collected in an extreme frequency region at the edge of the limits of hearing. Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, and the evidence for these claims is weak.

      We appreciate the detailed summary of our paper by the editors highlighting its strengths. As described in the following responses, we added additional evidence to the Introduction supporting that budgerigars have (1) unusual behavioral frequency tuning compared to other bird species and (2) unusual behavioral tuning results in budgerigars are not readily explainable by the audiogram. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars. Moreover, that the behavioral data are “old” seems not particularly relevant considering that the same behavioral methods are still widely used in animal research, as elaborated upon in the responses below. We suggest the term “previously published” to clarify the behavioral data used in our analyses.

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning now avoid possible circularity and are obtained by assuming that the tuning ratio estimated in chicken applies also to the budgerigar.

      Weaknesses:

      * In mammals, accurate prediction of neural Q_ERB from otoacoustic N_SFOAE involves the application of species-invariance of the tuning ratio combined with an attempt to compensate for possible species differences in the location of the so-called apical-basal transition (for a review, see Shera & Charaziak, Cochlear frequency tuning and otoacoustic emissions. Cold Spring Harb Perspect Med 2019; 9:pii a033498. doi: 10.1101/cshperspect.a033498; in particular, the text near Eq. 2 and the value of CFa|b).

      Despite this history, the manuscript makes no mention of the apical-basal transition, its possible role in birds, or why it was ignored in the present analysis. As but one result, the comparative discussion of the tuning ratio (paragraph beginning on line 383) is incomplete and potentially misleading. Although the paragraph highlights differences in the tuning ratio across groups, perhaps these differences simply reflect differences in the value of CFa|b. For example, if the cochlea of the budgerigar is assumed to be entirely "apical" in character (so that CFa|b is around 7-8 kHz), then the budgerigar tuning ratios appear to align remarkably well with those previously obtained in mammals (see Shera et al 2010, Fig 9).

      We added sections on the apical-basal transition to the Results and Discussion, including how this concept might apply in budgerigars and other birds.
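
      For readers unfamiliar with the tuning-ratio approach discussed in this comment, the relationship can be sketched as follows. The tuning ratio r is the ratio of neural Q_ERB to N_SFOAE (the SFOAE phase-gradient delay expressed in periods of the stimulus frequency), so a ratio estimated in one species can be multiplied by N_SFOAE measured in another to predict Q_ERB. All numerical values and function names below are hypothetical placeholders, not data from the manuscript, and the sketch deliberately ignores any frequency dependence of r and the apical-basal transition.

```python
def delay_in_periods(freq_hz: float, delay_s: float) -> float:
    """N_SFOAE: phase-gradient delay expressed in periods of the stimulus frequency."""
    return freq_hz * delay_s


def tuning_ratio(q_erb: float, n_sfoae: float) -> float:
    """Tuning ratio r = Q_ERB / N_SFOAE, estimated in a reference species."""
    return q_erb / n_sfoae


def predict_q_erb(n_sfoae: float, ratio: float) -> float:
    """Predict Q_ERB in a target species from its N_SFOAE and a borrowed ratio."""
    return ratio * n_sfoae


if __name__ == "__main__":
    # Hypothetical reference-species values (illustration only).
    r_ref = tuning_ratio(q_erb=3.0, n_sfoae=2.5)

    # Hypothetical target-species SFOAE delay of 1.5 ms at 2 kHz.
    n_target = delay_in_periods(freq_hz=2000.0, delay_s=0.0015)

    print(f"borrowed tuning ratio r = {r_ref:.2f}")
    print(f"predicted Q_ERB = {predict_q_erb(n_target, r_ref):.2f}")
```

      The sketch treats r as a single constant; in practice the ratio varies with frequency, which is exactly why the location of the apical-basal transition matters for cross-species comparisons.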

      * For the most part, the authors take previous behavioral results in budgerigar at face value, attributing the discrepant behavioral results to hypothesized "central specializations for the processing of masked signals". But before going down this easy road, the manuscript would be stronger if the authors discussed potential issues that might affect the reliability of the previous behavioral literature. For example, the ANF data show that thresholds rise rapidly above about 5 kHz. Might the apparent broadening of the behavioral filters arise as a consequence of off-frequency listening due to the need to increase signal levels at these frequencies? Or perhaps there are other issues. Inquiring readers would appreciate an informed discussion.

      This is a good point, also raised by reviewer 2, that declining audibility above 4 kHz could impact behavioral tuning estimates. On the other hand, other bird species with highly similar audiograms to budgerigars show conventional behavioral tuning that increases in sharpness relatively slowly and monotonically for higher frequencies. Thus, the unusual pattern of behavioral tuning in budgerigars is not fully explainable by the audiogram. We added a section to the Introduction highlighting these points.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes two new sets of data involving budgerigar hearing: 1) auditory-nerve tuning curves (ANTCs), which are considered the 'gold standard' measure of cochlear tuning, and 2) stimulus-frequency otoacoustic emissions (SFOAEs), which are a more indirect measure (requiring some assumptions and transformations to infer cochlear tuning) but which are non-invasive, making them easier to obtain and suitable for use in all species, including humans. By using a tuning ratio (relating ANTC bandwidths and SFOAE delay) derived from another bird species (chicken), the authors show that the tuning estimates from the two methods are in reasonable agreement with each other over the range of hearing tested (280 Hz to 5.65 kHz for the ANTCs), and both show a slow monotonic increase in cochlear tuning quality over that range, as expected. These new results are then compared with (much) older existing behavioral estimates of frequency selectivity in the same species.

      Strengths:

      This topic is of interest, because there are some indications from the older behavioral literature that budgerigars have a region of best tuning, which the current authors refer to as an 'acoustic fovea', at around 4 kHz, but that beyond 5 kHz the tuning degrades. Earlier work has speculated that the source could be cochlear or higher (e.g., Okanoya and Dooling, 1987). The current study appears to rule out a cochlear source to this phenomenon.

      Weaknesses:

      The conclusions are rendered questionable by two major problems.

      The first problem is that the study does not provide new behavioral data, but instead relies on decades-old estimates that used techniques dating back to the 1970s, which have been found to be flawed in various ways. The behavioral techniques that have been developed more recently in the human psychophysical literature have avoided these well-documented confounds, such as nonlinear suppression effects (e.g., Houtgast, https://doi.org/10.1121/1.1913048; Shannon, https://doi.org/10.1121/1.381007; Moore, https://doi.org/10.1121/1.381752), perceptual confusion between pure-tone maskers and targets (e.g., Neff, https://doi.org/10.1121/1.393678), beats and distortion products produced by interactions between simultaneous maskers and targets (e.g., Patterson, https://doi.org/10.1121/1.380914), unjustified assumptions and empirical difficulties associated with critical band and critical ratio measures (Patterson, https://doi.org/10.1121/1.380914), and 'off-frequency listening' phenomena (O'Loughlin and Moore, https://doi.org/10.1121/1.385691). More recent studies, tailored to mimic to the extent possible the techniques used in ANTCs, have provided reasonably accurate estimates of cochlear tuning, as measured with ANTCs and SFOAEs (Shera et al., 2003, 2010; Sumner et al., 2010). No such measures yet exist in budgerigars, and this study does not provide any. So the study fails to provide valid behavioral data to support the claims made.

      We appreciate the reviewer’s efforts in summarizing and critiquing our study. We feel that the budgerigar data collected by the Dooling and Saunders labs remain essentially valid today. The methods used in these behavioral studies are rigorous and remain widely used in animal research (e.g., critical bands and ratios: Yost & Shofner, 2009; King et al., 2015; simultaneous masking: Burton et al., 2018). The methods are based on the same power-spectrum-model assumptions of auditory masking as even the most recent and elaborate human psychophysical procedures. We therefore believe that it remains highly relevant to test and report whether these methods can accurately predict cochlear tuning. More importantly, while forward-masking behavioral results are hypothesized to more accurately predict cochlear tuning in humans (Shera et al., 2002; Joris et al., 2011; Sumner et al., 2018), evidence from nonhumans is controversial. For example, one study showed a closer match between forward-masking results and auditory-nerve tuning (ferret: Sumner et al., 2018), whereas several others showed a close match for simultaneous masking results (e.g., guinea pig, chinchilla, macaque; reviewed by Ruggero & Temchin, 2005; see Joris et al., 2011 for macaque auditory-nerve tuning). Moreover, forward- and simultaneous-masking results can often be equated with a simple scaling factor (e.g., Sumner et al., 2018). Given no consensus on an optimal behavioral method, and seemingly limited potential for the “wrong” method to fundamentally transform the shape of the behavioral tuning quality function, it seems reasonable to accept previously published behavioral tuning estimates as valid while also discussing limitations and remaining open to alternative interpretations. We added these points to the discussion and added clarification throughout as to the specific behavioral approaches used.

      The second, and more critical, problem can be observed by considering the frequencies at which the old behavioral data indicate a worsening of tuning. From the summary shown in the present Fig. 2, the conclusion that behavioral frequency selectivity worsens again at higher frequencies is based on four data points, all with probe frequencies between 5 and 6 kHz. Comparing this frequency range with the absolute thresholds shown in Fig. 3 (as well as from older budgerigar data) shows it to be on the steep upper edge of the hearing range. Thus, we are dealing not so much with a fovea as the point where hearing starts to end. The point that anomalous tuning measures are found at the edge of hearing in the budgerigar has been made before: Saunders et al. (1978) state in the last sentence of their paper that "the size of the CB rapidly increases above 4.0 kHz and this may be related to the fact that the behavioral audibility curve, above 4.0 kHz, loses sensitivity at the rate of 55 dB per octave."

      Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, in humans as well as in other species. The few attempts to measure human frequency selectivity at that upper edge have resulted in quite messy data and unclear conclusions (e.g., Buus et al., 1986, https://doi.org/10.1007/978-1-4613-2247-4_37). Indeed, the only study to my knowledge to have systematically tested human frequency selectivity in the extended high frequency range (> 12 kHz) seems to suggest a substantial broadening, relative to the earlier estimates at lower frequencies, by as much as a factor of 2 in some individuals (Yasin and Plack, 2005; https://doi.org/10.1121/1.2035594) - in other words by a similar amount as suggested by the budgerigar data. The possible divergence of different measures at the extreme end of hearing could be due to any number of factors that are hard to control and calibrate, given the steep rate of threshold change, leading to uncontrolled off-frequency listening potential, the higher sound levels needed to exceed threshold, as well as contributions from middle-ear filtering. As a side note, in the original ANTC data presented in this study, there are actually very few tuning curves at or above 5 kHz, which are the ones critical to the argument being forwarded here. To my eye, all the estimates above 5 kHz in Fig. 3 fall below the trend line, potentially also in line with poorer selectivity going along with poorer sensitivity as hearing disappears beyond 6 kHz.

      This is an excellent point, also raised by reviewer 1, that declining audibility above 4 kHz could influence behavioral tuning measures. While we acknowledge this possibility, declining audibility cannot fully explain the unusual pattern of behavioral frequency tuning in budgerigars considering that other bird species with the same audiogram phenotype show conventional tuning patterns. We added these points to the Introduction and Fig. 1B. We also added clarification throughout that it is not just the shape of the tuning function that is noteworthy in budgerigars, but also the extreme slope in the 1-3.5 kHz region. Behavioral tuning quality in budgerigars increases by 5.3 dB/octave in this range (i.e., nearly doubling with each octave increase in frequency), vs. 1.8 dB/octave in humans, 2.5 dB/octave in ferret, 1.1 dB/octave in macaque, and 1.9 dB/octave in starling. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars.
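
      As a rough check on the slopes quoted above, the sketch below converts a dB/octave slope into the multiplicative change in tuning quality per octave. It assumes that tuning quality is expressed on a 20·log10 decibel scale; this is an assumption made here, chosen because it is the convention under which 5.3 dB/octave corresponds to "nearly doubling". The function and variable names are illustrative and do not come from the manuscript.

```python
def ratio_per_octave(slope_db_per_octave: float, db_factor: float = 20.0) -> float:
    """Multiplicative change per octave implied by a slope in dB/octave.

    db_factor is 20 for a 20*log10 convention (assumed here) or 10 for 10*log10.
    """
    return 10.0 ** (slope_db_per_octave / db_factor)


# Slopes quoted in the response (dB/octave, 1-3.5 kHz region for budgerigar).
SLOPES_DB_PER_OCTAVE = {
    "budgerigar": 5.3,
    "human": 1.8,
    "ferret": 2.5,
    "macaque": 1.1,
    "starling": 1.9,
}

if __name__ == "__main__":
    for species, slope in SLOPES_DB_PER_OCTAVE.items():
        factor = ratio_per_octave(slope)
        print(f"{species}: tuning quality changes by x{factor:.2f} per octave")
```

      Under this assumption the budgerigar slope implies a factor of roughly 1.8 per octave, whereas the other listed species fall well below 1.5; under a 10·log10 convention the budgerigar factor would instead exceed 3.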

      The basic question posed in the current study title and abstract seems a little convoluted (why would you expect a behavioral measure to reflect cochlear mechanics more accurately than a cochlear-based emissions measure?). A more intuitive (and likely more interesting) way of framing the question would be "What is the neural/mechanical source of a behaviorally observed acoustic fovea?" Unfortunately, this question does not lend itself to being answered in the budgerigar, as that 'fovea' turns out to be just the turning point at the end of the hearing range. There is probably a reason why no other study has referred to this as an acoustic fovea in the budgerigar.

      Overall, a safe interpretation of the data is that hearing starts to change (and becomes harder to measure) at the very upper frequency edge, and not just in budgerigars. Thus, it is difficult to draw any clear conclusions from the current work, other than that the relations between ANTC and SFOAE estimates of tuning are consistent in budgerigar, as they are in most (all?) other species that have been tested so far.

      We removed the term fovea from the paper. See above for our argument that unusual behavioral tuning in budgerigars is not simply or fully explainable by the audiogram.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Line 34. As far as I could tell, no other study has referred to this region in budgerigar as an acoustic fovea. Probably for good reason (see above). This wording should probably be avoided.

      We removed the term.

      Line 35. Describing 3.5-4 kHz as 'mid-frequencies' is a stretch. 4 kHz is actually the corner frequency, above which hearing degrades.

      We added a more detailed and accurate description of the tuning pattern.

      Lines 89-91. This seems a nice statement of the problem, and to my mind makes for a much better rationale for the study.

      Line 255. "mixed effect" should be "mixed effects".

      We made the correction.

      Line 380. Kuhn and Saunders didn't measure high enough to detect any changes in tuning.

      We removed the reference here.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript titled "Evolutionary and Functional Analyses Reveal a Role for the RHIM in Tuning RIPK3 Activity Across Vertebrates" by Fay et al. explores the function of RIPK gene family members across a wide range of vertebrate and invertebrate species through a combination of phylogenomics and functional studies. By overexpressing these genes in human cell lines, the authors examine their capacity to activate NF-κB and induce cell death. The methods employed are appropriate, with a thorough analysis of gene loss, positive selection, and functionality. While the study is well-executed and comprehensive, its broader relevance remains limited, appealing mainly to specialists in this specific field of research. It misses the opportunity to extract broader insights that could extend the understanding of these genes beyond evolutionary conservation, particularly by employing evolutionary approaches to explore more generalizable functions.

      Major comments:

      The main issue I encounter is distinguishing between what is novel in this study and what has been previously demonstrated. What new insights have been gained here that are of broader relevance? The discussion, which would be a good place to do so, is very speculative and has little to do with the actual results. Throughout the manuscript, there is little explanation of the study's importance beyond the fact that it was possible to conduct it. Is the evolutionary analysis being used to advance our understanding of gene function, or is the focus merely on how these genes behave across different species? The former would be exciting, while the latter feels less impactful.

      We thank the reviewer for the positive feedback. With regard to the major comment, we have now made changes throughout the revised manuscript to highlight the novel insights that emerge from our work, as well as the importance of using evolutionary and functional analyses to understand gene function. 

      Reviewer #2 (Public review):

      Summary:

      By combining bioinformatical and experimental approaches, the authors address the question of why several vertebrate lineages lack specific genes of the necroptosis pathway or those that regulate the interplay between apoptosis and necroptosis. The lack of such genes was already known from previous publications, but the current manuscript provides a more in-depth analysis and also uses experiments in human cells to address the question of the functionality of the remaining genes and pathways. A particular focus is placed on RIPK3/RIPK1 and their dual roles in inducing NFkB and/or necroptosis.

      Strengths:

      The well-documented bioinformatical analyses provide a comprehensive data basis of the presence/absence of RIP-kinases, other RHIM proteins, apoptosis signaling proteins (FADD, CASP8, CASP10), and some other genes involved in these pathways. Several of these genes are known to be missing in certain animal lineages, which raises the question of why their canonical binding partners are present in these species. By expressing several such proteins (both wildtype and mutants destroying particular interaction regions) in human cells, the authors succeed in establishing a general role of RIPK3 and RIPK1 in NFkB activation. This function appears to be better conserved and more universal than the necroptotic function of the RHIM proteins. The authors also scrutinize the importance of the kinase function and RHIM integrity for these separate functionalities.

      Weaknesses:

      A major weakness of the presented study is the experimental restriction to human HEK293 cells. There are several situations where the functionality of proteins from distant organisms (like lampreys or even mussels) in human cells is not necessarily indicative of their function in the native context. In some cases, these problems are addressed by co-expressing potential interaction partners, but not all of these experiments are really informative.

      A second weakness is that the manuscript addresses some interesting effects only superficially. By using host cells that are deleted for certain signaling components, a more focussed hypothesis could have been tested.

      Thus, while the aim of the study is mostly met, it could have been a bit more ambitious. The limited conclusions drawn by the authors are supported by convincing evidence. I have no doubts that this study will be very useful for future studies addressing the evolution of necroptosis and its regulation by NFkB and apoptosis.

      We thank the reviewer for the positive feedback. We agree that our study is limited by using HEK293 cells. However, we do not have appropriate cell lines for all species analyzed and therefore wished to use a single system to test all effects. As the reviewer points out, we do co-express when possible, and are careful in the manuscript to not overextend our conclusions. We, like the reviewer, believe that many of the intriguing findings in this study, which was intended to cover a broad range of species, will be useful for more in-depth studies in a given species.

      Reviewer #3 (Public review):

      This important study provides insights into the functional diversification of RIP family kinase proteins in vertebrate animals. The provided results, which combine bioinformatic and experimental analyses, will be of interest to specialists in both immunology and evolutionary biology. However, the computational part of the methodology is insufficiently covered in the paper and the experimental results would benefit from including data for additional species.

      We thank the reviewer for the positive feedback. As described below, we have now addressed the concerns about the description of the computational methods.

      (1) In the Methods section concerning gene loss analysis, the authors refer to the 'Phylogenetic analysis' section for details of RIPK sequence acquisition and alignment procedure. This section is missing from the manuscript as provided. In its absence, it is hard for the reviewer to provide relevant comments on gene presence/absence analysis.

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (2) In the same section, the authors state that gene sequences were filtered and grouped based on the initial gene tree pattern (lines 448-449). How exactly did the authors filter the non-RIP kinases and other irrelevant homologs from the gene trees? Did they consider the reciprocal best (BLAST) hit approach or similar approaches for orthology inference? Did they also encounter potential pseudogenes of genes marked as missing in Figure 1C? Will the gene trees mentioned be available as supplementary files?

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (3) The authors state the presence of an additional RIPK2 paralog in non-therian vertebrates. The ramifications of this paralog loss in therians are not discussed in the text, although RIPK2 is also involved in NF-kB activation. In addition, the RIPK2B gene loss pattern is relegated from Figure 1C to Supplementary Figure 4, despite being of comparable interest to the reader.

      We are also intrigued by the RIPK2/RIPK2B data and felt it important to include our findings here; however, we do not have functional data for RIPK2B at this point and feel it is better suited for a separate study. We therefore focused both the title and the main figures on RIPK3, for which we have functional data.

      (4) The authors present evidence for (repeated) positive selection in both RIPK1 and RIPK3 in bats; however, neither bat RIPK1/3 orthologs nor bat-specific RHIM tetrad variants (IQFG, IQLG) are considered in the experimental part of the work.

      We included a tetrad variant (VQFG) that is found in bats and multiple other species. We wanted to test a wide range of variant amino acids, so testing both IQFG (found only in bats) and VQFG (found in bats and multiple other diverse species) was not of high importance.

      (5) The authors present gene presence/absence patterns for zebra mussels as an outgroup of vertebrate species analyzed. From the evolutionary perspective, adding results for a closer invertebrate group, such as lancelets, tunicates, or echinoderms, would be beneficial for reconstructing the evolutionary progression of RIPK-mediated immune functions in animals.

      In our initial analyses, we searched for RIPK-like proteins in cnidarians, arthropods, nematodes, amoeba, and spiralia, with only spiralia species containing proteins with substantial homology to vertebrate RIPK1 proteins, as defined by a homologous N-terminal kinase domain and C-terminal RHIM and death domain. We have expanded this analysis to include lancelets, tunicates, and echinoderms and found several lancelet species with RIPK1-like proteins. These data have been added to the manuscript.

      (6) In the broader sense, the list of non-mammalian species included in the study is not explained or substantiated in the text. What was the rationale behind selecting lizards, turtles, and lampreys for experimental assays? Why was turtle RIPK3 but not turtle RIPK1CT protein used for functional tests? Which results do the authors expect to observe if amphibian or teleost RIPK1/3 are included in the analysis, especially those with divergent tetrad variants?

      We have added additional text to define our rationale for selecting which species were tested. 

      (7) For lamprey RIPK3, the observed NF-kB activity levels still remain lower than those of mammalian and reptilian orthologs even after catalytic tetrad modification. In the same way, switching human RIPK3 catalytic tetrad to that of lamprey does not result in NF-kB activation. What are the potential reasons for the observed difference? Does it mean that lamprey's RIPK3 functions in NF-kB activation are, at least partially, delegated to RIPK1?

      The function of lamprey RIPK3 is intriguing, albeit unknown. The reduced activation in human cells may be due to an incompatibility between lamprey RIPK3 and human NF-kB machinery, or it may not function in NF-kB at all. Considering that lamprey do not have other components of the known mammalian necroptosis pathway, it is unclear what function RIPK3 would serve in these species. It is possible lamprey may have a necroptosis pathway that is RIPK3-dependent but distinct from the mammalian pathway. It is an interesting question for future study. 

      (8) In lines 386-388, the authors state that 'only non-mammalian RIPK1CT proteins required the RHIM for maximal NF-kB activation', which is corroborated by results in Figure 4B. The authors further associate this finding with a lack of ZBP1 in the respective species (lines 388-389). However, non-squamate reptiles seem to retain ZBP1, as suggested by Supplementary Table 1. Given that, do the authors expect to observe RHIM-independent (maximal) NF-kB activation in turtles and crocodilians or respective RIPK1CT-transfected cells?

      While turtles and crocodiles do retain ZBP1, it is still unclear if they are able to activate ZBP1/RIPK3/MLKL-dependent necroptosis similar to mammals, especially given the divergence in the turtle ZBP1 RHIMs seen in Figure 4C. Future studies will be needed to further test our hypotheses and to continue to characterize innate immune function and evolution across a range of vertebrate species. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) The title is somewhat restrictive, as it only mentions RIPK3, despite the manuscript covering a broader range of RIPKs and associated proteins.

      We agree that a title that encompasses both the breadth of our study and the depth with which we analyzed RIPK3 would be ideal. However, we were unable to come up with a succinct title that conveyed both points appropriately, so opted for one that focused on our RIPK3 insights.

      (2) Several supplementary figures contain valuable information that could be incorporated into the main figures for greater clarity and emphasis.

      We agree that many interesting pieces of data are in the supplement. We felt it was important to include those data in the manuscript, but also wanted to keep the main manuscript figures as focused as possible.  

      Reviewer #2 (Recommendations for the authors):

      (1) I do not fully agree with the claim that caspase-8 is absent from fish. I briefly repeated this part of the analysis and found several fish proteins that cluster with caspase-8 rather than caspase-10 or cFLIP. From the method section, it does not really become clear how the Casp8/Casp10/cFLIP decision was made, and particularly, how cases were addressed where genes predate the caspase-8/caspase-10 split. To name just a few examples, the authors might check uniprot:A0A444UA91, W5MXS4, or A0A8X8BKJ8 for being fish Caspase-8 candidates.

      We thank the reviewer for their critical analysis. CASP8 and CASP10 are very similar proteins in humans. We are distinguishing between the two based on vertebrate phylogeny with outgroup proteins (CASP2, CASP9, and CFLAR, see tree in Author response image 1 below) to help define the CASP8/CASP10 clade. Once we isolate CASP8/10, we build an additional tree to distinguish CASP8 and CASP10. Using this method, all fish CASP8/10-like proteins cluster with the mammalian CASP10 clade rather than the CASP8 clade, despite many fish proteins being annotated as CASP8 or CASP8-like. We do acknowledge that, because of the similarities between CASP8 and CASP10, there are likely proteins that can fall in either clade depending on which outgroups are included. To this end, we have updated our gene loss figure to only denote whether a species has no CASP8/10, a single CASP8/10 protein, or both CASP8 and CASP10. We have also updated our methods to better define how we completed our analyses. 

      Author response image 1.

      (2) While analyzing which RIPK3 protein causes cell death (lines 188ff), the underlying assumption is that the heterologous RIPK3 proteins can interact with human MLKL and activate it by phosphorylation. No attempts are being made to check if MLKL actually gets phosphorylated, and this issue is also not discussed. In Figure 2C, cell death is either measured by RIPK3 overexpression alone or by the additional overexpression of ZBP1 and MLKL. However, it is not shown if in all cases all the transfected proteins are expressed at a comparable level, or if the observed cell death might be caused by MLKL/ZBP1 overexpression alone.

      Cell death is dependent on expression of ZBP1, MLKL, and RIPK3, as shown in Supplementary Figure 6. We have attempted to detect phospho-MLKL via western blot. However, in these overexpression assays, we are able to detect phospho-MLKL in the presence of RIPK3 and MLKL alone, independent of activation of cell death. In fact, we see reduced phospho-MLKL and reduced expression of MLKL overall when ZBP1, MLKL, and RIPK3 are added, presumably due to cell death induced in these conditions (see blot in Author response image 2 below). We therefore felt these data were of limited use here.

      Author response image 2.

      (3) The manuscript describes a well-documented bioinformatical analysis and acknowledges the body of earlier published work on necroptosis evolution and associated gene losses. However, when discussing the RHIM-related aspects, the authors do not mention previous publications on RHIM conservation in invertebrates and even fungal proteins such as Het-S. They also fail to mention/discuss the amyloid-forming properties of RHIMs, which I consider crucial for understanding the function of RHIM-containing proteins.

      We thank the reviewer for their insight. We have added additional points on both RHIM conservation and amyloid formation.

      (4) Related to the above issue: In lines 226ff, the induction of NFkB by RIPK3 overexpression is described. While RIPK3 from other mammals requires endogenous (human) RIPK1 to be present, lizard and turtle RIPK3 do not require human RIPK1 but *do* require functional RHIMs. It is not checked (or at least discussed) if RHIM amyloid formation is required, nor if the RHIM of the heterologous RIPK3 might act through interaction with endogenous (human) RIPK3.

      We and others (PMID: 29073079) did not detect RIPK3 protein in HEK293T cells. This, combined with the requirement for exogenous RIPK3 to activate cell death, indicates that endogenous RIPK3 is not contributing to these assays.

      (5) In lines 275ff, the authors observe that RIPK1s from other mammalian species do not require the RHIM for NFkB activation, while RIPK1 from non-mammalian species do require the RHIM. I wonder why the (in my opinion) most obvious explanation is not addressed: Maybe the mammalian RIPK1 proteins are similar enough to the human one so that they can signal on their own, while the more distant RIPK1 cannot and thus require human RIPK1 (associated via RHIMs) for NFkB activation? Since the authors used RIPK1-deficient cells in previous experiments, wouldn't it make sense to test them here, too?

      It is intriguing that the more diverged RIPK1 species require the RHIM for NF-kB signaling. In Supplementary Figure 12, we do test the mammalian and non-mammalian proteins in RIPK1 KO cells and all proteins are able to activate NF-kB. So while non-mammalian RIPK1 signaling is dependent on the RHIM, it is independent of endogenous RIPK1.

      Minor comments:

      (1) In the legend of Figure 1, there is a typo "heat amp".

      This typo has now been corrected.

      (2) In Figure 3A, the term "FUBAR" is not explained at all.

      FUBAR has now been defined in the methods section.

      Reviewer #3 (Recommendations for the authors):

      A few typos and graph inconsistencies have been encountered in the course of the manuscript, e.g.:

      (1) Line 168: 'heat amp' -> 'heat map'.

      (2) Lines 290-291: 'known mediate' -> 'known to mediate' (?)

      We thank the reviewer for catching these mistakes. They have been corrected. 

      (3) Supplementary Figure 12: Are human RIPK1 results presented in both 'mammalian' and 'non-mammalian' parts of the figure? If so, why do human data differ between the graphs?

      Mammalian and non-mammalian data were collected in separate experiments with human RIPK1 used as a control for both. The human data shown in the two graphs represent two separate experiments.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have provided a mechanism by which the presence of truncated P53 can inactivate the function of full-length P53 protein. The authors propose that this happens by sequestration of full-length P53 by truncated P53.

      In the study, performed experiments are well described.

      My area of expertise is molecular biology/gene expression, and I have tried to provide suggestions on my area of expertise. The study has been done mainly with overexpression system and I have included few comments which I can think can be helpful to understand effect of truncated P53 on endogenous wild type full length protein. Performing experiments on these lines will add value to the observation according to this reviewer.

      Major comments:

      (1) What happens to endogenous wild type full length P53 in the context of mutant/truncated isoforms, that is not clear. Using a P53 antibody which can detect endogenous wild type P53, can authors check if endogenous full length P53 protein is also aggregated as well? It is hard to differentiate if aggregation of full length P53 happens only in overexpression scenario, where lot more both of such proteins are expressed. In normal physiological condition P53 expression is usually low, tightly controlled and its expression get induced in altered cellular condition such as during DNA damage. So, it is important to understand the physiological relevance of such aggregation, which could be possible if authors could investigate effect on endogenous full length P53 following overexpression of mutant isoforms.

      Thank you very much for your insightful comments.

      (1) To address “what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms,” we employed the human A549 cell line, which expresses endogenous wild-type p53, under DNA damage conditions induced by etoposide treatment(1). We chose the A549 cell line because, like H1299, it is a lung cancer cell line (www.atcc.org). For comparison, we also transfected the cells with 2 μg of V5-tagged plasmids encoding FLp53 and its isoforms Δ133p53 and Δ160p53. As shown in Author response image 1A, lanes 1 and 2, endogenous p53 remained undetectable in A549 cells despite etoposide treatment, which limits our ability to assess the effects of the isoforms on endogenous wild-type FLp53. We could, however, detect the V5-tagged FLp53 expressed from the plasmid using the anti-V5 (rabbit) antibody as well as the anti-DO-1 (mouse) antibody (Author response image 1). The latter detects both endogenous wild-type p53 and the V5-tagged FLp53 since the antibody epitope lies within the N-terminus (aa 20-25). This result supports the reviewer’s comment regarding the low level of expression of endogenous p53, which is insufficient for detection in our experiments.

      In summary, in line with the reviewer’s comment that ‘under normal physiological conditions p53 expression is usually low,’ we could not detect p53 with an anti-DO-1 antibody. Thus, we proceeded with V5/FLAG-tagged p53 for detection of the effects of the isoforms on p53 stability and function. We also found that protein expression in H1299 cells was more easily detectable than in A549 cells (Compare Author response image 1A and B). Thus, we decided to continue with the H1299 cells (p53-null), which would serve as a more suitable model system for this study.  

      (2) We agree with the reviewer that ‘It is hard to differentiate if aggregation of full-length p53 happens only in overexpression scenario’. However, it is plausible that such aggregation of FLp53 happens under conditions in which p53 and its isoforms are overexpressed in the cell. Although the exact physiological context is not known and is beyond the scope of the current work, our results indicate that at higher expression levels, p53 isoforms drive aggregation of FLp53. Given the challenges of detecting endogenous FLp53, we had to rely on the results obtained with plasmid-mediated expression of p53 and its isoforms in p53-null cells.

      Author response image 1.

      Comparative analysis of protein expression in A549 and H1299 cells. (A) A549 cells (p53 wild-type) were treated with etoposide to induce endogenous wild-type p53 expression. To assess the effects of FLp53 and its isoforms Δ133p53 and Δ160p53 on endogenous wild-type p53 aggregation, A549 cells were transfected with 2 μg of V5-tagged p53 expression plasmids, with or without etoposide treatment (20 μM for 8 h). Western blot analysis was done with the anti-V5 (rabbit) antibody to detect V5-tagged proteins and the anti-DO-1 (mouse) antibody, which detects both endogenous wild-type p53 and V5-tagged FLp53. The merged image corresponds to the overlay of the V5 and DO-1 antibody signals. (B) H1299 cells (p53-null) were transfected with 2 μg of V5-tagged p53 expression plasmids or the empty vector control pcDNA3.1. Western blot analysis was done with the anti-V5 (mouse) antibody.

      (2) Can the presence of mutant P53 isoforms cause functional impairment of wild-type full-length endogenous P53? That could be tested as well using a ChIP assay similar to the one the authors performed, but instead of an antibody against the tagged protein, the authors could check endogenous P53 enrichment at a gene promoter such as P21 following overexpression of mutant isoforms. Introducing a condition such as DNA damage in such an experiment might help, since endogenous P53 is then induced and more prone to bind P53 targets such as P21.

      Thank you very much for your valuable comments and suggestions. To investigate the potential functional impairment of endogenous wild-type p53 by p53 isoforms, we initially utilized A549 cells (p53 wild-type), aiming to monitor endogenous wild-type p53 expression following DNA damage. However, as mentioned and demonstrated in Author response image 1, endogenous p53 expression was too low to be detected under these conditions, making the ChIP assay for analyzing endogenous p53 activity unfeasible. Thus, we decided to utilize plasmid-based expression of FLp53 and focus on the potential functional impairment induced by the isoforms.

      (3) On similar lines, authors described:

      "To test this hypothesis, we escalated the ratio of FLp53 to isoforms to 1:10. As expected, the activity of all four promoters decreased significantly at this ratio (Figure 4A-D). Notably, Δ160p53 showed a more potent inhibitory effect than Δ133p53 at the 1:5 ratio on all promoters except for the p21 promoter, where their impacts were similar (Figure 4E-H). However, at the 1:10 ratio, Δ133p53 and Δ160p53 had similar effects on all transactivation except for the MDM2 promoter (Figure 4E-H)."

      Again, in such assays the authors used ratios of 1:5 to 1:10 of full-length vs. mutant. How do the authors justify this result in the (more relevant) context where one allele is wild type (functional P53) and the other allele is mutated (truncated, can induce aggregation)? In this case one would expect a 1:1 ratio of full-length vs. mutant protein, unless other regulation is going on that induces expression of mutant isoforms more than wild-type full-length protein. Discussing along these lines might provide more physiological relevance to the observed data.

      Thank you for raising this point regarding the physiological relevance of the ratios used in our study.

      (1) In the revised manuscript (lines 193-195), we added in this direction that “The elevated Δ133p53 protein modulates p53 target genes such as miR‑34a and p21, facilitating cancer development(2, 3). To mimic conditions where isoforms are upregulated relative to FLp53, we increased the ratios to 1:5 and 1:10.” This approach aims to simulate scenarios where isoforms accumulate at higher levels than FLp53, which may be relevant in specific contexts, as also elaborated above.

      (2) Regarding the scenario in which one allele is wild-type and the other encodes an isoform, this assumption is not valid in most contexts. First, human cells have two copies of the TP53 gene (one from each parent). Second, the TP53 gene has two distinct promoters: the proximal promoter (P1) primarily regulates FLp53 and ∆40p53, whereas the second promoter (P2) regulates ∆133p53 and ∆160p53(4, 5). Additionally, ∆133TP53 is a p53 target gene(6, 7) and the expression of Δ133p53 and FLp53 is dynamic in response to various stimuli. Third, the expression of p53 isoforms is regulated at multiple levels, including transcriptional, post-transcriptional, translational, and post-translational processing(8). Moreover, different degradation mechanisms modify the protein levels of p53 isoforms and FLp53(8). These regulatory mechanisms respond to various stimuli, and therefore, the 1:1 ratio of FLp53 to ∆133p53 or ∆160p53 may be valid only under certain physiological conditions. In line with this, varied expression levels of FLp53 and its isoforms, including ∆133p53 and ∆160p53, have been reported in several studies(3, 4, 9, 10).

      (3) In our study, using the pcDNA 3.1 vector under the human cytomegalovirus (CMV) promoter, we observed moderately higher expression levels of ∆133p53 and ∆160p53 relative to FLp53 (Author response image 1B). This overexpression scenario provides a model for studying conditions where isoform accumulation might surpass physiological levels, impacting FLp53 function. By employing elevated ratios of these isoforms to FLp53, we aim to investigate the potential effects of isoform accumulation on FLp53.

      (4) Finally does this altered function of full length P53 (preferably endogenous one) in presence of truncated P53 has any phenotypic consequence on the cells (if authors choose a cell type which is having wild type functional P53). Doing assay such as apoptosis/cell cycle could help us to get this visualization.

      Thank you for your insightful comments. In the experiment with A549 cells (p53 wild-type), endogenous p53 levels were too low to be detected, even after DNA damage induction. As mentioned above, this hinders evaluation of the function of endogenous p53 in the presence of the isoforms. In the revised manuscript, we utilized H1299 cells with overexpressed proteins for apoptosis studies using the Caspase-Glo® 3/7 assay (Figure 7). This is shown in the Results section (lines 254-269): “The Δ133p53 and Δ160p53 proteins block pro-apoptotic function of FLp53.

      One of the physiological read-outs of FLp53 is its ability to induce apoptotic cell death(11). To investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on FLp53-induced apoptosis, we measured caspase-3 and -7 activities in H1299 cells expressing different p53 isoforms (Figure 7). Caspase activation is a key biochemical event in apoptosis, with the activation of effector caspases (caspase-3 and -7) ultimately leading to apoptosis(12). The caspase-3 and -7 activities induced by FLp53 expression were approximately 2.5 times higher than those of the control vector (Figure 7). Co-expression of FLp53 and the isoforms Δ133p53 or Δ160p53 at a ratio of 1:5 significantly diminished the apoptotic activity of FLp53 (Figure 7). This result aligns well with our reporter gene assay, which demonstrated that elevated expression of Δ133p53 and Δ160p53 impaired the expression of apoptosis-inducing genes BAX and PUMA (Figure 4G and H). Moreover, a reduction in the apoptotic activity of FLp53 was observed irrespective of whether Δ133p53 or Δ160p53 protein was expressed with or without a FLAG tag (Figure 7). This result, therefore, also suggests that the FLAG tag does not affect the apoptotic activity or other physiological functions of FLp53 and its isoforms. Overall, the overexpression of p53 isoforms Δ133p53 and Δ160p53 significantly attenuates FLp53-induced apoptosis, independent of the protein tagging with the FLAG antibody epitope.”

      Referees cross-commenting

      I think the comments from the other reviewers are very much reasonable and logical.

      Especially all 3 reviewers have indicated, a better way to visualize the aggregation of full-length wild type P53 by truncated P53 (such as looking at endogenous P53# by reviewer 1, having fluorescent tag #by reviewer 2 and reviewer 3 raised concern on the FLAG tag) would add more value to the observation.

      Thank you for these comments. The endogenous p53 protein was undetectable in A549 cells induced by etoposide (Author response image 1A). Therefore, we conducted experiments using FLAG/V5-tagged FLp53. To avoid any potential side effects of the FLAG tag on p53 aggregation, we introduced untagged p53 isoforms in the H1299 cells and performed subcellular fractionation. Our revised results, consistent with the previous findings with FLAG-tagged p53 isoforms, demonstrate that co-expression of untagged isoforms with FLAG-tagged FLp53 significantly induced the aggregation of FLAG-FLp53, while no aggregation was observed when FLAG-tagged FLp53 was expressed alone (Supplementary Figure 6). These results clearly indicate that the FLAG tag itself does not contribute to protein aggregation.

      Additionally, we utilized the A11 antibody to detect protein aggregation, providing additional validation (Figure 8 from Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137). Given that the fluorescent proteins (~30 kDa) are substantially bigger than the tags used here (~1 kDa) and may influence oligomerization (especially GFP), stability, localization, and function of p53 and its isoforms, we avoided conducting these vital experiments with such artificial large fusions. 

      Reviewer #1 (Significance):

      The work is significant, since it provides more mechanistic insight into how wild-type full-length P53 could be inactivated in the presence of truncated isoforms; this might offer new opportunities to recover P53 function as a treatment strategy against cancer.

      Thank you for your insightful comments. We appreciate your recognition of the significance of our work in providing mechanistic insights into how wild-type FLp53 can be inactivated by truncated isoforms. We agree that these findings have potential for exploring new strategies to restore p53 function as a therapeutic approach against cancer. 

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      This study is innovative, well-executed, and supported by thorough data analysis. However, the authors should address the following points:

      (1) Introduction on Aggregation and Co-aggregation: Given that the focus of the study is on the aggregation and co-aggregation of the isoforms, the introduction should include a dedicated paragraph discussing this issue. There are several original research articles and reviews that could be cited to provide context.

      Thank you very much for the valuable comments. We have added the following paragraph in the revised manuscript (lines 74-82): “Protein aggregation has become a central focus of modern biology research and has documented implications in various diseases, including cancer(13, 14, 15). Protein aggregates can be of different types ranging from amorphous aggregates to highly structured amyloid or fibrillar aggregates, each with different physiological implications. In the case of p53, whether protein aggregation, and in particular, co-aggregation with large N-terminal deletion isoforms, plays a mechanistic role in its inactivation is yet underexplored. Interestingly, the Δ133p53β isoform has been shown to aggregate in several human cancer cell lines(16). Additionally, the Δ40p53α isoform exhibits a high aggregation tendency in endometrial cancer cells(17). Although no direct evidence exists for Δ160p53 yet, these findings imply that p53 isoform aggregation may play a major role in their mechanisms of actions.”

      (2) Antibody Use for Aggregation: To strengthen the evidence for aggregation, the authors should consider using antibodies that specifically bind to aggregates.

      Thank you for your insightful suggestion. We addressed protein aggregation using the A11 antibody, which specifically recognizes amyloid-like protein aggregates. We analyzed insoluble nuclear pellet samples prepared under the same conditions as described in Figure 6B. To confirm the presence of p53 proteins, we employed the anti-p53 M19 antibody (Santa Cruz, Cat No. sc-1312) to detect bands corresponding to FLp53 and its isoforms Δ133p53 and Δ160p53. The monomer FLp53 was not detected (Figure 8, lower panel, Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137), which may be attributed to the lower binding affinity of the anti-p53 M19 antibody to it. These samples were also immunoprecipitated using the A11 antibody (Thermo Fisher Scientific, Cat No. AHB0052) to detect aggregated proteins. Interestingly, FLp53 and its isoforms, Δ133p53 and Δ160p53, were clearly visible with the A11 antibody when co-expressed at a 1:5 ratio, suggesting that they underwent co-aggregation. However, no FLp53 aggregates were observed when it was expressed alone (Author response image 2). These results support the conclusion in our manuscript that Δ133p53 and Δ160p53 drive FLp53 aggregation.

      Author response image 2.

      Induction of FLp53 aggregation by p53 isoforms Δ133p53 and Δ160p53. H1299 cells were transfected with FLAG-tagged FLp53 and V5-tagged Δ133p53 or Δ160p53 at a 1:5 ratio. The cells were subjected to subcellular fractionation, and the resulting insoluble nuclear pellet was resuspended in RIPA buffer. The samples were heated at 95°C until the pellet was completely dissolved, and then analyzed by Western blotting. Immunoprecipitation was performed using the A11 antibody, which specifically recognizes amyloid protein aggregates, and the anti-p53 M19 antibody, which detects FLp53 as well as its isoforms Δ133p53 and Δ160p53.

      (3) Fluorescence Microscopy: Live-cell fluorescence microscopy could be employed to enhance visualization by labeling FLp53 and the isoforms with different fluorescent markers (e.g., EGFP and mCherry tags).

      We appreciate the suggestion to use live-cell fluorescence microscopy with EGFP and mCherry tags for the visualization of FLp53 and its isoforms. While we understand the advantages of live-cell imaging with EGFP/mCherry tags, we refrained from making such fusions because GFP and related protein tags are large (~30 kDa) relative to the p53 isoform variants (~30 kDa). Other studies have shown that EGFP and mCherry fusions can alter protein oligomerization, solubility and aggregation(18, 19). Moreover, most fluorescent proteins are prone to dimerization (e.g., EGFP) or form obligate tetramers (DsRed)(20, 21, 22), potentially interfering with the oligomerization and aggregation properties of p53 isoforms, particularly Δ133p53 and Δ160p53.

      Instead, we utilized FLAG- or V5-tag-based immunofluorescence microscopy, a well-established and widely accepted method for visualizing p53 proteins. This method provided precise localization and reliable quantitative data, and we believe it is both appropriate and sufficient for addressing the research question of the current study.

      Reviewer #2 (Significance):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      We sincerely thank the reviewer for the thoughtful and positive comments on our manuscript and for highlighting the significance of our findings on the p53 isoforms, Δ133p53 and Δ160p53. 

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript entitled "Δ133p53 and Δ160p53 isoforms of the tumor suppressor protein p53 exert dominant-negative effect primarily by coaggregation", the authors suggest that the Δ133p53 and Δ160p53 isoforms have high aggregation propensity and that by co-aggregating with canonical p53 (FLp53), they sequestrate it away from DNA thus exerting a dominant-negative effect over it.

      First, the authors should make it clear throughout the manuscript, including the title, that they are investigating Δ133p53α and Δ160p53α since there are 3 Δ133p53 isoforms (α, β, γ), and 3 Δ160p53 isoforms (α, β, γ).

      Thank you for your suggestion. We understand the importance of clearly specifying the isoforms under study. Following your suggestion, we have added α in the title, abstract, and introduction and added the following statement in the Introduction (lines 57-59): “For convenience and simplicity, we have written Δ133p53 and Δ160p53 to represent the α isoforms (Δ133p53α and Δ160p53α) throughout this manuscript.” 

      One concern is that the authors only consider and explore Δ133p53α and Δ160p53α isoforms as exclusively oncogenic and FLp53 dominant-negative while not discussing evidence of different activities. Indeed, other manuscripts have also shown that Δ133p53α is non-oncogenic and non-mutagenic, does not antagonize every single FLp53 function, and is sometimes associated with good prognosis. To cite a few examples:

      (1) Hofstetter G. et al. D133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. Br. J. Cancer 2011, 105, 1593-1599.

      (2) Bischof, K. et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Sci. Rep. 2019, 9, 5244.

      (3) Knezović F. et al. The role of p53 isoforms' expression and p53 mutation status in renal cell cancer prognosis. Urol. Oncol. 2019, 37, 578.e1-578.e10.

      (4) Gong, L. et al. p53 isoform D113p53/D133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell Res. 2015, 25, 351-369.

      (5) Gong, L. et al. p53 isoform D133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Sci. Rep. 2016, 6, 37281.

      (6) Horikawa, I. et al. D133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell Death Differ. 2017, 24, 1017-1028.

      (7) Gong, L. p53 coordinates with D133p53 isoform to promote cell survival under low-level oxidative stress. J. Mol. Cell Biol. 2016, 8, 88-90.

      Thank you very much for your comment and for highlighting these important studies. 

      We agree that Δ133p53 isoforms exhibit complex biological functions, with both oncogenic and non-oncogenic potentials. However, our mission here was primarily to reveal the molecular mechanism for the dominant-negative effects exerted by the Δ133p53α and Δ160p53α isoforms on FLp53 for which the Δ133p53α and Δ160p53α isoforms are suitable model systems. Exploring the oncogenic potential of the isoforms is beyond the scope of the current study and we have not claimed anywhere that we are reporting that. We have carefully revised the manuscript and replaced the respective terms e.g. ‘prooncogenic activity’ with ‘dominant-negative effect’ in relevant places (e.g. line 90). We have now also added a paragraph with suitable references that introduces the oncogenic and non-oncogenic roles of the p53 isoforms.

      After reviewing the papers you cited, we are not sure that they establish an oncogenic or non-oncogenic role of the Δ133p53α isoform in different cancer cases. Although our study is not about the oncogenic potential of the isoforms, we have summarized the key findings below:

      (1) Hofstetter et al., 2011: Demonstrated that Δ133p53α expression improved recurrence-free and overall survival in p53-mutant advanced serous ovarian cancer, suggesting a potential protective role in this context.

      (2) Bischof et al., 2019: Found that Δ133p53 mRNA can improve overall survival in high-grade serous ovarian cancers. However, out of 31 patients, only 5 belong to the TP53 wild-type group, while the others carry TP53 mutations.

      (3) Knezović et al., 2019: Reported downregulation of Δ133p53 in renal cell carcinoma tissues with wild-type p53 compared to normal adjacent tissue, indicating a potential non-oncogenic role, but not conclusively demonstrating it.

      (4) Gong et al., 2015: Showed that Δ133p53 antagonizes p53-mediated apoptosis and promotes DNA double-strand break repair by upregulating RAD51, LIG4, and RAD52 independently of FLp53.

      (5) Gong et al., 2016: Demonstrated that overexpression of Δ133p53 promotes the efficiency of cell reprogramming through its anti-apoptotic function and by promoting DNA DSB repair. The authors hypothesize that this mechanism involves increased RAD51 foci formation and decreased γH2AX foci formation and chromosome aberrations in induced pluripotent stem (iPS) cells, independent of FLp53.

      (6) Horikawa et al., 2017: Indicated that induced pluripotent stem cells derived from fibroblasts that overexpress Δ133p53 formed noncancerous tumors in mice compared to induced pluripotent stem cells derived from fibroblasts with complete p53 inhibition. Thus, Δ133p53 overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but it still compromises certain p53-mediated tumor-suppressing pathways. “Overexpressed Δ133p53 prevented FL-p53 from binding to the regulatory regions of p21WAF1 and miR-34a promoters, providing a mechanistic basis for its dominant-negative inhibition of a subset of p53 target genes.”

      (7) Gong, 2016: Suggested that Δ133p53 promotes cell survival under low-level oxidative stress, but its role under different stress conditions remains uncertain.

      We have revised the Introduction to provide a more balanced discussion of Δ133p53’s dual role (lines 62-73):

      “The Δ133p53 isoform exhibits complex biological functions, with both oncogenic and non-oncogenic potentials. Recent studies demonstrate the non-oncogenic yet context-dependent role of the Δ133p53 isoform in cancer development. Δ133p53 expression has been reported to correlate with improved survival in patients with TP53 mutations(23, 24), where it promotes cell survival in a non-oncogenic manner(25, 26), especially under low oxidative stress(27). Conversely, other recent evidence emphasizes the notable oncogenic functions of Δ133p53, as it can inhibit p53-dependent apoptosis by directly interacting with FLp53(4, 6). The oncogenic function of the newly identified Δ160p53 isoform is less known, although it is associated with p53 mutation-driven tumorigenesis(28) and with melanoma cell aggressiveness(10). Whether or not the Δ160p53 isoform also impedes FLp53 function in a similar way as Δ133p53 is an open question. However, these p53 isoforms can certainly compromise p53-mediated tumor suppression by interfering with FLp53 binding to target genes such as p21 and miR-34a(2, 29) through a dominant-negative effect, although the exact mechanism is not known.”

      On the figures presented in this manuscript, I have three major concerns:

      (1) Most results in the manuscript rely on the overexpression of the FLAG-tagged or V5-tagged isoforms. The validation of these constructs entirely depends on Supplementary figure 3, which the authors claim "rules out the possibility that the FLAG epitope might contribute to this aggregation." However, I am not entirely convinced by that conclusion. Indeed, the ratio between the "regular" isoform and the aggregates is much higher in the FLAG-tagged constructs than in the V5-tagged constructs. We can visualize the aggregates easily in the FLAG-tagged experiment, but the imaging clearly had to be overexposed (given the white coloring demonstrating saturation of the main bands) to visualize them in the V5-tagged experiments. Therefore, I am not convinced that an effect of the FLAG-tag can be ruled out and more convincing data should be added.

      Thank you for raising this important concern. We have carefully considered your comments and have made several revisions to clarify and strengthen our conclusions.

      First, to address the potential influence of the FLAG and V5 tags on p53 isoform aggregation, we have revised Figure 2 and removed the previous Supplementary Figure 3, where non-specific antibody bindings and higher molecular weight aggregates were not clearly interpretable. In the revised Figure 2, we have removed these potential aggregates, improving the clarity and accuracy of the data.

      To further rule out any tag-related artifacts, we conducted a co-immunoprecipitation assay with FLAG-tagged FLp53 and untagged Δ133p53 and Δ160p53 isoforms. The results (now shown in the new Supplementary Figure 3) fully agree with our previous results with FLAG-tagged and V5-tagged Δ133p53 and Δ160p53 isoforms and show interaction between the partners. This indicates that the FLAG and V5 tags do not interfere with the interaction between FLp53 and the isoforms. We have still used FLAG-tagged FLp53 because endogenous p53 was undetectable and FLAG-tagged FLp53 did not aggregate alone.

      In the revised paper, we added the following sentences (Lines 146-152): “To rule out the possibility that the observed interactions between FLp53 and its isoforms Δ133p53 and Δ160p53 were artifacts caused by the FLAG and V5 antibody epitope tags, we co-expressed FLAG-tagged FLp53 with untagged Δ133p53 and Δ160p53. Immunoprecipitation assays demonstrated that FLAG-tagged FLp53 could indeed interact with the untagged Δ133p53 and Δ160p53 isoforms (Supplementary Figure 3, lanes 3 and 4), confirming formation of hetero-oligomers between FLp53 and its isoforms. These findings demonstrate that Δ133p53 and Δ160p53 can oligomerize with FLp53 and with each other.”

      Additionally, we performed subcellular fractionation experiments to compare the aggregation and localization of FLAG-tagged FLp53 when co-expressed either with V5-tagged or untagged Δ133p53/Δ160p53. In these experiments, the untagged isoforms also induced FLp53 aggregation, mirroring our previous results with the tagged isoforms (Supplementary Figure 5). We’ve added this result in the revised manuscript (lines 236-245): “To exclude the possibility that FLAG or V5 tags contribute to protein aggregation, we also conducted subcellular fractionation of H1299 cells expressing FLAG-tagged FLp53 along with untagged Δ133p53 or Δ160p53 at a 1:5 ratio. The results showed (Supplementary Figure 6) a similar distribution of FLp53 across cytoplasmic, nuclear, and insoluble nuclear fractions as in the case of tagged Δ133p53 or Δ160p53 (Figure 6A to D). Notably, the aggregation of untagged Δ133p53 or Δ160p53 markedly promoted the aggregation of FLAG-tagged FLp53 (Supplementary Figure 6B and D), demonstrating that the antibody epitope tags themselves do not contribute to protein aggregation.” 

      We’ve also discussed this in the Discussion section (lines 349-356): “In our study, we primarily utilized an overexpression strategy involving FLAG/V5-tagged proteins to investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on the function of FLp53. To address concerns regarding potential overexpression artifacts, we performed the co-immunoprecipitation (Supplementary Figure 6) and caspase-3 and -7 activity (Figure 7) experiments with untagged Δ133p53 and Δ160p53. In both experimental systems, the untagged proteins behaved very similarly to the FLAG/V5 antibody epitope-containing proteins (Figures 6 and 7 and Supplementary Figure 6). Hence, the C-terminal tagging of FLp53 or its isoforms does not alter the biochemical and physiological functions of these proteins.”

      In summary, the revised data set and newly added experiments provide strong evidence that neither the FLAG nor the V5 tag contributes to the observed p53 isoform aggregation.

      (2) The authors demonstrate that to visualize the dominant-negative effect, Δ133p53α and Δ160p53α must be "present in a higher proportion than FLp53 in the tetramer" and that they need at least a transfection ratio of 1:5, since the 1:1 ratio shows no effect. However, in almost every single cell type, FLp53 is far more expressed than the isoforms, which makes it very unlikely to reach such stoichiometry in physiological conditions and makes me wonder if this mechanism naturally occurs at the endogenous level. This limitation should be at least discussed.

      Thank you for your insightful comment. However, evidence suggests that the expression levels of these isoforms, such as Δ133p53, can be significantly elevated relative to FLp53 in certain physiological conditions(3, 4, 9). For example, in some breast tumors, Δ133p53 mRNA is expressed at much higher levels than FLp53, suggesting a distinct expression profile of p53 isoforms compared to normal breast tissue(4). Similarly, in non-small cell lung cancer and the A549 lung cancer cell line, the expression level of the Δ133p53 transcript is significantly elevated compared to non-cancerous cells(3). Moreover, in specific cholangiocarcinoma cell lines, the Δ133p53/TAp53 expression ratio has been reported to increase to as high as 3:1(9). These observations indicate that the dominant-negative effect of isoform Δ133p53 on FLp53 can occur under certain pathological conditions in which the relative amounts of FLp53 and the isoforms vary widely. Since data on the Δ160p53 isoform are scarce, we infer that the long N-terminal truncated isoforms may share a similar mechanism.

      (3) Figure 5C: I am concerned by the subcellular location of the Δ133p53α and Δ160p53α as they are commonly considered nuclear and not cytoplasmic as shown here, particularly since they retain the 3 nuclear localization sequences like the FLp53 (Bourdon JC et al. 2005; Mondal A et al. 2018; Horikawa I et al, 2017; Joruiz S. et al, 2024). However, Δ133p53α can form cytoplasmic speckles (Horikawa I et al, 2017) when it colocalizes with autophagy markers for its degradation.

      The authors should discuss this issue. Could this discrepancy be due to the high overexpression level of these isoforms? A co-staining with autophagy markers (p62, LC3B) would rule out (or confirm) activation of autophagy due to the overwhelming expression of the isoform.

      Thank you for your thoughtful comments. We have thoroughly reviewed all the papers you recommended (Bourdon JC et al., 2005; Mondal A et al., 2018; Horikawa I et al., 2017; Joruiz S. et al., 2024)(4, 29, 30, 31). Among these, only the study by Bourdon JC et al. (2005) provided data regarding the localization of Δ133p53(4). Interestingly, their findings align with our observations: Figure 8 of Bourdon et al. (Genes Dev. 2005;19:2122-2137) indicates that the protein does not exhibit predominantly nuclear localization. The discrepancy may be caused by a potentially confusing statement in that paper(4).

      The localization of p53 is governed by multiple factors, including its nuclear import and export(32). The isoforms Δ133p53 and Δ160p53 contain three nuclear localization sequences (NLS)(4). However, in our experiments these isoforms may have been trapped in the cytoplasm by aggregation, which could mask the NLS and thereby prevent nuclear import.

      Further, we acknowledge that Δ133p53 co-aggregates with the autophagy substrate p62/SQSTM1 and the autophagosome component LC3B in the cytoplasm, where it undergoes autophagic degradation during replicative senescence(33). We agree that high overexpression of these aggregation-prone proteins may induce endoplasmic reticulum (ER) stress and activate autophagy(34). This could explain the cytoplasmic localization in our experiments. However, it is also critical to consider that we observed aggregates in both the cytoplasm and the nucleus (Figures 6B and E and Supplementary Figure 6B). While cytoplasmic localization may involve autophagy-related mechanisms, the nuclear aggregates likely arise from intrinsic isoform properties, such as altered protein folding, independent of autophagy. These dual localizations reflect the complex behavior of Δ133p53 and Δ160p53 isoforms under our experimental conditions.

      In the revised manuscript, we discuss this in the Discussion (lines 328-335): “Moreover, the observed cytoplasmic isoform aggregates may reflect autophagy-related degradation, as suggested by the co-localization of Δ133p53 with autophagy substrate p62/SQSTM1 and autophagosome component LC3B(33). High overexpression of these aggregation-prone proteins could induce endoplasmic reticulum stress and activate autophagy(34). Interestingly, we also observed nuclear aggregation of these isoforms (Figure 6B and E and Supplementary Figure 6B), suggesting that distinct mechanisms, such as intrinsic properties of the isoforms, may govern their localization and behavior within the nucleus. This dual localization underscores the complexity of Δ133p53 and Δ160p53 behavior in cellular systems.”

      Minor concerns:

      -  Figure 1A: the initiation of the "Δ140p53" is shown instead of "Δ40p53"

      Thank you! Figure 1A has been corrected in the revised paper.

      -  Figure 2A: I would like to see the images cropped a bit higher, so the cut does not happen just above the aggregate bands

      Thank you for this suggestion. We’ve changed the image, and the new Figure 2 is shown in the revised paper.

      -  Figure 3C: what ratio of FLp53/Delta isoform was used?

      We have added the ratio in the figure legend of Figure 3C (lines 845-846): “Relative DNA-binding of the FLp53-FLAG protein to the p53-target gene promoters in the presence of the V5-tagged protein Δ133p53 or Δ160p53 at a 1:1 ratio.”

      -  Figure 3C suggests that the "dominant-negative" effect is mostly senescence-specific as it does not affect apoptosis target genes, which is consistent with Horikawa et al, 2017 and Gong et al, 2016 cited above. Furthermore, since these two references and the others from Gong et al. show that Δ133p53α increases DNA repair genes, it would be interesting to look at RAD51, RAD52 or Lig4, and maybe also induce stress.

      Thank you for your thoughtful comments and suggestions. In Figure 3C, the presence of Δ133p53 or Δ160p53 significantly reduced the binding of FLp53 only to the p21 promoter. However, isoforms Δ133p53 and Δ160p53 themselves showed a significant loss of DNA-binding activity at all four promoters: p21, MDM2, and the apoptosis target genes BAX and PUMA (Figure 3B). This result suggests that Δ133p53 and Δ160p53 have the potential to influence FLp53 function due to their ability to form hetero-oligomers with FLp53 or their intrinsic tendency to aggregate. To further investigate this, we increased the isoform to FLp53 ratio in Figure 4, which demonstrates that the isoforms Δ133p53 and Δ160p53 exert dominant-negative effects on the function of FLp53.

      These results demonstrate that the isoforms can compromise p53-mediated pathways, consistent with Horikawa et al. (2017), which showed that Δ133p53α overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but still affects specific tumor-suppressing pathways. Furthermore, as noted by Gong et al. (2016), Δ133p53’s anti-apoptotic function under certain conditions is independent of FLp53 and unrelated to its dominant-negative effects.

      We appreciate your suggestion to investigate DNA repair genes such as RAD51, RAD52, or Lig4, especially under stress conditions. While these targets are intriguing and relevant, we believe that our current investigation of p53 targets in this manuscript sufficiently supports our conclusions regarding the dominant-negative effect. Further exploration of additional p53 target genes, including those involved in DNA repair, will be an important focus of our future studies.

      - Figure 5A and B: directly comparing the level of FLp53 expressed in cytoplasm or nucleus to the level of Δ133p53α and Δ160p53α expressed in cytoplasm or nucleus does not mean much since these are overexpressed proteins and therefore depend on the level of expression. The authors should rather compare the ratio of cytoplasmic/nuclear FLp53 to the ratio of cytoplasmic/nuclear Δ133p53α and Δ160p53α.

      Thank you very much for this valuable suggestion. In the revised paper, Figure 5B has been redrawn. Changes have been made in lines 214-215: “The cytoplasm-to-nucleus ratio of Δ133p53 and Δ160p53 was approximately 1.5-fold higher than that of FLp53 (Figure 5B).”

      Referees cross-commenting

      I agree that the system needs to be improved to be more physiological.

      Just to be precise, the D133 and D160 isoforms are not truncated mutants; they are naturally occurring isoforms expressed in almost every normal human cell type from an internal promoter within the TP53 gene.

      Using overexpression always raises concerns, but in this case, I am even more careful because the isoforms are almost always less expressed than the FLp53, and here they have to push it to 5 to 10 times more expressed than the FLp53 to see the effect, which makes me fear an artifact due to the overwhelming overexpression (which even seems to change the normal localization of the protein).

      To visualize the endogenous proteins, they will have to change cell line as the H1299 they used are p53 null.

      Thank you for these comments. We’ve addressed the motivation for overexpression in the above responses. We needed to use the plasmid constructs in the p53-null cells to detect the proteins, but the expression level was certainly not ‘overwhelmingly high’.

      First, we tried A549 cells (p53 wild-type) under DNA damage conditions, but the endogenous p53 protein was undetectable. Second, several studies reported increased Δ133p53 levels compared to wild-type p53, with implications for tumor development(2, 3, 4, 9). Third, the apoptosis activity of H1299 cells overexpressing p53 proteins was analyzed in the revised manuscript (Figure 7). The apoptotic activity induced by FLp53 expression was approximately 2.5 times higher than that of the control vector under identical plasmid DNA transfection conditions (Figure 7). These results rule out the possibility that the plasmid-based expression of p53 and its isoforms introduced artifacts in the results. We’ve discussed this in the Results section (lines 254-269).

      Reviewer #3 (Significance):

      Overall, the paper is interesting particularly considering the range of techniques used which is the main strength.

      The main limitation to me is the lack of contradictory discussion as all argumentation presents Δ133p53α and Δ160p53α exclusively as oncogenic and strictly FLp53 dominant-negative when, particularly for Δ133p53α, a quite extensive literature suggests a not so clear-cut activity.

      The aggregation mechanism is reported for the first time for Δ133p53α and Δ160p53α, although it was already published for Δ40p53α, Δ133p53β or in mutant p53.

      This manuscript would be a good basic research addition to the p53 field to provide insight in the mechanism for some activities of some p53 isoforms.

      My field of expertise is the p53 isoforms which I have been working on for 11 years in cancer and neuro-degenerative diseases

      Thank you very much for your positive and critical comments. We’ve included a fair discussion on the oncogenic and non-oncogenic function of Δ133p53 in the Introduction following your suggestion (lines 62-73). 

      References

      (1) Pitolli C, Wang Y, Candi E, Shi Y, Melino G, Amelio I. p53-Mediated Tumor Suppression: DNA-Damage Response and Alternative Mechanisms. Cancers 11,  (2019).

      (2) Fujita K, et al. p53 isoforms Delta133p53 and p53beta are endogenous regulators of replicative cellular senescence. Nature cell biology 11, 1135-1142 (2009).

      (3) Fragou A, et al. Increased Δ133p53 mRNA in lung carcinoma corresponds with reduction of p21 expression. Molecular medicine reports 15, 1455-1460 (2017).

      (4) Bourdon JC, et al. p53 isoforms can regulate p53 transcriptional activity. Genes & development 19, 2122-2137 (2005).

      (5) Ghosh A, Stewart D, Matlashewski G. Regulation of human p53 activity and cell localization by alternative splicing. Molecular and cellular biology 24, 7987-7997 (2004).

      (6) Aoubala M, et al. p53 directly transactivates Δ133p53α, regulating cell fate outcome in response to DNA damage. Cell death and differentiation 18, 248-258 (2011).

      (7) Marcel V, et al. p53 regulates the transcription of its Delta133p53 isoform through specific response elements contained within the TP53 P2 internal promoter. Oncogene 29, 2691-2700 (2010).

      (8) Zhao L, Sanyal S. p53 Isoforms as Cancer Biomarkers and Therapeutic Targets. Cancers 14,  (2022).

      (9) Nutthasirikul N, Limpaiboon T, Leelayuwat C, Patrakitkomjorn S, Jearanaikoon P. Ratio disruption of the ∆133p53 and TAp53 isoform equilibrium correlates with poor clinical outcome in intrahepatic cholangiocarcinoma. International journal of oncology 42, 1181-1188 (2013).

      (10) Tadijan A, et al. Altered Expression of Shorter p53 Family Isoforms Can Impact Melanoma Aggressiveness. Cancers 13,  (2021).

      (11) Aubrey BJ, Kelly GL, Janic A, Herold MJ, Strasser A. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell death and differentiation 25, 104-113 (2018).

      (12) Ghorbani N, Yaghubi R, Davoodi J, Pahlavan S. How does caspases regulation play role in cell decisions? apoptosis and beyond. Molecular and cellular biochemistry 479, 1599-1613 (2024).

      (13) Petronilho EC, et al. Oncogenic p53 triggers amyloid aggregation of p63 and p73 liquid droplets. Communications chemistry 7, 207 (2024).

      (14) Forget KJ, Tremblay G, Roucou X. p53 Aggregates penetrate cells and induce the coaggregation of intracellular p53. PloS one 8, e69242 (2013).

      (15) Farmer KM, Ghag G, Puangmalai N, Montalbano M, Bhatt N, Kayed R. P53 aggregation, interactions with tau, and impaired DNA damage response in Alzheimer's disease. Acta neuropathologica communications 8, 132 (2020).

      (16) Arsic N, et al. Δ133p53β isoform pro-invasive activity is regulated through an aggregation-dependent mechanism in cancer cells. Nature communications 12, 5463 (2021).

      (17) Melo Dos Santos N, et al. Loss of the p53 transactivation domain results in high amyloid aggregation of the Δ40p53 isoform in endometrial carcinoma cells. The Journal of biological chemistry 294, 9430-9439 (2019).

      (18) Mestrom L, et al. Artificial Fusion of mCherry Enhances Trehalose Transferase Solubility and Stability. Applied and environmental microbiology 85,  (2019).

      (19) Kaba SA, Nene V, Musoke AJ, Vlak JM, van Oers MM. Fusion to green fluorescent protein improves expression levels of Theileria parva sporozoite surface antigen p67 in insect cells. Parasitology 125, 497-505 (2002).

      (20) Snapp EL, et al. Formation of stacked ER cisternae by low affinity protein interactions. The Journal of cell biology 163, 257-269 (2003).

      (21) Jain RK, Joyce PB, Molinete M, Halban PA, Gorr SU. Oligomerization of green fluorescent protein in the secretory pathway of endocrine cells. The Biochemical journal 360, 645-649 (2001).

      (22) Campbell RE, et al. A monomeric red fluorescent protein. Proceedings of the National Academy of Sciences of the United States of America 99, 7877-7882 (2002).

      (23) Hofstetter G, et al. Δ133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. British journal of cancer 105, 1593-1599 (2011).

      (24) Bischof K, et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Scientific reports 9, 5244 (2019).

      (25) Gong L, et al. p53 isoform Δ113p53/Δ133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell research 25, 351-369 (2015).

      (26) Gong L, et al. p53 isoform Δ133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Scientific reports 6, 37281 (2016).

      (27) Gong L, Pan X, Yuan ZM, Peng J, Chen J. p53 coordinates with Δ133p53 isoform to promote cell survival under low-level oxidative stress. Journal of molecular cell biology 8, 88-90 (2016).

      (28) Candeias MM, Hagiwara M, Matsuda M. Cancer-specific mutations in p53 induce the translation of Δ160p53 promoting tumorigenesis. EMBO reports 17, 1542-1551 (2016).

      (29) Horikawa I, et al. Δ133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell death and differentiation 24, 1017-1028 (2017).

      (30) Mondal AM, et al. Δ133p53α, a natural p53 isoform, contributes to conditional reprogramming and long-term proliferation of primary epithelial cells. Cell death & disease 9, 750 (2018).

      (31) Joruiz SM, Von Muhlinen N, Horikawa I, Gilbert MR, Harris CC. Distinct functions of wild-type and R273H mutant Δ133p53α differentially regulate glioblastoma aggressiveness and therapy-induced senescence. Cell death & disease 15, 454 (2024).

      (32) O'Brate A, Giannakakou P. The importance of p53 location: nuclear or cytoplasmic zip code? Drug resistance updates : reviews and commentaries in antimicrobial and anticancer chemotherapy 6, 313-322 (2003).

      (33) Horikawa I, et al. Autophagic degradation of the inhibitory p53 isoform Δ133p53α as a regulatory mechanism for p53-mediated senescence. Nature communications 5, 4706 (2014).

      (34) Lee H, et al. IRE1 plays an essential role in ER stress-mediated aggregation of mutant huntingtin via the inhibition of autophagy flux. Human molecular genetics 21, 101-114 (2012).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The concept that trained immunity, as defined, can be beneficial to subsequent immune challenges is important in the broad context of health and disease. The significance of this manuscript is the finding that trained immunity is actually a two-edged sword, herein, detrimental in the context of LPS-induced Acute Lung Injury that is mediated by AMs.

      Strengths:

      Several lines of evidence in different mouse models support this conclusion. The postulation that differences in immune responses in individuals are linked to differences in the mycobiome and consequent β-glucan makeup is provocative.

      Weaknesses:

      The findings that the authors state are relevant to sepsis are actually confined to a specific lung injury model and not classically-defined sepsis. In addition, the ontogeny of the reprogrammed AMs is uncertain. Links in the proposed signaling pathways need to be strengthened.

      Reviewer #2 (Public review):

      Summary:

      Prével et al. present an in vivo study in which they reveal an interesting aspect of β-glucan, a known inducer of enhanced immune responses termed trained immunity, in sterile inflammation. The authors show that β-glucan can reprogram alveolar macrophages (AMs) in the lungs through neutrophils and IFNγ signaling, independently of Dectin1. This reprogramming occurs at both transcriptional and metabolic levels. After β-glucan training, LPS-induced sterile inflammation exacerbated acute lung injury via enhanced immunopathology. These findings highlight a new aspect of β-glucan's role in trained immunity and its potential detrimental effects when enhanced pathogen clearance is not required.

      Strengths:

      (1) This manuscript is well-written and effectively conveys its message.

      (2) The authors provide important evidence that β-glucan training is not solely beneficial, but depending on the context can also enhance immunopathology. This will be important to the field for two reasons. It shows again, that trained immunity can also be harmful. Jentho et al. 2021 have already provided further evidence for this aspect. And it highlights anew that LPS application is an insufficient infection model.

      Weaknesses:

      (1) Only a little physiological data is provided by the in vivo models.

      (2) The effects in histology appear to be rather weak.

      Reviewer #1 (Recommendations for the authors):

      The opening paragraph in the introduction focuses on sepsis. This is misleading since this manuscript does not address sepsis but rather intranasal-administered LPS-induced acute lung injury.

      We are in total agreement with the reviewer and have modified the introduction to focus on acute lung injury with clinical relevance more associated to TLR4-mediated acute lung injury and lung inflammation.

      The authors make definitive statements that AMs originate from fetal liver monocytes. However, it is well known that the ontogeny of AMs is complex and AMs can be populated, in part, from peripheral monocytes. The ontogeny of reprogrammed AMs was not addressed in this study but they may come from monocyte-derived AMs following β-glucan training (transfer of AMs into Csf2rb KO mice does not prove the contrary). In this regard, do, for example, the percentages of CD11b+ AMs change? More phenotyping of the control and reprogrammed AMs would enhance the interpretation of the findings.

      The reviewer is correct that the ontogeny of AMs can be heterogeneous, especially following a pulmonary challenge. In β-glucan-treated mice, Figure 1I shows no changes in frequency or number of AMs in the BAL. As the reviewer suggested, we repeated this experiment and incorporated more markers for AMs. New Supplementary Figure 1C shows the expression of CD11b on AMs (CD11c<sup>+</sup>SiglecF<sup>+</sup>) from control and β-glucan-treated mice. While the frequency increases with LPS administration, we show no difference between control and β-glucan groups, suggesting that β-glucan does not induce the expansion of monocyte-derived AMs. Additionally, in New Supplementary Figure 1D, we show the expression of AM-associated markers in order to better delineate their phenotype. We observed no differences in MHCII, CD169, CD64 and F4/80 in β-glucan-treated mice, but an increase in CD80<sup>+</sup> AMs following β-glucan, suggesting enhanced activation and corroborating their proinflammatory phenotype. Collectively, these data indicate that while the frequency and number of either yolk-sac or BM-derived AMs are unchanged in the β-glucan-treated mice, the activation of AMs is enhanced after the systemic treatment with β-glucan.

      The abstract seems to overpromise a bit. First, it mentions trained immunity and HSCs, but they don't seem to formally address either in the context of this model (there is reprogramming as assessed by transcriptome and metabolic analyses which is suggestive as stated by the authors, but do the changes overlap significantly with classically trained immunity?), and second, it links phenotypes together in a pathway(s) that they haven't actually interrogated - although they look at transcripts and do a seahorse assay they don't actually confirm that any of those findings are related to the increased response to LPS in vivo. The long discussion with all the caveats highlights these limitations, all relegated to future studies.

      We thank the reviewer for this comment. In response, we have revised the abstract to more accurately highlight the key findings of this study. Specifically, we introduced the concept of central trained immunity to describe the phenomena commonly observed with β-glucan treatment, contrasting it with the peripheral trained immunity detailed in the manuscript.

      The use of Csf2rb-/- mice to complement the clodronate approach is interesting (this approach has been used in the past with influenza virus). In addition to lacking AMs, these mice develop pulmonary alveolar proteinosis. Do the authors have histopathology from these mice in the current model? They mention PAP in the discussion.

      Pulmonary alveolar proteinosis (PAP) typically develops in Csf2rb-/- mice from 12 weeks of age onwards (Stanley et al., Proc Natl Acad Sci USA, 1994). However, in our model, mice were euthanized at 6 weeks, ensuring that pulmonary function and structure remained intact. A hallmark of PAP is the accumulation of protein, primarily surfactant, in BAL. To investigate this, we measured BAL protein concentration and observed no differences at baseline (Figure 2F). These findings were further supported by the absence of differences in BAL proinflammatory cytokine concentrations (Figure 2H).

      A question about their BAL technique: in the control mice without glucan/LPS stimulation, only 40% of BAL cells are AMs [and the total number of AMs (range of <10<sup>3</sup> to 2-3 x 10<sup>4</sup>) is at least 5-fold lower than typically seen in BALs from healthy mice (10<sup>5</sup>)], and there didn't seem to be many PMNs either. Are 60% of the BAL cells lymphocytes/RBCs? Is it possible that overall AM numbers are changing, but CD11c/SiglecF-positive cell numbers stay the same (only assessed 2 markers)? More phenotyping would help.

      We appreciate the reviewer’s comment and would like to clarify that alveolar macrophages (AMs) are presented in the manuscript as a frequency of viable cells rather than as a frequency of CD45<sup>+</sup> cells, to ensure consistency throughout the study. The remaining cells in the samples are likely epithelial cells and lymphocytes, as red blood cells are lysed during sample processing. For additional context, we now provide data showing AMs as a percentage of CD45<sup>+</sup> cells, which account for 80–90% of leukocytes. Furthermore, in New Supplementary Figure 1D, we highlight the expression of AM-associated markers to better define their phenotype. We observed no differences in MHCII, CD169, CD64, or F4/80 expression in β-glucan-treated mice. However, there was an increase in CD80<sup>+</sup> AMs, indicating enhanced activation and corroborating their proinflammatory phenotype.

      Author response image 1.

      AMs as a percentage of CD45<sup>+</sup> cells. Mice were treated with β-glucan for seven days. We show CD11c<sup>+</sup>SiglecF<sup>+</sup> cells in the bronchoalveolar lavage (BAL) as a percentage of CD45<sup>+</sup> cells (n=5).

      Line 130-131. TNF is decreased and not pointed out.

      In the poly(I:C) model, the BAL TNF concentration does not differ significantly between naïve and trained mice because of high variability in the data. The reviewer is correct that TNF-α does not appear to reflect poly(I:C)-mediated ALI. We have included this point in the revised manuscript (Lines 146-148).

      Reviewer #2 (Recommendations for the authors):

      Suggestions:

      (1) The authors provide evidence for enhanced ALI via different techniques, e.g. histology, vascular leakage, immune cell composition in BAL etc. It would be interesting to see whether there were any changes in the disease severity of ALI. If possible the authors could provide data for survival, temperature, weight, and/or glucose in the different groups.

      Mice are extremely resistant to the pulmonary LPS model. We have previously assessed the lethality of our LPS model, and all mice survive even with an increased intranasal dose of 200 μg LPS (Pernet et al, Nature, 2023). To address the reviewer's concern, we next assessed morbidity by monitoring weight loss following LPS challenge and showed that β-glucan-treated mice exhibit a delayed recovery time after 4 days of LPS treatment (New Supplementary Figure 1B).

      (2) The authors show that β-glucan-mediated training enhances ALI. Conversely, the opposite, decreased immunopathology, should be observed if an LPS tolerance model were used. I am wondering whether this has already been performed, given that the (LPS/immune) tolerance field is already older than the training field. If not, I suggest incorporating this feature in their discussion.

      Thank you for this insightful comment. While LPS has long been recognized to induce tolerance, studies have also shown that intranasal exposure to ambient levels of LPS can induce alveolar macrophage (AM) training via type I interferon signaling (Zahalka et al., Mucosal Immunol, 2022). In contrast, Mason et al. demonstrated that systemic LPS stimulation induces tolerance through TNF-α signaling, resulting in diminished AM phagocytosis and superoxide production. This leads to reduced neutrophil recruitment and impaired bacterial clearance in a Pseudomonas aeruginosa pneumonia model (J Infect Dis, 1997). Furthermore, we recently reported that systemic administration of β-glucan induces central trained immunity, generating a distinct subset of regulatory neutrophils that promote disease tolerance against influenza viral infection (Khan et al., Nat Immunol, 2025). These findings highlight the complex and context-dependent interplay between training and tolerance. We have expanded on this point in the discussion section of the revised manuscript (Lines 289-297).

      (3) The finding that trained immunity can exert not only beneficial effects but also enhance immunopathology is interesting and should be further explored. Already Jentho et al. (PNAS 2021) have shown that upon sterile inflammation as imposed by LPS, (heme) training can lead to enhanced mortality. This might be a relevant trade-off in trained immunity since no beneficial resistance effect by pathogen killing can be obtained. It would be interesting to see, in their model, whether heme would also enhance ALI after intranasal LPS application. Or at least, can the authors discuss this finding more, also in relation to the already published evidence?

      Thank you for raising this interesting point, which is indeed relevant to our study. Jentho et al. demonstrated that training by heme can be beneficial in combating infectious challenges but can have deleterious effects in the context of sterile inflammation. The concept of endogenous training agents like heme, with their diverse effects on immune cells, aligns well with our β-glucan model, particularly given the high prevalence of fungal agents in the microbiome.

      While investigating the effects of heme on alveolar macrophages would certainly be intriguing, Jentho and colleagues have already reported the maladaptive effects of heme, such as tissue damage, during sterile LPS-induced inflammation. As such, these findings might be redundant in the context of our model. However, we have drawn a relevant parallel and expanded on this discussion in the revised manuscript (Lines 382-385).

      (4) It is not clear how the histologies were evaluated. This is a field of great subjectivity. The authors should describe it in more detail. The best option would have been a blinded observer. Was this done?

      Histology samples were evaluated according to ATS 2011 guidelines regarding “Features and measurements of experimental acute lung injury in animals” by a blinded pathologist. We have specified this in the methods of the revised manuscript.

      Minor:

      (1) Line 108 and ff. Please change TNF, not TNFa

      Since we used an ELISA specific for TNF-α rather than general TNF, it is more accurate to refer to it as TNF-α.

      (2) Line 513 and ff. Please use Greek letters when appropriate, e.g. IFN-γ not IFNg.

      Thank you for pointing out these mistakes; we have rectified them in the text.

    1. Author response:

      Public Reviews:  

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase (i.e., 10 trials per condition). This trial number was chosen to ensure comparability with previous studies employing similar designs and research questions (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We will acknowledge this limitation and future direction in the revised manuscript. 

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify that participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We will add this information to the revised version of the manuscript. 

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We will address this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session. 

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that it is unfortunate that 25 participants were conditioned with an unbalanced number of trials per condition during the conditioning session. In the revised version of the manuscript, we will include additional analyses to demonstrate that this imbalance did not systematically bias the results and that the findings observed during the test phase remain robust despite this error.  

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the large sample size, methodological rigor, and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We apologize that the registration number alone does not directly lead to the preregistration of this study. We thank the reviewer for pointing this out and will include a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for this comment and suggestion. In the revised version, we will restructure the manuscript and include more detailed information about the key experimental procedures and design at the beginning of the Results section to enhance clarity and improve the interpretability of the reported findings.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that the MSH6 family of mismatch repair proteins contains an unstructured N-terminal domain that contains either a PWWP domain, a Tudor domain or neither and that the interaction of the histone reader domains with the appropriate histone H3 modification enhances mismatch repair, and hence reduces mutation rates in coding regions to some extent. However, the elimination of the MSH6-histone modification probably does not completely eliminate mismatch repair, although the published papers on this point do not seem definitive.

      In this study, the authors perform a detailed phylogenetic analysis of the presence of the PWWP and Tudor domains in MSH6 proteins across the tree of life. They observe that there are basically three classes of organisms that contain either a PWWP domain, a Tudor domain, or neither. On the basis of their analysis, they suggest that this represents convergent evolution of the independent acquisition of histone reader domains and that key amino acid residues in the reader domains are selected for.

      Strengths:

      The phylogenetic aspects of the work seem well done and the basic evolutionary conclusions of the work are well supported. The basic evolutionary conclusions are interesting and there is little to criticize from my perspective.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      A major concern about this paper is that the authors fail to put their work into the proper context of what is already known about the N-terminus of MSH6. Further, their structural studies, which are really structural illustrations, are misleading, often incorrect, and not always helpful in addition to having been published before.

      Thank you for the helpful suggestions on this front. We agree that some of the structural visualizations were oversimplified and apologize for the lack of clarity. Notably, we did not annotate the putative or known short PCNA-interacting protein (PIP) motifs that have been found in the disordered N-terminal linker of MSH6 proteins. Although not directly related to our investigation of the origins of histone readers, the PIP motifs are an interesting and functionally important feature of MSH6 structural biology, especially because they may facilitate DNA repair processes more generally. In the revised manuscript, we aim to improve the scholarship on this topic and clarify the presence and importance of this motif for MSH6 function, as well as what is known about the structural biology of the MSH6 N-terminus more broadly. We will add annotations of the PIP motif and will also improve the structural prediction by visualizing MSH6 in its dimerized form with MSH2, for a more accurate estimate of its folding in vivo. We hope that these changes, together with the other valuable suggested improvements, will enhance the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this work, Monroe JG and colleagues show a compelling case of convergent evolution in the fusion between an important mismatch repair protein (MSH6) and histone reader domains across the tree of life. These fused MSH6 readers have been shown to be important for the recruitment of MSH6 to exon-rich genome locations, therefore improving the efficiency of reducing mutation rates in coding regions.

      The comparative genomic analyses performed here revealed independent instances of MSH6 fusion with histone readers in plants and metazoa, with several instances of putative loss (or gain) across the phylogeny. The work also unveiled instances of MSH6 fusion with putatively interesting domains in fungi, which might be worth exploring in the future.

      The authors also show potential signatures of purifying selection in functional amino acids of MSH6 histone readers.

      Overall the approach is adequate for the questions proposed to be answered, the analyses are rigorous and support the authors' claims.

      DNA repair genes are essential to maintain genome stability and fidelity, and alterations in these pathways have been associated with hypermutation phenotypes in the context for instance of cancer in humans, with sometimes implications in treatment resistance. This is an important work that contributes to our understanding of the evolutionary consequences of the evolution of epigenome-targeted DNA repair.

      Strengths:

      The methods used are adequate for the questions and support the results. The search for MSH6 fusions was rigorous and conservative, which strengthens the significance of the claims on the evolutionary history of these fusion events.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      I did not identify any major weaknesses, but please see my suggestions/recommendations.

      Thank you, we will also address your suggestions, which provide valuable recommendations for improving the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Convergent evolution of epigenome recruited DNA repair across the Tree of Life", Monroe et al. investigate bioinformatically how some important mechanisms of epigenome-targeted DNA repair evolved at the tree of life scale. They provide a clear example of convergent evolution of these mechanisms between animals and plants, investigating more than 4000 eukaryotic genomes, and uncovering a significant association between gain/retention of such mechanisms with genome size and high intron content, that at least partially explains the evolutionary patterns observed within major eukaryotic lineages.

      Strengths:

      The manuscript is well written, clear, and understandable, and has potentially broad interest. It provides a thorough analysis of the evolution of MSH6-related DNA repair mechanisms using more than 4000 eukaryotic genomes, a pretty impressive number allowing to identify both large-scale (i.e. kingdoms) as well as shorter-scale (i.e. phyla, orders) evolutionary patterns. Moreover, despite providing no experimental validation, it investigates with a sufficient degree of depth, a potential relationship between gain/retention of epigenome recruited DNA repair mediated by MSH6 and genomic, as well as life-history (population size, body mass, lifespan), traits. In particular, it provides convincing evidence for a causative effect between genome size/intron content and the presence/absence of this mechanism. Moreover, it stimulates further scientific investigation and biological questions to be addressed, such as the conservation of epigenomes across the tree of life, the existence of potential trade-offs in gain/retention vs. loss of such mechanisms, and the relationship between these processes, mutation rate heterogeneity, and evolvability.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      Despite the interesting and necessary insights provided on (1) the evolution of DNA repair mechanisms, and (2) the convergent evolution of molecular mechanisms, this bioinformatic study emanates from studies in humans and Arabidopsis already showing signs of potential convergent evolution in aspects of epigenome-recruited DNA repair. For this, this study, although bioinformatically remarkably thorough, does not come as a surprise, potentially lowering its novelty.

      What could have increased further its impact, interest, and novelty could have been a more comprehensive understanding of the causative processes leading to gain/retention vs. loss of MSH6-related epigenetic recruitment mechanisms. The authors provide interesting associations with life-history traits (yet not significant), and significant links with genome size and intron content only at the theoretical level. For the first aspect, the analyses could have expanded toward other life-history traits. For the second, maybe it could have been even possible to tackle experimentally some of the generated questions, functionally in some models, or deepened using specific case studies.

      We agree that this work expands on recent experimental work in humans and Arabidopsis on the function of histone readers in MSH6, PWWP and Tudor, respectively. However, the evolution of these fusions remained a significant knowledge gap, limiting the degree to which functional work could be translated to other organisms. This study definitively characterized the evolutionary history of MSH6 histone readers and lays the groundwork for future investigations in diverse species. We agree that more causal inference would be valuable to understand the evolutionary pressures acting on MSH6 histone reader presence/absence. Indeed, we prioritized the conservative approach of testing hypotheses with strict phylogenetically constrained contrasts. While we observed highly significant associations between histone readers and genomic traits like intron content, associations with life history traits were only significant before accounting for phylogeny. It is possible that this is due to a lack of power because such traits are only available in limited taxa. In the revised manuscript, we aim to clarify potential causes, outline future experimental work beyond the scope of this individual study, and argue that this work highlights the need to catalog trait diversity at broader phylogenetic scales. We also address other valuable suggestions in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. 
      While faster frequency rates broadly provide a higher probability to detect 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e. slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model provides that alpha oscillatory phase might predict how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.

      Strengths:

      The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results thereof not only integrate, but depict an overarching mechanistic scenario elegantly framing phase-resetting dynamics into the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses on brain oscillations (either via frequency sliding and phase opposition sum) mostly appear to be methodologically sound, and very-well supported by tACS results. Under this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the pieces of evidence gathered on empirical data, returning an appealing account on how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.

      Weaknesses:

      Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.

      We thank the reviewer for highlighting this important point. In our study, all explicit trials consistently presented two flashes. We will clearly state this detail in the Methods section to avoid any further confusion.

      Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.

      We thank the reviewer for raising this point. First, we would like to clarify that our results do not suggest that the frequency effect is absent in the F2B1 condition; rather, it is relatively attenuated compared to the F2 condition. If the sample size were the primary issue, we would expect to observe a null effect in both conditions. Instead, the stronger frequency modulation in F2 confirms that the sound-induced modulation is present, albeit reduced in the audiovisual context. In our revised manuscript, we will explicitly note that our claim is not that there is no frequency effect in F2B1 but that the effect is weaker relative to F2, and we will also acknowledge the potential limitations associated with sample size and the lack of individualized frequency targeting.

      Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.

      We thank the reviewer for pointing out the potential bias toward reporting “two flashes” in the bistable trials. Because our experimental design involves presenting two flashes in both explicit and bistable trials, a slight tendency to report two flashes may naturally arise, especially at threshold levels determined during pretesting. We believe, however, that this bias does not undermine our primary findings. Our psychophysical procedure is designed to align the inter-stimulus interval with each participant’s fusion threshold, aiming for a near 50/50 split between “one-flash” and “two-flash” reports. However, given that two flashes are always presented, participants may be predisposed to report two flashes when uncertain. This reflects a plausible perceptual bias inherent in the bistable design, rather than a systematic flaw. Importantly, this tendency appears at comparable levels in both the F2 and F2B1 conditions, indicating that it does not selectively affect any particular condition. In the revised manuscript, we will include additional descriptive statistics, such as means and standard deviations, to demonstrate that the observed bias remains within an acceptable range and does not compromise our core conclusions regarding the modulatory effect of auditory input on visual integration.

      Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time).

      We appreciate the reviewer’s concern regarding this issue. First, the sustained decrease in posterior alpha frequency observed in our study—persisting for approximately 280 ms—substantially exceeds the typical duration of an auditory evoked potential (generally 50–200 ms) (Näätänen and Picton, 1987). This extended period of modulation suggests that it is not merely a transient evoked response.

      Second, our analysis of alpha power further supports this interpretation. A purely evoked response is usually accompanied by a corresponding increase in signal power; however, our results show no such power increase when comparing the F2B1 condition with the F2 condition.

      Moreover, the observed increase in alpha phase resetting—as measured by inter-trial phase coherence (ITC)—does not significantly correlate with changes in alpha power. This dissociation further indicates that the auditory-induced effects are unlikely to be driven solely by evoked potentials, but are more consistent with a reorganization of the intrinsic neural oscillatory activity.

      Together, these lines of evidence strongly support the view that the auditory-induced decrease in alpha frequency reflects true changes in ongoing oscillatory dynamics, rather than being merely a transient evoked response.
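
      For readers less familiar with the measure, the dissociation between ITC and power discussed above can be illustrated with a minimal sketch. It assumes ITC is computed in the standard way, as the length of the mean unit phase vector across trials at a single time-frequency point; the function name and the example phase values are hypothetical and are not taken from the study's analysis pipeline. Because each trial contributes only its phase, ITC can rise without any accompanying change in amplitude.

```python
import cmath
import math


def inter_trial_coherence(phases):
    """Return ITC for one time-frequency point.

    `phases` holds one phase angle (radians) per trial. Amplitude is
    deliberately ignored: every trial contributes a unit-length vector.
    """
    if not phases:
        raise ValueError("need at least one trial")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical examples: perfectly aligned phases versus phases spread
# evenly around the circle.
aligned = [0.3] * 20
scattered = [2 * math.pi * k / 20 for k in range(20)]

print("aligned ITC:", inter_trial_coherence(aligned))
print("scattered ITC:", inter_trial_coherence(scattered))
```

      Aligned phases give a value close to 1 and evenly scattered phases give a value close to 0, irrespective of signal power.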

      Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms.

      We appreciate the reviewer’s insightful comment regarding the potential influence of flash-induced phase resetting on the temporal extent of the IAF differences. We acknowledge that the observation—that wider ISIs are associated with a longer period of IAF differences—hints at a non-linear or even super-additive interaction between auditory- and flash-induced phase resetting mechanisms.

      However, the primary focus of our current study is on how auditory stimuli affect alpha oscillatory dynamics. Our experimental design and computational model were specifically optimized to capture auditory-induced phase resetting. Incorporating the additional influence of flash-induced effects would require a significantly more refined experimental framework and a more complex modeling approach. This added complexity could obscure the interpretation of our main findings, which are centered on auditory influences.

      In the revised manuscript, we will address this intriguing possibility in the Discussion section. We will acknowledge that while the data hint at a potential visual contribution, our present model deliberately isolates auditory-induced phase resetting to maintain clarity. We also propose that future research, with more precise experimental designs and enhanced modeling techniques, is necessary to fully disentangle and capture the interplay between auditory and flash-induced phase resetting mechanisms.

      Relatedly, I would also suggest the authors to throw in a few more simulations to explore the parameter space and assay, to which quantitative extent the model still holds (e.g. allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms).

      We appreciate the reviewer’s suggestion to further explore our model’s parameter space. In response, we will conduct additional simulations that incorporate variability in alpha frequency—sampling randomly between 8 and 13 Hz—and examine alternative phase delays (e.g., around 10 and 50 ms). By systematically adjusting these parameters, we can more thoroughly evaluate the model’s robustness and delineate its boundaries under a broader range of neurophysiological conditions. We will present these results in the revised manuscript and discuss how they inform our understanding of alpha-driven visual integration in cross-modal contexts.
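
      As a rough illustration of what such a parameter sweep might look like, the toy sketch below varies alpha frequency between 8 and 13 Hz. It is not the computational model described in the manuscript: it assumes only the simplest reading of the discrete sampling hypothesis (two flashes are reported when they fall in different alpha cycles, with the first flash arriving at a random point in the cycle) and omits auditory phase resetting and phase delays entirely. All function names and parameter values are hypothetical.

```python
import random


def p_two_flash_report(alpha_hz, isi_s):
    """Analytic probability that two flashes land in different alpha cycles.

    With a uniformly random onset within the cycle, this equals
    ISI / period, capped at 1.
    """
    return min(1.0, isi_s * alpha_hz)


def simulate(alpha_hz_range, isi_s, n_trials, seed=0):
    """Monte Carlo estimate with alpha frequency drawn anew on each trial."""
    rng = random.Random(seed)
    low, high = alpha_hz_range
    two_flash_reports = 0
    for _ in range(n_trials):
        freq = rng.uniform(low, high)
        period = 1.0 / freq
        first_flash = rng.uniform(0.0, period)  # onset within the cycle
        if first_flash + isi_s >= period:  # second flash is in the next cycle
            two_flash_reports += 1
    return two_flash_reports / n_trials


# Hypothetical sweep: fixed 40 ms ISI, slow, fast, and variable alpha.
for label, freq_range in [("8 Hz", (8.0, 8.0)),
                          ("13 Hz", (13.0, 13.0)),
                          ("8-13 Hz", (8.0, 13.0))]:
    print(label, round(simulate(freq_range, 0.040, 20000), 3))
```

      Under these assumptions, faster alpha yields more two-flash reports at a fixed ISI; a fuller exploration would add the phase-reset delay as a second swept parameter.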

      As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney, 2022; as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as a too small sample size (Samaha & Romei, 2024).

      We appreciate the reviewer’s suggestion and will revise our claims accordingly. In the revised manuscript, we will clarify that while our study demonstrates a mechanism by which alpha oscillations influence audiovisual integration in certain paradigms, this does not mean that our findings fully reconcile all conflicting results in the literature. We will emphasize that our mechanism may help explain why alpha frequency plays a critical role in some experimental settings, but that factors such as sample size, task parameters, and experimental design differences likely contribute to the divergent results observed across studies. Accordingly, we acknowledge that further research with larger samples and more refined methodologies is necessary to fully reconcile these discrepancies. This more cautious interpretation will be clearly discussed in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors used a visual flash discrimination task in which two flashes are presented one after another with different inter-stimulus intervals. Participants either perceive one flash or two flashes. The authors show that the simultaneous presence of an auditory input extends the temporal window of integration, meaning that two flashes presented shortly after one another are more likely to be perceived as a single flash. Auditory inputs are accompanied by a reduction in alpha frequency over visual areas. Prestimulus alpha frequency predicts perceptual outcomes in the absence of auditory stimuli, whereas prestimulus alpha phase becomes the dominant predictor when auditory input is present. A computational model based on phase-resetting theory supports these findings. Additionally, a transcranial stimulation experiment confirms the causal role of alpha frequency in unimodal visual perception but not in cross-modal contexts.

      Strengths:

      The authors elegantly combined several approaches, from behavior to computational modeling and EEG, to provide a comprehensive overview of the mechanisms involved in visual integration in the presence or absence of auditory input. The methods used are state-of-the-art, and the authors attempted to address possible pitfalls.

      Weaknesses:

      The use of Bayesian statistics could further strengthen the paper, especially given that a few p-values are close to the significance threshold (lines 162 & 258), but they are interpreted differently in different cases (absence of effect vs. trend).

      We appreciate the reviewer’s suggestion regarding the use of Bayesian statistics. We agree that a Bayesian framework can offer valuable complementary insights to our analysis by helping to distinguish whether a marginal p-value represents a trend or truly indicates the absence of an effect. To enhance the robustness of our conclusions, we will incorporate supplemental Bayesian analyses in the revised manuscript.

      Overall, these results provide new insights into the role of alpha oscillations in visual processing and offer an interesting perspective on the current debate regarding the roles of alpha phase and frequency in visual perception. More generally, they contribute to our understanding of the neural dynamics of multisensory integration.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the impact of an auditory stimulus on visual integration at the behavioral, electrophysiological, and mechanistic levels. Although the role of alpha brain oscillations on visual perception has been widely studied, how the brain dynamics in the visual cortices are influenced by a cross-modal stimulus remains ill-defined. The authors demonstrated that auditory stimulation systematically induced a drop in visual alpha frequency, increasing the time window for audio-visual integration, while in the unimodal condition, visual integration was modulated by small variations within the alpha frequency range. In addition, they only found a role of the phase of alpha brain oscillations on visual perception in the cross-modal condition. Based on the perceptual cycles' theory framework, the authors developed a model allowing them to describe their results according to a phase resetting induced by the auditory stimulation. These results showed that the influence of well-known brain dynamics on one modality can be disrupted by another modality. They provided insights into the importance of investigating cross-modal brain dynamics, and an interesting model that extends the perceptual cycle framework.

      Strengths:

      The results are supported by a combination of various, established experimental and analysis approaches (e.g., two-flash fusion task, psychometric curves, phase opposition), ensuring strong methodological bases and allowing direct comparisons with related findings in the literature.

      The model the authors proposed is an extension and an improvement of the perceptual cycle's framework. Interestingly, this model could then be tested in other experimental approaches.

      Weaknesses:

      There is an increasing number of studies in cognitive neuroscience showing the importance of considering inter-individual variability. The individual alpha frequency (IAF) varied from 8 to 13 Hz with a huge variability across participants, and studies have shown that the IAF influenced visual perception. Investigating inter-individual variations of the IAF in the reported results would be of great interest, especially for the model.

      We appreciate the reviewer’s valuable feedback regarding the importance of inter-individual variability in alpha frequency. In our current study, we have already addressed participant-level variability in our neural data by performing inter-subject correlation analyses, investigating whether individual reductions in alpha frequency correlate with broader temporal integration windows at the behavioral level.

      Moreover, our computational model incorporates physiologically realistic distributions for key parameters, including frequency and amplitude, which captures some degree of individual variability. Nevertheless, we acknowledge that a more targeted examination of how different IAF values specifically affect the model’s predictions would be highly valuable. In response, we will expand our simulations to systematically explore a range of IAF values and assess their impact on temporal integration windows and related measures of audiovisual processing. These additional analyses will help clarify the role of inter-individual variability in alpha frequency and further strengthen the mechanistic account offered by our model. We will detail these enhancements and discuss their implications in the revised manuscript.
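
      As a rough illustration of why the alpha frequency matters for the temporal integration window, the following toy sketch assumes the simplest reading of the perceptual-cycle idea: two flashes are fused when they fall within the same alpha cycle, and the phase at the first flash is uniformly random. This is not the model described in the manuscript; the function names, the trial count, and the fusion rule are assumptions made for illustration only.

```python
import random


def fusion_probability(isi_s, freq_hz):
    """Closed-form chance that two flashes isi_s apart share one cycle.

    Assumes a uniformly random phase at the first flash (toy assumption).
    """
    period = 1.0 / freq_hz
    return max(0.0, 1.0 - isi_s / period)


def simulate_fusion(isi_s, freq_hz, n_trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    period = 1.0 / freq_hz
    same_cycle = 0
    for _ in range(n_trials):
        # Time of the first flash within the current cycle.
        onset = rng.uniform(0.0, period)
        # The second flash shares the cycle if it arrives before the cycle ends.
        if onset + isi_s < period:
            same_cycle += 1
    return same_cycle / n_trials


if __name__ == "__main__":
    # Sweep hypothetical individual alpha frequencies at a fixed 50 ms interval.
    for freq in (8, 10, 13):
        print(freq, round(fusion_probability(0.05, freq), 3))
```

      Under these assumptions, a lower alpha frequency gives a longer cycle and therefore a higher fusion probability at a fixed inter-stimulus interval, which is the direction of the effect discussed above.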

      Although the use of non-invasive brain stimulation to infer causality is a method of great interest, the use of tACS in the presented work is not optimal. Instead of inducing alpha brain oscillations in visual cortices, the use of tACS to activate the auditory cortex instead of the actual auditory stimulation would have presented more interest.

      We appreciate the reviewer’s suggestion and acknowledge that non-invasive brain stimulation offers promising avenues for inferring causality. In our study, our primary hypothesis focused on the role of occipital alpha oscillations in defining the temporal window for visual integration, and accordingly we targeted visual cortex in our tACS protocol.

      We recognize that stimulating the auditory cortex could provide additional insights into auditory contributions to phase resetting. However, accurately targeting the auditory cortex with tACS presents technical challenges. The auditory cortex is located deeper within the temporal lobe, and factors such as variable skull thickness and complex current spread make it difficult to reliably modulate its neural activity compared to the more superficial visual areas. Indeed, recent studies have demonstrated that tACS-induced electric fields in the temporal regions tend to be weaker and less focal—for example, Huang et al. (2017) and Opitz et al. (2016) highlight the limitations in achieving robust stimulation of deeper or anatomically complex brain regions using conventional tACS approaches.

      Given these considerations, while we agree that future investigations could benefit from exploring auditory cortex stimulation—either as an alternative or as a complementary approach—the present study remains focused on visual alpha modulation, where our protocol is well validated and yields reliable results. In the revised manuscript, we will clearly discuss these issues and acknowledge the potential, yet technically challenging, possibility of stimulating the auditory cortex in future work to further disentangle the contributions of auditory and visual inputs to cross-modal integration.

    1. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features that were moved between training and testing (neither the camera nor the wall was moved between training and testing). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.

      As we wanted to investigate if bees would use ground views or bird’s-eye views to home in a dense environment, we think the catchment volumes would provide qualitatively similar, though quantitatively more detailed, information compared with catchment slices. Our approach of catchment slices is sufficient to predict whether ground or bird’s-eye views perform better in leading to the nest, and we will, therefore, not include further computations of catchment volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, it is not always clear exactly where bees learn, or whether they learn continuously by weighting their visual experience according to their positions and orientations. This makes it difficult to categorise these flights accurately as learning or return flights. Additionally, homing models say little about the learning mechanisms at play during the learning flights. Therefore, we believe that continuous effort must be made to understand bees' learning and homing ability. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will provide these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our starting point was that, in nature, bumblebees have to find their inconspicuous nest hole, often within highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.
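
      For readers unfamiliar with rotational image difference functions, the sketch below shows the basic computation on a deliberately simplified input: a panoramic view reduced to a one-dimensional ring of brightness values. It is a minimal illustration, not the implementation used in the study; the function names and the 12-pixel example view are hypothetical.

```python
def rotational_idf(current, snapshot):
    """Root-mean-square difference between a stored snapshot and the
    current panoramic view, evaluated at every azimuthal rotation."""
    n = len(snapshot)
    differences = []
    for shift in range(n):
        # Rotate the current view by `shift` pixels around the azimuth.
        rotated = current[shift:] + current[:shift]
        squared = sum((a - b) ** 2 for a, b in zip(rotated, snapshot))
        differences.append((squared / n) ** 0.5)
    return differences


def best_heading(current, snapshot):
    """Rotation (in degrees) that minimises the image difference."""
    idf = rotational_idf(current, snapshot)
    shift = min(range(len(idf)), key=idf.__getitem__)
    return shift * 360.0 / len(idf), idf[shift]


if __name__ == "__main__":
    # Hypothetical 12-pixel panorama (30 degrees per pixel).
    snapshot = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
    # The same view seen after turning by three pixels.
    current = snapshot[-3:] + snapshot[:-3]
    print(best_heading(current, snapshot))
```

      Note that a view made of perfectly repeating elements would produce several equally deep minima in this function, which is one way to see why an array of identical objects is a demanding test case for view-based homing.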

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      We thank the reviewer for their comments and address both points.

      Conceptually, there is a key difference between encoding predictions, i.e. pre-activations of future words, versus encoding stimulus dependencies. The speech acoustics provide a useful control case: they encode the stimulus (and therefore stimulus dependencies) but do not predict. When we apply the encoding analysis to the acoustics (i.e. when we estimate the acoustics pre-onset from post-onset words), we observe the “hallmarks of prediction” – yet, clearly, the acoustics aren't "predicting" the next word.

      This reveals the methodological issue: if the brain were just passively filtering the stimulus (akin to a speech spectrogram), these "prediction hallmarks" would still appear in the acoustics encoding results, despite no actual prediction taking place. Therefore, one necessary criterion for concluding pre-activation from pre-stimulus neural encoding is that pre-stimulus encoding performance is at least better on neural data than on the stimulus itself. This would show that the pre-onset neural signal contains additional predictive information about the next word beyond that of the stimulus (e.g. acoustics) itself. We will make this point more prominent in the revision.

      Regarding the regression: different weights are estimated per time point in a time-resolved regression. This allows for modeling of unfolding responses over time, but also for the learning of stimulus dependencies.
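
      The point about time-resolved regression can be illustrated with a toy simulation. The sketch below is not our analysis pipeline: it assumes a single scalar feature per word that follows a first-order autoregressive process, and a "passive" signal that simply copies the current word's feature plus noise, so it contains no prediction by construction. All names and parameter values are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def simulate(n_words=5000, rho=0.6, noise=0.5, seed=1):
    """Autocorrelated word feature x and a passive, non-predictive signal y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1)]
    for _ in range(n_words - 1):
        x.append(rho * x[-1] + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1))
    # The signal at each word position reflects only the current word.
    y = [xi + noise * rng.gauss(0, 1) for xi in x]
    return x, y


def lagged_encoding(x, y, lags=(-2, -1, 0, 1, 2)):
    """Fit one single-feature regression per lag and score it.

    With one predictor, the correlation between the least-squares
    prediction and the signal has the same magnitude as Pearson r,
    so r is reported directly as the encoding score.
    """
    n = len(x)
    scores = {}
    for lag in lags:
        pairs = [(x[t], y[t + lag]) for t in range(n) if 0 <= t + lag < n]
        xs, ys = zip(*pairs)
        scores[lag] = pearson(xs, ys)
    return scores


if __name__ == "__main__":
    x, y = simulate()
    for lag, score in sorted(lagged_encoding(x, y).items()):
        print(lag, round(score, 3))
```

      In this toy setting, the encoding score at negative lags (the signal before the word it is regressed on) is clearly above zero whenever the feature sequence is autocorrelated, and it vanishes when `rho` is set to 0, even though the simulated signal never anticipates anything.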

      To sum up, the difference between encoding dependencies and predictions is at the core of our work. We appreciate this was not clear in the initial version and we will make this much clearer in the revision, conceptually and methodologically.

      Reviewer #2 (Public review):

      Summary:

      At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe the limitation we highlight extends beyond the specific method of residualizing features. Rather, it points to a fundamental problem: adjusting the features (X matrix) alone cannot address stimulus dependencies that persist in the signal (y matrix), as we demonstrate by using a different signal (acoustics) that encodes no predictions. While removing dependencies from the signal would be more thorough, this would also eliminate the effect of interest. We view this as a fundamental limitation of the encoding analysis approach combined with the experimental design, rather than something that can be resolved analytically. We will perform additional analyses to test this premise and elaborate on this point in our revision.

      Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies (specifically, the auto-correlation in the stimuli) rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for their comments.

      We want to address a key point of confusion regarding the procedure of regressing out embedding dependencies. While Goldstein et al. showed that neural encoding results persist after their control analysis (as we also did in our supplementary Figure S3), this does not lessen the concern surrounding stimulus dependencies. Our analyses demonstrate that even after such residualization, the "hallmarks of prediction" remain encodable in the speech acoustics, a control system that, by definition, cannot predict upcoming words. Therefore, the hallmarks of prediction can be fully explained by stimulus dependencies. This persistence in the acoustics strengthens rather than lessens our concerns about dependencies.

      This connects to a broader methodological point: our key evidence comes from analyzing the stimulus material itself as a control system. By comparing results from encoding neural responses to those from a system that encodes the stimulus, and therefore its dependencies, but cannot predict the upcoming input (like acoustics), we can establish proper criteria for concluding that the brain engages in prediction. Notably, the Goldstein dataset was not available when we conducted this research. However, for the revision we will perform additional analyses to make a more direct comparison.

      Finally, our focus was not to definitively test whether the brain predicts upcoming words, but rather to establish rigorous methodological and epistemological criteria for making such claims. We will elaborate on this crucial distinction in our revision and more prominently feature our central argument about the limitations of current evidence for neural prediction.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to public reviews:

      We thank the reviewers for their careful evaluation of our manuscript and appreciate the suggestions for improvement. We will outline our planned revisions in response to these reviews.

      Reviewer 2: “The one exception is the claim that "maintenance of respiration is the only cellular target of chalkophore mediated copper acquisition." While under the in vitro conditions tested this does appear to be the case; however, it can't be ruled out that the chalkophore is important in other situations. In particular, for maintenance of the periplasmic superoxide dismutase, SodC, which is the other M. tuberculosis enzyme known to require copper.”

      And

      Reviewer 3: “Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the copper-containing cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.”

      Both comments concern the question of whether the bcc:aa3 respiratory oxidase supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that bcc:aa3 is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (a subunit of bcc:aa3) is growth impaired, that copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 oxidase is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrest growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting the bcc:aa3 oxidase, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added this caveat to the discussion of the revised version of this manuscript.

      Response to Reviewers Recommendations for the authors:

      Reviewing Editor Comments:

      In addition to the specific recommendations below, there was consensus that the conclusions/discussion should contextualize that the results cannot exclude that in other conditions (such as in infection), enzymes other than cytochrome bcc:aa3 receive copper from the chalkophore system.

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, the authors mention that the nrp operon is only present in pathogenic Mtb and Mycobacterium marinum but not non-pathogenic mycobacterium. Is the nrp operon present in other pathogenic mycobacterium such as in M. leprae, M. avium or M. abscessus?

      Bhatt et al. (PMID 30381350) presented an analysis of the distribution of nrp gene clusters in mycobacteria and concluded that M. bovis, M. leprae and M. canettii clearly encode nrp genes. M. marinum has been shown to have a functional chalkophore biosynthetic cluster, but the presence of this system in other mycobacteria awaits experimental validation. We have added the Bhatt reference to this sentence in the introduction.

      (2) Figure 1A - it would be helpful if the genes were grouped and labeled as per their purpose (for example, CytBD components, bcc:aa3 components). While these are described in the text, the genes belonging to the chalkophore cluster are not defined in the text, and are thus not easily identified in the figure.

      The order of genes in the heatmap is determined by unsupervised clustering as indicated by the dendrogram to the left of the heatmap. To highlight chalkophore and CytBD genes, we have added color coding to the gene names and explained this color coding in the legend. 

      (3) Figure 2B/2C - it is interesting that complementation of ΔnrpΔcydAB with cydABCD does not rescue growth to Δnrp levels. Is there an explanation for this? 

      AND

      (4) Figure 2C - BCS is not introduced in the text for this figure nor are the results described - which seems like an oversight. It is interesting that BCS treatment does have a full rescue with cydABCD complementation, while TTM treatment does not. Is there an explanation for this?

      We thank the reviewer for raising this issue. We have attempted several different complementation constructs, including CydAB alone and different promoters, to address the partial complementation in question. However, we do not have an adequate explanation for this partial complementation. As the reviewer notes, the partial complementation is only evident with TTM, not BCS. However, we cannot speculate on the reason for this difference at present.  We have added a note to the text in the results section noting this difference. 

      (5) Figure 2F - is there a reason for the change in TTM concentrations (50 μM TTM vs 10 μM TTM)? Is the concentration for Q203 in both single treatment and combinatory tests 100nM?  

      We have clarified the 100 nM Q203 concentration in the figure legend. To avoid confusion, we have removed the 50 µM TTM condition from panel F because the growth inhibition phenotype of 10 µM TTM is shown in panel E and is the comparator for the combined TTM/Q203 condition in panel F.

      (6) Figure 3A - I assume d0 = day 0, d3 = day 3. This should be defined.

      We have modified the legend to clarify these abbreviations. 

      (7) Figure 4B - as complementation of nrp for ΔnrpΔcydAB returns levels back to WT, I assume there is no attenuation with ΔcydAB alone? Clarification would be appreciated.

      The mouse phenotype of M. tuberculosis ΔcydAB is reported here:

      https://www.pnas.org/doi/10.1073/pnas.1706139114#sec-1. This paper is reference 22 of the manuscript and is noted in the discussion.

      Reviewer #2 (Recommendations for the authors):

      In vitro conditions that require SodC could reveal a role for the chalkophore (i.e., exposure to extracellular or periplasmic superoxide stress under low iron conditions). Some minor confusion exists with the terminology around the two oxidases found in M. tuberculosis. The bcc:aa3 oxidase is a supercomplex between the reductase and oxidase complexes. This point should be clarified in the introduction as the term supercomplex isn't used until later in line 194 and without definition. Referring to the bcc:aa3 supercomplex as an oxidase is fine but is sometimes confusing especially when mentioning the target of Q203 is the oxidase as it targets the reductase portion of the supercomplex.

      We thank the reviewer for this point. We have modified the text to refer to the supercomplex at first mention and modified subsequent mentions to be clearer. 

      In the RNA preparation section boxes appear in several places where spaces should be.

      We do not see these boxes so we suspect this is a conversion error of some type. 

      Reviewer #3 (Recommendations for the authors):

      The authors have very carefully performed their studies and their main conclusions are amply supported by the data. The manuscript is also very clearly written, and easily accessible to a broad audience interested in both bioinorganic chemistry and mycobacteria. I have two recommendations:

      (1) I agree that the evidence shows that chalkophores provide copper to cytochrome bcc:aa3. Under lab-culture conditions, it could well be that, when cytochrome bd is deleted or inhibited, cytochrome bcc:aa3 is rate limiting. Under lab-culture conditions, it is also clear that only the expression of a select number of enzymes is affected. However, this does not mean that cytochrome bcc:aa3 is the ONLY enzyme that receives copper from chalkophores. Thus, under infection conditions, other copper enzymes might be important. For instance, M. tuberculosis expresses a Cu-Zn superoxide dismutase. In summary, perhaps the authors would consider changing the wording of statements such as that in Figure 2E and the conclusions drawn in the discussion.

      This comment concerns the question of whether the bcc:aa3 respiratory supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that the supercomplex is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (a subunit of the bcc:aa3 supercomplex) is growth impaired, that copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 supercomplex is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrest growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting bcc:aa3, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added the following to the discussion: “Although chalkophore mediated protection of the bcc:aa3 supercomplex is an important virulence function, we cannot exclude the possibility that additional copper dependent enzymes use chalkophore delivered copper during infection.”

      (2) There is a difference between copper-uptake (e.g. by chalkophores) and the maturation of metallo-enzymes. A short paragraph discussing knowledge from other bacteria in this area would help understand the role of chalkophores (e.g. see 10.1128/mBio.00065-18 or 10.1111/mmi.14701). This could possibly be extended with a genome analysis to check which other proteins are present in M. tuberculosis.

      We thank the reviewer for this point. We agree that our data does not distinguish between 1) a generic role for the chalkophore in copper uptake, with the ultimate candidate metalloenzyme rendered dysfunctional by copper loss, and 2) the chalkophore being an intrinsic part of the cytochrome maturation pathway and interacting directly with the target enzymes. We have added this point to the discussion but have not otherwise added the suggested full discussion of metalloenzyme maturation as we believe this discussion is beyond the scope of our data. 

      Finally, can I suggest the labels d0 and d3 are made clearer in Figure 3A (and defined in the legend).

      We have modified the legend to be clearer.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We thank the editors and Reviewers 1 and 3 for their thoughtful consideration of our manuscript. The present revision is submitted to address comments raised concerning rank determinations and the following sentence in the editorial assessment:

      The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design.

      Close reading of this sentence reveals two parallel threads. The first can be read as “…evidence for variable rank is incomplete given the limitations of the experimental design,” whereas the second can be read as “…evidence for adaptive investment and fitness is incomplete given the limitations of the experimental design.” The first alludes to a critique of our methods, while the second alludes to points of discussion unrelated to our experimental design. Unpacking this sentence is important because it casts the totality of our paper as “incomplete,” a word of consequence for early-career scholars because it prevents indexing in Web of Science.

      For clarity, we will refer to these topics as Thread 1 and Thread 2 in the following response.

      Thread 1 seems rooted in a comment made by Reviewer 1, which is reproduced below:

      I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theft, scrounging, policing, aggression, or other distractions might occur, where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

      We are grateful for this perspective of Reviewer 1, but it puts us in an uncomfortable position. We must respond rather forcefully because of its influence on the above assessment. A problem with R1’s comment is that it uses the word “foraging” (a behavior we did not study) instead of “cleaning” (the behavior we did study). Still, we can substitute the latter word for the former to get the gist of it.

      R1 criticizes our methods as a prelude for imagining the behaviors of our study animals, a form of conjecture. R1 correctly supposes a positive relationship between the number of animals and the intensity of competition for a limited food resource, a well-known phenomenon; and, yes, the food in each trial was decidedly limited, being fixed at nine cucumber slices. But R1 incorrectly presumes rank effects on cleaning under conditions of intense food competition. When the number of monkeys participating in a trial exceeded the number of feeding stations (n = 3), we saw little or no cleaning effort, either brushing or washing. So, rank effects on cleaning are immaterial under these conditions. As our study goals were narrowly focused on detecting individual propensities, or choices, as a function of rank, we limited our analysis to trials involving three monkeys or fewer. In retrospect, we admit that we should have provided better justification for our choice of trials, so we’ve edited one of our sentences:

      Original sentence 

      Formerly lines 219-220: To minimize the potential confounding effects of dominance interactions, we analyzed trials with ≤ 3 monkeys.

      Revised sentence

      Current lines 219-224: We excluded trials from analysis if the number of participating monkeys exceeded the number of feeding stations, as these conditions produced high levels of feeding competition with scant cleaning behavior. Such conditions effectively erased individual variation in sand removal, the topic motivating our experiment. Accordingly, we analyzed trials with ≤ 3 monkeys, putting 937 food-handling bouts into the GLMM statistical models, which included data on individual rank, sex, and sand treatment.

      R1’s final criticism – “I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal” – seems to imply that rank data were collected during our experiment. On the contrary, we determined ranks from five years of focal follows preceding the experiment, achieving the very standard that R1 describes as ideal. The relevant text appeared on lines 165-169 in version 2.0:

      To determine the rank-order of adults, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). In some cases, these data were supplemented with ad libitum observations. This protocol existed during five years (2013-2018) of continual observations before we conducted our experiment in July-August 2018. 

      Naturally, we were puzzled by R1’s dismissal of our methods, as well as R1’s conclusion, reached without evidence, that “[the] reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.” It is an unsubstantiated assertion with no definition of robustness, making it difficult for anyone to objectively assess the quality of our data.

      We detect in R1’s words some unfamiliarity with the social organization of our study species, which is fair enough. To better orient readers to the dominance hierarchy of Macaca fascicularis, and to boost reader confidence in the volume and quality of our rank data, we have added several sentences to this section of the manuscript, lines 169-183:

      Macaques form multi-male multi-female (polygynandrous) social groups with individual dominance hierarchies. In M. fascicularis, the hierarchy is strictly linear and extremely steep, meaning aggression is unidirectional (de Waal, 1977; van Noordwijk and van Schaik, 2001) with profound asymmetries in outcomes for individuals of adjacent ranks (Balasubramaniam et al., 2012). Further, the dominance hierarchies of philopatric females are stable and predictable. Daughters follow the pattern of youngest ascendancy, ranking just below their mothers with few known exceptions among older sisters (de Waal, 1977; van Noordwijk and van Schaik, 1999). Taken together, these species traits are conducive to unequivocal rank determinations. 

      To determine the rank-order of adults in our study group, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). These data were supplemented with ad libitum observations and all rank determinations were updated monthly, and when males immigrated or emigrated. This protocol predates our experiment in July-August 2018, representing 970 hr of focal data during five years of systematic study (2013-2018). 
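
      As an illustration of how dyadic outcomes translate into a rank order when a hierarchy is strictly linear, consider the following minimal sketch. It is not necessarily the procedure used in the study: the individual names, the interaction records, and the helper `rank_order` are hypothetical, and ranking by the number of group members each individual dominates is a simplifying assumption that only yields a unique order when aggression is unidirectional and every dyad has been observed.

```python
from collections import Counter

def rank_order(interactions):
    """Order individuals by how many others they dominate.

    `interactions` is a list of (winner, loser) pairs from dyadic agonistic
    encounters. An individual dominates another if it won more of their
    encounters than it lost.
    """
    wins = Counter(interactions)
    individuals = sorted({name for pair in interactions for name in pair})
    dominated = {
        a: sum(1 for b in individuals if a != b and wins[(a, b)] > wins[(b, a)])
        for a in individuals
    }
    # Highest number of dominated group members first; ties broken by name.
    return sorted(individuals, key=lambda name: (-dominated[name], name))

# Hypothetical records for three individuals with unidirectional outcomes.
records = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C")]
hierarchy = rank_order(records)
print(hierarchy)
```

      With unidirectional outcomes such as these, each individual dominates a different number of group members, which is why a strictly linear hierarchy lends itself to unequivocal rank determinations.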

      Thread 2 criticizes our evidence for adaptive investment and fitness, describing it as a limitation of our experimental design. Accordingly, the totality of our experiment was classified as “incomplete.” Yet, our experiment was never designed to collect such evidence, and we make no claims of having it. Rather, we discussed potential fitness consequences to highlight the broader significance of our study, connecting it to diverse bodies of literature, from evolutionary theory to paleoanthropology. Our intent was to follow the conventions of scientific writing; to put our results into conversation with the wider literature and set an agenda for future research.

      On reflection, Thread 2 seems to pivot around something as arbitrary as structure. Previously, our results and discussion were combined under a single section header (“Results and Discussion”), a stylistic choice to economize words. Our manuscript is a Short Report, which is limited to 1,500 words of main text. But this level of concision proved counterproductive. It blurred our results and discussion in the minds of readers. Indeed, Reviewer 3 described it as “misleading,” a barbed word that accomplishes the same act attributed to us. To counter this perspective, we have simply partitioned our Results (now “Experimental Results”) and Discussion to draw a sharper distinction between the two components of our paper.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While the removal of Fasciclin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to activate non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results for the most part are well-controlled and justified.

      Weaknesses:

      The behavioral results shown in Figure 6 need more explanation and clarification (more details below). As currently shown, the results of Figure 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a technical perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript with a particular focus on Figure 6, as well as the overall presentation of the figure and its description in the legends, in accordance with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewer #2 (Public review):

      In this study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques from western blot to behavioral experiments and also that they generated several reagents, from fly lines to antibodies.

      This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a methodological perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript in line with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewing Editor comments:

      I am pleased to let you know that our reviewers found the results in your paper important and the evidence compelling. There are a few minor comments and a point was raised regarding figure 6 for which further details were asked. Please see the reviewer's comments. We are looking forward to receiving an updated version of your very interesting paper.

      We are grateful to you and the reviewers for dedicating time to review our manuscript and for providing insightful comments and suggestions. We have revised our manuscript in line with the reviewers' feedback. The major revision involves clarifying the two-choice preference assay presented in Figure 6. Details of these revisions are provided in our point-by-point responses to the reviewers’ comments below. The new and extensively modified sections of text are highlighted in blue. We have introduced new panels (Figures 1D, 3D, 6B, and 6C) and made modifications to Figure 6A. The previous Figure 1D has been relocated to Figure 1–figure supplement 1B. Additionally, our detailed responses to the reviewers’ comments are also highlighted in blue within the point-by-point response section. With all concerns and suggestions from the Editor and reviewers addressed, our conclusion—that executioner caspase is proximal to Fasciclin 3 which facilitates non-lethal activation in Drosophila olfactory receptor neurons—is now more robustly supported. We are confident that our revised manuscript makes a significant contribution to the fields of caspase function and neurobiology. We remain hopeful that the reviewers will find it suitable for publication in eLife.

      Reviewer #1 (Recommendations for the authors):

      The main comment here is related to Figure 6, which needs to be better explained. First, if the results in Figure 6B and C are conducted with young flies, why is the preference index close to 0? Aren't these young flies more attracted to ACV? Second, what are the results with Dronc-RNAi and DroncDN alone? These should be shown to more accurately assess the outcome of Fas3G expression with and without Dronc inhibition. Third, if Fas3G overexpression induces non-lethal caspase activation and a behavioral change, why does Dronc inhibition enhance (and not suppress) this behavioral change?

      We sincerely thank the reviewer for the comment. We used one-week-old young flies for the two-choice preference assay. We found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (New Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (New Figure 6B). Since our hypothesis is that non-lethal caspase activation suppresses attraction behavior, and that inhibiting caspase activation could enhance attraction, we used the milder experimental condition for subsequent analyses.
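
      For readers unfamiliar with two-choice assays, the sketch below shows one common way a preference index can be computed. The exact formula is an assumption here (flies in the ACV trap minus flies in the control trap, divided by the total number of trapped flies), as are the function name `preference_index` and the example counts, which are invented for illustration and are not data from this study. Under this definition the index ranges from -1 to 1, and a value near 0 indicates no preference.

```python
def preference_index(n_acv, n_control):
    """Return (ACV - control) / (ACV + control); an assumed definition."""
    total = n_acv + n_control
    if total == 0:
        raise ValueError("no flies were trapped, so the index is undefined")
    return (n_acv - n_control) / total

# Invented counts, for illustration only.
strong_attraction = preference_index(n_acv=45, n_control=5)
mild_attraction = preference_index(n_acv=27, n_control=23)
print(strong_attraction, mild_attraction)
```

      An index that sits close to 0 but remains positive, as in the second invented example, leaves room to detect both increases and decreases in attraction.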

      In the original manuscript, we did not test Dronc inhibition alone because caspase activation is rarely observed in young flies (as demonstrated in Figure 3C, New Figure 3D, etc.), suggesting that Dronc inhibition during this stage would not affect behavior. This hypothesis is further supported by previous research showing that inhibition of caspase activity in aged flies restores attraction behavior but has no effect in young flies (Chihara et al., 2014). To validate this hypothesis, we conducted the two-choice preference assay again, including caspase activity inhibition by Dronc<sup>DN</sup> expression alone. As expected, Dronc inhibition alone did not alter behavior in young flies (New Figure 6C).

      We also observed that Fas3G overexpression promotes a weak, though not statistically significant, enhancement in attraction behavior. Importantly, simultaneous inhibition of caspase activity further enhanced attraction behavior (New Figure 6C). These results suggest that Fas3G overexpression has a dual function: one aspect promotes attraction behavior, while the other induces non-lethal caspase activation. In this context, non-lethal caspase activation appears to counteract the behavioral response, acting as a regulatory brake. To address the reviewer’s comments comprehensively, we included the New Figure 6B and replaced the original Figure 6B and C with New Figure 6C. Additionally, we revised the manuscript text as follows:

      Using a two-choice preference assay with ACV (Figure 6A), we found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (Figure 6B). Under the milder experimental condition, we first confirmed that inhibition of caspase activity through expressing Dronc<sup>DN</sup> did not affect attraction behavior in young adults (Figure 6C), consistent with a previous report (Chihara et al., 2014). We then observed that the overexpression of Fas3G, which activates caspases, did not impair attraction behavior. Instead, it appeared to enhance the tendency for attraction behavior (Figure 6C), suggesting that Fas3G promotes attraction behavior. Finally, we found that inhibiting Fas3G overexpression-facilitated non-lethal caspase activation by expressing Dronc<sup>DN</sup> strongly promoted attraction to ACV (Figure 6C). Overall, these results suggest that Fas3G overexpression has a dual function: it enhances attraction behavior while also triggering non-lethal caspase activation, which counteracts the behavioral response, functioning as a regulatory brake without causing cell death.

      Other minor comments are below:

      The authors should clarify that while they refer to their caspases reporters as "non-lethal caspase reporters", these are caspase reporters in general and can report both lethal and non-lethal caspase activation. Of course, the only surviving cells are those that experience non-lethal caspase activation.

      We thank the reviewer for pointing this out. This reporter can monitor caspase activation with high sensitivity only if the cell is capable of transcribing and translating the reporter proteins following cleavage of the probe, most likely in living cells. However, as mentioned, using the term “non-lethal reporter” is not accurate, as additional experiments are required to determine whether caspase activation leads to cell death. Therefore, we removed the term “non-lethal” and referred to this reporter simply as a highly sensitive caspase reporter in the revised manuscript.

      Some of the figure panels could be better described in the legends (e.g. Figure 1E, 1F, 4E, 4F).

      We thank the reviewer for the comment. We have included additional explanations in the figure legends throughout the manuscript.

      In Figure 3C, the OL and AL regions should be marked in the figure as done in Figure 1C.

      We thank the reviewer for the comment. We have marked OL and AL regions in Figure 3C and Figure 2A as in Figure 1C.

      In Figures 4A and B, the authors should rearrange the order of the x-axis to reflect the order that appears in the text (Dronc first).

      We thank the reviewer for the comment. We have rearranged the order of labels on the X-axis to reflect the order that appears in the text.

      In Figure 6B, do the colors imply anything? If so, it should be explained. 

      We thank the reviewer for pointing this out. In the original figure, the light blue bars represented Fas3G overexpression and the red dots indicated caspase-activated conditions. In the New Figure 6C, we used light blue dots for Fas3G overexpression and red bars for caspase-activated conditions. We have added an explanation in the figure legend. In addition, we have removed the colors in Figure 4B and have added an explanation in the figure legend in Figure 4D.

      Reviewer #2 (Recommendations for the authors):

      (1) For the methods section make a table for the lines, the way they are listed is not the most easy to read.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3.

      (2) Lines 420 to 573, not sure why this is here, this information should be in the figure or figure legend, or make a table if necessary.

      We thank the reviewer for the comment. We have listed the detailed genotypes corresponding to each figure in Table S4.

      (3) Blocking with donkey serum, do you get better results than bovine?

      We have not conducted tests with bovine serum for immunohistochemistry. Donkey serum was used throughout the manuscript.

      (4) The Methods section is very thorough and complete but I recommend the use of tables to organize some of the reagents used.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3 and the detailed genotypes corresponding to each figure in Table S4.

      (5) Line 647 spells out LC-MS/MS.

      We thank the reviewer for pointing this out. We have provided the full spelling as “liquid chromatography-tandem mass spectrometry”.

      (6) Line 808 spells out ACV (apple cider vinegar) and MQ (MilliQ water).

      We thank the reviewer for pointing this out. We have provided the full spelling as suggested.

      (7) Figure 1D. Why do you use only females? 

      We thank the reviewer for pointing this out. In the original manuscript, we analyzed female flies by crossing each Gal4 strain with UAS-Drice-RNAi; Drice::V5::TurboID virgin females. In this case, because Pebbled-Gal4 is located on the X chromosome, we could only use female flies for the analysis. To address this, we examined the expression pattern in male flies by crossing virgin females of each Gal4 strain with UAS-Drice-RNAi; Drice::V5::TurboID males. As expected, Drice expression is also mostly depleted when using the ORN-specific Gal4 driver, Pebbled-Gal4, suggesting that Drice expression is predominantly observed in ORNs in males as well. We have added New Figure 1D to present the male data. The original Figure 1D, which presents female data, has been relocated to Figure 1–figure supplement 1B.

      (8) Figure 1D. Be clear about the LN driver used here in the figure.

      We thank the reviewer for pointing this out. We used Orb<sup>0449</sup>-Gal4 driver (#63325, Bloomington Drosophila Stock Center), which has been previously characterized as an LN-specific Gal4 driver (Wu et al., 2017). Accordingly, we have revised “LN-Gal4” to “Orb<sup>0449</sup>-Gal4” throughout the manuscript.

      (9) Figure 1 and Supplementary Figure 1 images are very good. I would recommend the use of a different color palette, to help visualization for colorblind readers (such as this reviewer).

      We apologize for any inconvenience caused. We chose the green/magenta color pair because these are complementary colors, which generally provide better contrast compared to other color pairs. Therefore, we have decided to continue using this pair. To enhance readability, we have intensified the magenta signal in the New Figure 1D and Figure 1–figure supplement 1B. We retained the original magenta signal levels in Figure 1C and Figure 1–figure supplement 1A to avoid oversaturation. Instead, we have kept the Streptavidin-only signal images alongside the color merged images for clarity. We hope these adjustments improve the visualization and help you better interpret the figures.

      (10) Based on Supplementary Figure 1 and based on the fact that Figures 1B and 1C use males, why not used also males for Figure 1D?

      Please refer to our reply to comment #7. We have now included the results for males in the New Figure 1D, which show a similar expression pattern to that observed in females. The results for females originally shown in Figure 1D have been relocated to Figure 1–figure supplement 1B.

      (11) Why were the old versus young flies used for Figure 3 raised at 29C? Why not let the animals age at 25C? The use of 29C throughout the manuscript is not clear.

      We thank the reviewer for pointing this out. Most of the UAS fly strains used in this study, including a Fas3G overexpression line, are UASz lines, which exhibit relatively low expression levels compared to UASt lines (DeLuca and Spradling, 2018). Since the Gal4/UAS system is temperature-dependent (Duffy, 2002), we performed most of the experiments at 29°C to enhance gene expression.

      For the aging experiments, we chose to rear flies at 29°C because higher temperatures accelerate aging, including neuronal aging (Okenve-Ramos et al., 2024), allowing for faster experimentation, and 29°C is within the ecologically relevant range of temperatures for Drosophila melanogaster (Soto-Yéber et al., 2018). Additionally, we confirmed that a subset of olfactory receptor neurons undergo aging-dependent caspase activation at both 29°C and 25°C, as shown in New Figure 3D.

      (12) Why not use an Or42b specific GAL 4 for the aging experiment? What are the odorants that are detected by this ORN? Are any of the odorants behaviorally relevant compounds?

      We thank the reviewer for pointing this out. While the exact odorant detected by Or42b neurons has not been fully determined, these neurons innervate the DM1 region in the antennal lobe, which is activated by ACV. Additionally, Or42b neurons have been shown to be required for attraction behavior to ACV (Semmelhack and Wang, 2009), supporting the relevance of ACV for the behavioral experiment. We used Or42b-Gal4 to confirm that Or42b neurons undergo aging-dependent caspase activation, which is detectable using the MASCaT system (New Figure 3D). Furthermore, we verified that these neurons exhibit aging-dependent caspase activation at both 25°C and 29°C (New Figure 3D).

      (13) Make the panel lettering in all the figures bigger or bold.

      We thank the reviewer for pointing this out. We have increased the size of the panel lettering and made it bold throughout the figures to improve the readability.

      (14) Line 806. MilliQ water.

      We thank the reviewer for pointing this out. We have ensured that “MilliQ water” is consistently spelled this way throughout the manuscript.

      (15) Figure 6. The authors need to be more clear on the experimental conditions. At what time of the day was this experiment performed? Was the experiment run in DD? Were the flies young or old?

      We thank the reviewer for pointing this out. We performed the assay using one-week-old young flies under constant dark conditions during both the starvation period and the assay. We have added a detailed explanation in the Methods section. For clarity, we have also revised Figure 6A to provide a more detailed explanation of the experimental setup.

      References

      Chihara T, Kitabayashi A, Morimoto M, Takeuchi K-I, Masuyama K, Tonoki A, Davis RL, Wang JW, Miura M. 2014. Caspase inhibition in select olfactory neurons restores innate attraction behavior in aged Drosophila. PLoS Genet 10:e1004437.

      DeLuca SZ, Spradling AC. 2018. Efficient expression of genes in the Drosophila germline using a UAS promoter free of interference by Hsp70 piRNAs. Genetics 209:381–387.

      Duffy JB. 2002. GAL4 system in Drosophila: a fly geneticist’s Swiss army knife. Genesis 34:1–15.

      Okenve-Ramos P, Gosling R, Chojnowska-Monga M, Gupta K, Shields S, Alhadyian H, Collie C, Gregory E, Sanchez-Soriano N. 2024. Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22:e3002504.

      Semmelhack JL, Wang JW. 2009. Select Drosophila glomeruli mediate innate olfactory attraction and aversion. Nature 459:218–223.

      Soto-Yéber L, Soto-Ortiz J, Godoy P, Godoy-Herrera R. 2018. The behavior of adult Drosophila in the wild. PLoS One 13:e0209917.

      Wu B, Li J, Chou Y-H, Luginbuhl D, Luo L. 2017. Fibroblast growth factor signaling instructs ensheathing glia wrapping of Drosophila olfactory glomeruli. Proc Natl Acad Sci U S A 114:7505–7512.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small spatial scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest hole within a confined area. While many studies have focused on larger spatial scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing, especially in dense environments as we propose here.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      See the new discussion at lines 192-197

      We agree with your comment about the term "clutter". Therefore, we referred to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      See line 20 and we changed the wording throughout the manuscript and figures.

      Reviewer 1 (Recommendations): 

      The manuscript is well written, nicely designed experiments and well illustrated. I have a few comments below.

      It would be useful to discuss known data of learning flights in bumblebees, and the height or catchment area of their flights. This will allow the reader to compare your exp design to the natural learning flights.

      In our study, we first focused on demonstrating the ability to solve a homing task in a dense environment. As we observed the bees returning within the dense environment and not from above it (contrary to the model predictions), we investigated whether they flew above it during their first flights. The bees did indeed fly above, demonstrating their ability to ascend and descend within the constellation of objects (see Supplementary Material Fig. 22).

      In nature, the learning flight of bumblebees may cover several decametres, with the loops performed during these flights increasing with flight time (e.g. Osborne et al. 2013; Woodgate et al. 2016). A similar pattern can be observed on a smaller spatial scale (e.g. Philippides et al. 2013). Similar to the loops that extend over time, the bees gradually gain altitude (Lobecke et al., 2018). However, these observations come from studies where few conspicuous objects surround the nest entrance.

      Although our study focussed on the performance in goal finding in cluttered environments, we now also address the issue of learning flights in the discussion, as learning flights are the scaffolding of visual learning. We have already conducted several learning flight experiments to fill the knowledge gap mentioned above. These will allow us in a forthcoming paper to compare learning flights in this environment with the existing literature (Sonntag et al., 2024).

      We added a reference to this in the discussion (lines 218-219 and 269-272)

      Include bumblebee in the title rather than 'bees'.

      We adapted the title accordingly:

      “Switching perspective: Comparing ground-level and bird’s-eye views for bumblebees navigating dense environments”

      I found switching between bird-views and frog-views to explain bee-views slightly tricky to read. Why not use 'ground-views', which you already have in the title?

      We agree and adapted the wording in the manuscript according to this suggestion.

      I am not convinced there is evidence here to suggest the bees do not use view-based navigation, because of the following: In L66: unclear what were the views centred around, I assume it is the nest. Is 45cm above the ground the typical height gained by bumblebees during learning flight? The clutter seems to be used more as an obstacle that they are detouring to reach the goal, isn't it?

      Based on many previous studies, view-based navigation can be assumed to be one of the plausible mechanisms bees use for homing (Cartwright & Collett, 1987; Doussot et al., 2020; Lehrer & Collett, 1994; Philippides et al., 2013; Zeil, 2022). In our tests, when the dense environment was shifted to a different position in the flight arena, almost no bees searched at the real location of the nest entrance but at the fictive new location within the dense environment, indicating that the bees assumed the nest to be located within the dense environment, and therefore that vision played a crucial role for homing. We thus never meant to imply that the bees were not using view-based navigation. We clarified this point in the revised manuscript.

      See lines 247-248, 250-259, added visual memory to schematic in Fig. 6

      In our model simulations, the memorised snapshots were centred around the nest. However, we found that a multi-snapshot model could not explain the behaviour of the bees. This led us to suggest that bees likely employ a combination of multiple mechanisms for navigation.

      We refined the paragraph about possible alternative homing mechanisms. See lines 218-263

      The height of learning flights has not been extensively investigated in previous studies, and typical heights are not well-documented in the literature. However, from our observations of the first outbound flights of bumblebees within the dense environment, we noted that they quickly increased their altitude and then flew above the objects. Since the objects had a height of 0.3 metres, we chose 0.45 metres as a height above the objects for our study.

      Furthermore, the nest is positioned within the arrangement of objects, making it a target the bees must actively find rather than detour around.

      I think a discussion to contrast your findings with Murray and Zeil 2017 will be useful. It was unclear to me whether the flight arena had UV availability, if it didn't, this could be a reason for the difference.

      We referred to this study in the discussion of the revised paper (see our response to the public review). Lines 192-197

      As in most lab studies on local homing, the bees did not have UV light available in the arena. Even without this, they were successful in finding their nest position during the tests. We clarified that in the revised manuscript. See lines 334-336

      Figure 2A, can you add a scale bar?

      We added a scale bar to the figure showing the dimensions of the arena. See Fig. 2

      The citation of figure orders is slightly off. We have Figure 5 after Figure 2, without citing Figures 3 and 4. Similarly for a few others.

      We carefully checked the order of cited figures and adapted them.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions: line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the dense environment but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing (neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we included model results with the arena wall in the supplements of the revised paper. See lines 290-293, Figures S17-21

      We agree that the catchment volumes would provide quantitatively more detailed information than catchment slices. Nevertheless, since our goal was to investigate whether bees would use ground views or bird's-eye views to home in a dense environment, catchment slices, which provide qualitatively similar information to catchment volumes, are sufficient to predict whether ground or bird's-eye views perform better in leading to the nest. Therefore, we did not include further computations of catchment volumes. (ll. 296-297)

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17. Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments (Baddeley et al., 2012; Dittmar et al., 2010; Doussot et al., 2020; Möller, 2012; Wystrach et al., 2011, 2013; Zeil, 2012). A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Since we observed behavioural responses different from those suggested by the models, it becomes interesting to look at the flight history. If we had found an alignment between the model and the behaviour, looking at the history would have become much less interesting. Thus, our results raise an interest in looking at the entire flight history, which will require not only effort on the recording procedure but also conceptual effort. At the moment, the underlying mechanisms of learning during outbound, inbound, exploration, or orientation flights remain elusive, which makes it difficult to test a hypothesis. A detailed description of the flights during the entire bee history would enable us to speculate about alternative models to the one tested in our study, but would remain limited in testing those.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the dense environment.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled laboratory conditions. Both field and laboratory research are necessary and should complement each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of the components of the environment for the behaviour through targeted variation of them. These results yield precious information to then guide future field-based experiments for validation.

      Our laboratory settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was based on the knowledge that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and now refer to the environment as a "dense environment."

      We changed the wording throughout the manuscript and figures.

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work conditions, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious assessments of catchment areas in the context of local homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.

      Reviewer 2 (Recommendations):

      (1) Clarify what is meant by modelling panoramic images at 1cm intervals (only?) along the x-axis of the arena.

      The panoramic images were taken along a grid with 0.5cm steps within the dense environment and 1cm steps in the rest of the arena. A previous study (Doussot et al., 2020) showed successful homing of multi-snapshot models in an environment of similar scale with a grid with 2cm steps. Therefore, we think that our scaling is sufficiently fine. We apologise for the missing information in the method section and added it to the revised manuscript. See lines 286-287

      (2) In Figures 9-12 what are the memory0 to memory7 locations and reference image orientations? Explain clearly which image comparisons generated the rotIDFs shown.

      Memory 0 to memory 7 are examples of the eight memorised snapshots, which are aligned in the nest direction and taken around the nest. In the rotIDFs shown, we took memory 0 as a reference image, and compared the 7 others by rotating them against memory 0. We clarified that in the revised manuscript.

      See revised figure caption in Fig. S9 – 16
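
      The comparison described above can be illustrated with a minimal sketch. It assumes that the rotational image difference function is computed as the root-mean-square pixel difference between a reference panorama and every azimuthal rotation of a second panorama, which is a common choice but is not confirmed here as the exact measure used in the manuscript. The function name `rot_idf` and the random arrays standing in for memory 0 and memory 1 are hypothetical.

```python
import numpy as np


def rot_idf(reference, view):
    """Rotational image difference function (hypothetical helper).

    Both inputs are 2-D arrays (elevation x azimuth) spanning 360 degrees
    in azimuth. Returns one RMS pixel difference per azimuthal shift of `view`.
    """
    reference = np.asarray(reference, dtype=float)
    view = np.asarray(view, dtype=float)
    n_azimuth = reference.shape[1]
    diffs = np.empty(n_azimuth)
    for shift in range(n_azimuth):
        # Rotate the view by `shift` columns around the vertical axis.
        rotated = np.roll(view, shift, axis=1)
        diffs[shift] = np.sqrt(np.mean((rotated - reference) ** 2))
    return diffs


# Synthetic stand-ins for two memorised snapshots (not real arena images).
rng = np.random.default_rng(0)
memory0 = rng.random((10, 36))           # reference image
memory1 = np.roll(memory0, -5, axis=1)   # same scene, rotated by 5 columns

curve = rot_idf(memory0, memory1)
best_shift = int(np.argmin(curve))       # rotation that best realigns memory1
```

      In this toy case the minimum of the curve recovers the rotation that was applied. With real panoramas taken at different positions, the depth and sharpness of the minimum would indicate how much directional information the reference snapshot still provides.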

      (3) Figure 9 seems to compare 'bird's-eye', not 'frog's-eye' views.

      We apologise for that mistake and carefully double-checked the figure caption.

      See revised figure caption Fig. S9

      (4) Why do you need to invoke a PI vector (Figure 6) to explain your results?

      Since the bees were able to home in the dense environment without entering the object arrangement from above but from the side, image matching alone could not explain the bees’ behaviour. Therefore, we suggest, as a hypothesis for future studies, a combination of mechanisms such as a home vector. Other alternatives, perhaps without requiring a PI vector, may explain the bees’ behaviour, and we will welcome any future contributions from the scientific community.

      References

      Baddeley, B., Graham, P., Husbands, P., & Philippides, A. (2012). A Model of Ant Route Navigation Driven by Scene Familiarity. PLoS Computational Biology, 8(1), e1002336. https://doi.org/10.1371/journal.pcbi.1002336

      Capaldi, E. A., Smith, A. D., Osborne, J. L., Farris, S. M., Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., & Riley, J. R. (2000). Ontogeny of orientation flight in the honeybee revealed by harmonic radar. Nature, 403. https://doi.org/10.1038/35000564

      Cartwright, B. A., & Collett, T. S. (1987). Landmark maps for honeybees. Biological Cybernetics, 57(1), 85–93. https://doi.org/10.1007/BF00318718

      Dittmar, L., Stürzl, W., Baird, E., Boeddeker, N., & Egelhaaf, M. (2010). Goal seeking in honeybees: Matching of optic flow snapshots? Journal of Experimental Biology, 213(17), 2913–2923. https://doi.org/10.1242/jeb.043737

      Doussot, C., Bertrand, O. J. N., & Egelhaaf, M. (2020). Visually guided homing of bumblebees in ambiguous situations: A behavioural and modelling study. PLoS Computational Biology, 16(10). https://doi.org/10.1371/journal.pcbi.1008272

      Lehrer, M., & Collett, T. S. (1994). Approaching and departing bees learn different cues to the distance of a landmark. Journal of Comparative Physiology A, 175(2), 171–177. https://doi.org/10.1007/BF00215113

      Lobecke, A., Kern, R., & Egelhaaf, M. (2018). Taking a goal-centred dynamic snapshot as a possibility for local homing in initially naïve bumblebees. Journal of Experimental Biology, 221(2), jeb168674. https://doi.org/10.1242/jeb.168674

      Möller, R. (2012). A model of ant navigation based on visual prediction. Journal of Theoretical Biology, 305, 118–130. https://doi.org/10.1016/j.jtbi.2012.04.022

      Murray, T., & Zeil, J. (2017). Quantifying navigational information: The catchment volumes of panoramic snapshots in outdoor scenes. PLOS ONE, 12(10), e0187226. https://doi.org/10.1371/journal.pone.0187226

      Osborne, J. L., Smith, A., Clark, S. J., Reynolds, D. R., Barron, M. C., Lim, K. S., & Reynolds, A. M. (2013). The ontogeny of bumblebee flight trajectories: From Naïve explorers to experienced foragers. PLoS ONE, 8(11). https://doi.org/10.1371/journal.pone.0078681

      Philippides, A., de Ibarra, N. H., Riabinina, O., & Collett, T. S. (2013). Bumblebee calligraphy: The design and control of flight motifs in the learning and return flights of Bombus terrestris. Journal of Experimental Biology, 216(6), 1093–1104. https://doi.org/10.1242/jeb.081455

      Sonntag, A., Lihoreau, M., Bertrand, O. J. N., & Egelhaaf, M. (2024). Bumblebees increase their learning flight altitude in dense environments. bioRxiv, 2024.10.14.618154. https://doi.org/10.1101/2024.10.14.618154

      Woodgate, J. L., Makinson, J. C., Lim, K. S., Reynolds, A. M., & Chittka, L. (2016). Life-long radar tracking of bumblebees. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0160333

      Wystrach, A., Mangan, M., Philippides, A., & Graham, P. (2013). Snapshots in ants? New interpretations of paradigmatic experiments. Journal of Experimental Biology, 216(10), 1766–1770. https://doi.org/10.1242/jeb.082941

      Wystrach, A., Schwarz, S., Schultheiss, P., Beugnon, G., & Cheng, K. (2011). Views, landmarks, and routes: How do desert ants negotiate an obstacle course? Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 197(2), 167–179. https://doi.org/10.1007/s00359-010-0597-2

      Zeil, J. (2012). Visual homing: An insect perspective. Current Opinion in Neurobiology, 22(2), 285–293. https://doi.org/10.1016/j.conb.2011.12.008

      Zeil, J. (2022). Visual navigation: Properties, acquisition and use of views. Journal of Comparative Physiology A. https://doi.org/10.1007/s00359-022-01599-2

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their careful reading of our manuscript and their considered feedback. Please see our detailed response to reviewer comments inset below.

      In addition to the requested modifications, we have also uploaded the proteomics data from 2 of the experiments contained within the manuscript onto the Immunological Proteome Resource (ImmPRes) website (immpres.co.uk), making the data available in an easy-to-use graphical format for interested readers to interrogate and explore. We have added the following text to the data availability section (lines 1085-1091) to indicate this:

      “An easy-to-use graphical interface for examining protein copy number expression from the 24-hour TCR WT and Pim dKO CD4 and CD8 T cell proteomics and IL-2 and IL-15 expanded WT and Pim dKO CD8 T cell proteomics datasets is also available on the Immunological Proteome Resource website: immpres.co.uk (Brenes et al., 2023) under the Cell type(s) selection: “T cell specific” and Dataset selection: “Pim1/2 regulated TCR proteomes” and “Pim1/2 regulated IL2 or IL15 CD8 T cell proteomes”.”

      We have also indicated this in the legends of Figures 1, 2 and 4, where the proteomics datasets are first introduced, with the text:

      “An interactive version of the proteomics expression data is available for exploration on the Immunological Proteome Resource website: immpres.co.uk”

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      The study focuses on PIM1 and 2 in CD8 T cell activation and differentiation. These two serine/threonine kinases belong to a large network of serine/threonine kinases that acts following engagement of the TCR and of cytokine receptors and phosphorylates proteins that control transcriptional, translational and metabolic programs that result in effector and memory T cell differentiation. The expression of PIM1 and PIM2 is induced by the T-cell receptor and several cytokine receptors. The present study capitalized on high-resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to decipher how the PIM1/2 kinases control TCR-driven activation and IL-2/IL-15-driven proliferation, and differentiation into effector T cells.

      Quantitative mass spectrometry-based proteomics analysis of naïve OT1 CD8 T cells stimulated with their cognate peptide showed that the PIM1 protein was induced within 3 hours of TCR engagement, and its expression was sustained at least up to 24 hours. The kinetics of PIM2 expression was protracted as compared to that of PIM1. Such TCR-dependent expression of PIM1/2 correlated with the analysis of both Pim1 and Pim2 mRNA. In contrast, Pim3 mRNA was only expressed at very low levels and the PIM3 protein was not detected by mass spectrometry. Therefore, PIM1 and 2 are the major PIM kinases in recently activated T cells. Pim1/Pim2 double knockout (Pim dKO) mice were generated on a B6 background and found to have a lower number of splenocytes. No difference in TCR/CD28-driven proliferation was observed between WT and Pim dKO T cells over 3 days in culture. Quantitative proteomics of >7000 proteins further revealed no substantial quantitative or qualitative differences in protein content or proteome composition. Therefore, other signaling pathways can compensate for the lack of PIM kinases downstream of TCR activation.

      Considering that PIM1 and PIM2 kinase expression is regulated by IL-2 and IL-15, antigen-primed CD8 T cells were expanded in IL-15 to generate memory phenotype CD8 T cells or expanded in IL-2 to generate effector cytotoxic T lymphocytes (CTL). Analysis of the survival, proliferation, proteome, and transcriptome of Pim dKO CD8 T cells kept for 6 days in IL-15 showed that PIM1 and PIM2 are dispensable to drive the IL-15-mediated metabolic or differentiation programs of antigen-primed CD8 T cells. Moreover, Pim1/Pim2-deficiency had no impact on the ability of IL-2 to maintain CD8 T cell viability and proliferation. However, WT CTL downregulated the expression of CD62L whereas the Pim dKO CTL sustained higher CD62L expression. Pim dKO CTL were also smaller and less granular than WT CTL. Comparison of the proteome of day 6 IL-2 cultured WT and Pim dKO CTL showed that the latter expressed lower levels of the glucose transporters, SLC2A1 and SLC2A3, of a number of proteins involved in fatty acid and cholesterol biosynthesis, and CTL effector proteins such as granzymes, perforin, IFNg, and TNFa. Parallel transcriptomics analysis showed that the reduced expression of perforin and some granzymes correlated with a decrease in their mRNA whereas the decreased protein levels of granzymes B and A, and the glucose transporters SLC2A1 and SLC2A3 did not correspond with decreased mRNA expression. Therefore, PIM kinases are likely required for IL-2 to maximally control protein synthesis in CD8 CTL. Along that line, the translational repressor PDCD4 was increased in Pim dKO CTL and pan-PIM kinase inhibitors caused a reduction in protein synthesis rates in IL-2-expanded CTL. Finally, the differences between Pim dKO and WT CTL in terms of CD62L expression meant that Pim dKO CTL, but not WT CTL, retained the capacity to home to secondary lymphoid organs.

      In conclusion, this thorough and solid study showed that the PIM1/2 kinases shape the effector CD8 T cell proteomes rather than transcriptomes and are important mediators of IL-2 signalling and CD8 T cell trafficking.

      Weaknesses:

      None identified by this reviewer.

      Reviewer #2 (Public Review):

      Summary:

      Using a suite of techniques (e.g., RNA seq, proteomics, and functional experiments ex vivo) this paper extensively focuses on the role of PIM1/2 kinases during CD8 T-cell activation and cytokine-driven (i.e., IL-2 or IL-15) differentiation. The authors' key finding is that PIM1/2 enhances protein synthesis in response to IL-2 stimulation, but not IL-15, in CD8+ T cells. Loss of PIM1/2 made T cells less 'effector-like', with lower granzyme and cytokine production, and a surface profile that maintained homing towards secondary lymphoid tissue. The cytokines the authors focus on are IL-15 and IL-2, which drive naïve CD8 T cells towards memory or effector states, respectively. Although PIM1/2 are upregulated in response to T-cell activation and cytokine stimulation (e.g., IL-15, and to a greater extent, IL-2), using T cells isolated from a global mouse genetic knockout background of PIM1/2, the authors find that PIM1/2 did not significantly influence T-cell activation, proliferation, or expression of anything in the proteome under anti-CD3/CD28 driven activation with/without cytokine (i.e., IL-15) stimulation ex vivo. This is perhaps somewhat surprising given PIM1/2 is upregulated, albeit to a small degree, in response to IL-15, and yet PIM1/2 did not seem to influence CD8+ T cell differentiation towards a memory state. Even more surprising is that IL-15 was previously shown to influence the metabolic programming of intestinal intraepithelial lymphocytes, suggesting cell-type specific effects from PIM kinases. What the authors went on to show, however, is that PIM1/2 KO altered CD8 T cell proteomes in response to IL-2. Using proteomics, they saw increased expression of homing receptors (i.e., L-selectin, CCR7), but reduced expression of metabolism-related proteins (e.g., GLUT1/3 & cholesterol biosynthesis) and effector-function related proteins (e.g., IFNy and granzymes). Rather neatly, by performing both RNA-seq and proteomics on the same IL-2 stimulated WT vs. PIM1/2 KO cells, the authors found that changes at the proteome level were not corroborated by differences in RNA, uncovering that PIM1/2 predominantly influence protein synthesis/translation. Effectively, PIM1/2 knockout reduced the differentiation of CD8+ T cells towards an effector state. In vivo adoptive transfer experiments showed that PIM1/2KO cells homed better to secondary lymphoid tissue, presumably owing to their heightened L-selectin expression (although this was not directly examined).

      Strengths:

      Overall, I think the paper is scientifically good, and I have no major qualms with the paper. The paper as it stands is solid, and while the experimental aim of this paper was quite specific/niche, it is overall a nice addition to our understanding of how serine/threonine kinases impact T cell state, tissue homing, and functionality. Of note, they hint towards a more general finding that kinases may have distinct behaviour in different T-cell subtypes/states. I particularly liked their use of matched RNA-seq and proteomics to first suggest that PIM1/2 kinases may predominantly influence translation (then going on to verify this via their protein translation experiment - although I must add this was only done using PIM kinase inhibitors, not the PIM1/2KO cells). I also liked that they used small molecule inhibitors to acutely reduce PIM1/2 activity, which corroborated some of their mouse knockout findings - this experiment helps resolve any findings resulting from potential adaptation issues from the PIM1/2 global knockout in mice but also gives it a more translational link given the potential use of PIM kinase inhibitors in the clinic. The proteomics and RNA seq dataset may be of general use to the community, particularly for analysis of IL-15 or IL-2 stimulated CD8+ T cells.

      We thank the reviewer for their comments supporting the robustness and usefulness of our data.

      Weaknesses:

      It would be good to perform some experiments in human T cells too, given the ease of e.g., the small molecule inhibitor experiment.

      The suggestion to check PIM inhibitor effects in human T cells is a good one. We think an ideal experiment would be to use naïve cord blood derived CD4 and CD8 cells as a model to avoid the impact of variability in adult PBMC and to really look at what PIM kinases do as T cells first respond to antigen and cytokines. In this context there is good evidence that the signalling pathways used by antigen receptors or the cytokines IL-2 and IL-15 are not substantially different in mouse and human. We have also previously compared proteomes of mouse and human IL-2 expanded cytotoxic T cells and they are remarkably similar. As such we feel that mature mouse CD8 T cells are a genetically tractable model to use to probe the signalling pathways that control cytotoxic T cell function. To repeat the full set of experiments performed within this study with human T cells would represent 1 year of work by an experienced postdoctoral fellow.

      Unfortunately, the funding for the project has come to an end and there is no capacity to complete this work.

      Would also be good for the authors to include a few experiments where PIM1/2 have been transduced back into the PIM1/2 KO T cells, to see if this reverts any differences observed in response to IL-2 - although the reviewer notes that the timeline for altering primary T cells via lentivirus/CRISPR may be on the cusp of being practical such that functional experiments can be performed on day 6 after first stimulating T cells.

      A rescue experiment could indeed be informative, though it of course comes with challenges and caveats around re-expressing both deleted proteins at once and controlling the level of PIM kinase that is re-expressed. This work using the Pim dKO mice was performed from 2019-2021 and was seriously impacted by the work restrictions during the COVID-19 pandemic. We had to curtail all mouse colonies to allow animal staff to work within the legal guidelines. We had to make choices and the Pim1/2 dKO colony was stopped because we felt we had generated very useful data from the work but could not justify continued maintenance of the colony at such a difficult time. As such we no longer have this mouse line to perform these rescue experiments.

      We have, however, performed a limited number of retroviral overexpression studies in WT IL-2-expanded CTL, where T cells were transfected after 24 hours activation and phenotype measured on day 6 of culture. We chose to leave these out of the initial manuscript as these were overexpression under conditions where PIM expression was already high, rather than a true test of the ability of PIM1 or PIM2 to rescue the Pim dKO phenotype. A more robust test would also have required doing these overexpression experiments in IL-15 expanded or cytokine deprived CTL where PIM kinase expression is low; however, we ran out of time and funding to complete this work.

      We have provided Author response image 1 below from the experiments performed in the IL-2 CTL for interested readers. The limited experiments that were performed do support some key phenotypes observed with the Pim dKO mice or PIM inhibitors, finding that PIM1 or PIM2 overexpression was sufficient to increase S6 phosphorylation, and provided a small further increase in GzmB expression above the already very high levels in IL-2 expanded CTL.

      Author response image 1.

      PIM1 or PIM2 overexpression drives increased GzmB expression and S6 phosphorylation in WT IL-2 CTL. OT1 lymph node cell suspensions were activated for 24 hours with SIINFEKL peptide (10 ng/mL), IL-2 (20 ng/mL) and IL-12 (2 ng/mL) then transfected with retroviruses to drive expression of PIM1-GFP, PIM2-GFP fusion proteins or a GFP only control. T cells were split into fresh media and IL-2 daily and (A) GzmB expression and (B) S6 phosphorylation assessed by flow cytometry in GFP+ve vs GFP-ve CD8 T cells 5 days post-transfection (i.e. day 6 of culture). Histograms are representative of 2 independent experiments.

      Other experiments could also look at how PIM1/2 KO influences the differentiation of T cell populations/states during ex vivo stimulation of PBMCs or in vivo infection models using (high-dimensional) flow cytometry (rather than using bulk proteomics/RNA seq which only provide an overview of all cells combined).

      We did consider the idea of in vivo experiments with the Pim1/2 dKO mice but rejected this idea as the mice have lost PIM kinases in all tissues and so we would not be able to understand if any phenotype was CD8 T cell selective. To note, the Pim1/2 dKO mice are smaller than normal wild type mice (discussed further below) and clearly have complex phenotypes. An ideal experiment would be to make mice with floxed Pim1 and Pim2 alleles so that one could use cre recombinase to make a T cell-specific deletion and then study the impact of this in in vivo models. We did not have the budget or ethical approval to make these mice. Moreover, this study was carried out during the COVID pandemic when all animal experiments in the UK were severely restricted. So our objective was to get a molecular understanding of the consequences of losing these kinases for CD8 T cells, focusing on using controlled in vitro systems. We felt that this would generate important data that would guide any subsequent experiments by other groups interested in these enzymes.

      We do accept the comment about bulk population proteomics. Unfortunately, single cell proteomics is still not an option at this point in time. High resolution multidimensional flow cytometry is a valuable technique but is limited to looking at only a few proteins for which good antibodies exist compared to the data one gets with high resolution proteomics.

      Alongside this, performing a PCA of bulk RNA seq/proteomes or Untreated vs. IL-2 vs. IL-15 of WT and PIM1/2 knockout T cells would help cement their argument in the discussion about PIM1/2 knockout cells being distinct from a memory phenotype.

      We thank the reviewer for this very good suggestion. We have now included PCAs for the RNAseq and proteomics datasets of IL-2 and IL-15 expanded WT vs Pim dKO CTL in Fig S5 and added the following text to the discussion section of the manuscript (lines 429-431):

      “… and PCA plots of IL-15 and IL-2 proteomics and RNAseq data show that Pim dKO IL-2 expanded CTL are still much more similar to IL-2 expanded WT CTL than to IL-15 expanded CTL (Fig S5)”.
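
      To illustrate the kind of comparison a PCA supports, the following is a minimal sketch, not the analysis used for Fig S5. It computes scores on the first principal component by power iteration for four made-up expression profiles. The sample names, the three features and all values are hypothetical and chosen only so that the IL-2 and IL-15 conditions separate; they are not taken from the proteomics or RNAseq datasets.

```python
import math

def pc1_scores(samples, iterations=100):
    """Project mean-centred samples onto their first principal component.

    samples: dict mapping sample name -> list of feature values.
    Uses power iteration on X^T X, which is adequate for a small toy example.
    """
    names = list(samples)
    rows = [samples[name] for name in names]
    n_feat = len(rows[0])
    # Centre each feature (column) on its mean across samples.
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    centred = [[r[j] - means[j] for j in range(n_feat)] for r in rows]
    # Start from a uniform unit vector and repeatedly apply X^T X.
    v = [1.0 / math.sqrt(n_feat)] * n_feat
    for _ in range(iterations):
        proj = [sum(x * w for x, w in zip(row, v)) for row in centred]
        new_v = [sum(p * row[j] for p, row in zip(proj, centred))
                 for j in range(n_feat)]
        norm = math.sqrt(sum(c * c for c in new_v))
        if norm == 0:
            break
        v = [c / norm for c in new_v]
    return {name: sum(x * w for x, w in zip(row, v))
            for name, row in zip(names, centred)}

# Hypothetical log-scale abundances for three features per sample.
toy_profiles = {
    "WT_IL2": [10.0, 8.0, 2.0],
    "dKO_IL2": [9.0, 7.0, 3.0],
    "WT_IL15": [3.0, 2.0, 9.0],
    "dKO_IL15": [2.0, 3.0, 9.0],
}
scores = pc1_scores(toy_profiles)
```

      In this toy case the distance along the first component between `dKO_IL2` and `WT_IL2` is smaller than the distance between `dKO_IL2` and `WT_IL15`, which is the pattern the quoted sentence describes. The sign of a principal component is arbitrary, so only distances between scores should be interpreted.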

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In panel B of Figure S1, are the smaller numbers of splenocytes found in dKO fully accounted for by a reduction in the numbers of T cells or also correspond to a reduction in B cell numbers? Are the thymus and lymph nodes showing the same trend?

      We are happy to clarify this.

      Since we were focused on T cell phenotypes in the paper, this is what we have plotted in this figure; however, there is also a reduction in the total number of B, NK and NKT cells in the Pim dKO mice (see James et al, Nat Commun, 2021 for additional subset percentages). We find that all immune subsets we have measured make up the same % of the spleen in Pim dKO vs WT mice (we show this for T cell subsets in what was formerly Fig S1C and is now Fig S1A); the total splenocyte count is just lower in the Pim dKO mice (which we show in what was formerly Fig S1B and is now Fig S1C). To note, the Pim dKO mice were smaller than their WT counterparts (though we have not formally weighed and quantified this) and we think this is likely the major factor leading to lower total splenocyte numbers.

      We have not checked the thymus so can’t comment on this. We can confirm that lymph nodes from Pim dKO mice had the same number and % CD4 and CD8 T cells as in WT.

      For our in vitro studies we have made sure either to use co-cultures or, for single WT and Pim dKO cultures, to equalise starting cell densities between wells to account for the difference in total splenocyte number. We have now clarified this point in the methods section (lines 682-684):

      “For generation of memory-like or effector cytotoxic T lymphocytes (CTL) from mice with polyclonal T cell repertoires, LN or spleen single cell suspensions at an equal density for WT and Pim dKO cultures (~1-3 million live cells/mL)….”

      Reviewer #2 (Recommendations For The Authors):

      Line 89-99 - PIM kinase expression is elevated in T cells in autoimmunity and inhibiting therefore may make some sense if PIM is enhancing T cell activity. Why then would you use an inhibitor in cancer settings? This needs better clarification for readers, with reference to T cells, particularly given this is an important justification for looking at PIM kinases in T cells.

      We thank the reviewer for highlighting the lack of clarity in our explanation here.

      PIM kinase inhibitors alone are proposed as anti-tumour therapies for select cancers to block tumour growth. However, so far these monotherapies haven’t been very effective in clinical trials and combination treatment options with a number of strategies are being explored. There are two lines of logic for why PIM kinase inhibitors might be a good combination with, e.g., anti-PD1 or adoptive T cell immunotherapy. 1) PIM kinase inhibition has been shown to reduce inhibitory/suppressive surface proteins (e.g. PDL1) and cytokine (e.g. TGFbeta) expression in tumour cells and macrophages in the tumour microenvironment. 2) Inhibiting glycolysis and increasing memory/stem-like phenotype has been identified as desirable for longer-lasting, more potent anti-tumour T cell immunity. PIM kinase inhibition has been shown to reduce glycolytic function and increase several ‘stemness’ promoting transcription factors, e.g. TCF7, in a previous study. Controlled murine cancer models have shown improvement in clearance with the combination of pan-Pim kinase inhibitors and anti-PD1/PDL1 treatments (Xin et al, Cancer Immunol Res, 2021 and Chatterjee et al, Clin Cancer Res 2019).

      It is worth noting that this seemingly contradicts other studies of Pim kinases in T cells, which have generally found Pim1/2/3 deletion or inhibition in T cells to be suppressive of their function.

      We have clarified this reasoning/seeming conflict of results in the introductory text as follows (lines 90-101):

      “PIM kinase inhibitors have also entered clinical trials to treat some cancers (e.g. multiple myeloma, acute myeloid leukaemia, prostate cancer), and although they have not been effective as a monotherapy, there is interest in combining these with immunotherapies. This is due to studies showing PIM inhibition reducing expression of inhibitory molecules (e.g. PD-L1) on tumour cells and macrophages in the tumour microenvironment and a reported increase of stem-like properties in PIM-deficient T cells which could potentially drive longer lasting anti-cancer responses (Chatterjee et al., 2019; Xin et al., 2021; Clements and Warfel, 2022). However, PIM kinase inhibition has also generally been shown to be inhibitory for T cell activation, proliferation and effector activities (Fox et al., 2003; Mikkers et al., 2004; Jackson et al., 2021) and use of PIM kinase inhibitors could have the side effect of diminishing the anti-tumour T cell response.”  

      Line 93 - The use of 'some cancers' is rather vague and unscientific - please correct phrasing like this. The same goes for lines 54 and 77 (some kinases and some analyses).

      We have clarified the sentence in what is now Line 91 to include examples of some of the cancers that PIM kinase inhibitors have been explored for (see text correction in response to previous reviewer comment), which are predominantly haematological malignancies. The use of the phrases ‘some kinases’ and ‘some analyses’ in what are now Lines 52 and 75 is in our view appropriate, as the subsequent sentence(s) provide specific details on the kinases and analyses that are being referred to.

      Lines 146-147 - Could it be that rather than redundancies, PIM KO is simply not influential on TCR/CD28 signalling in general but influences other pathways in the T cell?

      We agree that the lack of PIM1/2 effect could also be because PIM targets downstream of TCR/CD28 are not influential and have clarified the text as follows (lines 156-161):

      “These experiments quantified expression of >7000 proteins but found no substantial quantitative or qualitative differences in protein content or proteome composition in activated WT versus Pim dKO CD4 and CD8 T cells (Fig 1G-H) (Table S1). Collectively these results indicate that PIM kinases do not play an important unique role in the signalling pathways used by the TCR and CD28 to control T cell activation.”

      Line 169 - Instead of specifying control - maybe put upregulate or downregulate for clarity.

      We have changed the text as per reviewer suggestion (see line 183).

      Line 182-183 - I would move the call out for Figure 2D to after the last call out for Figure 2C to make it more coherent for readers.

      We have changed the text as per reviewer suggestion (see lines 197-200).

      Line 190 - 14,000 RNA? total, unique? mRNA?

      These are predominantly mRNA since a polyA enrichment was performed as part of the standard TruSeq stranded mRNA sample preparation process; however, a small number of lncRNA etc. were also detected in our RNA sequencing. We left these results in as part of the overall analysis since they may be of interest to others, but did not examine them further. We do mention the existence of the non-mRNA briefly in the subsequent sentence when discussing the total number of DE RNA that were classified as protein coding vs non-coding.

      We have edited this sentence as follows to more accurately reflect that the RNA being referred to is polyA+ (lines 205-207):

      “The RNAseq analysis quantified ~14,000 unique polyA+ mRNA and using a cut off of >1.5 fold-change and q-value <0.05 we saw that the abundance of 381 polyA+ RNA was modified by Pim1/Pim2-deficiency (Fig 2E) (Table S2A).”
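
      As a minimal sketch of how a threshold of this kind can be applied, the function below flags a transcript when its fold-change exceeds 1.5 in either direction and its q-value is below 0.05. That the cut-off is applied symmetrically to increases and decreases is an assumption here, and the gene names and numbers are invented for illustration; they are not entries from Table S2A.

```python
import math

def is_differentially_expressed(fold_change, q_value,
                                fc_cutoff=1.5, q_cutoff=0.05):
    """Return True if a transcript passes both thresholds.

    fold_change: ratio of Pim dKO to WT abundance (must be positive).
    The fold-change test is made symmetric by comparing |log2 ratio|,
    so a 2-fold decrease (0.5) is treated like a 2-fold increase (2.0).
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    passes_fc = abs(math.log2(fold_change)) > math.log2(fc_cutoff)
    return passes_fc and q_value < q_cutoff

# Hypothetical transcripts: (name, dKO/WT fold-change, q-value).
transcripts = [
    ("gene_a", 2.0, 0.01),   # up, significant
    ("gene_b", 0.5, 0.001),  # down, significant
    ("gene_c", 1.2, 0.001),  # significant but below the fold-change cut-off
    ("gene_d", 3.0, 0.20),   # large change but q-value too high
]
hits = [name for name, fc, q in transcripts
        if is_differentially_expressed(fc, q)]
```

      Here `hits` contains only `gene_a` and `gene_b`, since a transcript must pass both criteria to be counted.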

      Questions/points regarding figures:

      Figure 1 - Is PIM3 changed in expression with the knockout of PIM1/2 in mice? Although the RNA is low could there be some compensation here? The authors put a good amount of effort into showing that mouse T cells do not exhibit differences from knocking out Pim1/2, i.e., efforts have been made to address this using activation markers and cell size, cytokines, and proliferation and proteomics of activated T cells. What do the resting T cells look like though? Although TCR signalling is not impacted, other pathways might be. Resting-state comparison may identify this.

      In all experiments Pim3 mRNA was only detected at very low levels and no PIM3 protein was detected by mass spectrometry in either wild type or PIM1/2 double KO TCR activated or cytokine expanded CD8 T cells (See Tables S1, S3, S4). There was similarly no change in Pim3 mRNA expression in RNAseq of IL-2 or IL-15 expanded CD8 T cells (See Tables S2, S6). While we have not confirmed this in resting state cells for all the conditions examined, there is no evidence that PIM3 compensates for PIM1/2 deficiency or that PIM3 is substantially expressed in T cells.

      Figure 1A&B - Does PIM kinase stay elevated when removing TCR stimulus? During egress from lymph node and trafficking to infection/tumour/autoimmune site, T cells experience a period of 'rest' from T-cell activation so is PIM upregulation stabilized, or does it just coincide with activation? This could be a crucial control given the rest of the study focuses on day 6 after initial activation (which includes 4 days of 'rest' from TCR stimulation). Nice resolution on early time course though.

      This is an interesting question. Unfortunately, we do not know how sensitive PIM kinases are to TCR stimulus withdrawal, as we have not tried removing the TCR stimulus during early activation and measuring PIM expression.

      Based on the data in Fig 2A there is a hint that 4 hours withdrawal of peptide stimulus may be enough to lose PIM1/2 expression (after ~36 hrs of TCR activation), however, we did not include a control condition where peptide is retained within the culture. Therefore, we cannot resolve this question from the current experimental data, as this difference could also be due to a further increase in PIMs in the cytokine treated conditions rather than a reduction in expression in the no cytokine condition. This ~36-hour time point is also at a stage where T cells have become more dependent on cytokines for their sustained signalling compared to TCR stimulus.

      It is worth noting that PIM kinases are thought to have fairly short mRNA and protein half-lives (~5-20 min for PIM1 in primary cells, ~10 min to 1 hr for PIM2). This is consistent with previous observations that cytotoxic T cells need sustained IL-2/Jak signalling to sustain PIM kinase expression, e.g. in Rollings et al (2018) Sci Signaling, DOI: 10.1126/scisignal.aap8112. We would therefore expect that sustained signalling from some external signalling receptor, whether this is TCR, costimulatory receptors or cytokines, is required to drive Pim1/2 mRNA and protein expression.
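
      A rough calculation shows why half-lives in this range would matter over a 4-hour window. The sketch below assumes simple first-order decay with no new synthesis after the stimulus is removed. That is a simplification introduced here for illustration only, not a measurement or a model from the manuscript, and it uses the slowest half-lives quoted above.

```python
def fraction_remaining(minutes_elapsed, half_life_minutes):
    """Fraction of a pool left after first-order decay with no new synthesis."""
    if half_life_minutes <= 0:
        raise ValueError("half_life_minutes must be positive")
    return 0.5 ** (minutes_elapsed / half_life_minutes)

four_hours = 4 * 60  # minutes

# Slowest quoted half-lives: 20 min for PIM1, 60 min (1 hr) for PIM2.
pim1_slowest = fraction_remaining(four_hours, 20)  # 0.5 ** 12
pim2_slowest = fraction_remaining(four_hours, 60)  # 0.5 ** 4
```

      Under these assumptions, less than 0.1% of the starting PIM1 pool and about 6% of the PIM2 pool would remain after 4 hours, so expression would be expected to fall substantially unless some receptor continues to drive new synthesis.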

      Figure 1D - the CD4 WT and Pim dKO plots are identical - presumably a copying error - please correct.

      We apologise for the copying error and have amended the manuscript to show the correct data. We thank the reviewer for noticing this mistake.

      In Figure 1H - there is one protein found significant - would be nice to mention what this is - for example, if this is a protein that influences TCR levels this could be quite important.

      The protein is Phosphoribosyl Pyrophosphate synthase 1 like 1 (Prps1l1).

      This was a low confidence quantification (based on only 2 peptides) with no known function in T cells. Based on what is known, this gene is predominantly expressed in the testis (though also detected in spleen, lung, liver). A whole-body KO mouse found no difference in male fertility. No further phenotype has been reported in this mouse. See: Wang et al (2018) Mol Reprod Dev, DOI: 10.1002/mrd.23053

      We have added the following text to the legend of Figure 1H to address this protein:

      “Phosphoribosyl Pyrophosphate synthase 1 like 1 (Prps1l1), was found to be higher in Pim dKO CD8 T cells, but was a low confidence quantification (based on only 2 unique peptides) with no known function in T cells.”

      Figure S1 - In your mouse model the reduction in CD4 T cells is quite dramatic in the spleen - is this reduced homing or reduced production of T cells through development?

      Could you quantify the percentage of CD45+ cells that are T cells from blood too? Would be good to have a more thorough analysis of this new mouse model.

      We apologise for the lack of clarity around the Pim dKO mouse phenotype. Something we didn’t mention previously due to a lack of a formal measurement is that the Pim dKO mice were typically smaller than their WT counterparts. This is likely the main reason for total splenocytes being lower in the Pim dKO mice - every organ is smaller. It is not a phenotype reported in Pim1/2 dKO mice on an FVB background, though has been reported in the Pim1/2/3 triple KO mouse before (see Mikkers et al, Mol Cell Biol 2004 doi: 10.1128/MCB.24.13.6104-6115.2004).

      The % cell type composition of the spleen is equivalent between WT and Pim dKO mice and, as mentioned above, was controlled for when setting up our in vitro cultures.

      We have revised the main text and changed the order of the panels in Fig S1 to make this caveat clearer as follows (lines 138-144):

      “There were normal proportions of peripheral T cells in spleens of Pim dKO mice (Fig S1A) similar to what has been reported previously in Pim dKO mice on an FVB/N genetic background (Mikkers et al., 2004), though the total number of T cells and splenocytes was lower than in age/sex matched wild-type (WT) mouse spleens (Fig S1B-C). This was not attributable to any one cell type (Fig S1A)(James et al., 2021) but was instead likely the result of these mice being smaller in size, a phenotype that has previously been reported in Pim1/2/3 triple KO mice (Mikkers et al., 2004).”

      Figure S1C - why are only 10-15% of the cells alive? Please refer to this experiment in the main text if you are going to include it in the supplementary figure.

      With regard to what was previously Fig S1C (now Fig S1A), we apologise for our confusing labelling. We were quoting these numbers as the percentage of live splenocytes (i.e. % of live cells). Typically ~80-90% of the total splenocytes were alive by the time we had processed, stained and analysed them by flow cytometry direct ex vivo. Of these, CD4 and CD8 T cells made up ~10-15% of the total live splenocytes (with most of the rest of the live cells being B cells).

      We have modified the axis to say “% of splenocytes” to make it clearer that this is what we are plotting.

      Figure S1 - Would be good to show that the T cells are truly deficient in PIM1/2 in your mice to be absolutely sure. You could just make a supplementary plot from your mass spec data.

      This is a good suggestion and we have now included this data as supplementary figure 2.

      To note, due to the Pim1 knockout mouse design this is not as simple as showing presence or absence of total PIM1 protein detection in this instance.

      To elaborate: the Pim1/Pim2 whole body KO mice used in this study were originally made by Prof Anton Berns’ lab (Pim1 KO = Laird et al Nucleic Acids Res, 1993, doi: 10.1093/nar/21.20.4750, with more detail on deletion construct in te Riele, H. et al, Nature,1990, DOI: 10.1038/348649a0; Pim2 KO = Mikkers et al, Mol Cell Biol, 2004, DOI: 10.1128/MCB.24.13.6104-6115.2004). They were given to Prof Victor Tybulewicz on an FVB/N background. He then backcrossed them onto the C57BL/6 background for > 10 generations then gave them to us to intercross into Pim1/2 dKO mice on a C57BL/6 background.

      The strategy for Pim1 deletion was as follows:

      A neomycin cassette was recombined into the Pim1 gene in exon 4 deleting 296 Pim1 nucleotides. More specifically, the 98th pim-1 codon (counted from the ATG start site = the translational starting point for the 34 kDa isoform of PIM1) was fused in frame by two extra codons (Ser, Leu) to the 5th neo codon (pKM109-90 was used). The 3'-end of neo included a polyadenylation signal. The cassette also contains the PyF101 enhancer (from piiMo +PyF101) to ensure expression of neo on homologous recombination in ES cells.

      Collectively this means that the PIM1 polypeptide is made prior to amino acid 98 of the 34 kDa isoform but not after this point. This deletes functional kinase activity in both the 34 kDa and 44 kDa PIM1 isoforms. Ablation of PIM1 kinase function using this KO was verified via kinase activity assay in Laird et al. Nucleic Acids Res 1993.

      The strategy to delete Pim2 was as follows:

      “For the Pim2 targeting construct, genomic BamHI fragments encompassing Pim2 exons 1, 2, and 3 were replaced with the hygromycin resistance gene (Pgp) controlled by the human PGK promoter.” (Mikkers et al Mol Cell Biol, 2004)

      The DDA mass spectrometry data collected in Fig 1 G-H and supplementary table 1 confirmed we do not detect peptides from after amino acid residue 98 in PIM1 (though we do detect peptides prior to this deletion point) and we do not detect peptides from the PIM2 protein in the Pim dKO mice. This confirms that no catalytically active PIM1/PIM2 proteins were made in these mice.

      We have added a supplementary figure S2 showing this and the following text (Lines 155-156):

      “Proteomics analysis confirmed that no catalytically active PIM1 and PIM2 protein were made in Pim dKO mice (Fig S2).”

      Figure 2A - I found the multiple arrows a little confusing - would just use arrows to indicate predicted MW of protein and stars to indicate non-specific. Why are there 3 bands/arrows for PIM2?  

      The arrows have now been removed. We now mention the PIM1 and PIM2 isoform sizes in the figure legend and have left the ladder markings on the blots to give an indication of protein sizes. There are 2 isoforms for PIM1 (34 and 44 kDa) in addition to the nonspecific band and 3 isoforms of PIM2 (40, 37, 34 kDa, though two of these isoform bands are fairly faint in this instance). These are all created via ribosome use of different translational start sites from a single Pim1 or Pim2 mRNA transcript.

      The following text has been added to the legend of Fig 2A:

      “Western blots of PIM1 (two isoforms of 44 and 34 kDa, non-specific band indicated by *), PIM2 (three isoforms of 40, 37 and 34 kDa) or pSTAT5 Y694 expression.”

      Figure 2A - why are the bands so faint for PIM1/2 (almost non-existent for PIM2 under no cytokine stim) here yet the protein expression seems abundant in Figure 1B upon stim without cytokines? Is this a sensitivity issue with WB vs proteomics? My apologies if I have missed something in the methods but please explain this discrepancy if not.

      There is differing sensitivity of western blotting versus proteomics, but this is not the reason for the discrepancy between the data in Fig 1B versus 2A. These differences reflect that Fig 1B and Fig 2A contrast PIM levels in two different sets of conditions and that, while proteomics allows for an estimate of ‘absolute abundance’, western blotting only shows relative expression between the conditions assessed.

      To expand on this: the Fig 1B proteomics compares naïve versus 24 hr aCD3/aCD28 TCR activated T cells. The western blot data in Fig 2A look at T cells activated for 1.5 days with SIINFEKL peptide, then washed free of the media containing the TCR stimulus and cultured with no stimulus for 4 or 24 hrs, and contrast these with cells cultured with IL-2 or IL-15 for 4 or 24 hrs. All Fig 2A can tell us is that cytokine stimulation increases and/or sustains PIM1 and PIM2 protein above the level seen in TCR activated cells which have not been cultured with cytokine for a given time period. Overexposure of the blot does reveal detectable PIM1 and PIM2 protein in the no cytokine condition after 4 hrs. Whether this is equivalent to the PIM level in the 24 hr TCR activated cells in Fig 1B is not resolvable from this experiment as we have not included a sample from a naïve or 24 hr TCR activated T cell to act as a point of reference.

      Figure 4F - Your proteomics data shows substantial downregulation in proteomics data for granzymes and ifny- possibly from normalization to maximise the differences in the graph - and yet your flow suggests there are only modest differences. Can you explain why a discrepancy in proteomics and flow data - perhaps presenting in a more representative manner (e.g., protein counts)?

      The heatmaps are scaled for a ‘row max’ to ‘row min’ copy number comparison on a linear scale and do indeed visually maximise differences in expression between conditions. This feature of these heatmaps is also what makes the lack of difference in GzmB and GzmA in the mRNA heatmap in Fig 5C quite notable.
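      To illustrate why row min/max scaling visually maximises differences, the short sketch below rescales each row to the 0-1 range. It is a generic illustration, not the plotting code used for the figures; the function name `scale_row` and the copy numbers are hypothetical.

```python
def scale_row(values):
    """Rescale one heatmap row so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat row carries no contrast; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical copy numbers for two proteins in two conditions.
modest_change = [100.0, 175.0]   # 1.75-fold difference
large_change = [100.0, 1000.0]   # 10-fold difference

# Both rows span the full colour range after scaling,
# so the heatmap alone cannot convey the size of the effect.
print(scale_row(modest_change))
print(scale_row(large_change))
```

      Because a 1.75-fold and a 10-fold difference look identical after this scaling, the magnitude of an effect is better read from the copy numbers themselves.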

      We have now included bar graphs of Granzymes A and B and IFNg protein copy number in Figure 4 (see new Fig 4G-H) to make clearer the magnitude of the effect on the major effector proteins involved in CTL killing function. It is worth noting that flow cytometry histograms from what was formerly Fig 4G (now Fig 4I) are on a log-scale so the shift in fluorescence does generally correspond well with the ~1.7-2.75-fold reduction in protein expression observed.

      Figure 4G - did you use isotype controls for this flow experiment? Would help convince labelling has worked - particularly for low levels of IFNy production.

      We did not use isotype controls in these experiments but we are using a well validated interferon gamma antibody and very careful colour panel/compensation controls to minimise background staining. The only way to be 100% confident that an antibody is selective is to use an interferon gamma null T cell, which we do not have. We do however know that the antibody we use gives flow cytometry data consistent with other orthogonal approaches to measure interferon gamma, e.g. ELISA and mass spectrometry.

      Figure 5M - why perform this with just the PIM kinase inhibitors? Can you do this readout for the WT vs. PIM1/2KO cells too? This would really support your claims for the paper about PIM influencing translation given the off-target effects of SMIs.

      Regrettably we have not done this particular experiment with the Pim dKO T cells. As mentioned above, due to this work being performed predominantly during the COVID19 pandemic we ultimately had to make the difficult decision to cease colony maintenance. When work restrictions were lifted we could not ethically or economically justify resurrecting a mouse colony for what was effectively one experiment, which is why we chose to test this key biological question with small molecule inhibitors instead.

      We appreciate that SMIs have off-target effects and this is why we used multiple panPIM kinase inhibitors for our SMI validation experiments. While the use of 2 different inhibitors still doesn’t completely negate the concern about possible off-target effects, our conclusions re: PIM kinases and impact on protein synthesis are not solely based on the inhibitor work but also based on the decreased protein content of the PIM1/2 dKO T cells in the IL-2 CTL, and the data quantifying reductions in levels of many proteins but not their coding mRNA in PIM1/2 dKO T cells compared to controls.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Household clustering and seasonal genetic  variation of Plasmodium falciparum at the community-level in The Gambia" presents a valuable genetic spatio-temporal analysis of  malaria-infected individuals from four villages in The Gambia, covering  the period between December 2014 and May 2017. The majority of samples  were analyzed using a SNP barcode with the Spotmalaria panel, with a  subset validated through WGS. Identity-by-descent (IBD) was calculated  as a measure of genetic relatedness and spatio-temporal patterns of the  proportion of highly related infections were investigated. Related  clusters were detected at the household level, but only within a short  time period.

      Strengths:

      This study offers a valuable dataset, particularly due to its  longitudinal design and the inclusion of asymptomatic cases. The  laboratory analysis using the Spotmalaria platform combined and  supplemented with WGS is solid, and the authors show a linear  correlation between the IBD values determined with both methods,  although other studies have reported that at least 200 SNPs are required for IBD analysis. Data-analysis pipelines were created for (1) variant  filtering for WGS and subsequent IBD analysis, and (2) creating a  consensus barcode from the spot malaria panel and WGS data and  subsequent SNP filtering and IBD analysis.

      Weaknesses:

      Further refining the data could enhance its impact on both the scientific community and malaria control efforts in The Gambia.

      (1) The manuscript would benefit from improved clarity and better  explanation of results to help readers follow more easily. Despite  familiarity with genotyping, WGS, and IBD analysis, I found myself  needing to reread sections. While the figures are generally clear and  well-presented, the text could be more digestible. The aims and  objectives need clearer articulation, especially regarding the rationale for using both SNP barcode and WGS (is it to validate the approach with the barcode, or is it to have less missing data?). In several analyses, the purpose is not immediately obvious and could be clarified.

      The text of the manuscript has now been thoroughly revised. But please let us know if a specific section remains unclear.

      (2) Some key results are only mentioned briefly in the text without  corresponding figures or tables in the main manuscript, referring only  to supplementary figures, which are usually meant for additional detail, but not main results. For example, data on drug resistance markers  should be included in a table or figure in the main manuscript.

      We agree with the reviewer's suggestion to move the prevalence of drug resistance markers from the supplementary figures (previously Figure S8) to the main manuscript (now Figure 5). If any other figure or table should be moved to the main manuscript, please let us know.

      (3) The study uses samples from 2 different studies. While these are  conducted in the same villages, their study design is not the same,  which should be addressed in the interpretation and discussion of the  results. Between Dec 2014 and Sept 2016, sampling was conducted only in 2 villages and at less frequent intervals than between Oct 2016 to May  2017. The authors should assess how this might have impacted their  temporal analysis and conclusions drawn. In addition, it should be  clarified why and for exactly in which analysis the samples from Dec  2016 - May 2017 were excluded as this is a large proportion of your  samples.

      We have clarified which set of samples was used in our Results (Lines 293-295, 316-319). While two villages were recruited halfway through the study, two villages (J and K, Figure 1C) consistently provided data for each transmission season. Importantly, our temporal analysis accounts for these differences by grouping paired barcodes based on their respective locations (Figure 3B). Despite variations in sampling frequency, we still observe a clear overall decline in relatedness between the ‘0-2 months’ and ‘2-5 months’ groups, both of which include barcodes from all four villages.

      (4) Based on which criteria were samples selected for WGS? Did the  spatiotemporal spread of the WGS samples match the rest of the genotyped samples? I.e. were random samples selected from all times and places,  or was it samples from specific times/places selected for WGS?

      All P. falciparum positive samples were sent for genotyping and whole genome sequencing, ensuring no selection bias. However, only samples with sufficient parasite DNA were successfully sequenced. We have updated the text (Lines 129-130) and added a supplementary figure (Figure S4) to show the sample collection broken down by type of data (barcode or genome). High quality genomes are distributed across all time points.

      (5) The manuscript would benefit from additional detail in the methods section.

      Please see our response in the section “Recommendation for the authors”.

      (6) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      While we acknowledge the potential for bias between samples with a consensus barcode (based on WGS) and those with genotyping-only barcodes, its impact is minimal. WGS does indeed produce a more accurate barcode compared to SNP genotyping, but any errors in the genotyping barcodes were mitigated by excluding loci that systematically mismatched with WGS data (see Figure S3). Additionally, the use of WGS improved the accuracy of 51 % (216/425) of barcodes, which strengthens the overall quality and validity of our analysis.

      (7) The linear correlation between IBD-values of barcode vs genome is  clear. However, since you do not use absolute values of IBD, but a  classification of related (>=0.5 IBD) vs. unrelated (<0.5), it  would be good to assess the agreement of this classification between the 2 barcodes. In Figure S6 there seem to be quite some samples that would be classified as unrelated by the consensus barcode, while they have  IBD>0.5 in the Genome-IBD; in other words, the barcode seems to be  underestimating relatedness.

      a. How sensitive is this correlation to the nr of SNPs in the barcode?

      We measured the agreement between the two classifications using specificity (0.997), sensitivity (0.841) and precision (0.843) described in the legend of Figure S8. To further demonstrate the good agreement between the two methods, we calculated a Cohen’s kappa value of 0.839 (Lines 226, 290), indicative of a strong agreement (McHugh 2012). As expected, the correlation between IBD values obtained by both methods improves (higher Cohen’s kappa and R<sup>2</sup>) as the cutoff for the minimal number of comparable and informative loci per barcode pair is raised (data not shown).
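      For readers who wish to reproduce this kind of comparison, the sketch below computes the four agreement statistics from a 2x2 table of pairwise classifications. It is a minimal illustration and not our analysis code: it assumes the genome-based classification is treated as the reference, and the function name and counts are hypothetical.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Agreement between barcode-based and genome-based 'related' calls.

    tp: pairs called related by both methods
    fp: related by barcode only
    fn: related by genome only
    tn: unrelated by both
    (Assumption: the genome-based call is the reference.)
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
print(agreement_metrics(tp=40, fp=10, fn=10, tn=40))
```

      Kappa is useful here because related pairs are rare: raw agreement would be high even for an uninformative classifier, whereas kappa discounts the agreement expected by chance.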

      (8) With the sole focus on IBD, a measure of genetic relatedness, some of the conclusions from the results are speculative.

      a. Why not include other measures such as genetic diversity, which  relates to allele frequency analysis at the population level (using, for example, nucleotide diversity)? IBD and the proportion of highly  related pairs are not a measure of genetic diversity. Please revise the  manuscript and figures accordingly.

      We agree that IBD is not a direct measure of genetic diversity, even though both are related (Camponovo et al., 2023). More precisely, IBD is a measure of the level of inbreeding in the population (Taylor et al., 2019). We have updated our manuscript by replacing “genetic diversity” with “genetic relatedness” or “inbreeding/outcrossing” when appropriate. Nucleotide diversity would be relevant if we wanted to compare different settings, e.g. Africa vs Asia; however, this is not the case here.

      b. Additionally, define what you mean by "recombinatorial genetic  diversity" and explain how it relates to IBD and individual-level  relatedness.

      We considered the term ‘recombinatorial genetic diversity’ to be equivalent to the level of inbreeding in the population. Because this expression is rather uncommon, we decided to drop it from our manuscript and replace it with “inbreeding/outcrossing”.

      c. Recombination is one potential factor contributing to the loss of  relatedness over time. There are several other factors that could  contribute, such as mobility/gene flow, or study-specific limitations  such as low numbers of samples in the low transmission season and many  months apart from the high transmission samples.

      Indeed, the loss of relatedness could be attributed not only to the recombination of local cases but also to new parasites introduced by imported malaria cases. As we stated in our manuscript, previous studies have shown a limited effect of imported cases on maintaining transmission (Lines 72-74). Nevertheless, we cannot definitely exclude that imported cases have an effect on inbreeding levels, since we do not have access to genetic data of surrounding parasites at the time of the study. We updated the discussion accordingly (Lines 497-501).

      d. By including other measures such as linkage disequilibrium you could  further support the statements related to recombination driving the loss of relatedness.

      This commendable suggestion is actually part of an ongoing project focusing on the sharing of IBD fragments and how it correlates with linkage disequilibrium. However, we believe that this analysis would not fit in the scope of our manuscript which is really about spatio-temporal effects on parasite relatedness at a local scale.

      (9) While the authors conclude there is no seasonal pattern in the drug-resistant markers, one can observe a big fluctuation in the dhps haplotypes, which go down from 75% to 20% and then up and down again later. The authors should investigate this in more detail, as dhps is related to SP resistance, which could be important for seasonal malaria chemoprophylaxis, especially since the mutations in dhfr seem near-fixed in the population, indicating high levels of SP resistance at some of the time points.

      As the reviewer noted, the DHPS A437G haplotype appears to decrease in prevalence twice throughout our study: from the 2015 and 2016 high transmission seasons to the subsequent 2016 and 2017 low transmission seasons. Seasonal Malaria Chemoprophylaxis (SMC) was carried out in the area through the delivery of sulfadoxine–pyrimethamine plus amodiaquine to children 5 years old and younger during high transmission seasons. As the DHPS A437G haplotype has been associated with resistance to sulfadoxine, its apparent increase in prevalence during high transmission seasons could result from the selective pressure imposed on parasites. After SMC, the decrease in prevalence observed during low transmission seasons could be caused by a fitness cost of the mutation favouring wild-type parasites over resistant ones. We updated our manuscript to reflect this relevant observation (Lines 400-405).

      (10) I recommend that raw data from genotyping and WGS should be deposited in a public repository.

      Genotyping data is available in the supplementary table 4 (Table S4). Whole genome sequencing is accessible in a European Nucleotide Archive public repository with the identifiers provided in supplementary table 5 (Table S5). We added references to these tables in the manuscript (Lines 249-250).

      Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods of intense transmission at the beginning of the rainy season are interspersed by long periods of low to no transmission. This raises several questions about how this transmission pattern impacts the spatiotemporal distribution of circulating parasite strains. Knowledge of these dynamics may allow the identification of key units for targeted control strategies, the evaluation of the effect of selection/drift on parasite phenotypes (e.g., the emergence or loss of drug resistance genotypes), and analyze, through the parasites' genetic nature, the duration of chronic infections persisting during the dry season. Using a combination of barcodes and whole genome analysis, the authors try to answer these questions by making clever use of the different recombination rates, as measured through the proportion of genomes with identity-by-descent (IBD), to investigate the spatiotemporal relatedness of parasite strains at different spatial (i.e., individual, household, village, and region) and temporal (i.e., high, low, and the corresponding transitions) levels. The authors show that a large fraction of infections are polygenomic and stable over time, resulting in high recombinational diversity (Figure 2). Since the number of recombination events is expected to increase with time or with the number of mosquito bites, IBD allows them to investigate the connectivity between spatial levels and to measure the fraction of effective recombinational events over time. The authors demonstrate the epidemiological connectivity between villages by showing the presence of related genotypes, a higher probability of finding similar genotypes within the same household, and how parasite-relatedness gradually disappears over time (Figure 3). Moreover, they show that transmission intensity increases during the transition from dry to wet seasons (Figure 4). If there is no drug selection during the dry season and if resistance incurs a fitness cost it is possible that alleles associated with drug resistance may change in frequency. The authors looked at the frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps, kelch13, and mdr1), and found no evidence of changes in allele frequencies associated with seasonality. They also find chronic infections lasting from one month to one and a half years with no dependence on age or gender.

      The use of genomic information and IBD analytic tools provides the  Control Program with important metrics for malaria control policies, for example, identifying target populations for malaria control and  evaluation of malaria control programs.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes  representing 101 bi-allelic SNPs) and 199 high-quality genome sequences  to infer the fraction of the genome with shared Identity by Descent  (IBD) (i.e. a metric of recombination rate) over several time points  covering two years. The barcode and whole genome sequence combination  allows full use of a large dataset, and to confidently infer the  relatedness of parasite isolates at various spatiotemporal scales.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate the impact of seasonality on malaria parasite population genetics. To achieve this, the researchers conducted a longitudinal study in a region characterized by seasonal malaria transmission. Over a 2.5-year period, blood samples were collected from 1,516 participants residing in four villages in the Upper River Region of The Gambia and tested for malaria parasite positivity. The parasites from the positive samples were genotyped using a genetic barcode and/or whole genome sequencing, followed by a genetic relatedness analysis.

      The study identified three key findings:

      (1) The parasite population continuously recombines, with no single genotype dominating, in contrast to viral populations;

      (2) The relatedness of parasites is influenced by both spatial and temporal distances; and

      (3) The lowest genetic relatedness among parasites occurs during the  transition from low to high transmission seasons. The authors suggest  that this latter finding reflects the increased recombination associated with sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and  self-explanatory. The methods are adequately described, providing a  solid foundation for the findings. While there are no unexpected  results, it is reassuring to see the anticipated outcomes supported by  actual data. The conclusions are generally well-supported; however, the  discussion on the burden of asymptomatic infections falls outside the  scope of the data, as no specific analysis was conducted on this aspect  and was not stated as part of the aims of the study. Nonetheless, the  recommendation to target asymptomatic infections is logical and  relevant.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript would benefit from additional detail in the methods section.

      a. Refer to Figure 1 when you describe the included studies and sample processing.

      We added the reference to Figure 1 (Line 131).

      b. While you describe each step in the pipeline, you do not specify the  tools, packages, or environment used (the GitHub link is also  non-functional). A graphic representation of the pipeline, with more  bioinformatic details than Supplementary Figure S1, would be helpful.  Add references to used tools and software created by others.

      The GitHub link has been updated and is now functional. We find Figure S1 already heavy in detail; adding more would undermine its purpose as an easily readable summary of our pipeline. Readers seeking an in-depth explanation of our pipeline may prefer to read the methods section instead. We are very much committed to crediting the authors of the tools that were essential for us to create our analysis pipeline. The two most relevant tools that we used are hmmIBD and the Fws calculation, which were both cited in the methods (Lines 148-152, 214-215).

      c. What changed in the genotyping protocol after May 2016? Does it not  lead to bias in the (temporal) analysis by leaving these loci in for  samples collected before May 2016 and making them 'unknown' for the  majority of samples collected after this date?

      These 21 SNPs all clustered in 1 of the 4 multiplexes used for molecular genotyping, which likely failed to produce accurate base calls. We updated the text to include this information (Lines 198-200).

      The rationale behind the discarding of these 21 SNPs for barcodes sampled after May 2016 was that they were consistently mismatching with the WGS SNPs, probably due to genotyping error as mentioned above. However, by replacing these unknown positions in the molecular barcodes with WGS SNPs, 141 samples did recover some of these 21 SNPs with the accurate base calls (Figure S3A). Additionally, we added an extra analysis to assess the agreement between barcodes and WGS data (Figure S3B).

      d. Related to this, how are unknown and mixed genotypes treated in the  binary matrix? How is the binary matrix coded? Is 0 the same as the  reference allele? So all the missing and mixed are treated as  references? How many missing and mixed alleles are there, how often does it occur and how does this impact the IBD analysis?

      We acknowledge that the details that we provided regarding the IBD analysis were confusing. hmmIBD requires a matrix that contains positive or null integers for each different allele at a given locus (all our loci were bi-allelic, thus only 0 and 1 were used) and -1 for missing data. In our case, we set missing and mixed alleles to -1, which were then ignored during the IBD estimation. The corresponding text was updated accordingly (Lines 173-175).
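      The encoding described above can be sketched as follows. This is an illustrative example rather than our pipeline code: the function names, the use of "X" for missing and "N" for mixed calls, and the choice of 0 for the reference allele are assumptions made for the example.

```python
MISSING = -1  # value used for missing data; mixed calls are set to it as well


def encode_call(call, ref, alt):
    """Encode one bi-allelic barcode position as 0, 1 or -1."""
    if call == ref:
        return 0  # assumption: 0 denotes the reference allele
    if call == alt:
        return 1
    # Anything else (e.g. "X" missing, "N" mixed in this sketch) is ignored.
    return MISSING


def encode_barcode(calls, refs, alts):
    """Encode a whole barcode given per-locus reference and alternate alleles."""
    return [encode_call(c, r, a) for c, r, a in zip(calls, refs, alts)]


# Hypothetical four-locus barcode: ref, alt, missing, mixed.
print(encode_barcode("ATXN", refs="AGCC", alts="CTGA"))
```

      Setting mixed calls to -1 rather than to one of the two alleles avoids treating an unresolved position as evidence for or against shared ancestry.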

      e. By excluding households with less than 5 comparisons, are you not preselecting households with high numbers of cases, and therefore higher likelihood of transmission within the household?

      All participants in each household were sampled at every collection time point. This sampling was unbiased towards likelihood of transmission. Excluding pairs of households with fewer than 5 comparisons was necessary to ensure statistical robustness in our analyses. Moreover, this does not restrict the analysis to households with a high number of cases, as it is the total number of pairs between households that must be at least 5 (for instance, these pairs would pass the cutoff: household with 1 case vs household with 5 cases; household with 2 cases vs household with 3 cases).
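      The arithmetic behind this cutoff can be sketched as below. The sketch assumes that every case contributes one barcode and that every between-household pair yields a usable IBD value; the function names are hypothetical.

```python
def between_household_pairs(cases_a, cases_b):
    """Number of barcode pairs formed between two different households."""
    return cases_a * cases_b


def passes_cutoff(cases_a, cases_b, min_pairs=5):
    """True if the household pair has at least `min_pairs` comparisons."""
    return between_household_pairs(cases_a, cases_b) >= min_pairs


# The two examples given above, plus one pair that would be excluded.
print(passes_cutoff(1, 5))  # 1 x 5 = 5 pairs
print(passes_cutoff(2, 3))  # 2 x 3 = 6 pairs
print(passes_cutoff(2, 2))  # 2 x 2 = 4 pairs
```

      Because the count is a product, a household with a single case can still be retained when compared with a household that has five or more.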

      (2) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      See (6) in the Public Review.

      a. It would be good to get a better sense of the distribution of the nr  of SNPs in the barcode. The range is 30-89, and 30 SNPs for IBD is  really not that much.

      Adding the range of the number of available SNPs per barcode is indeed particularly relevant. We added a supplementary figure (Figure S5) showing the distribution of homozygous SNPs per barcode, showing that a very small minority of barcodes have only 30 SNPs available for IBD (average of 65, median of 64).

      b. Did you compare the nr of SNPs in the consensus vs. only genotyped  barcodes? Is there more missing data in the genotype-only barcodes?

      We added a supplementary figure (Figure S5) with the distribution of homozygous SNPs in consensus (216 samples) and molecular (209 samples) barcodes. Consensus barcodes have more homozygous SNPs (average 76, median 82) than molecular barcodes (average of 54, median of 53), showing the improvement resulting from using whole genome sequencing data.

      c. How was the cut-off/sample exclusion criteria of 30 SNPs in the barcode determined?

      As described above (Public review section 7.a.), we removed pairs of barcodes with fewer than 30 comparable loci (and 10 informative loci) because this led to a good agreement between IBD values obtained from barcodes and genomes while still retaining a majority of pairwise IBD values.

      d. Was there more/less IBD between sample pairs with a consensus barcode vs those with genotype-only barcodes?

      We separated pairwise IBD values into two groups: “within consensus” and “within molecular”. The percentage of related barcodes (IBD ≥ 0.5) was virtually identical between the “within consensus” (1.88 %) and “within molecular” (1.71 %) groups (χ<sup>2</sup> = 1.33, p value > 0.24).

      (3) Line 124: add a reference for the PCR method used.

      We have updated this information: varATS qPCR (Line 121).

      (4) Line 126, what is MN2100ff? Is this the catalogue number of the  cellulose columns? Please clarify and add manufacturer details.

      MN2100ff was a replacement for CF11. We added a link to the MalariaGen website describing the product and the procedure (Lines 124-125).

      (5) Line 143: Figure S7 is the first supplementary figure referenced. Change the order and make this Figure S1?

      The numbering of figures is now fixed.

      (6) Line 154: How many SNPs were in the vcf before filtering?

      There were 1,042,186 SNPs before filtering. This information was added to the methods (Line 168).

      (7) Line 156: Why is QUAL filtered at 10000? This seems extremely high.  (I could be mistaken, but often QUAL above 50 or so is already fine, why discard everything below 10000?). What is the range of QUAL scores in  your vcf?

      We used the QUAL > 10000 filter to make our analyses less computationally intensive while keeping enough relevant genetic information. We agree that filtering on such extremely high QUAL values adds little above a certain threshold, as they translate into infinitesimally low probabilities (10<sup>-(QUAL/10)</sup>) of the variant call being wrong. We therefore decided to require instead a minimal population minor allele frequency (MAF) of 0.01 to keep a variant, as this makes the IBD calculation more accurate (Taylor et al., 2019). The variant filtering was carried out with the MAF > 0.01 filter, resulting in 27,577 filtered SNPs with a minimal QUAL of 132. With a cutoff of 3000 available SNPs, we retrieved all 199 genomes previously obtained with the QUAL > 10000 condition. The methods have been updated accordingly (Lines 166-170).
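      The relationship between QUAL and the probability of a wrong call quoted in this response can be checked numerically. The following is a minimal sketch; the function name is hypothetical, and the only assumption is that QUAL is Phred-scaled as the formula above states.

```python
def qual_to_error_probability(qual: float) -> float:
    """Convert a Phred-scaled QUAL score into the probability that the call is wrong."""
    return 10 ** (-qual / 10)


# QUAL = 50 is the value the reviewer suggests is usually sufficient;
# QUAL = 132 is the minimal QUAL retained after the MAF > 0.01 filter.
for qual in (50, 132):
    print(f"QUAL {qual}: error probability {qual_to_error_probability(qual):.3g}")
```

      Even the lowest retained QUAL corresponds to an error probability far below any practical threshold, which is consistent with dropping the QUAL > 10000 condition.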

      (8) Line 161-165: How did you handle the mixed alleles in the hmmIBD  analysis for the WGS data? Did you set them as 0 as you do later on for  the consensus barcode?

      Mixed alleles and missing data were ignored. This translated into a value of -1 for the hmmIBD matrix and not 0 as we incorrectly stated previously. We updated our manuscript with this correct information (Lines 173-175).

      (9) Line 168-171: How many SNPs do you have in the WGS dataset after all the filtering steps? If the aim of the IBD with WGS was to validate the IBD-analysis with the barcode, wouldn't it make sense to have at least  200 loci (as shown in Taylor et al to be required for hmmIBD) in the WGS data? What proportion of comparisons were there with only 100 pairs of  loci? This seems like really few SNPs from WGS data.

      There were 27,577 SNPs overall in the 199 high quality genomes. In our analysis, we make the distinction between comparable and informative loci. For a pair of loci to be comparable, both have to be homozygous. To be informative, they must be comparable and at least one of them must correspond to the minor allele in the population. We borrowed this term and definition from the hmmIBD software, which directly reports the number of informative loci per pair. By keeping pairs with at least 100 informative SNPs, we aimed to reduce the number of samples that appear artificially related because only population major alleles are being compared. Pairs of genomes had between 1073 and 27466 of these, well above the 200 loci recommended in Taylor et al. (2019). We added more details on comparable and informative sites (Lines 152-160).
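      The definitions of comparable and informative loci given in this response can be sketched as follows. This is an illustration only: the single-character encoding ("N" for a mixed call, "X" for missing data), the function names, and the default thresholds (taken from the barcode cutoffs of 30 comparable and 10 informative loci mentioned in this response letter) are assumptions, not the authors' code.

```python
from typing import Sequence, Tuple

MIXED, MISSING = "N", "X"  # assumed placeholder symbols for non-homozygous calls


def count_loci(barcode_a: str, barcode_b: str,
               minor_alleles: Sequence[str]) -> Tuple[int, int]:
    """Return (comparable, informative) locus counts for one pair of barcodes."""
    comparable = informative = 0
    for a, b, minor in zip(barcode_a, barcode_b, minor_alleles):
        # Comparable: both samples carry a homozygous call at this locus.
        if a in (MIXED, MISSING) or b in (MIXED, MISSING):
            continue
        comparable += 1
        # Informative: comparable, and at least one call is the population minor allele.
        if a == minor or b == minor:
            informative += 1
    return comparable, informative


def keep_pair(comparable: int, informative: int,
              min_comparable: int = 30, min_informative: int = 10) -> bool:
    """Apply minimum-count cutoffs to decide whether a pair is retained."""
    return comparable >= min_comparable and informative >= min_informative
```

      For the genome-based comparison, the same function could be called with `min_informative=100`, the cutoff stated above.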

      (10) Line 178: why remove the 12 loci that are absent from the WGS? Are  these loci also poorly genotyped in the spotmalaria panel?

      As our goal is to validate the reliability of molecular genotyped SNPs, these 12 loci have to be removed, especially because we did find a consistent discrepancy between genotyped and WGSed SNPs, which cannot be tested if these SNPs are absent from the genomes.

      (11) Line 180-182: What do you mean by this sentence: "Genomic barcodes  are built using different cutoffs of within-sample MAF and aligned  against molecular barcodes from the same isolates." Is this the analysis presented in the supplementary figure and resulting in the cut-off of  MAF 0.2? Please clarify.

      A locus where both alleles are called can result either from the presence of two distinct haploid genomes or from an error occurring during sequencing data acquisition or processing. To distinguish between the two, we empirically determined the cutoff of within-sample MAF above which the locus can be considered heterozygous and below which only the major allele is kept. The corresponding figure was indeed Figure S2 (referenced in the next sentence, Lines 192-195). We clarified our approach in the methods (Lines 190-192) and in the legends of Figures S2 and S3.
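      The cutoff logic described in this response can be sketched as below. The use of raw read counts to compute the within-sample MAF and the function name are assumptions made for illustration; the default cutoff of 0.2 is the value quoted in the reviewer's comment.

```python
def call_locus(ref_reads: int, alt_reads: int, maf_cutoff: float = 0.2) -> str:
    """Call one biallelic locus from read counts using a within-sample MAF cutoff."""
    total = ref_reads + alt_reads
    if total == 0:
        return "missing"
    within_sample_maf = min(ref_reads, alt_reads) / total
    if within_sample_maf > maf_cutoff:
        # Both alleles are well supported: treat the locus as heterozygous (mixed).
        return "mixed"
    # Minor allele is attributed to error: keep only the major allele.
    return "ref" if ref_reads >= alt_reads else "alt"
```

      Raising `maf_cutoff` makes the call more tolerant of low-frequency alleles, which is the parameter the authors report tuning empirically against the molecular barcodes.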

      (12) Line 191: How often was there a mismatch between WGS and SNP barcode?

      We added a panel (Figure S3B) showing the average agreement of each SNP between molecular genotyping and WGS. We highlighted the 21 discrepant SNPs showing a lower agreement only for samples collected after May 2016.

      (13) Line 201-204: This part is unclear (as above for the WGS): did you  include sample pairs with more than 10 paired loci? But isn't 10 loci  way too few to do IBD analysis?

      We included pairs of samples with at least 30 comparable loci and 10 informative paired loci (refer to our answer to comment 9 for the difference between the two). We added more details regarding comparable and informative sites (Lines 152-160). Indeed, using fewer than 200 loci leads to an IBD estimation that is on average off by 0.1 or more (Taylor et al., 2019). However, we showed that the barcode relatedness classification based on a cutoff of IBD (related when above 0.5, unrelated otherwise) was close enough to our gold standard using genomes (each pair having more than 1000 comparable sites). Because we use this classification approach rather than the exact value of barcode-estimated IBD in our study, our cutoff of 30 minimum comparable sites seems sufficient.

      (14) Lines 206-207: which program did you use to analyse Fws?

      We did not use any program; we computed Fws following the method of Manske et al. (2012).
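      For readers unfamiliar with the metric, Fws is commonly written as 1 − Hw/Hs, where Hw is the within-sample heterozygosity and Hs the heterozygosity of the local population. The sketch below is a simplified illustration under that definition: it pools all biallelic loci into a single ratio, whereas the published method estimates the relationship across allele-frequency bins. It is not the authors' implementation, and the function names are hypothetical.

```python
from typing import Sequence


def heterozygosity(allele_frequency: float) -> float:
    """Expected heterozygosity of a biallelic locus: 1 - (p^2 + q^2) = 2pq."""
    return 2.0 * allele_frequency * (1.0 - allele_frequency)


def fws(within_sample_freqs: Sequence[float],
        population_freqs: Sequence[float]) -> float:
    """Simplified Fws = 1 - Hw / Hs, pooling heterozygosity over all loci."""
    hw = sum(heterozygosity(p) for p in within_sample_freqs)
    hs = sum(heterozygosity(p) for p in population_freqs)
    if hs == 0:
        raise ValueError("population heterozygosity is zero; Fws is undefined")
    return 1.0 - hw / hs
```

      Under this definition, a sample with no within-host diversity gives a value of 1, and a sample as diverse as the whole population gives a value of 0.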

      (15) Line 233: "we attempted parasite genotyping and whole genome  sequencing of 522 isolates over 16 time points" => This is confusing, you did not do WGS of 522 samples, only 199 as mentioned in the next  sentence.

      We attempted whole genome sequencing on 331 isolates and molecular genotyping on 442 isolates with 251 isolates common between the two methods. We updated our text to clarify this point (Lines 247-252).
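      The three counts in this response reconcile with the 522 isolates quoted by the reviewer through simple inclusion-exclusion, as the short check below shows (variable names are illustrative).

```python
wgs_attempted = 331         # isolates with attempted whole genome sequencing
genotyping_attempted = 442  # isolates with attempted molecular genotyping
both_methods = 251          # isolates common to the two methods

# Isolates counted by either method, without double-counting the overlap.
total_isolates = wgs_attempted + genotyping_attempted - both_methods
print(total_isolates)
```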

      (16) Lines 256-259: Add a range of proportions or some other summary  statistic in this section as you are only referring here to  supplementary figures to support these statements.

      The text has been updated (Lines 271-274).

      (17) Line 260: check the formatting of the reference "Collins22" as the rest of the document references are numbered.

      Fixed.

      (18) Figure 2/3:

      a. You could also inspect relatedness at the temporal level, by  adjusting the network figure where the color is village and shape is  time (month/year).

      Although visualising the effect of time on the parasite relatedness network would be a valuable addition, we did not find any intuitive and simple way of doing so. Using shapes to represent time might end up being more confusing than helpful, especially because the sampling was not done at fixed intervals.

      b. To further support the statement of clustering at the household  level, it might be useful to add a (supplementary) figure with the  network with household number/IDs as color or shape. In the network,  there seems to be a lot of relatedness within the villages and between  villages. Perhaps looking only at the distribution of the proportion of  highly related isolates is simplifying the data too much. Besides, there is no statistical difference between clustering at the household vs  within-village levels as indicated in Figure 3.

      Unfortunately, there are too many households (71 in Figure 2) to make a figure with one color or shape per household readable. The statistical test of the difference between the within household and within village relatedness yielded a p value above the cutoff of 0.05 (p value of 0.084). However, it is possible that the lack of significance arises from the relatively low number of data points available in the “within household” group. This is even more plausible considering the statistical difference of both “within household” and “within village” groups with “between village” group. Overall, our results indicate a decreasing parasite relatedness with spatial distance, and that more investigation would be needed to quantify the difference between “within household” and “within village” groups. 

      (19) Figure 4: Please add more description in the caption of this figure to help interpret what is displayed here. Figure 4A is hard to  interpret and does not seem to show more than is already shown in Figure 3A. What do the dots represent in Figure 4B? It is not clear what is  presented here.

      Compared to Figure 3A, Figure 4A enables the visualization of the relatedness between each individual pair of time points, which are later used in the comparison of relatedness between seasonal groups in Figure 4B. For this reason, we believe that Figure 4A should remain in the manuscript. However, we agree that the relationship between Figure 4A and Figure 4B is not intuitive in the way we presented it initially. For this reason, we added more details in the legend and modified Figure 4A to highlight the seasonal groups used in Figure 4B. 

      (20) Line 360-361: what did you do when haplotypes were not identical?

      We explained it in the methods section (Lines 144-146): in this case, only WGS haplotypes were kept.

      (21) Section chronic infections: it is important to mention that the  majority of chronic infections are individuals from the monthly  dry-season cohort.

      We added a statement about the 21 chronically infected individuals that were also part of the December 2016 – May 2017 monthly follow-up (Lines 423-426).

      (22) Lines 381-386: Did you investigate COI in these individuals? Could  it be co-circulating strains that you do not pick up at all times due to the consensus barcodes and discarding of mixed genotypes (and does not  necessarily show intra-host competition. That is speculation and should  perhaps not be in the results)?

      This is exactly what we think is happening. Due to the very nature of genotyping, only one strain may be observed at a time in the case of a co-infection, where distinct but related strains are simultaneously present in the host. The picked-up strain is typically the one with the highest relative abundance at the time of sampling. As the reviewer stated, fluctuation of strain abundance might not only be due to intra-host competition but also asynchronous development stages of the two strains. We added this observation to the manuscript (Lines 432-435).

      (23) Figure 6: highlight the samples where the barcode was not available in a different color to be able to see the difference between a non-matching barcode and missing data.

      We thank the reviewer for this great suggestion. We have now added the available barcodes to Figure 6, along with their level of relatedness to the dominant genotype of each continuous infection.

      (24) Improve the discussion by adding a clear summary of the main  findings and their implications, as well as study-specific limitations.

      The Discussion has been updated with a paragraph summarizing the primary results (Lines 451-457).

      (25) Line 445: "implying that the whole population had been replaced in just one year "

      a. What do you mean by replaced? Did other populations replace the existing populations? I am not sure the lack of IBD is enough to show that the population changed/was replaced. Perhaps it is more accurate to say that the same population evolved. Nevertheless, other measures such as genetic diversity and genetic differentiation or population structure would be more suitable to strengthen these conclusions.

      We agree that “replaced” was the wrong term in this case. We rather intended to describe how the numerous recombinations between malaria parasites completely reshaped the same initial population which gradually displayed lower levels of relatedness over time. We updated the manuscript accordingly (Lines 507-512).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 260: Remove Collins 22.

      Fixed.

      (2) Lines 270-274: 73 + 213 = 286 not 284; sum of percentages is equal to 101%.

      The numbers are correct: the 73 barcodes identical (IBD >= 0.9) to another barcode are a subset of the 213 related (IBD >= 0.5) to another barcode. However, we agree that this might be confusing, and we now consider barcodes to be related if they have an IBD between 0.5 and 0.9, while excluding those with an IBD >= 0.9. The text has been updated (Lines 299-301).
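      The revised, mutually exclusive categories can be expressed as a small classification rule. This is a sketch based only on the thresholds stated in this response; the function and label names are illustrative.

```python
def classify_pair(ibd: float) -> str:
    """Assign a pairwise IBD value to one of three non-overlapping categories."""
    if ibd >= 0.9:
        return "identical"
    if ibd >= 0.5:
        return "related"
    return "unrelated"
```

      With this rule, a barcode pair is counted once, so the "identical" and "related" counts no longer overlap and their percentages can be summed directly.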

      (3) Section: "Independence of seasonality and drug resistance markers prevalence".

      The text has been revised and the supplementary figure is now a main figure.

      (4) For readers unaware of malaria control policy in the Gambia it would be helpful to have more details on the specifics of anti-malarial drug  administration.

      We added the drugs used in SMC (sulfadoxine-pyrimethamine and amodiaquine) and the first line antimalarial treatment in use in The Gambia during our study (Coartem) (Lines 383-388).

      Reviewer #3 (Recommendations for the authors):

      (1) The abstract is not as clear as the authors' summary. For example, I found the sentence starting with "with 425 P. falciparum..." hard to  follow.

      The abstract has been updated.

      (2) It is better to consistently use "barcode genotyping "or "genotyping by barcode". Sometimes "molecular genotyping" is used instead of  "barcode genotyping"

      We have now replaced all occurrences of “barcode genotyping” with “molecular genotyping” or “molecular barcode genotyping”. We prefer to stick with “molecular genotyping” as this lets us distinguish between the molecular and the genomic barcode.

      (3) The introduction is quite disjointed and does not provide a clear build-up to the gap in knowledge that the study is attempting to fill. Please revise.

      The introduction has now been thoroughly revised.

      (4) Line 31 "with notable increase of parasite differentiation" is an interpretation and not an observation.

      We have modified that sentence (Lines 31-33).

      (5) Overall, the introduction requires substantial revision.

      The introduction has now been thoroughly revised.

      (6) Line 70 "parasite population adapts..." I thought this required phenotypic analysis and not genetics?

      The idea is that a population of parasites may adapt to environmental conditions (such as seasonality) through selection of the fittest genotypes. For instance, antimalarial exposure selects parasites with specific mutations in drug resistance related genes, and this effect even appears to be transient (for example with chloroquine). As such, there is good reason to think that seasonality might have a similar effect on parasite genetics.

      (7) Line 129-130: the #442 is not reflected in the schematic Figure 1.

      This is an intentional choice to keep the figure concise. For this reason, we included Figure S1, which provides more details on the data collection and analysis pipeline.

      (8) Line 242-243: "Made with natural earth". What is this?

      This is a statement acknowledging the use of Natural Earth data to produce the map presented in Figure 1A.

      (9) Line 260: "collins22", is this a reference?

      Fixed.

      (10) Line 269-70. Very hard to follow. Please revise.

      We changed the text (Lines 293-297).

      (11) Line 324: similarly... I think there is a typo here.

      We did not find any typo in this specific sentence. However, “Similarly to Figure 3” sounds maybe a bit off, so we changed it to “As in Figure 3” (Line 351).

      (12) Line 332-334: very hard to follow. Please revise. Again, the lower parasite relatedness during the transition from low to high was linked to recombination occurring in the mosquito but what about infection burden shifting to naive young children? Is there a role for host immunity in the observed reduction in parasite-relatedness during the transition period?

      This text has been rewritten (Lines 356-361).

      About the hypothesis of infection burden shifting to naïve young children, this question is difficult to address in The Gambia because children under 5 years old received Seasonal Malaria Chemoprophylaxis during the high transmission season. In older children (6-15 years old), the prevalence was similar to adults (Fogang et al., 2024).

      About the role of host immunity on parasite relatedness across time and space, our dataset is too small to divide it into different age groups. Further studies should address this very interesting question.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K<sup>+</sup>. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCl in the extracellular fluid depolarizes or hyperpolarizes the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1, in the supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCl concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCl, the NaCl molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCl in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mV. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCl concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCl solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      [Reviewer 1, Comment 1] While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCl) but with a different mechanism, for which they provide definite proof. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCl depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same time course as the KCl depolarization changes (and vice versa for the hyperosmotic changes, which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two-compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991)). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      [Reviewer 1, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study and also take into account Reviewer 2’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Finally, when [K<sup>+</sup>]-induced membrane potential changes are involved, factors other than cell volume changes appear to influence T<sub>2</sub> changes. Our follow-up study shows that the volume change associated with a given T<sub>2</sub> change differs between two situations: pure osmotic volume changes and [K<sup>+</sup>]-induced volume changes. For example, for the same T<sub>2</sub> change, the volume change for depolarization is greater than the volume change for hypoosmotic conditions. We will present these results at the upcoming ISMRM 2025 meeting and are also preparing a manuscript to report them shortly.

      [Reviewer 1, Comment 2] So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.

      [Reviewer 1, Response 2] In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neuron cells during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in Response 1 above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly mentioned as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T<sub>2</sub> and PSR) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in Response 1 above.

      There are a few smaller issues that should be addressed.

      [Reviewer 1, Comment 3] (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      [Reviewer 1, Response 3] We appreciate the reviewer’s suggestion regarding imaging sequences. In fact, we used dictionaries for fitting in vivo T<sub>2</sub> decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T<sub>2</sub> maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      [Reviewer 1, Comment 4] (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      [Reviewer 1, Response 4] The T<sub>2</sub> decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. These features are inherent to the pulse sequence rather than artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T<sub>2</sub> decay curve using the technique developed by McPhee and Wilman (2017).

      [Reviewer 1, Comment 5] (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      [Reviewer 1, Response 5] We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently remote from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We described the imaging slice more clearly in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We clarified this point in the revised manuscript to avoid any misunderstanding.
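      The reviewer's concern in Comment 5, that a monoexponential fit to a two-compartment signal reports a population-weighted value, can be illustrated with a short numerical sketch. It assumes two water pools with no exchange between them, and every number in it (pool fraction, T2 values, echo times) is hypothetical and chosen only for illustration; none comes from the manuscript.

```python
import math


def biexponential_signal(te_ms: float, pellet_fraction: float,
                         t2_pellet_ms: float, t2_medium_ms: float) -> float:
    """Signal from two non-exchanging water pools, normalized to 1 at TE = 0."""
    return (pellet_fraction * math.exp(-te_ms / t2_pellet_ms)
            + (1.0 - pellet_fraction) * math.exp(-te_ms / t2_medium_ms))


def apparent_t2(te_early_ms: float, te_late_ms: float, **pools: float) -> float:
    """Two-point monoexponential T2 estimate between two echo times."""
    s_early = biexponential_signal(te_early_ms, **pools)
    s_late = biexponential_signal(te_late_ms, **pools)
    return (te_late_ms - te_early_ms) / math.log(s_early / s_late)


# Hypothetical pool sizes and relaxation times, for illustration only.
pools = dict(pellet_fraction=0.7, t2_pellet_ms=80.0, t2_medium_ms=500.0)
early = apparent_t2(10.0, 40.0, **pools)
late = apparent_t2(200.0, 400.0, **pools)
print(f"apparent T2 from early echoes: {early:.1f} ms")
print(f"apparent T2 from late echoes:  {late:.1f} ms")
```

      In this toy model the apparent T2 always lies between the two compartment values and drifts toward the longer one at later echo times, which is why the slice position and the number of sampled echo times both matter for interpreting a single fitted T2.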

      [Reviewer 1, Comment 6] (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCl solutions that I could find.

      [Reviewer 1, Response 6] As requested by the reviewer, we included the absolute values in the supplementary information.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K<sup>+</sup> and Ba<sup>2+</sup> concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests remarkably that one of the simplest and most widely used MRI contrast mechanisms-T2 weighted imaging-may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      [Reviewer 2, Comment 1] The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      [Reviewer 2, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). In the revised manuscript, we now also discuss the work of both Stroman et al. and Phi Van et al. when addressing the contribution of cell volume changes to the observed MR parameter changes.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in the Response 1 to Reviewer 1’s Comment 1 above.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and also consider the reviewer’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      [Reviewer 1, Comment 7] The manuscript is well written. One thing to emphasize early on is that the KCL depolarization is done in an equimolar (or isotonic) manner. I was not clear on this point until I got to the very end of the methods. This is a strength of the paper and should be presented earlier.

      [Reviewer 1, Response 7] In response to the reviewer’s suggestion, we have revised the manuscript to present the equimolar characteristic of our experiment earlier.

      [Reviewer 1, Comment 8] In terms of experiments, the relaxation time measurements are not well constructed. They should be done with a CPMG sequence with hundreds of echos and properly curve fit. This is entirely possible on a Bruker spectrometer.

      [Reviewer 1, Response 8] As noted in our Response to Reviewer 1’s Comment 3, while a CPMG sequence with numerous echoes and straightforward curve fitting can be effective, it is less feasible for in vivo experiments. Our multi-echo spin-echo sequence was a balanced approach between spatial resolution, reasonable scan duration, and the need to localize signals within specific regions of interest.

      [Reviewer 1, Comment 9] Measurements of cell swelling should be done to determine the time course of the cell swelling. This could be with NMR (CPMG) or with light scattering. For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity.

      [Reviewer 1, Response 9] We acknowledge the importance of further research to strengthen the claims of this study through additional experiments such as cell volume recording. We will pursue this in future studies.

      As noted in our Response 2 to Reviewer 1’s Comment 2, this study does not address rapid membrane potential changes on the millisecond scale, and we acknowledge that establishing the precise timing of cell swelling is crucial for fully understanding the mechanisms of DIANA. Our current work demonstrates that MR parameters (e.g., T<sub>2</sub> and PSR) correlate strongly with membrane potential-modulating ionic environments, but it does not extend to millisecond-scale neural activation. We recognize the importance of further experiments, such as direct cell volume measurements, and plan to incorporate them in future studies to build on the insights gained from the present work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few comments, questions, and suggestions for improvement:

      [Reviewer 2, Comment 2] I could not find much information about the various incubation times and delays used for the authors' in vitro experiments. For each of the in vitro experiments in particular, how long were cells exposed to the stated ionic condition prior to imaging, and how long did the imaging take? Could this and any other relevant information about the experimental timing please be provided and added to the methods section?

      [Reviewer 2, Response 2] We have included the information about the preparation/incubation times in the revised manuscript. For the scan time, it was already stated in the original manuscript: 23 minutes for the single-echo spin-echo sequence and 23 minutes for the inversion-recovery multi-echo spin-echo, for a total of 46 minutes.

      [Reviewer 2, Comment 3] In what format were the cells used for patch clamping, and were any controls done to ensure that characteristics of these cells were the same as those pelleted and imaged in the MRI studies? How long were the incubation times with ionic solutions in the patch clamp experiment? This information should likewise be added to the paper.

      [Reviewer 2, Response 3] We have clarified in the revised manuscript that SH-SY5Y cells were patch clamp-measured in their adherent state. For the MRI experiments, by contrast, the cells were dissociated from the culture plate and pelleted, so the experimental environments were not entirely identical. The patch clamp experiments involved a 20–30 minute incubation period with the ionic solutions. We have included this information in the revised manuscript.

      [Reviewer 2, Comment 4] Can the authors provide information about the mean cell size observed under each condition in their in vitro experiments?

      [Reviewer 2, Response 4] We did not directly quantify the mean cell size for each in vitro condition in this study, so we do not have corresponding data. However, we acknowledge that this information could provide valuable insights into potential mechanisms underlying the observed MR parameter changes. In future experiments, we plan to include direct cell-size measurements to further elucidate how changes in cell volume or hydration contribute to our MR findings.

      [Reviewer 2, Comment 5] The ionic challenges used both in vitro and in vivo could also have affected cell permeability, with corresponding effects that would be detectable in diffusion weighted imaging. Did the authors examine this or obtain any results that could reflect on contributions of permeability properties to the contrast effects they report?

      [Reviewer 2, Response 5] We did not perform diffusion-weighted imaging and therefore do not have direct data regarding changes in cell permeability. We agree that incorporating diffusion-weighted measurements could help distinguish whether the MR parameter changes are driven primarily by membrane potential shifts, cell volume changes, or variations in permeability properties. We will consider these approaches in our future studies.

      [Reviewer 2, Comment 6] Clearly, a faster stimulation method such as optogenetics, in combination with time-locked MRI readouts of the pelleted cells, would be more effective at demonstrating a useful relationship between cellular neurophysiology and MRI contrast in vitro. Can the authors present data from such an experiment? Is there any information they can present that documents the time course of observed responses in their experiments?

      [Reviewer 2, Response 6] In the current study, our methodology did not include time-resolved or dynamic measurements. While it may be possible to obtain indirect information about the temporal dynamics using T<sub>2</sub>-weighted or MT-weighted imaging, such an experiment was beyond the scope of this work. However, we agree that an optogenetic approach with time-locked MRI acquisitions could help directly link cell physiology to MRI contrast, and we will explore this in future studies.

      [Reviewer 2, Comment 7] The authors used a drug cocktail to suppress hemodynamic effects in the experiments of Figs. 5-6. What evidence is there that this cocktail successfully suppresses hemodynamic responses and that it also preserves physiological responses to the ionic challenges used in their experiments? Were analogous in vivo results also obtained in the absence of the cocktail?

      [Reviewer 2, Response 7] We appreciate the reviewer’s concern regarding pharmacological suppression of hemodynamic effects. Although each component is known to inhibit nitric oxide synthesis, we did not directly measure the degree of hemodynamic suppression in this study. In addition, we cannot definitively confirm that these agents preserved the physiological responses to the ionic challenges. We have clarified these points in the revised manuscript and identified them as limitations of the study.

      [Reviewer 2, Comment 8] Why weren't PSR results reported as part of the in vivo experimental results in Fig. 5? Does PSR continue to vary inversely to T2 in these experiments?

      [Reviewer 2, Response 8] In our current experimental setup, acquiring the T<sub>2</sub> map four times required 48 minutes, and extending the scan to include additional quantitative MT measurements for PSR would have significantly prolonged the scanning session. Given that these experiments were conducted on acutely craniotomized rats, maintaining stable physiological conditions for such a long period of time was challenging. Therefore, due to time constraints, we did not perform MT measurements and focused on T<sub>2</sub> mapping.

      [Reviewer 2, Comment 9] The authors have established in vivo optogenetic stimulation paradigms in their laboratory and used them in the Toi et al. DIANA study. Were T2 or PSR changes observed in vivo using standard T2 measurement or T2-weighted imaging methods that do not rely on the DIANA pulse sequence they originally applied?

      [Reviewer 2, Response 9] Our current T<sub>2</sub> mapping experiments utilized a standard multi-echo spin-echo sequence, rather than the DIANA pulse sequence employed in our previous work. In this respect, the T<sub>2</sub> changes we observed in vivo do not rely on the specialized DIANA methodology.

      [Reviewer 2, Comment 10] In the discussion section, the authors state that to their knowledge, theirs "is the first report that changes in membrane potential can be detected through MRI." This cannot be true, as their own Toi et al. Science paper previously claimed this, and a number of the studies cited on p.2 also claimed to detect close correlates of neuroelectric activity. This statement should be amended or revised.

      [Reviewer 2, Response 10] We appreciate the reviewer’s comment. We have revised the discussion section of the manuscript to reflect the points raised by the reviewer.

      [Reviewer 2, Comment 11] Because the current study does not actually demonstrate that changes in membrane potential can be detected by MRI, the authors should alter the title, abstract, and a number of relevant statements throughout the text to avoid implying that this has been shown. The title, for instance, could be changed to "Responses to depolarizing and hyperpolarizing ionic solutions measured by magnetic resonance imaging of excitable cells and rat brains," or something along these lines.

      [Reviewer 2, Response 11] We appreciate the reviewer’s suggestions. We have revised the title, abstract, and relevant statements of the manuscript to clarify that our findings show MR-detectable responses to ionic solutions that are expected to modulate membrane potential, rather than demonstrating direct detection of membrane potential changes by MRI.

      [Reviewer 2, Comment 12] The axes in Fig. 3 seem to be mislabeled. I think the horizontal axes are supposed to be membrane potential measured in mV.

      [Reviewer 2, Response 12] We thank the reviewer for finding this error. We have corrected the axis labels in Figure 3 to indicate membrane potential (in mV) on the horizontal axis.

      [Reviewer 2, Comment 13] Since neither the experiments in Jurkat cells (Fig. 4) nor the in vivo MRI tests (Fig. 5-6) appear to have been made in conjunction with membrane potential measurements, it seems like a stretch to refer to these experiments as involving manipulation of membrane potentials per se. Instead, the authors should refer to them as involving administration of stimuli expected to be depolarizing or hyperpolarizing. The "hyperpolarization" and "depolarization" labels of Fig. 4 similarly imply a result that has not actually been shown, and should ideally be changed.

      [Reviewer 2, Response 13] To avoid any misleading implication that membrane potential changes were directly measured in Jurkat cells or in vivo, we have revised the relevant text and figure labels.

      [Reviewer 2, Comment 14] The changes in T2 and PSR documented with various K<sup>+</sup> challenges to Jurkat cells in Fig. 4 seem to follow a step-function-like profile that differs from the results reported in SH-SY5Y cells. Can the authors explain what might have caused this difference?

      [Reviewer 2, Response 14] We currently do not have a definitive explanation for why Jurkat cells exhibit a step-function-like response to varying K<sup>+</sup> levels, whereas SH-SY5Y cells show a linear response to log [K<sup>+</sup>]. Experiments that include direct membrane potential measurements in Jurkat cells would help clarify whether this difference arises from genuinely different patterns of depolarization/hyperpolarization or from other factors. We have revised the manuscript to address this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      This fascinating manuscript studies the effect of education on brain structure through a natural experiment. Leveraging the UK BioBank, these authors study the causal effect of education using causal inference methodology that focuses on legislation for an additional mandatory year of education in a regression discontinuity design. 

      Strengths: 

      The methodological novelty and study design were viewed as strong, as was the import of the question under study. The evidence presented is solid. The work will be of broad interest to neuroscientists.

      Weaknesses: 

      There were several areas which might be strengthened by additional consideration from a methodological perspective.

      We sincerely thank the reviewer for the useful input, in particular, their recommendation to clarify RD and for catching some minor errors in the methods (such as taking the log of the Bayes factors). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The fuzzy local-linear regression discontinuity analysis would benefit from further description. 

      (2) In the description of the model, the terms "smoothness" and "continuity" appear to be used interchangeably. This should be adjusted to conform to mathematical definitions. 

      We have now added to our explanations of continuity regression discontinuity. In particular, we now explain “fuzzy”, and add emphasis on the two separate empirical approaches (continuity and local-randomization), along with fixing our use of “smoothness” and “continuity”.

      results:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      (3) The optimization of the smoother based on MSE would benefit from more explanation and consideration. How was the flexibility of the model taken into account in testing? Were there any concerns about post-selection inference? A sensitivity analysis across bandwidths is also necessary. Based on the model fit in Figure 1, results from a linear model should also be compared. 

      It is common in the RD literature to illustrate plots with higher-order polynomial fits while inference is based on linear (or at most quadratic) models (Cattaneo, Idrobo & Titiunik, 2019). We agree that this field-specific practice can be confusing to readers. Therefore, we have redone Figure 1 using local-linear fits better aligning with our analysis pipeline. Yet, it is still not a one-to-one alignment as point estimation and confidence are handled robustly while our plotting tools are simple linear fits. In addition, we updated Sup. Fig 3 and moved 3rd-order polynomial RD plots to Sup. Fig 4.

      Empirical RD has many branching analytical decisions (bandwidth, polynomial order, kernel) which can have large effects on the outcome. Fortunately, RD methodology is starting to become more standardized (Cattaneo & Titiunik, 2022, Ann. Econ Rev) as there have been indications of publication bias using these methods (Stommes, Aronow & Sävje, 2023, Research and Politics; this paper suggests the cause is not researcher degrees of freedom but rather inappropriate inferential methods). While not necessarily ill-intended, researcher degrees of freedom and analytic flexibility are major contributors to publication bias. We (self) limited our analytic flexibility by using pre-registration (https://osf.io/rv38z).

      One of the most consequential analytic decisions in RD is the bandwidth size as there is no established practice, they are context-specific and can be highly influential on the results. The choice of bandwidths can be framed as a ‘bias vs. variance trade-off’. As bandwidths increase, variance decreases since more subjects are added yet bias (misspecification error/smoothing bias) also increases (as these subjects are further away and less similar). In our case, our assignment (running/forcing) variable is ‘date of birth in months’; therefore our smallest comparison would be individuals born in August 1957 (unaffected/no treatment) vs September 1957 (affected/treated). This comparison has the least bias (subjects are the most similar to each other), yet it comes at the expense of very few subjects (high variance in our estimate). 
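
      The bias side of this trade-off can be sketched with a small synthetic example. This is a simplified sharp local-linear RD with a triangular kernel, not the fuzzy design or the robust estimator used in our analysis; the function names, the synthetic outcome, and the bandwidths of 3, 12 and 24 months are all hypothetical. The outcome is deliberately curved on one side of the cutoff, so a straight-line fit becomes more misspecified as the bandwidth widens.

```python
def weighted_intercept(ds, ys, ws):
    """Fitted value at d = 0 of a weighted least-squares straight line."""
    sw = sum(ws)
    mean_d = sum(w * d for w, d in zip(ws, ds)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sdd = sum(w * (d - mean_d) ** 2 for w, d in zip(ws, ds))
    sdy = sum(w * (d - mean_d) * (y - mean_y) for w, d, y in zip(ws, ds, ys))
    slope = sdy / sdd
    return mean_y - slope * mean_d


def local_linear_rd(running, outcome, cutoff, bandwidth):
    """Sharp RD estimate: difference of the two one-sided local-linear fits."""
    left, right = [], []
    for r, y in zip(running, outcome):
        d = r - cutoff
        w = 1.0 - abs(d) / bandwidth  # triangular kernel weight
        if w > 0:
            (right if d >= 0 else left).append((d, y, w))
    fitted = []
    for side in (left, right):
        ds, ys, ws = zip(*side)
        fitted.append(weighted_intercept(ds, ys, ws))
    return fitted[1] - fitted[0]


TRUE_JUMP = 0.5  # hypothetical discontinuity at the cutoff


def synthetic_outcome(d):
    """Linear trend plus a jump, with extra curvature only after the cutoff."""
    curve = 0.001 * d * d if d >= 0 else 0.0
    return 10.0 + 0.02 * d + curve + (TRUE_JUMP if d >= 0 else 0.0)


# Running variable: birth month relative to the cutoff (0 = first treated month).
months = list(range(-60, 60))
outcomes = [synthetic_outcome(d) for d in months]

estimates = {h: local_linear_rd(months, outcomes, 0, h) for h in (3, 12, 24)}
for h, est in estimates.items():
    print(f"bandwidth={h:>2} months  estimate={est:.4f}  bias={est - TRUE_JUMP:+.4f}")
```

      Because the synthetic data are noise-free, only the smoothing bias is visible here; with real data the narrow bandwidths would also carry a much larger sampling variance, which is the other half of the trade-off.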

      MSE-derived bandwidths attempt to solve this issue by offering an automatic method to choose an analysis bandwidth in RD. Specifically, this aims to minimize the MSE of the local polynomial RD point estimator – effectively choosing a bandwidth by balancing the ‘bias vs. variance trade-off’ (explained in detail 4.4.2 Cattaneo et al., 2019 p 45 - 51 “A practical introduction to regression discontinuity designs: foundations”). Yet, you are very correct in highlighting potential overfitting issues as they are “by construction invalid for inference” (Calonico, Cattaneo & Farrell, 2020, p. 192). Quoting from Cattaneo and Titiunik’s Annual Review of Economics from 2022: 

      “Ignoring the misspecification bias can lead to substantial overrejection of the null hypothesis of no treatment effect. For example, back-of-the-envelope calculations show that a nominal 95% confidence interval would have an empirical coverage of about 80%.”

      Fortunately, modern RD analysis packages (such as rdrobust or RDHonest) calculate robust confidence intervals - for more details see Armstrong and Kolesar (2020). For a summary on MSE-bandwidths see the section “Why is it hard to estimate RD effects?” in Stommes and colleagues 2023 (https://arxiv.org/abs/2109.14526). For more in-depth handling see the Cattaneo, Idrobo, and Titiunik primer (https://arxiv.org/abs/1911.09511).

      Lastly, with MSE-derived bandwidths sensitivity tests only make sense within a narrow window of the MSE-optimized bandwidth (5.5 Cattaneo et al., 2019 p 106 - 107). When a significant effect occurs, placebo cutoffs (artificially moving the cutoff) and donut-hole analysis are great sensitivity tests. Instead of testing our bandwidths, we decided to use an alternate RD framework (local randomization) in which we compare 1-month and 5-month windows. Across all analysis strategies, MRI modalities, and brain regions, we do not find any effects of the education policy change ROSLA on long-term neural outcomes.

      (4) In the Bayesian analysis, the authors deviated from their preregistered analytic plan. This whole section is a bit confusing in its current form - for example, point masses are not wide but rather narrow. Bayes factors are usually estimated; it is unclear how or why a prior was specified. What exactly is being modeled using a prior? Also, throughout - If the log was taken, as the methods seem to indicate for the Bayes factor, this should be mentioned in figures and reported estimates. 

      First, we would like to thank you for spotting that we incorrectly kept the log in the methods. We have fixed this and added the following sentence to the methods: 

      “Bayes factors are reported as BF<sub>10</sub> in support of the alternative hypothesis, we report Bayes factors under 1 as the multiplicative inverse (BF<sub>01</sub> = 1/BF)”

      All Bayesian analyses need to have a prior. In practice, this becomes an issue when you’re uncertain about 1) the location of the effect (directionality & center mass, defined by a location parameter), yet more importantly, the 2) confidence/certainty of the range-spread of possible effects (determined by a scale parameter). In normally distributed priors these two ‘beliefs’ are represented with a mean and a standard deviation (the latter impacts your confidence/certainty on the range of plausible parameter space). 

      Supplementary figure 6 illustrates several distributions (location = 0 for all) with varying scale parameters; when used as Bayesian priors this indicates differing levels of confidence in our certainty of the plausible parameter space. We illustrate our three reported, normally distributed priors centered at zero in blue with their differing scale parameters (sd = .5, 1 & 1.5).

      All of these five prior distributions have the same location parameter (i.e., 0) yet varying differences in the scale parameter – our confidence in the certainty of the plausible parameter space. At first glance it might seem like a flat/uniform prior (not represented) is a good idea – yet, this would put equal weight on the possibility of every estimate thereby giving the same probability mass to implausible values as plausible ones. A uniform prior would, for instance, encode the hypothesis that education causing a 1% increase in brain volume is just as plausible as it causing either a doubling or halving in brain volume. In human research, we roughly know a range of reasonable effect sizes and it is rare to see massive effects.

      A benefit of ‘weakly-informative’ priors is that they limit the range of plausible parameter values. The default prior in STAN (a popular Bayesian estimation program; https://mc-stan.org) is a normally distributed prior with a mean of zero and an SD of 2.5 (seen in orange in the figure; our initial preregistered prior). This large standard deviation easily permits positive and negative estimates putting minimal emphasis on zero. Contrast this to BayesFactor package’s (Morey R, Rouder J, 2023) default “wide” prior which is the Cauchy distribution (0, .7) illustrated in magenta (for more on the Cauchy see: https://distribution-explorer.github.io/continuous/cauchy.html). 

      These different defaults reflect differing Bayesian philosophical schools (‘estimate parameters’ vs ‘quantify evidence’ camps); if your goal is to accurately estimate a parameter it would be odd to have a strong null prior, yet (in our opinion) when estimating point-null BFs a wide default prior gives far too much evidence in support of the null. In point-null BF testing the Savage-Dickey density ratio is the ratio between the height of the prior at 0 and the height of the posterior at zero (see Figure under section “testing against point null 0”). This means BFs can be very prior sensitive (seen in SI tables 5 & 6). For this reason, we thought it made sense to do prior sensitivity testing. To ensure our conclusions in favor of the null were not caused solely by an overly wide prior (preregistered orange distribution), we decided to report the 3 narrower priors (blue ones).
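
      The prior sensitivity of a point-null Bayes factor can be sketched with a conjugate normal model, in which the Savage-Dickey ratio has a closed form. This is an illustration only, not our analysis code: the effect estimate (0.05), its standard error (0.10), and the function names are hypothetical, and a normal likelihood with known variance is assumed. The prior standard deviations are the three reported ones (0.5, 1, 1.5) plus the preregistered 2.5.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(estimate, std_error, prior_sd):
    """BF01 = posterior density at 0 / prior density at 0.

    Assumes a Normal(0, prior_sd) prior on the effect and a normal
    likelihood summarised by an estimate and its standard error.
    """
    prior_var = prior_sd ** 2
    like_var = std_error ** 2
    post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
    post_mean = post_var * estimate / like_var
    posterior_at_zero = normal_pdf(0.0, post_mean, math.sqrt(post_var))
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    return posterior_at_zero / prior_at_zero


estimate, std_error = 0.05, 0.10  # hypothetical standardised effect and SE
bf01_by_prior = {
    sd: savage_dickey_bf01(estimate, std_error, sd) for sd in (0.5, 1.0, 1.5, 2.5)
}
for sd, bf01 in bf01_by_prior.items():
    print(f"prior sd={sd:<4} BF01={bf01:6.2f}  BF10={1.0 / bf01:.3f}")
```

      For the same data, the evidence for the null grows as the prior widens, because a wider prior places less density at zero. This is why the wider preregistered prior favours the null more strongly than the three narrower ones.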

      Alternative Bayesian null hypotheses testing methods such as using Bayes Factors to test against a null region and ‘region of practical equivalence testing’ are less prior sensitive, yet both methods demand the researcher (e.g. ‘us’) to decide on a minimal effect size of practical interest. Once a minimal effect size of interest is determined any effect within this boundary is taken as evidence in support of the null hypothesis.

      (5) It is unclear why a different method was employed for the August / September data analysis compared to the full-time series. 

      We used a local-randomization RD framework, an entirely different empirical framework than continuity methods (resulting in a different estimate). For an overview see the primer by Cattaneo, Idrobo & Titiunik 2023 (“A Practical Introduction to Regression Discontinuity Designs: Extensions”; https://arxiv.org/abs/2301.08958).

      A local randomization framework is optimal when the running variable is discrete (as in our case with DOB in months) (Cattaneo, Idrobo & Titiunik 2023). It makes stronger assumptions on exchangeability therefore a very narrow window around the cutoff needs to be used. See Figure 2.1 and 2.2 (in the Cattaneo, Idrobo & Titiunik 2023) for graphical illustrations of 1) a randomized experiment, 2) a continuity RD design, and 3) local-randomization RD. Using the full-time series in a local randomization analysis is not recommended as there is no control for differences between individuals as we move further away from the cutoff – making the estimated parameter highly endogenous.
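
      The logic of local randomization can be sketched as a comparison of group means within a narrow window around the cutoff, treating assignment inside that window as if it were randomized. The sketch below uses a frequentist permutation test rather than the Bayesian models we report, so it is a swapped-in technique for illustration; the function name, the synthetic data, and the sample size of 15 people per birth month are hypothetical.

```python
import random


def local_randomization_test(running, outcome, cutoff, window, n_perm=2000, seed=0):
    """Difference in means within `window` months of the cutoff, with a
    permutation p-value obtained by reshuffling treatment labels."""
    treated = [y for r, y in zip(running, outcome) if 0 <= r - cutoff < window]
    control = [y for r, y in zip(running, outcome) if -window <= r - cutoff < 0]

    def mean(values):
        return sum(values) / len(values)

    observed = mean(treated) - mean(control)
    pooled = treated + control
    n_treated = len(treated)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data: 15 people per birth month, months -5..4 relative to the cutoff.
running = [m for m in range(-5, 5) for _ in range(15)]
outcome = [float(i % 7) for i in range(len(running))]

for window in (1, 5):  # 1-month and 5-month windows on each side
    diff, p_value = local_randomization_test(running, outcome, 0, window)
    print(f"window={window} month(s): difference={diff:+.3f}, p={p_value:.3f}")
```

      Widening the window adds observations but weakens the as-if-random assumption, which is the same bias versus variance tension that arises when choosing a bandwidth in the continuity framework.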

      We understand how it is confusing to have both a new framework and Bayesian methods (we could have chosen a fully frequentist approach) but using a different framework allows us to weigh up the aforementioned ‘bias vs variance tradeoff’ while Bayesian methods allow us to say something about the weight of evidence (for or against) our hypothesis.

      (6) Figure 1 - why not use model fits from those employed for hypothesis testing? 

      This is a great suggestion (ties into #3), we have now redone Figure 1.

      (7) The section on "correlational effect" might also benefit from additional analyses and clarifications. Indeed, the data come from the same randomized experiment for which minimum education requirements were adjusted. Was the only difference that the number of years of education was studied as opposed to the cohort? If so, would the results of this analysis be similar in another subsample of the UK Biobank for which there was no change in policy?

      We have clarified the methods section for the correlational/associational effect. This was the same subset of individuals for the local randomization analysis; all we did was change the independent variable from an exogenous dummy-coded ROSLA term (where half of the sample had the natural experiment) to a continuous (endogenous) educational attainment IV. 

      In principle, the results from the associational analysis should be exactly the same if we use other UK Biobank cohorts. To see if the association of education attainment with the global neuroimaging cohorts was similar across sub-cohorts of new individuals, we conducted post hoc Bayesian analysis on eight more subcohort of 10-month intervals, spaced 2 years apart from each other (Sup. Figure 7; each indicated by a different color). Four of these sub-cohorts predate ROSLA, while the other four are after ROSLA. Educational attainment is slowly increasing across the cohorts of individuals born from 1949 until 1965; intriguingly the effect of ROSLA is visually evident in the distributions of educational attainment (Sup. Figure 7). Also, as seen in the cohorts predating ROSLA more and more individuals were (already) choosing to stay in education past 15 years of age (see cohort 1949 vs 1955 in Sup. Figure 7).

      Sup. Figure 8 illustrates boxplots of the educational attainment posterior of the eight sub-cohorts in addition to our original analysis (s1957) using a normally distributed prior with a mean of 0 and a sd of 1. Total surface area shows a remarkably replicable association with educational attainment. Yet, it is evident the “extremely strong” association we found for CSF was a statistical fluke – as the posterior of other cohorts (bar our initial test) crosses zero. The conclusions for the other global neuroimaging covariates where we concluded ‘no associational effect’ seem to hold across cohorts.
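
      As a rough illustration of what "the posterior crosses zero" means under a normal prior with mean 0 and sd 1, the sketch below combines that prior with a single coefficient estimate using a conjugate normal approximation. This is not the authors' analysis pipeline: the function names, the input estimates, and the 95% interval are assumptions chosen only for illustration.

```python
from statistics import NormalDist

def normal_posterior(estimate, std_error, prior_mean=0.0, prior_sd=1.0):
    """Conjugate normal update: combine a normal prior with a normal likelihood.

    `estimate` and `std_error` are hypothetical inputs (e.g. a standardized
    regression coefficient and its standard error), not values from the study.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, post_var ** 0.5

def interval_crosses_zero(post_mean, post_sd, mass=0.95):
    """Return True if the central credible interval contains zero."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    lower, upper = post_mean - z * post_sd, post_mean + z * post_sd
    return lower < 0.0 < upper

# Hypothetical cohort with an imprecise estimate versus one with a precise estimate.
weak = normal_posterior(estimate=0.2, std_error=1.0)
strong = normal_posterior(estimate=0.5, std_error=0.1)
print(interval_crosses_zero(*weak), interval_crosses_zero(*strong))
```

      Repeating such a check across replication cohorts is one simple way to see whether an association holds up beyond the original sample.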

      We have now added methods, deviation from preregistration, and the following excerpt to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity analysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood. 

      Strengths: 

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative analyses). 

      Weaknesses: 

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome is measured long after the intervention. Thus a null finding here is completely consistent with educational attainment (EA) in fact having an impact on brain structure, where EA may reflect elements of training after secondary education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no. 

      The work also does not address the impact of the UK Biobank's well-known healthy volunteer bias (Fry et al., 2017) which is yet further magnified in the imaging extension study (Littlejohns et al., 2020). Under-representation of people with low EA will dilute the effects of EA and impact the interpretation of these results. 

      References: 

      Fry, A., Littlejohns, T. J., Sudlow, C., Doherty, N., Adamska, L., Sprosen, T., Collins, R., & Allen, N. E. (2017). Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology, 186(9), 1026-1034. https://doi.org/10.1093/aje/kwx246 

      Littlejohns, T. J., Holliday, J., Gibson, L. M., Garratt, S., Oesingmann, N., Alfaro-Almagro, F., Bell, J. D., Boultwood, C., Collins, R., Conroy, M. C., Crabtree, N., Doherty, N., Frangi, A. F., Harvey, N. C., Leeson, P., Miller, K. L., Neubauer, S., Petersen, S. E., Sellors, J., ... Allen, N. E. (2020). The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nature Communications, 11(1), 2624. https://doi.org/10.1038/s41467-020-15948-9 

      We thank the reviewer for the positive comments and constructive feedback, in particular, their emphasis on volunteer bias in UKB (similar points were mentioned by Reviewer 3). We have now addressed these limitations with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We also highlighted it both in the results and methods.

      We appreciate that one year of education may seem modest compared to the entire educational trajectory, but as an intervention, we disagree that one year of education is ‘a very modest manipulation’. It is arguably one of the largest positive manipulations in childhood development we can administer. If we were to translate a year of education into the language of a (cognitive) intervention, it is clear that the manipulation, at least in terms of hours, days, and weeks, is substantial. Prior work on structural plasticity (e.g., motor, spatial & cognitive training) has involved substantially more limited manipulations in time, intensity, and extent. There is even (limited) evidence of localized persistent long-term structural changes (Woollett & Maguire, 2011, Curr. Biol.).

      We have now also highlighted the limited generalizability of our findings since we estimate a ‘local’ average treatment effect. It is possible higher education (college, university, vocational schools, etc.) could impact brain structure, yet we see no theoretical reason why it would while secondary education wouldn’t. Moreover, higher education is even trickier to research empirically due to heightened self and administrative selection pressures. While we cannot discount this possibility, the impacts of endogenous factors such as genetics and socioeconomic status are most likely heightened. That being said, higher education offers exciting possibilities to compare more domain-specific processes (e.g., by comparing a philosophy student to a mathematics student). Causality could be tested in European systems with point entry into field-specific programs – allowing comparison of students who just missed entry criteria into one topic and settled for another.

      Regarding the amount of time following the manipulation, as we highlight in our discussion this is both a weakness and a strength. Viewed from a developmental neuroplasticity lens it would have been nice to have imaging immediately following the manipulation. Yet, from an aging perspective, our design has increased power to detect an effect.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors assert there is no strong causal evidence for EA on brain structure. This overlooks work from Mendelian Randomisation, e.g. this careful work: https://pubmed.ncbi.nlm.nih.gov/36310536/ ... evidence from (good quality) MR studies should be considered. 

      We thank the reviewer for highlighting this well-done Mendelian randomization study. We have now added this citation and removed previous claims on the “lack of causal evidence existing”. We refrain from discussing Mendelian randomization, as it would need to be accompanied by a nuanced discussion on the strong limitations regarding EduYears-PGS in Mendelian randomization designs.

      (2) Tukey/Boxplot is a good name for your identification of outliers but your treatment of outliers has a well-recognized name that is missing: Winsorisation. Please add this term to your description to help the reader more quickly understand what was done. 

      Thanks, we have now added the term winsorized.
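
      For readers unfamiliar with the term, a minimal sketch of winsorization using Tukey's boxplot fences follows. The fence multiplier of 1.5, the function name, and the sample values are assumptions for illustration; the response does not state the exact settings used in the manuscript.

```python
import numpy as np

def winsorize_tukey(values, k=1.5):
    """Clip values beyond Tukey's fences to the fence values.

    k=1.5 is the conventional boxplot multiplier and is assumed here.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Outliers are kept in the sample but pulled in to the nearest fence.
    return np.clip(x, lower, upper)

# Hypothetical measurements with one extreme value.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
cleaned = winsorize_tukey(sample)
print(cleaned)
```

      Unlike trimming, this keeps every observation, so the sample size is unchanged while the influence of extreme values is bounded.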

      (3) Nowhere is it plainly stated that "fuzzy" means that you allow for imperfect compliance with the exposure, i.e. some children born before the cut-off stayed in school until 16, and some born after the cut-off left school before 16. For those unfamiliar with RD it would be very helpful to explain this at or near the first reference of the term "fuzzy". 

      We have now clarified the term ‘fuzzy’ in the results and methods:

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”
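
      The idea of imperfect compliance can be sketched with simulated data: the jump in the outcome at the cutoff is divided by the jump in the probability of treatment (a Wald-type estimate). Everything below is hypothetical, including the window width, the 90% baseline stay-on rate, the effect size of 0.5, and all variable names. It compares simple means on either side of the cutoff and does not model trends in the running variable, so it is not the continuity-based estimator used in the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Running variable: month of birth relative to the cutoff (hypothetical window).
months = rng.uniform(-12, 12, n)
after_cutoff = months >= 0

# Imperfect correspondence between assignment and treatment: before the cutoff,
# 90% already stay in school until 16; after it, everyone does (assumed values).
stayed = np.where(after_cutoff, True, rng.random(n) < 0.90)

# Simulated outcome with an assumed effect of 0.5 for those who stayed.
outcome = 0.5 * stayed + rng.normal(0.0, 1.0, n)

def wald_estimate(y, treated, assigned):
    """Ratio of the outcome jump to the treatment-probability jump."""
    intent_to_treat = y[assigned].mean() - y[~assigned].mean()
    first_stage = treated[assigned].mean() - treated[~assigned].mean()
    return intent_to_treat / first_stage, first_stage

late, first_stage = wald_estimate(outcome, stayed, after_cutoff)
print(f"first stage: {first_stage:.3f}, local average treatment effect: {late:.3f}")
```

      Because the first stage sits in the denominator, a small share of affected individuals makes the resulting estimate considerably noisier than the raw difference in outcomes.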

      (4) Supplementary Figure 2 never states what the percentage actually measures. What exactly does each dot represent? Is it based on UK Biobank subjects with a given birth month? If so clarify. 

      Fixed!

      Reviewer #3 (Public review): 

      Summary: 

      This study investigates evidence for a hypothesized, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume, and diffusivity. 

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year-olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality. 

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in primary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships. 

      Strengths: 

      (1) A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples. 

      (2) This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry, and intended analyses, with only minor, justifiable changes in the final analysis. 

      (3) The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education. 

      (4) The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others. 

      (5) The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area. 

      Weaknesses: 

      (1) This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year-old British students born around 1957 who also participated in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario. 

      (2) The authors address potential concerns about the validity of modelling assumptions and the sensitivity of the regression discontinuity design approach. However, the possibility of selection and cohort bias remains and is not discussed clearly in the paper. Other studies (e.g. Davies et al 2018, https://www.nature.com/articles/s41562-017-0279-y) have used the same policy intervention to study other health-related outcomes and have established ROSLA as a valid naturalistic experiment. Still, quoting Davies et al. (2018), "This assumes that the participants who reported leaving school at 15 years of age are a representative sample of the sub-population who left at 15 years of age. If this assumption does not hold, for example, if the sampled participants who left school at 15 years of age were healthier than those in the population, then the estimates could underestimate the differences between the groups.". Recent studies (Tyrrell 2021, Pirastu 2021) have shown that UK Biobank participants are on average healthier than the general population. Moreover, the imaging sub-group has an even stronger "healthy" bias (Lyall 2022). 

      (3) The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. Mentioned only briefly, the inclusion and exclusion of covariates in the model are not discussed in detail. Standard imaging confounds such as head motion and scanning site have been included but other factors (e.g. physical exercise, smoking, socioeconomic status, genetics, alcohol consumption, etc.) may also play a role. 

      We thank the reviewer for their numerous positive comments and have now attempted to address the first two limitations (generalizability and UKB bias) with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We further highlight this in the results section:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      Healthy volunteer bias can create two types of selection bias; crucially participation itself can serve as a collider threatening internal validity (outlined in van Alten et al., 2024; https://academic.oup.com/ije/article/53/3/dyae054/7666749). Natural experimental designs are partially sheltered from this major limitation, as ‘volunteer bias’ would have to differentially impact individuals on one side of the cutoff and not the other – thereby breaking a primary design assumption of regression discontinuity. Substantial prior work (including this article) has not found any threats to the validity of the 1973 ROSLA (Clark & Royer 2010, 2013; Barcellos et al., 2018, 2023; Davies et al., 2018, 2023). While the Davies 2018 article did IP-weight with the UK Biobank sample, Barcellos and colleagues 2023 (and 2018) do not, highlighting the following: “Although the sample is not nationally representative, our estimates have internal validity because there is no differential selection on the two sides of the September 1, 1957 cutoff – see Appendix A.”.

      The second (more acknowledged & arguably less problematic) type of selection bias results in threats to external validity (aka generalizability). As highlighted in your first point; this is a large limitation with every natural experimental design, yet in our case, this is further amplified by the UK Biobank’s healthy volunteer bias. We have now attempted to highlight this limitation in the discussion passage above.

      Point 3 – the inability to fully confirm design validity – is, again, an inherent limitation of a natural experimental approach. That being said, extensive prior work has tested different predetermined covariates in the 1973 ROSLA (cited within), and to our knowledge, no issues have been found. The 1973 ROSLA seems to be one of the better natural experiments around (there was also a concerted effort to have an ‘effective’ additional year; see Clark & Royer 2010). For these reasons, we stuck with only testing the variables we wanted to use to increase precision (also offering new neuroimaging covariates that didn’t exist in the literature base). One additional benefit of ROSLA was that the cutoff was decided years later on a variable (date of birth) that was fixed in the past – making it particularly hard for adolescents to alter their assignments.

      Reviewer #3 (Recommendations for the authors): 

      (1) FMRIB's preprocessing pipeline is mentioned. Does this include deconfounding of brain measures? Particularly, were measures deconfounded for age before the main analysis? 

      This is such a crucial point that we triple-checked: brain imaging phenotypes were not corrected for age (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf) – large effects of age can be seen in the global metrics; older individuals have less surface area, thinner cortices, less brain volume (corrected for head size), more CSF volume (corrected for head size), more white matter hyperintensities, and worse FA values. Figure 1 shows these large age effects, which are controlled for in our continuity-based RD analysis.

      One’s date of birth (DOB) of course does not map perfectly onto their age at scanning, which is why we included the covariate ‘visit date’; this interplay can now be seen in our updated SI Figure 1 (recommended in #3), which shows the distributions of visit date, DOB, and age of scan. 

      In a valid RD design covariates should not be necessary (as they should be balanced on either side of the cutoff), yet the inclusion of covariates does increase precision to detect effects. We tested this assumption, finding the effect of ‘visit date’ and its quadratic term to be not related to ROSLA (Sup. Table 1). This adds further evidence (specific to the UK Biobank sample) to the existing body of work showing the 1973 ROSLA policy change to not violate any design assumptions. Threats to internal validity would more than likely increase endogeneity and result in ‘false positive causal effects’ (which is not what we find).

      (2) Despite the large overall sample size, I am wondering whether the effective number of samples is sufficient to detect a potentially subtle effect that is further attenuated by the long time interval before scanning. As stated, for the optimised bandwidth window (DoB 20 to 35 months around cut-off), N is about 5000. Does this mean that effectively about 250 (10%) out of about 2500 participants born after the cut-off were leaving school at 16 rather than 15 because of ROSLA? For the local randomisation analysis, this becomes about N=10 (10% out of 100). Could a power analysis show that these cohort sizes are large enough to detect a reasonably large effect? 

      This is a very valid point, one which we were grappling with while the paper was out for review. We now draw attention to this in the results and highlight this as a limitation in the discussion. While UKB’s non-representativeness limits our power (10% affected rather than 25% in the general population), it is still a very large sample. Our sample size is more in line with standard neuroimaging studies than with large cohort studies. 
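
      The arithmetic behind this concern can be sketched as follows. The group size of about 2,500 per side and the 10% affected share come from the reviewer's question; the two-group normal approximation, the 5% significance level, the 80% power target, and the function names are our own simplifying assumptions. The sketch ignores covariates and any modelling of the running variable, so it is only a back-of-the-envelope guide, not a formal power analysis.

```python
from statistics import NormalDist

def expected_affected(n_after_cutoff, affected_share):
    """Expected number of individuals whose schooling changed because of the policy."""
    return n_after_cutoff * affected_share

def minimum_detectable_effect(n_per_side, affected_share, alpha=0.05, power=0.80):
    """Rough minimum detectable effect (in SD units) among affected individuals.

    Uses a two-sided, two-group normal approximation for the difference in
    means, then rescales by the affected share, since the difference in means
    equals the effect among the affected multiplied by that share.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    mde_difference = (z_alpha + z_power) * (2.0 / n_per_side) ** 0.5
    return mde_difference / affected_share

# Figures taken from the reviewer's question: about 2,500 born after the cutoff, 10% affected.
print(expected_affected(2500, 0.10))
print(round(minimum_detectable_effect(2500, 0.10), 3))
```

      The rescaling step makes explicit why a lower affected share (10% rather than 25%) raises the smallest effect that can be detected, even when the overall sample is large.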

      The novelty of our study is its causal design. While we could very precisely measure an effect of some phenotype (variable X) in 40,000 individuals, this effect is probably not what we think we are measuring. Without IP-weighting it could even have a different sign. But more importantly, it is not variable X – it is the thousands of things (unmeasured confounders) that lead an individual to have more or less of variable X. The larger the sample, the easier it is for small unmeasured confounders to reach significance (Big data paradox) – this in no way invalidates large samples; rather, our thinking about and handling of large samples will hopefully shift towards a more causal lens.

      (3) Supplementary Figure 1: A similar raincloud plot of date of birth would be instructive to visualise the distribution of subjects born before and after the 1957 cut-off. 

      Great idea! We have done this in Sup Fig. 1 for both visit date and DOB.

      (4) p.9: Not sure about "extreme evidence", very strong would probably be sufficient. 

      As preregistered, we interpreted Bayes Factors using Jeffreys’ criteria. ‘Extreme evidence’ is only used once and it is about finding an associational effect of educational attainment on CSF (BF10 > 100). Upon Reviewer 1’s recommendation 7, we conducted eight replication samples (Sup. Figure 7 & 8) and have now added the following passage to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      (5) The code would benefit from a bit of clean-up and additional documentation. In its current state, it is not easy to use, e.g. in a replication study. 

      We have now added further documentation to our code, including a README describing what each script does. The analysis pipeline used is not ideal for replications as the package used for continuity-based RD (RDHonest) initially could not handle covariates – therefore we manually corrected our variables after a discussion with Prof Kolesár (https://github.com/kolesarm/RDHonest/issues/7). 

      Prof Kolesár added this functionality recently and future work should use the latest version of the package as it can correct for covariates. We have a new preprint examining the effect of 1972 ROSLA on telomere length in the UK Biobank using the latest package version of RDHonest (https://www.biorxiv.org/content/10.1101/2025.01.17.633604v1). To ensure maximum availability of such innovations, we will ensure the most up-to-date version of this script becomes available on this GitHub link (https://github.com/njudd/EduTelomere).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Strengths:

      Great genetic model. Phenotypic consequences at the organ and organismal levels are well investigated. The requirement of both Tbx3 and Tbx5 for maintaining VCS cell state has been demonstrated.

      We thank Reviewer #1 for acknowledging the effort involved in generating and characterizing the Tbx3/Tbx5 double conditional knockout mouse model and for highlighting the significance of this work in elucidating the role of these transcription factors in maintaining the functional and transcriptional identity of the ventricular conduction system. 

      Weaknesses:

      The actual cell state of the Tbx3/Tbx5 deficient conducting cells was not investigated in detail, and therefore, these cells could well only partially convert to working cardiomyocytes, and may, in reality, acquire a unique state.

      We agree with Reviewer #1 that the Tbx3/Tbx5 double mutant ventricular conduction myocardial cells may only partially convert to working cardiomyocytes or may acquire a unique state.  The transcriptional state of the double mutant VCS cells was investigated by bulk profiling of key genes associated with specific conduction and non-conduction cardiac regions, including fast conduction, slow conduction, or working myocardium. Neither the bulk transcriptional approaches nor the optical mapping approaches we employed capture single-cell data; in both cases, the data represents aggregated signals from multiple cells (1, 2). Single cell approaches for transcriptional profiling and cellular electrophysiology would clarify this concern and are appropriate for future studies. 

      (1) O’Shea C, Nashitha Kabri S, Holmes AP, Lei M, Fabritz L, Rajpoot K, Pavlovic D (2020) Cardiac optical mapping – State-of-the-art and future challenges. The International Journal of Biochemistry & Cell Biology 126:105804. doi: 10.1016/j.biocel.2020.105804.

      (2) Efimov IR, Nikolski VP, and Salama G (2004) Optical Imaging of the Heart. Circulation Research 95:21-33. doi: 10.1161/01.RES.0000130529.18016.35.

      Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive the adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First, the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with MinK-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      We appreciate Reviewer #2’s detailed summary of the study’s aims, methodologies, and findings, as well as their thoughtful suggestions for further analysis. We are grateful for their recognition of our genetic model’s novelty and robustness.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function.

      We agree with this reviewer’s assessment of the assertions in our manuscript.  The novel combined Tbx5/Tbx3 double mutant model does not further explore the TBX5/TBX3 ratio concept, which we previously examined in detail (1). Instead, as the Reviewer notes, this manuscript focuses on testing a model that the coordinated activity of Tbx3 and Tbx5 defines specialized ventricular conduction identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model.

      (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line.

      (3) Well-powered and convincing assessments of mortality and physiological phenotypes.

      (4) Isolation of genetically modified VCS cells using flow.

      We thank Reviewer #2 for acknowledging the listed strengths of our study.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates the expression of VCS genes, which seem like distinct roles in VCS patterning. However, the authors move between different descriptions of the functional relationship and epistatic relationship between these factors, including terms like "cooperative", "coordinated", and "distinct" at various points. In a similar vein, sometimes terms like "reversion" are used to describe how VCS cells change after Tbx3/Tbx5 conditional knockout, and other times "transcriptional shift" and at other times "reprogramming". But these are all different concepts. The lack of a clear and consistent terminology for describing the phenomena observed makes the overarching claims of the manuscript more difficult to evaluate.

      We distinguish prior work supporting the “long-standing and well-supported model”, which investigated the roles of Tbx5 and Tbx3 independently, from the present work, which examines their coordinated role. Prior work demonstrated that Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, consistent with the reviewer’s suggestion of their distinct roles in VCS patterning. However, the current study is the first to evaluate the combined role of Tbx3 and Tbx5 in distinguishing specialized conduction identity from working myocardium.

      We appreciate Reviewer #2’s feedback regarding the need for consistent terminology when describing the impact of the double Tbx3 and Tbx5 mutant. We will edit the manuscript to replace terms like “reversion” with “transcriptional shift” or “transformation” when describing the observed phenotype, and we will use “coordination” to describe the combined role of Tbx5 and Tbx3 in maintaining VCS-specific identity.

      (2) A more direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from a single Tbx5 KO is presented for comparison to the double KO.

      We thank Reviewer #2 for the suggestion that a direct comparison between the Tbx5 single conditional knockout and Tbx3/Tbx5 double conditional knockout models may help isolate the specific contribution of Tbx3 deletion in addition to Tbx5 deletion.

      Previous studies have assessed the effect of single Tbx5 CKO in the VCS of murine hearts (1, 3, 5). Arnolds et al. demonstrated that the removal of Tbx5 from the adult ventricular conduction system results in VCS slowing, including prolonged PR and QRS intervals, prolongation of the His duration and His-ventricular (HV) interval (3).

      Furthermore, Burnicka-Turek et al. demonstrated that the single conditional knockout of Tbx5 in the adult VCS caused a shift toward a pacemaker cell state, with ectopic beats and inappropriate automaticity (1). Whole-cell patch clamping of VCS-specific Tbx5 deficient cells revealed action potentials characterized by a slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization - features characteristic of nodal action potentials rather than typical VCS action potentials (3). These observations were interpreted as uncovering nodal potential of the VCS in the absence of Tbx5. Based on the role of Tbx3 in CCS specification (2), we hypothesized that the nodal state of the VCS uncovered in the absence of Tbx5 was enabled by maintained Tbx3 expression. This motivated us to generate the double Tbx5/Tbx3 knockout model to examine the state of the VCS in the absence of both T-box TFs.

      In the current study, we demonstrate that the VCS-specific deletion of Tbx3 and Tbx5 results in the loss of fast electrical impulse propagation in the VCS, similar to that observed in the single Tbx5 mutant. However, unlike the Tbx5 single mutant, the Tbx3/Tbx5 double deletion does not cause a gain of pacemaker cell state in the VCS. Instead, the physiological data suggest a transition toward non-conduction working myocardial physiology. This conclusion is supported by the presence of only a single upstroke in the optical action potential (OAP) recorded from the His bundle region and VCS cells in Tbx3/Tbx5 double conditional knockout mice. The electrical properties of VCS cells in the double knockout are functionally indistinguishable from those of ventricular working myocardial cells. As a result, ventricular impulse propagation is significantly slowed, resembling activation through exogenous pacing rather than the rapid conduction typically associated with the VCS. We will edit the text of the manuscript to more carefully distinguish the observations between these models, as suggested.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265. PMID: 15289437.

      (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.

      And

      the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment).

      We appreciate Reviewer #2’s suggestion to expand the gene expression analysis in Tbx3/Tbx5-deficient VCS cells by including other specific genes and comparisons with “native”/actual working ventricular myocardial cells and broadening the gene panel. In this study, we evaluated core cardiac conduction system markers, revealing a loss of conduction system-specific gene expression in the double mutant VCS. Furthermore, we evaluated key working myocardial markers normally excluded from the conduction system, Gja1 and Smpx, revealing a shift towards a working myocardial state in the double mutant VCS (Figure 4). We agree that a more comprehensive analysis, such as transcriptome-wide approaches, would offer greater clarity on the extent and specificity of the observed shift from conduction to non-conduction identity. These approaches are appropriate directions for future studies.

      (4) From the optical mapping data, it is difficult to distinguish between the presence of (a) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (b) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      And

      In addition, the optical mapping dataset is incomplete and has alternative interpretations that are not excluded or thoroughly discussed.

      We agree with Reviewer #2 that the resolution of the optical mapping experiment may be insufficient to precisely localize the conduction block due to the limited signal strength from the VCS. It is possible that the region defined as the His bundle also includes portions of the right bundle branch. Our control mice show VCS OAP upstrokes consistent with those reported by Tamaddon et al. (2000) using Di-4-ANEPPS (1). We appreciate the Reviewer’s attention to alternative interpretations, and we will incorporate these caveats into the manuscript text.

      (1) Tamaddon HS, Vaidya D, Simon AM, Paul DL, Jalife J, Morley GE (2000) High-resolution optical mapping of the right bundle branch in connexin40 knockout mice reveals slow conduction in the specialized conduction system. Circulation Research 87:929-36. doi: 10.1161/01.res.87.10.929.

      Impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS but do not, as presented, constitute a decisive advance.

      And

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented fit well within the existing paradigm of established functions of Tbx3 and Tbx5 in the VCS and in that sense do not decisively advance the field. Moreover, the authors' claims about the implications of the data are not always strongly supported by the data presented and do not fully explore alternative possibilities.

      We appreciate Reviewer #2’s acknowledgment of the elegance and novelty of the mouse model we generated. However, we respectfully disagree with their assessment that this work merely corroborates existing models without providing a decisive advance. Previous studies have investigated single Tbx5 or Tbx3 gene knockouts in depth and established the T-box ratio model for distinguishing fast VCS from slow nodal conduction identity (1) that the reviewer alludes to in earlier comments. In contrast, this study aimed to explore a different model: that the combined effects of Tbx5 and Tbx3 distinguish adult VCS identity from non-conduction working myocardium. The coordinated role of Tbx3 and Tbx5 in conduction system identity remained untested due to the lack of a mouse model that allowed their simultaneous removal. The very model the reviewer recognizes as “novel and elegantly constructed” has allowed the examination of the coordinated role of Tbx5 and Tbx3 for the first time. While we acknowledge the opportunity for additional depth of investigation of this model in future studies, the data we present provide consistent experimental support for the coordinated requirement of both Tbx5 and Tbx3 for ventricular cardiac conduction system identity.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes in chromosome 5, preventing the generation of loss of function strategies to delete simultaneously both genes. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterized the cardiac phenotype associated with the cardiac conduction system (CCS)-specific combined deletion of Tbx5 and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen-inducible Cre recombination (MinKcreERT) at 6 weeks after birth. Their analysis of 8-9-week-old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QRS intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterizing VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analyzed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutants and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      We appreciate the Reviewer’s comments regarding the originality and utility of our model and the strengths of our methodological approach. The Reviewer’s appreciation of the molecular and morphological analyses as well as their constructive feedback is highly valuable.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lies in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart, however, whether the CCS tissues are present or not was not investigated. Lineage tracing analysis using the reporter lines included in the crosses described in the study will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot, will demonstrate that indeed the cells that were specified as VCS in the adult heart, become working myocardium in the absence of Tbx3 and Tbx5 function.

      We appreciate the reviewer’s concern regarding the morphology of the cardiac conduction system in the Tbx3/Tbx5 double conditional knockout model. We did not observe any structural abnormalities, as the Reviewer notes. We agree with their suggestion for using Genetic Inducible Fate Mapping to mark cardiac conduction cells expressing MinKCre. In fact, we utilized this approach to isolate VCS cells for transcriptional profiling. Specifically, we combined the tamoxifen-inducible MinKCreERT allele with the Cre-dependent R26Eyfp reporter allele to label MinKCre-expressing cells in both control VCS and VCS-specific double Tbx3/Tbx5 knockouts. EYFP-positive cells were isolated for transcriptional studies, ensuring that our analysis exclusively targeted conduction system-lineage marked cells. The ability to isolate MinKCre-marked cells from both controls and Tbx5/Tbx3 double mutants indicates that VCS cells persisted in the double knockout. Nonetheless, the suggestion for in-vivo marking by Genetic Inducible Fate Mapping and morphologic analysis is a valuable recommendation for future studies.

      Reviewer #1 (Recommendations for the authors):

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Previous work suggested the prediction that VCS-specific genetic ablation of both the TBX3 and TBX5 would transform fast-conducting adult VCS into cells resembling working myocardium, eliminating specialized CCS fate. The current study suggests that this prediction is at least to some extent accurate.

      We appreciate Reviewer #1’s summary and recognition of our study. As the Reviewer notes, the simultaneous deletion of Tbx3 and Tbx5 in the mature ventricular conduction system (VCS) suggests a conversion of VCS cells to "ordinary" ventricular working myocytes. To our knowledge, this represents a novel observation and experimental model that uniquely captures the combined roles of these essential T-box transcription factors. We believe that this model offers a valuable platform for further investigation into the transcriptional mechanisms underlying conduction system specialization.

      (1) The huge effort made to generate the DKO model contrasts with the limited efforts made to study the mechanism. Conditional deficiency of Tbx3 and Tbx5 creates an artificial situation that is useful for addressing fundamental mechanistic questions. The authors provide a rather superficial analysis of the changes in the VCS upon deletion of these two critically important factors and do not provide really novel insights into their requirement/function in the VCS gene regulatory network and epigenetic state. So to what extent do VCS cardiomyocytes (CMs) from Tbx3/5 DKO mice resemble "simple" working myocardium? To what extent do these cells acquire the working myocardial (epigenetic) state, do these cells have an epigenetic memory of the Tbx3/Tbx5+ history, is the enhancer usage between the modified VCS CMs and the working CMs similar or not, etc.? The assumption that the authors' data indicate that the DKO VCS CMs simply acquire a ventricular working "fate" is unlikely. Following this reasoning, the reverse experiment to induce Tbx3 and Tbx5 expression in working CMs would result in complete conversion to VCS CMs, which is also unlikely.

      To answer such questions, transcriptomic and epigenetic state analysis, electrophysiologic analysis (e.g. patch-clamp), cell/subcellular level analysis, etc. would be required, as well as a comparison of the changed state of the DKO VCS CMs to that of working CMs.

      This initial study focused on generating the Tbx3:Tbx5 double-conditional knockout model and characterizing the resulting physiological and molecular changes within the VCS. We analyzed transcriptomic markers of fast conduction (VCS), slow conduction (nodal), and non-conduction (working myocardium). Additionally, we applied optical mapping to evaluate the physiological consequences of the double knockout, which allowed a calculated AP of the VCS to be generated. We agree that a more in-depth mechanistic investigation of the VCS transformation upon Tbx3/Tbx5 deletion by transcriptomic or cellular electrophysiology could provide a deeper understanding of the precise transcriptional/epigenetic state of the VCS in the double knockout and clarify whether there is a partial or complete conversion of VCS cells to a simple working myocardial phenotype. The suggestions by the reviewer will be considered for future studies.

      (2) Tbx3 stimulates BMP-TGFb signaling (e.g. positive loop between Tbx3-Bmp2), which in turn stimulates EMT and modulates the behavior of endocardial and mesenchymal cells. Did the authors investigate the impact of Tbx3/5 DKO on non-CM cells in and around the VCS? (see also comment 1). The insulation of the AVB for example could be a Tbx3/5 non cell autonomous target.

      We appreciate the Reviewer’s suggestion to examine the impact of Tbx3/Tbx5 deletion on non-CM cells surrounding the VCS. While this is an intriguing avenue for future exploration, it falls outside the scope of the current study, which focused on the cardiomyocyte-specific roles of Tbx3 and Tbx5 in maintaining adult VCS identity.

      (3) The MinK-Cre line used (from the Moskowitz lab) also recombines in the AVN (Arnolds et al 2011). The authors do not mention changes in the AVN, and systematically call the line VCS specific (which refers to the AVB, BB, PVCS I assume). This could also impact the PR interval. Please address.

      The MinK-Cre line recombines in the atrioventricular bundle (AVB) and bundle branches (BB). It also recombines in cardiomyocytes adjacent to the atrioventricular node (AVN), which we previously interpreted as the penetrating portion of the His bundle into the AVN. This line recombines in few, if any, physiologic nodal cells. We also assessed nodal conduction parameters by invasive electrophysiologic (EP) studies. Our data showed that non-VCS parameters, including sinus node recovery time, AV node recovery time, and atrial and ventricular effective refractory periods, remained within normal ranges in Tbx3:Tbx5-deficient mice (please see Figure 2I). These findings indicate that AVN function is preserved in the VCS-specific double knockout, reinforcing the specificity of the observed conduction defects to the ventricular conduction system.

      (4) Did the authors also investigate the electrophysiological changes in the (EGFP+) DKO VCS CMs? Would these resemble the properties of ventricular working CMs, or would they still show some VCS properties? (see also comment 1).

      We performed electrophysiologic analysis of the double knockout by optical mapping. Optical mapping provides tissue-level resolution, capturing the functional behavior of clusters of thousands of cells simultaneously, rather than individual cells. While this technique does not achieve single-cell resolution, it allows for a comprehensive assessment of electrophysiological changes across the VCS region. Single cell electrophysiology is a good idea for future studies. 

      (5) Throughout the manuscript, the authors use "patterning" and "fate", which are applicable to development and differentiation, not to the situation where a gene is removed from fully differentiated cells in an adult organism resulting in a change of these cells. Perhaps more appropriate are "state" change and the requirement for "homeostasis/maintenance" of state.

      We appreciate the Reviewer’s concern regarding the terminology used to describe changes in VCS cell identity. To ensure precision and uniformity, we replaced terms such as “fate” and “patterning” with “state” or “maintenance” to reflect the shift in cellular characteristics in a fully differentiated adult tissue context. 

      Minor:

      (1) Please provide all data points in bar graphs.

      We have incorporated individual data points into the bar graphs as suggested, ensuring enhanced transparency and clarity in the data presentation.

      (2) Formally, gene expression levels between samples are not normally distributed. The Welch t-test used here assumes a normal distribution. Therefore, nonparametric tests should be used.

      We appreciate Reviewer #1’s consideration of the appropriate statistical approach to the qPCR data and clarify our approach here. Normality within each experimental group was assessed using the Shapiro-Wilk test. Between-group comparisons were conducted using the Welch t-test, and multiple comparisons were corrected using the Benjamini-Hochberg method to control the false discovery rate (FDR) (71). If a significant difference was detected between two groups (t-test FDR < 0.05) but normality was rejected in any of the compared groups (Shapiro-Wilk P < 0.05), a non-parametric Wilcoxon rank-sum test was used for verification. A significant group-mean difference was confirmed at one-tailed Wilcoxon P ≤ 0.05 (detailed in Supplementary Data Set I). Statistical analysis was performed using R version 4.2.0. We have updated the qRT-PCR information in each figure and its legend accordingly, included a new Supplementary Data Set I detailing the statistical analysis of the qRT-PCR data, and revised the Methods/Statistics section to describe the applied statistical analysis.
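
      The decision rule described above can be illustrated with a minimal sketch. This is not the R code used for the analysis; it is a Python illustration that assumes the Welch, Shapiro-Wilk, and Wilcoxon P values have already been computed elsewhere. The function names `benjamini_hochberg` and `classify_comparison` and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR), in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_comparison(welch_fdr, normality_rejected, wilcoxon_p, alpha=0.05):
    """Apply the verification rule: Welch FDR first, Wilcoxon if non-normal."""
    if welch_fdr >= alpha:
        return "not significant"
    if not normality_rejected:
        return "significant"
    # Normality was rejected in at least one group: require Wilcoxon support.
    return "significant" if wilcoxon_p <= alpha else "not confirmed"


# Hypothetical Welch t-test P values for four genes (illustration only).
welch_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(welch_p)
calls = [
    classify_comparison(fdr[0], normality_rejected=False, wilcoxon_p=None),
    classify_comparison(fdr[1], normality_rejected=True, wilcoxon_p=0.20),
]
```

      In this sketch the Wilcoxon test acts only as a verification step for comparisons that already pass the FDR threshold, which mirrors the order of operations stated in the response.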

      (3) Some of the panels of figures are tiny and cannot be evaluated. For example, in Figure 1B the actual data (expression of Tbx3/5) is impossible to see.

      We appreciate the Reviewer’s observation and have revised the figures to improve visual clarity and ensure that the presented data are easily interpretable by readers.

      Reviewer #2 (Recommendations for the authors):

      Additional Experiments, Data, Analysis:

      (1) Comparisons between both single knockouts and double knockouts at the phenotypic level are needed. In some instances, the data is shown (e.g., mortality and EKG) but direct statistical comparison is not performed. In other instances (optical mapping and gene expression), data with single knockouts are not shown. If combined VCS Tbx3/Tbx5 deletion does not change the phenotype of the VCS Tbx5 single deletion, this should be explicitly stated and discussed.

      We appreciate Reviewer #2’s suggestion to compare the phenotypic outcomes of the Tbx3 and Tbx5 single conditional knockout models with those observed in Tbx3/Tbx5 double conditional knockout model. We have expanded the discussion section of our manuscript to incorporate a more detailed comparison between the double Tbx3/Tbx5 model and the single Tbx5 and Tbx3 models [1-5], highlighting the distinct phenotypic outcomes of the single and double knockouts.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265.

      (2) Genome-wide expression analysis including working myocardium would provide stronger evidence for interconversion of cell states. Ideally, this would include single knockouts.

      We agree that a genome-wide expression analysis, including a direct comparison with working myocardium, would provide more comprehensive insights into cell state transitions in Tbx3:Tbx5-deficient VCS cells. Additionally, incorporating single knockout models into such analyses would further clarify the distinct and cooperative contributions of Tbx3 and Tbx5 to maintaining VCS identity. This is a good suggestion for future studies.

      (3) This may not be essential to support the authors' claims, but the addition of epigenetic data from single and double KO VCS using ATAC-seq (which can be performed with relatively small numbers of cells) could provide stronger evidence for cell state changes of the kind hypothesized by the authors.

      We agree that epigenetic data such as ATAC-seq would complement transcriptional analyses and provide insight into chromatin states that underlie the observed cellular reprogramming. This is a good suggestion for follow-up studies to further characterize the molecular state of Tbx3:Tbx5-deficient VCS cells.

      (4) Additional clarification of the optical mapping experiments to exclude alternative interpretations like focal right bundle branch block and to include single knockouts for comparison - if the Tbx5 single KO looks the same as the double KO that would be very important to know and would directly affect interpretation of the experiment.

      Right septal optical mapping preparation involved removing the right ventricular free wall to directly image the right ventricular septum, which contains the VCS. In a healthy mouse, there are two peak components of the optical action potential upstroke, the first peak due to the activation of the VCS and the second due to the activation of the ventricular cardiomyocytes. Importantly, in Tbx3:Tbx5 double-conditional knockout mice, the first peak was absent, rather than delayed, indicating loss of fast conduction through the VCS. This absence suggests a shift in VCS cells toward a ventricular working myocardial phenotype, rather than a regional conduction block or delayed propagation through a structurally intact VCS.

      Previous studies from our group have extensively characterized the effect of single Tbx5 knockout on the VCS in murine hearts [1, 2, 3]. Arnolds et al. demonstrated that VCS-specific Tbx5 deficiency results in significant slowing of VCS conduction, evidenced by prolonged PR and QRS intervals, along with lengthening of the atrio-Hisian interval, His duration, and Hisioventricular interval [1]. Although both single Tbx5 knockout and Tbx3:Tbx5 double knockout mice exhibit slowing of the ventricular conduction system, our optical mapping studies reveal distinct differences in their electrophysiological phenotypes. Burnicka-Turek et al. showed that the single knockout of Tbx5 in the VCS leads to a shift toward a pacemaker cell state, evidenced by ectopic beats originating in the ventricles and inappropriate automaticity [3]. During spontaneous beats, electrical impulses propagated retrogradely, from the ventricles to the atria [3]. Whole-cell patch-clamp recordings confirmed that Tbx5-deficient VCS cells displayed action potentials resembling those of pacemaker cells, characterized by slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization [3]. In contrast, our current study on the VCS-specific Tbx3:Tbx5 double knockout demonstrates a loss of VCS-specific fast conduction. Optical mapping demonstrated the absence of the initial upstroke corresponding to VCS activation in the His bundle region, indicating a shift in the VCS cells toward a ventricular working myocardium state. This loss of fast conduction properties highlights a fundamental distinction between single and double knockouts, suggesting that both Tbx3 and Tbx5 are required to maintain VCS identity and function.

      (1) D. E. Arnolds et al., “TBX5 drives Scn5a expression to regulate cardiac conduction system function,” J. Clin. Invest., vol. 122, no. 7, pp. 2509–2518, Jul. 2012, doi: 10.1172/JCI62617.

      (2) Moskowitz, I.P., Pizard, A., Patel, V.V., Bruneau, B.G., Kim, J.B., Kupershmidt, S., Roden, D., Berul, C.I., Seidman, C.E., Seidman, J.G. (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131(16):4107-4116. 

      (3) Burnicka-Turek, O., Broman, M.T., Steimle, J.D., Boukens, B.J., Peterenko, N.B., Ikegami, K., Nadadur, R.D., Qiao, Y., Arnolds, D.E., Yang, X.H., Patel, V.V., Nobrega, M.A., Efimov, I.R., Moskowitz, I.P. (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circ Res. 127(3):e94-e106.

      Methods:

      (1) Additional methods on FACS are required. The methods section references a paper from 2004 (reference 67) that describes the flow sorting of embryonic cardiomyocytes. However, flow cytometric isolation of intact adult cardiomyocytes, which the authors describe in the present work, is a distinct technique and generally requires special equipment. These need to be described in more detail to be fully replicable.

      We thank Reviewer #2 for highlighting the need to provide additional details regarding our flow cytometric isolation of adult VCS cardiomyocytes. While we referenced earlier methods, we agree that isolating adult cardiomyocytes requires specialized approaches. Therefore, we revised the Methods section to include a detailed description of the equipment, procedures, and adaptations specific to isolating intact adult VCS cells to ensure full replicability.

      Minor Corrections:

      (1) Figure 1D. Please add a statistical test for mortality between the double conditional KO and the Tbx5 conditional KO.

      We have revised Figure 1D to include the statistical test comparing mortality between the Tbx3:Tbx5 double conditional knockout and the Tbx5 conditional knockout cohorts.

      (2) Figure 2A, 2I, 3A: Please include all individual data points not just a bar graph with error bars.

      We have added all individual data points to the bar graphs as recommended, enhancing the transparency and clarity of the data presentation.

      (3) Figure 2A: Please consider separate graphs for PR and QRS with appropriately scaled Y-axis so differences are easier to see.

      We appreciate Reviewer #2’s suggestion and fully agree with it. As a result, we have revised Figure 2A to include separate graphs for PR and QRS intervals, each with appropriately scaled Y-axes. This adjustment enhanced both the readability and the clarity of the observed differences.

      (4) Figure 3 G-K: The figure would be easier to interpret for the reader if genotypes were shown in the figure not just in the legend.

      We agree with Reviewer #2’s suggestion and have revised Figure 3 accordingly by adding genotype labels directly to the histological sections in Panels G-K. This update improves clarity, making the data easier for readers to interpret without needing to refer to the figure legend.

      (5) Figure 4A, C: Are vertical axes mislabeled? They say, "CON VCS and TBX5OE VCS". Please double-check axis labels and data on the graph.

      We appreciate the Reviewer bringing the mislabeling of the vertical axis in Figure 4 to our attention. We have corrected the labeling errors and ensured consistency between the graph and the underlying data.

      (6) Legend to Supplementary Figure 6. Says "Tbx3:Tbx3" instead of "Tbx3:Tbx5".

      We thank Reviewer #2 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

      (7) Discussion. The authors write, "In Tbx3:Tbx5 double VCS knockout, we observed repression of fast VCS markers and also repression of Pan-CCS markers transcribed throughout the entire CCS." The term 'repression' has a specific connotation with transcription regulators that is likely not intended in this context so perhaps 'reduced expression' would be better here?

      We agree with Reviewer #2 and have replaced “repression” with “reduced expression” throughout the text (see the revised sentence below).

      “In the Tbx3:Tbx5 double VCS knockout, we observed a reduction in the expression of both fast VCS markers and Pan-CCS markers transcribed throughout the entire CCS.”

      (8) Discussion, the authors write, "This study combined with prior literature (1, 7, 11, 15, 26, 53, 54) indicates that the presence of both Tbx3 and Tbx5 is necessary for the specification of the adult VCS (Figure 7)." Since this work presents data from an adult conditional deletion, it's not clear how it informs our understanding of the specification, which occurs during development. Perhaps "maintenance of VCS fate" would be more appropriate here?

      We agree with Reviewer #2 that the term “maintenance of VCS fate” is more appropriate in the context of our study. Accordingly, we have updated the text to reflect this terminology.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 2B: It is hard to see the IF images. What is the cardiac structure studied? Maybe a dashed line and a label to define the region and the structure represented will help. As the authors have described that the crosses used contain a reporter allele (R26-EYFP), a clearer way to show these results would be to include images of the lineage-traced cells with the reporter, not only to identify the CCS structure analyzed, but also to demonstrate that the deletion is specific to the MinK-creERT expression in the CCS.

      We appreciate the Reviewer’s suggestion to improve the clarity of Figure 2B by delineating the cardiac structures analyzed. In response, we have added dashed lines and labels to highlight the regions of interest within the IF images. Unfortunately, we were unable to capture high-quality EYFP fluorescence images for these sections. However, to address this concern, we microdissected the region shown in the IF images and performed FACS to isolate EYFP-positive cells from this specific area. These sorted cells were subsequently used for qPCR analysis, which confirmed the presence of Tbx3 and Tbx5 in control samples and the successful deletion of both genes in the double-conditional knockout samples (Figure 2C, middle panel). We believe this approach provides robust evidence for the specificity of the MinK-CreERT expression in the CCS and the efficiency of gene deletion in the targeted region.

      (2) 3G-K: The authors describe the absence of morphological defects in the tissue sections of adult hearts from the different genotypes analyzed. Although this reviewer agrees that there seem to be no major defects in the general cardiac morphology of these animals, the higher magnification images suggest some tissue differences at the level of the AVN especially in the double HET, double HOMO, and the Tbx3 HOMO. Is that due to the section plane used? If so, more appropriate and comparable sections must be provided. Again, as the crosses used by the authors contain a reporter allele (R26-EYFP), it is required that the authors show that the CCS cells, where deletions are induced, are still present in equivalent areas in the mutants and that they remain in similar numbers only failing to maintain their specification into CCS due to Tbx3 and Tbx5 loss of function.

      This analysis will reinforce the authors' claims on the role of Tbx5/Tbx3 in this process.

      We thank the reviewer for their thorough assessment and thoughtful feedback on our histological analysis. The higher magnification images in Figure 3G-K do not specifically present the AVN. These sections primarily represent areas of the ventricular conduction system (VCS), particularly the His bundle and bundle branches, rather than the AVN itself. We do not believe that the observed morphological differences are related to AVN tissue, and there were no functional deficits attributable to the AVN in the double knockout. Furthermore, the MinK-Cre allele used in this study does not recombine in the AVN proper. We agree that confirming the presence of CCS cells in equivalent regions across different genotypes is crucial. Our approach using FACS-based isolation of EYFP-positive cells from the VCS, followed by qPCR analysis, provides evidence that these cells remain present in double conditional knockouts, although they fail to maintain their specialized gene expression profile. This reinforces our conclusion that Tbx3 and Tbx5 are essential for maintaining the molecular identity of CCS cells, rather than their physical presence.

      (3) Figure 4: The authors performed molecular analysis by qPCR and WB in Tbx5/Tbx3 double mutants to demonstrate that CCS cells lose the expression of CCS genes and express working myocardium genes. Could this be further demonstrated by ISH, HCR, or IF together with lineage tracing to provide evidence that these changes are located where the CCS tissues are in the control embryos? Analysis of 2 or 3 of these markers of each type on tissue sections would be enough.

      We thank the Reviewer for their insightful suggestion regarding additional validation of our molecular findings through ISH, HCR, or IF combined with lineage tracing. However, we would like to clarify that the molecular analyses we performed by qPCR and WB were conducted on EYFP-positive cells that were specifically isolated from the ventricular conduction system (VCS) region of both control and double conditional knockout (dCKO) mice. These EYFP-positive cells were obtained through fluorescence-activated cell sorting (FACS), ensuring that our analyses were confined to the targeted VCS population. Alternate approaches are appropriate for future studies to investigate the precise genomic and molecular nature of the transformation observed in the double knockout.

      (4) Discussion: in the discussion section the authors conclude that the combined role of Tbx5/Tbx3 is critical for the specification of the adult VCS. However, as the Tbx5/Tbx3 loss of function conditions are only induced in adult animals 6 weeks old, would it be more appropriate that their function is the maintenance of the VCS cell fate and that if not present these cells return to the working myocardium fate? If the authors believe that these genes are involved in the induction of VCS specification in adults, then they need to demonstrate that, before the loss of function induction at 6 weeks, these cells are not yet specified as adult VCS.

      We appreciate the Reviewer’s clarification regarding terminology. We agree that our study focuses on adult-specific conditional deletion and thus reflects the maintenance, rather than the specification, of VCS cell fate. Accordingly, we have revised the text to explicitly state that Tbx3 and Tbx5 are critical for maintaining VCS identity in adult mice, and that their loss leads to a shift toward a working myocardial fate.

      Minor:

      (1) There is no consistency in the way the quantitative data is shown in graphs. Some graphs show only bars, others dot plots, and others a combination of both. The authors must homogenise the representation of quantitative data, showing the different data points in dot plots and not in bar graphs.

      We have standardized the quantitative data presentation across all figures, by including individual data points in bar graphs, ensuring enhanced transparency and clarity.

      (2) Figure 3: The labels defining the genotypes corresponding to the different histological sections of adult hearts (Panels G-K) are missing. Panels J and K are not referenced in the text.

      We thank Reviewer #3 for highlighting these omissions. We have added the genotype labels to the histological sections in Panels G-K of Figure 3 to ensure clarity. Furthermore, we have now referenced Panels J and K in the results and in the supplementary material (please see the revised passages below).

      “Histological examination of all four chambers demonstrated no discernible differences between VCS-specific Tbx3:Tbx5 double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and control (Tbx3<sup>+/+</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) mice, nor between the double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and single-knockout models for either Tbx3 (Tbx3<sup>fl/fl</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) or Tbx5 (Tbx3<sup>+/+</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>). Ventricular muscle appeared normal without hypertrophy or myofibrillar disarray and no fibrosis was present (Figure 3G, 3I, 3J, and 3K, respectively).”

      “Additionally, we confirmed the absence of histological and structural abnormalities in these mice, aligning with previous findings (Figures 3A, 3F versus 3B, and 3K versus 3G, respectively)(1, 11).”

      (3) Typo: Supplementary Figure 6. Tbx3:Tbx3 double-conditional knockout: it should say Tbx5:Tbx3 double-conditional knockout.

      We thank Reviewer #3 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This manuscript makes valuable contributions to our understanding of cell polarisation dynamics and its underlying mechanisms. Through the development of a computational pipeline, the authors provide solid evidence that compensatory actions, whether regulatory or spatial, are essential for the robustness of the polarisation pattern. However, a more comprehensive validation against experimental data and a proper estimation of model parameters are required for further characterization and predictions in natural systems, such as the C. elegans embryo.

      We sincerely thank the editor(s) for their pertinent assessment. We have carefully considered the constructive recommendations and made the necessary revisions in the manuscript, which are also detailed in this response letter. We have implemented most of the revisions requested by the reviewers. For the few requests we did not fully accept, we have provided justifications. The corresponding revisions in both the Manuscript and Supplementary Information are highlighted with a yellow background. To provide a more comprehensive validation against experimental data and model parameters used for characterizing and predicting natural systems, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results effectively demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in Sections 2 (Results) and 3 (Discussion and conclusion), a detailed outline of potential model improvements.

      Joint Public Review:

      The polarisation phenomenon describes how proteins within a signalling network segregate into different spatial domains. This phenomenon holds fundamental importance in biology, contributing to various cellular processes such as cell migration, cell division, and symmetry breaking in embryonic morphogenesis. In this manuscript, the authors assess the robustness of stable asymmetric patterns using both a previously proposed minimal model of a 2-node network and a more realistic 5-node network based on the C. elegans cell polarisation network, which exhibits anterior-posterior asymmetry. They introduce a computational pipeline for numerically exploring the dynamics of a given reaction-diffusion network and evaluate the stability of a polarisation pattern. Typically, the establishment of polarisation requires the mutual inhibition of two groups of proteins, forming a 2-node antagonistic network. Through a reaction-diffusion formulation, the authors initially demonstrate that the widely-used 2-node antagonistic network for creating polarised patterns fails to maintain the polarised pattern in the face of simple modifications. However, the collapsed polarisation can be restored by combining two or more opposing regulations. The position of the interface can be adjusted with spatially varied kinetic parameters. Furthermore, the authors show that the 5-node network utilised by C. elegans is the most stable for maintaining polarisation against parameter changes, identifying key parameters that impact the position of the interface.

      We sincerely thank the editor(s) for the pertinent summary!

      While the results offer novel and insightful perspectives on the network's robustness for cell polarisation, the manuscript lacks comprehensive validation against experimental data, justified node-node network interactions, and proper estimation of model parameters (based on quantitative measurements or molecular intensity distributions). These limitations significantly restrict the utility of the model in making meaningful predictions or advancing our understanding of cell polarisation and pattern formation in natural systems, such as the C. elegans embryo.

      We sincerely thank the editor(s) for the comment!

      To provide a more comprehensive validation against experimental data and model parameters, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These meaningful predictions effectively demonstrate the utility of our model’s network structure and parameters in advancing our understanding of cell polarisation and pattern formation in natural systems, exemplified by the C. elegans embryo.

      We have also acknowledged the limitations of the current cell polarization model and provided, in Sections 2 (Results) and 3 (Discussion and conclusion), a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node network interactions” and the “proper estimation of model parameters (based on quantitative measurements or molecular intensity distributions)”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and have discussed it thoroughly in Section 3 (Discussion and conclusion). From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions in the C. elegans embryo.

      The study extends its significance by examining how cells maintain pattern stability amid spatial parameter variations, which are common in natural systems due to extracellular and intracellular fluctuations. The authors found that in the 2-node network, varying individual parameters spatially disrupt the pattern, but stability is restored with compensatory variations. Additionally, the polarisation interface stabilises around the step transition between parameter values, making its localisation tunable. This suggests a potential biological mechanism where localisation might be regulated through signalling perception.

      We sincerely thank the editor(s) for the pertinent review!

      Focusing on the C. elegans cell polarisation network, the authors propose a 5-node network based on an exhaustive literature review, summarised in a supplementary table. Using their computational pipeline, they identify several parameter sets capable of achieving stable polarisation and claim that their model replicates experimental behaviour, even when simulating mutants. They also found that among 34 possible network structures, the wild-type network with mutual inhibition is the only one that proves viable in the computational pipeline. Compared with previous studies, which typically considered only 2- or 3-node networks, this analysis provides a more complete and realistic picture of the signalling network behind polarisation in the C. elegans embryo. In particular, the model for C. elegans cell polarisation paves the way for further in silico experiments to investigate the role of the network structure over the polarisation dynamics. The authors suggest that the natural 5-node network of C. elegans is optimised for maintaining cell polarisation, demonstrating the elegance of evolution in finding the optimal network structure to achieve certain functions.

      We sincerely thank the editor(s) for the pertinent review!

      Noteworthy limitations are also found in this work. To simplify the model for numerical exploration, the authors assume several reactions have equivalent dynamics, reducing the parameter space to three independent dimensions. While the authors briefly acknowledge this limitation in the "Discussion and Conclusion" section, further analysis might be required to understand the implications. For instance, it is not clear how the results depend on the particular choice of parameters. The authors showed that adding additional regulation might disrupt the polarised pattern, with the conclusion apparently depending on the strength of the regulation. Even for the 5-node wild-type network, which is the most robust, adding a strong enough self-activation of [A], as done in the 2-node network, will probably cause the polarised pattern to collapse as well.

      We sincerely thank the editor(s) for the comment!

      We have now thoroughly expanded our acknowledgment of the model’s limitations in Sections 2 (Results) and 3 (Discussion and conclusion). To rule out the possibility that the equivalent dynamics assumption undermines our conclusions, we have added simulations showing that the stability of the cell polarization pattern does not depend on the exact strength of each regulation, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.
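
      As a rough illustration of the kind of Monte Carlo sampling described above, the sketch below draws independent parameter sets for every node of a network. It is not the authors' pipeline: the parameter names are borrowed from the list in the response, but the sampling ranges, the log-uniform distribution, and the function names are all assumptions made for illustration.

```python
import math
import random

# Hypothetical sampling ranges (lower, upper). These are placeholder values,
# not the ranges used in the manuscript.
PARAM_RANGES = {
    "gamma": (0.01, 1.0),
    "alpha": (0.01, 1.0),
    "k1": (0.1, 10.0),
    "k2": (0.1, 10.0),
    "q1": (0.001, 0.1),
    "q2": (0.001, 0.1),
    "Xc": (0.1, 10.0),
}


def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_parameter_sets(node_names, n_sets, seed=0):
    """Return n_sets dicts, each mapping a node name to its own sampled parameters."""
    rng = random.Random(seed)  # seeded so that a sampling run can be repeated
    sets = []
    for _ in range(n_sets):
        sets.append({
            node: {
                name: sample_log_uniform(low, high, rng)
                for name, (low, high) in PARAM_RANGES.items()
            }
            for node in node_names
        })
    return sets
```

      In such a scheme, each sampled set would be passed to the reaction-diffusion simulation and retained only if the resulting pattern meets the stability criteria; that filtering step is not shown here.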

      Additionally, the authors utilise parameter values that are unrealistic, fail to provide units for some of them, and assume unknown parameter values without justification. The model appears to have non-dimensionalised length but not time, resulting in a mix of dimensional and non-dimensional variables that can be confusing. Furthermore, they assume equal values for Hill coefficients and many parameters associated with activation and inhibition pathways, while setting inhibition intensity parameters to 1. These arbitrary choices raise concerns about the fidelity of the proposed model in representing the real system, as their selected values could potentially differ by many orders of magnitude from the actual parameters.

      We sincerely thank the editor(s) for the comment!

      We apologize for the confusion. The non-dimensionalised parameter values are adopted from previous theoretical research [Seirin-Lee et al., Cells, 2020], which in turn derive from the experimental measurements in [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011]. With the in silico time set as 2 sec per step, we have now added a Supplemental Text explaining how the units are removed during non-dimensionalization. This demonstrates that the non-dimensionalized parameters derived in this paper take realistic values, on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system.
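
      For readers unfamiliar with the procedure, the generic way units drop out of a reaction-diffusion equation can be sketched as follows. This is a textbook scaling argument, not the derivation in the authors' Supplemental Text; the characteristic length L, time τ, and concentration c<sub>0</sub>, as well as the generic reaction term f, are symbols assumed here for illustration.

```latex
% Dimensional equation for a membrane-bound concentration c(x, t):
\frac{\partial c}{\partial t} = D\,\frac{\partial^{2} c}{\partial x^{2}} + f(c)

% Rescale with x = L\tilde{x}, \quad t = \tau\tilde{t}, \quad c = c_{0}\tilde{c}:
\frac{\partial \tilde{c}}{\partial \tilde{t}}
  = \underbrace{\frac{D\,\tau}{L^{2}}}_{\tilde{D}}\,
    \frac{\partial^{2} \tilde{c}}{\partial \tilde{x}^{2}}
  + \frac{\tau}{c_{0}}\, f(c_{0}\tilde{c})

% With D in \mu\mathrm{m}^{2}/\mathrm{s}, \tau in s and L in \mu\mathrm{m},
% the coefficient \tilde{D} = D\tau/L^{2} carries no units.
```

      Under this scaling, a measured diffusion coefficient and a chosen time and length scale fix the dimensionless coefficient, which is how a non-dimensional value can be compared with experimental orders of magnitude.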

      The assumption of “equal values for Hill coefficients and many parameters associated with activation and inhibition pathways” serves to reduce the parameter space and keep the computational cost affordable. Fixing Hill coefficients [Seirin-Lee et al., J. Theor. Biol., 2015; Seirin-Lee, Bull. Math. Biol., 2021] and unifying parameter values for different pathways are widely used strategies in network research, both on cell polarization [Marée et al., Bull. Math. Biol., 2006; Goehring et al., Science, 2011; Trong et al., New J. Phys., 2014] and on other biological topics (e.g., plasmid transfer in the microbial community [Wang et al., Nat. Commun., 2020]). Nevertheless, to rule out the possibility that the equivalent dynamics assumption undermines our conclusions, we have added simulations showing that the stability of the cell polarization pattern does not depend on the exact parameter values associated with activation and inhibition pathways, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      To confirm the fidelity of the proposed model in representing the real system, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results effectively demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in Sections 2 (Results) and 3 (Discussion and conclusion), a detailed outline of potential model improvements.

      It is worth noting that, although a strict match between numerical and realistic parameter values with consistent units is always helpful, many notable purely numerical studies have successfully unveiled principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012]. These studies suggest that numerical analysis of biological systems remains powerful, even when comprehensive experimental data from prior research are not fully available.

      The definition of stability and its evaluation in the proposed pipeline might also be too narrow. Throughout the paper, the authors discuss the stability of the polarised pattern, checked by an exhaustive search of the parameter space where the system reaches a steady state with a polarised pattern instead of a homogeneous pattern. It is not clear if the stability is related to the linear stability analysis of the reaction terms, as conducted in Goehring et al. (Science, 2011), which could indicate if a homogeneous state exists and whether it is stable or unstable. The stability test is performed through a pipeline procedure where they always start from a polarised pattern described by their model and observe how it evolves over time. It is unclear if the conclusions depend on the chosen initial conditions. Particularly, it is unclear what would happen if the initial distribution of posterior molecules is not exactly symmetric with respect to the anterior molecules, or if the initial polarisation is not strong.

      We sincerely thank the editor(s) for the comment!

      The definition of stability and its evaluation in the proposed pipeline consider two criteria: 1. The pattern is polarized; 2. The pattern is stable. The simulations, figures, and videos (Fig. 1-6; Fig. S1-S5; Fig. S7-S9; Movie S1-S5) demonstrate that the chosen parameters and networks capture the cell polarization dynamics in both the stable and unstable states.

      We have now added new simulations on alternative initial conditions. They demonstrate the necessity of a polarized initial pattern set up independently of the reaction-diffusion network during the establishment phase, probably through additional mechanisms such as active actomyosin contractility and flow [Cuenca et al., Development, 2003; Gross et al., Nat. Phys., 2019]. Our conclusions (i.e., single-sided self-regulation, single-sided additional regulation, and unequal system parameters cause the stable polarized pattern to collapse) depend little on the chosen initial conditions, as long as the asymmetric initial patterns can set up a stable polarized pattern. Some of these simulations show directly that our conclusions still hold if the initial distribution of posterior molecules is not exactly symmetric with respect to the anterior molecules, or if the initial polarisation is not strong (Fig. S4 and Fig. S9).

      Regarding the biological interpretation and relevance of the model, it overlooks some important aspects of the C. elegans polarisation system. The authors focus solely on a reaction-diffusion formulation to reproduce the polarisation pattern. However, the polarisation of the C. elegans zygote consists of two distinct phases: establishment and maintenance, with actomyosin dynamics playing a crucial role in both phases (see Munro et al., Dev Cell 2004; Shivas & Skop, MBoC 2012; Liu et al., Dev Biol 2010; Wang et al., Nat Cell Biol 2017). Both myosin and actin are crucial to maintaining the localisation of PAR proteins during cell polarisation, yet the authors neglect cortical flows during the establishment phase and any effects driven by myosin and actin in their model, failing to capture the system's complexity. How this affects the proposed model and conclusions about the establishment of the polarisation pattern needs careful discussion. Additionally, they assume that diffusion in the cytoplasm is infinitely fast and that cytoplasmic flows do not play any role in cell polarity. Finite cytoplasmic diffusion combined with cytoplasmic flows could compromise the stability of the anterior-posterior molecular distributions. The authors claim that cytoplasmic diffusion coefficients are two orders of magnitude higher than membrane diffusion coefficients, but they seem to differ by only one order of magnitude (Petrášek et al., Biophys. J. 2008). The strength of cytoplasmic flows has been quantified by a few studies, including Cheeks et al., Curr Biol 2004.

      We sincerely thank the editor(s) for the comment!

      Indeed, previous research highlighted the importance of convective cortical flow in orchestrating the localisation of PAR proteins during the establishment phase of polarisation formation [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin, 2008; Goehring, Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. We have now clarified the distinction between the establishment and maintenance phases in 1. Introduction, emphasized our research focus on the reaction-diffusion dynamics during the maintenance phase in 2. Results, and provided a discussion of the omitted actomyosin dynamics to foster a more comprehensive understanding in the future in 3. Discussion and conclusion. The effect of the establishment phase is studied as the initial condition for the cell polarization simulation solely governed by reaction-diffusion dynamics, with new simulations demonstrating the necessity of a polarized initial pattern set up independently of the reaction-diffusion network during the establishment phase, probably through additional mechanisms such as the active actomyosin contractility and flow [Cuenca et al., Development, 2003; Gross et al., Nat. Phys., 2019].

      Cytoplasmic and membrane diffusion coefficients differ by two orders of magnitude according to previous experimental measurements on PAR-2 and PAR-6 [Goehring et al., J. Cell Biol., 2011; Lim et al., Cell Rep., 2021]. Many previous C. elegans cell polarization models have incorporated a mass-conservation model combined with finite cytoplasmic diffusion, but this model description can lead to a reversed spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], contradicting experimental observation [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that the infinite cytoplasmic diffusion, without precise experiment-based parameter assignment or accounting for other hidden biological processes (e.g., protein production and degradation), may be inappropriate in modeling the real spatial concentration distributions distinguished between the cell membrane and cytosol. To address this issue, some theoretical research incorporated protein production and degradation into the model to obtain a consistent spatial concentration distribution between the cell membrane and cytosol [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein diffusion, production, and degradation are essential for providing a more realistic representation of cellular dynamics and enhancing the model's predictive power.

      We have now acknowledged the possibly overlooked aspects of the C. elegans polarisation system in 3. Discussion and conclusion, together with a detailed outline of potential model improvements. Those aspects include, but are not limited to, issues involving “neglect cortical flows” and the “diffusion in the cytoplasm is infinitely fast”. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness. The meaningful predictions for five experimental groups and eight perturbed conditions in the C. elegans embryo support the biological interpretation and relevance of the model.

      Although the authors compare their model predictions to experimental observations, particularly in reproducing mutant behaviours, they do not explicitly show or discuss these comparisons in detail. Diffusion coefficients and off-rates for some PAR proteins have been measured (Goehring et al., JCB 2011), but the authors seem to use parameter values that differ by many orders of magnitude, perhaps due to applied scaling. To ensure meaningful predictions, whether their proposed model captures the extensive published data should be evaluated. Various cellular/genetic perturbations have been studied to understand their effects on anterior-posterior boundary positioning. Testing these perturbations' responses in the model would be important. For example, comparing the intensity distribution of PAR-6 and PAR-2 with measurements during the maintenance phase by Goehring et al., JCB 2011, or comparing the normalised intensity of PAR-3 and PKC-3 from the model with those measured by Wang et al., Nat Cell Biol 2017, during establishment and maintenance phases (in both wild-type and cdc-42 (RNAi) zygotes) could provide insightful validation. Additionally, in the presence of active CDC-42, it has been observed that PAR-6 extends further into the posterior side (Aceto et al., Dev Biol 2006). Conducting such validation tests is essential to convince readers that the model accurately represents the actual system and provides insights into pattern formation during cell polarisation.

      We sincerely thank the editor(s) for the comment!

      To provide more comprehensive validations and refinements to ensure the model accurately represents biological systems, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total from published data, comprising eight perturbed conditions and using wild-type as the reference. We have also explicitly shown the comparison between model predictions and experimental observations (including the reproduction of mutant behaviors) in detail, by describing how “cell polarization pattern characteristics in simulation” respond to various cellular/genetic perturbations (Section 2.5; Fig. 5; Fig. S7 and S8). Together, the original and new validation tests support the claim that the model accurately represents the actual system and provides insights into pattern formation during cell polarisation.

      The diffusion coefficients for anterior and posterior molecular species were assigned according to previous experimental and theoretical research [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. The off-rates are assigned uniformly by searching for viable parameter sets that can set up a network with cell polarization pattern stability. We have now added simulations showing that the cell polarization pattern stability and the response to network structure and parameter perturbation do not depend on the exact parameter values (including diffusion coefficients and off-rates), provided the parameter values on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      With the in silico time set as 2 sec per step, we have now added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameters in this paper take values on the same order of magnitude as those observed in reality, supporting the fidelity of the proposed model in representing the real system. We agree that full experimental measurements of biological information are essential for establishing a more usable model and have discussed this thoroughly in 3. Discussion and conclusion.

      A clear justification, with references, for each network interaction between nodes in the five-node model is needed. Some of the activatory/inhibitory signals proposed by the authors have not been demonstrated ( e.g. CDC-42 directly inhibiting CHIN-1). Table S2 provided by the authors is insufficient to justify each node-node interaction, requiring additional explanations. (See the review by Gubieda et al., Phil. Trans. R. Soc. B 2020, for a similar node network that differs from the authors' model.) Additionally, the intensity distributions of cortical PAR-3 and PKC-3 seem to vary significantly during both establishment and maintenance phases (Wang et al., Nat Cell Biol 2017), yet the authors consider the PAR-3/PAR-6/PKC-3 as a single complex. The choices in the model should be justified, as the presence or absence of clustering of these PAR proteins can be crucial during cell polarisation (Wang et al., Nat Cell Biol 2017; Dawes & Munro, Biophys J 2011).

      We sincerely thank the editor(s) for the comment!

      Now we have acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “each network interaction between nodes” and the “consider the PAR-3/PAR-6/PKC-3 as a single complex”, in which the former one relies on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agreed that such information is essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion.

      Consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each single molecular species. We agree that a more comprehensive model, capable of resolving the individual localization patterns of these anterior PAR proteins, would be a valuable future direction. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions in the C. elegans embryo.

      In summary, the authors successfully demonstrate the importance of compensatory actions in maintaining polarisation robustness. Their computational pipeline offers valuable insights into the dynamics of reaction-diffusion networks. However, the lack of detailed experimental validation and realistic parameter estimation limits the model's applicability to real biological systems. While the study provides a solid foundation, further work is needed to fully characterise and validate the model in natural contexts. This work has the potential to significantly impact the field by providing a new perspective on the robustness of cell polarisation networks.

      We sincerely thank the editor(s) for the pertinent summary!

      To provide a more comprehensive validation against experimental data and model parameters, three more groups of the qualitative and semi-quantitative phenomenon regarding CDC-42 are reproduced based on previously published experiments (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total, comprising eight perturbed conditions and using wild-type as the reference.

      With the in silico time set as 2 sec per step, we have now added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameters in this paper take values on the same order of magnitude as those observed in reality, supporting the fidelity of the proposed model in representing the real system. Together with the reproduction of five experimental groups (eight perturbed conditions with wild-type as the reference), the model’s applicability to real biological systems in natural contexts is more fully characterised and validated.

      The computational pipeline developed could be a valuable tool for further in silico experiments, allowing researchers to explore the dynamics of more complex networks. To maximise its utility, the model needs comprehensive validation and refinement to ensure it accurately represents biological systems. Addressing these limitations, particularly the need for more detailed experimental validation and realistic parameter choices, will enhance the model's predictive power and its applicability to understanding cell polarisation in natural systems.

      We sincerely thank the editor(s) for the comment!

      To provide more comprehensive validations and refinements to ensure the model accurately represents biological systems, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total from published data, comprising eight perturbed conditions and using wild-type as the reference. We have also explicitly shown the comparison between model predictions and experimental observations (including the reproduction of mutant behaviors) in detail, by describing how “cell polarization pattern characteristics in simulation” respond to various cellular/genetic perturbations (Section 2.5; Fig. 5; Fig. S7 and S8).

      With the in silico time set as 2 sec per step, now we have added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameter in this paper achieves realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system. Together with the reproduction of five experimental groups (eight perturbed conditions with wild-type as the reference), the model's predictive power and its applicability to understanding cell polarisation in natural systems are enhanced.

      We have now added simulations showing that the cell polarization pattern stability and the response to network structure and parameter perturbation do not depend on the exact parameter values (including diffusion coefficients, basal off-rates and inhibition intensity), provided the parameter values on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      Recommendations for the Authors:

      (1) Parameterisation and Model Validation: The authors utilise parameter values that lack realism and fail to provide units for some of them, which can lead to confusion. For instance, the length of the cell is set to 0.5 without clear justification, raising questions about the scale used. Additionally, there's a mix of dimensional and non-dimensional variables, potentially complicating interpretation. Furthermore, arbitrary choices such as equal Hill coefficients and setting inhibition intensity parameters to 1 raise concerns about model fidelity. To ensure meaningful predictions, the authors should validate their model against extensive published data, including cellular/genetic perturbations. For example, comparing intensity distributions of PAR proteins measured during maintenance phases by Goehring et al., JCB 2011, and those obtained from the model could provide valuable validation. Similarly, comparisons with data from Wang et al., Nat Cell Biol 2017, on wild-type and cdc-42 (RNAi) zygotes, as well as observations from Aceto et al., Dev Biol 2006, on PAR-6 extension in the presence of active CDC-42, would strengthen the model's validity. Such validation tests are essential for convincing readers that the model accurately represents the actual system and can provide insights into pattern formation during cell polarisation.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      We have now added a new section, Parameter Nondimensionalization and Order of Magnitude Consistency, to the Supplemental Text. In this section, we describe how we adopted the parameter nondimensionalization and value assignments from previous works [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. We list four examples (i.e., evolution time, membrane diffusion coefficient, basal off-rate, and inhibition intensity) to show the consistency in order of magnitude between numerical and realistic values.

      The assumption of “equal Hill coefficients” reduces the parameter space to keep the computational cost affordable. Fixing Hill coefficients is a widely used strategy in network research [Seirin-Lee et al., J. Theor. Biol., 2015; Seirin-Lee, Bull. Math. Biol., 2021]. In addition, setting the inhibition intensity parameters to 1 fixes a numerical scale. We have now added simulations showing that the cell polarization pattern stability does not depend on the exact parameter values associated with activation and inhibition pathways, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.
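
      The Monte Carlo search described above can be illustrated with a minimal sketch. It is not the pipeline used in the paper: the sampling ranges, the log-uniform distribution, and the `is_viable` callback (a stand-in for running the reaction-diffusion simulation and checking for a stable polarized pattern) are all assumptions made for illustration, and for brevity one value is drawn per parameter rather than one per node and regulation.

```python
import math
import random

# Hypothetical sampling ranges (assumptions, not the values used in the paper).
PARAM_RANGES = {
    "gamma": (1e-3, 1e-1),
    "alpha": (1e-2, 1e1),
    "k1": (1e-2, 1e1),
    "k2": (1e-2, 1e1),
    "q1": (1e-2, 1e1),
    "q2": (1e-2, 1e1),
    "Xc": (1e-2, 1e1),
}


def sample_parameters(rng, ranges=PARAM_RANGES):
    """Draw one parameter set, log-uniformly within each range."""
    return {
        name: math.exp(rng.uniform(math.log(lo), math.log(hi)))
        for name, (lo, hi) in ranges.items()
    }


def monte_carlo_search(is_viable, n_samples, seed=0):
    """Return the viable parameter sets and the fraction of draws that were viable.

    `is_viable` is a hypothetical callback taking a parameter dict and
    returning True if the simulated pattern is polarized and stable.
    """
    rng = random.Random(seed)
    hits = []
    for _ in range(n_samples):
        params = sample_parameters(rng)
        if is_viable(params):
            hits.append(params)
    return hits, len(hits) / n_samples
```

      In such a scheme the hit rate returned by `monte_carlo_search` is the quantity that determines how many simulations are needed to collect a given number of viable sets.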

      To confirm the fidelity of the proposed model in representing the real system, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements.

      It is worth noting that, although a strict match between numerical and realistic parameter values with consistent units is always helpful, many notable purely numerical studies have successfully unveiled principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012]. These studies suggest that numerical analysis in biological systems remains powerful, even when comprehensive experimental data from prior research are not fully available.

      (2) Parameter Changes: It is not clear how the parameters change as more complicated networks are explored, and how this affects the comparison between the simple and complete model. Clarification on this point would be beneficial.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      The computational pipeline in Section 2.1 is generalized for all reaction-diffusion networks, including the simple and complete ones studied in this paper. The parameter changes included two parts: 1. The mutual activation in the anterior (none for the simple 2-node network and q<sub>2</sub> for the complete 5-node network); 2. The viable parameter sets (122 sets for the simple 2-node network and 602 sets for the complete 5-node network). Now we have explicitly clarified those differences:

      Those differences don’t affect the comparison between the simple and complete models. We have now added comprehensive comparisons between the simple and complete models showing: 1. How they respond consistently to alternative initial conditions (Fig. S2); 2. How they respond consistently to alternative single modifications (Fig. S4 and S9), even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) are assigned various values for all nodes and regulations (Fig. S5).

      (3) Exploration of Model Parameter Space: In the two-node dual antagonistic model, the authors observe that the cell polarisation pattern is unstable for different systems (Fig. 1). However, it remains uncertain whether this instability holds true for the entire model parameter space. Have the authors thoroughly screened the full model parameter space to support their statements? It would be beneficial for the authors to provide clarification on the extent of their exploration of the model parameter space to ensure the robustness of their conclusions.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      The trade-off between the parameter space considered and the computational cost is a long-standing challenge in network studies, as there are always numerous combinations of network nodes, edges, and parameters [Ma et al., Cell, 2009; Chau et al., Cell, 2012]. The computational pipeline in Section 2.1, generalized for all reaction-diffusion networks, employs two strategies to limit the computational cost and set up a basic network reference: 1. Dimension Reduction (Strategy 1): unifying the parameter values for different nodes and different edges within the same regulatory type, reducing the number of independent parameters to three; 2. Parameter Space Confinement (Strategy 2): enumerating the dimensionless parameter set on a three-dimensional (3D) grid confined by γ ∈ [0, 0.05] in steps ∆γ = 0.001, k<sub>1</sub> ∈ [0, 5] in steps ∆k<sub>1</sub> = 0.05,  and  in steps .
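
      A minimal sketch of the grid enumeration in Strategy 2 is given below. The γ and k<sub>1</sub> axes use the ranges and step sizes stated above; the third axis is not specified here, so `third_axis` is a purely hypothetical placeholder, as is the `is_viable` callback standing in for the simulation-based stability test.

```python
from itertools import product


def axis(start, stop, step):
    """Inclusive grid axis built from integer indices to avoid float drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n + 1)]


gamma_axis = axis(0.0, 0.05, 0.001)  # range and step as stated in the text
k1_axis = axis(0.0, 5.0, 0.05)       # range and step as stated in the text
third_axis = axis(0.0, 1.0, 0.5)     # hypothetical placeholder axis (assumption)


def enumerate_viable(is_viable, axes):
    """Visit every grid point and keep those that pass the viability test.

    `is_viable` is a hypothetical callback that would run the simulation
    for one parameter triple and report whether the pattern stays polarized.
    """
    return [point for point in product(*axes) if is_viable(*point)]
```

      With these step sizes the γ axis has 51 points and the k<sub>1</sub> axis has 101, so the cost of the scan grows as 51 × 101 times the number of points on the third axis, each requiring one simulation.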

      In the early stage of our project, we tried to explore “the entire model parameter space” as indicated by the reviewer. We first tried to use the Monte Carlo method to find parameter solutions in an open parameter space, with all parameter values allowed to differ. However, such a process is highly random and computationally expensive (it took months to search for viable parameter sets and still could not profile the continuous viable parameter space; the probability of finding a viable parameter set is no higher than 0.02%). We can now clearly see that the viable parameter space is a thin curved surface on which all parameters have to satisfy a critical balance (Fig. 3a, b, Fig. 5e, f). This is why we adopt a typical strategy for dimension reduction in network research, used both in cell polarization [Marée et al., Bull. Math. Biol., 2006; Goehring et al., Science, 2011; Trong et al., New J. Phys., 2014] and in other biological topics (e.g., plasmid transfer in the microbial community [Wang et al., Nat. Commun., 2020]), i.e., unifying the parameter values for different nodes and different edges within the same regulatory type.
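
      As a rough illustration of this cost, assume (hypothetically) that the Monte Carlo draws are independent and that each succeeds with probability p, bounded above by the 0.02% quoted here. The number of draws N needed to find one viable set is then geometrically distributed, and its expectation is:

```latex
p \le 2 \times 10^{-4}
\quad\Longrightarrow\quad
\mathbb{E}[N] = \frac{1}{p} \ge \frac{1}{2 \times 10^{-4}} = 5000
```

      Under that assumption, at least about 5000 full simulations are expected per viable parameter set.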

      Additionally, because the curved surface of the viable parameter space extends indefinitely as long as the parameter balance is achieved (Fig. 3a, b, Fig. 5e, f), it is both impossible and unnecessary to explore “the entire model parameter space”. Setting up a confined parameter region near the origin for parameter enumeration helps profile the continuous viable parameter space, which is sufficient for presenting the central conclusion of this paper: the network structure and parameters need to satisfy a balance for stable cell polarization.

      To support a comprehensive study considering all kinds of reference and perturbed networks, we have maximized the parameter domain size by using all the computational resources we can access, including 400-500 Intel(R) Core(TM) E5-2670v2 and Gold 6132 CPUs on the server (High-Performance Computing Platform at Peking University) and 5 Intel(R) Core(TM) i9-14900HX CPUs on personal computers.

      To verify that the instability persists when the model parameter space is extended, we have added a comprehensive comparison between the simple and complete models showing that their instability occurs consistently even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) are assigned various values for all nodes and regulations, searched by the Monte Carlo method (Fig. S5).

      (4) Sensitivity of Numerical Solutions to Initial Conditions: Are the numerical solutions in both models sensitive to the chosen initial condition? What results do the models provide if uniform initial distributions were utilised instead?

      We sincerely thank the editor(s) and referee(s) for the comments!

      To investigate both the simple network and the realistic network consisting of various node numbers and regulatory pathways [Goehring et al., Science, 2011; Lang et al., Development, 2017], we propose a computational pipeline for numerical exploration of the dynamics of a given reaction-diffusion network, specifically targeting the maintenance phase of stable cell polarization after its initial establishment [Motegi et al., Nat. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020].

      Now we have added new simulations and explanations for the sensitivity of numerical solutions to initial conditions. For both models, a uniform initial distribution leads to a homogeneous pattern while a Gaussian noise distribution leads to a multipolar pattern. In contrast, an initial polarized distribution (even with shifts in transition planes, weak polarization, or asymmetric curve shapes between the two molecular species) can maintain cell polarization reliably.

      (5) Initial Conditions and Stability Tests: In Figure 1, the authors discuss the stability of the basic two-node network (a) upon modifications in (b-d). The stability test is performed through a pipeline procedure in which they always start from a polarised pattern described by Equation (4) and observe how the pattern evolves over time. It would be beneficial to explore whether the stability test depends on this specific initial condition. For instance, what would happen if the posterior molecules have an initial distribution of 1/(1+e^(-10x)), which is not exactly symmetric with respect to the anterior molecules' distribution of 1-1/(1+e^(-20x))? Additionally, if the initial polarisation is not as strong, for example, with the anterior molecules having a distribution of 10-1/(1+e^(-20x)) and the posterior molecules having a distribution of 9+1/(1+e^(-20x)), how would this affect the results?

      We sincerely thank the editor(s) and referee(s) for the constructive advice!

      Now we have added comprehensive comparisons between the simple and complete models, showing that they respond consistently to alternative initial conditions (Fig. S4, Fig. S9). A successful cell polarization pattern requires an initial polarized pattern, but its subsequent stability and response to perturbation depend very little on the specific form of that initial pattern. All the conditions mentioned by the reviewer have been included.
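
      As an illustration of the alternative initial conditions discussed in this point, the profiles quoted by the reviewer can be written down directly. The following is a minimal Python sketch, not the code used for Fig. S4 or Fig. S9; the function names, the grid on x in [-1, 1], and the contrast measure are assumptions introduced here for illustration only.

```python
import numpy as np

def sigmoid(x, steepness):
    """Logistic step 1 / (1 + e^(-steepness * x))."""
    return 1.0 / (1.0 + np.exp(-steepness * x))

def anterior_profile(x, steepness=20.0, offset=0.0):
    """Anterior species: offset + 1 - 1/(1+e^(-steepness*x)), high for x < 0."""
    return offset + 1.0 - sigmoid(x, steepness)

def posterior_profile(x, steepness=20.0, offset=0.0):
    """Posterior species: offset + 1/(1+e^(-steepness*x)), high for x > 0."""
    return offset + sigmoid(x, steepness)

def contrast(profile):
    """Hypothetical polarization measure: (max - min) / (max + min)."""
    return (profile.max() - profile.min()) / (profile.max() + profile.min())

# Assumed spatial grid; the manuscript's actual domain may differ.
x = np.linspace(-1.0, 1.0, 201)

# Asymmetric curve shapes: anterior 1 - 1/(1+e^(-20x)), posterior 1/(1+e^(-10x)).
a_asym = anterior_profile(x, steepness=20.0)
p_asym = posterior_profile(x, steepness=10.0)

# Weak polarization: anterior 10 - 1/(1+e^(-20x)), posterior 9 + 1/(1+e^(-20x)).
a_weak = anterior_profile(x, steepness=20.0, offset=9.0)
p_weak = posterior_profile(x, steepness=20.0, offset=9.0)
```

      In this sketch the weakly polarized profiles vary only between about 9 and 10, so their contrast is far lower than that of the strongly polarized ones, which is the property the reviewer asked to probe.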

      (6) Stability Analysis: Throughout the paper, the authors discuss the stability of the polarised pattern. The stability is checked by an exhaustive search of the parameter space, ensuring the system reaches a steady state with a polarised pattern instead of a homogeneous pattern. It would be beneficial to explore if this stability is related to a linear stability analysis of the model parameters, similar to what was conducted in Reference [18], which can determine if a homogeneous state exists and whether it is stable or unstable. Including such an analysis could provide deeper insights into the system's stability and validate its robustness.

      We sincerely thank the editor(s) and referee(s) for the comments!

      We agree that linear stability analysis can potentially offer additional insights into polarized pattern behavior. However, this approach often requires the aid of numerical solutions and is therefore not entirely independent [Goehring et al., Science, 2011]. Over the past decade, numerical simulations have consistently proven to be a reliable and sufficient approach for studying network dynamics, spanning from C. elegans cell polarization [Tostevin et al., Biophys. J., 2008; Blanchoud et al., Biophys. J., 2015; Seirin-Lee, Dev. Growth Differ., 2020] to other topics in metazoans [Chau et al., Cell, 2012; Qiao et al., eLife, 2022; Sokolowski et al., arXiv, 2023]. Numerous purely numerical studies have successfully unveiled principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012], independent of additional mathematical analysis. Thus, we leverage our numerical framework to address the cell polarization problems in this paper.
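
      For readers unfamiliar with the linear stability analysis mentioned by the reviewer, the idea can be sketched on a toy system. The snippet below is a hedged illustration only: the two-species mutual-inhibition kinetics, the parameter values (K_ON, K_OFF, K_INH, D), and all function names are assumptions chosen for this example and are not the model or parameters of the manuscript. It linearizes the kinetics around the symmetric homogeneous state and takes the growth rate of a perturbation with wavenumber k to be the largest real part of the eigenvalues of J - k^2 D.

```python
import numpy as np

# Hypothetical toy kinetics (NOT the manuscript's model):
#   dA/dt = D * A_xx + K_ON - K_OFF * A - K_INH * P**2 * A
#   dP/dt = D * P_xx + K_ON - K_OFF * P - K_INH * A**2 * P
K_ON, K_OFF, K_INH, D = 1.0, 1.0, 10.0, 0.01

def symmetric_state():
    """Homogeneous state with A = P = s, i.e. K_ON - K_OFF*s - K_INH*s**3 = 0."""
    roots = np.roots([-K_INH, 0.0, -K_OFF, K_ON])
    real_roots = roots[np.abs(roots.imag) < 1e-9].real
    return float(real_roots[0])  # the cubic is monotone, so there is one real root

def jacobian(s):
    """Jacobian of the reaction terms evaluated at A = P = s."""
    diag = -(K_OFF + K_INH * s**2)   # d f_A / dA
    off = -2.0 * K_INH * s * s       # d f_A / dP
    return np.array([[diag, off], [off, diag]])

def growth_rate(k):
    """Largest real part of the eigenvalues of J - k^2 * D * I."""
    s = symmetric_state()
    matrix = jacobian(s) - (k**2) * D * np.eye(2)
    return float(np.max(np.linalg.eigvals(matrix).real))

s_star = symmetric_state()
rate_uniform = growth_rate(0.0)    # perturbation with no spatial structure
rate_short = growth_rate(10.0)     # short-wavelength perturbation
```

      A positive growth rate means the homogeneous state is unstable to that perturbation, and a negative one means the perturbation decays. With these assumed values the symmetric state is unstable at k = 0 and stabilized by diffusion at large k, which illustrates the kind of statement such an analysis provides.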

      To confirm the reliability of the stability checked by an exhaustive search of the parameter space, we now reproduce the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we reproduce five experimental groups in total (two acting on LGL-1 and three acting on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      To confirm the robustness of our conclusions regarding the system's stability, we now add comprehensive comparisons between the simple and complete models, showing (i) that they respond consistently to alternative initial conditions (Fig. S4; Fig. S9), and (ii) that they respond consistently to alternative single modifications, even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) are assigned various values for all nodes and regulations (Fig. S5).

      (7) Interface Position Determination: In Figure 4, the authors demonstrate that by using a spatially varied parameter, the position of the interface can be tuned. Particularly, the interface is almost located at the step where the parameter has a sharp jump. However, in the case of a homogeneous parameter (e.g., Figure 4(a)), the system also reaches a stable polarised pattern with the interface located in the middle (x = 0), similar to Figure 4(b), even though the homogeneous parameter does not contain any positional information of the interface. It would be helpful to clarify the difference between Figure 4(a) and Figure 4(b) in terms of the interface position determination.

      We sincerely thank the editor(s) and referee(s) for the comments!

      The case of a homogeneous parameter (e.g., Fig. 4a), in which the system also reaches a stable polarised pattern with the interface located in the middle (x = 0), is just a reference adopted from Fig. 1a to show that the inhomogeneous positional information in Fig. 4b can achieve a similar stable polarised pattern.

      Now we clarify the interface position determination in Section 2.4 to improve readability. Moreover, the interface is marked with a grey dashed line in all the patterns in Fig. 4 and Fig. 6 to highlight the importance of inhomogeneous parameters for interface localization.
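
      To make the notion of interface position concrete, one simple way to read it off a simulated pattern is sketched below. This is an assumption-laden illustration rather than the procedure used in the manuscript: here the interface is taken to be the point where the anterior and posterior concentration profiles cross, and the function name, grid, and test profiles are hypothetical.

```python
import numpy as np

def interface_position(x, anterior, posterior):
    """Return the first x where (anterior - posterior) changes sign, or None.

    The crossing is located by linear interpolation between grid points.
    """
    diff = np.asarray(anterior) - np.asarray(posterior)
    signs = np.sign(diff)
    crossings = np.where(signs[:-1] * signs[1:] <= 0)[0]
    if crossings.size == 0:
        return None  # homogeneous or non-polarized pattern: no interface
    i = crossings[0]
    if diff[i] == diff[i + 1]:
        return float(x[i])  # both values are zero; report the left grid point
    frac = diff[i] / (diff[i] - diff[i + 1])
    return float(x[i] + frac * (x[i + 1] - x[i]))

# Hypothetical polarized pattern whose transition plane is shifted to x = 0.2.
x = np.linspace(-1.0, 1.0, 201)
shift = 0.2
posterior = 1.0 / (1.0 + np.exp(-20.0 * (x - shift)))
anterior = 1.0 - posterior
x_interface = interface_position(x, anterior, posterior)
```

      For the shifted profiles above the function recovers a position close to 0.2, while a pattern in which one species dominates everywhere yields no interface.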

      (8) Presented Comparison with Experimental Observations: The comparison with experimental observations lacks clarity. It isn't clear that the model "faithfully recapitulates" the experimental observations (lines 369-370). We recommend discussing and showing these comparisons more carefully, highlighting the expectations and similarities.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we remove the word “faithfully” and highlight the expectations and similarities of each experimental group by describing “cell polarization pattern characteristics in simulation: …”.

      (9) Validation of Model with Experimental Data: Given the extensive number of model parameters and the uncertainty of their values, it is essential for the authors to validate their model by comparing their results with experimental data. While C. elegans polarisation has been extensively studied, the authors have yet to utilise existing data for parameter estimation and model validation. Doing so would considerably strengthen their study.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      To utilise existing data for parameter estimation, we now add a new section, Parameter Nondimensionalization and Order of Magnitude Consistency, to the Supplemental Text. In this section, we describe how we adopted the parameter nondimensionalization and value assignments from previous works [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. We list four examples (i.e., evolution time, membrane diffusion coefficient, basal off-rate, and inhibition intensity) to show the consistency in order of magnitude between numerical and realistic values.

      To utilise existing data for model validation, we now reproduce the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we reproduce five experimental groups in total (two acting on LGL-1 and three acting on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      Also, we acknowledge the limitations of the current cell polarization model and provide, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “extensive number of model parameters” and “uncertainty of their values”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and discuss it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions with wild-type as a reference in the C. elegans embryo.

      (10) Enhancing Model Accuracy by Considering Cortical Flows: The authors are encouraged to include cortical flows in their cell polarisation model, as these flows are known to be pivotal in the process. Although the current model successfully predicts cell polarisation without accounting for cortical flows, research has demonstrated their significant role in polarisation formation. By incorporating cortical flows, the model would provide a more thorough and precise representation of the biological process. Furthermore, previous studies, such as those by Goehring et al. (References 17 and 18), highlight the importance of convective actin flow in initiating polarisation. It would be valuable for the authors to address the contribution of convection with actin flow to the establishment of the polarisation pattern. The polarisation of the C. elegans zygote progresses through two distinct phases: establishment and maintenance, both heavily influenced by actomyosin dynamics. Works by Munro et al. (Dev Cell 2004), Shivas & Skop (MBoC 2012), Liu et al. (Dev. Biol. 2010), and Wang et al. (Nat Cell Biol 2017) underscore the critical roles of myosin and actin in orchestrating the localisation of PAR proteins during cell polarisation. To enhance the fidelity of their model, we recommend that the authors either integrate cortical flows and consider the effects driven by myosin and actin, or provide a discussion on the repercussions of omitting these dynamics.

      We sincerely thank the editor(s) and referee(s) for the comment!

      Indeed, previous research highlighted the importance of convective cortical flow in orchestrating the localisation of PAR proteins during the establishment phase of polarisation formation [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin et al., Biophys. J., 2008; Goehring et al., Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. Now we clarify the distinction between the establishment and maintenance phases, emphasize our research focus on the reaction-diffusion dynamics during the maintenance phase, and provide a discussion of these omitted dynamics to foster a more comprehensive understanding in the future, as suggested.

      (11) Further Justification of Network Interactions: The authors should provide additional explanations, supported by empirical evidence, for the network interactions assumed in their model. This includes both node-node interactions and the rationale behind protein complex formations. Some of the proposed interactions lack empirical validation, as noted in studies such as Gubieda et al., Phil. Trans. R. Soc. B 2020. Additionally, discrepancies in protein intensity distributions, as observed in Wang et al., Nat Cell Biol 2017, should be addressed, particularly concerning the consideration of the PAR-3/PAR-6/PKC-3 complex as a single entity. Justifying these choices is crucial for ensuring the model's credibility and alignment with experimental findings.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each individual molecular species.

      Now we acknowledge the limitations of the current cell polarization model and provide, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node interactions” and “discrepancies in protein intensity distributions”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and discuss it thoroughly in 3. Discussion and conclusion.

      To ensure the model's credibility and alignment with experimental findings, we now reproduce the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      (12) Further Justification of Node-Node Network Interactions: The authors should provide further justification for the node-node network interactions assumed in their study. To the best of our knowledge, some of the node-node interactions proposed have not yet been empirically demonstrated. Providing additional explanations for these interactions would enhance the credibility of the model and ensure its alignment with empirical evidence.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we acknowledge the limitations of the current cell polarization model and provide, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node network interactions”, which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and discuss it thoroughly in 3. Discussion and conclusion.

      To enhance the credibility of the model and ensure its alignment with empirical evidence, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      (13) Justification for Network Interactions and Protein Complexes: The authors must provide clear justifications, supported by references, for each network interaction between nodes in the five-node model. Some of the activatory/inhibitory signals proposed lack empirical validation, such as CDC-42 directly inhibiting CHIN-1. The provided Table S2 is insufficient to justify these interactions, necessitating additional explanations. Reviewing relevant literature, such as the work by Gubieda et al., Phil. Trans. R. Soc. B 2020, may offer insights into similar node networks. Furthermore, the authors should address discrepancies in protein intensity distributions, as observed in studies like Wang et al., Nat Cell Biol 2017. Specifically, the authors consider the PAR-3/PAR-6/PKC-3 complex as a single entity despite potential differences in their distributions. Justification for this choice is essential, particularly considering the importance of clustering dynamics during cell polarisation, as demonstrated by Wang et al., Nat Cell Biol 2017, and Dawes & Munro, Biophys J 2011.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each individual molecular species. In addition, the inhibition of CHIN-1 by CDC-42, which recruits cytoplasmic PAR-6/PKC-3 to form a complex, may act indirectly to restrict CHIN-1 localization through phosphorylation [Sailer et al., Dev. Cell, 2015; Lang et al., Development, 2017].

      Now we acknowledge the limitations of the current cell polarization model and provide, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “each network interaction between nodes in the five-node model” and “discrepancies in protein intensity distributions”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and discuss it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions with wild-type as a reference in the C. elegans embryo.

      (14) Incorporating Cytoplasmic Dynamics into the Model: The authors assume infinite cytoplasmic diffusion and neglect the role of cytoplasmic flows in cell polarity, which may oversimplify the model. Finite cytoplasmic diffusion combined with flows could potentially compromise the stability of anterior-posterior molecular distributions, affecting the accuracy of the model's predictions. The authors claim a significant difference between cytoplasmic and membrane diffusion coefficients, but the actual disparity seems smaller based on data from Petrášek et al., Biophys. J. 2008. For example, cytosolic diffusion coefficients for NMY-2 and PAR-2 differ by less than one order of magnitude. Additionally, the strength of cytoplasmic flows, as quantified by studies such as Cheeks et al., Curr Biol 2004, should be considered when assessing the impact of cytoplasmic dynamics on polarity stability. Incorporating finite cytoplasmic diffusion and cytoplasmic flows into the model could provide a more realistic representation of cellular dynamics and enhance the model's predictive power.

      We sincerely thank the editor(s) and referee(s) for the comment!

      Cytoplasmic and membrane diffusion coefficients differ by two orders of magnitude according to previous experimental measurements on PAR-2 and PAR-6 [Goehring et al., J. Cell Biol., 2011; Lim et al., Cell Rep., 2021]. Many previous C. elegans cell polarization models have incorporated a mass-conservation model combined with finite cytoplasmic diffusion, but this model description can lead to a reversed spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], contradicting experimental observations [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that finite cytoplasmic diffusion, without precise experiment-based parameter assignment or accounting for other hidden biological processes (e.g., protein production and degradation), may be inappropriate for modeling the real spatial concentration distributions that differ between the cell membrane and cytosol. To address this issue, some theoretical research incorporated protein production and degradation into the model to obtain a consistent spatial concentration distribution between the cell membrane and cytosol [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein diffusion, production, and degradation are essential for providing a more realistic representation of cellular dynamics and enhancing the model's predictive power.

      Cytoplasmic flows indeed play a non-negligible role in cell polarity during the establishment phase [Kravtsova et al., Bull. Math. Biol., 2014], which creates a spatial gradient of actomyosin contractility and directs PAR-3/PKC-3/PAR-6 to the anterior membrane by cortical flow [Rose et al., WormBook, 2014; Lang et al., Development, 2017]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Geßele et al., Nat. Commun., 2020]. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin et al., Biophys. J., 2008; Goehring et al., Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. We now emphasize our research focus on the reaction-diffusion dynamics during the maintenance phase, so the dynamics between NMY-2 and PAR-2 are not included. We have also provided a discussion of the simplified cytoplasmic diffusion and omitted cytoplasmic flows to foster a more comprehensive understanding in the future.

      (15) Explanation of Lethality References: On page 13, the authors mention lethality without adequately explaining why they are drawing connections with lethality experimental data.

      We sincerely thank the editor(s) and referee(s) for the comment!

      It is well known that loss of cell polarity in the C. elegans zygote leads to symmetric cell division, which produces a more symmetric allocation of molecular-to-cellular contents in the daughter cells; this results in abnormal cell size, cell cycle length, and cell fate in the daughter cells, followed by embryonic lethality [Beatty et al., Development, 2010; Beatty et al., Development, 2013; Rodriguez et al., Dev. Cell, 2017; Jankele et al., eLife, 2021]. Now we explain why we are drawing connections with lethality experimental data in Section 2.5.

      (16) Improved Abstract: "...However, polarity can be restored through a combination of two modifications that have opposing effects..." This sentence could be revised for better clarity. For example, the authors could consider rephrasing it as follows: "...However, polarity restoration can be achieved by combining two modifications with opposing effects...".

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we revise the abstract as follows:

      “Abstract – However, polarity restoration can be achieved by combining two modifications with opposing effects.”

      (17) Conservation of Mass in Network Models: Is conservation of mass satisfied in their network models?

      We sincerely thank the editor(s) and referee(s) for the comment!

      While previous experiments provide evidence for near-constant protein mass during the establishment phase [Goehring et al., Science, 2011], whether this is consistent until the end of maintenance is unclear.

      Many previous C. elegans cell polarization models have assumed mass conservation on the cell membrane and in the cell cytosol, but this model description can lead to a reversed spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], contradicting experimental observations [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that mass conservation may be inappropriate for modeling the real spatial concentration distributions that differ between the cell membrane and cytosol. To address this issue, some theoretical research incorporated protein production and degradation into the model instead of assuming mass conservation [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein mass are essential for constructing a more accurate model.

      Given the absence of a universally accepted model in agreement with experimental observation, we adopted the assumption that the concentration of molecules in the cytosol (not the total mass on the cell membrane and in the cell cytosol) is spatially homogeneous and temporally constant, which has also been used before [Kravtsova et al., Bull. Math. Biol., 2014]. In the context of this well-mixed, constant cytoplasmic concentration, our model successfully reproduced the cell polarization phenotype in wild-type and eight perturbed conditions (Section 2.5; Fig. S7; Fig. S8), supporting the validity of this simplified, yet effective, model. Now we have provided a discussion of the protein mass assumption to foster a more comprehensive understanding in the future.
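
      The practical meaning of a well-mixed, constant cytoplasmic concentration can be illustrated with a toy simulation. The sketch below is not the model, equations, or parameter set of the manuscript: the two-species mutual-inhibition kinetics, the explicit finite-difference scheme, and every name and value (d_mem, k_on, c_cyto, k_off, k_inh, grid, time step) are assumptions made for illustration. The only feature it is meant to show is that the membrane on-rate uses a fixed cytoplasmic concentration c_cyto instead of a pool depleted by membrane binding.

```python
import numpy as np

def laplacian(u, dx):
    """Second difference with no-flux (zero-gradient) boundaries."""
    padded = np.pad(u, 1, mode="edge")
    return (padded[:-2] - 2.0 * u + padded[2:]) / dx**2

def simulate(a0, p0, dx, dt=0.01, steps=5000, d_mem=0.01,
             k_on=1.0, c_cyto=1.0, k_off=1.0, k_inh=10.0):
    """Explicit Euler integration of a hypothetical two-species membrane model.

    c_cyto is held constant in space and time (well-mixed cytoplasm), so
    total protein mass is not conserved in this sketch.
    """
    if dt * d_mem / dx**2 > 0.5:
        raise ValueError("time step too large for the explicit diffusion scheme")
    a, p = a0.copy(), p0.copy()
    for _ in range(steps):
        da = d_mem * laplacian(a, dx) + k_on * c_cyto - k_off * a - k_inh * p**2 * a
        dp = d_mem * laplacian(p, dx) + k_on * c_cyto - k_off * p - k_inh * a**2 * p
        a, p = a + dt * da, p + dt * dp
    return a, p

# Assumed grid and polarized initial condition (anterior high for x < 0).
x = np.linspace(-1.0, 1.0, 101)
dx = x[1] - x[0]
p_init = 1.0 / (1.0 + np.exp(-20.0 * x))
a_init = 1.0 - p_init
a_final, p_final = simulate(a_init, p_init, dx)
```

      With these assumed symmetric parameters the initially polarized pattern persists, with the anterior species remaining enriched on the left and the posterior species on the right.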

      (18) Comparison of Network Structures: In Figure 1c, the authors demonstrate that the symmetric two-node network is susceptible to single-sided additional regulation. They considered four subtypes of modifications, depending on whether [L] is in the anterior or posterior and whether [A] and [L] are mutually activating or inhibiting. What is the difference between the structure where [L] is in the anterior and in the posterior? Upon comparing the time evolution of the left panel ([L] is sided with [P]) and the right panel ([L] is sided with [A]), the difference is so tiny that they are almost indistinguishable. It might be beneficial for the authors to provide a clearer explanation of the differences between these network structures to aid in understanding their implications.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      The difference between the structures where [L] is in the anterior and in the posterior is the initial spatial concentration distribution of [L], which is polarized to have a higher concentration in the anterior and posterior, respectively. The time evolution of the left panel ([L] is sided with [P]) and the right panel ([L] is sided with [A]) is almost indistinguishable because the perturbation from [L] is slight (smaller by over one order of magnitude) compared to the predominant [A]~[P] interaction ( for [A]~[P] mutual inhibition while for [A]~[L] mutual inhibition and for [A]~[L] mutual activation), highlighting the response of the cell polarization pattern. To aid the readers in understanding their implications, we have added [L] and plotted the spatial concentration distribution of all three molecular species at t = 0, 100, 200, 300, 400 and 500 in Fig. S3, where the differences between the [L] distributions in the left and right panels are clearly distinguishable.

      (19) Figure Reference: In line 308, Fig. 4a is referenced when explaining the loss of pattern stability by modifying an individual parameter, but this is not shown in that panel. Please update the panel or adjust the reference in the main text.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Fig. 4 focuses on the regulatable shift of the zero-velocity interface by modifying a pair of individual parameters, not on the loss (or recovery) of pattern stability, which has been analyzed as a focus in Fig. 1, Fig. 2, and Fig. 3. Fig. 4a is actually from the same simulation as the one in Fig. 1a, which has spatially uniform parameters used as a reference in Fig. 4. The individual parameter modification in other subfigures of Fig. 4 shows how the zero-velocity interface is shifted in a regulatable manner always in the context of pattern stability. Now we update the panel, adjust the reference, add one more paragraph, and improve the wording to clarify how the analyses in Fig. 4 are carried out on top of the pattern stability already studied.

      (20) Viable Parameter Sets: In line 355, the number of viable parameter sets (602) is not very informative by itself. We suggest reporting the fraction or percentage of sets tested that resulted in viable results instead. This applies similarly to lines 411 and 468.

      We sincerely thank the editor(s) and referee(s) for the constructive comment!

      Now the fraction/percentage of tested parameter sets that gave viable results is reported everywhere the number appears.

      (21) Perturbation Experiments: In lines 358-359, "the perturbation experiments" implies that those considered are the only possible ones. Please rephrase to clarify.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we rephrase three paragraphs to clarify why the perturbation experiments involved with [L] and [C] are considered instead of other possible ones.

      (22) Figure 2S: This figure is unclear. The caption states that panel (a) shows the "final concentration distribution," but only a line is shown. If "distribution" refers to spatial distribution, please clarify which parameters are shown.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we clarify the “spatial concentration distribution” and which parameters are shown in the figure caption.

      (23) Figure 5 and 6 Captions: The captions for Figures 5 and 6 could benefit from clarification for better understanding.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify the details in the captions of Fig. 5 and Fig. 6 for better understanding.

      (24) Figure 5 Legend: The legend on the bottom right corner of Figure 5 is unclear. Please specify to which panel it refers.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify to which panel the legend in the bottom right corner of Fig. 5 refers.

      (25) L and A~C Interactions: In paragraphs 405-418, please explain why the L and A~C interactions are removed for the comparison instead of others.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we add a separate paragraph and a supplemental figure to explain why the L and A~C interactions are removed for the comparison instead of others.

      (26) Network Structures in Figure S3: From the "34 possible network structures" considered in Figure S3 (lines 440-441), why are the "null cases" (L disconnected from the network) relevant? Shouldn't only 32 networks be considered?

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now the two “null cases” are removed.

      (27) Figure S3 Caption: The caption must state that the position of the nodes (left or right) implies the polarisation pattern. Additionally, with the current size of the figure, the dashed lines are extremely hard to differentiate from the continuous lines.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we state that the position of the nodes (left or right) implies the polarization pattern. Additionally, we have modified the figure size and dashed lines so that the dashed lines are adequately distinguishable from the continuous lines.

      (28) Equation #7: It is confusing to use P as the number of independent simulations when P is also one of the variables/species in the network. Please consider using different notation.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we replace the P in current Equation #8 with Q and the P in current Equation #10 with W.

      (29) Use of "Detailed Balance": The authors used the term "detailed balance" to describe the intricate balance between the two groups of proteins when forming a polarised pattern. However, "detailed balance" is a term with a specific meaning in thermodynamics. Breaking detailed balance is a feature of nonequilibrium systems, and the polarisation phenomenon is evidently a nonequilibrium process. Using the term "detailed balance" may cause confusion, especially for readers with a physics background. It might be advisable to reconsider the terminology to avoid potential confusion and ensure clarity for readers.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      To avoid potential confusion and ensure clarity for readers, now we replace “detailed balance” with “balance”, “required balance”, or “interplay” regarding different contexts.

      (30) Terminology: The word "molecule" is used where "molecular species" would be more appropriate, e.g., lines 456 and 551. Please revise these instances.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we replace all instances of “molecule” with “molecular species” as suggested.

      (31) Section 2.5: This section is confusing. It isn't clear where the "method outlined" (line 464) is nor what "span an iso-velocity surface at vanishing speed" means in line 470. The sentence in lines 486-488, "An expression similar to Eq. 8 enables quantitative prediction...", is too vague. Please clarify these points and specify what the "similar expression" is and where it can be found.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify these points and specify the terms as suggested.

      (32) Software Mention: The software is only mentioned in the abstract and conclusions. It should also be mentioned where the computational pipeline is described, and the instructions available in the supplementary information need to be referenced in the main text.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we mention the software where the computational pipeline is described and reference the instructions available in the Supplemental Text.

      (33) Supplementary Material References: Several parts of the supplementary material are never referenced in the main text, including Figure S1, Movies S3-S4, and the Instructions for PolarSim. Please reference these in the main text to clarify their relevance and how they fit with the manuscript's narrative.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we add all the missing references for supplementary materials to the main text properly.

    1. Author response:

      General Statements

      We sincerely appreciate the constructive comments from the reviewers, which have significantly enhanced the clarity and rigor of our manuscript. Most of their suggestions have already been incorporated into the revised version. Additionally, we are conducting an additional experiment to further substantiate our conclusions, and preliminary data seem to support our findings.

      As pointed out by Reviewer #1, the regulation of neural circuit function by oligodendrocytes is currently a highly significant and actively studied topic. Our study demonstrates that regional heterogeneity in oligodendrocytes underlies the microsecond-level computational processes in the sound localization circuit. We believe this work represents a substantial contribution to the field.

      Description of the planned revisions

      • Evaluation of node formation along axons sparsely expressing eTeNT (related to Reviewer #2: comment 1)

      Based on the approximately 90% expression efficiency of A3V-eTeNT in NM neurons, we interpreted that vesicular release from NM axons was largely inhibited in the NL region, leading to the suppression of oligodendrogenesis and the subsequent emergence of unmyelinated segments. However, the effects of eTeNT on myelination are likely diverse, and a possibility remains that eTeNT directly disrupted axon-oligodendrocyte interactions, preventing oligodendrocytes from myelinating the axons expressing eTeNT.

      To test this possibility, we have initiated an additional experiment to evaluate formation of nodes along axons, while expressing eTeNT sparsely by electroporation. Preliminary results indicated that unmyelinated segments did not increase, supporting our original conclusion. After completion of the experiment, we will include the findings as a Supplementary Figure associated with Figure 6, which will provide a clearer understanding of how eTeNT influences myelination.

      Description of the revisions that have already been incorporated in the transferred manuscript

      • Revised terminology from "nodal distribution" to "nodal spacing" throughout the manuscript. (Reviewer #1: comment 1)

      • Emphasized that our analyses were focused on the main trunk of NM axons (Reviewer #1: comment 2) We explicitly stated throughout the manuscript that we analyzed the main trunk of NM axons and made it clear that our findings do not contradict those of Seidl et al. (J Neurosci 2010), which showed similar axon diameters between the midline and ventral NL regions (page 7, line 7).

      • Added an explanation on the maturation of the sound localization circuit (Reviewer #1: comment 3) We explained that chickens have a high ability to localize sound at hatch, emphasizing that the sound localization circuit is almost fully developed by E21 (page 4, line 12).

      • Emphasized the diverse effects of neuronal activity on oligodendrocytes (page 10, line 18) (Reviewer #1: comment 4)

      • Added details on the efficiency of A3V-eTeNT expression in NM neurons to the Results section (page 8, line 5) (Reviewer #2: comment 1)  

      • Made it clear in the legend for Figure 6D that the analysis was conducted under conditions in which most of the axons were labeled by A3V-eTeNT (page 31, line 9) (Reviewer #2: comment 2)

      • Clarified the rationale for statistical test selection (Reviewer #2: comment 3.1)

      • Reanalyzed all statistical data with appropriate methods using R (Reviewer #2: comment 3.2)

      • Clearly indicated which statistical tests were used in each figure (Reviewer #2: comment 3.3)

      • Clarified what n and N represent in each experiment (Reviewer #2: comment 3.4)

      • Added individual data points to bar graphs in Figures 5 and 6 (Reviewer #2: comment 3.5)

      • Emphasized the importance of comparing the ITD circuit with that of rodents (page 11, line 32) (Reviewer #2: comment 4) 

      • Softened the expressions related to "determine" (Reviewer #2: comment 5)

      Our study demonstrates that regional differences in the intrinsic properties of oligodendrocytes are the prominent determinant of nodal spacing patterns. However, we acknowledge that this does not establish a direct causation. Accordingly, relevant expressions have been revised throughout the manuscript.

      • Added references (Reviewer #2: comment 6)

      • Corrected units in Figure 1G (Reviewer #2: comment 7)

      • Added discussion about the involvement of pre-nodal clusters in the regional differences in nodal spacing (page 9, line 35) (Reviewer #3: comment 1).

      Related to this issue, we have added new data to Figure 6I.

      • Discussed the possibility that the developmental origin and/or the pericellular microenvironment of OPCs contributed to the regional heterogeneity of oligodendrocytes (page 9, line 21) (Reviewer #3: comment 3).

      • Added references used in the response to reviewers into the main text.

      • Corrected the data error in Figure 6G, H

      • Corrected the dataset in Figure 3E

      We limited the data in Figure 3E–G to those measuring both myelin length and diameter simultaneously.

      Description of analyses that authors prefer not to carry out

      • Analysis in adult chickens (Reviewer #1: comment 3,4)

      The chick brainstem auditory circuit is nearly fully developed by E21, and we have also demonstrated that nodal spacing increases by approximately 20% while maintaining regional differences up to P9. Therefore, our study covers the period from pre-myelination to post-functional maturation, and we think there is little need to analyze aged animals.

      • Functional evaluation of the efficiency of eTeNT suppression (Reviewer #2: comment 1)

      It is technically challenging to quantitatively assess the inhibition of vesicular release by eTeNT in NM axons given that multiple synapses from different NM axons converge onto postsynaptic neurons. In addition, previous studies have already validated the efficacy of this construct in multiple species. Therefore, we will not evaluate electrophysiologically the extent of vesicular release inhibition by eTeNT in this study. Instead, we have provided clear evidence that A3V-eTeNT is expressed efficiently and leads to notable phenotypic changes, such as the inhibition of oligodendrogenesis (page 8, line 5).

      • Replacing figures with data averaged per animal (Reviewer #2: comment 3.4)

      Our study focuses on the distribution of morphological characteristics at the single-cell level rather than solely on group means. Averaging measurements per animal could obscure this cellular heterogeneity and potentially misrepresent our findings. Given that data distributions in our plots show clear distinctions, we believe that averaging per biological replicate is not essential in this case. If requested, we will be happy to provide the outputs of PlotsOfDifferences as supplementary source data files, similar to those used in eLife publications, for each figure.

      • Additional experiments to manipulate oligodendrocyte density (Reviewer #2: comment 5)

      We have already demonstrated that A3V-eTeNT reduces oligodendrocyte density in the NL region, and some of the arguments in our study are based on this result. Therefore, we think that further experiments are not necessary.

      • Verification of the presence of pre-nodal clusters (Reviewer #3: comment 1)

      We investigated the presence of pre-nodal clusters on NM axons, but we could not identify them in the immunohistochemistry of AnkG. As the occurrence of pre-nodal clusters varies depending on neuronal type, we consider that pre-nodal clusters are not prominent in the NM axons and that further experimental validation would not be necessary. Instead, we have added a discussion on the possibility that pre-nodal clusters contribute to regional differences in nodal spacing along NM axons (page 9, line 35).

      • Axon diameter measurements using EM (Reviewer #3: comment 2)

      This experiment was already done by Seidl et al. (2010), and hence, we do not think it necessary to repeat it. We believe that the relative differences in axon diameter between the regions could be adequately assessed using the optical approach with membrane-targeted GFP.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes within the nutrient-poor intracellular niche within the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics is clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and roles in bacteria infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular content of Zn2+ in STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading, as well as the constructive comments and advice regarding our manuscript. We have revised the manuscript based on your comments and suggestions.

      You are correct that this work has not thoroughly investigated the mechanisms underlying the roles of β-alanine, panD, and zinc in impacting Salmonella infection. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Meanwhile, we concur that additional, unknown mechanisms are involved in the virulence regulation by β-alanine in Salmonella. Our findings indicate that the double mutant ΔpanDΔznuA, which can neither synthesize β-alanine nor take up zinc, is more attenuated than the single mutant ΔznuA (Figures 5D and 6B). This suggests that the contribution of β-alanine to Salmonella's virulence is partially dependent on zinc acquisition. We have revised the related descriptions throughout the manuscript for clarity (lines 31, 304, 341, 1056, 1068).

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice on our manuscript and are delighted to accept them. Salmonella Typhimurium causes systemic disease in mice, which is similar to the symptoms of typhoid fever in humans and has been widely used to explore the pathogenesis of Salmonella. Based on your comment, we have now performed additional experiments to confirm several key points of our findings in another typical Salmonella serovar, Salmonella enterica serovar Typhi, which is a human-limited serovar and the cause of typhoid fever in humans (PLoS Pathog. 2012, 8(10):e1002933).

      We constructed the panD mutant strain (ΔpanD) in the S. Typhi strain Ty2 and subsequently compared the replication of ΔpanD with that of the Ty2 wild-type in the human THP-1 monocyte-like cell line (ATCC TIB-22) using gentamicin protection assays. The results showed that the replication of ΔpanD in THP-1 cells was reduced by 2.6-fold at 20 h post-infection compared to the Ty2 wild-type strain (P < 0.01) (Figure 2-figure supplement 3), suggesting that panD also facilitates S. Typhi replication in human macrophages and may be involved in the systemic infection of S. Typhi in humans. This result has been included in the revised manuscript (lines 203-210).

      Based on these results, we speculate that PanD may serve as a potential target for treating Salmonella infection.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28. Latin phrases like de novo should be italicized.

      Thank you for your careful review. We have revised the manuscript thoroughly (Lines 28, 65, 77, 106, 171, 173, 214, 1002, 1023, 1078).

      (2) Line 45. 'survival' typo.

      We have corrected it in the revised manuscript (Line 45).

      (3) Line 57. What evidence or prior work supports the SCV of macrophages in a nutrient-poor environment? Citation needed.

      The relevant reference has now been added (lines 62-63).

      (4) Lines 65-68. If an 'increasing number of studies have focused' on this topic, please cite them here.

      The relevant reference has now been added (lines 72-73).

      (5) Lines 69-71. Citations are needed for these claims.

      The relevant reference has now been added (lines 76-77, 79-80).

      (6) Line 76-77. Citation needed for this claim.

      The relevant reference has now been added (lines 84, 86).

      (7) Line 116-122, and Figure 1C, and Figure 1 legend. An important claim in this work is that the amino acid content of the macrophage cytoplasm is different +/- STM infection. The authors need to explain this result more carefully and define their acronyms. What is VIP, Log2 FC, etc.? What do the colors in Figure 1C mean? They are not defined. If possible, it would be more approachable to list these as molar concentrations, weight/cell, or number of molecules/cell. The authors should calculate an effect size for each of these data to help assess if the differences are meaningful. Without this information, and a clearer explanation of what these data are, it is difficult to evaluate the authors' claim that "8 [amino acids] showed significant differences in abundance."

      Thank you for the comment. The full names of VIP (Variable Importance in the Projection) and FC (fold change) have been included in the revised manuscript. In Figure 1C of the original manuscript, pink represents the content of amino acids that increased following Salmonella infection, whereas blue signifies the content of amino acids that decreased after Salmonella infection.

      Based on your suggestion, we have revised Figure 1C (now Figure 1C, D in the revised manuscript) and the content of amino acids is now expressed as weight per cell (ng/10⁷ cells). The legend has been updated accordingly (lines 993-997).

      (8) Line 134-138. Additional controls are required for this experiment. By adding a nutrient (B-Ala) you have increased the nutrient availability and growth potential of the bacteria. This may not relate to anything special to B-Ala. Perhaps the addition of another amino acid, or sugar, would have a similar impact. Further, this result would be more compelling if the authors demonstrated a dose-dependent effect of B-Ala addition.

      Thank you for the comment. To further confirm that host-derived β-alanine can promote intracellular Salmonella replication, we have added varying concentrations of β-alanine (0.5, 1, 2, and 4 mM) to the culture medium (RPMI) of RAW264.7 cells. Subsequently, we infected these cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations indicate that the addition of 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the increase in Salmonella intracellular replication was dose-dependent, as illustrated in the revised Figure 1E. These findings suggest that host-derived β-alanine facilitates Salmonella replication inside macrophages. We have included these results in the revised manuscript (lines 141-149).

      (9) Lines 181-184, and Figure 2E. In addition to the fold-change replication data, here and elsewhere the authors should provide raw CFU counts for data transparency.

      Thank you for bringing this to our attention. In this work, we have utilized “fold intracellular replication (20 h intracellular bacterial CFU/ 2 h intracellular bacterial CFU)” to illustrate the differences in intracellular replication of different Salmonella strains in macrophages. The term “fold intracellular replication” is commonly employed in recently published reports (e.g., FEMS Microbiol Lett. 2024, 9;371:fnae067; mBio. 2024, 15(7):e0112824; Front Microbiol. 2024, 14:1340143). To ensure data transparency, we have included the raw CFU counts in the source data file.
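
      As a minimal sketch of the ratio defined above, the following Python snippet computes fold intracellular replication from a pair of CFU counts. The function name and the example counts are hypothetical and are not taken from the source data file.

```python
def fold_intracellular_replication(cfu_20h: float, cfu_2h: float) -> float:
    """Return the 20 h intracellular CFU divided by the 2 h intracellular CFU."""
    if cfu_2h <= 0:
        # A zero or negative baseline count makes the ratio undefined.
        raise ValueError("2 h CFU count must be positive")
    return cfu_20h / cfu_2h


# Hypothetical counts for illustration only (not data from the study).
example = fold_intracellular_replication(cfu_20h=5.0e5, cfu_2h=2.0e4)
print(f"fold intracellular replication: {example:.1f}")
```

      Reporting the raw counts alongside this ratio lets a reader see whether two strains differ at the 2 h baseline (uptake) or only at 20 h (replication).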

      (10) Line 197. Why employ i.p. injection of STM? As a non-typhoidal serovar, STM infection is enteric, and so i.p. injection seems very artificial if the goal is to understand the role of B-Ala synthesis in disease.

      Thank you for the comment. Salmonella can induce gastroenteritis or systemic infection, which are associated with its capacity to invade intestinal epithelial cells and replicate within macrophages, respectively. In this study, using gentamicin protection assays and immunofluorescence analysis, we demonstrated that β-alanine is crucial for Salmonella replication inside macrophages. Since replication in macrophages is a key determinant of systemic Salmonella infection, we hypothesized that β-alanine also affects Salmonella systemic infection in vivo. Intraperitoneal (i.p.) injection enables Salmonella to disseminate directly to systemic sites via the lymphatic system and bloodstream, bypassing the need for intestinal invasion (Microbiol Res. 2023, 275:127460; Int Immunopharmacol. 2016, 31:233-8). Thus, we conducted the mouse infection assays via intraperitoneal (i.p.) injection to ascertain whether β-alanine affects systemic Salmonella infection. We have included the description in the revised manuscript to enhance clarity (lines 217-221).

      Whether β-alanine influences Salmonella invasion of intestinal epithelial cells and intestinal colonization has not been investigated in this work; this issue will be explored in our future studies.

      (11) Line 207-214 and Figure 3. If the hypothesis is that B-Ala mediates STM survival/virulence through enhancing metabolism in the SCV and intracellular niche, why did the authors not investigate/enumerate STM in this niche in their in vivo studies?

      Thank you for the comment. Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (ΔpanD), and complemented strain (cpanD) within the macrophages of the mouse liver. The findings indicated that the number of ΔpanD in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of ΔpanD increased the bacterial count in each liver macrophage to the level of WT (refer to Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      (12) Figure 4B - the down genes label is cut off.

      Thank you for your careful review. We have corrected it in the revised Figure 4B.

      (13) Line 260-265. SPI-2 needs to be defined and introduced, as do other terms here, to make the work approachable to non-STM specialists.

      The introduction of SPI-2 has been added to the revised manuscript (lines 290-292).

      (14) Line 300-301. Additional experiments are needed to support the claim that "data indicate that β-alanine promotes in vivo virulence of Salmonella, partially by increasing the expression of zinc transporter genes." Gene up- or down-regulation does not necessarily have any meaningful impact on function or activity. The authors here need an assay that confirms that the function of znuA is disrupted, such as examining the cell Zn2+ content in vivo at different levels of B-Ala exposure and/or panD activity. Moreover, more Zn2+ is not necessarily beneficial for STM, at levels too high zinc can exert cell toxicity. So, the authors have a correlation but no data supporting this mechanism explains their observations of virulence and infection. How much Zn2+ is ideal for STM growth?

      Thank you for the comment. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, the efficient acquisition of zinc may play a crucial role in the survival and replication of Salmonella within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella utilizes the high-affinity ZnuABC zinc transporter to maximize zinc availability within host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages.

      You are correct that more zinc is not necessarily beneficial for Salmonella, as excessive zinc can inhibit the growth of Salmonella. Considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentrations during Salmonella's growth within macrophages. We have included a discussion on this matter in the revised manuscript (lines 459-466).

      (15) Figure 6B. Related to the above, these data would be more compelling with higher n and a dose-dependent response demonstrated for Zn2+ addition. This is a central point of the manuscript, and effectively what the authors propose as the underlying mechanism, and it should be more robustly substantiated.

      Thank you for the comment. As stated in the previous response, we were unable to directly assess the bacterial zinc concentration during Salmonella growth within macrophages. Instead, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. Moreover, considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentrations during Salmonella's growth within macrophages.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. To further investigate the mechanisms by which β-alanine, panD, and zinc influence Salmonella infection, we have conducted additional experiments as suggested. For instance, we examined the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (Δ_panD_). This approach indirectly reflects zinc acquisition by intracellular Salmonella, as it is challenging to isolate sufficient amounts of the bacteria from infected cells or tissues for zinc concentration measurement. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared to that in WT-infected counterparts (Figures 5E and 6A). This suggests that the panD gene and β-alanine are crucial for Salmonella to absorb zinc from host cells. This new information has been included in the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, efficient zinc uptake could be crucial for Salmonella survival and replication within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella exploits the high-affinity ZnuABC zinc transporter to maximize zinc availability in host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages. We have addressed this issue in the revised manuscript (lines 459-466).

      Reviewer #2 (Recommendations for the authors):

      A few general clarifications and suggested experiments:

      (1) Metabolome analysis: Salmonella can itself produce β-alanine. Given that it is isolated from infected cells where Salmonella has scavenged β-alanine from host cytosol as well as produced it, how β-alanine levels went down in metabolome analysis is confusing.

      Thank you for the comment. Targeted metabolic profiling was conducted as outlined in a recently published paper by our group (Nat Commun. 2021, 12(1):879). To prevent delays and changes in metabolite concentrations during the separation of bacterial contents from macrophages, we determined the combined metabolite concentrations directly from infected cells and Salmonella. We observed that each Salmonella cell contained only 0.01%-0.02% of the concentration of each corresponding combined metabolite. Approximately 94% of the infected macrophages contained no more than ten bacteria at 8 hours post-infection, confirming that the combined metabolites were predominantly from the host. We have included an explanation of this issue in the Methods section (lines 557-560).

      (2) What is the basal level of β-alanine produced by macrophages? How was the 1 mM concentration chosen?

      According to our results, the content of β-alanine in uninfected RAW264.7 cells is 26-33 μM per 10^7 cells (700-900 ng per 10^7 cells). The 1 mM concentration was chosen based on a published report (Appl Microbiol Biotechnol. 2004, 65(5):576-82).
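
      As a rough consistency check on the two ways this figure is quoted, the mass range can be converted to moles using the molar mass of β-alanine (89.09 g/mol). The sketch below is illustrative only: the function names are hypothetical, and the "implied volume" it reports is an inference from the two quoted ranges, not a value stated in this response.

```python
# Molar mass of beta-alanine (C3H7NO2), in g/mol.
BETA_ALANINE_MW = 89.09


def ng_to_nmol(mass_ng: float, molar_mass: float = BETA_ALANINE_MW) -> float:
    """Convert a mass in nanograms to an amount in nanomoles."""
    return mass_ng / molar_mass


def implied_volume_ml(amount_nmol: float, conc_um: float) -> float:
    """Volume (mL) in which amount_nmol would give a concentration of conc_um (uM).

    nmol / (umol/L) = 1e-3 L, so the ratio is already expressed in millilitres.
    """
    return amount_nmol / conc_um


if __name__ == "__main__":
    # Pair the lower and upper ends of the two quoted ranges (an assumption).
    for mass_ng, conc_um in [(700.0, 26.0), (900.0, 33.0)]:
        amount = ng_to_nmol(mass_ng)
        volume = implied_volume_ml(amount, conc_um)
        print(f"{mass_ng:.0f} ng = {amount:.2f} nmol; "
              f"{conc_um:.0f} uM would imply about {volume:.2f} mL")
```

      Under this reading, both ends of the range imply a similar volume, so the molar and mass figures are at least mutually consistent.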

      Additionally, we supplemented the culture medium (RPMI) of RAW264.7 cells with 0.5, 1, 2, and 4 mM β-alanine and subsequently infected them with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Supplementation with 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the addition of β-alanine to the infected cells resulted in a dose-dependent increase in Salmonella intracellular replication, as depicted in Figure 1E. These findings further support the notion that host-derived β-alanine facilitates Salmonella replication within macrophages. These data have been incorporated into the revised manuscript (lines 141-149).

      (3) The antimicrobial activity of macrophages preventing the growth of intracellular Salmonella will primarily be governed by genes such as GBPs, defensins, nitric oxide, etc. The expression of these genes should be tested rather than cytokines which are secreted with little effect on intracellular Salmonella.

      Thank you for the suggestion. We have investigated the levels of ROS (reactive oxygen species) and RNS (reactive nitrogen species) in Salmonella-infected RAW264.7 cells, both in the presence and absence of 1 mM β-alanine. The results indicated that β-alanine did not affect the ROS and RNS levels in RAW264.7 cells (Figure 1-figure supplement 1), suggesting that β-alanine does not influence the antimicrobial activity of macrophages. We have included these results in the revised manuscript (lines 150-153).

      (4) For animal experiments, how many times was the experiment repeated? Can the animal experiment be done with β-alanine supplementation and panD mutant? Can the liver be stained to detect the bacteria?

      Thank you for the comment.

      i) Mouse infection assays were conducted twice, with at least 2 mice (n ≥ 2) in each injection group. The combined data from the two experiments were used for statistical analysis. This information has been added to the revised manuscript (lines 678-681).

      ii) As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine (500 mg/kg/day, Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial burden in the liver and spleen and the body weight of the infected mice were measured. The results indicated that administering β-alanine to mice did not affect the bacterial burden of ΔpanD in the liver and spleen, nor did it influence the body weight of the infected mice (please refer to Author response image 1 below). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      Author response image 1.

      iii) Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (ΔpanD), and complemented strain (cpanD) within the macrophages of the mouse liver. The findings indicate that the number of ΔpanD in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of ΔpanD increased the bacterial count in each liver macrophage to the level of WT (Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, which we call SCV or Salmonella containing vacuole in the field of Salmonella. SCV is a tight-fitting vacuole where the acquisition of nutrients is a key factor for Salmonella. The authors, among many nutrients, focused on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors have done in vitro RAW macrophage infection assays and in vivo mouse infection assays to see the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It has been reported that β-alanine is transported into eukaryotic cells via the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Acta Physiol (Oxf). 2015, 213(1):191-212; Am J Physiol Cell Physiol. 2020, 318(4):C777-C786; Biochim Biophys Acta. 1994, 1194(1):44-52).

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      According to published reports, translocation of SPI2 effector proteins induces the formation of specific tubular membrane compartments that extend from the SCV, known as Salmonella-induced filaments (SIFs) (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology (Reading). 2012, 158(Pt 5):1147-1161). The membranes and lumens of both SIFs and SCVs form a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). We hypothesize that β-alanine may enter SCVs from the cytoplasm of macrophages via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to identify the transporter of β-alanine in Salmonella, but we found that the CycA transporter, which transports β-alanine in Escherichia coli, does not function in the same manner in Salmonella, despite Salmonella being closely related to E. coli.

      BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the comment. Our findings indicated that β-alanine levels were reduced in Salmonella-infected RAW264.7 cells. Furthermore, the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells significantly enhanced Salmonella replication, suggesting that the intracellular Salmonella utilize host-derived β-alanine for their growth. However, to date, we have not identified the transporter responsible for the uptake of exogenous β-alanine into the Salmonella cytosol.

      Moreover, we have discovered that the replication of the Salmonella panD mutant within macrophages and its virulence in mice are significantly reduced compared to the wild type (WT), indicating that the de novo synthesis of β-alanine is crucial for Salmonella's intracellular replication and virulence.

      These results indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages.

      Reviewer #3 (Recommendations for the authors):

      (1) Cite this paper from 1985, which discusses the role of beta-alanine in Salmonella infection: West TP, Traut TW, Shanley MS, O'Donovan GA. A Salmonella typhimurium strain defective in uracil catabolism and beta-alanine synthesis. J Gen Microbiol. 1985 May;131(5):1083-90. doi: 10.1099/00221287-131-5-1083.

      We have now cited this paper in the revised manuscript (lines 82-83).

      (2) BasC can be important for beta-alanine transport. The CycA transporter was not found to be involved in beta-alanine transport. However, it is important to find out which transporter is required for the uptake of beta-alanine.

      Thank you for pointing it out. We agree that it is important to determine which transporter is necessary for the uptake of β-alanine in Salmonella. BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (3) Bacteria being quite stringent with its energy resources, it is unlikely that it will use de novo synthesis if the host resources are available. Only if the host resources are depleted, can it turn on the de novo synthesis involving panD. What is the status of fold-replication of panD mutant in the presence of exogenous addition of beta-alanine?

      Thank you for the comment. The addition of 1 to 4 mM of β-alanine increased the replication of the panD mutant (ΔpanD) in RAW264.7 cells by 1.7- to 3.1-fold. This increase in Salmonella intracellular replication was dose-dependent, as shown in Figure 2H of the revised manuscript, further illustrating that host-derived β-alanine promotes Salmonella replication inside macrophages.

      We agree that bacteria are quite stringent with their energy resources. The results of this work indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages. We speculate that Salmonella relies on a large amount of β-alanine to efficiently replicate in macrophages, thereby highlighting the importance of β-alanine for Salmonella intracellular growth. We have discussed this issue in the revised manuscript (lines 392-396).

      (4) 100% survival of animals infected with panD mutant is a bit of concern. What happens when beta-alanine is fed to mice and infected with panD mutant?

      Thank you for the comment. As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine (500 mg/kg/day, as reported in Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial load in the liver and spleen, as well as the body weight of the infected mice, were measured. The results indicated that administering β-alanine did not affect the bacterial load of ΔpanD in the liver and spleen, nor did it influence the body weight of the infected mice (refer to Author response image 1). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      (5) How does beta-alanine from the macrophage cytosol enter the SCV?

      Thank you for pointing it out. According to published reports, the translocation of SPI2 effectors triggers the formation of specialized tubular membrane compartments, known as Salmonella-induced filaments (SIFs), which extend from the SCV (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology (Reading). 2012, 158(Pt 5):1147-1161). The membranes and lumens of SIFs and SCVs create a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). Consequently, it is plausible that β-alanine enters SCVs from the macrophage cytoplasm via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (6) It would be essential to dissect the role of exogenous beta-alanine and the use of de novo synthesized beta-alanine.

      We agree that it is essential to dissect the role of exogenous β-alanine and the use of de novo synthesized β-alanine. Our results indicate that Salmonella-infected macrophages exhibited lower levels of β-alanine compared to mock-infected macrophages. Furthermore, β-alanine supplementation in the cell medium enhanced Salmonella replication within macrophages in a dose-dependent manner, revealing that Salmonella utilizes host-derived β-alanine to promote intracellular replication. Additionally, a deficiency in the biosynthesis of β-alanine, resulting from mutation of the rate-limiting gene panD, led to reduced Salmonella replication in macrophages and systemic infection in mice. This suggests that Salmonella also employs bacterial-derived β-alanine to enhance intracellular replication and pathogenicity.

      We sought to identify the main transporters responsible for β-alanine uptake in Salmonella. Unfortunately, we have not yet found the transporter. We will address this issue in our future work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. 

      We thank the referee for this summary of our work. 

      Strengths: 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. 

      We thank the referee for highlighting the potential value of our proposed method.

      Weaknesses: 

      The practical utility of this method may not be straightforward and potentially be quite difficult to execute. Additionally, further investigations are needed to provide evidence of the broad applicability of the method to naturally occurring systems and its scalability beyond the simple circuit in which it is experimentally demonstrated. 

      We agree with these two points and have rewritten the manuscript, in particular highlighting the considerable future work that remains to be done to establish the broad applicability and scalability of our method.

      In the rewritten manuscript we explicitly spell out potential practical issues and we explicitly state that our presented proof-of-principle feasibility study does not guarantee that our method will successfully work in systems beyond the narrowly sampled test circuits. This helps readers to clearly distinguish what we claim to have done from what remains to be done. The rewritten parts and additional clarifications are:

      Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 10).

      Reviewer #2 (Public Review): 

      Summary: 

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality. 

      The authors benchmark their approach experimentally in several synthetic circuits. In four positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in two or three of the positive control circuits. The authors constructed sixteen negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter or simply the cellular growth rate. The proposed method detected a causal effect in one of the eight negative controls, which the authors argue is not a false positive, but due to an unexpected causal effect. Overall, the data support the practical usefulness of the proposed approach. 

      We thank the referee for their summary of our work.

      Strengths: 

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations. 

      We thank the referee for summarizing the value of our work. 

      Caveats: 

      The term "causally" is used in the main-text statement of the central theorem (Eq 2) without a definition of this term. This makes it difficult to fully understand the statement of the paper's central theorem without diving into the supplement.  

      We thank the referee for this suggestion. In the revised manuscript we now define causal effects right before the statement of the main theorem of the main text (p. 2). We have also added a definition of the causal network arrows in the caption of Fig. 1 to help readers better understand our central claim.

      The basic argument of theorem 1 appears to rely on establishing that x(t) and y(t) are independent of their initial conditions. Yet, there appear to be some scenarios where this property breaks down: 

      (1) Theorem 1 does not seem to hold in the edge case where R=beta=W=0, meaning that the components of interest do not vary with time, or perhaps vary in time only due to measurement noise. In this case x(t), y(t), and z(t) depend on x(0), y(0), and z(0). Since the distributions of x(0), y(0), and z(0) are unspecified, a counterexample to the theorem may be readily constructed by manipulating the covariance matrix of x(0), y(0), and z(0). 

      (2) A similar problem may occur when transition probabilities decay with time. For example, suppose that again R=0 and X is degraded by a protease (B), but this protease is subject to its own first-order degradation. The deterministic version of this situation can be written, for example, dx/dt=-bx and db/dt=-b. In this system, x(t) approaches x(0)exp(-b(0)) for large t. Thus, as above, x(t) depends on x(0). If similar dynamics apply to the Y and Z genes, we can make all genes depend on their initial conditions, thus producing a pathology analogous to the above example.
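
      The limit quoted in example (2) can be checked directly. Solving db/dt=-b gives b(t)=b(0)exp(-t), so x(t)=x(0)exp(-b(0)(1-exp(-t))), which tends to x(0)exp(-b(0)) for large t. The sketch below compares a simple numerical integration of the two equations with that closed form. The function names, step size, and initial values are illustrative assumptions and do not come from the manuscript.

```python
import math


def integrate_decay(x0: float, b0: float, t_end: float, dt: float = 1e-3):
    """Integrate dx/dt = -b*x and db/dt = -b with a classical RK4 scheme."""

    def deriv(x: float, b: float):
        return -b * x, -b

    x, b = x0, b0
    for _ in range(int(round(t_end / dt))):
        k1x, k1b = deriv(x, b)
        k2x, k2b = deriv(x + 0.5 * dt * k1x, b + 0.5 * dt * k1b)
        k3x, k3b = deriv(x + 0.5 * dt * k2x, b + 0.5 * dt * k2b)
        k4x, k4b = deriv(x + dt * k3x, b + dt * k3b)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        b += dt * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return x, b


def closed_form_limit(x0: float, b0: float) -> float:
    """Long-time limit of x(t): x(0) * exp(-b(0))."""
    return x0 * math.exp(-b0)


if __name__ == "__main__":
    # Two different initial conditions settle at two different values.
    for x0 in (50.0, 100.0):
        x_end, b_end = integrate_decay(x0, b0=1.5, t_end=30.0)
        print(f"x(0)={x0}: x(30)={x_end:.6f}, "
              f"limit={closed_form_limit(x0, 1.5):.6f}, b(30)={b_end:.2e}")
```

      Because the protease level itself decays to zero, x(t) stops changing before the memory of x(0) is lost, which is the dependence on initial conditions described above.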

      The reviewer does not know when such examples may occur in (bio)physical systems. Nevertheless, since one of the advantages of mathematics is the ability to correctly identify the domain of validity for a claim, the present work would be strengthened by "building a fence" around these edge cases, either by identifying the comprehensive set of such edge cases and explicitly prohibiting them in a stated assumption set, or by pointing out how the existing assumptions already exclude them.  

      We thank the referee for bringing to our attention these edge cases that indeed violate our theorem as stated. In the revised manuscript we have “built a fence” around these edge cases by adding two requirements to the premise of our theorem: First, we have added the requirement that the degradation rate does not decay to zero for any possible realization. That is, if beta(t) is the degradation rate of X and Y for a particular cell over time, then taking the time average of beta(t) over all time must be non-zero. Second, we have added the requirement that the system has evolved for enough time such that the dual reporter averages <x> and <y>, along with the covariances Cov(x, z_{k}) and Cov(y, z_{k}) have reached a time-independent stationary state.  

      With these requirements, no assumptions need to be made about the initial conditions of the system, because any differences in the initial conditions will decay away as the system reaches stationarity. For instance, the referee’s example (1) is not possible with these requirements because beta(t) can no longer remain zero. Additionally, example (2) is no longer possible because the time average of the degradation rate would be zero, which is no longer allowed (i.e., the integral from 0 to T of b(0)exp(-t) dt, divided by T, goes to 0 as T goes to infinity).
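
      For completeness, the time average in example (2) can be written out explicitly, taking b(t) = b(0)exp(-t) as in the referee's example:

```latex
\[
\lim_{T\to\infty} \frac{1}{T}\int_{0}^{T} b(0)\,e^{-t}\,\mathrm{d}t
  \;=\; \lim_{T\to\infty} \frac{b(0)\,\bigl(1 - e^{-T}\bigr)}{T}
  \;=\; 0 .
\]
```

      The integral itself is bounded by b(0), so dividing by T drives the average to zero. This is the case that the non-decay requirement excludes.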

      Note that adding the condition that degradation cannot decay to exactly zero does not reduce the biological applicability of the theorem. But, as the referee correctly points out, any mathematical theorem needs to be accurately stated and stand on its own regardless of whether biological systems could realize particular edge cases. Also note that the requirement that the cellular ensemble has reached a time-independent distribution of cell-to-cell variability can be (approximately) experimentally verified by taking snapshots of ensemble variability at two sufficiently separated moments in time.

      In response to the referee’s comment, we have added the above requirements when stating the theorem in the main text. We have also added the requirement of non-decay of the degradation rate to the definition of the system in SI Sec. 4, along with the stationarity requirement in theorem 1 in SI Sec 5. We have also added mathematical details to the proof of the invariant in SI Sec 5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. They propose and experimentally demonstrate the utility of this idea with a synthetic reporter system in bacteria. 

      The paper is well written and clearly outlines the principle, the mathematical invariant relationship both to give the reader an intuitive understanding of why the relationship must be true and in their mathematical derivation of the proof of Theorem 1. 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. However, the practical utility of this method may not be straightforward and potentially be quite difficult to execute. We think this work could offer a platform to advance the field of network inference, but would encourage the authors to address the following comments. 

      We thank the reviewer for the positive comments on readability, summarizing the value of our work, as well as the critical comments below that helped us improve the manuscript.

      Major comments: 

      (1) Although the invariant identity seems theoretically sound, the data from synthetic engineered circuits in this manuscript do not support that the invariant holds for natural causal relations between genes in wild-type cells. In all the positive control synthetic circuits (numbers 1 to 4) the target gene Z i.e. RFP was always on the plasmid, and in circuit #4 there was an additional endogenous copy. The authors recapitulate the X-to-Z causality in circuits 1, 2, and 3 but not 4. Ultimately, the utility of this method lies in the ability to capture causality from endogenous correlations, this observation suggests that the method might not be useful for that task. 

      We thank the referee for their careful reading of our synthetic circuits and sincerely apologize for an error in our description of circuit #4 in the schematic of Table S2 of the supplement. We incorrectly stated that this circuit contained a chromosomally expressed RFP. In fact, in circuit #4 RFP was only on the plasmid just like in the circuits #1-3. We have corrected the schematic in the revised manuscript and have verified that the other circuits are correctly depicted.

      In the revised manuscript, we now explicitly spell out that all our “positive control” test cases had the genes of interest expressed on plasmids, and that we have not shown that our method successfully detected causal interactions in a chromosomally encoded gene regulatory circuit, see additional statements in Sec. “Causally connected genes that break the invariant” on p. 6. 

      In the absence of any explicit experimental evidence, it is then important to consider whether chromosomally encoded circuits are expected to cause problems for our method, which is based on a fluctuation test. Due to plasmid copy number fluctuations, X and Z will fluctuate significantly more when expressed on plasmids than when expressed chromosomally. However, because this additional variability is shared between X and Z, it does not help our analysis, which relies on stochastic differences in X and Z expression due to “intrinsic noise” effects downstream of copy number fluctuations. The additional “extrinsic noise” fluctuations due to plasmid copy number variability would wash out violations of Eq. (2) rather than amplify them. If anything, we thus expect our test cases to have been harder to analyze than endogenous fluctuations. This theoretical expectation is indeed borne out by numerical test cases presented in the revised supplement, where plasmid copy fluctuations severely reduced the violations of Eq. (2); see the new SI Sec. 15.

      Additionally, the case of the outlier circuit (number 12) suggests that exogenous expression of certain genes may lead to an imbalance of natural stoichiometry and lead to indirect effects on target genes which can be misinterpreted as causal relations. Knocking out the endogenous copy may potentially ameliorate this issue but that remains to be tested. 

      We agree with the referee that the expression of exogenous genetic reporters can potentially affect cellular physiology and lead to undesired effects. In the revised manuscript we now explicitly spell out that the metabolic burden or the phototoxicity of introducing fluorescent proteins could in principle cause artificial interactions that do not correspond to the natural gene regulatory network, see Sec. “Proposed additional tests” on p. 8.

      However, it is also important to consider that the test circuit #12 represents a synthetic circuit with genes that were expressed at extremely high levels (discussed in 3rd paragraph of Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8), which led to the presumed cellular burden. Arguably, natural systems would not typically exhibit such high expression levels, but importantly even if they did, our method does not necessarily rely on fluorescently tagged proteins but can, in principle, also be applied to other methods such as transcript counting through sequencing or in-situ hybridization of fluorescent probes.  

      Ultimately, the value of this manuscript will be greatly elevated if the authors successfully demonstrate the recapitulation of some known naturally existing causal and non-causal relations. For this, the authors can choose any endogenous gene Z that is causally controlled by gene X. The gene X can be on the exogenous plasmid along with the reporter and the shared promoter. Same for another gene Z' which is not causally controlled by gene X. Potentially a knockout of endogenous X may be required but it might depend  on what genes are chosen. 

      If the authors think the above experiments are outside the scope of this manuscript, they should at least address these issues and comment on how this method could be effectively used by other labs to deduce causal relations between their favorite genes. 

      Because a full analysis of naturally occurring gene interactions was beyond the scope of our work, we agree with the referee’s suggestion to add a section to discuss the limitations of our experimental results. In the revised manuscript we reiterate that additional investigations are needed to show that the method works to detect causal interactions between endogenous genes, see Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 9). In the original manuscript we explicitly spelled out how other researchers can potentially carry out this further work in the subsections titled “Transcriptional dual reporters” (p. 3) and “Translational dual reporters” (p. 3). In the revised manuscript, we have added a section “Proposed additional tests” (p. 8) in which we propose an experiment analogous to the one proposed by the referee above, involving an endogenous gene circuit found in E. coli, as an example to test our invariant.

      (2) For a theoretical exposition that is convincing, we suggest the authors simulate a larger network (for instance, a network with >10 nodes), like the one shown schematically in Figure 1, and demonstrate that the invariant relationship holds for the causally disconnected entities, but is violated for the causally related entities. It would also be interesting to see if any quantification for the causal distance between "X" and the different causally related entities could be inferred.

      We thank the referee for this suggestion. We have added SI Sec. 14 where we present simulation results of a larger network with 10 nodes. We find that all of the components not affected by X satisfy Eq. (2) as they must. However, it is important to consider that we have analytically proven the invariant of Eq. (2) for all possible systems. It provably applies equally to networks with 5, 100, or 10,000 components. The main purpose of the simulations presented in Fig. (2) is to illustrate our results and to show that correlation coefficients do not satisfy such an invariant. However, they are not used as a proof of our mathematical statements.

      We thank the referee for the interesting suggestion of quantifying a “causal distance”. Unfortunately, the degree to which Eq. (2) is violated cannot directly equate to an absolute measure for the “causal distance” of an interaction. This is because both the strength of the interaction and the size of the stochastic fluctuations in X affect the degree to which Eq. (2) is violated. The distance from the line should thus be interpreted as a lower bound on the causal effect from X to Z because we do not know the magnitude of stochastic effects inherent to the expression of the dual reporters X and Y. While the dual reporters X and Y are identically regulated, they will differ due to stochastic fluctuations. Propagation of these fluctuations from X to Z is what creates an asymmetry between the normalized covariances. In the most extreme example, if X and Y do not exhibit any stochastic fluctuations we have x(t)=y(t) for all times and Eq. (2) will not be violated even in the presence of a strong causal link from X to Z.

      However, it might be possible to infer a relative causal distance to compare causal interactions within cells.

      That is, in a given network, the normalized covariances between X, Y and two other components of interest Z1, Z2 that are affected by X can be compared. If the asymmetry between (η_xz1, η_yz1) is larger than the asymmetry between (η_xz2, η_yz2), then we might be able to conclude that X affects Z1 with a stronger interaction than the interaction from X to Z2, because here the intrinsic fluctuations in X are the same in both cases.

      In response to the referee’s comment and to test the idea of a relative causal distance, we have simulated a larger network made of 10 components. In this network, X affects a cascade of components called Z8, Z9, and Z10, see the additional SI Sec. 14. Here the idea of a causal distance can be defined as the distance down the cascade: Z8 is closest to X and so has the largest causal strength, whereas Z10 has the weakest. Indeed, simulating this system we find that the asymmetry between η_xz8 and η_yz8 is the largest whereas that between η_xz10 and η_yz10 is the smallest. We also find that all of the components not affected by X have normalized covariances that satisfy Eq. (2). This result suggests that the relative causal distance or strength in a network could potentially be estimated from the degree of the violations of Eq. (2).

      However, we note that these are preliminary results. In the case of the specific regulatory cascade now considered in SI Sec. 14, the idea of a causal distance can be well defined. Once feedback is introduced into the system, this definition may no longer make sense. For instance, consider the same network that we simulate in SI Sec. 14, but where the most downstream component in the cascade, Z10, feeds back and affects X and Y. In such a circuit it is unclear whether Z8 or Z10 is “causally closer” to X. A more thorough theoretical analysis, equipped with a more universal quantitative definition for causal distance or strength, would be needed to deduce what information can be inferred from the relative distances in the violations of Eq. (2). While this defines an interesting research question, answering it goes beyond the scope of the current manuscript. 
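
      The asymmetry argument in this response can be illustrated with a small numerical sketch. It assumes that the normalized covariance is η_ab = Cov(a, b)/(⟨a⟩⟨b⟩), and it uses a static toy model (a shared upstream signal `s` with conditionally independent Poisson draws) rather than the birth-death dynamics analyzed in the manuscript. All function names, variable names, and parameter values are hypothetical.

```python
import numpy as np

def normalized_covariance(a, b):
    """Return Cov(a, b) / (<a><b>) estimated from paired samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (np.mean(a * b) - a.mean() * b.mean()) / (a.mean() * b.mean())

def asymmetry(x, y, z):
    """Difference eta_xz - eta_yz; zero when the invariant eta_xz = eta_yz holds."""
    return normalized_covariance(x, z) - normalized_covariance(y, z)

rng = np.random.default_rng(seed=1)
n_cells = 200_000

# Shared upstream signal: X and Y are identically regulated by s (toy assumption).
s = rng.gamma(shape=10.0, scale=5.0, size=n_cells)
x = rng.poisson(s)             # gene of interest
y = rng.poisson(s)             # passive reporter, independent noise given s

z_downstream = rng.poisson(x)  # toy Z whose level is set by X
z_unaffected = rng.poisson(s)  # toy Z that only shares the upstream signal

asym_downstream = asymmetry(x, y, z_downstream)
asym_unaffected = asymmetry(x, y, z_unaffected)

# Limiting case: a reporter with no stochastic difference from X.
asym_identical = asymmetry(x, x.copy(), z_downstream)

print(asym_downstream, asym_unaffected, asym_identical)
```

      In this toy model the asymmetry is clearly positive only for the component that depends on X, is zero within sampling error for the unaffected component, and vanishes exactly when the reporter is an identical copy of X, even though the causal link is still present.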

      Minor comments: 

      - The method relies on the gene X and the reporter Y having the same control which would result in similar dynamics. The authors do not quantitatively compare the YFP and CFP expression if this indeed holds for the synthetic circuits. It would be useful to know how much deviation between the two can be tolerated while not affecting the outcome. 

      We thank the referee for their comment. The invariant of Eq. (2) is indeed guaranteed to hold only when the transcription rate of Y is proportional to that of X. How much levels of X and Y covary depends on the stochastic effects intrinsic to the expression of the dual reporters as well as how similar the transcriptional control of X and Y is. The stochastic difference between X and Y is exactly what we exploit.

      However, in the limit of high YFP and CFP levels, intrinsic fluctuations that cause stochastic expression differences between X and Y become negligible and we can directly infer whether they are indeed tightly co-regulated from time-traces. Below, we show two single-cell traces taken with our experimental setup in which the YFP and CFP fluorescence trajectories are almost exactly proportional. Both of these traces are from circuit #10 as defined in Table S4.

      Author response image 1.

      We chose the above traces because they showed the highest correlation between YFP and CFP levels. Other traces for lower expression levels have lower correlations due to effects of intrinsic noise (see Tables S2-S4). However, the existence of one trace in which YFP is almost perfectly proportional to CFP throughout can only occur if the YFP and CFP genes are under the same control. And, since the control of the YFP and CFP genes is identical in all of our synthetic circuits (with the same promoters and plasmid positions), these data strongly suggest that our dual reporters are tightly co-regulated in all the synthetic circuits. Moreover, the negative control experiments presented in Fig. 3E provide a natural consistency check that the YFP and CFP are under the same control and satisfy Eq. (1).

      We agree that it would be useful to know how much the X and Y production rates can differ for Eq. (2) to hold. Importantly, our proven theorem already allows for the rates to differ by an unspecified proportionality constant. In response to the referee’s comment we have derived a more general condition under which our approach holds. In the newly added SI Sec. 7 we prove that Eq. (2) holds also when rates differ as long as the difference is stochastic in nature with an average of zero. We also prove that Eq. (2) holds in the face of multiplicative noise that is independent of the X and Y production rates.

      However, the production rates of X and Y cannot differ in all ways. Some types of differences between the X and Y production rates can lead to deviations of Eq. (2) even when there is no causal interaction. To highlight this, we added the results of simulations of a toy model in which the X and Y production rates differ by an additive noise term that does not average to zero, see Fig. S19B of the newly added SI Sec. 7.

      - The invariant should potentially hold true for any biological species that are causally related e.g. protein-protein interactions. Also, this method could potentially find many applications in eukaryotic cells. Although it's outside the scope of current work to experimentally demonstrate such applications, the authors should comment on experimental strategies to apply this method to overcome potential pitfalls (e.g. presence of enhancers in eukaryotic cells). 

      We thank the referee for this suggestion. We agree that there are potential pitfalls that could come into effect when our proposed approach is applied to more complex systems such as eukaryotic gene expression. In response to the referee’s comment, we have added an explicit discussion of these potential pitfalls in the discussion section “Limitations of this study” (see p. 10).

      In particular, in eukaryotes there are many genes in which promoter sequences may not be the sole factor determining transcription rates. Other factors that can be involved in gene regulation include the presence of enhancers, epigenetic modifications, and bursts in gene expression, to name a few. We thus propose a few strategies, which include positioning the passive reporter at a similar gene locus as the gene of interest, measuring the gene regulation activities of the gene of interest and its passive reporter using a separate method, and exploiting the invariant with a third gene, where it is known there is no causal interaction, as a consistency check. In addition, we include in the SI a new section SI Sec. 8 which shows that the invariant holds in the face of many types of bursty gene expression dynamics.

      However, the above is not a comprehensive list. Some of the issues the referee mentions are serious and may not be straightforward to overcome. We now spell this out explicitly in the revised manuscript (p. 10). 

      - In the legend of Fig. 1, the sentence "Data points here are for..." is missing a few words, or needs to be rephrased. 

      We thank the referee for this comment. We have rewritten the figure caption, which now reads “Data points are numerical simulations of specific example networks (see SI for details) to illustrate the analytically proven theorem of Eq. 2.”

      - Fig. 2 talks about the uncertainties associated with each point on the scatter plots. However, it is difficult to understand the quantification in such a plot. It would be great to have a plot quantifying the uncertainties in the invariant relation for the different topologies studied, specifically in order to understand if one topology is consistently deviating more from the x=y line than the other topologies studied here.  

      We thank the referee for this suggestion. In the supplement of the revised manuscript we have added supplemental Figs. S3, S4, and S5 to separately quantify the uncertainty of the different processes plotted in Fig. 2 and have added a new section (SI Sec. 11) to discuss the processes simulated in Fig. 2 in more detail. In short, each simulated process generated less than ~5% of outliers when considering 95% confidence intervals (with the max percentage deviation being 5.01% for process 5, see Fig. S5). These outliers were then simulated over a larger number of simulations to reduce the sampling error, which resulted in 0% of outliers (see Sec. “Confidence intervals for finite sampling error” in Materials and Methods on p. 11). Some simulated processes generated larger percentage errors in the normalized covariances than others, but this is expected as different processes have different dynamics which will result in different degrees of sampling of the underlying distributions.

      Note that the invariant of Eq. 2 is analytically proven for all tested topologies as none of the topologies include a causal effect from X to Z. Any deviation of the numerical data from the straight line prediction of Eq. 2 (right column in Fig. 2C) is due to the finite sampling of a stochastic process to estimate the true covariance from the sampling covariance. Any given parameter set was simulated several times, which allowed us to estimate the sampling error from differences between repeated samples. In the additional SI figures we now quantify this error for the different topologies.

      In addition to the above changes we want to highlight that the purpose of the simulations presented in Fig. (2) is not to prove our statements or explore the behavior of different topologies. The purpose of the data presented in the right column of Fig. 2C is to illustrate the theoretical invariant and act as a numerical sanity check of our analytically proven result. In contrast, the data in the left column of Fig 2C illustrates that the correlations do not satisfy an invariant like Eq. 2 which applies to covariances but not correlations.  

      - The legend for Fig. 3 seems to end abruptly. There likely needs to be more.  

      We thank the referee for catching this mistake. We have corrected the accidentally truncated figure caption of Fig. 3.

      - There is a typo in equation (5.3) on page 23 of supplementary material, there should be x instead of y in the degradation equation of x. 

      We thank the referee for catching this mistake which has been corrected in the revised manuscript.

      - In the supplemental material, to understand the unexpected novel discovery of causality, Figure S5 is presented. However, this doesn't give the context for other negative controls designed, and the effect of rfp dynamics (which can be seen in the plots both in the main paper and the supplement) in the growth rate of cells in those constructs. As a baseline, it would be nice to have those figures.  

      We thank the referee for this suggestion. We have now included representative RFP traces with the growth rates for other negative control circuits, see Fig. S10. In addition, we have now included the cross correlation functions between RFP and growth rate in these negative control circuits, see Fig. S10A. While in all cases, RFP and growth rate are negatively correlated, the outlier circuit exhibits the largest negative correlation.

      The suggested comparison of the referee thus highlights that – in isolation – a negative correlation between RFP and growth rate is only weak evidence for our hypothesized causal interaction because negative correlations can result from growth rate affecting volume dilution and thus RFP concentration. Crucially, we thus additionally considered the overall variability of growth rate and found the outlier circuit has the largest growth rate variability which is indicative of something that is affecting the growth rate of those cells, see Fig. S10B. To compare the magnitude of RFP variability against other strains requires constraining the comparison group to other synthetic circuits that have RFP located on the chromosome rather than a plasmid. This is why we compare the CV of the outlier with the CV of circuit #5, which corresponds to the “regular” repressilator (i.e., the outlier circuit without the endogenous lacI gene). As an additional comparison, we computed the CV for a strain of E. coli that does not contain a synthetic plasmid at all, but still contains the RFP gene on the chromosome. We find the CVs in the outlier circuit to be larger than in these two additional circuits, suggesting that the outlier circuit causes additional fluctuations in the RFP and growth rate. We now spell this out explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).

      The referee is correct that the above arguments are only circumstantial evidence, but they do show that the data is consistent with a plausible explanation of the hypothesized causal interaction. Our main evidence for an RpoS mediated stress response that explains the deviations from Eq. 2 in the outlier circuit is the perturbation experiment in which the deviation disappears for the RpoS knockout strain. We now spell out this argument explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).

      Reviewer #2 (Recommendations For The Authors): 

      The proof of theorem 1 relies on an earlier result, lemma 1. Lemma 1 only guarantees the existence of a "dummy" system that satisfies the separation requirement and preserves the dynamics of X and Y. However, in principle, it may be possible to maintain the dynamics of X and Y while still changing the relationship between Cov(X,Zk) and Cov(Y,Zk). This could occur if the dynamics of Zk differ in a particular way between the original system and the dummy system. So lemma 1 needs to be a little stronger- it needs  to mention that the dynamics of Zk are preserved, or something along these lines. The proof of lemma 1 appears to contain the necessary ingredients for what is actually needed, but this should be clarified. 

      We agree with the referee that this is an important distinction. Lemma 1 does in fact guarantee that any component Zk that is not affected by X and Y will have the same dynamics in the “dummy” system. However, as the referee points out, this is not stated in the lemma statement nor in the proof of the lemma. In response to the referee’s comment, we have made it clear in the lemma statement that the Zk dynamics are preserved in the “dummy” system, and we have also added details to the proof to show that this is the case, see Lemma 1 on p. 27 of the SI. 

      Readers who are familiar with chemical reaction diagrams, but not birth-death process diagrams may waste some time trying to interpret Equation 1 as a chemical reaction diagram with some sort of rate constant as a label on each arrow (I did this). It may be helpful to either provide a self-contained definition of the notation used, or mention a source where the necessary definitions can be found. 

      We agree with the referee. In the revised manuscript we have added a description of the notation used below Equation 1 of the main text, see p. 2. The notational overloading of the “arrow notation” is a perennial problem in the field and we thank the referee for reminding us of the need to clarify what the arrows mean in our diagrams.

      It would be helpful if the authors could propose a rule for deciding whether dependence is detected or not. As it stands presently, the output of the approach seems to be a chart like that in Figure 3D where you show eta_xz and eta_yz with confidence interval bars and the reader must visually assess whether the points more-or-less fall on the line of unity. It would be better to have some systematic procedure for making a "yes or no" call as to whether a causal link was detected or not. Having a systematic detection rule would allow you to make a call as to whether dependence in circuit 3 was detected or not. It would also allow you or a future effort to evaluate the true positive rate of the approach in simulated settings. 

      We thank the referee for this suggestion. In the revised manuscript we have added an explicit rule for detecting causality using the invariant of Eq. (2). Specifically, Eq. (2) can be re-written as r = 1 where r is the covariability ratio r = etaXZ/etaYZ. In that case, given a 95% confidence interval for the experimentally determined covariability ratio r, we say that there is a causal interaction if the confidence interval does not overlap with the value of r = 1.

      This corresponds to a null hypothesis test at the 2.5% significance level. The reason that it is at 2.5% significance and not 5% significance is as follows. Let’s say we measure a covariability ratio of r_m, and the 95% confidence interval is [r_m - e_m, r_m + e_m] for some error e_m. Without loss of generality, let’s say that r_m > 1 (the same applies if r_m < 1). This means that Prob(r < r_m - e_m) = 2.5% and Prob(r > r_m + e_m) = 2.5%, where r is the actual value of the covariability ratio. Under the null hypothesis that there is no causal interaction, we set r = 1. However, we now have Prob(1 > r_m + e_m) = 0, because we know that r_m > 1 and so we must have r_m + e_m > 1. The probability that the value of 1 falls outside the error bars is therefore 2.5% under the null hypothesis.

      This proposed rule is the same rule that we used to detect statistical outliers in our simulations, where we found a “false positive” rate of 2.3% over 6522 simulated systems due to statistical sampling error (as discussed in the Materials and Methods section). In response to the referee’s suggestion, we have added the section “A rule for detecting causality in the face of measurement uncertainty” (p. 4). We also apply the rule to the experimental data and find that the rule detects 2/4 causal interactions in Fig. 3D. We have clarified this in the Fig. 3D caption, in the main text, and we have added a figure in the SI (Fig. S2) where we apply the null hypothesis test on the measured covariability ratios. 

      Note, whether the third interaction is “detected” or not depends on the cut-off value used. We picked the most common 95% rule to be consistent with the traditional statistical approaches. With this rule one of the data points lies right at the cusp of detection, but ultimately falls into the “undetected” category if a strictly binary answer is sought under the above rule. 
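
      The decision rule described in this response can be written down in a few lines. The following is a minimal sketch, assuming a symmetric 95% confidence interval [r_m - e_m, r_m + e_m] around the measured covariability ratio. The function name and the numerical inputs are hypothetical and are not taken from the manuscript's data.

```python
def detects_causal_link(r_measured, half_width):
    """Return True when the confidence interval for r excludes the value r = 1.

    r_measured: measured covariability ratio eta_xz / eta_yz
    half_width: half-width e_m of the 95% confidence interval
    """
    lower = r_measured - half_width
    upper = r_measured + half_width
    return not (lower <= 1.0 <= upper)

# Hypothetical measurements: (r_m, e_m)
examples = [(1.30, 0.10), (1.05, 0.10), (0.70, 0.20)]
calls = [detects_causal_link(r_m, e_m) for r_m, e_m in examples]
print(calls)
```

      A data point whose interval just touches r = 1 falls into the "undetected" category under this strictly binary rule, which is why the outcome depends on the chosen confidence level.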

      It would be helpful to mention what happens when the abundance of a species hits zero. Specifically, there are two ways to interpret the arrow from X to X+d with a W on top: 

      Interpretation (1): 

      P(X+d | X) = W if X+d ≥ 0

      P(X+d | X) = 0 if X_i+d_i < 0 for at least one i

      Interpretation (2): 

      P(X+d | X) = W regardless of whether X+d < 0

      W = 0 whenever X_i < d_i for at least one i

      Interpretation (1) corresponds to a graph where the states are indexed on the non-negative integers. Interpretation (2) corresponds to a graph where the states are indexed on the integers (positive or negative), and W is responsible for enforcing the non-negativity of mass. I believe you need the second interpretation because the first interpretation leads to problems with your definition of causality. For example, consider the reaction: 

      (Na, K) -- 0.1 --> (Na-1, K+1) 

      This could occur if Na and K are the intracellular concentrations of sodium and potassium ions in a cell that has an ATP-driven sodium-potassium exchanger whose rate is limited by the frequency with which extracellular potassium ions happen to flow by. Per the definition of causality found in the appendix, Na has no causal effect on K since Na does not show up in the reaction rate term. However, under interpretation (1), Na clearly has a causal effect on K according to a reasonable definition of causality because if Na=0, then the reaction cannot proceed, whereas if Na>0 then it can. However, under interpretation (2), the reaction above cannot exist and so this scenario is excluded. 

      We thank the referee for this comment that helped us clarify the meaning of arrows with propensities. In short, interpretation (2) corresponds to the definition of our stochastic systems. This is consistent with the standard notation used for the chemical master equation. As the referee points out, because molecular abundances cannot be negative, any biochemical system must then have the property that the propensity of a reaction must be equal to zero when the system is in a state in which an occurrence of that reaction would take one of the abundances to negative numbers. Stochastic networks that do not have this property cannot correspond to biochemical reaction networks.

      In the revised manuscript, we now spell this out explicitly to avoid any confusion, see SI page 25.

      Furthermore, we additionally discuss the referee’s example in which the rate of exchanging Na for K through an ion exchanger is approximately independent of the intracellular Na concentration. Because biochemical systems cannot become negative, it cannot be that the rate is truly constant, but at some point for low concentrations must go down until it becomes exactly zero for zero molecules. 
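
      The non-negativity condition on propensities can be checked mechanically for a given reaction. The sketch below is an illustration under stated assumptions: the helper `is_valid_propensity`, the two rate functions, and the small grid of states are all hypothetical, and the jump (Na, K) -> (Na - 1, K + 1) follows the referee's example.

```python
import numpy as np

def is_valid_propensity(propensity, jump, states):
    """Check that the propensity is zero wherever the jump would make an abundance negative."""
    jump = np.asarray(jump)
    for state in states:
        would_go_negative = np.any(np.asarray(state) + jump < 0)
        if would_go_negative and propensity(state) != 0:
            return False
    return True

def constant_rate(state):
    # Rate that ignores Na entirely; nonzero even when Na = 0.
    return 0.1

def vanishing_rate(state):
    # Approximately constant rate that drops to exactly zero when no Na is left.
    na, _k = state
    return 0.1 if na > 0 else 0.0

jump = (-1, +1)  # (Na, K) -> (Na - 1, K + 1)
states = [(na, k) for na in range(4) for k in range(4)]

constant_ok = is_valid_propensity(constant_rate, jump, states)
vanishing_ok = is_valid_propensity(vanishing_rate, jump, states)
print(constant_ok, vanishing_ok)
```

      Only the second rate function is admissible as a biochemical propensity in this sense, because the strictly constant rate would allow a transition out of Na = 0 into a negative abundance.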

      Importantly, agreement with Eq. (2) does not imply that there is no causal effect from X to Zk. It is the deviation from Eq. (2) that implies the existence of a causal effect from X to Zk. Therefore, although the above referee’s example would constitute a causal interaction in our framework, it would not lead to a deviation of Eq. (2) because the fluctuations in Na (which we exploit) do not propagate to K. From a practical point of view, our method thus detects whether changing X over the observed range affects the production and degradation rates of Zk. 

      In the course of setting up the negative control benchmark circuits, a perturbation-based causal validation would be nice. For instance, first, verify that X does not affect Z by intervening on X (e.g. changing its copy number or putting it under the control of an inducible promoter), and ensuring that Z's activity is not affected by such interventions upon X. This approach would help to adjudicate questions of whether the negative control circuits actually have an unknown causal link. The existing benchmark is already reasonably solid in my view, and I do not know how feasible this would be with the authors' setup, but I think that a perturbation-based validation could in principle be the gold standard benchmark.  

      We agree that additional perturbation-based validation tests on all of the negative control circuits would indeed improve the evidence that our method worked as advertised. While such experiments are beyond the scope of our current work, we now explicitly point out the benefits of such additional controls in the revised Discussion.

      Below is a series of comments about typography, mostly about section 4 of the supplement. 

      We thank the referee for their careful reading and highlighting those mistakes.

      At the bottom of page 21, Z_aff is defined as the set of components that are affected by X. However, later Z_aff seems to refer to components affected by X or Y. For instance, in the proof of lemma 1, it is written "However, because a is part of z_aff, the {ak} variables must be affected by X and/or Y." 

      We thank the referee for catching this mistake. We have changed the definition of Z_aff throughout the supplement to refer to components affected by X or Y. If it can be experimentally ensured that Y is a passive reporter (i.e., it does not affect other components in the cell), then the theorem can only be violated if X affects Z. 

      In the equation following Eq 5.2, W_k and d_k should be W_i and d_i ?  

      Yes, the referee is correct. In the revised manuscript we have corrected W_k and d_k to W_i and d_i. 

      In Eq 5.3 in the lower-left transition diagram, I think a "y" should be an "x". 

      Yes, the referee is correct. In the revised manuscript  we have fixed this typo.

      In the master equation above Eq 5.5, the "R" terms for the y reactions are missing the alpha term, and I think two of the beta terms need to be multiplied by x and y respectively.  

      The referee is correct. In the revised manuscript  we have fixed this typo.

      The notation of Eq 5.8, where z_k(t) is the conditional expectation of z_kt, is strange and difficult to follow. Why does z_k(t) not get a bar over it like its counterparts for x, y, R, and beta? The bars, although not a perfect solution, do help.  

      We agree with the referee’s comment and have added further explanations to define the averages in question, see SI p. 28. In short, when we condition on the history of the components not affected by X or Y, we in effect condition on the time trajectories of z_{k} (when it is part of the components not affected by X and/or Y) and beta (since it only depends on the components not affected by X or Y). We thus previously did not include the bars when taking the averages of these components in the conditional space because the conditioning in effect sets their time-trajectories (so they become deterministic functions of time). In the revised manuscript we now also denote these conditional expectations with bars and we have added comments to the proof to clarify their definition.

      I think it would be helpful to show how the relationship <x>=<y>/alpha is obtained from Eq 5.5.  

      We agree with this suggestion and have added the derivations, see Eqs. (5.9) - (5.13) in the revised SI. 

      In the main text, the legend of Fig 3 cuts off mid-sentence.  

      We thank the referee for catching this mistake which has been fixed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study, though some minor points could be addressed for clarity and completeness, as outlined in the recommendations below.

      Many thanks for these comments. Please find below the response to the specific recommendations.

      Recommendations for the authors:

      (1) In lines 263-266, the term "MoMac-VERSE" and its associated clusters are introduced without sufficient explanation. The authors should provide additional clarification on what these clusters represent and how they were derived.

      We have revised the text according to the reviewer's suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Qhi_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (2) In line 283, the reference labeled "2227" appears incorrect. It seems to be a formatting issue, and it might refer to references 22-27. Please verify and correct.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (3) In line 353, the reference is incorrect. Please review and ensure that all references are properly cited throughout the manuscript.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (4) In line 368, one of the patient samples shows a decreased IL-10 response after CHIR treatment. The authors should acknowledge the heterogeneity in the primary cell responses and adjust the conclusion accordingly to reflect this variability.

      We have modified the text following the reviewer's comment, and acknowledge the heterogeneity in the production of IL-10 after GSK3 inhibition in the three analyzed samples. The modified text now states: "Consistent with these findings, CHIR-AMØ exhibited higher expression of MAFB (Figure 6F) whose increase correlated with an augmented secretion of Legumain, CCL2 and IL-10 (Figure 6G), although the latter was only seen in two samples, probably reflecting heterogeneity in primary cell responses."

      (5) Figure 7B: the UMAP shows 4 populations, but according to the visualization in the sup fig 3, there should be many more clusters. How do the authors explain this? Are these patient-specific clusters? Also, IMs can be separated into at least subpopulations. Can the authors plot also bona fide macrophage markers expressed by all subpopulations?

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15%]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.
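
      The filtering thresholds quoted above can be illustrated with a minimal sketch. It is written in plain Python rather than with the Seurat/scVI tooling used for the re-analysis, and the field names (`n_feature`, `n_count`, `pct_mito`) and the example cells are hypothetical; only the numeric cut-offs come from the response.

```python
# Cut-offs quoted in the response; everything else here is illustrative.
MIN_FEATURES = 200     # nFeature must be greater than this
MAX_FEATURES = 6000    # nFeature must be smaller than this
MIN_COUNTS = 1000      # nCount (UMIs) must be greater than this
MAX_PCT_MITO = 15.0    # % mitochondrial genes must be smaller than this


def passes_qc(n_feature, n_count, pct_mito):
    """Return True if a cell satisfies all three filters."""
    return (
        MIN_FEATURES < n_feature < MAX_FEATURES
        and n_count > MIN_COUNTS
        and pct_mito < MAX_PCT_MITO
    )


def filter_cells(cells):
    """Keep only the cells (dicts with hypothetical keys) that pass QC."""
    return [
        c for c in cells
        if passes_qc(c["n_feature"], c["n_count"], c["pct_mito"])
    ]


# Hypothetical cells, for illustration only (not data from GSE128033).
example_cells = [
    {"id": "cell_a", "n_feature": 1500, "n_count": 4200, "pct_mito": 3.1},
    {"id": "cell_b", "n_feature": 150, "n_count": 900, "pct_mito": 2.0},
    {"id": "cell_c", "n_feature": 2500, "n_count": 8000, "pct_mito": 22.5},
]
kept_ids = [c["id"] for c in filter_cells(example_cells)]
```

      In this toy input only `cell_a` is retained: `cell_b` fails the feature and count thresholds, and `cell_c` fails the mitochondrial threshold.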

      Addressing the first question, UMAPs in old Figure 7B and old Supplementary Figure 3B had a different number of clusters because old Figure 7B was derived from old Supplementary Figure 3B after grouping macrophage clusters according to the expression of previously defined markers and to limit the weight of donor-specific clusters. Specifically, the macrophage clusters from old Figure 7B were re-grouped according to the differential expression of:

      - FCN1 (including clusters 4, 7 and 12 from Figure 7B): Infiltrating monocytes.

      - FABP4 and TYMS-negative (including clusters 0, 2, 5 and 13 from Figure 7B), or MARCO and INHBA (cluster 9 from Figure 7B) or PPARG (cluster 11 from Figure 7B): Alveolar macrophages (AMØ).

      - TYMS, MKI67, TOP2A and NUSAP1 (cluster 15 from Figure 7B): Proliferating AMØ.

      - LYVE1 or RNASE1 or LGMN (including clusters 1, 3, 6, 8, 10 and 14 from Figure 7B): Interstitial Macrophages (IMØ).

      As the reviewer suggested, this type of UMAP plot yielded a large number of donor-specific clusters. To avoid such a misleading representation, we have now plotted UMAPs after running scVI reduction in every case. The new plots are now shown in new Figure 7A, new Figure 7B, new Supplementary Figure 3 (containing the information of the 21310 single-cell transcriptomes from GSE128033) and the novel Supplementary Figure 4 (with the information of the single-cell transcriptomes from human lung macrophages from GSE128033).
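
      The marker-based regrouping of clusters listed in this response can be written as a simple lookup. This is only a sketch of the bookkeeping, not the authors' code: the cluster numbers and group names are taken from the list above, while the variable and function names are hypothetical.

```python
# Cluster numbers and group labels as listed in the response.
GROUP_TO_CLUSTERS = {
    "Infiltrating monocytes": [4, 7, 12],   # FCN1
    "AMØ": [0, 2, 5, 13, 9, 11],            # FABP4 / MARCO+INHBA / PPARG
    "Proliferating AMØ": [15],              # TYMS, MKI67, TOP2A, NUSAP1
    "IMØ": [1, 3, 6, 8, 10, 14],            # LYVE1 or RNASE1 or LGMN
}

# Invert the mapping so each original cluster points to its merged group.
CLUSTER_TO_GROUP = {
    cluster: group
    for group, clusters in GROUP_TO_CLUSTERS.items()
    for cluster in clusters
}


def regroup(cluster_ids):
    """Translate per-cell cluster numbers into the four merged groups."""
    return [CLUSTER_TO_GROUP[c] for c in cluster_ids]


# Hypothetical per-cell cluster assignments, for illustration only.
merged = regroup([4, 9, 15, 14])
```

      Inverting the mapping also makes it easy to check that each of the sixteen original clusters (0 to 15) is assigned to exactly one group.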

      Finally, to address the last issue, we have now plotted the expression of genes used for macrophage definition (CD163, FABP4, LYVE1, FCN1), as well as proliferation-associated genes (TYMS, MKI67, TOP2A, NUSAP1) and other bona fide macrophage marker genes (SPI1, FOLR2) in Supplementary Figure 4C.

      (6) statistics should be indicated in every figure legend and for every subfigure where applicable.

      We have now included the specific statistical procedure applied for each Figure and panel.

      Reviewer 2 (Public review):

      The study by Rios and colleagues provides the scientific community with a compelling exploration of macrophage plasticity and its potential as a therapeutic target. By focusing on the GSK3-MAFB axis, the authors present a strong case for macrophage reprogramming as a strategy to combat inflammatory and fibrotic diseases, including severe COVID-19. Using a robust and comprehensive methodology, this study conducts broad transcriptomic and functional analyses and offers valuable mechanistic insights while highlighting its clinical relevance.

      Strengths:

      Well performed and analyzed

      Weaknesses:

      Additional analyses, including mechanistic studies, would increase the value of the study

      In an effort to address the comment of the reviewer, we have performed a more detailed analysis of the kinetics and dose-response effects of GSK3 inhibition, which are now provided as new Supplementary Figure 3A.

      Regarding additional mechanistic studies, we decided to explore the relationship between inactive GSK3β and MAFB levels at the early stages of M-CSF- or GM-CSF-driven monocyte-to-macrophage differentiation. These experiments, performed in three independent monocyte preparations, indicated that, 48 hours into differentiation, M-CSF promoted both a huge increase in MAFB expression and a slight (albeit significant) rise in inactive GSK3β (P-Ser9-GSK3β) (compared to either untreated or GM-CSF-treated monocytes), further supporting the macrophage re-programming effect of GSK3. However, since the M-CSF-promoted increase in MAFB levels was much more robust than the enhancement in inactive GSK3β, we hypothesize that proteasomal degradation of MAFB might also be distinct between M-CSF- (M-MØ) and GM-CSF-dependent (GM-MØ) monocyte-derived macrophages.

      Author response image 1.

      Total GSK3β, p-Ser9-GSK3β and MAFB levels in three preparations of freshly purified monocytes either unstimulated (-) or stimulated with M-CSF (10 ng/ml) or GM-CSF (1,000 U/ml) at different time points, as determined by Western blot (upper panel). Vinculin protein levels were determined as protein loading control. Mean ± SEM of the GSK3β/Vinculin, p-Ser9-GSK3β/Vinculin, and MAFB/Vinculin protein ratios from the three independent experiments are shown (lower panel) (paired Student’s t test: *, p<0.05; ****, p<0.001).

      Based on this finding, we then determined proteasome activity in fully differentiated M-CSF- and GM-CSF-dependent monocyte-derived macrophages. Use of the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) in M-MØ and GM-MØ, either untreated or exposed to the proteasome inhibitor MG132, revealed that immune-proteasomal and proteasomal activity is significantly stronger in GM-MØ than in M-MØ, as demonstrated in assays for chymotrypsin-like (ANW) and branched amino acid preferring (PAL) activity (immunoproteasome), and trypsin-like (KQL) activity (both proteasome and immunoproteasome). This result suggested that, indeed, immunoproteasomal activity might contribute to the differential expression of MAFB in M-MØ and GM-MØ.

      Author response image 2.

      Immunoproteasome activity in M-MØ and GM-MØ, either untreated or exposed to MG132, as determined using the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) on the three indicated peptides (upper panel). Mean ± SEM of three independent experiments are shown (paired Student’s t test: *, p<0.05) (lower panel).

      Consequently, we next set up experiments to assess whether the proteasome inhibitor MG132 was capable of enhancing the expression of MAFB-dependent genes in GM-MØ. Preliminary results of GM-MØ exposure to MG132 for 6 hours indicated an increase in the expression of MAFB protein and the MAFB-dependent genes LGMN and IL10, as well as a reduction in the expression of the GM-MØ-specific gene CD1C.

      Author response image 3.

      A. Schematic representation of the exposure of MG132 to GM-MØ for 6 hours. B. MAFB protein levels in four independent preparations of GM-MØ exposed to either DMSO (DMSO-GM-MØ) or the proteasome inhibitor MG132 (MG132-GM-MØ) for 6 hours, as determined by Western blot (left panel). GAPDH protein levels were determined as protein loading control. Mean ± SEM of the MAFB/GAPDH protein ratios from the four independent experiments are shown (right panel) (paired Student’s t test: ***, p<0.005). C. Relative mRNA levels of the indicated genes in DMSO-GM-MØ and MG132-GM-MØ, as determined by RT-PCR on seven independent samples (paired Student’s t test: ***, p<0.005; ****, p<0.001).

      Unfortunately, this proteasome inhibitor (MG132) caused a great reduction in cell viability after 6-8 hours. Since a similar decrease in cell viability was observed upon analysis with the ONX-0914 immunoproteasome inhibitor, we could not proceed any further with this approach.

      Given the reviewer's suggestion to include mechanistic insights in the manuscript, we are now providing these results (and the corresponding figures) only for the reviewer's information and to make clear our attempts to comply with his/her request.

      Recommendations for the authors:

      The results are of interest, and only some minor issues need to be addressed to strengthen the conclusions of the study.

      We gratefully thank the reviewer for his/her comments. 

      (1) This study employs a single dose of 10 μM of the GSK3 inhibitor CHIR-99021 for 48 hours, which is reasonable for in vitro studies. However, further investigation into the effect of different doses and exposure times could provide additional insight into optimal dosing and durability of reprogramming effects. In addition, would alternative GSK3 inhibitors have comparable effects?

      Following the reviewer's suggestion, we have performed a kinetics and dose-response analysis of the effects of CHIR-99021, using MAFB protein levels as a readout. This experiment is now shown in new Supplementary Figure 1A, which replaces the old Supplementary Figure 1A panel where a shorter kinetics was presented. Results of this new experiment indicate a maximal effect of 10µM CHIR-99021, and that the effect of the inhibitor becomes maximal 24-48 hours after treatment. The text has been modified accordingly, and it now states: "Kinetics and dose-response analysis of the effects of CHIR-99021 on MAFB expression showed that maximal protein levels were achieved after a 24-48 hour exposure to 10µM CHIR-99021 (Supplementary Figure 1A), conditions that were used hereafter."

      Regarding the use of alternative GSK3 inhibitors, we had already provided that information in Supplementary Figure 1B, where the effects of SB-216763 (10 µM) or LiCl (10 mM) were evaluated. The huge reversal of the Tyr<sup>216</sup>/Ser<sup>9</sup> GSK3β phosphorylation ratio observed with CHIR-99021 was not seen with other GSK3 inhibitors, as indicated in the text. In any event, we believe that the relevance of this result with SB-216763 or LiCl is minimized by the results generated after siRNA-mediated GSK3 knockdown (shown in Figure 4), that completely reproduced the effects seen with CHIR-99021.

      (2) Why in the "reanalysis of single cell RNAseq data" section, the authors use Seurat v5 (R) but then change to python, and the other way around?

      As indicated in the documentation for Integrative Analysis in Seurat v5 (https://satijalab.org/seurat/articles/seurat5_integration), scVIIntegration requires the reticulate package, which allows us to run a Python environment from within R.

      (3) When the authors refer to the clusters enriched in MoMacVERSE, they use the labels of the clusters (for example #2 or #3). I would suggest using the annotations described in the original paper, to link it to the bibliography published through the labels established in the paper.

      We have revised the text according to the reviewer's suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (4) In line 309. Is there any significance on the "having a stronger effect"?

      We apologize for the misleading sentence. The phrase has been modified for better clarity, and the text now states: "Like CHIR-99021, silencing of both GSK3A and GSK3B augmented the expression of MAFB, with the simultaneous silencing of both GSK3A and GSK3B genes having a stronger effect (Figure 4B), and modulated the expression of 329 genes (Figure 4C,D)."

      (5) In line 337, "(22)(27)", are these references?

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (6) In the single-cell reanalysis, could you please provide integration Qc plots? It would be interesting to have it on the paper.

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15%]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      As requested by the reviewer, we are now providing the QC plots for the re-analysis in the new Supplementary Figures 3 and 4.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to the Editors’ Comments

      Thank you for this summary of the reviews and recommendations for corrections. We respond to each in turn, and have documented each correction with specific examples contained within our response to reviewers below.

      ‘They all recommend to clarify the link between hypotheses and analyses, ground them more clearly in, and conduct critical comparisons with existing literature, and address a potential multiple comparison problem.’

      We have restructured our introduction to include the relevant literature outlined by the reviewers, and to more clearly ground the goals of our model and broader analysis. We have additionally corrected for multiple comparisons within our exploratory associative analyses. We have also signposted exploratory tests more clearly.

      ‘Furthermore, R1 also recommends to include a formal external validation of how the model parameters relate to participant behaviour, to correct an unjustified claim of causality between childhood adversity and separation of self, and to clarify role of therapy received by patients.’

      We have now tempered our language in the abstract which unintentionally implied causality in the associative analysis between childhood trauma and other-to-self generalisation. To note, in the sense that our models provide causal explanations for behaviour across all three phases of the task, we argue that our model comparison provides some causal evidence for algorithmic biases within the BPD phenotype. We have included further details of the exclusion and inclusion criteria of the BPD participants within the methods.

      ‘R2 specifically recommends to clarify, in the introduction, the specific aim of the paper, what is known already, and the approach to addressing it.’

      We have more thoroughly outlined the current state of the art concerning behavioural and computational approaches to self insertion and social contagion, in health and within BPD. We have linked these more clearly to the aims of the work.

      ‘R2 also makes various additional recommendations regarding clarification of missing information about model comparison, fit statistics and group comparison of parameters from different models.’

      Our model comparison approach and algorithm are outlined within the original paper for Hierarchical Bayesian Model comparison (Piray et al., 2019). We have outlined the concepts of this approach in the methods. We have now additionally improved clarity by placing descriptions of this approach more obviously in the results, and added points of greater detail in the methods, such as which statistics for comparison we extracted on the group and individual level.

      In addition, in response to the need for greater comparison of parameters from different models, we have also hierarchically force-fitted the full suite of models (M1-M4) to all participants. We report all group differences from each model individually – assuming their explanation of the data - in Table S2. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. Finally, we show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      ‘R3 additionally recommends to clarify the clinical and cognitive process relevance of the experiment, and to consider the importance of the Phase 2 findings.’

      We have now included greater reference to the assumptions in the social value orientation paradigm we use in the introduction. We have also responded to the specific point about the shift in central tendencies in phase 2 from the BPD group, noting that, while BPD participants do indeed get more relatively competitive vs. CON participants, they remain strikingly neutral with respect to the overall statespace. Importantly, model M4 does not preclude more competitive distributions existing.

      ‘Critically, they also share a concern about analyzing parameter estimates fit separately to two groups, when the best-fitting model is not shared. They propose to resolve this by considering a model that can encompass the full dynamics of the entire sample.’

      We have hierarchically force-fitted the full suite of models (M1-M4) to all participants to allow for comparison between parameters within each model assumption. We report all group differences from each model individually – assuming their explanation of the data - in Table S2 and Table S3. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. We also show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      Within models M1 and M2, the parameters quantify the degree to which participants believe their partner to be different from themselves. Under M1 and M2 model assumptions, BPD participants have meaningfully larger versus CON (Fig S10), which supports the notion that a new central tendency may be more parsimonious in phase 2 (as in the case of the optimal model for BPD, M4). We also show strong correlations across models between under M1 and M2, and the shift in central tendencies of beliefs between phase 1 and 2 under M3 and M4. This supports our primary comparison, and shows that even under non-dominant model assumptions, parameters demonstrate that BPD participants expect their partner’s relative reward preferences to be vastly different from themselves versus CON.

      ‘A final important point concerns the psychometric individual difference analyses which seem to be conducted on the full sample without considering the group structure.’

      We have now more clearly focused our psychometric analysis. We control for multiple comparisons, and compare parameters across the same model (M3) when assessing the relationship between paranoia, trauma, trait mentalising, and social contagion. We have relegated all other exploratory analyses to the supplementary material and noted where p values survive correction using False Discovery Rate.
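
      As a minimal sketch of what a False Discovery Rate correction involves, the snippet below computes adjusted p values. It assumes the Benjamini-Hochberg procedure, which the response does not name explicitly, and the input p values are hypothetical rather than taken from the manuscript.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    Assumption: the response only says 'False Discovery Rate'; the
    Benjamini-Hochberg step-up procedure is used here as a common choice.
    """
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values from a set of exploratory correlations.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
survives = [p < 0.05 for p in adjusted_p]
```

      A test 'survives correction' in this sense when its adjusted p value stays below the chosen threshold (0.05 in the sketch).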

      Reviewer 1:

      ‘The manuscript's primary weakness relates to the number of comparisons conducted and a lack of clarity in how those comparisons relate to the authors' hypotheses. The authors specify a primary prediction about disruption to information generalization in social decision making & learning processes, and it is clear from the text how their 4 main models are supposed to test this hypothesis. With regards to any further analyses however (such as the correlations between multiple clinical scales and eight different model parameters, but also individual parameter comparisons between groups), this is less clear. I recommend the authors clearly link each test to a hypothesis by specifying, for each analysis, what their specific expectations for conducted comparisons are, so a reader can assess whether the results are/aren't in line with predictions. The number of conducted tests relating to a specific hypothesis also determines whether multiple comparison corrections are warranted or not. If comparisons are exploratory in nature, this should be explicitly stated.’

      We have now corrected for multiple comparisons when examining the relationship between psychometric findings and parameters, using partial correlations and bootstrapping for robustness. These latter analyses were indeed not preregistered, and so we have more clearly signposted that these tests were exploratory. We chose to focus on the influence of psychometrics of interest on social contagion under model M3 given that this model explained a reasonable minority of behaviour in each group. We have now fully edited this section in the main text in response, and relegated all other correlations to the supplementary materials.

      ‘Furthermore, the authors present some measures for external validation of the models, including comparison between reaction times and belief shifts, and correlations between model predicted accuracy and behavioural accuracy/total scores. However it would be great to see some more formal external validation of how the model parameters relate to participant behaviour, e.g., the correlation between the number of pro-social choices and ß-values, or the correlation between the change in absolute number of pro-social choices and the change in ß. From comparing the behavioural and computational results it looks like they would correlate highly, but it would be nice to see this formally confirmed.’

      We have included this further examination within the Generative Accuracy and Recovery section:

      ‘We also assessed the relationship (Pearson rs) between modelled participant preference parameters in phase 1 and actual choice behaviour: was negatively correlated with prosocial versus competitive choices (r=-0.77, p<0.001) and individualistic versus competitive choices (r=-0.59, p<0.001); was positively correlated with individualistic versus competitive choices (r=0.53, p<0.001) and negatively correlated with prosocial versus individualistic choices (r=-0.69, p<0.001).’
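
      For readers less familiar with the statistic, the Pearson correlation reported above can be sketched as follows. The parameter values and choice counts are hypothetical and serve only to show the computation; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: a fitted preference parameter per participant and
# the number of prosocial (versus competitive) choices each one made.
preference_param = [-2.0, -1.0, 0.0, 1.0, 2.0]
prosocial_choices = [16, 15, 11, 8, 5]
r = pearson_r(preference_param, prosocial_choices)
```

      With these made-up numbers `r` is strongly negative, mirroring the direction (not the magnitude) of the first association quoted above.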

      ‘The statement in the abstract that 'Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity disrupts this through separation of internalised beliefs' makes an unjustified claim of causality between childhood adversity and separation of self - and other beliefs, although the authors only present correlations. I recommend this should be rephrased to reflect the correlational nature of the results.’

      Sorry – this was unfortunate wording: we did not intend to imply causation with our second clause in the sentence mentioned. We have amended the language to make it clear this relationship is associative:

      ‘Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity is associated with separation of internalised beliefs, and makes clear causal predictions about the mechanisms of social information generalisation under uncertainty.’

      ‘Currently, from the discussion the findings seem relevant in explaining certain aberrant social learning and -decision making processes in BPD. However, I would like to see a more thorough discussion about the practical relevance of their findings in light of their observation of comparable prediction accuracy between the two groups.’

      We have included a new paragraph in the discussion to address this:

      ‘Notably, despite differing strategies, those with BPD achieved similar accuracy to CON participants in predicting their partners. All participants were more concerned with relative versus absolute reward; only those with BPD changed their strategy based on this focus. Practically this difference in BPD is captured either through disintegrated priors with a new median (M4) or very noisy, but integrated priors over partners (M1) if we assume M1 can account for the full population. In either case, the algorithm underlying the computational goal for BPD participants is far higher in entropy and emphasises a less stable or reliable process of inference. In future work, it would be important to assess this mechanism alongside momentary assessments of mood to understand whether more entropic learning processes contribute to distressing mood fluctuation.’

      ‘Relatedly, the authors mention that a primary focus of mentalization based therapy for BPD is 'restoring a stable sense of self' and 'differentiating the self from the other'. These goals are very reminiscent of the findings of the current study that individuals with BPD show lower uncertainty over their own and relative reward preferences, and that they are less susceptible to social contagion. Could the observed group differences therefore be a result of therapy rather than adverse early life experiences?’

      This is something that we wish to explore in further work. While verbal and model descriptions appear parsimonious, this is not straightforward. As we see, clinical observation and phenomenological dynamics may not necessarily match in an intuitive way to parameters of interest. It may be that compartmentalisation of self and other – as we see in BPD participants within our data – may counter-intuitively express as a less stable self. The evolutionary mechanisms that make social insertion and contagion enduring may also be the same that foster trust and learning.

      ‘Regarding partner similarity: It was unclear to me why the authors chose partners that were 50% similar when it would be at least equally interesting to investigate self-insertion and social contagion with those that are more than 50% different to ourselves? Do the authors have any assumptions or even data that shows the results still hold for situations with lower than 50% similarity?’

      While our task algorithm had a high probability to match individuals who were approximately 50% different with respect to their observed behaviour, there was variation either side of this value. The value of 50% median difference was chosen for two reasons: 1. We wanted to ensure participants had to learn about their partner to some degree relative to their own preferences and 2. we did not want to induce extreme over or under familiarity given the (now replicated) relationship between participant-partner similarity and intentional attributions (see below). Nevertheless, we did have some variation around the 50% median. Figure 3A in the top left panel demonstrates this fluctuation in participant-partner similarity and the figure legend further described this distribution (mean = 49%, sd = 12%). In future work we want to more closely manipulate the median similarity between participants and partners to understand how this facilitates or inhibits learning and generalisation.

      There is some analysis of the relationship between degrees of similarity and behaviour. In the third paragraph of page 15 we report the influence of participant-partner similarity on reaction times. In prior work (Barnby et al., 2022; Cognition) we had shown that similarity was associated with reduced attributions of harm about a partner, irrespective of their true parameters (e.g. whether they were prosocial/competitive). We replicate this previous finding with a double dissociation illustrated in Figure 4, showing that greater discrepancies in participant-partner prosociality increase explicit harmful intent attributions (but not self-interest), and discrepancies in participant-partner individualism reduce explicit self-interest attributions (but not harmful intent). We have made these clearer in our results structure, and included FDR correction values for multiple comparisons.

      The methods section is rather dense and at least I found it difficult to keep track of the many different findings. I recommend the authors reduce the density by moving some of the secondary analyses in the supplementary materials, or alternatively, to provide an overall summary of all presented findings at the end of the Results section.

      We have now moved several of our exploratory findings into the supplementary materials, notably the analysis of participant-partner similarity on reaction times (Fig S9), as well as the uncorrected correlation between parameters (Fig S7).

      Fig 2C) and Discussion p. 21: What do the authors mean by 'more sensitive updates'? more sensitive to what?

      We have now edited the wording to specify ‘more belief updating’ rather than ‘sensitive’ to be clearer in our language.

      P14 bottom: please specify what is meant by axial differences.

      We have changed this to ‘preference type’ rather than using the term ‘axial’.

      It may be helpful to have Supplementary Figure 1 in the main text.

      Thank you for this suggestion. Given the volume of information in the main text we hope that it is acceptable for Figure S1 to remain in the supplementary materials.

      Figure 3D bottom panel: what is the difference between left and right plots? Should one of them be alpha not beta?

      The left and right plots are of the change in standard deviation (left) and central tendency (right) of participant preference change between phase 1 and 3. This is currently noted in the figure legend, but we have added some text to be clearer that this is over prosocial-competitive beliefs specifically. We chose to use this belief as an example given the centrality of prosocial-competitive beliefs in the learning process in Figure 2. We also noticed a small labelling error in the bottom panels of 3D which should have noted that each plot was either with respect to the precision or mean-shift in beliefs during phase 3.

      ‘The relationship between uncertainty over the self and uncertainty over the other with respect to the change in the precision (left) and median-shift (right) in phase 3 prosocial-competitive beliefs.’

      Supplementary Figure 4: The prior presented does not look neutral to me, but rather right-leaning, so competitive, and therefore does indeed look like it was influenced by the self-model? If I am mistaken please could the authors explain why.

      This example distribution is taken from a single BPD participant. In this case, indeed, the prior is somewhat right-shifted. However, on a group level, priors over the partner were closely centred around 0 (see reported statistics in paragraph 2 under the heading ‘Phase 2 – BPD Participants Use Disintegrated and Neutral Priors’). We nonetheless understand how this may come across as misleading. For clarity we have expanded Figure S4 to include the phase 1 and prior phase 2 distributions for the entire BPD population for both prosocial and individualistic beliefs. This further demonstrates that those with BPD held surprisingly neutral beliefs about their partners’ prosociality, but had minor shifts between their own individualistic preferences and the expected individualistic preferences of their partners. This is also visible in Figure S2.

      Reviewer 2:

      ‘There are two major weaknesses. First, the paper lacks focus and clarity. The introduction is rather vague and, after reading it, I remained confused about the paper's aims. Rather than relying on specific predictions, the analysis is exploratory. This implies that it is hard to keep track, and to understand the significance, of the many findings that are reported.’

      Thank you for this opportunity to be clearer in our framing of the paper. While the model makes specific causal predictions with respect to behavioural dynamics conditional on algorithmic differences, our other analyses were indeed exploratory. We did not preregister this work, but given the intriguing findings we intend to preregister our future analyses.

      We have made our introduction clearer with respect to the aims of the paper:

      ‘Our present work sought to achieve two primary goals: 1. Extend prior causal computational theories to formalise the interrelation between self-insertion and social contagion within an economic paradigm, the Intentions Game and 2., Test how a diagnosis of BPD may relate to deficits in these forms of generalisation. We propose a computational theory with testable predictions to begin addressing this question. To foreshadow our results, we found that healthy participants employ a mixed process of self-insertion and contagion to predict and align with the beliefs of their partners. In contrast, individuals with BPD exhibit distinct, disintegrated representations of self and other, despite showing similar average accuracy in their learning about partners. Our model and data suggest that the previously observed computational characteristics in BPD, such as reduced self-anchoring during ambiguous learning and a relative impermeability of the self, arise from the failure of information about others to transfer to and inform the self. By integrating separate computational findings, we provide a foundational model and a concise, dynamic paradigm to investigate uncertainty, generalization, and regulation in social interactions.’

      ‘Second, although the computational approach employed is clever and sophisticated, there is important information missing about model comparison which ultimately makes some of the results hard to assess from the perspective of the reader.’

      Our model comparison employed state-of-the-art random-effects Bayesian model comparison (Piray et al., 2019; PLOS Comp. Biol.). It initially fits each individual to each model using Laplace approximation, and subsequently ‘races’ the models against each other on the group level and individual level through hierarchical constraints and random-effect considerations. We included this in the methods but have now expanded on the description of how we compared models:

      In the results -

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’

      We added to our existing description in the methods –

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019). During fitting we added a small noise floor to distributions (2.22 × 10<sup>-16</sup>) before normalisation for numerical stability. Parameters were estimated using the HBI in untransformed space drawing from broad priors (μ<sub>M</sub> = 0, σ<sup>2</sup><sub>M</sub> = 6.5; where M = {M1, M2, M3, M4}). This process was run independently for each group. Parameters were transformed into model-relevant space for analysis. All models and hierarchical fitting were implemented in Matlab (Version R2022B). All other analyses were conducted in R (version 4.3.3; arm64 build) running on Mac OS (Ventura 13.0). We extracted individual and group level responsibilities, as well as the protected exceedance probability to assess model dominance per group.’
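
      The noise floor mentioned above can be illustrated with a minimal sketch. This is not the authors' Matlab code; it assumes beliefs are stored as discrete probability vectors, and the function name `stabilise` is hypothetical. The floor is taken to be double-precision machine epsilon, which matches the 2.22 × 10<sup>-16</sup> value quoted.

```python
import numpy as np

# Assumption: the quoted noise floor corresponds to double-precision epsilon.
NOISE_FLOOR = np.finfo(float).eps  # roughly 2.22e-16


def stabilise(dist):
    """Add a small floor to every bin, then renormalise to sum to 1.

    Without the floor, bins with exactly zero mass give log(0) = -inf
    when log-likelihoods are computed during fitting.
    """
    d = np.asarray(dist, dtype=float) + NOISE_FLOOR
    return d / d.sum()


# Hypothetical belief distribution with two empty bins.
belief = stabilise([0.0, 0.25, 0.75, 0.0])
log_belief = np.log(belief)  # finite everywhere after the floor is applied
```

      The floor is small enough that the non-zero bins are unchanged to within floating-point precision, while the empty bins no longer produce infinite log-probabilities.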

      (1) P3, third paragraph: please define self-insertion

      We have now more clearly defined this in the prior paragraph when introducing concepts.

      ‘To reduce uncertainty about others, theories of the relational self (Anderson & Chen, 2002) suggest that people have available to them an extensive and well-grounded representation of themselves, leading to a readily accessible initial belief (Allport, 1924; Kreuger & Clement, 1994) that can be projected or integrated when learning about others (self-insertion).’

      (2) Introduction: the specific aim of the paper should be clarified - at the moment, it is rather vague. The authors write: "However, critical questions remain: How do humans adjudicate between self-insertion and contagion during interaction to manage interpersonal generalization? Does the uncertainty in self-other beliefs affect their generalizability? How can disruptions in interpersonal exchange during sensitive developmental periods (e.g., childhood maltreatment) inform models of psychiatric disorders?". Which of these questions is the focus of the paper? And how does the paper aim at addressing it?

      (3) Relatedly, from the introduction it is not clear whether the goal is to develop a theory of self-insertion and social contagion and test it empirically, or whether it is to study these processes in BPD, or both (or something else). Clarifying which specific question(s) is addressed is important (also clarifying what we already know about that specific question, and how the paper aims at elucidating that specific question).

      We have now included the specific aims of the paper. We note this in the above response to the reviewer's general comments.

      (4) "Computational models have probed social processes in BPD, linking the BPD phenotype to a potential over-reliance on social versus internal cues (Henco et al., 2020), 'splitting' of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others' irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Previous studies have typically overlooked how self and other are represented in tandem, prompting further investigation into why any of these BPD phenotypes manifest." Not clear what the link between the first and second sentence is. Does it mean that previous computational models have focused exclusively on how other people are represented in BPD, and not on how the self is represented? Please spell this out.

      Thank you for the opportunity to be clearer in our language. We have now spelled out our point more precisely, and included some extra relevant literature helpfully pointed out by another reviewer.

      ‘Computational models have probed social processes in BPD, although almost exclusively during observational learning. The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize  ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      (5) P5, first paragraph. The description of the task used in phase 1 should be more detailed. The essential information for understanding the task is missing.

      We have updated this section to point toward Figure 1 and the Methods where the details of the task are more clearly outlined. We hope that it is acceptable not to explain the full task at this point for brevity and to not interrupt the flow of the results.

      ‘Detailed descriptions of the task can be found in the methods section and Figure 1.’

      (6) P5, second paragraph: briefly state how the Psychometric data were acquired (e.g., self-report).

      We have now clarified this in the text.

      ‘All participants also self-reported their trait paranoia, childhood trauma, trust beliefs, and trait mentalizing (see methods).’

      (7) "For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices". Not sure what criteria are used for distinguishing between individualistic and competitive - they look the same?

      Sorry. This paragraph was not clear that the issue is that the interpretation of the choice depends on both members of the pair of options. Here, in one pair {(self=5,other=5) vs (self=10,other=5)}, it is highly pro-social for the self to choose (5,5), sacrificing 5 points for the sake of equality. In the second pair {(self=10,other=10) vs (self=10,other=5)}, it is highly competitive to choose (10,5), denying the other 5 points at no benefit to the self. We have clarified this:

      ‘We analyzed the ‘types’ of choices participants made in each phase (Supplementary Table 1). The interpretation of a participant’s choice depends on both values in a choice. For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices. There were 12 of each pair in phases 1 and 3 (individualistic vs. prosocial; prosocial vs. competitive; individualistic vs. competitive).’  
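
      The point that a choice can only be labelled relative to the option that was rejected can be sketched in code. The rule below is an illustrative assumption that reproduces the two worked examples above; it is not the classification scheme in Supplementary Table 1, and the function name `label_choice` is hypothetical.

```python
from typing import Tuple

Option = Tuple[int, int]  # (points to self, points to other)


def label_choice(chosen: Option, rejected: Option) -> str:
    """Label a choice relative to the rejected option (toy rule, assumed).

    - individualistic: the chosen option earns the self more points
    - competitive: no gain for the self, but the self-other gap widens
    - prosocial: anything else (the gap shrinks or the other gains)
    """
    d_self = chosen[0] - rejected[0]
    d_gap = (chosen[0] - chosen[1]) - (rejected[0] - rejected[1])
    if d_self > 0:
        return "individualistic"
    if d_gap > 0:
        return "competitive"
    return "prosocial"


# The same option (self=10, other=5) receives different labels
# depending on what it was paired with.
first_pair = label_choice((10, 5), rejected=(5, 5))     # "individualistic"
second_pair = label_choice((10, 5), rejected=(10, 10))  # "competitive"
```

      In this sketch the option (self=10; other=5) is individualistic when the alternative is (5, 5) and competitive when the alternative is (10, 10), which is the ambiguity the reviewer raised.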

      (8) "In phase 1, both CON and BPD participants made prosocial choices over competitive choices with similar frequency (CON=9.67[3.62]; BPD=9.60[3.57])" please report t-test - the same applies also various times below.

      We have now included the t test statistics with each instance.

      ‘In phase 3, both CON and BPD participants continued to make equally frequent prosocial versus competitive choices (CON=9.15[3.91]; BPD=9.38[3.31]; t=-0.54, p=0.59); CON participants continued to make significantly less prosocial versus individualistic choices (CON=2.03[3.45]; BPD=3.78 [4.16]; t=2.31, p=0.02). Both groups chose equally frequent individualistic versus competitive choices (CON=10.91[2.40]; BPD=10.18[2.72]; t=-0.49, p=0.62).’

      (9) P 9: "Models M2 and M3 allow for either self-insertion or social contagion to occur independently" what's the difference between M2 and M3?

      Model M2 hypothesises that participants use their own self representation as priors when learning about the other in phase 2, but are not influenced by their partner. M3 hypothesises that participants form an uncoupled prior (no self-insertion) about their partner in phase 2, and their choices in phase 3 are influenced by observing their partner in phase 2 (social contagion). In Figure 1 we illustrate the difference between M2 and M3. In Table 1 we specifically report the parameterisation differences between M2 and M3. We have also now included a correlational analysis of parameters between models to demonstrate the relationship between model parameters of equivalent value between models (Fig S11). We have also force fitted all models (M1-M4) to the data independently and reported group differences within each (see Table S2 and Table S3).

      (10) P 9, last paragraph: I did not understand the description of the Beta model.

      The beta model is outlined in detail in Table 1. We have also clarified the description of the beta model on page 9:

      ‘The ‘Beta model’ is equivalent to M1 in its causal architecture (both self-insertion and social contagion are hypothesized to occur) but differs in richness: it accommodates the possibility that participants might only consider a single dimension of relative reward allocation, which is typically emphasized in previous studies (e.g., Hula et al., 2018).’

      (11) P 9: I wonder whether one could think about more intuitive labels for the models, rather than M1, M2 etc.. This is just a suggestion, as I am not sure a short label would be feasible here.

      Thank you for this suggestion. We apologise that the labelling is not very intuitive. The problem is that, given the various terms we use to explain the different processes of generalisation that might occur between self and other, and given that each model is a different combination of these, we felt that numbering them was the lesser evil. We hope that the reader will be able to reference both Figure 1 and Table 1 to get a good feel for how the models and their causal implications differ.

      (12) Model comparison: the information about what was done for model comparison is scant, and little about fit statistics is reported. At the moment, it is hard for a reader to assess the results of the model comparison analysis.

      Model comparison and fitting were conducted using simultaneous hierarchical fitting and random-effects comparison. This is implemented through the HBI package (Piray et al., 2019), where the assumptions and fitting procedures are outlined in great detail. In short, our comparison allows for individual and group-level hierarchical fitting and comparison. This overcomes the issue of interdependence between and within model fitting within a population, which is often estimated separately.

      We have outlined this in the methods, although appreciate we do not touch upon it until the reader reaches that point. We have added a clarification statement on page 9 to rectify this:

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’
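
      To give a feel for what ‘model responsibility’ means, here is a simplified sketch. It is not the HBI algorithm of Piray et al. (2019), which estimates parameters, evidences, and group frequencies jointly and iteratively; it assumes the per-participant log evidences and group-level model frequencies are already available, and all names and numbers are hypothetical.

```python
import numpy as np


def responsibilities(log_evidence, model_freq):
    """Posterior probability that each model generated each participant's data.

    log_evidence: array of shape (n_participants, n_models)
    model_freq:   array of shape (n_models,), group-level frequencies summing to 1
    """
    log_post = np.asarray(log_evidence, dtype=float) + np.log(model_freq)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post = log_post - log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Hypothetical log evidences for two participants under two models.
log_ev = np.array([[-10.0, -12.0],
                   [-20.0, -15.0]])
resp = responsibilities(log_ev, model_freq=np.array([0.5, 0.5]))
best_model = resp.argmax(axis=1)  # index of the most responsible model per participant
```

      Group-level responsibility can then be summarised by averaging these values over participants, which is why a minority of participants can be best explained by a model that does not dominate the group.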

      (13) P 14, first paragraph: "BPD participants were also more certain about both types of preference" what are the two types of preferences?

      The two types of preferences are relative (prosocial-competitive) and absolute (individualistic) reward utility. These are expressed as b and a respectively. We have expanded the sentence in question to make this clearer:

      ‘BPD participants were also more certain about both self-preferences for absolute and relative reward ( = -0.89, 95%HDI: -1.01, -0.75; = -0.32, 95%HDI: -0.60, -0.04) versus CON participants (Figure 2B).’

      (14) "Parameter Associations with Reported Trauma, Paranoia, and Attributed Intent" the results reported here are intriguing, but not fully convincing as there is the problem of multiple comparisons. The combinations between parameters and scales are rather numerous. I suggest to correct for multiple comparisons and to flag only the findings that survive correction.

      We have now corrected this and controlled for multiple comparisons through partial correlation analysis, bootstrapping assessment for robustness, permutation testing, and False Discovery Rate correction. We only report those that survive bootstrapping and permutation testing, reporting both corrected (p[fdr]) and uncorrected (p) significance.
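
      As a minimal sketch of the FDR step, the code below implements the Benjamini-Hochberg adjustment. The response does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values are hypothetical rather than taken from the paper.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Hypothetical uncorrected p-values from four parameter-scale correlations.
p_uncorrected = [0.01, 0.04, 0.03, 0.005]
p_fdr = bh_adjust(p_uncorrected)
survives = p_fdr < 0.05
```

      Reporting both `p_uncorrected` and `p_fdr` side by side mirrors the convention described above of giving corrected and uncorrected significance.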

      (15) Results page 14 and page 15. The authors compare the various parameters between groups. I would assume that these parameters come from M1 for controls and from M4 for BPD? Please clarify if this is indeed the case. If it is the case, I am not sure this is appropriate. To my knowledge, it is appropriate to compare parameters between groups only if the same model is fit to both groups. If two different models are fit to each group, then the parameters are not comparable, as the parameters have, so to speak, different "meaning" in the two models. Now, I want to stress that my knowledge on this matter may be limited, and that the authors' approach may be sound. However, to be reassured that the approach is indeed sound, I would appreciate a clarification on this point and a reference to relevant sources about this approach.

      This is an important point. First, we confirmed all our main conclusions about parameter differences using the maximal model M1 to fit all the participants. We added Supplementary Table 2 to report the outcome of this analysis. Second, we did the same for parameters across all models M1-M4, fitting each to participants without comparison. This is particularly relevant for M3, since at least a minority of participants of both groups were best explained by this model. We report these analyses in Fig S11:

      Since M4 is nested within M1, we argue that this comparison is still meaningful, and note explanations in the text for why the effects noted between groups may occur given the differences in their causal meaning, for example in the results under phase 2 analyses:

      ‘Belief updating in phase 2 was less flexible in BPD participants. Median change in beliefs (from priors to posteriors) about a partner’s preferences was lower versus CON ( = -5.53, 95%HDI: -7.20, -4.00; = -10.02, 95%HDI: -12.81, -7.30). Posterior beliefs about partner were more precise in BPD versus CON ( = -0.94, 95%HDI: -1.50, -0.45;  = -0.70, 95%HDI: -1.20, -0.25). This is unsurprising given the disintegrated priors of the BPD group in M4, meaning they need to ‘travel less’ in state space. Nevertheless, even under assumptions of M1 and M2 for both groups, BPD showed smaller posterior median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      (16) "We built and tested a theory of interpersonal generalization in a population of matched participants" this sentence seems to be unwarranted, as there is no theory in the paper (actually, as it is now, the paper looks rather exploratory)

      We thank the reviewer for their perspective. Formal models can be used as a theoretical statement on the causal algorithmic process underlying decision making and choice behaviour; the development of formal models is an essential theoretical tool for precision and falsification (Haslbeck et al., 2022). In this sense, we have built several competing formal theories that test, using causal architectures, whether the latent distribution(s) that generate one’s choices generalise into one’s predictions about another person, and simultaneously whether one’s latent distribution(s) that represent beliefs about another person are used to inform future choices.

      Reviewer 3:

      ‘My broad question about the experiment (in terms of its clinical and cognitive process relevance): Does the task encourage competition or give participants a reason to take advantage of others? I don't think it does, so it would be useful to clarify the normative account for prosociality in the introduction (e.g., some of Robin Dunbar's work).’

      We agree that our paradigm does not encourage competition. We use a reward structure that requires participants to exceed a particular threshold before earning rewards, but there is no competitive element to this, in that points earned or not earned by partners have no bearing on the outcomes for the participant. This is important given the recursive properties that arise in mixed-motive games; we wanted to focus purely on observational learning in phase 2, and on repercussion-free choices made by participants in phases 1 and 3, meaning that the choices of participants, and the decisions of a partner, are theoretically in line with self-preferences irrespective of the judgement of others. We have included a clearer statement of the structure of this type of task, and more clearly cited the origin of its structure (Murphy & Ackerman, 2011):

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential social value economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes motivational variation in joint reward allocation.’

      Given the introduction's structure as it stands, we felt that providing another paragraph on the normative assumptions of such a game was outside the scope of this article.

      ‘The finding that individuals with BPD do not engage in self-other generalization on this task of social intentions is novel and potentially clinically relevant. The authors find that BPD participants' tendency to be prosocial when splitting points with a partner does not transfer into their expectations of how a partner will treat them in a task where they are the passive recipient of points chosen by the partner. In the discussion, the authors reasonably focus on model differences between groups (Bayesian model comparison), yet I thought this finding -- BPD participants not assuming prosocial tendencies in phase 2 while CON participant did -- merited greater attention. Although the BPD group was close to 0 on the \beta prior in Phase 2, their difference from CON is still in the direction of being more mistrustful (or at least not assuming prosociality). This may line up with broader clinical literature on mistrustfulness and attributions of malevolence in the BPD literature (e.g., a 1992 paper by Nigg et al. in Journal of Abnormal Psychology). My broad point is to consider further the Phase 2 findings in terms of the clinical interpretation of the shift in \beta relative to controls.’

      This is an important point, which we contextualize within the parameterisation of our utility model. While the shift toward 0 in the BPD participants is indeed more competitive, as the reviewer notes, it is surprisingly centred closely around 0, with only a slight bias to be prosocial (mean = -0.47;  = -6.10, 95%HDI: -7.60, -4.60). Charitably, we might argue that BPD participants are expecting more competitive preferences from their partner. Even so, given the variance around their priors in phase 2, they are uncertain or unconfident about this. We take a more conservative approach in the paper and say that, given the tight proximity to 0 and the variance of their group priors, they are likely to be ‘hedging their bets’ on whether their partner is going to be prosocial or competitive. While the movement from phase 1 to 2 is indeed in the competitive direction, it still lands in neutral territory. Model M4 does not preclude central tendencies at the start of Phase 2 being more in the competitive direction.

      ‘First, the authors note that they have "proposed a theory with testable predictions" (p. 4 but also elsewhere) but they do not state any clear predictions in the introduction, nor do they consider what sort of patterns will be observed in the BPD group in view of extant clinical and computational literature. Rather, the paper seems to be somewhat exploratory, largely looking at group differences (BPD vs. CON) on all of the shared computational parameters and additional indices such as belief updating and reaction times. Given this, I would suggest that the authors make stronger connections between extant research on intention representation in BPD and their framework (model and paradigm). In particular, the authors do not address related findings from Ereira (2020) and Story (2024) showing that, in a false belief task, BPD participants *overgeneralize* from self to other. A critical comparison of this work to the present study, including an examination of how the two tasks differ in the processes they measure, is important.’

      Thank you for this opportunity to include more of the important work that has preceded the present manuscript. Prior work has tended to focus on either descriptive explanations of self-other generalisation (e.g. through the use of RW type models) or has focused on observational learning instability in the absence of a causal model of where initial self-other beliefs may arise. While the prior work cited by the reviewer [Ereira (2020; Nat. Comms.) and Story (2024; Trans. Psych.)] does examine the inter-trial updating between self and other, it does not integrate a self model into a self’s belief about an other prior to observation. Rather, it focuses almost exclusively on prediction error ‘leakage’ generated during learning about individual reward (i.e. one-sided reward). These findings are important, but lie in a slightly different domain. They also do not cut against ours, and in fact, we argue in the discussion that the sort of learning instability described above and splitting (as we cite from Story et al., 2024; Psych. Rev.) may result from a lack of the self-anchoring typical of CON participants. Nevertheless, we agree these works provide an important premise to contrast and set the groundwork for our present analysis and have included them in the framing of our introduction, as well as contrasting them with our data in the discussion.

      In the introduction:

      ‘The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize  ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      In the discussion:

      ‘Disruptions in self-to-other generalization provide an explanation for previous computational findings related to task-based mentalizing in BPD. Studies tracking observational mentalizing reveal that individuals with BPD, compared to those without, place greater emphasis on social over internal reward cues when learning (Henco et al., 2020; Fineberg et al., 2018). Those with BPD have been shown to exhibit reduced belief adaptation (Siegel et al., 2020) along with ‘splitting’ of latent social representations (Story et al., 2024a). BPD is also shown to be associated with overgeneralisation in self-to-other belief updates about individual outcomes when using a one-sided reward structure (where participant responses had no bearing on outcomes for the partner; Story et al., 2024b). Our analyses show that those with BPD are equal to controls in their generalisation of absolute reward (outcomes that only affect one player) but disintegrate beliefs about relative reward (outcomes that affect both players) through adoption of a new, neutral belief. We interpret this together in two ways: 1. There is a strong concern about social relativity when those with BPD form beliefs about others, 2. The absence of constrained self-insertion about relative outcomes may predispose to brittle or ‘split’ beliefs. In other words, those with BPD assume ambiguity about the social relativity preferences of another (i.e. how prosocial or punitive) and are quicker to settle on an explanation to resolve this. Although self-insertion may be counter-intuitive to rational belief formation, it has important implications for sustaining adaptive, trusting social bonds via information moderation.’

      ‘In addition, perhaps it is fairer to note more explicitly the exploratory nature of this work. Although the analyses are thorough, many of them are not argued for a priori (e.g., rate of belief updating in Figure 2C) and the reader amasses many individual findings that need to be synthesized.’

      We have now noted the primary goals of our work in the introduction, and have included caveats about the exploratory nature of our analyses. We would note that our model is in effect a causal combination of prior work cited within the introduction (Barnby et al., 2022; Moutoussis et al., 2016). This renders our computational models in effect a causal theory to test, although we agree that our dissection of the results is exploratory. We have more clearly signposted this:

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes innate motivational variation in joint reward allocation.’

      ‘Second, in the discussion, the authors are too quick to generalize to broad clinical phenomena in BPD that are not directly connected to the task at hand. For example, on p. 22: "Those with a diagnosis of BPD also show reduced permeability in generalising from other to self. While prior research has predominantly focused on how those with BPD use information to form impressions, it has not typically examined whether these impressions affect the self." Here, it's not self-representation per se (typically, identity or one's view of oneself), but instead cooperation and prosocial tendencies in an economic context. It is important to clarify what clinical phenomena may be closely related to the task and which are more distal and perhaps should not be approached here.’

      Thank you for this important point. We agree that social value orientation, and particularly in this economically-assessed form, is but one aspect of the self, and we did not test any others. A version of the social contagion phenomenon is also present in other aspects of the self in intertemporal (Moutoussis et al., 2016), economic (Suzuki et al., 2016) and moral preferences (Yu et al., 2021). It would be most interesting to attempt to correlate the degrees of insertion and contagion across the different tasks.

      We take seriously the wider concern that behaviour in our tasks based on economic preferences may not have clinical validity. This issue is central in the whole field of computational psychiatry, much of which is based on generalizing from tasks like ours, and discussing correlations with psychometric measures. We hope that it is acceptable to leave such discussions to the many reviews on computational psychiatry (Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). Here, we have just put a caveat in the discussion:

      ‘Finally, a limitation may be that behaviour in tasks based on economic preferences may not have clinical validity. This issue is central to the field of computational psychiatry, much of which is based on generalising from tasks like that within this paper and discussing correlations with psychometric measures. Extrapolating  economic tasks into the real world has been the topic of discussion for the many reviews on computational psychiatry (e.g. Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). We note a strength of this work is the use of model comparison to understand causal algorithmic differences between those with BPD and matched healthy controls. Nevertheless, we wish to further pursue how latent characteristics captured in our models may directly relate to real-world affective change.’

      ‘On a more technical level, I had two primary concerns. First, although the authors consider alternative models within a hierarchical Bayesian framework, some challenges arise when one analyzes parameter estimates fit separately to two groups, particularly when the best-fitting model is not shared. In particular, although the authors conduct a model confusion analysis, they do not as far as I could tell (and apologies if I missed it) demonstrate that the dynamics of one model are nested within the other. Given that M4 has free parameters governing the expectations on the absolute and relative reward preferences in Phase 2, is it necessarily the case that the shared parameters between M1 and M4 can be interpreted on the same scale? Relatedly, group-specific model fitting has virtues when one believes there to be two distinct populations, but there is also a risk of overfitting potentially irrelevant sample characteristics when parameters are fit group by group.

      To resolve these issues, I saw one straightforward solution (though in modeling, my experience is that what seems straightforward on first glance may not be so upon further investigation). M1 assumes that participants' own preferences (posterior central tendency) in Phase 1 directly transfer to priors in Phase 2, but presumably the degree of transfer could vary somewhat without meriting an entirely new model (i.e., the authors currently place this question in terms of model selection, not within-model parameter variation). I would suggest that the authors consider a model parameterization fit to the full dataset (both groups) that contains free parameters capturing the *deviations* in the priors relative to the preceding phase's posterior. That is, the free parameters $\bar{\alpha}_{par}^m$ and $\bar{\beta}_{par}^m$ govern the central tendency of the Phase 2 prior parameter distributions directly, but could be reparametrized as deviations from Phase 1 $\theta^m_{ppt}$ parameters in an additive form. This allows for a single model to be fit all participants that encompasses the dynamics of interest such that between-group parameter comparisons are not biased by the strong assumptions imposed by M1 (that phase 1 preferences and phase 2 observations directly transfer to priors). In the case of controls, we would expect these deviation parameters to be centred on 0 insofar as the current M1 fit them best, whereas for BPD participants should have significant deviations from earlier-phase posteriors (e.g., the shift in \beta toward prior neutrality in phase 2 compared to one's own prosociality in phase 1). 
I think it's still valid for the authors to argue for stronger model constraints for Bayesian model comparison, as they do now, but inferences regarding parameter estimates should ideally be based on a model that can encompass the full dynamics of the entire sample, with simpler dynamics (like posterior -> prior transfer) being captured by near-zero parameter estimates.’
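The additive reparametrisation proposed above could be written as follows. This is only a sketch: the deviation terms $\delta_{\alpha}^{m}$ and $\delta_{\beta}^{m}$ are hypothetical symbols introduced here for illustration, and it assumes that $\theta^m_{ppt}$ comprises a participant's Phase 1 posterior central tendencies $(\alpha^m_{ppt}, \beta^m_{ppt})$.

```latex
\begin{aligned}
\bar{\alpha}_{par}^{m} &= \alpha_{ppt}^{m} + \delta_{\alpha}^{m}, \\
\bar{\beta}_{par}^{m}  &= \beta_{ppt}^{m} + \delta_{\beta}^{m}.
\end{aligned}
```

Under this form, $\delta_{\alpha}^{m} = \delta_{\beta}^{m} = 0$ recovers the direct posterior-to-prior transfer that the reviewer attributes to M1, while non-zero deviations would allow a shifted Phase 2 prior of the kind the reviewer attributes to M4.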

      Thank you for the chance to be clearer in our modelling. In particular, the suggestion to include a model that can be fit to all participants and that allows for the equivalent of partial social insertion, to check whether the results stand, can actually be accomplished through our existing models. That is, the parameter that governs the flexibility over beliefs in phase 2 under models M1 (dominant for CON participants) and M2 parameterises the degree to which participants think their partner may be different from themselves. Thus, forcibly fitting M1 and M2 hierarchically to all participants, and then separately to BPD and CON participants, can quantify the issue raised: if BPD participants indeed distinguish partners as different enough from themselves to warrant a new central tendency, this flexibility parameter should be quantitatively higher in BPD vs CON participants under M1 and M2.

      We therefore tested this, reporting the distributional differences in this parameter between BPD and CON participants under M1, both when fitted together as a population and as separate groups. As the parameter is higher for BPD participants under both conditions for M1 and M2, it supports our claim and adds context for the comparison: the parameter may be large enough in BPD that a new central tendency to anchor beliefs is a more parsimonious explanation.

      We cross-checked this result by assessing the discrepancy between the participant’s and assumed partner’s central tendencies for both prosocial and individualistic preferences via best-fitting model M4 for the BPD group. We thereby examined whether belief disintegration is uniform across preferences (relative vs absolute reward) or whether one tendency was shifted dramatically more than another. We found that beliefs over prosocial-competitive preferences were dramatically shifted, whereas those over individualistic preferences were not.

      We have added the following to the main text results to explain this:

      Model Comparison:

      ‘We found that CON participants were best fit at the group level by M1 (Frequency = 0.59, Protected Exceedance Probability = 0.98), whereas BPD participants were best fit by M4 (Frequency = 0.54, Protected Exceedance Probability = 0.86; Figure 2A). We first analyse the results of these separate fits. Later, in order to assuage concerns about drawing inferences from different models, we examined the relationships between the relevant parameters when we forced all participants to be fit to each of the models (in a hierarchical manner, separated by group). In sum, our model comparison is supported by convergence in parameter values when comparisons are meaningful. We refer to both types of analysis below.’

      Phase 1:

      ‘These differences were replicated when considering parameters between groups when we fit all participants to the same models (M1-M4; see Table S2).’

      Phase 2:

      ‘To check that these conclusions about self-insertion did not depend on the different models, we found that only under M1 and M2 were consistently larger in BPD versus CON. This supports the notion that new central tendencies for BPD participants in phase 2 were required, driven by expectations about a partner’s relative reward. (see Fig S10 & Table S2). and parameters under assumptions of M1 and M2 were strongly correlated with median change in belief between phase 1 and 2 under M3 and M4, suggesting convergence in outcome (Fig S11).’

      ‘Furthermore, even under assumptions of M1-M4 for both groups, BPD showed smaller posterior median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      ‘Assessing this same relationship under M1- and M2-only assumptions reveals a replication of this group effect for absolute reward, but the effect is reversed for relative reward (see Table S3). This accords with the context of each model, where under M1 and M2, BPD participants had larger phase 2 prior flexibility over relative reward (leading to larger initial surprise), which was better accounted for by a new central tendency under M4 during model comparison. When comparing both groups under M1-M4 informational surprise over absolute reward was consistently restricted in BPD (Table S3), suggesting a diminished weight of this preference when forming beliefs about an other.’

      Phase 3

      ‘In the dominant model for the BPD group—M4—participants are not influenced in their phase 3 choices following exposure to their partner in phase 2. To further confirm this we also analysed absolute change in median participant beliefs between phase 1 and 3 under the assumption that M1 and M3 was the dominant model for both groups (that allow for contagion to occur). This analysis aligns with our primary model comparison using M1 for CON and M4 for BPD  (Figure 2C). CON participants altered their median beliefs between phase 1 and 3 more than BPD participants (M1: linear estimate = 0.67, 95%CI: 0.16, 1.19; t = 2.57, p = 0.011; M3: linear estimate = 1.75, 95%CI: 0.73, 2.79; t = 3.36, p < 0.001). Relative reward was overall more susceptible to contagion versus absolute reward (M1: linear estimate = 1.40, 95%CI: 0.88, 1.92; t = 5.34, p<0.001; M3: linear estimate = 2.60, 95%CI: 1.57, 3.63; t = 4.98, p < 0.001). There was an interaction between group and belief type under M3 but not M1 (M3: linear estimate = 2.13, 95%CI: 0.09, 4.18, t = 2.06, p=0.041). There was only a main effect of belief type on precision under M3 (linear estimate = 0.47, 95%CI: 0.07, 0.87, t = 2.34, p = 0.02); relative reward preferences became more precise across the board. Derived model estimates of preference change between phase 1 and 3 strongly correlated between M1 and M3 along both belief types (see Table S2 and Fig S11).’

      ‘My second concern pertains to the psychometric individual difference analyses. These were not clearly justified in the introduction, though I agree that they could offer potentially meaningful insight into which scales may be most related to model parameters of interest. So, perhaps these should be earmarked as exploratory and/or more clearly argued for. Crucially, however, these analyses appear to have been conducted on the full sample without considering the group structure. Indeed, many of the scales on which there are sizable group differences are also those that show correlations with psychometric scales. So, in essence, it is unclear whether most of these analyses are simply recapitulating the between-group tests reported earlier in the paper or offer additional insights. I think it's hard to have one's cake and eat it, too, in this regard and would suggest the authors review Preacher et al. 2005, Psychological Methods for additional detail. One solution might be to always include group as a binary covariate in the symptom dimension-parameter analyses, essentially partialing the correlations for group status. I remain skeptical regarding whether there is additional signal in these analyses, but such controls could convince the reader. Nevertheless, without such adjustments, I would caution against any transdiagnostic interpretations such as this one in the Highlights: "Higher reported childhood trauma, paranoia, and poorer trait mentalizing all diminish other-to-self information transfer irrespective of diagnosis." Since many of these analyses relate to scales on which the groups differ, the transdiagnostic relevance remains to be demonstrated.’
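The suggestion above, to include group as a binary covariate and so partial group status out of the correlations, can be illustrated with a minimal sketch. It assumes a plain Pearson correlation and made-up toy data; the function names are hypothetical and this is not the authors' analysis code. Regressing a variable on a binary covariate leaves residuals equal to each value minus its group mean, so the partial correlation is the correlation of those residuals.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualise_on_group(values, group):
    """Subtract each observation's group mean (residuals of a regression on group)."""
    means = {g: mean(v for v, gg in zip(values, group) if gg == g) for g in set(group)}
    return [v - means[g] for v, g in zip(values, group)]


def partial_corr_given_group(x, y, group):
    """Correlation between x and y after partialing out group membership."""
    return pearson(residualise_on_group(x, group), residualise_on_group(y, group))


# Toy data (invented): the groups differ on both variables,
# but there is no association within either group.
scale = [0, 1, 2, 10, 11, 12]
param = [1, 0, 1, 11, 10, 11]
group = [0, 0, 0, 1, 1, 1]

print(pearson(scale, param))                          # pooled correlation
print(partial_corr_given_group(scale, param, group))  # correlation controlling for group
```

In this toy case the pooled correlation is large purely because of the group difference, and it vanishes once group is partialed out, which is the pattern the reviewer cautions against over-interpreting.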

      We have restructured the psychometric section to ensure transparency and clarity in our analysis. Namely, in response to these comments and those of the other reviewers, we have opted to remove the parameter analyses that aimed to cross-correlate psychometric scores with latent parameters from different models: as the reviewer points out, we do not have parity between dominant models for each group to warrant this, and fitting the same model to both groups artificially makes the parameters qualitatively different. Instead we have opted to focus on social contagion, or rather restrictions on , between phases 1 and 3 explained by M3. This provides us with an opportunity to examine social contagion on the whole population level isolated from self-insertion biases. We performed bootstrapping (1000 reps) and permutation testing (1000 reps) to assess the stability and significance of each edge in the partial correlation network, and then applied FDR correction (p[fdr]), thus controlling for multiple comparisons. We note that while we focused on M3 to isolate the effect across the population, social contagion across both relative and absolute reward under M3 strongly correlated with social contagion under M1 (see Fig S11).

      ‘We explored whether social contagion may be restricted as a result of trauma, paranoia, and less effective trait mentalizing under the assumption of M3 for all participants (where everyone is able to be influenced by their partner). To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). We conducted partial correlation analysis to estimate relationships conditional on all other associations and retained all that survived bootstrapping (1000 reps), permutation testing (1000 reps), and subsequent FDR correction. Persecution and CTQ scores were both moderately associated with MZQ scores (RGPTSB r = 0.41, 95%CI: 0.23, 0.60, p = 0.004, p[fdr]=0.043; CTQ r = 0.354, 95%CI: 0.13, 0.56, p=0.019, p[fdr]=0.02). MZQ scores were in turn moderately and negatively associated with shifts in prosocial-competitive preferences between phase 1 and 3 (r = -0.26, 95%CI: -0.46, -0.06, p=0.026, p[fdr]=0.043). CTQ scores were also directly and negatively associated with shifts in individualistic preferences (r = -0.24, 95%CI: -0.44, -0.13, p=0.052, p[fdr]=0.065). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired mentalising (Figure 4A).’
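The permutation testing and FDR correction described above can be sketched in general terms as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses a simple Pearson correlation rather than a partial correlation network, it assumes the Benjamini-Hochberg procedure for the FDR step (the text says only "FDR correction"), it omits the bootstrapping step, and all function names and data are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented example: one perfectly correlated pair, then a set of raw p-values.
p_edge = permutation_p_value(list(range(10)), list(range(10)))
p_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_edge, p_fdr)
```

With 1000 permutations, the smallest p-value this estimator can return is 1/1001, so the number of repetitions sets the resolution of the test.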

      (1) As far as I could tell, the authors didn't provide an explanation of this finding on page 5: "However, CON participants made significantly fewer prosocial choices when individualistic choices were available" While one shouldn't be forced to interpret every finding, the paper is already in that direction and I found this finding to be potentially relevant to the BPD-control comparison.

      Thank you for this observation. This sentence reports the fact that CON participants were effectively more selfish than BPD participants. This is captured by the lower parameter value reported in Figure 2, and suggests that CON participants were more focused on absolute value – acting in a more ‘economically rational’ manner – versus BPD participants. This fits in with the fourth paragraph of our discussion, where we discuss prior work that demonstrates a heightened social focus in those with BPD. Indeed, the finding the reviewer highlights further emphasises the point that those with BPD are much more sensitive to, and motivated to choose, options concerning relative reward than are CON participants. The text in the discussion reads:

      ‘We also observe this in self-generated participant choice behaviour, where CON participants were more concerned over absolute reward versus their BPD counterparts, suggesting a heightened focus on relative vs. absolute reward in those with BPD.’

      (2) The adaptive algorithm for adjusting partner behavior in Phase 2 was clever and effective. Did the authors conduct a manipulation check to demonstrate that the matching resulted in approximately 50% difference between one's behavior in Phase 1 and the partner in Phase 2? Perhaps Supplementary Figure suffices, but I wondered about a simpler metric.

      Thanks for this point. We highlight this in Figure 3B and within the same figure legend, although we appreciate the panel is quite small and may be missed. We have now highlighted this manipulation check more clearly in the behavioural analysis section of the main text:

      ‘Server matching between participant and partner in phase 2 was successful, with participants being approximately 50% different to their partners with respect to the choices each would have made on each trial in phase 2 (mean similarity=0.49, SD=0.12).’
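A simple version of the similarity metric reported above could be computed as the proportion of trials on which participant and partner would have made the same choice. The sketch below assumes choices are coded as comparable labels per trial; the function name and the example data are hypothetical.

```python
def choice_similarity(participant_choices, partner_choices):
    """Proportion of trials on which the two agents would choose the same option."""
    if len(participant_choices) != len(partner_choices):
        raise ValueError("choice sequences must cover the same trials")
    if not participant_choices:
        raise ValueError("at least one trial is required")
    matches = sum(a == b for a, b in zip(participant_choices, partner_choices))
    return matches / len(participant_choices)


# Invented example: the two agents agree on two of four trials.
example_similarity = choice_similarity([1, 2, 1, 2], [1, 1, 2, 2])
print(example_similarity)
```

Averaging this value over participants would give a group-level summary comparable in form to the mean similarity quoted above.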

      (3) The resolution of point-range plots in Figure 4 was grainy. Perhaps it's not so in the separate figure file, but I'd suggest checking.

      Apologies. We have now updated and reorganised the figure to improve clarity.

      (4) p. 21: Suggest changing to "different" as opposed to "opposite" since the strategies are not truly opposing: "but employed opposite strategies."

      We have amended this.

      (5) p. 21: I found this sentence unclear, particularly the idea of "similar updating regime." I'd suggest clarifying: "In phase 2, CON participants exhibited greater belief sensitivity to new information during observational learning, eventually adopting a similar updating regime to those with BPD."

      We have clarified this statement:

      ‘In observational learning in phase 2, CON participants initially updated their beliefs in response to new information more quickly than those with BPD, but eventually converged to a similar rate of updating.’

      (6) p. 23: The content regarding psychosis seemed out of place, particularly as the concluding remark. I'd suggest keeping the focus on the clinical population under investigation. If you'd like to mention the paradigm's relevance to psychosis (which I think could be omitted), perhaps include this as a future direction when describing the paradigm's strengths above.

      We agree the paragraph is somewhat speculative. We have omitted it in aid of keeping the messaging succinct and to the point.

      (7) p. 24: Was BPD diagnosis assessed using unstructured clinical interview? Although psychosis was exclusionary, what about recent manic or hypomanic episodes or Bipolar diagnosis? A bit more detail about BPD sample ascertainment would be useful, including any instruments used to make a diagnosis and information about whether you measured inter-rater agreement.

      Participants diagnosed with BPD were recruited from specialist personality disorder services across various London NHS mental health trusts. The diagnosis of BPD was established by trained assessors at the clinical services and confirmed using the Structured Clinical Interview for DSM-IV (SCID-II) (First et al., 1997). Individuals with a history of psychotic episodes, severe learning disability or neurological illness/trauma were excluded. We have now included this extra detail within our methods in the paper:

      ‘The majority of BPD participants were recruited through referrals by psychiatrists, psychotherapists, and trainee clinical psychologists within personality disorder services across 9 NHS Foundation Trusts in the London, and 3 NHS Foundation Trusts across England (Devon, Merseyside, Cambridgeshire). Four BPD participants were also recruited by self-referral through the UCLH website, where the study was advertised. To be included in the study, all participants needed to have, or meet criteria for, a primary diagnosis of BPD (or emotionally-unstable personality disorder or complex emotional needs) based on a professional clinical assessment conducted by the referring NHS trust (for self-referrals, the presence of a recent diagnosis was ascertained through thorough discussion with the participant, whereby two of the four also provided clinical notes). The patient participants also had to be under the care of the referring trust or have a general practitioner whose details they were willing to provide. Individuals with psychotic or mood disorders, recent acute psychotic episodes, severe learning disability, or current or past neurological disorders were not eligible for participation and were therefore not referred by the clinical trusts.‘

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Despite evidence suggesting the benefits of neutralizing mucosa-derived IgA in the upper airway in protection against the SARS-CoV-2 virus, all currently approved vaccines are administered intramuscularly, which mainly induces systemic IgG. Waki et al. aimed to characterize the benefits of intranasal vaccination at the molecular level by isolating B cell clones from nasal tissue. The authors found that Spike-specific plasma cells isolated from the spleen of vaccinated mice showed significant clonal overlap with Spike-specific plasma cells isolated from nasal tissue. Interestingly, they could not detect any spike-specific plasma cells in the bone marrow or Peyer's patches, indicating that these nose-derived cells did not necessarily home to and reside in these locations, although the Peyer's patch is not a typical plasma cell niche - rather the lamina propria of the gut would have been a better place to look. Furthermore, they found that multimerization improves the antibody/antigen binding when the antibody is of low or intermediate affinity, but that high-affinity monomeric antibodies do not benefit from multimerization. Lastly, the authors used a competitive ELISA assay to show that multimerization could improve the neutralizing capacity of these antibodies. 

      The strength of this paper is the cloning of multiple IgA from the nasal mucosae (n=99) and the periphery (n=114) post-SARS-CoV-2 i.n. vaccination to examine the clonal relationship of this IgA with other sites, including the spleen. This analysis provides novel insights into the nature of the mucosal antibody response at the site where the host would encounter the virus, and whether this IgA response disseminates to other tissues. 

      There were also some weaknesses: 

      (1) The finding that multimerization improves binding and neutralization is not surprising as this was observed before by Wang and Nussenzweig for anti-SARS-CoV-2 IgA (authors should cite Enhanced SARS-CoV-2 neutralization by dimeric IgA. Wang et al., Sci. Transl. Med 2021, 13:eabf1555). 

      We have cited the paper, and the relevant sentence has been modified as follows (line 51-53); Recent studies have demonstrated that multimeric IgA is more effective and provides greater cross-protection than IgG and M-IgA (Okuya et al., 2020b) (Asahi et al., 2002) (Dhakal et al., 2018) (Asahi-Ozaki et al., 2004) (Wang et al., 2021).

      In addition, as far as I can tell we cannot ascertain the purity of fractions from the size exclusion chromatography thus I wasn't sure whether the input material used in Fig. 4 was a mixed population of dimer/trimer/tetramer?  

      The S-IgAs used in the SPR analysis in Fig. 4 consist of a mixture of dimers, trimers, and tetramers. The observed values indicate the average affinity of the S-IgAs. Please refer to the revised version (line 278-280).

      (2) The flow cytometric assessment of the IgA+ clones from the nasal mucosae was difficult to interpret (Fig. 1B). It was hard for me to tell what they were gating on and subsequently analyzing without an IgA-negative population for reference. 

      We have updated FACS plots to illustrate the presence of IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in Fig. 1B legend. Please find the relevant statements (line 115-120).

      (3) While the i.n. study itself is large and challenging, it would have been interesting to compare an i.m. route and examine the breadth of SARS-CoV-2 variant S1 binding for IgGs as in Fig. 2A. Are the IgA responses derived from the mucosae of greater breadth than systemic IgG responses? Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      I appreciate your consideration. Recent reports indicate that some M-IgA monomers possess neutralizing activity that is equivalent to or less than that of IgGs. However, the opposite phenomenon has also been observed. These results suggest that the Fc does not merely correlate with the degree of increase in antibody reactivity or functionality. We believe the discrepancies in previous studies are due to variations in the binding modes between the epitope and paratope of each antibody clone. Nevertheless, oligomerization enhances the functionality of most monomeric antibody clones, suggesting that the multivalent S-IgA enables a mode of action that is challenging to achieve with a monomeric antibody. Please refer to the revised version (line 399-403).

      Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      We have summarized the characteristics of the four types of nasal IgAs in Fig.7 and in the Discussion. Please refer to the revised version (line 405-422).

      Reviewer #2 (Public Review): 

      Summary: 

      This research demonstrates the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV-2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notably, immunization with Wuhan S protein generates a weak response to the omicron variant. 

      Strengths: 

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses. 

      Weaknesses: 

      The data presentation needs clarity, and results show mAb ability to inhibit SARS-CoV-2 in vitro. How IgA functions in vivo is uncertain. 

      We conducted an additional experiment using a hamster model and confirmed that S-IgAs can protect against SARS-CoV-2 infection. Please refer to the revised version (line 349-373 and 431-438).

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 1A shows antibody titers in nasal lavage fluid and serum of mice post intranasal vaccination with SARS-CoV-2 Spike protein. The Y-axis of this figure is labeled as "U/mg" however these units are not clearly defined. 

      The antibody titers are expressed as optical density (OD450) value per total protein in nasal lavage fluids or serum. Please find the relevant statements (line 113-114).

      Furthermore, what do antibody titers in the nasal lavage fluid and serum look like post-intramuscular vaccination with the same vaccine and dose? Comparison of titers to the intramuscular route as well as to the PBS control would make this data more impactful. 

      We appreciate your consideration. We have not conducted experiments comparing the effects of intramuscular and intranasal administration using the same dosage and adjuvant. Cholera toxin has primarily been used as an adjuvant for nasal immunization, but it is seldom applied for intramuscular injection. We are interested in its impact on the immune compartment when using cholera toxin as an adjuvant for intramuscular injection. We plan to conduct further experiments in the future.

      Lastly, in Figure 1B, the detection of nasal IgG is not shown even though the authors assess nasally-derived IgG in the spleen further into the study.  

      Since the number of lymphocytes that can be collected from the nasal mucosa is limited, there is an insufficient capacity to isolate IgG+ plasma cells after collecting IgA+ plasma cells. Therefore, conducting such an experiment on mice is technically challenging. Larger animals, such as rats, will be necessary to perform this experiment. Further investigation is needed to determine whether antigen-specific IgG+ plasma cells, sharing V-(D)-J with nasal IgA, can be detected in the nasal mucosa.

      (2) There appears to be something amiss with the IgA stain. It is smushed up against the X-axis. Better flow cytometry profiles should be shown. Likewise in Supplemental Fig. 1A, their IgA stain appears to not be working. This must be addressed using positive and negative controls. 

      We have updated the FACS plots to show the IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in the Fig. 1B legend. Please find the relevant statements on line 115-120.

      (3) We do not know the purity of the samples that were subjected to SPR and since the legend of Fig. 4 is partially incorrect, it was difficult to know how this experiment was done. 

      The S-IgA used in the SPR analysis shown in Figure 4 is a mixture of dimers, trimers, and tetramers, and the observed values are believed to reflect the affinity of the S-IgA in the nasal mucosa. Please refer to the revised version (line 278-280).

      (4) Fig. 5 results need to compare with some of the well-characterized mAb (IgG) to understand the biological significance of these neutralizing titres. 

      We have summarized the characteristics of the four types of nasal IgA in Fig. 7 and in the Discussion. Please refer to the revised version (line 405-422).

      Communication of results: 

      (1) Authors could improve the communication of their results by introducing the vaccination protocol in the results section accompanied by a diagram of the vaccination strategy (nature of the Ag, route, and frequency). This could be Fig. 1A .  

      A schematic diagram of the vaccination protocol is presented in Fig.1.

      (2) Care should be taken with some of the terminology. Intranasal is the accepted term but authors sometimes use "internasal". The term "immunosuppression" on page 2 could be misleading as it means something different to other audiences. The distinction when speaking about "protection from harmful pathogens" should be made between protection against infection (ie sterilizing immunity) vs protection against disease (ie morbidity and mortality). Instead of "nose", one should say "nasal". Nose-related could be rephrased as "potentially nasal-derived". P.5, line 2 didn't make sense: "IgG+ plasma cells that express nose-related IgA"...

      In many places, Spike is missing its "e".  

      We have made the correction accordingly.

      (3) Page 3: The lumping of the human and animal SARS-CoV-2 intranasal studies together is a bit misleading. Very little has worked for intranasal vaccination against SARS-CoV-2 in humans at this point in time (although hopefully that will change soon!). Authors should specify which studies were done in animals and which were done in humans. 

      The manuscript has been revised to include two citations on line 73-75 (Ewer et al., 2021 and Zhu et al., 2023).

      (4) What is ER-tracker? It comes out of nowhere and should be explained why it was used to the reader (as well as why they used the other markers) to sort for Spike-specific PC. 

      ER-Tracker is a fluorescent dye that is highly selective for the endoplasmic reticulum of living cells. Because plasma cells have an expanded endoplasmic reticulum for properly folding and secreting large quantities of antibodies, using ER-Tracker along with anti-CD138 facilitates the isolation of plasma cells from lymphocytes without the need for additional antibodies. Please refer to the revised version for details (line 130-134).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.  

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.  

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.  

      Strengths:  

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      Thanks a lot for these positive remarks.

      Weaknesses:  

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times; however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused or conflated, but they are very different. Sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus; it can therefore even occur outside of the nervous system. Habituation is a learning process in which the nervous system responds less to a repeated stimulus, despite (at least part of) the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine whether habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is actually being studied.

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (maybe not the majority though) use a less stringent definition for the word habituation than the one presented by this reviewer. More particularly, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here (see, e.g., PMID: 22337205; PMID: 18947923; PMID: 17258858; PMID: 20685171; PMID: 15978487). In addition to the practice in pain research, the main reason why we steered toward ‘habituation’ from our previous publication is that it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a down-regulation of the response (again, based on various definitions). But we agree that using the word “habituation” came at the cost of triggering confusion about the exact nature of the process, for those considering the stricter definition of the word “habituation” and those not in the narrower field of pain research. In the revised manuscript, we have thus changed this terminology to “adaptation”. Also following suggestions from Reviewer 2, we have strengthened the description of the protocol in the Results section and clarified why the adaptation phenomenon is not a ‘thermal damage’ effect or ‘fatigue’ effect in the neuro-muscular circuit controlling reversal. One of the most convincing pieces of evidence that it cannot be explained solely by “damage” or “exhaustion” is simply the existence of non-adapting mutants (like cmk-1(lf)) or pharmacological treatments (Cyclosporin A) that block the adaptation effect and enable worms to continuously reverse for hours without any problems.

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we now more extensively cover in the Discussion section.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro datasets speaks against the unreliability of the analyses. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates as this reviewer suggests.

      (1) In the peptide library analysis, Trypsin cleavage prior to kinase treatment will leave a charged N- or C-terminus and in addition remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the Trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevents the detection of all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positives too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.
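      The arithmetic behind point (3) can be sketched with a toy model. This is only an illustration under stated assumptions: the function name `expected_recovery`, the pool size, and both sensitivity values are hypothetical and are not taken from the datasets discussed here. The model assumes that each screen independently detects a true substrate with a fixed probability and ignores false positives.

```python
def expected_recovery(n_true: int, sens_a: float, sens_b: float) -> dict:
    """Expected hit counts for two screens sampling the same pool of true
    substrates. Hypothetical model: detection in each screen is assumed to be
    independent, and false positives are ignored."""
    hits_a = n_true * sens_a            # true substrates passing screen A's cutoff
    hits_b = n_true * sens_b            # true substrates passing screen B's cutoff
    shared = n_true * sens_a * sens_b   # substrates expected to pass both cutoffs
    return {
        "hits_a": hits_a,
        "hits_b": hits_b,
        "shared": shared,
        # Share of screen A's hits that screen B is expected to recover.
        "fraction_of_a_in_b": shared / hits_a if hits_a else 0.0,
    }


if __name__ == "__main__":
    # Assumed values, chosen only for illustration.
    print(expected_recovery(n_true=2000, sens_a=0.05, sens_b=0.15))
```

      Under these assumptions, the share of one list recovered in the other equals the sensitivity of the second screen, so two strict, separately applied cutoffs can produce a low overlap even when every hit in both lists is genuine.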

      We have now extended the discussion of the limited overlap of the two datasets in a dedicated paragraph in the Discussion. We also clarify that we tend to give more trust to the protein-library dataset (since substrates are in a configuration closer to that in vivo), with those hits also present in the peptide dataset (like TAX-6) as the most convincing hits, as they could be validated in a second type of experiment.

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.  

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so it was an added value to run both. We propose that the subset of targets present in both datasets is the most solid list of candidates. We have reinforced this point in the Discussion.

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.  

      We kept the focus of the article primarily on TAX-6, because it was identified as a CMK-1 target in vitro; CNB-1 was not. Moreover, we didn’t have cnb-1(gf) mutants to pursue the analysis with, and the constitutively high reversal rate of cnb-1(lf) prevented any further follow-up. We have added a supplementary file to present the spontaneous reversal rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).  

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer comment, we made modifications to the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”. This is really a very good point, that cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of the regulation effect via the thickness of the arrow lines (which we have now clarified in the legend in the revised ms). The main ‘quantitative’ nuances to take into consideration here originate from 2 assumptions of the model (which we have clarified in the revised ms):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 anti-adaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonistic direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally non-adapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is what happens in the CMK-1(gf) background that would mask the anti-adaptation effect of Cyclosporin A. Here assumption 2 is relevant: the CMK-1(gf) pro-adaptation direct branch is always prevalent and tips the regulation toward faster adaptation (the role of TAX-6 becoming negligible in the CMK-1(gf) background, and ipso facto that of Cyclosporin A).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We have modified the label to “normal adaptation” and left a note in the legend that an apparently normal adaptation phenotype seems to be the default situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk-1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. While RIM is indeed a neuropeptide-rich neuron, all these neurons actually express neuropeptides. Following this helpful suggestion, we have slightly expanded the discussion of hypothetical cellular pathways that can be modulated downstream of CMK-1 in AFD. We also slightly lengthened the discussion to mention hypothetical post-synaptic targets of TAX-6 within interneurons based on the literature.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.  

      The conditions with the highest n correspond to conditions that we have also used as ‘control’ conditions for other types of experiments in the lab and as part of side projects, and whose data could be pooled for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years, and the robust non-adapting phenotype was a reference point and a quality control when analyzing other non-adapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).  

      Thanks for the comment. We respectfully disagree with this reviewer’s statement “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have opposite phenotypes. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We dedicated considerable efforts to establishing an endogenous CMK-1::degron knock-in for tissue-specific auxin-induced degradation (AID), but we were unfortunately not able to obtain consistent results. As a result, the only useful data regarding the CMK-1 place of action are the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):  

      Summary:  

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermonociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.  

      Strengths:  

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.  

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.  

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.  

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.  

      Thanks a lot for these positive remarks.

      Weaknesses:  

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.  

      We understand this point and we have carefully considered (and reconsidered) the way to articulate the report. However, we could not present the story much differently, as we would have no justification to investigate the role of TAX-6 and its interaction with CMK-1 had we not first identified it as a phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and susceptible to trigger wrong expectations among readers. We have thus extensively revised the abstract to clarify this point. Furthermore, we have reinforced this point in the last paragraph of the introduction and in the conclusion paragraph of the Discussion.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We do agree and have explicitly mentioned this aspect in the abstract, in the end of the introduction, and in the discussion section.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.  

      We are not aware of any study having shown that Calcineurin is a direct target of CaM kinase I. But it was found to be a substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the discussion section. We have complemented the text, mentioning that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      (1) The authors might consider reorganizing the results, so that the substrate phosphorylation analysis follows the cmk-1 habituation data, as it may not be clear to the reader why you are looking for substrates downstream of cmk-1 at that point. Or the authors could mention the previous habituation data for cmk-1 at the beginning of the results.  

      Thank you. This is something that we considered while (re-)writing. However, we prefer to keep CMK-1 data side-by-side with TAX-6 data in the Results section. Nevertheless, we have modified the last paragraph of the introduction to better transition and justify the specific interest of searching for CMK-1 targets in the context of the present study.

      (2) Line 209: 'controls' is too strong a word. 'regulates' would be better, and it should be stated that this is for 'spontaneous reversal behavior'.  

      Thank you. This was modified.

      (3) Line 359: we suspect that these reflect functional enrichments.  

      We don’t see what exactly would be wrong with the original sentence. The proposed change (if it is a proposed change) would completely obliterate the intended meaning of our sentence. We rewrote the sentence to be as clear as possible, as follows: “Even if we cannot rule out an actual inclination of the CaM kinase pathway to regulate these processes, we suspect that these GO term enrichments rather reflect an analytical bias toward abundant proteins.”

      (4) Line 563: In this subsection, it is not made clear when the T0 and T60 heat pulses are given, in relation to the 20s ISI heat pulses given for 60 minutes. Are they the first and last pulse, or given some time before or after this train of heat pulses?  

      Thanks for spotting this poor description, which we have improved in the revised manuscript. The heat pulse recording is given immediately before and immediately after the 60 min of repeated stimulation. After the T0 heat pulse recording there is a period of about 30 s (period of post-stimulus recording + transfer from the recording device (INFERNO) to the habituation device (ThermINATOR)). For the T60 acquisition, there is a lag of about 50 s between the last ‘habituation’ stimulus and the recording stimulus (time needed to move the plate between the habituation device and the recording device + 40 s of baseline reversal recording in the absence of heat stimuli).

      Reviewer #2 (Recommendations for the authors):  

      (1) There appears to be little to no connection between the phosphorylation site discovered in Calcineurin (S443) and the behavioral phenotypes being studied. What is the thermo-nociceptive response if phosphorylation of S443 in Calcineurin is blocked (using a S443A mutation) and/or combined with CMK-1 gain of function?  

      Thanks for the suggestion. The suggested analysis is complicated by several factors. First, the tax-6(lf) mutant is not directly suitable for rescue analysis (until we have identified a way to restore baseline reversal), so we cannot use a S443A-carrying rescue transgene. Second, the truncated TAX-6(GF) mutant lacks the C-terminal part, including S443, so we cannot introduce a S443A mutation in this context. The remaining approach would be to modify the endogenous locus. This again is complicated by the fact that S443 exists in two different isoforms (with conserved RxxS motifs in two different alternative exons). It will be very difficult to perform these experiments until we know more about the expression pattern and function of the respective isoforms. This is work in progress, but this analysis will need to await a future publication.

      (2) The authors should state clearly if Calcineurin is a novel substrate of CaM Kinase or if this is already known in the field.  

      We have complemented the text, mentioning that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      (3) The logical flow of the manuscript could be improved given that CMK-1 and Calcineurin appear to act in different cells to regulate nociceptive habituation.  

      As detailed above, we have considered this point carefully and modified the introduction and the abstract. The discussion about the two places of action was also improved.

      (4) More detail about the experimental methods used for the heat-evoked reversals should be included in the Results section.  

      Thanks for the suggestion. We have improved the description in the Methods section and expanded the partial description in the Results section, so readers can hopefully proceed without needing to go back and forth with the Methods.

      (5) Check for typos. For example: line 197 - fix typo "...to a series repeated heat stimulation...".  

      Thank you. We have carefully read the revised manuscript to correct remaining typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine which is difficult to express. So that antigenic loops could be potentially grafted to a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions to better organize the text, and figure and make clarifications on a number of points, are needed. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or where there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

      We thank the reviewer for their helpful comments. We have incorporated feedback from the reviewers to improve the manuscript. Please see our point-by-point response.

      The purpose of loop-grafting between H5N1/2021 (a high-expressor) and the PR8 virus was not to improve the expression of PR8 NA, which already expresses well. Instead, the loop-grafting and the in vivo experiments were done to show the loop-specific protection following a lethal PR8 virus challenge.

      Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the authors' attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      We have discussed the distribution of epitopes on the NA molecule in the Discussion section "The distribution of epitopes in neuraminidase" (new line number 350). In Supplementary Figures 1 and 2, we have compiled the epitopes reported for polyclonal sera and mAbs via escape virus selection or crystal structure studies. There are 45 residue examples from escape virus selection, and we found that approximately 90% of the epitopes are located within the top loops (Loops 01 and 23, which include the lateral sides and edges of NA). We have also included the epitopes of the underside mAbs NDS.1 and NDS.3 in Supplementary Figure 2. Some of the interactions formed by these mAbs are also within the L01 and L23 loops. All relevant references are cited in Supplementary Figures 1 and 2.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      We described the rationale for the PR8 hybrid (new lines 247-250). For clarity, we have added the following sentence within the section "Loop transfer between two distant N1 NAs:...."

      (new lines 255-258):

      "mSN1 showed sufficient cross-reactivity to N1/09 to protect mice against virus challenge. Therefore, we performed loop transfer between mSN1 and PR8N1, which differ by 18 residues within the L01 and L23 loops and show no or minimal cross-reactivity, to assess the loop-specific protection."

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      We have included the numerical data in Supplementary Figure 6. The data are presented in a semi-quantitative manner for simplification. To improve clarity, we have now added the following sentence to the Figure 3c legend: "Refer to Supplementary Figure 6 for binding titration data".

      (4) Figure 5A and 7A: Negative controls are missing.

      A pool of Empty VLP sera was included as a negative control, showing no inhibition at 1:40 dilution. In the figure legends, we have stated "Pooled sera to unconjugated mi3 VLP was negative control and showed no inhibition at 1:40 dilution (not included in the graphs)"

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In line 7 for example there are at least 5 bands.

      The tetrameric conformation of the soluble proteins is evidenced by the size-exclusion chromatographs shown in Figures 3a and 6b. The BS3-crosslinked SDS-PAGE data are only suggestive, indicating that the protein is a tetramer if a band appears at ~250 kDa. However, depending on the reaction conditions, lower molecular weight bands may also be observed if crosslinking is incomplete.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      - Description of Figure 2 on page 3 should go before Figure 3 lines 87-105 or swap the order of the two figures.

      We have moved lines 91-96, which refer to Figure 3, to appear after Figure 2.

      - Figure 3a, an EC50 should be calculated for both NA activity assays.

      Figure 3a has been updated to include the EC50 and AUC (Area under curve) values for both NA activity assays. The same update has also been made for Figure 6b.

      - Line 150, I'm not sure it's appropriate to cite a manuscript that was in preparation but not published. I'm referring to the two mAbs AG7C and AF9C that were claimed to bind to the L01 and L23 loops but not.

      We have changed the "manuscript in preparation" to "personal communication with Dr. Yan Wu, Capital Medical University".

      - The description in Figure 4a is lacking.

      We have added a detailed description for Figure 4a.

      - Figure 4c, sufficient description is needed. For example, the cavity should be outlined and annotated; what is the role of Val149? Why is the first monomer assigned the number II and the second monomer the number I?

      We have added a detailed description for Figure 4c and amended the figure as per the reviewer’s suggestions.

      - Figure 5a, in addition to ELLA data to mSN1 and N1/09, ELLA data to N1/19 should also be measured and shown. Figure S7, please show IC50 instead of curves for better comparison.

      We included IC50 for mSN1 and N1/09 as we intended to associate the loops with protection. Graphs for N1/19 have not been reported, but the IC50 titres from pooled sera are shown in Supplementary Figure 7 as a representation. Because only a limited volume of sera could be obtained from tail vein bleeds, these assays were performed using pooled sera, which represent the total response (as established in a number of experiments).

      - Line 234-238, the author made a statement about the data shown in Figure 7b "These results mirrored several studies in the literature which showed that immunization with the 2009 N1 could provide at least partial protection in mice and ferrets to the avian H5N1 challenge". The data did not reflect that. In Figure 5b, mSN1 protects as well as other proteins. In fact, there was no advantage of N109 and N109 hybrid over mSN1 in protection against the homologous H1N109. Although higher levels of NAI antibodies were induced with the homologous protein in Figure 5a. The protection could be contributed by non-NAI antibodies, so the authors should measure binding antibodies. The author may increase the challenge dose from 200 LD50 to 1000 LD50 to see a difference due to the strong immunogenicity of the nanoparticles vaccine plus addavax. Otherwise, it looks like loop grafting is not necessary as heterologous NA could broadly protect.

      We agree that mSN1, despite its low NAI titres, was as protective as the homologous NA or its hybrid NA against H1N1/09 virus challenge at 200 LD50. Additional protective components, including non-NAI antibodies in the homologous groups, may have contributed to the protection.

      We assessed sera binding to H1N1/2009 and found that the binding antibody levels were also lower in the mSN1 group. The corresponding graph has now been added in Figure S7d. It was difficult to determine the NAI titre required to confer protection in this experiment. For this reason, we later chose PR8 as the challenge virus to demonstrate loop-specific protection.

      We are uncertain whether a 1000 LD50 challenge would have helped establish a correlation between protection and NAI IC50 titres, as the dose used is already lethal for DBA/2 mice.

      - Why would the authors separate work with N1/09 and N1/19 from PR8 N1? To this reviewer's understanding, they are all the same strategies with increasing numbers of dissimilar residues from N1/09 (12) to N1/19 (16) and to PR8 (18). They are all characterized by the same approaches in vitro and in vivo.

      We had two different goals for making hybrids with N1/09 and PR8 N1; therefore, we have presented these results separately.

      (1) For N1/09 and N1/19, we showed that loop-grafting improved protein yield and stability. Additionally, we showed that the N1/09 hybrid can be as protective as the homologous protein.

      (2) PR8 N1 is a high-yielding protein, so loop grafting did not significantly increase its yield. However, the PR8 virus challenge confirmed loop-specific protection.

      - For in vivo study testing the PR8 construct, although PR8 and PR8 hybrid protect better than the heterologous mSN1, the hybrid again did not show any advantages over the PR8 original proteins.

      That's correct: the PR8 hybrid was not advantageous over the original PR8 protein. However, the purpose of this experiment was to demonstrate loop-specific protection. The PR8 hybrid (PR8 loops - mS scaffold) protected 6/6 mice, whereas the mS hybrid (mS loops - PR8 scaffold) provided no protection.

      - Line 243-249, lack of reference to figures.

      References to Supplementary Figure 7b,c and Figure 2 have been added.

      - What was the reason that the challenge was done with 200 LD50 for 2009 H1N1 and 1000 LD50 for PR8?

      Viruses were titrated in the BALB/c strain for PR8 virus and the DBA/2 strain for X-179A (H1N1/2009) virus. These doses were selected based on their lethality and the time required to reach the endpoint (~20% weight loss) post-infection, which is 5-6 days. Most studies in the literature have used 10 LD50 or higher; thus the virus doses we used are relatively high.

      - Line 268, there is no Figure 5C.

      This was a mistake and has been corrected to Figure 6c.

      - Line 275 what are the readers supposed to see in supplementary Figure 5a? There is not enough description for the referred figures.

      A sentence has been added to the Fig S5a description to make a point about recognition of the NA scaffold by mAb CD6: "Binding by mAb CD6 is predominantly scaffold dependent and occurs across two protomers"

      - The discussion is very long and some of it is not relevant to the study. For example, the role of the tetramerization domain and the basis for structurally stable tetramer formation, were not the focuses of this study.

      We felt it was important to discuss the tetramerisation domain and the basis for stable tetramer formation. A previous study by Ellis et al. used the VASP tetramerisation domain and introduced multiple NA interface mutations to achieve a more stable closed conformation. In contrast, the NA proteins used in our study required the tetrabrachion tetramerisation domain to form a properly assembled tetramer.

      In lines 382-383, there is one unfinished sentence.

      This is corrected.

      The definition of the loops is also confusing. Line 381, the author stated that in the N1/19 hybrid design, residue N200S, could have been considered as part of the loop B2L23, and was it not?

      The designation of loop ends should not be rigid but rather based on multiple factors such as their proximity to antigenic epitopes, charge, and hydrophobicity. This is discussed in the "Definition of loops" section.

      - Figure 1a and Figure S2, please provide sufficient descriptions, what do the blocks in different colors mean?

      We have updated the Figure 1a legend to indicate the colours.

      The descriptions for Figures S1 and S2 have also been revised for clarity.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Line 37: Should be 'Influenza virus neuraminidase'.

      This is corrected.

      (2) Line 65: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ indicate that protective mAbs bind all over the NA head domain.

      We have discussed the epitopes on the NA head in detail in the section "The distribution of epitopes on Neuraminidase". In Supplementary Figures 1 and 2, we compiled several studies, including those on polyclonal sera and mAbs epitopes, emphasizing that loops 01 and 23 are the predominant antibody targets (~90%). Some antibodies also bind to the underside of NA. We have discussed and referenced these studies accordingly.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      The first reference has been included in both our discussion and Supplementary figure 1.

      The NA epitopes discussed in the second reference have also been incorporated into our discussion and Supplementary figures 1 and 2. Note that the E258K mutation on the NA underside was not relevant to mAbs and arose randomly during passaging of H3N2 A/New York/PV190/2017 virus.

      The third reference pertains to murine mAbs against influenza B virus NA.

      (3) Lines 71, 72, and throughout: 'et al.' should be in italics.

      All "et al." have been italicised.

      (4) Many abbreviations are not defined including CHO, SDS-PAGE, MUNANA, mi3, HEPES, BSA, TPCK, MWCO, HRP, PBS, TMB, TCID50, LD50, MES, PEG, PGA, MME, PGA-LM.

      The text has been amended to define these abbreviations.

      (5) Line 209: Shouldn't this be ID50 instead of IC50? Also, it is not defined.

      IC50 has been defined.

      (6) Line 210, line 346, line 581-582: No need to capitalize letters at the beginning of words mid-sentence.

      This is amended.

      (7) Line 227: Is 2009 H1N1 NA meant?

      This has been changed to "H1N1/2009 neuraminidase"

      (8) Line 310: Is this really quantitatively true? (see major comment 1).

      Based on the compilation of epitopes from published NA mAbs and polyclonal sera (via escape mutagenesis and NA-Fabs crystal structures), it is accurate to state that the protective epitopes are primarily located within loops 01 and 23.

      Please also refer to our response to minor point 2. 

      (9) Line 352 and throughout the manuscript: 'in vitro' should be in italics.

      This is amended.

      (10) Line 355: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ should be included here.

      Studies reporting epitopes on Influenza A neuraminidase have been compiled in Supplementary Figures 1 and 2 and cited appropriately.

      (11) Line 365: https://pubmed.ncbi.nlm.nih.gov/35446141/ and https://pubmed.ncbi.nlm.nih.gov/33568453/ also describe epitopes on the underside of the NA.

      Please refer to the above response to point 10.

      (12) Line 365: Reference https://pubmed.ncbi.nlm.nih.gov/37506693/ is missing here.

      The reference has been added.

      (13) Line 369-371: Is it really a minority?

      In terms of the protective response, the majority of the antibody response is directed towards loops 01 and 23, which form the top antigenic surface. The term 'lateral' is used in some literature to describe NA mAb epitopes; loops 01 and 23 also encompass the lateral regions.

      To clarify this, we have added the following sentence to the Discussion section - "The distribution of epitopes on neuraminidase"

      "It is important to note that loops 01 and 23 include a portion of epitopes that have been described in the literature as side, lateral, or underside (see mAbs NDS.1, NDS.3, and CD6 in Supplementary Fig. 2)"

      Additionally, in our studies in mice, we showed that protection is mediated by antibodies targeting the loops (Figure 7). We are uncertain about the binding response to the NA underside, but the NA-inhibiting and protective response to the underside appears to be minimal.

      Furthermore, Lederhof et al. showed that among the 'underside' mAbs, NDS.1 protected mice against virus challenge, whereas NDS.3 did not. In our analysis (Supplementary Figure 2), NDS.1 makes eight-residue contacts with B4L01 and B5L01, whereas NDS.3 makes five-residue contacts with B3L01 and B4L01.

      (14) Line 530: The A in ELLA already stands for assay.

      This is corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression.

      Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mice model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM. 

      Strengths: 

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex. 

      Weaknesses: 

      No major weaknesses were noted. We have minor suggestions for improving the clarity of the manuscript that are detailed in the "recommendations for the authors" section. 

      We thank the reviewer for the suggestions and addressed them as described in the "recommendations for the authors" section.

      Reviewer #2 (Public review): 

      Summary: 

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions. 

      Strengths: 

      The data presented in the manuscript is of high quality and supports major conclusions. 

      Weaknesses: 

      The statistical methods used are not clearly described, and some marked nonsignificant results appear visually significant, which raises concerns about data analysis. 

      Data presentation requires improvement. 

      We thank the reviewer for the comments. We updated the text in the Materials and Methods section to state the statistical methods and improved the figures as described in detail in the "recommendations for the authors" section.

      Recommendations for the authors:

      (1) Please include testis data in Figure 2 given previous work by authors showing that elevated mtDNA copy number can improve testis function. It would be interesting to compare the changes in mtDNA copy number in testis to these other tissues.

      We measured mtDNA copy number in testis using the CytB probe and added it as Supplementary figure 2 A.

      (2) The clarity of Table 1 could be improved. It is difficult to know whether the changes in the TFAM to mtDNA ratio are driven by changes in TFAM levels or mtDNA copy number. A suggestion is to include the TFAM and mtDNA values in parenthesis next to each listed ratio.

      We updated Table 1 and included the values of the normalized TFAM and mtDNA levels in parentheses.

      (3) The authors should consider showing TFAM western blot data in Figure 1.

      We thank the reviewer for the suggestion but would like to keep the TFAM western blot data with the other western blot data for the respective tissue.

      (4) The graphs for qPCR data (e.g. Figure 2) show mRNA or mtDNA levels relative to the control, which is always set to 1. Why, then, does the control group display error bars?

      For the normalization of the data to the WT group, we first calculate the average of the values from all the samples of the WT group. We then divide all values from the samples of all groups, including the WT group, by that average value. By doing so, we set the average value of the WT group to 1 and express all values from all samples of all groups, including the WT group, relative to this average value. Differences between the samples of the WT group are hence retained and allow for error calculations and the display of error bars.  
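
      The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the group labels, and the raw values are hypothetical and are not data or code from the study. It shows why the control group keeps a nonzero spread (and therefore error bars) even though its mean becomes 1.

```python
from statistics import mean, stdev

def normalize_to_control(groups, control="WT"):
    """Divide every sample in every group by the mean of the control group.

    `groups` maps a group name to a list of raw measurements.
    The helper name and signature are hypothetical, for illustration only.
    """
    control_mean = mean(groups[control])
    return {
        name: [value / control_mean for value in values]
        for name, values in groups.items()
    }

# Hypothetical raw values (NOT data from the study)
raw = {"WT": [8.0, 10.0, 12.0], "mutant": [4.0, 5.0, 6.0]}

normalized = normalize_to_control(raw)

# The control mean is 1 by construction, but sample-to-sample
# variation within the control group is retained.
wt_mean = mean(normalized["WT"])
wt_sd = stdev(normalized["WT"])
mutant_mean = mean(normalized["mutant"])
```

      Because each control sample is divided by the group average rather than by itself, the individual control values scatter around 1, and their standard deviation is what the error bar displays.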

      (5) Page 3 second sentence to the last: overexpression of TFAM leads to...? Did the author mean mtDNA?

      We updated the text to “Heterozygous knockout of Tfam in wild-type mice results in ~50% decrease of mtDNA levels, whereas moderate overexpression of Tfam leads to ~50% increase in mtDNA levels25,26”

      (6) The sentence "In summary, mtDNA copy number regulation is more complex than previously assumed and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner" - not clear who assumed (references?) and based on what data, please rephrase.

      We updated the text and it now reads “In summary, mtDNA copy number regulation is more complex than suggested by previous studies23–27 and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner.”

      (7) The significant increase in complex II activity under TFAM overexpression (Figure 3) warrants additional discussion.

      We updated the Results section and it now reads “We detected increased levels of the complex II subunit Succinate Dehydrogenase Complex Iron Sulfur Subunit B (SDHB). Complex II is exclusively nuclear encoded and a compensatory increase upon impaired mitochondrial gene expression has been observed before32.

      We proceeded to measure the enzyme activities of individual OXPHOS complexes in liver mitochondria (Fig. 3C). The complex I and complex IV activities were reduced to about 50% in Polg-/mut; Tfam+/+ mice in comparison with wild-type mice (Fig. 3C). However, we did not see any further alteration of the reduced enzyme activities induced by TFAM overexpression or reduced TFAM expression (Fig. 3C). Interestingly, we detected a significant increase in complex II and complex II + complex III activity upon TFAM overexpression, which can partially be explained by the increased complex II protein levels we observed in Polg-/mut; Tfam+/OE mice (Fig. 3, B and C).”

      (8) The statistical methods used should be explicitly stated. Some results marked as non-significant appear visually significant, for example, mt-Cytb in Figure 2C, Supplementary Figure 2B).

      We updated the text in the Materials and Methods section to state the statistical methods and it now reads “Statistical analysis and generation of graphs were performed with GraphPad Prism v9 software except for quantitative mass spectrometry data which was analyzed and plotted using R as described above. Statistical comparisons were performed using one-way analysis of variance (ANOVA), and post hoc analysis was conducted with Dunnett’s multiple comparisons test. Values of P < 0.05 were considered statistically significant.”
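
      For readers unfamiliar with the test named above, the following is a minimal Python sketch of the F statistic that underlies a one-way ANOVA. It is an illustration only: the authors used GraphPad Prism, the function name and the sample values here are hypothetical, and neither the P value nor Dunnett's post hoc comparison is implemented.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample lists.

    Hypothetical helper for illustration; it computes only the F statistic,
    not the P value and not Dunnett's multiple comparisons test.
    """
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three groups (NOT data from the study)
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(samples)
```

      A large F indicates that the group means differ more than the within-group scatter would predict; a post hoc test such as Dunnett's then compares each group against the control.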

      Minor points: 

      (1) Replace numerical indications of significance with asterisks for consistency.

      We replaced all numerical indications of significance with asterisks.

      (2) Abbreviations SKM and BAT are not defined.

      We removed the mention of SKM (skeletal muscle) as the data from this tissue were not included. The Introduction reads “In contrast, in brown adipose tissue (BAT), a decrease in TFAM levels normalized Uncoupling protein 1 (Ucp1) expression.”

      (3) Use uniform scales across bar graphs in Figure 2 to improve clarity.

      We updated Figure 2 to have uniform scales.

      (4) Remove or increase the transparency of data points in Figure 1A to make group averages more discernible.

      We removed the data points in Figure 1A.

      (5) Add a Y-axis title to Figure 1C.

      We added the Y-axis title “Heart / body weight” to Figure 1C.

      (6) Size of the font used in some figures (4?) is not appropriate.

      We increased the font size for the figures.

      (7) All figure legend titles need work. Insert "expression" after TFAM in the Figure 2 title, Change the title to "Modulation of TFAM expression..." in Figure 4. 

      The figure legends now read as follows:

      “Figure 2: Modulation of TFAM expression affects mtDNA copy number in a tissue-specific manner.”

      “Figure 4: Alteration of TFAM expression does not affect the heart phenotype of mtDNA mutator mice.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus and results in down regulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediated rRNA down regulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of the natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygous (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs but the reduction in numbers do not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (Figure 5A+5B), the ChIP-qPCR signal for H3K9 was decreased by about 26% (Figure 6F), and the ChIP-qPCR signal for Piwil1 was decreased by 35% (Figure 6G), so we do not think there is a large discrepancy. On the other hand, the ChIP-qPCR signal for H3K9 in meioc<sup>mo/mo</sup> was increased by about 130% (Figure 6F), while that for Piwil1 was increased by 50%, so there may be a mechanism of H3K9 regulation by Meioc that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. Significant increases in the 18-35nt small RNAs of 18S and 28S rRNAs and R2 were not detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia, compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 was revealed in this study and will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We added the following comment in L201-204.

      “The SSC-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided the molecular evidence on the antagonism of Meioc to Piwil1 in the rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by the piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. I corrected the parts pointed out.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We refined Figure 1A and added an explanation of SSCs, sox17::egfp-positive cells, and the SSC-enriched hyperplastic testis in L155-158.

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue culture was performed at 0.1, 1, 10, 100, 250, and 500mM, abnormally strong OP-puro signals, including in nuclei, were found in cells at 10mM or more. We added the results in Supplemental Figure S2G. In addition, at 1mM, growth was perturbed in fast-growing 32≤-cell cysts of spermatogonia, but not in 1-4-cell spermatogonia, as described in L127-130.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I don't have any recommendations for improvement. While I have outlined some of the weaknesses of the paper above, I don't see addressing these questions as pertinent for publication of this paper.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript uses the terms 1-2 cell spermatogonia, GSC, and SSC throughout the figures and text. For example, 1-2 cell spermatogonia is used in Figure 1C, GSC is used in Figure 1F, and SSC is used in Figure 1 legend. The use of all three terms without definitions as to how they each relate with one another is confusing, particularly to those outside the zebrafish spermatogenesis field. It would be best to only use one term if the three terms are used interchangeably or to define each term if they represent different populations.

      GSC is a writing mistake. In this study, sox17-positive cells, which have been confirmed to self-renew and differentiate (Kawasaki et al., 2016), are considered SSCs. On the other hand, a comparison of meioc and ythdc2 mutants revealed differences in the composition of each cyst, so we describe the number of cysts confirmed. We added new data that 1-2 cell spermatogonia are sox17-positive in Supplemental Figure S3 (L157-158).

      (2) Figure 1B: What does the "SC" label represent in these figure panels?

      We added the explanation in the Figure legend.

      (3) Fig 7B and S7B show incongruent results, and the text implies that Fig S7B data better reflects in vivo biology. It is not clear how the authors interpret the different results between 7B and S7B.

      Thank you for pointing that out. Fig 7A and 7B were obtained by isolating sox17-positive cells. Because it was difficult to detect nucleoli in the isolated cells, probably due to the isolation procedure, we added S7B, which was analyzed in sectioned tissues. As this reviewer pointed out, S7B reflects the in vivo state better, so we changed S7B to 7B and 7B to S7B.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) For general readers, it is nice to add a scheme of zebrafish spermatogenesis (lines 77-78) together with Figure 1A.

      As mentioned above, we refined Figure 1A.

      (2) Line 28, silence: the word "silence" is too strong here since rDNA is transcribed in some levels to ensure the cell survival.

      Thank you for your comment. We changed "silence" to "maintain low levels."

      (3) Line 60, YTDHC2: Please explain more about what protein YTDHC2 is.

      We added a description of Ythdc2 in the introduction.

      (4) Line 69, Piwil1: Please explain more about what protein Piwil1 is.

      We added a description of Piwil1 in the introduction.

      (5) Figure 1B, sperm: Please show clearly which sperms are in this figure using arrows etc.

      We represented sperm using arrowheads in Fig 1B.

      (6) Figure 1C, SC: Please show what SC is in the legend.

      We added the explanation in the Figure legend.

      (7) Line 83, meiotic markers: should be "meiotic prophase I markers".

      Thank you for pointing out the inaccurate description. We revised it.

      (8) Line 84, phosphor-histone H3: Should be "histone H3 phospho-S10 "

      We revised it.

      (9) Figure S1A, PH3: Please add PH3 is "histone H3 phospho-S10 ".

      We revised it.

      (10) Figure S1A, moto+/-: this heterozygous mutant showed an increased apoptosis. If so, please mention this in the text. If not, please remove the data.

      Thank you for pointing that out. The heterozygous mutant did not increase apoptosis, so we removed the data.

      (11) Line 88, no females developed: This means all males in the mutant. If so, what Figure S1B shows? These cells are spermatocytes? No "oocytes" developed is correct here?

      All meioc<sup>mo/mo</sup> zebrafish were males, and the meioc<sup>mo/mo</sup> cells in Fig. S1B are spermatogonia. No spermatocytes or oocytes were observed. To show this, we added "no oocytes" in L90.

      (12) Line 89, initial stages: What do the initial stages mean here? Please explain.

      The “initial stages” was changed to the pachytene stage.

      (13) Figure S1C: mouse Meioc rectangle lacks a right portion of it. Please explain two mutations encode a truncated protein in the main text.

      We apologize. It seems that the portion was lost during the preparation of the manuscript. We corrected it. In addition, we added a description of the protein truncation in L100-101.

      (14) Line 99: What "GRCz11" is.

      GRCz11 refers to the version of the zebrafish reference genome assembly. We added this.

      (15) Figure S2A: Dotted lines are cysts. If so, please mention it in the legend.

      We corrected the figure legend.

      (16) Figure S2B and C:, B1-4, C1-7: Rather use spermatogonia etc as a caption here.

      We corrected the figure and figure legend.

      (17) Line 113, hereafter, wildtype: Should be "wild type" or "wild-type".

      We corrected them.

      (18) Figure 1C: Please indicate what dotted lines mean here.

      We added “Dotted lines; 1-2 cell spermatogonia.”

      (19) Line 113, de novo: Please italicize it.

      We corrected it.

      (20) Line 113-116: Figure 1D shows two populations in the protein synthesis (low and high) in the 1-2-cell stage. Please mention this in the text.

      We added a mention of the two populations.

      (21) Line 121, in vitro: Please italicize it.

      We corrected it.

      (22) Line 138-139, Figure 2A: Please indicate two populations in the rRNA concentrations (low and high) in the 1-2-cell stage. How much % of each cell is?

      We added a mention of the two populations and the percentage of cells in each.

      (23) Figure 2B, cytes: Please explain the rRNA expression in spermatocytes (cytes) in the text.

      The decrease in rRNA signal intensity in spermatocytes was added.

      (24) Figure 2A, lines 147, low signals: Figure 2A did not show big differences between wild type and the mutant. What did the authors mean here? Lower levels of rRNAs in the mutant than in wild type. If so, please write the text in that way.

      We think it is important to note that we were unable to find cells with upregulated rRNA signals, and we therefore changed the text to “could not find cells with high signals of rRNAs and Rpl15 in meioc<sup>mo/mo</sup> spermatogonia”.

      (25) Figure 2E: Please add a schematic figure of a copy of rDNA locus such as Fig. S3A right.

      We added a schematic of the rDNA locus and primer sites, like the one in Figure S3A right (now Figure 2F), to Figure 2E.

      (26) Figure S3A: This Figure should be in the main Figure. The quantification of Northern blots should be shown as a graph with statistical analysis.

      We added the quantification and moved the panel to the main figure (Figure 2F).

      (27) Figure 4A: Please show single-color images (red or green) with merged ones.

      We added single-color images to Figure 4A.

      (28) Line 198, Piwil1: Please explain what Piwil1 is briefly.

      We are sorry, but we could not quite understand the meaning of this comment. To show that Piwil1 is located in the nucleolus, we indicated it as (Figure 4A, arrowhead) in L209.

      (29) Line 198, Ddx4-positive: What is "Ddx4-positive"? Explain it for readers.

      Ddx4 is a marker for germinal granules, and the description was changed to reflect this.

      (30) Line 209, Fig. S4D-G: Please mention the method of the detection of piRNA briefly.

      We now state that we sequenced small RNAs of 18-35 nt. Accordingly, we changed the term piRNA to small RNA.

      (31) Line 217: Please mention piwil1 homozygous mutant are inviable.

      We added that piwil1-/- are viable in L231.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The study starts with the notion that in an AD-like disease model, ILC2s in the Rag1 knockout were expanded and contained relatively more IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s. This was confirmed in the Rag2 knock-out mouse model.

      By using a chimeric mouse model in which wild-type splenocytes were injected into irradiated Rag1 knock-out mice, it was shown that even though the adaptive lymphocyte compartment was restored, there were increased AD-like symptoms and increased ILC2 expansion and activity. Moreover, in the reverse chimeric model, i.e. injecting a mix of wild-type and Rag1 knock-out splenocytes into irradiated wild-type animals, it was shown that the Rag1 knock-out ILC2s expanded more and were more active. Therefore, the authors could conclude that the RAG1-mediated effects were ILC2 cell-intrinsic.

      Subsequent fate-mapping experiments using the Rag1Cre;reporter mouse model showed that there were indeed RAGnaïve and RAGexp ILC2 populations within naïve mice. Lastly, the authors performed multi-omic profiling, using single-cell RNA sequencing and ATAC-sequencing, in which a specific gene expression profile was associated with ILC2. These included well-known genes but the authors notably also found expression of Ccl1 and Ccr8 within the ILC2. The authors confirmed their earlier observations that in the RAGexp ILC2 population, the Th2 regulome was more suppressed, i.e. more closed, compared to the RAGnaïve population, indicative of the suppressive function of RAG on ILC2 activity. I do agree with the authors' notion that the main weakness was that this study lacks the mechanism by which RAG regulates these changes in ILC2s.

      The manuscript is very well written and easy to follow, and the compelling conclusions are well supported by the data. The experiments are meticulously designed and presented. I wish to commend the authors for the study's quality.

      Even though the study is compelling and well supported by the presented data, some additional context could increase the significance:

      (1) The presence of the RAGnaïve and RAGexp ILC2 populations raises some questions on the (different?) origin of these populations. It is known that there are different waves of ILC2 origin (most notably shown in the Schneider et al Immunity 2019 publication, PMID 31128962). I believe it would be very interesting to further discuss or possibly show if there are different origins for these two ILC populations.

      Several publications describe the presence and origin of ILC2s in/from the thymus (PMIDs 33432227 24155745). Could the authors discuss whether there might be a common origin for the RAGexp ILC2 and Th2 cells from a thymic lineage? If true that the two populations would be derived from different populations, e.g. being the embryonic (possibly RAGnaïve) vs. adult bone marrow/thymus (possibly RAGexp), this would show a unique functional difference between the embryonic derived ILC2 vs. adult ILC2.

      We agree with the Reviewer that our findings raise important questions about ILC ontogeny. These are areas of ongoing investigation for us, and it is our hope this study may inform further investigation by others as well.

      Regarding the Schneider et al study, we have considered the possibility that RAG expression may mark a particular wave of ILC2 origin. In that study, the authors used a tamoxifen-based inducible Cre strategy in their experiments to precisely time the lineage tracing of a reporter from the Rosa26 locus. Those lineage tracing mice would overlap genetically with the RAG lineage tracing mice we used in our current study, thus performing combined timed migration fate mapping and RAG fate mapping experiments would require creating novel mouse strains.

      Similarly, the possible influence of the thymic or bone marrow environment on RAG expression in ILCs is an exciting possibility. Perhaps there are signals common to those environments that can influence all developing lymphocytes, including not only T and B cells but also ILCs, with one consequence being induction of RAG expression. While assessing levels of RAG-experienced ILCs in these tissues using our lineage tracing mouse may hint at these possibilities, conclusive evidence would require more precise control over the timing of RAG lineage tracing than our current reagents allow (e.g. to control for induction in those environments vs migration of previously fate-mapped cells to those environments).

      To answer these questions directly, we are developing orthogonal lineage tracing mouse strains, which can report on both timing of ILC development and RAG expression, but these mice are not available yet. Given the limitations of our currently available reagents, we were careful to focus our manuscript on the skin phenotype and the more descriptive aspects of the RAG-induced phenotype. We have elaborated on these important questions and referenced all the studies noted by the Reviewer in the Discussion section as areas of future inquiry on lines 421-433.  

      (2) On line 104 & Figures 1C/G etc. the authors describe that in the RAG knock-out ILC2 are relatively more abundant in the lineage negative fraction. On line 108 they further briefly mentioned that this observation is an indication of enhanced ILC2 expansion. Since the study includes an extensive multi-omics analysis, could the authors discuss whether they have seen a correlation of RAG expression in ILC2 with regulation of genes associated with proliferation, which could explain this phenomenon?

      We thank the Reviewer for pointing out this opportunity to further correlate our functional and multiomic findings. To address this, we first looked deeper into our prior analyses and found that among the pathways enriched in GSEA analysis of differentially expressed genes (DEGs) between RAG<sup>+</sup> and RAG<sup>-</sup> ILC2s, one of the pathways suppressed in RAG<sup>+</sup> ILC2s was “GOBP_EPITHELIAL_CELL_PROLIFERATION” (Author response image 1). There are a few other gene sets present in other databases such as MSigDB with terms including “proliferation,” but these are often highly specific to a particular cell type and experimental or disease condition (e.g. tissue-specific cancers). We did not find any of these enriched in our GSEA analysis.

      Author response image 1.

      GSEA plot of GOBP epithelial proliferation pathway in RAG-experienced vs RAG-naïve ILC2s.

      The ability to predict cellular proliferation states from transcriptomic data is an area of active research, and there does not appear to be any universally accepted method to do this reliably. We found two recent studies (PMIDs 34762642; 36201535) that identified novel “proliferation signatures.” Since these gene sets are not present in any curated database, we repeated our GSEA analysis using a customized database with the addition of these gene sets. However, we did not find enrichment of these sets in our RAG<sup>+</sup> vs RAG<sup>-</sup> ILC2 DEG list. We also applied our GPL strategy integrating analysis of our epigenomic data to the proliferation signature genes, but we did not see any clear trend. Conversely, our GSEA analysis did not identify any enrichment for apoptotic signatures as a potential mechanism by which RAG may suppress ILC2s.
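      For readers unfamiliar with how a custom gene set is scored against a ranked DEG list, the following is a minimal sketch of the classic unweighted running-sum enrichment score. It is an illustration only and not the analysis code used for the manuscript: standard GSEA tools weight each hit by its ranking metric and estimate significance by permutation, both of which are omitted here. The function name and the placeholder gene labels are assumptions made for this example.

```python
from typing import Iterable, Sequence


def enrichment_score(ranked_genes: Sequence[str], gene_set: Iterable[str]) -> float:
    """Unweighted, Kolmogorov-Smirnov-like enrichment score.

    ranked_genes: unique gene labels ordered from most up- to most down-regulated.
    gene_set: labels of the (hypothetical) signature being tested.
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    hit_step = 1.0 / n_hits    # increment when a ranked gene is in the set
    miss_step = 1.0 / n_miss   # decrement when it is not

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder labels, not real genes: a set clustered at the top of the
# ranking scores positive, one clustered at the bottom scores negative.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranking, {"g1", "g2"})
bottom_score = enrichment_score(ranking, {"g5", "g6"})
```

      A score near zero for a proliferation signature would correspond to the absence of enrichment described above, although a real analysis would also need a permutation-based significance estimate.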

      Notwithstanding the limitations of inferring ILC2 proliferation states from transcriptomic and epigenomic data, our experimental data suggest RAG exerts a suppressive effect on ILC2 proliferation. To formally test the hypothesis that RAG suppresses proliferation in the most rigorous way, we feel new mouse strains are needed that allow simultaneous RAG fate mapping and temporally restricted fate mapping. We elaborate on this in new additions to the discussion on lines 421-433.

      Reviewer #2 (Public Review):

      Summary:

      The study by Ver Heul et al., investigates the consequences of RAG expression for type 2 innate lymphoid cell (ILC2) function. RAG expression is essential for the generation of the receptors expressed by B and T cells and their subsequent development. Innate lymphocytes, which arise from the same initial progenitor populations, are in part defined by their ability to develop in the absence of RAG expression. However, it has been described in multiple studies that a significant proportion of innate lymphocytes show a history of Rag expression. In compelling studies several years ago, members of this research team revealed that early Rag expression during the development of Natural Killer cells (Karo et al., Cell 2014), the first described innate lymphocyte, had functional consequences.

      Here, the authors revisit this topic, a worthwhile endeavour given the broad history of Rag expression within all ILCs and the common use of RAG-deficient mice to specifically assess ILC function. Focusing on ILC2s and utilising state-of-the-art approaches, the authors sought to understand whether early expression of Rag during ILC2 development had consequences for activity, fitness, or function. Having identified cell-intrinsic effects in vivo, the authors investigated the causes of this, identifying epigenetic changes associated with the accessibility of genes associated with core ILC2 functions.

      The manuscript is well written and does an excellent job of supporting the reader through reasonably complex transcriptional and epigenetic analyses, with considerate use of explanatory diagrams. Overall I think that the conclusions are fair, the topic is thought-provoking, and the research is likely of broad immunological interest. I think that the extent of functional data and mechanistic insight is appropriate.

      Strengths:

      - The logical and stepwise use of mouse models to first demonstrate the impact on ILC2 function in vivo and a cell-intrinsic role. Initial analyses show enhanced cytokine production by ILC2 from RAG-deficient mice. Then through two different chimeric mice (including BM chimeras), the authors convincingly show this is cell intrinsic and not simply as a result of lymphopenia. This is important given other studies implicating enhanced ILC function in RAG-/- mice reflect altered competition for resources (e.g. cytokines).

      - Use of Rag expression fate mapping to support analyses of how cells were impacted - this enables a robust platform supporting subsequent analyses of the consequences of Rag expression for ILC2.

      - Use of snRNA-seq supports gene expression and chromatin accessibility studies - these reveal clear differences in the data sets consistent with altered ILC2 function.

      - Convincing evidence of epigenetic changes associated with loci strongly linked to ILC2 function. This forms a detailed analysis that potentially helps explain some of the altered ILC2 functions observed in ex vivo stimulation assays.

      - Provision of a wealth of expression data and bioinformatics analyses that can serve as valuable resources to the field.

      We appreciate the strengths noted by the Reviewer for our study. We would like to especially highlight the last point about our single cell dataset and provision of supplemental data tables. Although our study is focused on AD-like skin disease and skin draining lymph nodes, we hope that our findings can serve as a valuable resource for future investigation into mechanisms of RAG modulation of ILC2s in other tissues and disease states.  

      Weaknesses:

      - Lack of insight into precisely how early RAG expression mediates its effects, although I think this is beyond the scale of this current manuscript. Really this is the fundamental next question from the data provided here.

      We thank the Reviewer for their recognition of the context of our current work and its future implications. We aimed to present compelling new observations within the scope of what our current data can substantiate. We believe answering the next fundamental question of the mechanisms by which RAG mediates its effects in ILC2s will require development of novel reagents. We are actively pursuing this, and we look forward to others building on our findings as well.

      - The epigenetic analyses provide evidence of differences in the state of chromatin, but there is no data on what may be interacting or binding at these sites, impeding understanding of what this means mechanistically.

      We thank the Reviewer for pointing out this aspect of the epigenomic data analysis and the opportunity to expand the scope of our manuscript. We performed additional analyses of our data to identify DNA binding motifs and infer potential transcription factors that may be driving the effects of a history of RAG expression that we observed. We hope that these additional data, analyses, and interpretation add meaningful insight for our readers.

      We first performed the analysis for the entire dataset and validated that the analysis yielded results consistent with prior studies (e.g. finding EOMES binding motifs as a marker in NK cells). Then, we examined the differences in RAG fate-mapped ILC2s. These analyses are in new Figure S10 and discussed on lines 277-316.  

      We also performed an analysis specifically on the Th2 locus, given the effects of RAG on type 2 cytokine expression. These analyses are in new Figure S12 and discussed on lines 366-378.

      - Focus on ILC2 from skin-draining lymph nodes rather than the principal site of ILC2 activity itself (the skin). This may well reflect the ease at which cells can be isolated from different tissues.

      We appreciate the Reviewer’s insight into the limitations of our study. Difficulties in isolating ILC2s from the skin were indeed a constraint in our study. In particular, we were unable to isolate enough ILC2s from the skin for stimulation and cytokine staining. Given that one of our main hypotheses was that RAG affects ILC2 function, we focused our studies on skin draining lymph nodes, which allowed measurement of the two main ILC2 functional cytokines, IL-5 and IL-13, as readouts in the key steady state and AD-like disease experiments.

      - Comparison with ILC2 from other sites would have helped to substantiate findings and compensate for the reliance on data on ILC2 from skin-draining lymph nodes, which are not usually assessed amongst ILC2 populations.

      We agree with the Reviewer that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J).

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly this additional data should be included in the manuscript, we are happy to add it. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused to where we have strong molecular, cellular, and phenotypic outcomes.

      Author response image 2.

      Comparison of immune reconstitution and ILC2 donor proportions in different tissues from BM chimeras. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2,CD90.2) and WT (CD45.2, CD90.1) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. The proportion of live cells that are donor-derived (CD45.2), host-derived (CD45.1), or parenchymal (CD45-) [above] and proportion of ILC2s that are from Rag1<sup>-/-</sup> (CD90.2) or WT (CD90.1) donors [below] for A,B) skin C,D) sdLN E,F) lung G,H) spleen and I,J) mLN.

      - The studies of how ILC2 are impacted are a little limited, focused exclusively on IL-13 and IL-5 cytokine expression.

      We agree with the reviewer that our functional readout on IL-5 and IL-13 is relatively narrow. However, this focused experimental design was based on several considerations. First, IL-5 and IL-13 are widely recognized as major ILC2 effector molecules (Vivier et al, 2018, PMID 30142344). Second, in the MC903 model of AD-like disease, we have previously shown a clear correlation between ILC2s, levels of IL-5 and IL-13, and disease severity as measured by ear thickness (Kim et al, 2013, PMID 23363980). Depletion of ILC2s led to decreased levels of IL-13 and IL-5 and correspondingly reduced ear inflammation. However, while ILC2s are also recognized to produce other effector molecules such as IL-9 and Amphiregulin, which are likely involved in human atopic dermatitis (Namkung et al, 2011, PMID 21371865; Rojahn et al, 2020, PMID 32344053), there is currently no evidence linking these effectors to disease severity in the MC903 model. Third, IL-13 is emerging as a key cytokine driving atopic dermatitis in humans (Tsoi et al, 2019, PMID 30641038). Drugs targeting the IL-4/IL-13 receptor (dupilumab), or IL-13 itself (tralokinumab, lebrikizumab), have shown clear efficacy in treating atopic dermatitis. Interestingly, drugs targeting more upstream molecules, like TSLP (tezepelumab) or IL-33 (etokimab), have failed in atopic dermatitis. Taken together, these findings from both mouse and human studies suggest IL-13 is a critical therapeutic target, and thus functional readout, in determining the clinical implications of type 2 immune activation in atopic dermatitis.

      Aside from effector molecules, other readouts such as surface receptors may be of interest in understanding the mechanism of how RAG influences ILC2 function. For example, IL-18 has been shown to be an important co-stimulatory molecule along with TSLP in driving production of IL-13 by cutaneous ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). Our multiomic analysis showed decreased IL-18 receptor regulome activity in RAG-experienced ILC2s, which may be a mechanism by which RAG suppresses IL-13 production. Ultimately, in that study the role of IL-18 in enhancing MC903-induced inflammation through ILC2s was via increased production of IL-13, which was one of our major functional readouts. To clearly define mechanisms like these will require generation of new mice to interrogate RAG status in the context of tissue-specific knockout of other genes, such as the IL-18 receptor. We plan to perform these types of experiments in follow up studies. Notwithstanding this, we have now included additional discussion on lines 476-508 to highlight why understanding how RAG impacts other regulatory and effector pathways would be an interesting area of future inquiry.

      Reviewer #3 (Public Review):

      In this study, Ver Heul et al. investigate the role of RAG expression in ILC2 functions. While RAG genes are not required for the development of ILCs, previous studies have reported a history of expression in these cells. The authors aim to determine the potential consequences of this expression in mature cells. They demonstrate that ILC2s from RAG1 or RAG2 deficient mice exhibit increased expression of IL-5 and IL-13 and suggest that these cells are expanded in the absence of RAG expression. However, it is unclear whether this effect is due to a direct impact of RAG genes or a consequence of the lack of T and B cells in this condition. This ambiguity represents a key issue with this study: distinguishing the direct effects of RAG genes from the indirect consequences of a lymphopenic environment.

      The authors focus their study on ILC2s found in the skin-draining lymph nodes, omitting analysis of tissues where ILC2s are more enriched, such as the gut, lungs, and fat tissue. This approach is surprising given the goal of evaluating the role of RAG genes in ILC2s across different tissues. The study shows that ILC2s derived from RAG-/- mice are more activated than those from WT mice, and RAG-deficient mice show increased inflammation in an atopic dermatitis (AD)-like disease model. The authors use an elegant model to distinguish ILC2s with a history of RAG expression from those that never expressed RAG genes. However, this model is currently limited to transcriptional and epigenomic analyses, which suggest that RAG genes suppress the type 2 regulome at the Th2 locus in ILC2s.

      We agree with the Reviewer that understanding the role of RAG in ILC2s across different tissues is an important goal. One of the primary inspirations for our paper was the clinical paradox that patients with Omenn syndrome, despite having profound adaptive T cell deficiency, develop AD with much greater penetrance than in the general population. Thus, there was always an appreciation for the likelihood that skin ILC2s have a unique proclivity towards the development of AD-like disease. Notwithstanding this, given the profound differences that can be found in ILC2s based on their tissue residence and disease state (as the Reviewer also points out below), we focused our investigations on characterizing the skin draining lymph nodes to better define factors underlying our initial observations of enhanced AD-like disease in Rag1<sup>-/-</sup> mice. While our findings in skin provoke the hypothesis that similar effects may be observed in other tissues and influence corresponding disease states, we were cautious not to suggest this may be the case by reporting surveys of other tissues without development of additional disease models to formally test these hypotheses. We present this manuscript now as a short, skin-focused study, rather than delaying publication to expand its scope. Truthfully, this project started in 2015 and has undergone many delays with the hopes of newer technologies and reagents coming to add greater clarity. We hope our study will enable others to pursue the goal of understanding the broader effects of RAG in ILC2s, and potentially other innate lymphoid lineages as well.

      We did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J). However, given the lack of correlation to disease readouts in other organ systems, we chose not to include these data in our manuscript. If the Reviewer feels these data should be included, we would be happy to include them as a supplemental figure.

      The authors report a higher frequency of ILC2s in RAG-/- mice in skin-draining lymph nodes, which is expected as these mice lack T and B cells, leading to ILC expansion. Previous studies have reported hyper-activation of ILCs in RAG-deficient mice, suggesting that this is not necessarily an intrinsic phenomenon. For example, RAG-/- mice exhibit hyperphosphorylation of STAT3 in the gut, leading to hyperactivation of ILC3s. This study does not currently provide conclusive evidence of an intrinsic role of RAG genes in the hyperactivation of ILC2s. The splenocyte chimera model is artificial and does not reflect a normal environment in tissues other than the spleen. Similarly, the mixed BM model does not demonstrate an intrinsic role of RAG genes, as RAG1-/- BM cells cannot contribute to the B and T cell pool, leading to an expected expansion of ILC2s. As the data are currently presented it is expected that a proportion of IL-5-producing cells will come from the RAG1-/- BM.

      The Reviewer raises an important point about the potential cell-intrinsic roles of RAG vs the many cell-extrinsic explanations that could affect ILC2 populations, with the most striking being the lack of T and B cells in RAG knockout mice. It is well-established that splenocyte transfer into T and B cell-deficient mice reconstitutes T cell-mediated effects (such as the T cell transfer colitis model pioneered by Powrie and others), and we were careful in our interpretation of the splenocyte chimera experiment to conclude only that lack of Tregs was unlikely to explain the enhanced AD-like disease in T (and B) cell-deficient mice.

      We agree with the Reviewer that the Rag1<sup>-/-</sup> BM will not contribute to the B and T cell pool. However, BM from the WT mice would be expected to contribute to development of the adaptive lymphocyte pool. Indeed, we found that most of the CD45<sup>+</sup> immune cells in the spleens of BM chimera mice were donor-derived (Author response image 3A), and total levels of B cells and T cells showed reconstitution in a pattern similar to control spleens from donor WT mice, while spleens from donor Rag1<sup>-/-</sup> mice expectedly had essentially no detectable adaptive lymphocytes (Author response image 3B-D). From this, we concluded the BM chimera experiment was successful in establishing an immune environment with the presence of adaptive lymphocytes, and the differences in ILC2 proportions we observed were in the context of developing alongside a normal number of B and T lymphocytes. Notwithstanding the potential role of the adaptive lymphocyte compartment in shaping ILC2 development, since we transplanted equal amounts of WT and Rag1<sup>-/-</sup> BM into the same recipient environment, we are not able to explain how cell-extrinsic effects alone would account for the unequal numbers of WT vs Rag1<sup>-/-</sup> ILC2s we observed after immune reconstitution.

      Author response image 3.

      Comparison of immune reconstitution in BM chimeras to controls. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2) and WT (CD45.2) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. A) Number of WT recipient CD45.1+ immune cells in the spleens of recipient mice compared to number of donor CD45.2+ cells (WT and Rag1<sup>-/-</sup>) normalized to 100,000 live cells. Comparison of numbers of B cells, CD4+ T cells, and CD8+ T cells in spleens of B) BM chimera mice, C) control WT mice and D) control Rag1<sup>-/-</sup> mice.

      We also subsequently found transcriptional and epigenomic differences in RAG-experienced ILC2s compared to RAG-naïve ILC2s. Critically, these differences were present in ILC2s from the same mice that had developed normally within an intact immune system, rather than in the setting of a BM transplant or a defective immune background such as in Rag1<sup>-/-</sup> mice.

      We recognize that there are almost certainly cell-extrinsic factors affecting ILC2s in Rag1<sup>-/-</sup> mice due to lack of B and T cells, and that BM chimeras are not perfect substitutes for simulating normal hematopoietic development. However, the presence of cell-extrinsic effects does not negate the potential contribution of cell-intrinsic factors as well, and we respectfully stand by our conclusion that our data support a role, however significant, for cell-intrinsic effects of RAG in ILC2s.

      Finally, the Reviewer mentions the interesting observation that gut ILC3s exhibit hyperphosphorylation of STAT3 in Rag1<sup>-/-</sup> mice compared to WT as an example of cell-extrinsic effects of RAG deficiency (we assume this is in reference to Mao et al, 2018, PMID 29364878 and subsequent work). We now reference this paper and have included additional discussion on how our observations of ILC2s may be generalizable to not only other organ systems, but also other ILC subsets, limitations on these generalizations, and future directions on lines 477-520.

      Overall, the level of analysis could be improved. Total cell numbers are not presented, the response of other immune cells to IL-5 and IL-13 (except the eosinophils in the splenocyte chimera mice) is not analyzed, and the analysis is limited to skin-draining lymph nodes.

      We thank the Reviewer for the suggestions to add rigor to our analysis. ILC2 populations are relatively rare, and we designed our experiments to assess frequencies, rather than absolute numbers. We did not utilize counting beads, so our counts may not be comparable between samples. We have added additional data for absolute cell counts normalized to 100,000 live cells for each experiment (see below for a summary of new panels in each figure). Our new data on total cell numbers are consistent with the initial observations regarding frequency of ILC2s we reported from our experiments. For the BM chimera experiments, we presented the proportions of ILC2s, and IL-5 and IL-13 positive ILC2s, by donor source, as this is the critical question of the experiment. Notwithstanding our analysis by proportion, we found that the frequency of Rag1<sup>-/-</sup> ILC2s, IL-5<sup>+</sup> cells, or IL-13<sup>+</sup> cells within the Lin- population was also significantly increased. While our initial submission included only the proportions for clarity and simplicity, we now include frequency and absolute numbers in new panels for more critical appraisal of our data by readers.

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #2). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.  
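The two quantities discussed above, counts normalized to 100,000 live cells and the proportion of a population contributed by each donor, reduce to simple arithmetic. The following is a minimal sketch for illustration only; the function names and example numbers are hypothetical and are not taken from our analysis pipeline.

```python
def per_100k_live(cell_count: int, live_cells: int) -> float:
    """Scale a gated event count to a nominal 100,000 live cells."""
    if live_cells <= 0:
        raise ValueError("live_cells must be positive")
    return cell_count / live_cells * 100_000


def donor_proportion(donor_a: int, donor_b: int) -> float:
    """Fraction of a mixed population that derives from donor A."""
    total = donor_a + donor_b
    if total == 0:
        raise ValueError("no cells from either donor")
    return donor_a / total


if __name__ == "__main__":
    # Hypothetical numbers: 50 ILC2 events among 250,000 live events.
    print(per_100k_live(50, 250_000))
    # Hypothetical numbers: 300 donor-A ILC2s and 100 donor-B ILC2s.
    print(donor_proportion(300, 100))
```

Because normalization of this kind depends on the live-cell gate of each sample and not on counting beads, normalized values are best compared within an experiment rather than between experiments.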

      In terms of the limited analysis of other tissues, our initial observation of enhanced AD-like disease in Rag1<sup>-/-</sup> compared to WT mice built on our prior work elucidating the role of ILC2s in the MC903 model of AD-like disease in mice and AD in humans (Kim et al, 2013, PMID 23363980). Consequently, we focused on the skin to further develop our understanding of the role of RAG1 in this model. As in our prior studies, technical limitations in obtaining sufficient numbers of ILC2s from the skin itself for ex vivo stimulation to assess effector cytokine levels required performing these experiments in the skin draining lymph nodes.

      We agree that IL-5 and IL-13 are major mediators of type 2 pathology and studying their effects on immune cells is an important area of inquiry, particularly since there are multiple drugs available or in development targeting these pathways. However, our goal was not to study what was happening downstream of increased cytokine production from ILC2s, but instead to understand what was different about RAG-deficient or RAG-naïve ILC2s themselves that drives their expansion and production of effector cytokines compared to RAG-sufficient or RAG-experienced ILC2s. By utilizing the same MC903 model in which we previously showed a critical role for ILC2s in driving IL-5 and IL-13 production and subsequent inflammation in the skin, we were able to instead focus on defining the cell-intrinsic aspects of RAG function in ILC2s.

      The authors have a promising model in which they can track ILC2s that have expressed RAG or not. They need to perform a comprehensive characterization of ILC2s in these mice, which develop in a normal environment with T and B cells. Approximately 50% of the ILC2s have a history of RAG expression. It would be valuable to know whether these cells differ from ILC2s that never expressed RAG, in terms of proliferation and expression of IL-5 and IL-13. These analyses should be conducted in different tissues, as ILC2s adapt their phenotype and transcriptional landscape to their environment. Additionally, the authors should perform their AD-like disease model in these mice.

      We agree with the Reviewer (and a similar comment from Reviewer #2) that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J). We omitted these analyses to maintain the focus on the skin, but we will be happy to add these data to the manuscript if the Reviewer feels this figure would be helpful.

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly that these additional data should be included in the manuscript, we are happy to add them. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused on areas where we have strong molecular, cellular, and phenotypic outcomes. We elaborate on the implications of our work for future studies, including limitations of our study and currently available reagents and need for new mouse strains to rigorously answer these questions, on lines 476-508.

      The authors provide a valuable dataset of single-nuclei RNA sequencing (snRNA-seq) and ATAC sequencing (snATAC-seq) from RAGexp (RAG fate map-positive) and RAGnaïve (RAG fate map-negative) ILC2s. This elegant approach demonstrates that ILC2s with a history of RAG expression are epigenomically suppressed. However, key genes such as IL-5 and IL-13 do not appear to be differentially regulated between RAGexp and RAGnaïve ILC2s according to Table S5. Although the authors show that the regulome activity of IL-5 and IL-13 is decreased in RAGexp ILC2s, how do the authors explain that these genes are not differentially expressed between the RAGexp and RAGnaïve ILC2? I think that it is important to validate this in vivo.

      We thank the Reviewer for highlighting the value and possible elegance of our data. The Reviewer brings up an important issue that we grappled with in this study and that highlights a major technical limitation of single cell sequencing studies. Genes for secreted factors such as cytokines are often transcribed at low levels and are poorly detected in transcriptomic studies. This is particularly true in single cell studies with lower sequencing depth. Various efforts have been made to overcome these issues such as computational approaches to estimate missing data (e.g. van Dijk et al, 2018, PMID 29961576; Huang et al, 2018, PMID 29941873), or recent use of cytokine reporter mice and dial-out PCR to enhance key cytokine signals in sequenced ILCs (Bielecki et al, 2021, PMID 33536623). We did not utilize computational methods to avoid the risk of introducing artifacts into the data, and we did not perform our study in cytokine reporter mice. Thus, cytokines were poorly detected in our transcriptomic data, as evidenced by lack of identification of cytokines as markers for specific clusters (e.g. IL-5 for ILC2s) or significant differential expression between RAG-naïve and RAG-experienced ILC2s.

      However, the multiomic features of our data allowed a synergistic analysis to identify effects on cytokines. For example, transcripts for IL-4 and IL-5 were not detected at a high enough level to qualify as marker genes of the ILC2 cluster in the gene expression (GEX) assay but were identified as markers for the ILC2 cluster in the ATAC-seq data in the differentially accessible chromatin (DA) assay. Using the combined RNA-seq and ATAC-seq gene to peak links (GPL) analyses, many GPLs were identified in the Th2 locus for ILC2s, including for IL-13, which was not identified as a marker for ILC2s by any of the assays alone. Thus, our combined analysis took advantage of the potential of multiomic datasets to overcome a general weakness inherent to most scRNAseq datasets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Line 168; Reference 23 also showed expression in the NK cells, please add this reference to reference 24.

      We thank the reviewer for catching this oversight, and we have corrected it in the revised manuscript.

      - Please add the full names for GPL and sdLN in the text of the manuscript when first using these abbreviations. They are now only explained in the legends.

      We reviewed the manuscript text and found that we defined sdLNs for the first time on line 104. We defined GPLs for the first time on line 248. We believe these definitions are placed appropriately near the first references to the corresponding figures/analysis, but if the Reviewer believes we should move these definitions earlier, we are happy to do so.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest that the following reanalyses would improve the clarity of the data:

      - Can ILC2 numbers, rather than frequency, be used (e.g. in Figure 1C, S2B, and so on). This would substantiate the data that currently relies on percentages.

      This was a weakness also noted by Reviewer #3. We have added data on ILC2 numbers for each experiment as outlined below:

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #2). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.  

      - Can the authors provide data on IL-33R expression on sdLN ILC2s? Expression of ST-2 (IL-33R) does vary between ILC2 populations and is impacted by the digestion of tissue. All of the data provided here requires ILC2 to be IL-33R<sup>+</sup>. In the control samples, the ILC2 compartment is very scarce - in LNs, ILC2s are rare. The gating strategy with limited resolution of positive and negative cells in the lineage gate doesn't help this analysis.

      The Reviewer raises a valid point regarding the IL-33R marker and ILC2s. We designed our initial experiments to be consistent with our earlier observations of skin ILC2s, which were defined as CD45<sup>+</sup>Lin-CD90+CD25+IL-33R+, and the scarcity of skin draining lymph node ILC2s at steady state was consistent with our prior findings (Kim et al, 2013, PMID 23363980). We can include MFI data on IL-33R expression in these cells if the reviewer feels strongly that this would add to the manuscript, but we did not include other ILC2-specific markers in these experiments that would give us an alternative total ILC2 count to calculate frequency of IL-33R<sup>+</sup> ILC2s, which would also make the context of the IL-33R MFI difficult to interpret.

      Other studies defining tissue-specific expression patterns in ILC2s have called into question whether IL-33R is a reliable marker to define skin ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). However, there is evidence for region-specific expression of IL-33R (Kobayashi et al, 2019, PMID 30712873), with ILC2s in the subcutis expressing high levels of IL-33R and both IL-5 and IL-13, while ILC2s in the epidermis and dermis have low levels of IL-33R and IL-5 expression. In contrast to the Kobayashi et al study, Ricardo-Gonzalez et al sequenced ILC2s from whole skin, thus the region-specific expression patterns were not preserved, and the lower expression of IL-33R in the epidermis and dermis may have diluted the signal from the ILC2s in the subcutis. These may also be the ILC2s most likely to drain into the lymph nodes, which is the tissue on which we focused our analyses (consistent with our prior work in Kim et al, 2013).

      - In Figure 2 (related to 2H, 2I) can flow plots of the IL-5 versus IL-13 gated on either CD90.1+CD45.2+ or CD90.2+CD45.2+ ILC2 be shown? I.e. gate on the ILC2s and show cytokine expression, rather than the proportion of donor IL5/13. The proportion of donor ILC2 is shown to be significantly higher in 2G. Therefore gating on the cells of interest and showing on a cellular basis their ability to produce the cytokines would better make the point I think.

      We agree that this is important additional data to include. We have added flow plots of sdLN ILC2s from the BM chimera divided by donor genotype showing IL-5 and IL-13 expression in New Figure S4H.

      I assume the authors have looked and there is no obvious data, but does analysis of transcription factor consensus binding sequences in the open chromatin provide any new insight?

      The Reviewer also commented on this in the public review. As copied from our response above:

      We found that the most enriched sites in the ILC2 gene loci contained the consensus sequence GGGCGG (or its reverse complement), a motif recognized by a variety of zinc finger transcription factors (TFs). Our analyses predicted the KLF family of zinc finger TFs as most likely to be enriched at the identified open chromatin regions. To infer which KLFs might be occupying these sites in the RAG-experienced or RAG-naïve cells, we also assessed the expression levels of these identified TFs. Interestingly, KLF2 and KLF6 are more highly expressed in RAG-experienced ILC2s. KLF6 is a tumor suppressor (PMID: 11752579), and both KLF6 and KLF2 were recently shown to be markers of “quiescent-like” ILCs (PMID: 33536623). Further, upon analysis of the Th2 locus, the (A/T)GATA(A/G) consensus site (or reverse complement) was enriched in identified open chromatin at that locus. The algorithm predicted multiple TFs from the GATA family as possible binding partners, but expression analysis showed only GATA3 was highly expressed in ILC2s, consistent with what would be predicted from prior studies (PMID: 9160750).

      We have added this data in new Figure S10 and new Figure S12, with corresponding text in the Results section on lines 277-316 and lines 366-378.
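To make the phrase "consensus sequence (or its reverse complement)" concrete, the sketch below scans a DNA string for the GGGCGG and (A/T)GATA(A/G) sites on both strands. It is an illustration only, not the motif enrichment algorithm used in our analysis; the function names and the example sequence are hypothetical, and the regular expressions are simply direct transcriptions of the two consensus sites named above.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(sequence: str, forward: str, reverse: str):
    """List (position, strand, matched text) for a motif on both strands.

    `forward` and `reverse` are regular expressions for the site as read
    on the plus strand and for its reverse complement, respectively.
    A lookahead is used so that overlapping sites are all reported.
    """
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("+", forward), ("-", reverse)):
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


# GC-box-like site and its reverse complement.
GC_BOX = ("GGGCGG", reverse_complement("GGGCGG"))
# (A/T)GATA(A/G) and its reverse complement, (C/T)TATC(A/T).
GATA_SITE = ("[AT]GATA[AG]", "[CT]TATC[AT]")

if __name__ == "__main__":
    example = "TTGGGCGGAACCGCCCTTAGATAGCC"  # made-up sequence
    print(find_sites(example, *GC_BOX))
    print(find_sites(example, *GATA_SITE))
```

A scan like this only locates candidate sites. Deciding which factor is likely to occupy them still requires the expression data described above.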

      In terms of phrasing and presentation:

      - It would help to provide some explanation of why all analyses focus on the draining LNs rather than the actual site of inflammation (the ear skin). I do not think it appropriate to ask for data on this as this would require extensive further experimentation, but there should be some discussion on this topic. This feels relevant given that the skin is the site of inflammatory insult and ILC2 is present here. How the ILC2 compartment in the skin-draining lymph nodes relates to those in the skin is not completely clear, particularly given the prevailing dogma that ILC2 are tissue-resident.

      Given the limitations of assessing cytokine production in the relatively rare population of skin-resident ILC2s, we focused on the skin-draining lymph nodes (sdLN). Our findings in the current manuscript are consistent with our prior work in Kim et al, 2013 (PMID 23363980), and more recently in Tamari et al, 2024 (PMID 38134932), which demonstrated correlation of increased ILC2s in sdLN with increased skin inflammation in the MC903 model. Similarly, Dutton et al (PMID 31152090) have demonstrated expansion of the sdLN ILC2 pool in response to MC903-induced AD-like inflammation in mice. We elaborate on the implications of our work for future studies, including limitations of our study (including the focus on the sdLN), and currently available reagents and need for new mouse strains to rigorously answer these questions, on lines 476-508.

      - I think the authors should explicitly state that cytokine production is assessed after ex vivo restimulation (e.g. Lines 112-113).

      We have added this statement to the revised text.

      - I also think that it would help to be consistent with axis scales where analyses are comparable (e.g. Figure 1D vs Figure 1H).

      We agree with the Reviewer and we have adjusted the axes for consistency. The data remains unchanged, but axes are slightly adjusted in New Figure 1 (D&I, E&J, F&K) and New Figure S2 (C-E match New Figure 1 D-F). This same axis scaling scheme is carried forward to New Figure 2 (D-E) and New Figure S4 (G,K,L). New data on cell counts is also included per request by Reviewers 2 and 3 (see above). However, we found results for total cells, including ILC2s (New Figure 1C,H, New Figure S2B, New Figure 2C, New Figure S4F), were consistent within experiments, but not between experiments, likely representing issues with normalizing counts (we did not include counting beads for more accurate total counts). Thus, the y-axes in those panels are not consistent between experiments/figures.

      We feel reporting the proportion of WT vs Rag1<sup>-/-</sup> donor cells for the BM chimera is most illustrative of the effect of RAG and have kept it in the main New Figure 2, but for the BM chimera experiment panels we also include the total counts of IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s (New Figure S4I,J).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) Storyline and Narrative Flow:

      Consider revising the manuscript to create a more coherent and consistent narrative. Clarify how each section of the study-particularly the transition from multi-omics data integration to single-cell RNA-seq validation-contributes to the overall research question. This will help readers better understand the logical flow of the study.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      We have revised the text that connects the different sections of the Results and that states the objective and role of each analysis. These changes improve the coherence of the narrative and clarify what each analysis contributes. We believe this will help readers better understand the main content of the manuscript.

      (2) Immune Cell Activity Analysis:

      Reevaluate the methods used to assess immune cell activities within the context of the tumor microenvironment. Consider providing additional justification for the relevance of using the cancer cell model for this analysis. If necessary, explore alternative methods or models that might offer more meaningful insights into immune-tumor interactions.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      Using bulk RNA-seq data, we evaluated the tumor immune microenvironment through various methods to assess immune infiltration levels and responses to immunotherapy. We found that the results were largely consistent with those presented in the manuscript, providing strong support for our viewpoints. We also acknowledge the limitations of findings based on bioinformatics analysis. In our upcoming research, we plan to develop organoid models with the gene expression patterns of the CS1 and CS2 subtypes, and to use these models as a foundation for studying the tumor immune microenvironment.

      (3) Single-Cell RNA-Seq Validation:

      Expand the validation of your findings using single-cell RNA-seq data. This could include more in-depth analyses that explore the heterogeneity within the subtypes and confirm the robustness of your classification method at the single-cell level. This would strengthen the support for your claims about the relevance of the identified subtypes.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      In this manuscript, we employed the NTP algorithm to classify malignant cells identified by the CopyKAT algorithm using characteristic genes of the CS1 and CS2 subtypes. This approach is similar to the method we used previously to classify patients in the ICGC cohort with the same subtype genes. We consider this classification method valid.

      After classifying the malignant cells, we performed metabolic and cell communication analyses on the CS1 and CS2 subtype cells, revealing significant differences in the biological pathways enriched among differentially expressed genes, in metabolic levels, and in cell signaling patterns. These differences align with the variations observed in the prior classifications and analyses based on bulk RNA-seq data.

      We also acknowledge that validating the classification method solely with the single-cell dataset from this study is insufficient. We analyzed GSE202642 using the same processes and methods as GSE229772, finding that the results were generally consistent, indicating that our classification method exhibits a degree of robustness at the single-cell level.

      (4) Methodological Justification:

      Provide a more detailed rationale for the selection of machine learning algorithms and integration strategies used in the study. Explain why the chosen methods are particularly well-suited for this research, and discuss any potential limitations they might have.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      We have updated the methodology section to enhance readers' understanding of the fundamental principles involved. This analysis has two key features: first, it combines 10 machine learning algorithms to generate 101 models and ultimately selects the prognostic prediction model with the highest C-index from these 101 models; second, it utilizes the LOOCV method to analyze the training and validation sets. Compared to the conventional method of randomly dividing the training and validation sets by a fixed ratio, this approach significantly reduces the bias and randomness introduced by the splitting process. Therefore, we believe this analysis can leverage the characteristic genes of the CS1 and CS2 subtypes, combined with existing clinical data from public databases, to yield results that are more accurate and reliable than the prognostic models commonly used in previous literature, such as COX regression, Lasso regression, and other individual algorithms. While this analysis presents advantages over some previous modeling methods, it remains based on public databases, and the mathematical logic of the algorithms may obscure certain factors that are clinically relevant to patient prognosis.
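As an illustration of the selection criterion described above, the sketch below computes Harrell's concordance index (C-index) from survival times, event indicators, and predicted risk scores, picks the candidate model with the highest value, and enumerates leave-one-out splits. It is a simplified, assumption-laden example rather than the code used in our study: the function names, the model names, and the handling of tied survival times (they are skipped) are all choices made for this sketch.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair is comparable when the subject with the shorter time had an
    event. It is concordant when that subject also has the higher risk.
    Risk ties count as half; pairs with tied times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored: not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def best_model(candidates, times, events):
    """Return the name of the candidate whose risk scores maximize C-index."""
    return max(
        candidates,
        key=lambda name: concordance_index(times, events, candidates[name]),
    )


def leave_one_out_splits(n):
    """Yield (training indices, held-out index) for each of n samples."""
    for held_out in range(n):
        yield [k for k in range(n) if k != held_out], held_out


if __name__ == "__main__":
    times = [2, 4, 6, 8]        # made-up follow-up times
    events = [1, 1, 0, 1]       # 1 = event observed, 0 = censored
    candidates = {              # hypothetical model names and scores
        "model_a": [0.9, 0.7, 0.8, 0.1],
        "model_b": [0.1, 0.2, 0.3, 0.4],
    }
    print(best_model(candidates, times, events))
```

In a leave-one-out scheme every sample serves once as the held-out case, so no single random split determines which model is selected.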

      (5) Figures and Visualizations:

      Improve the clarity of your figures by addressing the following:

      a) Figure 3A: Cluster the pathways to make the comparisons clearer and more meaningful.

      b) Figure 4A: Clearly explain the significance of the blue bar.

      c) Figure 4B: Ensure this figure is discussed in the main text to justify its inclusion.

      d) Figure 7C: Enhance the figure legend to provide more informative details.

      Additionally, ensure that figure descriptions go beyond the captions and provide detailed explanations that help the reader understand the significance of each figure.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      Figure 3A: We clustered the samples based on CS1 and CS2 subtypes and displayed the immune-related cell scores of each sample as a heatmap.

      Figure 4A: The blue bars in the figure represent the average C-index of this algorithm combination in the training dataset TCGA and the validation dataset ICGC, which we have supplemented in the corresponding sections of the text.

      Figure 4B: We described this figure in the results section, which primarily aims to validate whether our prognostic prediction model can predict patient outcomes in the TCGA cohort. The results showed that after performing prognostic risk scoring on patients based on the prediction model and categorizing them into high-risk and low-risk groups, the two groups exhibited significant prognostic differences, with the high-risk group showing worse outcomes compared to the low-risk group. This indicates that our prognostic prediction model can effectively distinguish the prognostic risk differences among patients in the TCGA-LIHC cohort. We also discussed these findings in the discussion section.

      Figure 7C: We used both point color and size to visualize the levels of metabolic scores, resulting in two dimensions in the legend, which actually represent the same information. Therefore, we removed the results that used point size to indicate the levels of metabolic scores.

      (6) Supplementary Materials:

      Consider including more detailed supplementary materials that provide additional validation data, extended methodological descriptions, and any other information that would support the robustness of your findings.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      In the subsequent version of the record, we will upload the important results obtained during the research to GitHub, and in this revision, we have updated some figures that may better explain the results or the robustness of the findings as supplementary materials.

      (7) Recent Literature:

      a) Incorporate more recent studies in your discussion, especially those related to HCC subtypes and the application of machine learning in oncology. This will provide a more current context for your work and help position your findings within the broader field.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      We have reviewed several studies related to HCC subtype classification and the application of machine learning in this field. In the discussion section, we summarize the significance and limitations of these studies. Additionally, we discuss the characteristics of our study in comparison to previous research in this field.

      (8) Data and Code Availability:

      Ensure that all data, code, and materials used in your study are made available in line with eLife's policies. Provide clear links to repositories where readers can access the data and code used in your analyses.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      We have examined the relevant data, code, and materials. We confirm that we have indicated the sources of the data and tools used in the analysis within the manuscript. Moreover, these data and tools are accessible via the websites or references we have provided.

      Reviewer #2 (Recommendations for the authors):

      (1) While the computational findings are robust, further experimental validation of the two subtypes, particularly the role of the MIF signaling pathway, would strengthen the biological relevance of the findings. In vitro or in vivo validation could confirm the proposed mechanisms and their influence on patient prognosis.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      We intend to verify our findings in future studies using tumor cell line models and animal models. We aim to identify and intervene with key molecules in the MIF signaling pathway. We will investigate how the MIF signaling pathway affects tumor sensitivity to treatment in both cell line and animal models, along with the underlying mechanisms.

      (2) Consider testing the model on additional independent cohorts beyond the TCGA and ICGC datasets to further demonstrate its generalizability and applicability across different patient populations.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area, and we have made appropriate modifications:

      We analyzed the GSE14520 dataset from the GEO database, which comprises a cohort of 209 HCC patients and their corresponding RNA sequencing data. We validated the prognostic model obtained in this study using this cohort and found that the model effectively separates patients into high-risk and low-risk prognostic categories, with a significant prognostic difference between the two groups. This is consistent with the results we obtained previously.

      (3) Review the manuscript for long or complex sentences, which can be broken down into shorter, more readable parts.

      We have made revisions to the long and complex sentences in the manuscript without compromising its academic integrity and rationality, with the hope that this will help readers better understand the content of this study.

      During the revision process, in addition to addressing the reviewer comments, we conducted a thorough review of the analysis. In the course of this review, we identified a few errors in the data usage and have since corrected the relevant data and figures:

      Figure 4: Due to space constraints, we adjusted the composition of the figures after incorporating the validation results from the GSE14520 dataset.

      Figure 5A: We rechecked the regression coefficients included in the model, updated several more recent prognostic models, and calculated the C-index for 20 prognostic models in the TCGA and ICGC cohorts using a method consistent with previous studies.

      Figure 5C-D: We adjusted the clarity of the figures.

      Figure 8: We reclassified the selected malignant cells and updated the subtype results. Subsequently, based on the repeatedly confirmed typing results, we comprehensively updated the results of the subsequent cell communication network construction, ensuring that the entire analysis process remains consistent with previous findings. We also adjusted the composition of the figure and presented the images that could not be conveniently merged due to space constraints as Figure 9.
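
      For readers unfamiliar with the C-index mentioned for Figure 5A, the following is a minimal sketch of Harrell's concordance index. The function name and the toy inputs are hypothetical, and the response does not state which implementation was used, so this illustrates the general definition only and is not the authors' code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times: follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risk_scores: model output, higher means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

      A value of 1.0 means the model ranks every comparable pair correctly, and 0.5 corresponds to random ranking.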

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) Figure 2 is mentioned before Figure 1

      We thank the reviewer for pointing out this mistake. The reference to Figure 2 should have been to Figure 1. This has been corrected in the manuscript.

      (2) Figure 1c: red is used to indicate cell junctions on raw data, but also the error.

      The color red is used to indicate cell junctions on raw data on figure 1c left, while it is used to indicate the error on figure 1c right.

      The Lagrangian error can be negative right? This is not reflected by the error scale which goes from 0% to 100%

      A negative Lagrangian error would mean that the distance between real and simulated cellular junctions decreased over time. We effectively treat this case as if there was no displacement, and the error is hence 0%.

      Why do you measure the error in percent?

      The error is measured in percentages because it is relative to the apical length of a cell.
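
      The two answers above (negative values floored at 0%, error expressed relative to the apical length of a cell) can be sketched as follows. This is an assumed formulation for illustration only: the function and argument names are hypothetical, and the exact expression used in the manuscript is not given in this response.

```python
def lagrangian_error_percent(initial_gap, current_gap, apical_length):
    """Drift between a real and a simulated junction, as a percentage of
    the apical length of a cell (assumed formulation, not the authors' code).

    initial_gap: distance between the real and simulated junction at the start
    current_gap: the same distance at the current time
    apical_length: apical length of the cell, used as the reference scale
    """
    if apical_length <= 0:
        raise ValueError("apical_length must be positive")
    drift = current_gap - initial_gap
    # A shrinking gap (negative drift) is treated as no displacement: 0%.
    return max(0.0, 100.0 * drift / apical_length)
```

      With this convention, a gap that grows from 1 to 3 length units on a cell of apical length 10 gives an error of 20%, while a gap that shrinks gives 0%.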

      (3) Figure 2: The distinction between pink and red in e_2(t) is very difficult. What do the lines indicate?

      The lines indicate the directions of the eigenvectors of the strain rate tensor at every material particle of the embryo.

      (4) L156 "per unit length": Rather per unit time?

      We thank the reviewer for pointing this out. We apologize for this mistake. "per unit length" has been changed to "per unit time".

      (5) L159 "Eigen vectors in this sense": is there another sense?

      "In this sense" refers to the geometric description of eigenvectors. The phrase has been removed.

      (6) L164 "magnitude of the rate of change underwent by a particle at the surface of the embryo in the three orthogonal spatial directions of most significant rate of change."

      Would a decomposition in two directions within the surface's tangent plane and one perpendicular to it not be better?

      We also performed the decomposition of the strain rate tensor as suggested within the surface's tangent plane and one perpendicular to it, but did not notice any tangible differences in the overall analysis, especially after derivation of the scalar field.

      (7) L174 "morphological activity": I think this notion is never defined

      By morphological activity we mean any noticeable shape changes.

      (8) L177: I did not quite understand this part

      This part tries to convey that the scalar strain rate field evidences coordinated cell behaviors by highlighting wide regions of red that traverse cell boundaries (e.g. fig.2b, $t=5.48hpb$). At the same time, the strain rate field preserves cell boundaries, highlighted by bands of red at cellular intersections, when coordinated cell behaviors are not preponderant (e.g. fig.2b, $t=4hpb$).

      (9) Ll 194 "Unsurprisingly, these functions play an important role in many branches of science including quantum mechanics and geophysics Knaack and Stenflo (2005); Dahlen and Tromp (2021)." Does this really help in understanding spherical harmonics?

      This comment was made with the aim of showing to the reader that Spherical Harmonics have proved to be useful in other fields. Although it does not help in understanding spherical harmonics, it establishes that they can be effective.

      (10) Figure 3a: I do not find this panel particularly helpful. What does the color indicate? What are the prefactors of the spherical harmonics?

      This panel showcases the restriction of the strain rate scalar field to the spherical harmonics with the l and m specified. Each material particle of the embryo surface at the time  is colored with respect to the value of . The values are computed according to equation 2 and are showcased in figure 3c.

      (11) L 265: Please define "scalogram" as opposed to a spectrogram.

      Scalograms are the result of wavelet transforms applied to a signal. Although spectrogram can specifically refer to the spectrum of frequencies resulting for example from a Fourier transform, the term can also be used in a broader sense to designate any time-frequency representation. In the context of this paper, we used it interchangeably with scalogram. We have changed all occurrences of spectrogram to scalogram in the revised manuscript.

      (12) L 299 "the analysis was carried out the 64-cell stage.": Probably 'the analysis was carried out at the 64-cell stage'

      We thank the reviewer for pointing this out. The manuscript was revised to reflect the suggested change.

      (13) L 340 "Another outstanding advantage over traditional is": Something seems to be missing in this sentence.

      We thank the reviewer for pointing this out. We have modified the sentence in the revised manuscript. It now reads “Another outstanding advantage of our workflow over traditional methods is that our workflow is able to compress the story of the development ... ”.

      (14) Ll 357 "on the one hand, the overall spatial resolution of the raw data, on the other hand, the induced computational complexity.": Is there something missing in this sentence

      The sentence tries to convey the idea that in implementing our method, there is a compromise to be made between the number of particles on the constructed mesh and the computational complexity induced by this choice. There is also a compromise to be made between the number of particles and the spatial resolution of the original dataset.

      Reviewer 2:

      (1) The authors should clearly state to which data this method has been applied in this paper. Also, to what kind of data can this method be applied? For instance, should the embryo surface be segmented?

      The method has been applied to 3D+time imaging data of ascidian embryonic development hosted on the morphonet (morphonet.org) platform. The data on the morphonet platform come in two formats: closed surface meshes of segmented cells spatially organized into the embryo, and 3D voxelated images of the embryo. The method was first designed for the former format and then extended to the latter. There is no requirement for the embryo surface to be segmented.

      (2) In this paper, it is essential to understand the way that the authors introduced the Lagrangian markers on the surface of the embryo. However, understanding the method solely based on the description in the main text was difficult. I recommend providing a detailed explanation of the methodology including equations in the main text for clarity.

      We believe that adding mathematical details of the method into the text will cloud the text and make it more difficult to understand. Interested readers can refer to the supplementary material for detailed explanation of the method.

      (3) In eq.(1) of the supplementary information, d(x,S_2(t)) could be a distance function between S_1 and S_2 although it was not stated. How was the distance function between the surfaces defined?

      What was meant here was d(x,S_1(t)), where x is a point of S_2(t) and d(x,S_1(t)) is the distance between point x and S_1(t). The definition of the distance function has been clarified in the supplementary information.

      (4) In the section on the level set scheme of supplementary information, the derivation of eq.(4) from eq.(3) was not clear.

      We added an intermediary equation for clarification.

      (5) Why is a reference shape S_1(0) absent at t=0?

      A reference shape S_1(0) is absent at t=0 precisely because that is what we are trying to achieve: construct an evolving Lagrangian surface S_2(t) matching S_1(t) at all times.

      (6) In Figure 2(a), it is unclear what was plotted. What do the colors mean? A color bar should be provided.

      The caption of the figure describes the colors: “a) Heatmap of the eigenvector fields of the strain rate tensor. Each row represents a vector field distinguished by a distinct root color (\textit{yellow, pink, white}). The gradient from the root color to red represents increasing magnitudes of the strain rate tensor.”

      (7) With an appropriate transformation, it would be possible to create a 2D map from a 3D representation shown in for instance Figure 2. Such a 2D representation would be more tractable for looking at the overall activities.

      We thank the reviewer for pointing this out. In Figure 4b of the supplementary information, we provide a 2D projection of the scalar strain rate field.

      (8) The strain rate is a second-order tensor that contains rich information. In this paper, the information in the tensor has been compressed into a scalar field by taking the square root of the sum of the squares of the eigenvalues. However, such a representation may not distinguish important events such as stretching and compression of the tissue. The authors should provide appropriate arguments regarding the limitations of this analysis.

      The tensor form of the strain rate field is indeed endowed with more information than the scalar eigen value field derived. However, our objective in this project was not to exhaust the richness of the strain rate tensor field but rather to serve as a proof of concept that our global approach to studying morphogenesis could in fact unveil sufficiently rich information on the dynamical processes at play. Although not in the scope of this project, a more thorough exploration of the strain rate tensor field could be the object of future investigations.
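
      The limitation raised by the reviewer can be made concrete with a short sketch. For a symmetric tensor, the sum of the squared eigenvalues equals the sum of the squared entries, so the scalar described above (square root of the sum of the squared eigenvalues) can be computed without an explicit eigendecomposition. The function name and the toy tensors below are hypothetical and serve only as an illustration.

```python
import math

def scalar_strain_rate(tensor, tol=1e-12):
    """Square root of the sum of squared eigenvalues of a symmetric tensor.

    For a symmetric matrix this equals the Frobenius norm, because
    trace(A @ A) is both the sum of squared eigenvalues and the sum of
    squared entries.
    """
    n = len(tensor)
    for i in range(n):
        for j in range(n):
            if abs(tensor[i][j] - tensor[j][i]) > tol:
                raise ValueError("strain rate tensor must be symmetric")
    return math.sqrt(sum(v * v for row in tensor for v in row))

# Toy examples: uniaxial stretching and uniaxial compression of equal rate.
stretching = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
compression = [[-2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# Both tensors map to the same scalar, so the sign information is lost.
same_scalar = scalar_strain_rate(stretching) == scalar_strain_rate(compression)
```

      Because the eigenvalues are squared, stretching and compression of equal magnitude yield the same scalar value, which is the loss of information the reviewer refers to.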

      (9) The authors claimed that similarities emerge between the spatiotemporal distribution of morphogenesis processes in the previous works and the heatmaps in this work. Some concrete data should be provided to support this claim.

      All claims have been backed with references to previous works. For instance, looking at figure 2b, the two middle panels on the lower row (5.48hpf, 6.97hpf), we explained that the concentration of red refers respectively to endoderm invagination during gastrulation, and zippering during neurulation [we cited Hashimoto et al. (2015)]. Here, we relied on eye observation to spot the similarities. The rest of the paper provides substantial and robust additional support for these claims using spectral decomposition in space and time.

      (10) The authors also claimed that "A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant and those where coordinated cell behaviours dominate." The authors should provide specific examples and analysis to support this argument.

      Here, we relied on eye observation to make this claim. This whole section of the paper “Strain rate field describes ascidian morphogenesis” was about computing, plotting, and observing the strain rate field.

      However, specific examples were provided. This paragraph was building towards this statement, and the evidence was scattered through the paragraph. We have now revised the sentence to ensure that we highlight specific examples:

      “A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant (e.g. fig.2b, $t=4hpb$) and those where coordinated cell behaviours dominate (e.g. fig.2b, $t=5.48hpb$).”

      (11) The authors should provide the details of the analysis method used in Figure 3b, including relevant equations. In particular, it would be helpful to clarify the differences that cause the observed differences between Figure 3b and Figure 3c.

      Figure 3b was introduced with the sentence: “In analogy to Principal Components Analysis, we measure the average variance ratio over time of each harmonic with respect to the original signal (Fig.3b).” explaining the origin of variance ratio values used in figure 3b. We have now added the mathematical expression to further clarify.
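
      One plausible reading of this variance ratio is sketched below. It assumes that the harmonics are orthonormal, so that the share of the signal carried by each harmonic at a given time is its squared coefficient divided by the sum of all squared coefficients, and that these shares are then averaged over time. The function name and data layout are hypothetical, and the expression added to the manuscript may differ.

```python
def mean_variance_ratio(coeffs_over_time):
    """Time-averaged share of signal energy carried by each harmonic.

    coeffs_over_time: list with one dict per time point, mapping a
    (l, m) pair to the corresponding spherical harmonic coefficient.
    Assumes an orthonormal basis; time points with zero total energy
    are skipped.
    """
    totals = {}
    used = 0
    for coeffs in coeffs_over_time:
        energy = sum(abs(c) ** 2 for c in coeffs.values())
        if energy == 0:
            continue
        used += 1
        for key, c in coeffs.items():
            totals[key] = totals.get(key, 0.0) + abs(c) ** 2 / energy
    return {key: value / used for key, value in totals.items()}
```

      Under this reading, the ratios of all harmonics sum to 1 at each time point, which makes the measure comparable to the explained variance ratio in Principal Components Analysis.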

      (12) The authors found that the variance ratio of Y_00 was 64.4%. Y_00 is a sphere, indicating that most of the activity can be explained by a uniform activity. Which actual biological process explains this symmetrical activity?

      The reviewer makes a good point, which also gave us a lot to think about during the analysis. Observing that the contribution of Y00 peaks during synchronous divisions, which are interestingly restricted to the animal pole, we conjecture that localized morphological ripples can be felt throughout the embryo.

      (13) The contribution of other spherical harmonics than Y_00 and Y_10 should be shown.

      The other spherical harmonics each contributed less than 1%, and we did not find it important to include them in the main figure. We will add them as supplementary material.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite of what was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context to the study. We now provide more information on the life-history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants such as Crematogaster smithi have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), where the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      COMMENTS ON INTRODUCTION:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, placing the information on the branches would suggest that the “trait” evolved on this part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the none in the figure as suggested by the reviewer and discussed in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      COMMENTS ON MATERIALS AND METHODS:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would require removing eggs as soon as queens lay them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced in these colonies, and all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      COMMENTS ON RESULTS:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”.

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information to the main text stating that, while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2 B, C), there was no such development in trophic eggs (Fig. 2 E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.
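      One way to test whether eggs of a given type are laid in clusters rather than in random order is a permutation test on the number of runs in the laying sequence. The sketch below is illustrative only and is not necessarily the analysis used in the manuscript; the function names, the 'V'/'T' encoding of viable and trophic eggs, and the example sequences are all hypothetical.

```python
import random


def count_runs(seq):
    """Number of maximal blocks of identical consecutive items (0 for an empty sequence)."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def runs_permutation_pvalue(seq, n_perm=10000, seed=0):
    """One-sided permutation p-value for clustering.

    Shuffles the sequence n_perm times and counts how often the shuffled
    order has as few runs as (or fewer than) the observed order.
    Few runs means eggs of the same type tend to be laid together.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = count_runs(seq)
    items = list(seq)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(items)
        if count_runs(items) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical laying sequences: V = viable egg, T = trophic egg.
clustered = "VVVVVVTTTTTT"
alternating = "VTVTVTVTVTVT"
p_clustered = runs_permutation_pvalue(clustered)
p_alternating = runs_permutation_pvalue(alternating)
print(count_runs(clustered), p_clustered)
print(count_runs(alternating), p_alternating)
```

      A small p-value for a sequence indicates fewer runs than expected under random ordering, which is the pattern expected if queens produce batches of one egg type.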

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and easier to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egglaying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and we do not have good enough data to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2, 1, 2, and 1; see Supplementary Table 2). It is therefore likely that these colonies produced only gynes by chance.
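      As a rough illustration of why such small colonies carry little information, the sketch below computes the probability that every individual in a colony develops into a gyne if each does so independently with probability p. The value p = 0.5 is purely hypothetical and is not an estimate from the study; only the colony sizes (2, 1, 2, 1) come from the response above.

```python
def prob_all_gynes(n_individuals, p_gyne):
    """Probability that all n individuals are gynes, assuming independence."""
    return p_gyne ** n_individuals


colony_sizes = [2, 1, 2, 1]   # individuals produced in the four colonies
P_GYNE = 0.5                  # hypothetical per-individual probability (assumption)

probs = [prob_all_gynes(n, P_GYNE) for n in colony_sizes]
for n, pr in zip(colony_sizes, probs):
    print(f"colony with {n} individual(s): P(all gynes) = {pr:.2f}")
```

      Under this assumed value, an all-gyne outcome would be expected in a quarter to a half of colonies of this size, so observing it in a few colonies would not be surprising.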

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did this because there are many different miRNA species, as reflected by the presence of multiple size peaks rather than a single one. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      COMMENTS ON DISCUSSION:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We have suggested experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      We thank the editors and the reviewers for their valuable comments. In response to these suggestions, we will add rigorous statistical measures and extend the experimental support of our findings in a revised version. Indeed, as we will show, doing so strengthens all the main claims. Specifically:

      Concerning Reviewer 1:

      - It is important to emphasise that the advantage of deriving the shape measures q<sub>p</sub> from Minkowski tensors is their robustness and stability, which are well established from extensive, rigorous mathematical analyses. Introducing q<sub>p</sub> without this connection to revised Minkowski tensors would not allow us to claim this stability property for the considered measures.

      - Even though the vertex positions of a polygon contain the whole geometric information, using q<sub>p</sub> and γ<sub>p</sub> leads to different results; see Fig. 6 for an example.

      - We wholeheartedly agree that our statement on independence of values of q<sub>2</sub> and q<sub>6</sub> can be extended and more quantitatively established by rigorous statistical measures. This is exactly what we will do in the revised version, not only providing statistical measures on the presented data, but also extending our analyses to the published data from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show, these analyses further strengthen this claim, unequivocally establishing the independence of q<sub>2</sub> and q<sub>6</sub> in two different models (active vertex model and multiphase-field model), as well as two different sets of experiments (the ones in the original manuscript, and the published one from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779).
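      The point that q<sub>p</sub> and γ<sub>p</sub> can give different values for the same polygon can be illustrated with a minimal sketch. The definitions used here are assumptions, since the response does not state them: q<sub>p</sub> is taken as the edge-length-weighted sum of exp(i p θ) over outer-normal angles θ, normalised by the perimeter, and γ<sub>p</sub> as the sum of |r|<sup>p</sup> exp(i p φ) over vertex positions relative to the vertex centroid, normalised by the sum of |r|<sup>p</sup>. The function names and test polygons are hypothetical.

```python
import cmath
import math


def minkowski_q(vertices, p):
    """Assumed q_p: |sum_k L_k exp(i p theta_k)| / sum_k L_k over polygon edges.

    L_k is the edge length and theta_k the angle of the outer normal,
    for vertices listed counter-clockwise.
    """
    n = len(vertices)
    psi = 0j
    perimeter = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        ex, ey = x1 - x0, y1 - y0
        length = math.hypot(ex, ey)
        theta = math.atan2(-ex, ey)  # outer normal (ey, -ex) of a CCW polygon
        psi += length * cmath.exp(1j * p * theta)
        perimeter += length
    return abs(psi) / perimeter


def gamma_p(vertices, p):
    """Assumed gamma_p: |sum_v |r_v|^p exp(i p phi_v)| / sum_v |r_v|^p.

    r_v is the vertex position relative to the mean of the vertices.
    """
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    num = 0j
    den = 0.0
    for x, y in vertices:
        r = math.hypot(x - cx, y - cy)
        phi = math.atan2(y - cy, x - cx)
        num += r ** p * cmath.exp(1j * p * phi)
        den += r ** p
    return abs(num) / den


# Hypothetical test shapes: a 2 x 1 rectangle and a regular hexagon (both CCW).
rectangle = [(-1.0, -0.5), (1.0, -0.5), (1.0, 0.5), (-1.0, 0.5)]
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]

print("rectangle q_2:", minkowski_q(rectangle, 2), "gamma_2:", gamma_p(rectangle, 2))
print("hexagon   q_6:", minkowski_q(hexagon, 6), "gamma_6:", gamma_p(hexagon, 6))
```

      Under these assumed definitions, both measures agree on the regular hexagon but assign different 2-fold values to the same rectangle, even though both are computed from the same vertex positions.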

      Concerning Reviewer 2:

      To fully address this point, we have extended our analyses to explore the published data of Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show in the revised manuscript, the crossover between nematic and hexatic order is specific to the use of γ<sub>p</sub> for characterizing the shape and to the coarse-graining of the associated order. When q<sub>p</sub> is used as the shape measure, this crossover disappears. These analyses therefore concretely demonstrate that the crossover is not a robust physical feature of the system and depends on the method used to define shape characteristics.

      Concerning Reviewer 3:

      We respectfully note a misunderstanding on the part of the referee: the briefly mentioned approaches of other groups turn out to measure not shape but connections between cells. Conceptually, these approaches are therefore related to bond order parameters. We already comment at the end of the section introducing Minkowski tensors that bond order parameters cannot quantify the shape of a cell. The same argument also holds for other such approaches. In our revised version we will further clarify this distinction to avoid any confusion or misinterpretation.

    1. Author response:

      As a short response to the public reviews, we would like to outline the following planned revisions:

      (1) Address the antibody concerns as indicated by reviewer 1

      (2) Assess the role of tensin (and possibly KANK), as suggested by reviewers 2 and 3, respectively.

      (3) Validate our main experimental findings using alternative super-resolution approaches, including STED to avoid potential blinking artefacts associated with standard STORM, and most likely DNA-PAINT as a more quantitative technique, as suggested by reviewer 3.

      (4) Implement alternative analytical strategies to DBSCAN, including Voronoi tessellation as suggested by reviewer 3.

      (5) Expand the discussion of the main findings of our work and their biological significance.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Rtf1 HMD domain facilitates global histone H2B monoubiquitination and regulates morphogenesis and virulence in the meningitis-causing pathogen Cryptococcus neoformans" by Jiang et al., the authors employ a combination of molecular genetics and biochemical approaches, along with phenotypic evaluations and animal models, to identify the conserved subunit of the Paf1 complex (Paf1C), Rtf1, and functionally characterize its critical roles in mediating H2B monoubiquitination (H2Bub1) and the consequent regulation of gene expression, fungal development, and virulence traits in C. deneoformans or C. neoformans. Specifically, the authors found that the histone modification domain (HMD) of Rtf1 is sufficient to promote H2B monoubiquitination (H2Bub1) and the expression of genes related to fungal mating and filamentation, and restores the fungal morphogenesis and pathogenicity defects caused by RTF1 deletion.

      Strengths:

      The manuscript is well-written and presents the findings in a clear manner. The findings are interesting and contribute to a better understanding of Rtf1-mediated epigenetic regulation of fungal morphogenesis and pathogenicity in a major human fungal pathogen, and potentially in other fungal species, as well.

      Weaknesses:

      A major limitation of this study is the absence of genome-wide information on Rtf1-mediated H2B monoubiquitination (H2Bub1), as well as a lack of detail regarding the function of the Plus3 domain. Although overexpression of HMD in the rtf1Δ mutant restored global H2Bub1 levels, it did not rescue certain critical biological functions, such as growth at 39 °C and melanin production (Figure 4C-D). This suggests that the precise positioning of H2Bub1 is essential for Rtf1's function. A comprehensive epigenetic landscape of H2Bub1 in the presence of HMD or full-length Rtf1 would elucidate potential mechanisms and shed light on the function of the Plus3 domain.

      We thank the reviewer (and the other reviewers) for this excellent suggestion. We have conducted CUT&Tag assays with the WT, the _rtf1_Δ mutant, and complemented strains expressing either full-length Rtf1 or only the HMD domain, cultured at 30 and 39 °C. We indeed found variations in the epigenetic landscape of H2Bub1 in the presence of the HMD or full-length Rtf1. These results strongly suggest that the distribution of H2Bub1 is regulated by Rtf1, and that H2B modifications at specific chromosomal loci may contribute to thermal tolerance in C. neoformans. These new findings from the CUT&Tag assays shed light on the mechanism of thermal tolerance, but we decided not to include them in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine the role of Rtf1 in Cryptococcal biology, and demonstrate that Rtf1 acts independently of the Paf1 complex to exert regulation of Histone H2B monoubiquitylation (H2Bub1). The biological impact of the loss of H2Bub1 was observed in defects in morphogenesis, reduced production of virulence factors, and reduced pathogenic potential in animal models of cryptococcal infection.

      Strengths:

      The molecular data is quite compelling, demonstrating that the Rtf1-dependent functions require only this histone modifying domain of Rtf1, and are dependent on nuclear localization. A specific point mutation in a residue conserved with the Rtf1 protein in the model yeast demonstrates the conservation of that residue in H2Bub1 modification. Interestingly, whereas expression of the HMD alone suppressed the virulence defect of the rtf1 deletion mutant, it did not suppress defects in virulence factor production.

      Weaknesses:

      The authors use two different species of Cryptococcus to investigate the biological effect of Rtf1 deletion. The work on morphogenesis utilized C. deneoformans, which is well-known to be a robust mating strain. The virulence work was performed in the C. neoformans H99 background, which is a highly pathogenic isolate. The study would be more complete if each of these processes were assessed in the other strain to understand if these biological effects are conserved across the two species of Cryptococcus. H99 is not as robust in morphogenesis, but reproducible results assessing mating and filamentation in this strain have been performed. Similarly, C. deneoformans does produce capsule and melanin.

      We thank the reviewer for the suggestion. We have conducted assays to quantify both capsule and melanin production in both the C. neoformans and C. deneoformans strain backgrounds. We found that capsule production was affected in the same pattern in these two serotypes. Interestingly, we found that cell size was significantly affected by deletion of RTF1 in both serotypes. In addition, melanin production was reduced by the deletion of RTF1 in both serotypes; however, complementation with Plus3 or mutated alleles of HMD gave different phenotypes in these two serotypes. These new findings were included in Figure 4 of the revised manuscript.

      There are some concerns with the conclusions related to capsule induction. The images reported in Figure B are purported to be grown under capsule-inducing conditions, yet the H99 panel is not representative of the induced capsule for this strain. Given the lack of a baseline of induction, it is difficult to determine if any of the strains may be defective in capsule induction. Quantification of a population of cells with replicates will also help to visualize the capsular diversity in each strain population.

      We thank the reviewer for raising this concern. We have tested capsule production under capsule-inducing conditions on 10% fetal bovine serum (FBS) agar medium [1]. Under these conditions, the capsule layers surrounding the cells were obvious. We also included a non-capsule-producing control in our assay to aid visualization of the capsule. In addition, we quantified the ratio between the diameters of the capsule layer and the cell body to show the capsular diversity in each strain population. The results were included in Figure 4 of the revised manuscript.

      The authors demonstrate that for specific mating-related genes, the expression of the HMD recapitulated the wild-type expression pattern. The RNA-seq experiments were performed under mating conditions, suggesting specificity under this condition. The authors raise the point in the discussion that there may be differences in Rtf1 deposition on chromatin in H99, and under conditions of pathogenesis. The data that overexpression of HMD restores H2Bub1 by western is quite compelling, but does not address at which promoters H2Bub1 is modulating expression under pathogenesis conditions, and when full-length Rtf1 is present vs. only the HMD.

      We thank the reviewer for raising these concerns. Please see our response to Reviewer #1.

      Reviewer #3 (Public Review):

      Summary:

      In this very comprehensive study, the authors examine the effects of deletion and mutation of the Paf1C protein Rtf1 gene on chromatin structure, filamentation, and virulence in Cryptococcus.

      Strengths:

      The experiments are well presented and the interpretation of the data is convincing.

      Weaknesses:

      Yet, one can be frustrated by the lack of experiments that attempt to directly correlate the change in chromatin structure with the expression of a particular gene and the observed phenotype. For example, the authors observed a strong defect in the expression of ZNF2, a known regulator of filamentation, mating, and virulence, in the rtf1 mutant. Can this defect explain the observed phenotypes associated with the RTF1 mutation? Is the observed defect in melanin production associated with altered expression of laccase genes and altered chromatin structure at this locus?

      We completely agree with the reviewer. We have conducted a CUT&Tag assay and checked the Rtf1-mediated H2Bub1 at these particular gene loci. We found that the distribution of H2Bub1 at the promoter region of ZNF2 and the gene body of the laccase-encoding gene varied, possibly due to the RTF1 mutation. We would like to save those preliminary findings for another story and not include them in this manuscript, as we mentioned in the response to Reviewer #1.

      (1) Jang, E.-H., et al., Unraveling Capsule Biosynthesis and Signaling Networks in Cryptococcus neoformans. Microbiology Spectrum, 2022. 10(6): p. e02866-22.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show for the first time that deleting GLS from rod photoreceptors results in the rapid death of these cells. The death of photoreceptor cells could result from loss of synaptic activity because of a decrease in glutamate, as has been shown in neurons, changes in redox balance, or nutrient deprivation.

      Strengths:

      The strength of this manuscript is that the authors show a similar phenotype in the mice when Gls was knocked out early in rod development or in the adult rod. They showed that rapid cell death is through apoptosis, and there is an increase in the expression of genes responsive to oxidative stress.

      We thank the reviewer for their time reviewing the manuscript and their comments regarding the potential mechanism(s) by which rod photoreceptors rapidly degenerate upon knockout of GLS.

      Weaknesses:

      In this manuscript, the authors show a "metabolic dependency of photoreceptors on glutamine catabolism in vivo". However, there is a potential bias in their thinking that glutamine metabolism in rods is similar to cancer cells where it feeds into the TCA cycle. They should consider that as in neurons, GLS1 activity provides glutamate for synaptic transmission. The modest rescue shown by providing α-ketoglutarate in the drinking water suggests that glutamine isn't a key metabolic substrate for rods when glucose is plentiful. The ERG studies performed on the iCre-Glsflox/flox mice showed a large decrease in the scotopic b wave at saturating flashes which could indicate a decrease in glutamate at the rod synapse as stated by the authors. While EM micrographs of wt and iCre-Glsflox/flox mice were shown for the outer retina at p14, the synapse of the rods needs to be examined by EM.

      We agree with the reviewer that in the presence of sufficient glucose, it appears a lack of GLS-driven glutamine (Gln) catabolism does not drastically alter the levels of TCA cycle metabolites or mitochondrial function as we demonstrated in Figure 4, and supplementation with alpha-ketoglutarate improved outer nuclear layer thickness by only a small amount as observed in Figure 5e. Hence, as we stated in the Results and Discussion, at least in the mouse where Gls is selectively deleted from rod photoreceptors by crossing Gls<sup>fl/fl</sup> mice with Rho-Cre mice (Gls<sup>fl/fl</sup>; Rho-Cre<sup>+</sup>, cKO), Gln’s role in supporting the TCA cycle is not the major mechanism by which rod photoreceptors utilize Gln to suppress apoptosis.

      With regards to GLS-driven Gln catabolism providing glutamate (Glu) for synaptic transmission, we again agree with the reviewer that Glu is an important excitatory neurotransmitter, but it is also a key metabolite necessary for the synthesis of glutathione, amino acids, and proteins. As noted and discussed at length in the manuscript, a lack of GLS-driven Gln catabolism in rod photoreceptors leads to reduced levels of oxidized glutathione (Figure 4D) possibly signaling an overall reduction in the biosynthesis of glutathione as Glu is directly and indirectly responsible for its synthesis. Furthermore, Gln and GLS-derived Glu play a central role in the biosynthesis of several nonessential amino acids and proteins. To this end, we see a reduction in the level of Glu, which is the product of the GLS reaction and further confirms the loss of GLS function. We also noted a significant decrease in aspartate (Asp), which can be constructed from the carbons and nitrogens of Gln as discussed at length in the manuscript (Figure 6A). Finally, we noted a significant decrease in global protein synthesis in the cKO retina as compared to the wild-type animal as well (Figure 6E). Therefore, the data suggest that GLS-driven Gln catabolism is critical for amino acid metabolism and protein synthesis and to some degree redox balance; although, the small but statistically significant changes in oxidized glutathione, NADP/NADPH, and redox gene expression may not fully account for the rapid and complete photoreceptor degeneration observed. Future studies are necessary to shed light on the role of redox imbalance in this novel transgenic mouse model.

      Glu also plays a role in synaptic transmission, and we considered this scenario as described in Figure 1 – figure supplement 5. Here, the synaptic connectivity between photoreceptors and the inner retina did not demonstrate significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer nor alterations in the labeling of a key protein (Bassoon) in ribbon synapses. These data suggest that the synaptic connectivity between photoreceptors and second-order neurons was unaltered at P14 in the cKO retina, which is the time just prior to rapid photoreceptor degeneration when Glu was shown to be decreased (Figure 6A).

      With regard to the ERG changes noted in Figure 2, we agree with the reviewer that a large decrease was noted in the scotopic b-wave at P21 and P42 in the cKO. We also agree that, to obtain greater insight into these ERG changes, the ribbon synapse can be examined in EM images. The EM images shown in Figure 1 – figure supplement 4 are from P21, which coincides with the age at which the ERG changes were first noted and when significant photoreceptor degeneration has already occurred. These images were utilized to assess the ribbon synapse for the revised version of the manuscript. As now shown in Figure 1 – figure supplement 4D, ribbon synapses are intact in WT animals as denoted by the yellow boxes. Similarly, the ribbons (yellow arrows) appear structurally intact in the photoreceptors that remain in the P21 cKO retina. These results are in accordance with the lack of significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer as well as the lack of alterations in the labeling of a key protein (Bassoon) in ribbon synapses (Figure 1 – figure supplement 5A and B). While we cannot fully rule out that the decrease in glutamate is altering synaptic transmission, our structural data suggest the synapses remain intact. These data have been added to the revised manuscript.

      However, an even larger reduction in the scotopic a-wave was noted at these ages as well. In animal models that disrupt photoreceptor synaptic function (Dick et al. Neuron. 2003; Johnson et al. J Neuroscience. 2007; Haeseleer et al. Nature Neuroscience. 2004; Chang et al. Vis Neurosci. 2006), a more negative ERG pattern is typically observed with the b-wave altered to a much larger degree than the a-wave. Additionally, in these models that disrupt photoreceptor synaptic transmission, the overall structure of the retina with respect to thickness is maintained (Dick et al. Neuron. 2003) or noted to have modest changes in the outer plexiform layer within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). In contrast, a rapid decline in the outer nuclear layer thickness was observed in the cKO retina after P14 likely contributing to the ERG changes noted in Figure 2. Also, Gln is catabolized to Glu primarily by GLS as suggested by the approximately 50% reduction in Glu levels in the cKO retina (Figure 6A), but other enzymes are also capable of catabolizing Gln to Glu, so Glu levels in the rod photoreceptors are unlikely to be zero. Coupling this with the fact that rods are equipped with a self-sufficient Glu recollecting system at their synaptic terminals (Hasegawa et al. Neuron. 2006; Winkler et al. Vis Neurosci. 1999) and that GLS activity is at least two-fold higher in the photoreceptor inner segments, which support energy production and metabolism, than any other layer in the retina (Ross et al. Brain Res. 1987) suggests that altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina.

      The authors note that the outer segments are shorter but they do not address whether there is a decrease in the number of cones.

      We have adjusted Figure 2E by removing the GLS staining to better highlight the secondary degeneration of cone outer segments, the main point of the Figure, as we had already shown that GLS was cleanly knocked out of rod photoreceptors in Figure 1. Furthermore, qualitatively the number of cones appears the same at P14, P21, and P42 between the WT and cKO, which is consistent with other retinal degeneration models, like rd1 and rd10, where cones do not begin to die until all the rods have degenerated (Xue et al. eLife. 2021).

      Rod-specific Gls ko mice with an inducible promoter were generated by crossing the Pde6g-CreERT2 and homozygous for either the WT or floxed Gls allele (IND-cKO). In Figure 3 the authors document that by western blots and antibody labeling the GLS1 expression is lost in the IND-cKO 10 days post tamoxifen. OCT images show a decrease in the thickness of the outer nuclear layer between 17 and 38 days post-TAM. ERGs should be performed on the animals at 10 and 30 days post TAM, before and after major structural changes in rod photoreceptor cells, to determine if changes in light-stimulated responses are observed. These studies could help to parse out the cause of photoreceptor cell death.

      We agree with the reviewer that the IND-cKO is a useful tool to help parse out the cause of photoreceptor cell death in this model as well as shed light on the role of GLS-driven Gln catabolism in photoreceptor synaptic transmission as discussed at length above. Hence, ERG analyses were performed 10 days post TAM, before major structural changes in the ONL are observed. Interestingly, ERG demonstrated statistically significant reductions in the IND-cKO scotopic a- and b-waves as compared to the WT 10 days post TAM. Similarly, photopic ERG demonstrated statistically significant decreases in the b-wave of the IND-cKO retina. These data suggest that GLS-driven Gln catabolism plays a significant role not only in rod photoreceptor survival but in their function as well. These data have been added to Figure 3H-I and discussed in the corresponding manuscript text.

      To this end, as discussed below and added to Figure 6 – figure supplement 1, amino acid levels, including glutamate (Glu), are already reduced 10 days post TAM. Reductions in the level of Glu may impact synaptic transmission and, as a result, the scotopic b-wave. However, as noted above, altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina, as the b-wave to a-wave ratio is not significantly altered in the IND-cKO retina as compared to the WT retina, suggesting that loss of GLS-driven Gln catabolism impairs both to a similar degree.

      Additionally, Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, the IND-cKO mouse likely knocks out GLS from both rods and cones, which is in accordance with the immunofluorescence image in Figure 3B where GLS is not observed in rod or cone inner segments unlike in Figure 1B where GLS remains in cones. Hence, the reduction in the photopic b-wave may indicate that loss of GLS-driven Gln catabolism in cones impairs synaptic transmission. As noted in our reply to reviewer #3’s comments, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      The studies in Figure 4 were all performed on iCre-Glsflox/flox and control mice at p14, why weren't the IND-cKO mice used for these studies since the findings would not be confounded by development?

      To gain further insight into the role of GLS-driven Gln catabolism in the maintenance of rod photoreceptors as compared to their development/maturation, we conducted a targeted metabolomic analysis on IND-cKO and WT retinas 10 days post TAM. For the purpose of this manuscript, we have included data regarding changes in amino acid levels in Figure 6 – figure supplement 1. Specifically, levels of glutamate, aspartate and asparagine are all significantly decreased in the IND-cKO retina prior to PR degeneration, which demonstrates that similar to the GLS cKO mouse (i.e. iCre-Gls flox/flox), GLS-driven Gln catabolism is critical for amino acid biosynthesis in mature rod PRs as well.

      In all rescue studies, the endpoint was an ONL thickness, which only addressed rod cell death. The authors should also determine whether there are small improvements in the ERG, which would distinguish the role of GLS in preventing oxidative stress.

      Optical coherence tomography (OCT) provides a sensitive in vivo method to detect small changes in retinal thickness without potential artifacts incurred through histological processing. Considering the Gls cKO retina demonstrates significant and rapid photoreceptor degeneration, we wanted to assess pathways that may be critical to photoreceptor survival downstream of GLS-driven Gln catabolism using rescue experiments with pharmacologic treatment or metabolite supplementation. That said, disruption of GLS-driven Gln catabolism may also significantly alter rod photoreceptor function beyond that which is secondary to photoreceptor cell death as we have demonstrated in the IND-cKO animal for the revised version of this manuscript and discussed in a response above. Therefore, the IND-cKO model provides a unique tool to assess the impact of rescue studies on photoreceptor function as the functional changes occur prior to significant degeneration. Also, unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox) where photoreceptor degeneration starts very early, impairing our ability to capture reliable and robust ERG measurements, the IND-cKO mice are older at the time of functional changes allowing for robust ERG measurements. While the rate of photoreceptor degeneration in both mouse models is similar and the levels of key amino acids are altered similarly in both models, the mechanisms of cell death in developing/maturing photoreceptors may be different than that in mature photoreceptors. Hence, before we can assess if similar rescue experiments impact photoreceptor function via ERG in the IND-cKO mouse, we need to thoroughly examine how these photoreceptors are dying. These experiments and results will be published in a separate manuscript in the future.

      Reviewer #2 (Public Review):

      Summary:

      Photoreceptor neurons are crucial for vision, and discovering pathways necessary for photoreceptor health and survival can open new avenues for therapeutics. Studies have shown that metabolic dysfunction can cause photoreceptor degeneration and vision loss, but the metabolic pathways maintaining photoreceptor health are not well understood. This is a fundamental study that shows that glutamine catabolism is critical for photoreceptor cell health using in vivo model systems.

      Strengths:

      The data are compelling, and the consideration of potential confounding factors (such as glutaminase 2 expression) and additional experiments to examine the synaptic connectivity and inner retina added strength to this work. The authors were also careful not to overstate their claims, but to provide solid conclusions that fit the results and data provided in their study. The findings linking asparagine supplementation and the inhibition of the integrated stress response to glutamine catabolism within the rod photoreceptor cell are intriguing and innovative. Overall, the authors provide convincing data to highlight that photoreceptors utilize various fuel sources to meet their metabolic needs, and that glutamine is critical to these cells for their biomass, redox balance, function, and survival.

      We greatly appreciate the reviewer’s thoughtful comments and time spent reviewing this manuscript.

      Weaknesses:

      Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be of interest to test whether the conditional knockout mice have changes in metabolism (via qPCR such as shown in Figure 4 - Supplemental Figure 1) within the retinal pigment epithelium that may be contributing to the authors' findings in the neural retina. Additionally, the authors have very compelling data to show that inhibition of eIF2a or supplementation with asparagine can delay photoreceptor death via OCT measurements in their conditional knockout mouse model (Figure 6G, H). However, does inhibition of eIF2a or asparagine adversely impact the WT retina? It would also be impactful to know whether this has a prolonged effect, or if it is short-term, as this would provide strength to potential therapeutic targeting of these pathways to maintain photoreceptor health.

      We agree with the reviewer that metabolic communication in the outer retina is crucial to the function and survival of both photoreceptors and RPE. Therefore, we have performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      Additionally, we have added data demonstrating that systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina. Specifically, we performed OCT after 21 days of ISRIB treatment via intraperitoneal delivery in WT mice and show that total retinal, ONL and inner segment/outer segment thickness is unchanged compared to vehicle. These data are now included in Figure 6 – figure supplement 2A. We have also included data to suggest that the effect of ISRIB extends beyond P21 in the cKO mouse. This data, presented in Figure 6 – figure supplement 2B, shows that at P28, ISRIB continues to statistically significantly increase ONL thickness compared to vehicle in cKO animals.

      Reviewer #3 (Public Review):

      Summary:

      The authors explored the role of GLS, a glutaminase, which is an enzyme that catalyzes the conversion of glutamine to glutamate, in rod photoreceptor function and survival. The loss of GLS was found to cause rapid autonomous death of rod photoreceptors.

      Strengths:

      Interesting and novel phenotype. Two types of cre-lines were rigorously used to knockout the Gls gene in rods. Both of the conditional knockouts led to a similar phenotype, i.e. rod death. Histology and ERG were carefully done to characterize the loss of rods over specific ages. A necessary metabolomic study was performed and appreciated. Some rescue experiments were performed and revealed possible mechanisms.

      We thank the reviewer for their comments and appreciation of the methods utilized herein to address the role of GLS-driven Gln catabolism in rod photoreceptors.

      Weaknesses:

      No major weaknesses were identified. The mechanism of GLS-loss-induced rod death seems not fully elucidated by this study but could be followed up in the future, and the same for GLS's role in cones.

      We agree with the reviewer that the downstream metabolic and molecular mechanisms by which Gln catabolism impacts rod photoreceptor health are not fully elucidated. Defining these mechanisms will advance our understanding of photoreceptor metabolism and identify therapeutic targets promoting photoreceptor resistance to stress. Future studies are underway to uncover these mechanisms. Additionally, while outside the scope of the current manuscript, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) The results could start at line 135, but the first paragraph isn't necessary. The data is published and could be referred to in the introduction.

      We appreciate the reviewer’s suggestion to shorten the beginning of the Results section; however, we believe the supplementary data described in these lines confirm the scRNAseq gene expression data while adding GLS expression and localization data within the retina. The scRNAseq data and their publication were noted in the Introduction, so we removed the sentence in lines 117-119 that restates these results to shorten this section. We also reduced redundancy by removing an introductory sentence to the second Results paragraph.

      (2) "However, like other metabolically-demanding cells, recent work has demonstrated that PRs have the flexibility to utilize fuel sources beyond glucose to meet their metabolic needs (Adler et al., 2014; Du, Cleghorn, Contreras, Linton, et al., 2013; Grenell et al., 2019; Joyal et al., 2016; Xu et al., 2020)." The paper by Daniele et al. demonstrated that glucose is essential for maintaining the viability of rod photoreceptor cells.

      We thank the reviewer for highlighting published literature that we regrettably overlooked. The reference for Daniele et al. has now been included.

      (3) "Single-cell RNA sequencing data has demonstrated that Gls is expressed throughout the human and mouse retina and much greater than Gls2 (Voigt et al., 2020)." The authors should indicate the specific databases searched in Spectacle.

      We appreciate the reviewer’s attention to detail and have now included the references in the Introduction for GSE63473 from Macosko et al. and GSE142449 from Voigt et al., which were the databases we used in Spectacle to assess Gls levels in the mouse and human retina, respectively.

      References:

      (1) Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015 May 21;161(5):1202-1214. doi: 10.1016/j.cell.2015.05.002. PMID: 26000488; PMCID: PMC4481139.

      (2) Voigt AP, Binkley E, Flamme-Wiese MJ, Zeng S, DeLuca AP, Scheetz TE, Tucker BA, Mullins RF, Stone EM. Single-Cell RNA Sequencing in Human Retinal Degeneration Reveals Distinct Glial Cell Populations. Cells. 2020 Feb 13;9(2):438. doi: 10.3390/cells9020438. PMID: 32069977; PMCID: PMC7072666.

      (4) The immunolabeling in Figure 2 looks like the images are overexposed, and the Gls antibody is labeling the outer segment, not just the inner segment of photoreceptors.

      We thank the reviewer for their comments regarding our immunofluorescence data. There was background staining of the outer segment in both the WT and cKO retina, with decreased GLS staining in the inner segment of the cKO rod photoreceptors at P14, demonstrating loss of GLS in rod photoreceptors similar to Figure 1B. For Figure 2E, we have provided adjusted images with PNA staining only that better represent the secondary cone degeneration that occurs in the rod photoreceptor-specific Gls cKO, which is the take-home point of Figure 2E.

      (5) The authors could use a glutamate antibody to compare it to Gls KO mice as done in Davanger, S., Ottersen, O.P. and Storm-Mathisen, J. (1991), Glutamate, GABA, and glycine in the human retina: An immunocytochemical investigation. J. Comp. Neurol., 311: 483-494. https://doi.org/10.1002/cne.903110404

      We appreciate the reviewer’s suggestion to assess glutamate levels in the wild-type and Gls KO retina via antibody labeling. Our targeted metabolomics studies in Figure 6A provide quantitative evidence that glutamate, the product of the GLS-catalyzed reaction, is decreased as one would expect in the Gls KO retina. The antibody would add to these data by providing the localization of glutamate in the retina. With a rod photoreceptor-specific genetic KO, we would expect glutamate levels to be decreased in these cells. The antibody may also show that glutamate is not only decreased in the rod photoreceptor inner segment, where GLS predominates, but also in the synaptic terminal in accordance with the reviewer’s concerns regarding the impact of GLS KO on synaptic transmission. We have addressed this concern at length above, adding TEM images of the ribbon synapses in the GLS KO retina, and ERG analyses from the IND-cKO animals prior to significant degeneration. In the end, we agree with the reviewer that reduced Glu levels in the GLS cKO retina may impact synaptic transmission to a degree, but the synapses remain intact based on immunofluorescence and TEM analyses and a negative ERG pattern is not observed in the GLS cKO (i.e. iCre-Gls flox/flox) or IND-cKO mouse. As noted above, the structure of the retina in models that disrupt photoreceptor synaptic transmission is maintained (Dick et al. Neuron. 2003) or noted to have modest changes within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). So, the impact of the reduced Glu levels on synaptic transmission in the GLS KO retina is unlikely to account in full for the rapid and profound photoreceptor degeneration observed. That said, the IND-cKO mouse, which allows us to assess photoreceptor function prior to significant degeneration unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox), demonstrates GLS-driven Gln catabolism plays a significant role in photoreceptor function but still does not demonstrate a negative ERG pattern. Therefore, assessing Glu localization in this mouse model 10 days post TAM will be informative as to how GLS-driven Gln catabolism impacts photoreceptor function prior to degeneration. The IND-cKO mouse model is currently being extensively characterized for future publication.

      Reviewer #2 (Recommendations For The Authors):

      Main Concerns:

      (1) The authors checked for Gls2 compensation at P14 in the mouse retina. However, this data would be more compelling with an additional timepoint, particularly at P21 which is used in many of their figures throughout the study.

      We thank the reviewer for their suggestion. Figure 1-figure supplement 1D demonstrates no change in Gls2 gene expression at P14 between the WT and cKO retina. With regards to the reviewer’s concern, in Figure 1-figure supplement 1E of the original submission, we demonstrate that the expression of GLS2 is not increased in the cKO retina at P21 via immunofluorescence.

      (2) Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be compelling to see whether the cKO mice have changes in metabolism (via qPCR such as shown in Supplementary Figure 1 for Figure 4) within the RPE that may be contributing to their findings in the neural retina. Additionally, mention of this crosstalk and how it may impact their results should be added to the discussion.

      We appreciate the reviewer’s concern for metabolism changes in the RPE of Gls cKO mice. In agreement with reviewer 2, we performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      (3) The authors use a tamoxifen-inducible cKO model to support their findings in developed rods. However, in Figure 3A it appears that this model has a greater reduction in GLS compared to the Rho-cre mouse model. Can the authors discuss this? Is this cre more efficient at targeting rods or is it leaky and may have affected other retinal cells?

      We thank the reviewer for pointing out this interesting result associated with using the Pde6g-Cre-ERT2 mouse line. Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, the IND-cKO mouse likely knocks out GLS from both rods and cones upon TAM induction. To this end, the immunofluorescence image in Figure 3B shows GLS is knocked out in both rod and cone inner segments, unlike in Figure 1B where GLS remains in cones when using the rod photoreceptor-specific, Gls<sup>fl/fl</sup> Rho-Cre<sup>+</sup> mouse. As such, as the astute reviewer noted, the fact that Western blot demonstrates greater reduction in GLS protein content fits with the protein being knocked out of both rods and cones. We have added this note about the mouse model in the corresponding text.

      (4) The authors have very compelling data to show that inhibition of eIF2a can delay photoreceptor death via OCT measurements in their cKO mouse model (Figure 6G). However, does ISRIB adversely impact the WT retina? WT vehicle and ISRIB should be shown. It would also be compelling to know whether this has a prolonged effect, or if it is short-term (i.e. would the effect still be present at P42)?

      We appreciate the reviewer’s comments regarding antagonizing the effects of p-eIF2a to prolong photoreceptor survival in the Gls cKO retina. As described above, we have data demonstrating systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina (Figure 6-figure supplement 2A). Specifically, we treated WT animals with daily intraperitoneal ISRIB starting at P5 and performed OCT at P21 to show that total retinal, ONL and the inner segment/outer segment thickness is unchanged compared to vehicle-treated WT animals. Additionally, we have included data demonstrating that the photoreceptor neuroprotective effect of ISRIB treatment in the Gls cKO mouse extends beyond P21 (Figure 6-figure supplement 2B).

      (5) For Figure 6H, same as point #4.

      While we have not specifically assessed potential retinal toxicity secondary to systemic Asn supplementation, oral Asn supplementation (up to 100 mg/kg/day) was provided to patients for 24 months and found to be well-tolerated (PMID: 31123592). Allometric scaling of this dose to the mouse would yield a mouse dose of 1234 mg/kg/day, which is much greater than the 200 mg/kg/day dose provided here (PMID: 27057123). Additionally, a 90-day toxicity study of Asn in rats demonstrated a no observed adverse effect level of 1.62 g/kg bodyweight/day in males and 1.73 g/kg bodyweight/day in females (PMID: 18508175). The lower dose in that study equates to a mouse dose of 3.2 g/kg bodyweight/day, well above the mouse dose utilized in this report. As such, future studies should focus on a dose-response relationship with Asn supplementation, and as the reviewer suggested, determining the duration of effect with Asn supplementation.
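      The dose conversions above can be checked with a short calculation. The following is a minimal sketch, assuming the standard body-surface-area scaling with the commonly tabulated Km factors (mouse 3, rat 6, adult human 37); the function and variable names are illustrative, and the response itself does not state which conversion factors were used.

```python
# Km factors (body weight / body surface area) as commonly tabulated for
# body-surface-area dose conversion. Assumed here; not stated in the response.
KM = {"mouse": 3.0, "rat": 6.0, "human": 37.0}


def convert_dose(dose_mg_per_kg: float, from_species: str, to_species: str) -> float:
    """Scale a mg/kg dose between species: dose_to = dose_from * Km_from / Km_to."""
    return dose_mg_per_kg * KM[from_species] / KM[to_species]


# Human oral Asn dose of 100 mg/kg/day expressed as a mouse-equivalent dose.
human_to_mouse = convert_dose(100.0, "human", "mouse")

# Rat NOAEL in males (1.62 g/kg/day = 1620 mg/kg/day) as a mouse-equivalent dose.
rat_to_mouse = convert_dose(1620.0, "rat", "mouse")

print(f"human -> mouse: {human_to_mouse:.0f} mg/kg/day")
print(f"rat   -> mouse: {rat_to_mouse:.0f} mg/kg/day")
```

      With these factors the human dose scales to roughly 1233 mg/kg/day and the rat NOAEL to 3240 mg/kg/day. The small difference from the quoted 1234 mg/kg/day presumably reflects a rounded conversion factor. Both values are well above the 200 mg/kg/day used in the mouse experiments.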

      (6) Some of the results section belongs in the introduction or discussion and can be moved.

      We have addressed the reviewer’s concern by moving some of the results to the discussion and removing statements in the results that were either noted in the Introduction or conferred in the Discussion.

      Minor Concerns:

      (1) Scale bar mentions in the figure legends use plural when only one is present, or in some cases are missing. A scale bar should be added to the OCT images if possible.

      We appreciate the reviewer’s attention to detail, and information regarding scale bars has been updated in the figure legends.

      (2) For Figures 1I and J, the sample size changes when J is a quantification of I. Please correct.

      We have corrected the sample size to be consistent between Figures 1I and J.

      (3) In Figure 1 - Figure Supplement 3 the P42 timepoint is not mentioned in the legend. Please correct.

      We have now included the P42 timepoint in the legend for Figure 1 – Figure Supplement 3 as well as the manuscript text.

      (4) In Figure 1 - Figure Supplement 5 the wrong P value is mentioned in the legend. Please correct.

      We have corrected the P value in the legend for Figure 1 – Figure Supplement 5.

      (5) Can the authors double-check their ERG light intensity settings? They seem high. Please confirm if they are correct.

      We appreciate the reviewer’s concern for ERG light intensity settings and have confirmed the settings used in the study were 32 cd*s/m<sup>2</sup> and 100 cd*s/m<sup>2</sup> for scotopic and photopic ERG recordings, respectively.

      (6) The legend key in Figure 2A would be more helpful if the axis were present by the representative traces.

      We thank the reviewer for the suggestion of adding axes to the ERG traces. Figure 2A has been updated to reflect this modification.

      (7) Can the authors check that the error bars are present in Figure 5E?

      We appreciate the reviewer’s concern for error bars in Figure 5E, which are included in the figure. The standard error in this experiment is so small that the symbols overlap with the error bars.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      (1) Figure 6: ISRIB seems to give the most dramatic rescue of cKO GLS in P21 rods. Does it completely prevent rod death? i.e. What's the ONL thickness of P21 WT control? What's the ISRIB rescue of an older cKO animal, say P35?

      The ONL thickness of P21 WT control is on average 0.06 mm (Figure 1E), while the ONL thickness of the Gls cKO retina with ISRIB treatment at P21 is on average 0.044 mm. Therefore, rod death is not completely prevented with ISRIB but rather, rod photoreceptor survival is prolonged. As noted above, we have provided data to demonstrate that the photoreceptor neuroprotective effect of ISRIB lasts beyond P21 (Figure 6-figure supplement 2B).

      (2) What's the mechanistic link between ISR and GLS beyond current speculation? Does GLS have other unknown functions beyond converting glutamine to glutamate? Any novel insights from GLS protein structure?

      We thank the reviewer for this thoughtful question. It is certainly possible that GLS has other functions outside of its role in glutaminolysis. It is well known that other metabolic enzymes have moonlighting functions including hexokinase 2, which has been shown to be important in preventing intrinsic apoptosis through blocking the binding of pro-apoptotic proteins to the mitochondria. While not directly related to ISR, a single report suggests GLS functions non-canonically in Gln-deprived states, promoting mitochondrial fusion to suppress ROS production (PMID: 29934617). Investigating the moonlighting functions of metabolic enzymes is part of our ongoing research program and GLS is included in these studies.

      (3) Just curious about GLS cKO in cones. Any similar phenotype?

      We appreciate the reviewer’s curiosity regarding Gls cKO in cones and this study is currently ongoing with a poster presented at ARVO 2024 (Subramanya et al; Glutaminase-driven glutamine catabolism supports cone photoreceptor metabolism, function, and structure. Invest. Ophthalmol. Vis. Sci. 2024;65(7):193) and a manuscript in preparation. As discussed above, GLS knock out in cones likely impacts their function, in accordance with the data presented at ARVO 2024.

      Recommendations for improving the writing and presentation.

      (1) In the Discussion, lines 458-466, it's incorrect to compare the importance of glucose metabolism to GLS-dependent pathway to photoreceptors in this way. An alternative explanation: glucose metabolism is so important that the system has many redundancies, e.g. HK1 exists in addition to HK2, thus single gene KO leads to no phenotype. The only fair comparison is nutrient deprivation, e.g. taking out glucose or glutamine from retina explants (Punzo et al., 2009).

      The reviewer makes an excellent point. While we do not see an upregulation of GLS2 in the retina or rod PRs upon GLS knockout (Figure 1-figure supplement 1 D and E), loss of Gls in rod PRs does alter the expression of many metabolism-related genes (Figure 4-figure supplement 1). We alluded to these data and the reviewer’s point in the second paragraph of the discussion: “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose or rewire their metabolism to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      In the revised article, we have amended these sentences to include the importance of metabolic redundancies. “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose, rewire their metabolism, or utilize metabolic redundancies to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      (2) Please discuss the mosaic activity of Rho-cre used in this study, as described in the original study (Le et al 2006). Line 221 (Li et al 2005) seems to be a different Rho-Cre created by a different group. Please make sure the citation is correct and consistent.

      We apologize for the confusion and have corrected the reference on line 221 to Le et al., 2006. The reviewer is correct that the original report (Le et al. 2006) demonstrated a mosaic of Cre-mediated recombination in rod photoreceptors and rod bipolar cells in the mouse line that had the shorter (0.2 kb) mouse opsin promoter-controlled Cre. In contrast, this same report showed only Cre-mediated recombination in rod photoreceptors in another line that utilized a long (4.1 kb) mouse opsin promoter-controlled Cre. We have published using this latter promoter-controlled Cre recombinase in at least 5 different mouse models (Wubben et al. 2017; Weh et al. 2020; Weh et al. 2023; Subramanya et al. 2023; the current report), and in all these models, we observe clear and consistent knockout by immunofluorescence only in rod photoreceptors with residual protein in cones and no significant change in protein expression in the INL where bipolar cells reside. Western blots confirm the reduction in protein expression.

      (3) The authors should provide representative images of retina cross-sections for key rescue data (Figure 6G&H).

      As requested by Reviewer 3, representative histology images of retina cross-sections for the ISRIB and Asn rescue experiments in Gls cKO mice at P21 are now included in the manuscript in Figure 6 – figure supplement 3.

      Minor corrections to the text and figures.

      (1) Spell out Gln in the Abstract when used for the first time.

      We have included glutamine (Gln) in the abstract upon first use.

      (2) Line 433, Figure 6G should be 6H.

      Thank you for the correction; the manuscript has been updated.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The study aimed to investigate the significant impact of criterion placement on the validity of neural measures of consciousness, examining how different standards for classifying a stimulus as 'seen' or 'unseen' can influence the interpretation of neural data. They conducted simulations and EEG experiments to demonstrate that the Perceptual Awareness Scale, a widely used tool in consciousness research, may not effectively mitigate criterion-related confounds, suggesting that even with the PAS, neural measures can be compromised by how criteria are set. Their study challenged existing paradigms by showing that the construct validity of neural measures of conscious and unconscious processing is threatened by criterion placement, and they provided practical recommendations for improving experimental designs in the field. The authors' work contributes to a deeper understanding of the nature of conscious and unconscious processing and addresses methodological concerns by exploring the pervasive influence of criterion placement on neural measures of consciousness and discussing alternative paradigms that might offer solutions to the criterion problem.

      The study effectively demonstrates that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' significantly impacts the validity of neural measures of consciousness. The authors found that conservative criteria tend to inflate effect sizes, while liberal criteria reduce them, leading to potentially misleading conclusions about conscious and unconscious processing. The authors employed robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence. The results from both experiments confirm the predicted confounding effects of criterion placement on neural measures of unconscious and conscious processing.
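      The direction of the criterion effect summarized above can be illustrated with a textbook equal-variance signal detection model. This is a minimal sketch and not the authors' simulation: it assumes a unit-variance Gaussian internal signal, a hypothetical sensitivity of d' = 1, and two arbitrary criterion values, and it looks only at 'unseen' trials (signal below the criterion).

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_below(mu: float, criterion: float) -> float:
    """Mean of N(mu, 1) restricted to values below the criterion ('unseen' trials)."""
    a = criterion - mu
    return mu - norm_pdf(a) / norm_cdf(a)


def unseen_effect(d_prime: float, criterion: float) -> float:
    """Target minus no-target mean signal, computed over 'unseen' trials only."""
    return mean_below(d_prime, criterion) - mean_below(0.0, criterion)


D_PRIME = 1.0  # hypothetical sensitivity, chosen for illustration
liberal = unseen_effect(D_PRIME, criterion=0.0)       # 'seen' reported readily
conservative = unseen_effect(D_PRIME, criterion=1.0)  # 'seen' reported reluctantly

print(f"unseen effect, liberal criterion:      {liberal:.3f}")
print(f"unseen effect, conservative criterion: {conservative:.3f}")
```

      Under these assumptions the target versus no-target difference among 'unseen' trials is larger for the conservative criterion, because more trials with a relatively strong signal are sorted into the 'unseen' bin.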

      The results are consistent with their hypotheses and contribute meaningfully to the field of consciousness research.

      We would like to thank reviewer 1 for their positive words and for taking the time to evaluate our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) In the realm of research methodology, conducting post-hoc sorting based on subject reports raises an issue. This operation leads to an imbalance in the number of trials between the two conditions (Target and NonTarget) during the decoding process. Such trial number disparity introduces bias during decoding, likely contributing to fluctuations in neural decoding performance. This potential confounding factor significantly impacts the interpretation of research findings. The trial number imbalance may cause models to exhibit a bias towards the category with more trials during the learning process, leading to misjudgments of neural signal differences between the two conditions and failing to accurately reflect the distinctions in brain neural activity between target and non-target states. Therefore, it is recommended that the authors extensively discuss this confounding factor in their paper. They should analyze in detail how this factor could influence the interpretation of results, such as potentially exaggerating or diminishing certain effects, and whether measures are necessary to correct the bias induced by this imbalance to ensure the reliability and validity of the research conclusions.

      We would like to thank reviewer 2 for their positive words and for taking the time to evaluate our manuscript. In response to this asserted weakness, we would like to point out that the issue of trial imbalances was already comprehensively addressed in the manuscript. No trial imbalances are present in the analyzed data for any of the conditions, so that none of our reported results could have been impacted by this. This was done through the following set of measures:

      (1) Training data (methods section): “a linear discriminant analytic (LDA) classifier was trained for each participant using all trials from all sessions (3 sessions in Experiment 1, 2 sessions in Experiment 2) to discriminate target from no-target trials based on EEG data, irrespective of seen/unseen responses and irrespective of the response criterion. To maximize signal-to-noise ratio, we applied a leave-one-person-out cross validated decoding scheme by using all classifiers from all participants except the participant that was being tested (separately for Experiment 1 and for Experiment 2). This leave-one-person-out cross validation procedure maximized the available data for training without requiring k-folding on subsets of cells with low response counts, so that all test sets were classified by the same fully independent classifiers. A single time series of classification performance across time was obtained for every participant (every testing set) by averaging classification performance across all classifiers that tested that set (see Methods and supplementary Figure S2 for details).”

      This leave-one-person-out cross validation scheme made sure that no trial selection needed to be performed to analyze conservative or liberal conditions. Both conditions were classified using the same classifier, consisting of all data from the other participants.

      (2) Testing data (methods section): “To ensure that differences resulting from post hoc sorting could not be explained by differences in signal-to-noise ratio resulting from disparities in trial counts in the testing set, we equated trial counts between the liberal and conservative condition within each participant by randomly selecting the same number of trials from overrepresented cells (for Experiment 1, this was done at the level of ‘seen’ and ‘unseen’ responses, for Experiment 2 the trial counts were equated at each of the PAS levels, see methods for details). As a result, response-contingent conditions in the liberal and conservative conditions had identical input for all classification analyses. Although different trial counts in the testing set might affect the precision with which AUC is estimated in a decoding analysis, it does not affect the size of AUC itself. Trial count equation was merely performed to make sure the liberal and conservative condition were as comparable as possible.”

      Indeed, we also report at the end of this section that running the same analyses without selecting trials in the test set yielded qualitatively identical results: “Analyzing the data without equating trial counts resulted in qualitatively identical results.”
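      The trial-count equation described in point (2) can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `equate_trial_counts`, the tuple layout, and the example counts are all hypothetical. For each response category, the sketch randomly subsamples the overrepresented criterion condition down to the size of the smaller one.

```python
import random
from collections import Counter, defaultdict


def equate_trial_counts(trials, seed=0):
    """Subsample so that each response category has equal counts per condition.

    trials: list of (condition, response, trial_id) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for cond, resp, tid in trials:
        cells[(cond, resp)].append(tid)

    conditions = sorted({cond for cond, _ in cells})
    responses = sorted({resp for _, resp in cells})

    kept = []
    for resp in responses:
        # The smallest cell for this response sets the target count.
        n = min(len(cells.get((c, resp), [])) for c in conditions)
        for c in conditions:
            # Randomly pick n trials from each condition's cell.
            for tid in rng.sample(cells.get((c, resp), []), n):
                kept.append((c, resp, tid))
    return kept


# Made-up counts: a liberal criterion yields more 'seen' responses,
# a conservative criterion yields more 'unseen' responses.
trials = (
    [("liberal", "seen", i) for i in range(30)]
    + [("liberal", "unseen", i) for i in range(30, 40)]
    + [("conservative", "seen", i) for i in range(40, 52)]
    + [("conservative", "unseen", i) for i in range(52, 80)]
)
balanced = equate_trial_counts(trials)
counts = Counter((cond, resp) for cond, resp, _ in balanced)
```

      After subsampling, the 'seen' cells hold the same number of trials in both conditions, and likewise for the 'unseen' cells, so response-contingent comparisons receive identical amounts of test data.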

      To remove any lack of clarity about this, we now also briefly report in the beginning of the discussion section that the results cannot be explained by unequal trial counts:

      “We found that in both experiments, criterion shifts modulated effect size in neural measures of ‘unconscious’ (unseen) and/or ‘conscious’ (seen) processing, and that this happens even though the conservative and liberal condition used the same independent training data (identical classifiers), and even though the trial counts in the test sets were equated for the conservative and liberal condition.”
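      The leave-one-person-out scheme quoted in point (1) above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' pipeline: it uses a single simulated feature at a single time point instead of multichannel EEG, a one-feature LDA with equal priors, and hypothetical helper names (`fit_lda_1d`, `auc`, `leave_one_person_out`, `simulate`). One classifier is trained per participant, each participant is tested with every classifier except their own, and the resulting AUC values are averaged.

```python
import random
from statistics import mean


def fit_lda_1d(x, y):
    """Fit a one-feature LDA with equal priors; return (weight, bias)."""
    x0 = [v for v, lab in zip(x, y) if lab == 0]
    x1 = [v for v, lab in zip(x, y) if lab == 1]
    m0, m1 = mean(x0), mean(x1)
    # Pooled within-class variance, shared by both classes as LDA assumes.
    ss = sum((v - m0) ** 2 for v in x0) + sum((v - m1) ** 2 for v in x1)
    var = ss / (len(x) - 2)
    w = (m1 - m0) / var
    b = -w * (m0 + m1) / 2
    return w, b


def auc(scores, y):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, y) if lab == 1]
    neg = [s for s, lab in zip(scores, y) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_person_out(data):
    """data: {participant: (features, labels)} -> {participant: mean AUC}."""
    models = {p: fit_lda_1d(x, y) for p, (x, y) in data.items()}
    result = {}
    for test_p, (x, y) in data.items():
        aucs = []
        for train_p, (w, b) in models.items():
            if train_p == test_p:
                continue  # a participant is never tested with their own classifier
            aucs.append(auc([w * v + b for v in x], y))
        result[test_p] = mean(aucs)
    return result


def simulate(n_participants=5, n_trials=200, effect=1.0, seed=0):
    """Simulated target (1) vs no-target (0) trials with one noisy feature."""
    rng = random.Random(seed)
    data = {}
    for p in range(n_participants):
        y = [rng.randint(0, 1) for _ in range(n_trials)]
        x = [rng.gauss(effect * lab, 1.0) for lab in y]
        data[p] = (x, y)
    return data


scores = leave_one_person_out(simulate())
```

      Because every test set is scored by classifiers that never saw that participant's data, sorting the test trials by criterion condition or by response afterwards cannot leak into training.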

      Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participant reports on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.

      Strengths and Weaknesses

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      Our initial review identified a lack of measures of variance as one potential weakness of this work. However, we agree with the authors' response that plotting individual datapoints for each condition is indeed a good visualization of variance within a dataset.

      Impact of the Work:

      This study effectively demonstrates a phenomenon that, while understood within the context of signal detection theory, has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

      We thank reviewer 3 for their positive words and for taking the time to evaluate our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (1) The rationale for performing genomics, transcriptional, and proteomics work in 293T cells is not discussed. Further, there are no functional readouts mentioned in the 293T cells with expression of the fusion-oncogenes. Did these cells have any phenotypes associated with fusion-oncogene expression (proliferation differences, morphological changes, colony formation capacity)? Further, how similar are the gene expression signatures from RNA-seq to rhabdomyosarcoma? This would help the reader interpret how similar these cell models are to human disease.

      We appreciate the reviewer’s comments and understand the limitation of HEK293T cell culture. HEK293T cells were used as a surrogate system that enabled us to systematically examine and compare the transcriptional activation mechanisms between VGLL2-NCOA2/TEAD1-NCOA2 and YAP/TAZ. HEK293T cells have previously been used as a model system to study the signaling and transcriptional mechanisms of the Hippo/YAP pathway (1,2). Our data also showed that the ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 in HEK293 cells can promote proliferation (Figure 1-figure supplement 1B), consistent with their potential oncogenic function.

      (2) TEAD1::NCOA2 fusion-oncogene model was not credentialed past H&E, and expression of Desmin. Is the transcriptional signature in C2C12 or 293T similar to a rhabdomyosarcoma gene signature?

      We understand the reviewer’s concern. A VGLL2-NCOA2 in vivo tumorigenesis model generated by C2C12 cell orthotopic transplantation has recently been reported, and it exhibits characteristics similar to those of zebrafish transgenic tumors as well as human scRMS samples that carry the VGLL2-NCOA2 fusion (3). Due to the similar transcriptional and oncogenic mechanisms employed by both VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins, we expect that the TEAD1-NCOA2 dependent C2C12 transplantation model will closely resemble that induced by VGLL2-NCOA2.

      (3) For the fusion-oncogenes, did the HA, FLAG, or V5 tag impact fusion-oncogene activity? Was the tag on the 3' or 5' of the fusion? This was not discussed in the methods.

      To address the reviewer’s concern, we carefully compared the transcriptional activity of the fusion proteins with the HA tag at the 5’ end or FLAG and V5 tag at the 3’ end. We found that neither the tag type nor its location significantly affects the ability of VGLL2-NCOA2 and TEAD1-NCOA2 to induce downstream gene transcription, measured by qPCR. The data is summarized in Figure 1-figure supplement 1 G-H.

      (4) Generally, the lack of details in the figures, figure legends, and methods make the data difficult to interpret. A few examples are below:

      a. Individual data points are not shown for figure bar plots (how many technical or biological replicates are present and how many times was the experiment repeated?).

      As requested, we have added the individual data points to the bar plots. The Methods section now includes information on the number of biological replicates and the number of times the experiments were repeated.

      b. What exons were included in the fusion-oncogenes from VGLL2 and NCOA2 or TEAD1 and NCOA2?

      We have now included the exon structure organization of VGLL2-NCOA2 or TEAD1-NCOA2 fusions in Figure 1-figure supplement 1A.

      c. For how long were the colony formation experiments performed? Two weeks?

      We have included more detailed information about the colony formation assay in the Methods section.

      d. In Figure 2D, what concentration of CP1 was used and for how long?

      The CP1 concentration and treatment duration information has now been included in the figure legend and Methods section.

      e. How was A485 resuspended for cell culture and mouse experiments, what is the percentage of DMSO?

      The Methods section now includes detailed information on how A485 is prepared for in vitro and in vivo experiments.

      f. How many replicates were done for RNA-seq, CUT&RUN, and ATACseq experiments?

      RNA-seq was done with three biological replicates and CUT&RUN and ATAC-seq were performed with two biological replicates. This information is now included in the Methods section for clarification.

      Reviewer #2 (Public Review):

      In the manuscript entitled "VGLL2 and TEAD1 fusion proteins drive YAP/TAZ-independent transcription and tumorigenesis by engaging p300", Gu et al. studied two Hippo pathway-related gene fusion events (i.e., VGLL2-NCOA2, TEAD1-NCOA2) in spindle cell rhabdomyosarcoma (scRMS) and showed that their fusion proteins can activate Hippo downstream gene transcription independent of YAP/TAZ. Using the BioID-based mass spectrometry analysis, the authors revealed histone acetyltransferase CBP/p300 as specific binding proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Pharmacologically targeting p300 inhibited the fusion proteins-induced Hippo downstream gene transcription and tumorigenic events.

      Overall, this study provides mechanistic insights into the scRMS-associated gene fusions in tumorigenesis and reveals potential therapeutic targets for cancer treatment. The manuscript is well-written and easy to follow.

      Here, several suggestions are made for the authors to improve their study.

      Main points

      (1) The authors majorly focused on the Hippo downstream gene transcription in this study, while a significant portion of genes regulated by the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins are non-Hippo downstream genes (Figure 3). The authors should investigate whether the altered Hippo pathway transcription is essential for VGLL2-NCOA2 and TEAD1-NCOA2-induced cell transformation and tumorigenesis. Specifically, they should test if treatment with the TEAD inhibitor can reverse the cell transformation and tumorigenesis caused by VGLL2-NCOA2 but not TEAD1-NCOA2. In addition, it is important to examine whether YAP-5SA expression can rescue the inhibitory effects of A485 on VGLL2-NCOA2 and TEAD1-NCOA2-induced colony formation and tumor growth. This will help clarify whether Hippo downstream gene transcription is important for the oncogenic activities of these two fusion proteins.

      We thank the reviewer for the comments. Although we have not tested the small-molecule TEAD inhibitor on VGLL2-NCOA2- or TEAD1-NCOA2-induced cell transformation and tumorigenesis, we expect that TEAD inhibition will block VGLL2-NCOA2- but not TEAD1-NCOA2-induced oncogenic activity. This is because TEAD1-NCOA2 does not contain the auto-palmitoylation sites and the hydrophobic pocket in the C-terminal YAP-binding domain of TEAD1 that the TEAD small-molecule inhibitor occupies (4). We also appreciate the reviewer’s suggestion of YAP5SA rescue experiments. However, due to its strong oncogenic activity, YAP5SA itself can induce robust downstream transcription and cell transformation with or without A485 treatment, as shown in Figure 5. Thus, this experiment is unlikely to resolve whether non-Hippo downstream genes induced by the fusions are important for cell transformation and tumorigenesis. Because of the distinct nature of transcriptional and chromatin landscapes controlled by VGLL2-NCOA2/TEAD1-NCOA2 and YAP, we speculate that both Hippo and non-Hippo-related downstream genes contribute to the oncogenic activation and tumor phenotypes induced by the fusion proteins.

      (2) Rationale for selecting CBP/p300 for functional studies needs to be provided. The BioID-MS experiment identified many interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins (Table S4). The authors should explain the scoring system used to identify the high-interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Were CBP/p300 the top candidates on the list? Providing this information will help justify the focus on CBP/p300 and validate their importance in this study.

      We appreciate the reviewer’s point. CBP/P300 is among the top hits in our proteomics screens of both VGLL2-NCOA2 and TEAD1-NCOA2. Our focus on CBP/P300 is mainly due to the well-established interactions between CBP/P300 and the NCOA family transcriptional co-activators, in which the CBP/P300-NCOA complex plays a central role in mediating nuclear receptor-induced transcriptional activation (5). In addition, our data is consistent with another recurrent VGLL2 fusion identified in scRMS, VGLL2-CITED2 (6), which has a C-terminal fusion partner from CITED2, a known CBP/P300-interacting protein (7).

      (3) p300 was revealed as a key driver for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins-induced transcriptome alteration and tumorigenesis. To strengthen the point, the authors should identify the p300 binding region on VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Mutants with defects in p300 binding/recruitment should be generated and included as a control in the related q-PCR and tumorigenic studies. This work will help confirm the crucial role of p300 in mediating the oncogenic effects of these two fusion proteins.

      We thank the reviewer for the suggestion. We have performed additional co-immunoprecipitation experiments using a deletion mutant of VGLL2-NCOA2 and demonstrated that the C-terminal NCOA2 part of the fusion is responsible for mediating the interaction between the fusion protein and CBP/P300. These results are now included in the new Figure 5A and are consistent with the reported structural analysis of the CBP/P300-NCOA complex (8). In addition, our new data showed the inability of the VGLL2-NCOA2 ∆NCOA2 mutant to induce gene transcription (Figure 1-figure supplement 1D). Furthermore, our data using the small-molecule CBP/P300 inhibitor clearly demonstrated that CBP/P300 is required to mediate cell transformation and tumorigenesis induced by the two fusion proteins in vitro and in vivo (Figures 5 and 6).

      (4) Another major issue is the overexpression system extensively used in this study. It is important to determine whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in cancer. If not, the expression levels of the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins should be adjusted to endogenous levels to assess their oncogenic effects on gene transcription and tumorigenesis. This approach would make the study more relevant to the pathological conditions observed in scRMS cancer patients.

      We appreciate the reviewer’s input and acknowledge the limitation of the HEK293T and C2C12 cell-based models that rely on ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. It is currently unclear whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in sarcoma. As mentioned before, these surrogate cell culture systems allowed us to systematically compare the transcriptional regulation by the fusion proteins and YAP/TAZ and elucidate the molecular mechanism underlying the Hippo/YAP-independent oncogenic transformation induced by VGLL2-NCOA2 and TEAD1-NCOA2.

      References:

      (1) Genes Dev. 2007 Nov 1;21(21):2747-61. doi: 10.1101/gad.1602907. Inactivation of YAP oncoprotein by the Hippo pathway is involved in cell contact inhibition and tissue growth control

      (2) Genes Dev. 2010 Jan 1;24(1):72-85. doi: 10.1101/gad.1843810. A coordinated phosphorylation by Lats and CK1 regulates YAP stability through SCF(beta-TRCP)

      (3) VGLL2-NCOA2 leverages developmental programs for pediatric sarcomagenesis. Watson S, LaVigne CA, Xu L, Surdez D, Cyrta J, Calderon D, Cannon MV, Kent MR, Cell Rep. 2023 Jan 31;42(1):112013.

      (4) Lats1/2 Sustain Intestinal Stem Cells and Wnt Activation through TEAD-Dependent and Independent Transcription. Cell Stem Cell. 2020 May 7;26(5):675-692.e8.

      (5) Yi, P., Yu, X., Wang, Z., and O’Malley, B.W. (2021). Steroid receptor-coregulator transcriptional complexes: new insights from CryoEM. Essays Biochem. 65, 857–866.

      (6) A Molecular Study of Pediatric Spindle and Sclerosing Rhabdomyosarcoma: Identification of Novel and Recurrent VGLL2-related Fusions in Infantile Cases. Am J Surg Pathol. 2016 Feb;40(2):224-35. doi: 10.1097/

      (7) CITED2 and the modulation of the hypoxic response in cancer. Fernandes MT, Calado SM, Mendes-Silva L, Bragança J. World J Clin Oncol. 2020 May 24;11(5):260-274.

      (8) Yu, X., Yi, P., Hamilton, R.A., Shen, H., Chen, M., Foulds, C.E., Mancini, M.A., Ludtke, S.J., Wang, Z., and O’Malley, B.W. (2020). Structural insights of transcriptionally active, full-length Androgen receptor coactivator complexes. Mol. Cell 79, 812–823.e4.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      Thank you for your comment. Studies of larger cohorts will be needed to determine the external validity of these results in a clinical scenario. However, this pilot study, the first of its kind, was designed to maximize homogeneity across participants, which was primarily ensured by studying females only and by having a single experienced observer make the diagnoses.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

      Certainly, experimental approaches may add robustness to the implication of HERVs in ME/CFS. We are indeed considering this avenue in future work to build on the findings presented here. However, the limited knowledge of HERV-mediated physiological functions may slow progress toward revealing the causes and effects of HERV expression in ME/CFS and FM.

      Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga et al. carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking a better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses of this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      This is an important point. However, the low number of probes (fewer than 100) that were excluded from our analysis for lack of correspondence with hg38, among the 1,290,800 probesets, was interpreted as insignificant for "genome-wide" claims. This will be explained in the revised version of this manuscript.

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      Certainly, these two sentences need rephrasing to better adjust to current evidence.

      Revised sentences can now be found in lines 397-402.

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Using control DE HERV as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERV (cluster 4) associate with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6). The impact of these associations deserves further study.

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      We agree with the reviewer that sharing genomic locations of DE HERVs in these pathologies would contribute to the development of these findings. Unfortunately, we do not hold the rights to share probe coordinates from this custom HERV-V3 microarray which we used under MTA agreement with its developer.

      Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations/questions:

      (1) The authors point towards the biomarker potential of HERV expression signatures. In line with this, it would be important to test if they can predict the correct pathology for patients using the expression of DE HERVs. Additionally, as a single clinician annotated the cohort analysed in this study, it would be interesting to validate the signatures identified in this work by reanalysing publicly available transcriptomic data from independent studies.

      Thank you for the suggestion. We plan to conduct this analysis and have added the following statement to the manuscript (lines 482-483): “Given the limited sample size in our cohort, validation of the findings in extended cohorts is a must.”

      (2) The authors suggest that an epigenetic mechanism causes the dysregulated HERV expression in ME/CFS patients. However, in Fig.1A, HERV expression profiles of co-diagnosed patients are more similar to healthy controls than patients with either condition. How could the co-morbidity of FM "rescue" the phenotype of ME/CFS?

      Thank you for the insightful comment. It is notable that co-diagnosed patients exhibit HERV expression profiles more similar to those of healthy controls than to those of patients with either FM or ME/CFS alone. These findings may suggest a distinct underlying pathomechanism for this patient group, supporting the identification of a novel nosologic entity, as discussed in lines 372-374 of the manuscript.

      (3) Abundant evidence in the literature links HERV dysregulation with the production of RNA:DNA hybrids and dsRNAs and viral mimicry. The authors found that ME/CFS subgroup 2, which exhibits the most important HERV dysregulation, is also associated with decreased signatures of pathogen detection. It would be interesting to quantify the abundance of DNA:RNA hybrids and dsRNAs in PBMCs of ME/CFS and FM patients as well as healthy controls. It would be interesting to discuss how downregulation of pathogen detection pathways could be a mechanism in ME/CFS patients to avoid viral mimicry and potential links with inflammation in this disease.

      Certainly, HERVs can influence disease pathophysiology by generating RNA:DNA hybrids and dsRNA. However, microarray data does not allow this analysis. Future actions to investigate the underlying mechanisms of differentially expressed HERVs could investigate this interesting possibility.

      (4) Another intriguing result is how overexpression of Module 3 in ME/CFS subgroup 2 is associated with higher levels of plasma cells. The authors hypothesize that the changes in immune cell abundances reflect previous viral infections, but another possibility would be immune activation against HERVs. Are there protein-coding sequences (gag, pro, pol, env) amongst the HERV sequences of module 3? If so, it would be interesting to validate HERV protein expression in these samples. Additionally, blood samples of ME/CFS patients and healthy controls should be analysed in flow cytometry to describe the abundance and phenotype of immune cells precisely.

      Thank you for your insightful comments. In fact, we identified three HERV elements with protein-coding regions whose functional relevance remains uncertain. They present an interesting avenue for future investigation, particularly regarding immune activation.

      Minor comments:

      (1) On lines 170-172, it is unclear to me how Figure 1E is linked to the text.

      We have added a line better explaining Fig. 1E: “Top 10 contributing HERVs to principal components PC1 and PC2 are shown” (lines 171-172).

      (2) Figure S2: grouping or colouring the plots based on the cluster to which HERVs were assigned could facilitate the understanding of the figure.

      We appreciate the suggestion to enhance the clarity of the figures. However, this color-coding cannot be implemented, as a family is not exclusively assigned to a single cluster.

      (3) How are the 4 HERV clusters of Figure 2 and the 8 modules of Figure 3 related to the clusters identified by hierarchical clustering in Figure 1? More details should be provided in the text (Results and Methods sections), and figures to illustrate the clustering strategy should be added if needed.

      To enhance clarity, we have included the following explanation in the results section (lines 244-251): “To uncover potentially affected physiologic functions linked to DE HERV, we examined how DE HERVs and DE genes with similar expression patterns grouped together in modules based on their intrinsic relationships by their hierarchical co-clustering (Fig. 3). Then, the functional significance of these modules was assessed by gene ontology (GO) analysis of the DE genes within each module. The hierarchical clustering analysis resulted in the identification of eight distinct modules, each characterized by unique combinations of DE HERV and DE gene patterns across all four study groups (Fig. 3)”.

      (4) Related to Figure 4, are there HERV sequences in module 3 located near genes important for plasma cells and/or resting CD4 memory T cells?

      Thank you for your insightful comment. However, gene relevance for plasma cells and/or resting CD4 memory T cells may depend on multiple factors in addition to cell type and subtypes and, therefore, the analysis may not be straightforward.

      Reviewer #2 (Recommendations for the authors):

      In Figure 1, the heatmap scale goes from -4 to 4. This should reflect at least the numbers on the lowest and highest end of the scale.

      Thank you for bringing this to our attention. The scale was correct; however, when arranging the panels, the numbers were not properly positioned. The figure has now been updated with the corrected version.

      Figure 2F and G, percentages are shown as decimal numbers up to 1.00, while it should be 100%, and so on.

      We also replaced this figure, changing the numbers to fit percentages.

      It would be interesting to know how the results change using FDR of 0.05. I'm not familiar with microarray thresholds, but in RNA-Seq, 0.1 is rarely used, with 0.05 being the standard. Could it be that a more stringent result better distinguishes the pathologies?

      Applying a more stringent threshold, such as FDR 0.05, may remove sequences that, while not strongly differentially expressed, may still be important for distinguishing between these pathologies. Therefore, we decided to also include DE tendencies (FDR<0.1) in this first-of-its-kind study. Findings will need validation in enlarged cohorts.
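      The effect of moving the threshold from FDR<0.1 to FDR<0.05 can be illustrated with a small sketch. The response does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the p-values are made up purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five probesets (not taken from the study).
raw = [0.2, 0.001, 0.07, 0.012, 0.035]
adj = benjamini_hochberg(raw)

n_strict = sum(a < 0.05 for a in adj)   # probesets kept at FDR < 0.05
n_lenient = sum(a < 0.1 for a in adj)   # probesets kept at FDR < 0.1
```

      In this toy example the lenient threshold retains more probesets than the strict one, which is the trade-off the response describes: a larger candidate set at the cost of a higher expected proportion of false discoveries.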

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the interaction between tissue-resident immune cells (microglia) and circulating systemic neutrophils in response to acute, focal retinal injury. They induced retinal lesions using 488 nm light to ablate photoreceptor (PR) outer segments, then utilized various imaging techniques (AOSLO, SLO, and OCT) to study the dynamics of fluorescent microglia and neutrophils in mice over time. Their findings revealed that while microglia showed a dynamic response and migrated to the injury site within a day, neutrophils were not recruited to the area despite being nearby. Post-mortem confocal microscopy confirmed these in vivo results. The study concluded that microglial activation does not recruit neutrophils in response to acute, focal photoreceptor loss, a scenario common in many retinal diseases.

      Strengths:

      The primary strength of this manuscript lies in the techniques employed.

      In this study, the authors utilized advanced Adaptive Optics Scanning Laser Ophthalmoscopy (AOSLO) to document immune cell interactions in the retina accurately. AOSLO's micron-level resolution and enhanced contrast, achieved through near-infrared (NIR) light and phase-contrast techniques, allowed visualization of individual immune cells without extrinsic dyes. This method combined confocal reflectance, phase-contrast, and fluorescence modalities to reveal various cell types simultaneously. Confocal AOSLO tracked cellular changes with less than 6 μm axial resolution, while phase-contrast AOSLO provided detailed views of vascular walls, blood cells, and immune cells. Fluorescence imaging enabled the study of labeled cells and dyes throughout the retina. These techniques, integrated with conventional histology and Optical Coherence Tomography (OCT), offered a comprehensive platform to visualize immune cell dynamics during retinal inflammation and injury.

      Thank you!

      Weaknesses:

      One significant weakness of the manuscript is the use of Cx3cr1GFP mice to specifically track GFP-expressing microglia. While this model is valuable for identifying resident phagocytic cells when the blood-retinal barrier (BRB) is intact, it is important to note that recruited macrophages also express the same marker following BRB breakdown. This overlap complicates the interpretation of results and makes it difficult to distinguish between the contributions of microglia and infiltrating macrophages, a point that is not addressed in the manuscript.

      We agree that the manuscript should state more clearly that CX3CR1-GFP mice exhibit fluorescence not only in microglia but also in other cells of macrophage origin, including monocytes, perivascular macrophages and some hyalocytes.

      Through the advantages of in vivo AOSLO, however, we are able to establish that CX3CR1 cells are present within the tissue before the laser lesion is placed. This suggests they are tissue resident. We agree that it is possible that at later time points (days-weeks), systemic macrophages and/or monocytes may participate. The lack of rolling/crawling cells suggests they are not systemic. We elaborate on this point in a new section in the discussion:

      P29 L534-541:

      “CX3CR1-GFP mice exhibit fluorescence not only in microglia

      We recognize that the CX3CR1-GFP model can also label systemic cells such as monocytes/macrophages(77). While it is possible these cells could infiltrate the retina in response to the lesion, we find it unlikely since there was no indication of the leukocyte extravasation cascade (rolling/crawling/stalled cells) within the nearest retinal vasculature. In addition to microglia, retinal perivascular macrophages and hyalocytes also exhibit GFP fluorescence and thus these cells may also contribute toward damage resolution.”

      Another major concern is the time point chosen for analyzing the neutrophil response. The authors assess neutrophil activity 24 hours after injury, which may be too late to capture the initial inflammatory response. This delayed assessment could overlook crucial early dynamics that occur shortly after injury, potentially impacting the overall findings and conclusions of the study.

      The power of in vivo imaging makes these early assessments possible. Therefore, we have taken the reviewer’s concern on board and conducted an additional experiment that examines whether neutrophils are seen in the window of time between lesion and 24 hours. In a newly examined mouse, we find that within 3.5 hours post-lesion, neutrophils do not extravasate adjacent to the lesion site (see new “figure 8 – figure supplement 1”).

      Also see accompanying video (new “figure 8 – video 3”) for an example of nearby neutrophils flowing through OPL capillaries just microns away from the lesion site. Neutrophils are clearly contained within the vasculature and exhibit dynamics consistent with healthy retinal tissue. While it remains possible that the lesion may increase leukocyte stalling within the nearest capillaries, we are unable to confirm or deny this with a single experiment. We now submit this evidence as a new supplementary figure following the reviewer’s suggestion.

      Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by an ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      We appreciate this recognition and hope that the reviewer considers the weaknesses below in the context of the paper’s identified strengths.

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      We agree and have taken the following steps to address this:

      (1) Paper has been shortened overall by 8%

      (2) We reorganized the following sections:

      a. Introduction: shortened

      b. Methods: merged section “Ex vivo confocal image processing” with “Ex vivo confocal imaging”.

      c. Results: most sections shortened, others simplified for concision

      d. Discussion: most sections shortened, removed “Microglial/neutrophil discrimination using label-free phase contrast”

      e. Figure references reorganized in order of their appearance.

      Study weakness: though the finding prompts more questions and future studies, the findings discussed in this paper are potentially important for us to understand how the immune cells respond differently to different severity levels of injury.

      On the heels of this burgeoning technology, we consider this report among the first studies of its kind. We are hopeful that it forms the foundation of many further investigations to come. We expect a rich parameter space to be explored with future studies including investigation of other time points, other injuries of varying degree and other immune cell populations (along with their interactions with each other). Each has the potential to reveal the complexities of the ocular immune system in action.

      Reviewer #3 (Public review):

      Summary:

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature, and the pigment epithelium. As early as 1-day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation at the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network constituting one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience. However, there are some issues with the conclusions drawn from the data and analysis that can be addressed to further bolster the manuscript.

      Strengths:

      Adaptive optics imaging of the murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is beneficial for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      Thank you!

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used, (a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, (b) two different models of inflammation - endotoxin-induced uveitis (EAU) and laser ablation are used to study differences in the immune interaction.

      Thank you!

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. While not directly shown in the article, this model would potentially allow for controlling the size, depth, and severity of the laser injury opening interesting avenues for future study.

      We agree that there is an established community that is invested in developing titrated dosimetry for light damage models. As the reviewer recognizes, this parameter space is exceptionally large; we therefore controlled it by choosing a single wavelength that is commonly used in ophthalmoscopy (488 nm) and a fixed duration and exposure regime that created reproducible, mild damage of photoreceptors. At this titration, we created a mild lesion that spares the retina above and below.

      Weaknesses:

      (1) It is unclear based on the current data/study to what extent the mild laser damage phenotype is generalizable to disease phenotypes. The outer nuclear cell loss of 28% and a complete recovery in 2 months would seem quite mild, thus the generalizability in terms of immune-mediated response in the face of retinal remodeling is not certain, specifically whether the key finding regarding the lack of neutrophil recruitment will be maintained with a stronger laser ablation.

      It seems the concern here is whether our finding is generalizable to other damage regimes, especially more severe ones. While speculative, we would suspect that it is not generalizable across different lesions of greater severity. For example, puncturing Bruch’s membrane is an example of a more severe phenotype that is often encountered in laser damage. However, this creates a complicated model that not only induces inflammation, but also compromises BRB integrity and promotes CNV. The parameter space to be tested in the reviewer’s question is quite vast, and we have therefore tried to summarize the generalizability within our manuscript in:

      P31 L586-588 “There are limitations on how generalizable this mild damage is to more severe damage or disease phenotypes, but this acute damage model can begin to provide clues about how immune cells interact in response to PR loss. In this laser lesion model, we ablate 27% of the PRs in a 50 µm region.”

      (2) Mice numbers and associated statistics are insufficient to draw strong conclusions in the paper on the activity of neutrophils, some examples are below:

      a) 2 catchup mice and 2 positive control EAU mice are used to draw inferences about immune-mediated activity in response to injury. If the goal was to show 'feasibility' of imaging these mouse models for the purposes of tracking specific cell type behavior, the case is sufficiently made and already published by the authors earlier. It is possible that a larger sample size would alter the conclusion.

      We would like to highlight that the total number of mice studied in this report was 28 (18 in-vivo imaging, 10 ex-vivo histology, >40 lesions total). While power analysis is challenging as these are the first studies of their kind, we underscore that in vivo imaging allows those same mice to be studied multiple times longitudinally. This is not possible with traditional histology. Therefore, in vivo imaging not only reveals the temporal progression (unlike histology), but also increases the number of observations beyond a simple count of the “number of mice”.

      The goal of the study was not one of feasibility. The goal was to address a specific question in ocular biology: “do resident CX3CR1 cells recruit neutrophils in early, regional retinal injury?”

      The low numbers that the reviewer points to are not the primary data of the paper, but rather supportive control data. Moreover, we refocus attention on the fact that our study was performed on 28 mice across multiple modalities, each of which corroborates a common finding that neutrophils do not appear to be recruited despite a strong microglial response, a central finding of the paper.

      b) There are only 2 examples of extravasated neutrophils in the entire article, shown in the positive control EAU model. With the rare extravasation events of these cells and their high-speed motility, the chance of observing their exit from the vasculature is likely low overall, therefore the general conclusions made about their recruitment or lack thereof are not justified by these limited examples shown.

      The spirit of the challenge raised is that the fact that nothing was seen is not proof that nothing occurred. Said more commonly, “absence of evidence is not evidence of absence”, a quote often attributed to Carl Sagan. Yet we push back on this conjecture, as we have shown, not only with cutting-edge in vivo imaging but also with ample histological controls and multiple transgenic animals (and corroborating IHC antibodies), that in none of these imaging modalities, at none of the time points we evaluated, did neutrophils aggregate or extravasate in response to photoreceptor ablation.

      Reviewer adds: “the chance of observing their exit from the vasculature is likely low overall…”

      This is the reason that we specifically chose a focal lesion model: to increase any possible chance of imaging a rare event. The focal lesion provides both a time and a location for “where” to look. Small 50 micrometer lesions were sufficient to drive a strong local microglial response (Figures 5, 6, 9). This was evidence that local inflammatory cues were present. Yet despite this activation, neutrophils were not recruited to this location. We emphasize that this is a strength of our approach over other pan-retinal damage models that may indeed miss the rare extravasation events that are geographically sparse and happen over hours.

      c) In Figure 3, the 3-day time point post laser injury shows an 18% reduction in the density of ONL nuclei (p-value of 0.17 compared to baseline). In the case of neutrophils, it is noted that "Control locations (n = 2 mice, 4 z-stacks) had 15 ± 8 neutrophils per sq.mm of retina whereas lesioned locations (n = 2 mice, 4 z-stacks) had 23 ± 5 neutrophils per sq.mm of retina (Figure 10b). The difference between control and lesioned groups was not statistically significant (p = 0.19)." These data both come from histology. While the p-values - 0.17 and 0.19 - are similar, in the first case a reduction in ONL cell density is concluded while in the latter, no difference in neutrophil density is inferred in the lesioned case compared to control. Why is there a difference in the interpretation where the same statistical test and methodology are used in both cases? Besides this statistical nuance, is there an alternate possibility that there is an increased, albeit statistically insignificant, concentration of circulating neutrophils in the lesioned model? The increase is nearly 50% (15 ± 8 vs. 23 ± 5 neutrophils per sq.mm) and the reader may wonder if a larger animal number might skew the statistic towards significance.

      The statistics and p-values depend on the analysis strategy. As described in the methods, we used a predetermined 50 micron cylinder for our counting analysis, based on the average lesion size created. This circular window roughly approximates the typical lesion size. However, recall that the damage is created along a single axis (a line projected on the retina); it is therefore possible that the analysis region is too generous to capture the exceptionally local damage.

      While the reviewer is focused on the nuance of statistics, we would like to refocus the conversation on our data showing that very few neutrophils were observed at all (105 cells from 8 locations, P value reported). What the above critique misses is that all neutrophils were contained within capillaries (Fig 10). We found no examples of extravasated neutrophils. This is the major finding and is supported by our in vivo as well as ex vivo confirmation.
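
      The circular counting window described above can be sketched as follows. This is an illustrative reconstruction, not the analysis code used in the study: it assumes the 50 micron figure is the window diameter, treats each cell as a single (x, y) centroid in microns, and uses invented coordinates.

```python
import math


def count_in_circle(points_um, center_um, diameter_um=50.0):
    """Count (x, y) centroids (microns) inside a circular window (assumed diameter)."""
    radius = diameter_um / 2.0
    cx, cy = center_um
    return sum(1 for x, y in points_um if math.hypot(x - cx, y - cy) <= radius)


def density_per_mm2(count, diameter_um=50.0):
    """Convert a count inside the circular window to cells per square millimetre."""
    area_um2 = math.pi * (diameter_um / 2.0) ** 2
    return count / (area_um2 * 1e-6)  # 1 mm^2 = 1e6 um^2


# Hypothetical centroids relative to the lesion centre (illustration only).
centroids = [(0, 0), (10, 0), (0, 24), (30, 0), (-18, -18)]
n_inside = count_in_circle(centroids, center_um=(0, 0))
density = density_per_mm2(n_inside)
```

      Because a line-shaped exposure covers less area than the circle, a window of this size also samples undamaged tissue around the lesion, which would dilute any local change in density. That is the sense in which the analysis region may be "too generous".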

      (2) The conclusions on the relative activity of neutrophils and microglia come from separate animals. The reader may wonder why simultaneous imaging of microglia and neutrophils is not shown in either the EAU mice or the fluorescently labeled catchup mice where the non-labeled cell type could possibly be imaged with phase-contrast as has been shown by the authors previously. One might suspect that the microglia dynamics are not substantially altered in these mice compared to the CX3CR1-GFP mice subjected to laser lesions, but for future applicability of this paradigm of in vivo imaging assessment of the laser damage model, including documenting the repeatability of the laser damage model and the immune cell behavior, acquiring these data in the same animals would be critical.

      A double fluorescent mouse (neutrophils and microglia) is a logical next step of this research. In fact, we have now crossed these transgenic mice and are studying this double-labeled mouse in a second manuscript in preparation. However, for this study, it was imperative that the fluorescent imaging light was kept at low levels so as not to contribute to or alter the lesion phenotype and accompanying immune response. Imaging two fluorescent channels to simultaneously view neutrophils and microglia in the same animal would have required at least 2X the visible light exposure for imaging. The imaging light levels used in the current study were carefully examined in our previous publications so as not to create additional light damage (Joseph et al 2021).

      (3) Along the same lines as above, the phase contrast ONL images at time points from 3-day to 2-month post laser injury are not shown and the absence of this data is not addressed. This missing data pertains only to the in vivo imaging mice model but are conducted in histology that adequately conveys the time-course of cell loss in the ONL.

      The ocular preparation used for the phase contrast data in figure 2 unfortunately developed an anesthesia-induced cataract that precluded adequate image quality. This is not uncommon in long-term mouse ocular imaging preparations (Feng et al 2023). Instead, we chose to include the phase-contrast data for baseline and 1 day, which show the visually compelling intact and disrupted ONL and demonstrate that the damage is not only focal, but also clearly disrupts the somatic layers of the photoreceptors.

      It is suggested that the reason be elaborated for the exclusion of this data and the simultaneous imaging of microglia and neutrophils mentioned above.

      We agree and we have included the reason for the “not acquired” data within the figure 2 legend:

      “Phase contrast data was not acquired for time points 3 days-2 months due to development of cataract which obscured the phase contrast signal”

      Also, it would be valuable to further qualify and check the claims in the Discussion that "ex vivo analysis confirms in vivo findings" and "Microglial/neutrophil discrimination using label-free phase contrast"

      We maintain that ex vivo analysis both corroborates and, in many cases, confirms our in vivo findings. We feel this is a strength of our manuscript rather than a qualifier. A) Damage localization is visible with OCT and confocal/phase contrast AOSLO in a region that matches the DAPI loss we see ex vivo. B) Disruption of the ONL seen with in vivo AOSLO is of the same size, shape and location as the ONL damage quantified ex vivo. C) No damage or disruption was seen in locations above the lesion with OCT or AOSLO, which matches our finding that only the ONL shows loss of nuclei whereas other more superficial layers are spared. D) Microglial localization is found both in vivo and ex vivo. E) Neutrophil aggregation or extravasation was seen neither in vivo nor ex vivo. Given the evidence above, we contend that this synergistic and complementary approach corroborates the experimental data through two ways of studying this tissue.

      We agree that the claims made in the section entitled “Microglial/neutrophil discrimination using label-free phase contrast” are not strongly supported by the phase-contrast imaging presented in this paper. Accordingly, we have removed this section following the reviewer’s suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Based on the title and abstract, the main focus of the manuscript appears to be the immune response. However, most of the manuscript is dedicated to the authors' imaging technique. Additionally, several important concerns regarding the investigation of the immune response in the retina need to be addressed.

      We understand that emphasis may appear to be on the imaging technique, however, because AOSLO is not a widely used technology, we are committed to explaining the technique so that it both builds awareness and confidence in the way this exciting new data is acquired.

      (2) The authors indicate '1 day post-injury' as a timeframe spanning between 18 and 28 hours post-injury. This is a rather wide window of time, which could potentially affect the analysis. It is necessary to demonstrate that there is no significant difference in the immune response, particularly in terms of microglial morphology and branch orientation, between 18 and 28 hours post-injury.

      We agree that a fine time scale may show even greater insight to the natural history of the inflammatory response. However, we feel that our chosen time points go above and beyond the temporal precision that is offered by other investigations, especially considering the novel multi-modal imaging performed here. Studies using finer temporal sampling are poised for future investigation.

      (3) The authors should consider using additional markers or complementary techniques to differentiate between microglia and recruited macrophages, such as incorporating immunohistochemistry with P2RY12, a specific marker for microglia that helps distinguish them from macrophages, and CD68 or F4/80, markers for recruited macrophages. It is also crucial for the authors to include a discussion addressing the limitations of using Cx3cr1GFP mice and the potential impact on result interpretation. It is fundamental to validate the findings and clarify the roles of microglia and macrophages.

      The wonder of current IHC is that there are myriad antibodies and labels that “could” be used. We used those we felt were the most compelling for this stage of early investigation. We look forward to studies that employ this wider range of labels. See our response to reviewer 1’s first comment above, which addresses the limitations of using Cx3cr1GFP mice.

      (4) Analyzing neutrophil responses at 24 hours post-injury may be too late to capture the critical early dynamics of inflammation. By this time, the initial recruitment and activation phases of neutrophils may have already peaked or begun to resolve, potentially missing key insights into the immediate immune response. The authors should conduct additional analysis of neutrophil responses at earlier time points post-injury, such as 6 or 12 hours. Including these time points would provide a more comprehensive and conclusive analysis of the neutrophil response, helping to delineate the progression of inflammation and its implications for subsequent healing processes.

      This point has been addressed above. Briefly, we have now included a new experiment (and figure + video) that shows no neutrophil extravasation at earlier time points. We thank the reviewer for this helpful suggestion.

      Reviewer #2 (Recommendations for the authors):

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      (1) There was a lengthy description and verification of light-induced injury and longitudinal tracking of healing, which I believe can be further cleaned up and made more succinct.

      We have cleaned up and reorganized the manuscript (see the response above for details), reducing its length by 8%.

      (2) The intention/goal of the paper can be further strengthened. On page 33: "to what extent do neutrophils respond to acute neural loss in the retina?" This particular statement is so clear and really brings out the purpose of this study, and it will be great to see something like this in the opening statement.

      We thank the reviewer for this excellent suggestion. We have modified the final paragraph of the introduction to strengthen our study’s intention.

      P4 L45-47: Here, we ask the question: “To what extent do microglia/neutrophils respond to acute neural loss in the retina?” To begin unraveling the complexities in this response, we deploy a deep retinal laser ablation model.

      (3) The figures are not mentioned in the manuscript in the order they were numbered. It makes it extremely challenging to follow along. The methods/results sections started with Figure 1, then on to Figure 4, then back to Figures 2 and 3, etc. This reviewer recommends re-organizing figures and their order of appearance so the contents of the figures are referred to in the paragraph in the most efficient and clear manner.

      We have re-organized the appearance of figure references throughout the paper.

      (4) Figure 2: phase contrast was not acquired on days 3, 7, and 2 months. Please briefly explain the reason in the caption.

      Addressed above.

      (5) Figure 4 OPL layer, the area highlighted in a dashed circle was meant to demonstrate that perfusion was intact, but I cannot see the flow in the highlighted area very well at day 7 and 2 months (especially 2 months). Please explain.

      Perfusion maps are often difficult to interpret as a static image. Therefore, we have additionally provided the raw video data (“OPL_vasculature_7d” and “OPL_vasculature_2mo”) which helps visualize active perfusion. To the reviewer’s point, videos reveal that RBC motion is maintained in the capillaries of this location.

      (6) While there's a thorough discussion of the biological impact of the finding, the uniqueness of the imaging technique can be better highlighted. Immune response toward injury is highly dynamic and is often the first step of wound healing. To observe such dynamic events longitudinally in the living eye at the cellular level, it requires a special imaging technique such as the type addressed here. The author can better address the technical uniqueness of studying this type of biological event for readers less familiar with AOSLO.

      We agree and following the reviewer’s suggestion have further emphasized the advance in the current manuscript in two additional places:

      (1) Within the introduction

      P3-4 L21-42: “A missed window of interaction is highly problematic in histological study where a single time point reveals a snapshot of the temporally complex immune response, which changes dynamically over time. Here, we use in vivo imaging to overcome these constraints.

      Documenting immune cell interactions in the retina over time has been challenged by insufficient resolution and contrast to visualize single cells in the living eye. The microscopic size of immune cells requires exceptional resolution for detection. Recently, advances in AOSLO imaging have provided micron-level resolution and enhanced contrast for imaging individual immune cells in the retina and without requiring extrinsic dyes(7,23). AOSLO provides multi-modal information from confocal reflectance, phase-contrast and fluorescence modalities, which can reveal a variety of cell types simultaneously in the living eye. Here, we used confocal AOSLO to track changes in reflectance at cellular scale. Phase-contrast AOSLO provides detail on highly translucent retinal structures such as vascular wall, single blood cells(27–29), PR somata(30), and is well-suited to image resident and systemic immune cells.(7,23) Fluorescence AOSLO provides the ability to study fluorescently-labeled cells(25,31,32) and exogenous dyes(27,33) throughout the living retina. These modalities used in combination have recently provided detailed images of the retinal response to a model of human uveitis.(23,34) Together, these innovations now provide a platform to visualize, for the first time, the dynamic interplay between many immune cell types, each with a unique role in tissue inflammation.”

      (2) Within the discussion

      P34-35 L656-662 “Beyond the context of this specific finding, we share this work with the excitement that AOSLO cellular level imaging may reveal the interaction of multiple immune cell types in the living retina. By using fluorophores associated with specific immune cell populations, the complex dynamics that orchestrate the immune response may be examined in this specialized tissue. This work and future studies may reveal further insights to the interactions of single immune cells in the living body in a non-invasive way.”

      Reviewer #3 (Recommendations for the authors):

      Some other comments:

      (1) The reader may wonder why if all findings are confirmed by histology would an in vivo imaging model be needed. This does not need a generalized explanation given the typical virtues of an in vivo model, but perhaps the authors may want to amplify their findings in the current context, for example, those on the shorter minutes to hours timescales (Figure 2, Supplement 1) that would have been resource and time intensive, and likely impossible, to gather via histology alone.

      The reviewer appropriately underscores the utility of in vivo imaging over histology-only investigation. In response, we have added text in the introduction to emphasize the nuanced but important value of both longitudinal imaging and dynamic imaging, which is not possible with conventional histology (e.g. blood perfusion status, immune cell interactions etc.).

      P3-4 L21-42 (these points also addressed in response to reviewer #2 above)

      (2) A few questions and comments on the laser ablation model

      - It is alluded to in the Discussion in Lines 519-521 that the procedure is highly reproducible (95%) but the associated data for this repeatability metric is not shown.

      We agree that the criterion for determining a “successful lesion” requires further elaboration. Therefore, we have now included the criteria for successful lesions in the methods as well as discussion (in bullet below):

      Methods:

      P9-10 L129-133: “This protocol produced a hyper-reflective phenotype in the >40 locations across 28 mice. In rare cases, the exposure yielded no hyper-reflective lesion and were often in mice with high retinal motion, where the light dosage was spread over a larger retinal area. These locations were not included in the in-vivo or histological analysis.”

      - The methods state that a 24 x 1-micron line is focused on the retina, but all lesions seem to appear elliptical where the major to minor axis ratio is a lot smaller than this intended size. One wonders what leads to this discrepancy.

      We expect that this observation is related to the response above; we have added the following:

      Discussion:

      P27 L497-505: “The damage took on an elliptical form, likely due to: 1) Eye motion from respiration and heart rate which spreads the light over a larger integrative area (rather than line). 2) The impact of focal light scatter. 3) A micron-thin line imparting damage on cells that are many microns across manifesting as an ellipse. The majority of light exposures produced lesions of this elliptical shape. In a few conditions, for the reasons described above, the exposure failed to produce a strong, focal damage phenotype. To improve lesion reproducibility, future experiments should control for subtle eye motion affecting light damage, especially for long exposures.”

      (3) Lastly, a thickening is noted in the ONL after laser injury that seems to cause a thinning of the INL as well (Figure 3) which may increase the apparent INL nuclei density.

      The reviewer’s careful eye finds local swelling after injury. However, despite swelling, the segregation between INL and ONL was maintained in all days we examined. Thus, no ONL cells were included in INL counts (see figure 3A & 3D).

      Also, the ONL - inner (panel B) seems to show a little reduction in cell density in the same elliptical shape as the outer ONL in panel C.

      We agree with this observation, which was one of the reasons we included this detailed analysis of both the inner and outer half of the ONL. Our finding is that there is more prominent loss of nuclei in the outer half of the ONL. While the mechanism for this is not understood, we felt it was an important finding to include, and it further shows the axial specificity of the light damage we are inducing (especially at the day 1 observation).

      Lastly, the reduction in nuclear density is visually obvious in the ONL at the 1 and 3-day time points but the p-statistic does not seem to convey this. One may consider performing the analysis on panel F on a smaller region surrounding the lesion to more reliably reveal these effects.

      Related to the response above, the ONL shows a persistence of nuclei in the upper half of that layer, whereas the outer half shows a visible reduction. Therefore, we expect that the reviewer is correct that a statistical analysis that considers just the outer half of the ONL would likely show strong statistical significance. The challenge, however, is that our analysis strategy counted all cells within a 50 micron diameter cylinder through the entirety of the ONL (meaning strong loss in the outer half was attenuated by weak loss in the inner half). A more detailed sub-layer analysis is difficult given the notable retinal remodeling over days-to-weeks, which makes it challenging to use layers within the ONL as viable landmarks for the requested analysis.
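      The attenuation described above can be illustrated with a minimal sketch. It assumes nuclei are available as (x, y, z) centroids in microns, that the cylinder axis runs along z through the lesion center, and that the layer can be split at a fixed depth; the function name, the coordinates, and the depth split are all hypothetical and are not taken from the authors' analysis code.

```python
import numpy as np


def count_in_cylinder(centroids_um, center_xy_um, diameter_um, z_range_um):
    """Count centroids inside a cylinder whose axis runs along z.

    All argument names and units are illustrative assumptions.
    """
    c = np.asarray(centroids_um, dtype=float)
    # Radial distance of each centroid from the cylinder axis (xy plane).
    radial = np.hypot(c[:, 0] - center_xy_um[0], c[:, 1] - center_xy_um[1])
    z_min, z_max = z_range_um
    inside = (radial <= diameter_um / 2) & (c[:, 2] >= z_min) & (c[:, 2] < z_max)
    return int(np.count_nonzero(inside))


# Hypothetical centroids (x, y, z) in microns; not measured data.
centroids = [(0, 0, 5), (10, 0, 15), (0, 20, 25), (30, 0, 10), (0, 0, 45)]

# Count through a whole 40 micron thick layer, then through each half.
whole = count_in_cylinder(centroids, (0, 0), 50, (0, 40))
first_half = count_in_cylinder(centroids, (0, 0), 50, (0, 20))
second_half = count_in_cylinder(centroids, (0, 0), 50, (20, 40))
print(whole, first_half, second_half)
```

      Counting each half separately shows how a loss confined to one half is diluted in the whole-layer count, which is the effect the response describes; it also makes clear that a per-half analysis depends on having a stable depth landmark for the split.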

      (4) In Figure 6, the NIR confocal image and fluorescent microglia seem to share the same shape, starting from the OPL and posterior to it. This is particularly evident in the 3 and 7-day time points in the ONL and ONL/IS images. This departs from lines 567-577 where the claim is made that the hyperreflective phenotype in NIR images does not emerge from the microglia and neutrophils. This discrepancy should be clarified. It may be so that the hyperreflective phenotype as observed by Figure 2 at shorter timescales is not related to the microglia but the locus of hyper-reflections changes at longer time scales to involve the microglia as well as in Figure 6. One potential clue/speculation of the common shapes/size in confocal hyper-reflectance and fluorescent microglia of Figure 6 comes from Figure 9 where the microglia seem to engulf the photoreceptor phagosomes in the DAPI stains. It is possible that the hyper-reflections arise from the phagosomes but their co-localization with microglia seems to demonstrate a shared size/shape. As an addendum to the first point, such correlations are a power of the in vivo model and impossible to achieve in histology.

      The reviewer shows a deep understanding of our data. We agree with many of the points, but for the purpose of the paper many of the above offerings are speculative and we have chosen not to elaborate on these points as it is not definitive from the data. Instead, we direct the reader to an important finding that within hours, the hyper-reflective phenotype is seen in both OCT and AOSLO, whereas microglial somas/processes have not yet migrated into the hyper-reflective region. We have now emphasized this point in the discussion section:

      P29-30 L543-552: “A common speculation is that the increased backscatter may arise from local inflammatory cells that activate or move into the damage location. In our data, confocal AOSLO and OCT revealed a hyperreflective band at the OPL and ONL after 488 nm light exposure (Figure 2a, b). We found that the hyperreflective bands appeared within 30 minutes after the laser injury, preceding any detectable microglial migration toward the damage location (Figure 2 – figure supplement 1 and Figure 6 – figure supplement 1). We thus conclude that the initial hyperreflective phenotype is not caused by microglial cell activity or aggregation.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This work presents a valuable self-supervised method for the segmentation of 3D cells in microscopy images, alongside an implementation as a Napari plugin and an annotated dataset. While the Napari plugin is readily applicable and promises to eliminate time consuming data labeling to speed up quantitative analysis, there is incomplete evidence to support the claim that the segmentation method generalizes to other light-sheet microscopy image datasets beyond the two specific ones used here.

      Technical Note: We showed the utility of CellSeg3D in the first submission and in our revision on 5 distinct datasets; 4 of which we showed F1-Score performance on. We do not know which “two datasets” are referenced. We also already showed this is not limited to LSM, but was used on confocal images; we already limited our scope and changed the title in the last rebuttal, but just so it’s clear, we also benchmark on two non-LSM datasets.

      In this revision, we have now additionally extended our benchmarking of Cellpose and StarDist to all 4 benchmark datasets, where WNet3D (our novel contribution, a self-supervised model) outperforms or matches these supervised baselines. Moreover, we perform rigorous testing of our model’s generalization by training on one dataset and testing generalization to the other 3; we believe this is on par with (or beyond) what most cell segmentation papers do, thus we hope that “incomplete” can now be updated.

      Public Reviews:

      Reviewer #1 (Public review):

      This work presents a self-supervised method for the segmentation of 3D cells in microscopy images, an annotated dataset, as well as a napari plugin. While the napari plugin is potentially useful, there is insufficient evidence in the manuscript to support the claim that the proposed method is able to segment cells in other light-sheet microscopy image datasets than the two specific ones used here.

      Thank you again for your time. We already benchmarked the performance of WNet3D (our 3D SSL contribution) on four datasets; thus, we do not know which two you refer to. Moreover, we have now additionally benchmarked Cellpose and StarDist on all four so readers can see that, on all datasets, WNet3D outperforms or matches these supervised methods.

      I acknowledge that the revision is now more upfront about the scope of this work. However, my main point still stands: even with the slight modifications to the title, this paper suggests to present a general method for self-supervised 3D cell segmentation in light-sheet microscopy data. This claim is simply not backed up.

      We respectfully disagree; we benchmark on four 3D datasets: three curated by others and used in learning ML conference proceedings, and one that we provide that is a new ground truth 3D dataset - the first of its kind - on mesoSPIM-acquired brain data. We believe benchmarking on four datasets is on par (or beyond) with current best practices in the field. For example, Cellpose curated one dataset and tested on held-out test data on this one dataset (https://www.nature.com/articles/s41592-020-01018-x) and benchmarked against StarDist and Mask R-CNN (two models). StarDist (Star-convex Polyhedra for 3D Object Detection and Segmentation in Microscopy) benchmarked on two datasets and against two models, IFT-Watershed and 3D U-Net. Thus, we feel our benchmarking on more models and more datasets is sufficient to claim our model and associated code is of interest to readers and supports our claims (for comparison, Cellpose’s title is “Cellpose: a generalist algorithm for cellular segmentation”, which is much broader than our claim).

      I still think the authors should spell out the assumptions that underlie their method early on (cells need to be well separated and clearly distinguishable from background). A subordinate clause like "often in cleared neural tissue" does not serve this purpose. First, it implies that the method is also suitable for non-cleared tissue (which would have to be shown). Second, this statement does not convey the crucial assumptions of well separated cells and clear foreground/background differences that the method is presumably relying on.

      We have now expanded the manuscript quite significantly. To be clear, we did show our method works on non-cleared tissue; the Mouse Skull, 3D platynereis-Nuclei, and 3D platynereis-ISH-Nuclei datasets are not cleared tissue, and not all were acquired with LSM, but rather with confocal microscopy. We attempted to make that clearer in the main text.

      Additionally, we do not believe the cells need to be well separated or the background perfectly clean. We removed statements like "often in cleared neural tissue", expanded the benchmarking, and added a new demo figure for the readers to judge. As in the last rebuttal, we provide video evidence (https://www.youtube.com/watch?v=U2a9IbiO7nE) of WNet3D working on the densely packed Mouse Skull dataset, which is hard for a human to segment, and linked this directly in the figure caption.

      We have re-written the main manuscript in an attempt to clarify the limitations, including a dedicated “limitations” section. Thank you for the suggestion.

      It does appear that the proposed method works very well on the two investigated datasets, compared to other pre-trained or fine-tuned models. However, it still remains unclear whether this is because of the proposed method or the properties of those specific datasets (namely: well isolated cells that are easily distinguished from the background). I disagree with the authors that a comparison to non-learning methods "is unnecessary and beyond the scope of this work". In my opinion, this is exactly what is needed to prove that CellSeg3D's performance can not be matched with simple image processing.

      We want to again stress that we benchmarked WNet3D on four datasets, not two. We have now additionally added benchmarking with Cellpose, StarDist and a non-deep learning method as requested (see new Figures 1 and 3).

      As I mentioned in the original review, it appears that thresholding followed by connected component analysis already produces competitive segmentations. I am confused about the authors' reply stating that "[this] is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning". The methods against which CellSeg3D is compared are CellPose and StarDist, both are deep-learning based methods.

      That those methods do not perform well on this dataset does not imply that a simpler method (like thresholding) would not lead to competitive results. Again, I strongly suggest the authors include a simple, non-learning based baseline method in their analysis, e.g.:

      -  comparison to thresholding (with the same post-processing as the proposed method)

      -  comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      We added a non-deep learning based approach, namely, comparing directly to thresholding with the same post hoc approach we use to go from semantic to instance segmentation. WNet3D (and other deep learning approaches) perform favorably (see Figure 2 and 3).
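      For readers unfamiliar with this kind of baseline, the following is a minimal sketch of thresholding followed by connected-component labeling, which turns a semantic foreground mask into instance labels. It assumes a single global threshold and face-connected components via `scipy.ndimage.label`; the function name, the `min_size` filter, and the toy volume are illustrative assumptions and are not the post-processing used in the manuscript.

```python
import numpy as np
from scipy import ndimage


def threshold_instances(volume, threshold, min_size=0):
    """Label connected foreground regions of a 3D volume after a global threshold.

    Hypothetical helper for illustration only.
    """
    foreground = volume > threshold
    # Default structuring element: face-connected voxels (6-connectivity in 3D).
    labels, n = ndimage.label(foreground)
    if min_size > 0 and n > 0:
        # Voxel count per label; index 0 is the background.
        sizes = np.bincount(labels.ravel())
        small = np.flatnonzero(sizes < min_size)
        small = small[small != 0]
        labels[np.isin(labels, small)] = 0
        # Relabel so that instance ids are consecutive again.
        labels, n = ndimage.label(labels > 0)
    return labels, n


# Toy volume: two separated bright blobs plus one isolated bright voxel.
vol = np.zeros((8, 8, 8), dtype=float)
vol[1:3, 1:3, 1:3] = 1.0
vol[5:7, 5:7, 5:7] = 0.8
vol[0, 7, 7] = 0.9

labels, n = threshold_instances(vol, threshold=0.5, min_size=2)
print(n)
```

      A baseline of this form has no way to split touching objects, since any two objects that share a face end up with one label; this is why it is mainly informative on data with well-separated nuclei.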

      Regarding my feedback about the napari plugin, I apologize if I was not clear. The plugin "works" as far as I tested it (i.e., it can be installed and used without errors). However, I was not able to recreate a segmentation on the provided dataset using the plugin alone (see my comments in the original review). I used the current master as available at the time of the original review and default settings in the plugin.

      We updated the plugin and code for the revision at your request to make this possible directly in the napari GUI, in addition to our scripts and Jupyter Notebooks (please see main and/or run `pip install --upgrade napari-cellseg3d`; the current version is 0.2.1). Of course this means the original submission code (May 2024) will not have this in the GUI, so you would need to update to test this. Alternatively, you can see the demo video we now provide for ease: https://www.youtube.com/watch?v=U2a9IbiO7nE (we understand testing code takes a lot of time and commitment).

      We greatly thank the reviewer for their time, and we hope our clarifications, new benchmarking, and re-write of the paper now make them able to change their assessment from incomplete to a more favorable and reflective eLife adjective.

      Reviewer #2 (Public review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      -  The idea behind the self-supervised learning loss is interesting.

      -  It provides a new annotated dataset for an important segmentation problem.

      -  The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      -  The comparison to other methods on the provided dataset is extensive and experiments are reproducible via public notebooks.

      Weaknesses:

      The experiments presented by the authors support the core claims made in the paper. However, they do not convincingly prove that the method is applicable to segmentation problems with more complex morphologies or more crowded cells/nuclei.

      Major weaknesses:

      (1) The method only provides functionality for semantic segmentation outputs and instance segmentation is obtained by morphological post-processing. This approach is well known to be of limited use for segmentation of crowded objects with complex morphology. This is the main reason for prediction of additional channels such as in StarDist or CellPose. The experiments do not convincingly show that this limitation can be overcome as model comparisons are only done on a single dataset with well separated nuclei with simple morphology. Note that the method and dataset are still a valuable contribution with this limitation, which is somewhat addressed in the conclusion. However, I find that the presentation is still too favorable in terms of the presentation of practical applications of the method, see next points for details.

      Thank you for noting the method's strengths and core features. Regarding weaknesses, we have revised the manuscript again and added direct benchmarking now on four datasets and a fifth “worked example” (https://www.youtube.com/watch?v=3UOvvpKxEAo&t=4s) in a new Figure 4.

      We also re-wrote the paper to more thoroughly present the work (previously we adhered to the “Brief Communication” eLife format), and added an explicit note in the results about model assumptions.

      (2) The experimental set-up for the additional datasets seems to be unrealistic as hyperparameters for instance segmentation are derived from a grid search and it is unclear how a new user could find good parameters in the plugin without having access to already annotated ground-truth data or an extensive knowledge of the underlying implementations.

      We agree that of course with any self-supervised method the user will need a sense of what a good outcome looks like; that is why we provide Google Colab Notebooks (https://github.com/AdaptiveMotorControlLab/CellSeg3D/tree/main/notebooks) and the napari-plugin GUI for extensive visualization and even the ability to manually correct small subsets of the data and refine the WNet3D model.

      We attempted to make this clearer with a new Figure 2 and by adding functionality directly to the plugin (such as the grid search). However, we believe this “trade-off” for SSL approaches over very labor-intensive 3D labeling is often worth it; annotators are also biased, so extensive checking of any GT data is equally required.

      We also added the “grid search” functionality in the GUI (please `pip install --upgrade napari-cellseg3d`; the latest v0.2.1) to supplement the previously shared Notebook (https://github.com/C-Achard/cellseg3d-figures/blob/main/thresholds_opti/find_best_thresholds.ipynb) and added a new YouTube video: https://www.youtube.com/watch?v=xYbYqL1KDYE.
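      As a rough illustration of what a threshold grid search involves, the sketch below scans candidate thresholds on a model's foreground probability map and keeps the one with the highest voxel-wise F1 against a small annotated subset. This is a simplification under stated assumptions: the function names and the synthetic data are hypothetical, the score is voxel-wise rather than instance-wise, and the linked notebook and GUI may implement the search differently.

```python
import numpy as np


def f1_score(pred, truth):
    """Voxel-wise F1 between two binary masks (a simplification of instance F1)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)
    fp = np.count_nonzero(pred & ~truth)
    fn = np.count_nonzero(~pred & truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(prob, truth, candidates):
    """Return the candidate threshold with the highest F1, and that F1."""
    scores = [f1_score(prob > t, truth) for t in candidates]
    i = int(np.argmax(scores))
    return float(candidates[i]), float(scores[i])


# Synthetic stand-in for a small annotated crop; not real data.
rng = np.random.default_rng(0)
truth = np.zeros((16, 16, 16), dtype=bool)
truth[4:12, 4:12, 4:12] = True
prob = np.where(truth, 0.8, 0.2) + rng.normal(0.0, 0.05, truth.shape)

thr, score = best_threshold(prob, truth, np.linspace(0.1, 0.9, 9))
print(thr, score)
```

      Because the search needs some ground truth to score against, it only requires annotating a small crop rather than a full volume; how well a threshold chosen this way transfers to the rest of the data is exactly the robustness question raised in the review.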

      (3) Obtaining segmentation results of similar quality as reported in the experiments within the napari plugin was not possible for me. I tried this on the "MouseSkull" dataset that was also used for the additional results in the paper.

      Again, we are sorry this did not work for you, but we added new functionality in the GUI and made a demo video (https://www.youtube.com/watch?v=U2a9IbiO7nE); you can either update your CellSeg3D code or watch the video to see how we obtained these results.

      Here, I could not find settings in the "Utilities->Convert to instance labels" widget that yielded good segmentation quality and it is unclear to me how a new user could find good parameter settings. In more detail, I cannot use the "Voronoi-Otsu" method due to installation issues that are prohibitive for a non expert user and the "Watershed" segmentation method yields a strong oversegmentation.

      Sorry to hear of the installation issue with Voronoi-Otsu; we updated the documentation and the GUI to hopefully make this easier to install. While we do not claim this code is for beginners, we do aim to be a welcoming community, thus we provide support on GitHub, extensive docs, videos, the GUI, and Google Colab Notebooks to help users get started.

      Comments on revised version

      Many of my comments were addressed well:

      -  It is now clear that the results are reproducible as they are well documented in the provided notebooks, which are now much more prominently referenced in the text.

      Thanks!

      -  My concerns about an unfair evaluation compared to CellPose and StarDist were addressed. It is now clear that the experiments on the mesoSPIM dataset are extensive and give an adequate comparison of the methods.

      Thank you. Note that we additionally added benchmarking of Cellpose and StarDist on the three additional datasets (for R1), and hopefully this also serves to increase your confidence in our approach.

      -  Several other minor points like reporting of the evaluation metric are addressed.

      I have changed my assessment of the experimental evidence to incomplete/solid and updated the review accordingly. Note that some of my main concerns with the usability of the method for segmentation tasks with more complex morphology / more crowded cells and with the napari plugin still persist. The main points are (also mentioned in Weaknesses, but here with reference to the rebuttal letter):

      - Method comparison on datasets with more complex morphology etc. are missing. I disagree that it is enough to do this on one dataset for a good method comparison.

      We benchmarked WNet3D (our contribution) on four datasets, and to aid the readers we additionally now added Cellpose and StarDist benchmarking on all four. WNet3D performs favorably, even on the crowded and complex Mouse Skull data. See the new Figure 3 as well as the associated video: https://www.youtube.com/watch?v=U2a9IbiO7nE&t=1s.

      -  The current presentation still implies that CellSeg3d **and the napari plugin** work well for a dataset with complex nucleus morphology like the Mouse Skull dataset. But I could not get this to work with the napari plugin, see next points.

      - First, deriving hyperparameters via grid search may lead to over-optimistic evaluation results. How would a user find these parameters without having access to ground-truth? Did you do any experiments on the robustness of the parameters?

      -  In my own experiments I could not do this with the plugin. I tried this again, but ran into the same problems as last time: pyClesperanto does not work for me. The solution you link requires updating openCL drivers and the accepted solution in the forum post is "switch to a different workstation".

      We apologize for the confusion here; the accepted solution (not accepted by us) was user specific: they switched workstations and it worked, so that was their solution. Other comments actually solved the issue as well. For ease, this package can be installed on Google Colab (here is the link from our repo: https://colab.research.google.com/github/AdaptiveMotorControlLab/CellSeg3d/blob/main/notebooks/Colab_inference_demo.ipynb), where pyClesperanto can be installed without issue via `!pip install pyclesperanto-prototype`.

      This a) goes beyond the time I can invest for a review and b) is unrealistic to expect computationally inexperienced users to manage. Then I tried with the "watershed" segmentation, but this yields a strong oversegmentation no matter what I try, which is consistent with the predictions that look like a slightly denoised version of the input images and not like a proper foreground-background segmentation. With respect to the video you provide: I would like to see how a user can do this in the plugin without having prior knowledge of good parameters or just pasting code, which is again not what you would expect a computationally inexperienced user to do.

      We agree with the reviewer that the user needs domain knowledge, but we never claim our method was for inexperienced users. Our main goal was to show a new computer vision method with self-supervised learning (WNet3D) that works on LSM and confocal data for cell nuclei. To this end, we made you a demo video to show how a user can visually perform a thresholding check https://www.youtube.com/watch?v=xYbYqL1KDYE&t=5s, and we added all of these new utilities to the GUI, thanks for the suggestion. Otherwise, the threshold can also be done in a Notebook (as previously noted).

      I acknowledge that some of these points are addressed in the limitations, but the text still implies that it is possible to get good segmentation results for such segmentation problems: "we believe that our self-supervised semantic segmentation model could be applied to more challenging data as long as the above limitations are taken into account." From my point of view the evidence for this is still lacking and would need to be provided by addressing the points raised above for me to further raise the Incomplete/solid rating, especially showing how this can be done with the napari plugin. As an alternative, I would also consider raising it if the claims are further reduced and acknowledge that the current version of the method is only a good method for well separated nuclei.

      We hope our new benchmarking and clear demo on four datasets help improve your confidence in the evidence for our approach. We also refined our overall text and hope our contributions, the limitations, and the advantages are now clearer.

      I understand that this may be frustrating, but please put yourself in the role of a new reader of this work: the impression that is made is that this is a method that can solve 3D segmentation tasks in light-sheet microscopy with unsupervised learning. This would be a really big achievement! The wording in the limitation section sounds like strategic disclaimers that imply that it is still possible to do this, just that it wasn't tested enough.

      But, to the best of my assessment, the current version of the method only enables the more narrow case of well separated nuclei with a simple morphology. This is still a quite meaningful achievement, but more limited than the initial impression. So either the experimental evidence needs to be improved, including a demonstration how to achieve this in practice, including without deriving parameters via grid-search and in the plugin, or the claim needs to be meaningfully toned down.

      Thanks for raising this point; we do think that WNet3D and the associated CellSeg3D package, which aims to continue integrating state-of-the-art models, are a non-trivial step forward. Have we completely solved the problem? Certainly not, but given the limited 3D cell segmentation tools that exist, we hope this, coupled with our novel 3D dataset, pushes the field forward. We do not show only that it works on the narrow, well-separated use case; rather, we show that it works even better than supervised models on the very challenging Mouse Skull benchmark. Given that we now show evidence that we outperform or match supervised algorithms with an unsupervised approach, we respectfully do think this is a noteworthy achievement. Thank you for your time in assessing our work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      To gain further insight into the dynamics of microglial aging in the hippocampus, the authors used a bioinformatics method known as "pseudotime" or "trajectory inference" to understand how cells may progress through different functional states, as defined by cellular transcriptome (15,16). These bioinformatics approaches can reveal key patterns in scRNAseq / snRNAseq datasets and, in the present study, the authors conclude that a "stress response" module characterized by expression of TGFb1 represents a key "checkpoint" in microglial aging in midlife, after which the cells can move along distinct transcriptional trajectories as aging progresses. This is an intriguing possibility. However, pseudotime analyses need to be validated via additional bioinformatics as well as follow-up experiments. Indeed, Heumos et al, in their Nature Genetics "Expert Guidelines" Review, emphasize that "inferred trajectories might not necessarily have biological meaning." They recommend that "when the expected topology is unknown, trajectories and downstream hypotheses should be confirmed by multiple trajectory inference methods using different underlying assumptions."(15) Numerous algorithms are available for trajectory inference (e.g. Monocle, PAGA, Slingshot, RaceID/StemID, among many others) and their performance and suitability depends on the individual dataset and nature of the trajectories that are to be inferred. It is recommended to use dynGuidelines(16) for the selection of optimal pseudotime analysis methods. In the present manuscript, the authors do not provide any justification for their use of Monocle 3 over other trajectory inference approaches, nor do they employ a secondary trajectory inference method to confirm observations made with Monocle 3. Finally, follow-up validation experiments that the authors carry out have their own limitations and caveats (see below). 
      Hence, while the microglial aging trajectories identified by this study are intriguing, they remain hypothetical trajectories that need to be proven with additional follow-up experiments.

      We thank the reviewer for their suggestion. Following the dynGuidelines kindly pointed out by the reviewer, we selected an additional trajectory inference tool to analyze our data. We chose SCORPIUS based on the structure of our data. The tool has provided additional support that microglia progress from a homeostatic state (Cx3cr1, Mef2c) to the induction of stress genes (Hspa1, Atf3) at an intermediate point during aging progression. Furthermore, we observe a concordant increase in ribosomal protein genes at a time point in the pseudotime analysis immediately prior to activation of inflammation-related genes (Il1b, Cst7). These additional analyses support the main findings of our original pseudotime analysis and have been added to the manuscript as Figure S3C,D. Additionally, in the statistical test that uncovers differentially expressed genes along the pseudotime trajectory in this analysis, we find that Tgfb1 is one of the genes that is differentially expressed, with peak expression at an intermediate timepoint along the pseudotime trajectory. Furthermore, we have done some preliminary trajectory analysis with Slingshot (Street et al, BMC Genomics, PMID: 29914354) that found a similar trajectory with analogous gene expression patterns and dynamic expression of Tgfb1.

      To follow up on the idea that TGFb1 signaling in microglia plays a key role in determining microglial aging trajectories, the authors use RNAscope to show that TGFb1 levels in microglia peak in middle age. They also treat primary LPS-activated microglia with TGFb1 and show that this restores expression of microglial homeostatic gene expression and dampens expression of stress response and, potentially, inflammatory genes. Finally, they utilize transgenic approaches to delete TGFb1 from microglia around 8-10mo of age and scRNAseq to show that homeostatic signatures are lost and inflammatory signatures are gained. Hence, findings in this study support the idea that TGFb1 can strongly regulate microglial phenotype. Loss of TGFb1 signaling to microglia in adulthood has already been shown to cause decreased microglial morphological complexity and upregulation of genes typically associated with microglial responses to CNS insults(17-19). TGFb1 signaling to microglia has also been implicated in microglial responses to disease and manipulations to increase this signaling can improve disease progression in some cases(19). In this light, the findings in the present study are largely confirmatory of previous findings in the literature. They also fall short of unequivocally demonstrating that TGFb1 signaling acts as a "checkpoint" for determining subsequent microglial aging trajectory. To show this clearly, one would need to perturb TGFb1 signaling around 12mo of age and carry out sequencing (bulkRNAseq or scRNAseq) of microglia at 18mo and 24mo. Such experiments could directly demonstrate whether the whole microglial population has been diverted to the TGFb1-low aging trajectory (that progresses through a translational burst state to an inflammation state as proposed). 
      Future development of tools to tag TGFb1 high or low microglia could also enable fate tracing type experiments to directly show whether the TGFb1 state in middle age predicts cell state at later phases of aging.

      We apologize for the use of the term “checkpoint” when referring to the role of Tgfb1 in microglial aging. Instead, our model posits that Tgfb1 expression increases in response to the early insults of the aging process in an attempt to return microglia to homeostasis. This predicts that increasing TGFB1 levels after an insult would decrease activation and age-related progression of microglia, which we demonstrate in vitro (Figure 3). Conversely, the loss of TGFB1 should prevent microglia from returning to a homeostatic state after an age-related stressor, and thus increase the number of microglia in activated states. We observe this increase in activated microglia in our middle-aged microglia-specific Tgfb1 knockout mouse model. Furthermore, the haploinsufficiency of Tgfb1 at this age indicates that TGFB1 signaling in microglia is sensitive to relative levels of Tgfb1. The transient increase in Tgfb1 expression further suggests that the threshold for TGFB1 signaling is dynamic. Finally, RNA-Seq analysis of both in vitro TGFB1 supplemented microglia and in vivo Tgfb1 depleted microglia highlights that TGFB1 alters the aging microglia transcriptome. Combined, these results provide evidence that Tgfb1 modulates advancement of microglia through an aging continuum.

      The present study would also like to draw links between features of microglial aging in the hippocampus and a decline in hippocampal-dependent cognition during aging. To this end, they carry out behavioral testing in 8-10mo old mice that have undergone microglial-specific TGFb1 deletion and find deficits in novel object recognition and contextual fear conditioning. While this provides compelling evidence that TGFb1 signaling in microglia can impact hippocampus-dependent cognition in midlife, it does not demonstrate that this signaling accelerates or modulates cognitive decline (see below). Age-associated cognitive decline refers to cognitive deficits that emerge as a result of the normative brain aging process (20-21). For a cognitive deficit to be considered age-associated cognitive decline, it must be shown that the cognitive operation under study was intact at some point earlier in the adult lifespan. This requires longitudinal study designs that determine whether a manipulation impacts the relationship between brain status and cognition as animals age (22-24). Alternatively, cross-sectional studies with adequate sample sizes can be used to sample the variability in cognitive outcomes at different points of the adult lifespan (22-24) and show that this is altered by a particular manipulation. For this specific study, one would ideally demonstrate that hippocampal-based learning/memory was intact at some point in the lifespan of mice with microglial TGFb1 KO but that this manipulation accelerated or exacerbated the emergence of deficits in hippocampal-dependent learning/memory during aging. In the absence of these types of data, the authors should tone down their claims that they have identified a cellular and molecular mechanism that contributes to cognitive decline.

      We agree with the reviewer that to adequately demonstrate an age-dependent effect of microglia-derived TGFB1 on cognition it is necessary to perturb microglial TGFB1 at young and mature ages and assess the age-dependent effect on cognition. To address this, we have now performed a complementary behavioral study utilizing the Tmem119-CreER mouse model to drive the microglia-specific excision of Tgfb1 in two separate cohorts of mice – one young (2-3 months) and one mature (7-8 months) – followed by cognitive testing. Using the novel object recognition test, we find that young mice of all genotypes (WT, Tgfb1 Het and Tgfb1 cKO) retain the ability to recognize the novel object (as determined by having a significant preference in exploring the novel object). In contrast, only the WT mature mice demonstrate a preference for the novel object, while the Tgfb1 Het and Tgfb1 cKO mice show no preference for the novel object. These behavioral data demonstrate an age-dependent necessity for microglia-specific TGFB1 in maintaining proper hippocampal-dependent memory and are now included in the manuscript as revised Figure 4I-J. We have also included additional behavioral tests (Y-Maze and open field) that did not show any difference between the genotypes as Figure S6D-G. Unfortunately, we were unable to perform the fear conditioning testing, as our apparatus broke during this time. Together, these results reveal that there is an age-dependent necessity for microglia-derived TGFB1 for hippocampal-dependent cognitive function.

      A final point of clarification for the reader pertains to the mining of previously generated data sets within this study. The language in the results section, methods, and figure legends causes confusion about which experiments were actually carried out in this study versus previous studies. Some of the language makes it sound as though parabiosis experiments and experiments using mouse models of Alzheimer's Disease were carried out in this study. However, parabiosis and AD mouse model experiments were executed in previous studies (25,26), and in the present study, RNAseq datasets were accessed for targeted data mining. It is fantastic to see further mining of datasets that already exist in the field. However, descriptions in the results and methods sections need to make it crystal clear that this is what was done.

      The reviewer makes an excellent point. While we referenced the public dataset in the original manuscript, the citation style of superscripted numbers diminishes our ability to adequately reference the datasets. Therefore, we have added the names of the first authors (Palovics for the parabiosis dataset and Sala Frigerio for the Alzheimer’s Disease dataset) to all the instances in the results and figure legends when we refer to these datasets.

      Additional recommendations:

      Major comments.

      (1) There is some ambiguity surrounding how to interpret the microglial TGFb1 knockout that seems incompatible with viewing this molecule as a "checkpoint" in microglial aging. TGFb1 is believed to be primarily produced by microglia. Secreted TGFb1 is then detected by microglial TGFbR2. Are the microglia that have high levels of TGFb1 in middle age signaling to themselves (autocrine signaling)? Or contributing to a local milieu that impacts multiple neighbor microglia (paracrine signaling)? The authors could presumably look in their own dataset to evaluate microglial capacity to detect TGFb1 via its receptors.

      We thank the reviewer for this insightful suggestion. We have undertaken analysis of our dataset to assess whether Tgfb1 acts through autocrine or paracrine signaling. To do so, we reanalyzed our microglia aging scRNA-Seq dataset leveraging the variation in microglia Tgfb1 expression to probe the relative activity of TGFB1. Specifically, we partitioned microglia into quartiles based on their Tgfb1 expression, and subsequently investigated the expression of TGFB signaling effectors and targets. High expression of downstream TGFB signaling pathway components in microglia with high Tgfb1 expression would point to autocrine mechanisms while, alternatively, high expression of downstream TGFB signaling pathway components in microglia with low Tgfb1 expression would point to paracrine mechanisms. We observed highest expression of TGFB signaling pathway components and targets in microglia with the highest expression of Tgfb1. These data suggest that Tgfb1 acts through an autocrine mechanism. These results have been added to our manuscript as Figure S4E-G. Additionally, while our manuscript was under review, a paper by Bedolla et al (Nature Communications 2024; PMID: 38906887) was published that investigated the role of Tgfb1 in adult microglia. This paper utilized orthogonal techniques – sparse microglia-specific Tgfb1 knockout and IHC - to also suggest that microglia utilize autocrine Tgfb1 signaling. Together, these complementary data provide strong evidence that Tgfb1 acts through an autocrine mechanism in adult microglia.
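
      The quartile logic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `quartile_means`, the per-cell "pathway score", and the synthetic data are all hypothetical, and the original analysis presumably ran on normalized scRNA-Seq counts in a dedicated single-cell toolkit.

```python
import random
from statistics import mean


def quartile_means(ligand_expr, pathway_score):
    """Split cells into quartiles by ligand expression (rank-based) and
    return the mean pathway score per quartile, lowest quartile first."""
    if len(ligand_expr) != len(pathway_score):
        raise ValueError("inputs must hold one value per cell")
    n = len(ligand_expr)
    if n < 4:
        raise ValueError("need at least four cells to form quartiles")
    # Cell indices ordered from lowest to highest ligand expression.
    order = sorted(range(n), key=lambda i: ligand_expr[i])
    means = []
    for q in range(4):
        start, stop = q * n // 4, (q + 1) * n // 4
        means.append(mean(pathway_score[i] for i in order[start:stop]))
    return means


# Synthetic cells (NOT data from the study): the downstream score rises
# with the cell's own ligand expression, i.e. an autocrine-like pattern.
rng = random.Random(0)
tgfb1 = [rng.random() for _ in range(400)]
downstream = [x + rng.gauss(0, 0.1) for x in tgfb1]
means = quartile_means(tgfb1, downstream)
print(means)
```

      Under this reading, a score that rises from the lowest to the highest quartile is the autocrine-like pattern, while a flat or falling score would be more compatible with paracrine signaling.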

      (2) Conclusions of the study rest on the assumption that microglial inflammatory responses are a central driver of cognitive decline. They assume that manipulations that increase microglial progression into an inflammatory state will negatively impact cognitive function. Although there are certainly a lot of data in the field that inflammatory factors can impact synaptic function, additional experiments would be required to unequivocally demonstrate that a "TGFb1 dependent" progression of microglia to an inflammatory state underlies any observed changes in cognition. For example, in the context of microglial TGFb1 deletion, can NSAIDs or blockers of soluble TNFa (e.g. XENP345), or blockers of SPP1, etc. rescue behavior? Can microglial depletion in this context rescue behavior? Assuming behavior was carried out in the same microglial TGFb1 KO mice that were used for microglial scRNAseq, they could also carry out linear regression-type analyses to link microglial inflammatory status to the behavioral performance of individual mice. In the absence of additional evidence of this sort, the authors should tone down claims about mechanistic relationships between microglial state and cognitive performance.

      We thank the reviewer for realizing that the link between cognition and inflammation in our paper is speculative. Therefore, we have taken the reviewer’s advice and toned down the claims linking inflammation to cognition in our manuscript. Instead, we connect the disruption in cognition to what is observed in our data, a loss of microglia homeostasis and a shift in the microglia aging trajectories.

      Additional Recommendations:

      Minor comments:

      (1) Ideally at some point in the results or discussion, the authors should acknowledge that the hippocampus has highly distinct sub-regions and that microglia show different functions and properties across these sub-regions (e.g. microglia in hilus and subgranular zone vs microglia in stratum radiatum, vs microglia immediately adjacent to or embedded within stratum pyramidale). Do expression levels of TGFb1 and microglial aging trajectories vary across sub-regions? To what extent can this account for heterogeneity of aging trajectories observed in microglial aging within the hippocampus?

      We are interested in how microglia heterogeneity during aging is influenced by the specific functions, and thus microenvironments within the hippocampus. Therefore, we have expanded our IHC analysis of microglia to determine how the microenvironment influences microglia phenotypes by looking at several different regions of the hippocampus. We have included this regional analysis as Figure S2 in the manuscript. This analysis has revealed region-specific effects on microglia activation during aging.

      (2) For immunohistochemistry data, it is not particularly convincing to see one example of one cell from each condition. Generally, an accepted approach in the field is to present lower magnification images accompanied by zoom panels for several cells from each field of view. This reassures the reader that specific cells haven't simply been "cherry-picked" to support a particular conclusion.

      To allay the concerns of the reviewer that cells haven’t been “cherry-picked”, we have provided low magnification images for the aging CD68 and NFκB stains in Supplemental Figure S2.

      (3) In immunohistochemistry data, have measures been taken to ensure that observed signals are not simply autofluorescence that becomes prominent in tissues with aging? (i.e. use of trueblack or photoquenching of tissue prior to staining) See PMID 37923732

      We agree that autofluorescence, at least partially due to the accumulation of lipofuscin, becomes prominent in certain regions and cells of the hippocampus during aging. This most prominently occurs in the microglia of the hilus. This autofluorescence has a particular subcellular distribution, as it is localized to lyso-endosomal bodies. The microglia activation marker CD68 is also localized to lysosomes. A previous publication by Burns et al (eLife; PMID: 32579115) identified autofluorescent microglia (AF+) with unique molecular profiles that accumulate with age. They posited that these AF+ microglia resembled other microglia subsets that have pronounced storage compartments, such as the pro-inflammatory lipid droplet-containing microglia that accumulate with age reported by Marschallinger et al (Nature; PMID: 31959936). As such, autofluorescence present in microglia potentially represents distinctive and functional states of microglia. Our CD68 immunostaining accumulates with age, which could overlap with autofluorescent storage bodies. Thus, we performed a complementary CD68 immunostaining in an independent cohort of young (3 months) and aged (24 months) mice with the autofluorescence quencher TrueBlack, and found that the staining pattern and accumulation of CD68 microglia with age persisted as previously observed after use of this quencher (see Author response image 1). Images are IBA1 (cyan) and CD68 (yellow) with the molecular layer (ML), granule cell (GC), and hilus illustrated and corresponding quantification provided (Two-way ANOVA with Sidak’s multiple comparisons test; ***P<0.001; ****P<0.0001).

      We would like to note that the subcellular localization of the other immunostainings included in the manuscript was distinct from CD68, and not likely to be associated with the autofluorescent storage bodies. Additionally, our RNAScope staining for Tgfb1 did not show an accumulation with age, but rather a transient increase at 12 months of age, which indicates that the interpretation of the RNAScope stain for Tgfb1 was not unduly influenced by autofluorescence.

      Author response image 1.

      (4) Ideally, more care is needed with the language used to describe microglial state during aging. The terms "dystrophic," "dysfunctional," and "inflammatory" all carry their own implications and assumptions. Many changes exhibited by microglia during aging can initially be adaptive or protective, particularly during middle age. Without additional experiments to show that specific microglial attributes during aging are actively detrimental to the tissue and additional experiments to show that microglia have ceased to be capable of engaging in many of their normal actions to support tissue homeostasis, the authors should exercise caution in using terms like dysfunctional.

      We appreciate the reviewer’s suggestion. To allay the concerns of the reviewer about the multiple implications of terms such as “dysfunctional” and “inflammatory”, we have replaced them throughout the text with more specific terms where possible.

      Reviewer #2:

      That said, given what we recently learned about microglia isolation for RNA-seq analysis, there is a danger that some of the observations are a result of not age, but cell stress from sample preparation (enzymatic digestion 10min at 37C; e.g. PMID: 35260865). Changes in cell state distribution along aging were made based on scRNA-seq and were not corroborated by any other method, such as imaging of cluster-specific marker expression in microglia at different ages. This analysis would allow confirming the scRNA-seq data and would also give us an idea of where the subsets are present within the hippocampus, and whether there is any interesting distribution of cell states (e.g. some are present closer to stem cells?). Since TGFb is thought to be crucial to microglia biology, it would be valuable to include more analysis of the mice with microglia-specific Tgfb deletion e.g. what was the efficiency of recombination in microglia? Did their numbers change after induction of Tgfb deletion in Cx3cr1-creERT2::Tgfb-flox mice.

      We thank the reviewer for their comment regarding potential ex vivo transcriptional alterations with the approaches used in our study. We performed our aging microglia scRNA-Seq characterization prior to the release of Marsh et al (Nature Neuroscience; PMID: 35260865), which revealed the potential transcriptional artefacts induced by isolation. That being said, we took great care to minimize the amount of time samples were subjected to enzymatic digestion (15 minutes) and kept cells at 4C during the remainder of the isolation. Furthermore, we performed all isolations simultaneously, so that transcriptional changes induced by the isolation would be present across all ages and should not be observed during our analysis unless indicative of a true age-related change. Additionally, we have corroborated changes in cell state distribution across ages using several markers (Tgfb1 and KLF2 for the intermediate stress state, S6 for the translation state, and NFKB and CD68 for activation states). In the revised manuscript, we have added additional hippocampal subregion analysis of several IHC immunostains to provide spatial insights into the microglia aging process (Figure S2). This analysis reveals unique spatial dynamics of microglia aging. For example, as the reviewer foresaw, we found that the granule cell layer (the location of adult hippocampal neurogenesis) had a more pronounced age-associated progression of microglial activation than several other regions. A subset of regions had minimal levels of activation during aging, such as the molecular layer and the stratum radiatum of the CA1 (inner CA1 in the manuscript) – regions enriched in synaptic terminals. Furthermore, this analysis highlights the susceptibility of microglia aging to microenvironmental influences.

      Regarding the temporally controlled microglia-specific genetic KO mouse model used in our original submission, the Cx3cr1-CreER allele selected (B6.129P2(Cg)-Cx3cr1tm2.1(cre/ERT2)Litt/WganJ) has been reported to have very high recombination efficiency (~94% in Parkhurst et al (Cell; PMID: 24360280)), and we used a tamoxifen induction protocol very similar to Faust et al. (Cell Reports; PMID: 37635351) that achieved ~98% recombination (they injected 100mg/kg for 5 days, while we injected 90mg/kg for 5 days). We analyzed our scRNA-Seq data for the expression of Tgfb1 and found that the knockout mice had a 67% reduction in cells expressing higher levels of Tgfb1 (see panel A in Author response image 2). This is likely a large underestimate of the recombination efficiency, as exon 3 is floxed and residual nonfunctional transcripts could be present, given nonsense-mediated decay is not realized in a number of knockout lines (Lindner et al, Methods, PMID: 33838271). We likely achieved a much higher excision efficiency. We would like to highlight that our data indicating increased microglia activation after tamoxifen treatment (Figure S5A) and the involvement of autonomous signaling (Figure S4E-G) are consistent with recently published work by Bedolla et al, (Nature Communications; PMID: 38906887). Additionally, as part of the revision process, we have now corroborated our behavioral data using an independent temporally controlled microglia-specific KO mouse model - Tmem119-CreER::Tgfb1 knockout mice (Figure 4I-K). We performed qPCR on sorted microglia to determine RNA levels in wildtype and knockout mice. Relative levels of Tgfb1 and exon 3 of Tgfb1 (the floxed exon) on technical replicates of 3 pooled samples indicated overall loss of Tgfb1 expression, as well as undetectable levels of exon 3 as normalized to Actb (see panel B in Author response image 2).

      Author response image 2.

      With respect to the effects of aging and Tgfb1 on microglia density, we find a slight region-specific increase in microglia density with age (see Author response image 3). The density of Iba1 cells across hippocampal regions was analyzed at 3 and 24 months of age (see panel A in Author response image 3) and along an aging continuum at 3, 6, 12, 18, and 24 months (see panel B in Author response image 3). These data are also included in the revised manuscript (Figure S2D-F).

      Author response image 3.

      Deletion of Tgfb1 also had region-specific effects on microglia. While there was no difference in microglia density between wildtype and heterozygous microglia, there was a significant increase in microglia density in the hilus and molecular layers in knockout mice (see Author response image 4); these data are also included in the revised manuscript (Figure S5A). These data indicate that there are subtle region-specific increases in microglia density with age, as well as following the deletion of Tgfb1 from microglia of mature mice.

      Author response image 4.

      Additional Recommendations:

      (1) The problem of possible digestion artifacts in scRNA-seq should be at least addressed in the discussion as a caveat in data interpretation. Staining for unique cluster markers in undigested tissue would solve the problem. It can be done with microscopy or using flow cytometry, but for this microglia, isolation should be done with no enzymes or with Actinomycin (PMID: 35260865).

      The ex vivo activation signature uncovered by Marsh et al. (Nature Neuroscience; PMID: 35260865) arises from the digestion methods used to isolate microglia. We took the utmost care in processing our microglia identically within experiments, which should minimize the amount of uneven ex vivo activation of microglia. This is borne out by the structure of our single-cell sequencing data. Unlike Marsh et al., who observe a unique cluster after addition of their inhibitors, we do not see any clusters unique to a single condition, suggesting that any influence of ex vivo activation was evenly distributed.

      Importantly, as suggested by the reviewer, we have complemented our scRNA-Seq analysis by corroborating several markers for various stages of microglia aging progression using RNAScope and IHC in intact tissue. Specifically, the transient age-dependent increase in Tgfb1 high microglia was confirmed using RNAScope (Figure 3B), the age-related increase in ribosomal high microglia was confirmed using S6 immunostaining (Figure 3I), and the increase of various markers of age-associated activation (C1q, CD68 and NFkB) was confirmed using immunostaining (Figure 1F and Figure S2D-I). Additionally, we have performed immunostainings for KLF2 and confirmed peak microglia expression at 18 months of age with lower levels at 24 months of age (Figure 2H).

      (2) The figures of GO and violin plots are not easy to follow sometimes... what are the data points in the violin plots, maybe worth showing them as points? For the GO, e.g. in 3D, 3J, including a short description of the figure could help, e.g. in Figure 1. it was clear.

      We chose not to include the datapoints in the violin plots for aesthetic purposes. Each violin plot would have had hundreds of points that would have made the plots very busy and hidden the structure of the distribution. In Author response image 5 we show the violin plot in Figure 2M with (panel A) and without (panel B) individual points. In a small format, the points overlap and become jumbled together. Therefore, we chose to present the violin plots without points for clarity on the data structure. As for the gene ontology plots in Figure 3, we have updated the descriptions in both the text and figure legends to provide clarification on what they represent.

      Author response image 5.

      (3) I'm very curious to see the mechanism of action of "aged" microglia in the TGFb-depletion model. Is it creating hostile conditions for stem cells, or we have increased synapse loss? Something else?

      We thank the reviewer for their insightful questions. We would like to note that during the revision process of our manuscript, a complementary study was published reporting that the loss of microglia-derived Tgfb1 leads to an aberrant increase in the density of dendritic spines in the CA1 region of the hippocampus (Bedolla et al, Nature Communications, PMID: 38906887). The data from Bedolla et al show neurons in the CA1 sparsely labeled with an mGreenLantern-expressing virus in mice that had Tgfb1 deleted from microglia using the Cx3cr1-CreERT driver (Figure 7U,V). Additionally, McNamara et al (Nature; PMID: 36517604) demonstrated that microglia-derived Tgfb1 signaling regulates myelin integrity during development, and several studies have revealed links between Tgfb1 signaling and altered neurogenesis (e.g., He et al, Nature, PMID: 24859199 and Dias et al, Neuron, PMID: 25467979). Together, this growing body of work indicates that microglia-derived TGFB1 regulates myelination, neurogenesis and synaptic plasticity, which have all been shown to play a role in cognition.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity.

      The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We’re sorry that we didn’t provide sufficient grounds to introduce the subtypes. We have updated this in the revised manuscript, in Lines 102-104 as:

      “We determined these modulations on the basis of the classical cosine tuning model (Georgopoulos et al., 1982) and several previous studies (Bremner and Andersen, 2012; Pesaran et al., 2010; Sergio et al., 2005).”

      In our study, we applied the subtype analysis as a criterion to identify the modulation in neuron populations, rather than sorting neurons into exclusively different cell types.

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      In our study, mixed selectivity, specifically the target-motion modulation of reach-direction tuning, is a significant feature of the single neurons. We categorized the neurons into three subclasses not to claim that these are distinct cell types, but to distinguish target-motion modulation patterns. To further characterize these three patterns, we also investigated their interaction by perturbing connection weights in the RNN.

      Yes, it’s important to consider the role of rotating tuning curves in neural dynamics during interception. In our case, we observed population neural state with sliding windows, and we focused on the period around movement onset (MO) due to the unexpected ring-like structure and the highest decoding accuracy of transferred decoders (Figure S7C). Then, the single-unit analyses were implemented.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to capture better the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We’ve introduced these relevant studies in the updated manuscript in Lines 422-426 as:

      “To further clarify, the discussing target-motion effect is different from the sensory modulation in action selection (Cisek and Kalaska, 2005), motor planning (Pesaran et al., 2006), visual replay and somatosensory feedback (Pruszynski et al., 2011; Stavisky et al., 2017; Suway and Schwartz, 2019; Tkach et al., 2007), because it occurred around movement onset and in predictive control trial-by-trial.”

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      A great suggestion; however, it is hardly feasible as the Utah arrays have already been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      A nice suggestion. The fitting goodness of the simple model (only movement direction) is much worse than the complex models (including target speed). We’ve updated the results in the revised manuscript in Lines 119-122, as “We found that the adjusted R2 of a full model (0.55 ± 0.24, mean ± sd.) can be higher than that of the PD shift (0.47 ± 0.24), gain (0.46 ± 0.22), additive (0.41 ± 0.26), and simple models (only reach direction, 0.34 ± 0.25) for three monkeys (1162 neurons, ranksum test, one-tailed, p<0.01, Figure S5).”
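
      Because the models being compared differ in how many parameters they fit, the adjusted R2 is the relevant yardstick: it discounts the raw R2 by the number of predictors. A minimal sketch of that calculation follows; the function names and the example predictor counts are illustrative assumptions, not values taken from the manuscript.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where p counts predictors and excludes the intercept."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("too few samples for this many predictors")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical case: two models reach the same raw R^2 on 11 conditions,
# but the second spends more predictors to get there.
print(adjusted_r_squared(0.5, 11, 2))
print(adjusted_r_squared(0.5, 11, 4))
```

      At equal raw R2 the model with more predictors scores lower, so a full model only comes out ahead if its extra terms explain enough additional variance to offset the penalty.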

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      A good point. It is a pity that we haven’t found an appropriate unsupervised method.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. Because the movement onset (MO) is the key time point of this study, we colored this time period in Figure 1 to highlight the perimovement neuronal activity.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of a SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In the revised manuscript, we’ve updated the continuous decoding results with support vector regression (SVR) in Figure S7A and in Lines 170-173 as:

      “These results were stable on the data of the other two monkeys and the pseudopopulation of all three monkeys (Figure S6) and reconfirmed by the continuous decoding results with support vector regressions (Figure S7A), suggesting that target motion information existed in M1 throughout almost the entire trial.”

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We have performed a decoding analysis on the RNN units and report the results in Figure S12A and Lines 333-334 as: “First, from the decoding result, target motion information existed in nodes’ population dynamics shortly after TO (Figure S12A).”

      We also have included the results of canonical correlation analysis and Procrustes analysis in Table S2 and Lines 340-342 as: “We then performed canonical component analysis (CCA) and Procrustes analysis (Table S2; see Methods), the results also indicated the similarity between network dynamics and neural dynamics.”
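
      For readers unfamiliar with these two similarity measures, the sketch below shows one common way to compute them with NumPy alone. It is an illustration under stated assumptions, not the authors' pipeline: the synthetic "neural" and "model" trajectories, the noise level, and the function names are hypothetical. Canonical correlations are obtained from the singular values of the product of orthonormal bases of the two centered datasets, and the Procrustes disparity is the residual after optimal translation, scaling, and rotation.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two (samples, dims) matrices."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis of the column space of Xc
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def procrustes_disparity(X, Y):
    """Residual in [0, 1] after optimal translation, scaling, and rotation."""
    A = X - X.mean(axis=0)
    B = Y - Y.mean(axis=0)
    A = A / np.linalg.norm(A)  # unit Frobenius norm
    B = B / np.linalg.norm(B)
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return 1.0 - np.sum(s) ** 2

# Hypothetical low-dimensional trajectories sharing one latent structure.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
latent = np.column_stack([np.cos(t), np.sin(t), 0.5 * np.cos(2.0 * t)])
rot_a, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrices
rot_b, _ = np.linalg.qr(rng.normal(size=(3, 3)))
neural = latent @ rot_a + 0.01 * rng.normal(size=latent.shape)
model = 2.0 * latent @ rot_b + 0.01 * rng.normal(size=latent.shape)
unrelated = rng.normal(size=latent.shape)

cc = canonical_correlations(neural, model)
disparity = procrustes_disparity(neural, model)
disparity_unrelated = procrustes_disparity(neural, unrelated)
print("canonical correlations:", np.round(cc, 3))
print("Procrustes disparity (matched vs. unrelated):", disparity, disparity_unrelated)
```

      Comparing against an unrelated control, as in the last lines, helps separate genuine similarity from similarity that any pair of smooth trajectories would show.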

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for your recognition of our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      This paper shows sensory modulation of motor tuning in single units and in the neural population during the motor execution period. It is a pity that the findings were restricted to certain time windows. We are still working on this task and hope to address these questions in future work.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation through dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      It’s a great idea! We are on the way, and it seems promising.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")-this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Thank you for this nice suggestion. While it was feasible to identify potent subspaces encoding reach direction and null spaces for target-velocity modulation, as suggested by the reviewer, the challenge remained that unsupervised methods were insufficient to isolate a pure target-velocity subspace from numerous possible candidates due to the small variance of target-velocity information. Although dPCA components can be used to construct orthogonal subspaces for individual task variables, we found that the target-velocity information remained highly entangled with reach-direction representation. More details can be found in Figure S8C and its caption as below:

      “We used dPCA components with different features to construct three subspaces (same data in A, reach-direction space #3, #4, #5; target-velocity space #10, #15, #17; interaction space #6, #11, #12), and we projected trial-averaged data into these orthogonal subspaces using different colormaps. This approach allowed us to obtain a “potent subspace” coding reach direction and a “null space” for target velocity. The results showed that the reach-direction subspace effectively represented the reach direction. However, while the target-velocity subspace encoded the target velocity information, it still contained reach-direction clusters within each target-velocity condition, corroborating the results of the addition model in the main text (Figure 4). The interaction subspace revealed that multiple reach-direction rings were nested within each other, similar to the findings from the gain model (Figure 3 & 4). The interaction subspace also captured more variance than target-velocity subspace, consistent with our PCA results, suggesting the target-velocity modulation primarily coexists with reach-direction coding. Furthermore, we explored alternative methods to verify whether orthogonal subspaces could effectively separate the reach direction and target velocity. We could easily identify the reach-direction subspace, but its orthogonal subspace was relatively large, and the target-velocity information exhibited only small variance, making it difficult to isolate a subspace that purely encodes target velocity.”
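
      The regression-based alternative raised in the reviewer's comment can be sketched as below. This is a hedged illustration of the general idea, not the analysis in Figure S8C: the synthetic activity, the orthonormal loading vectors, and the function name are assumptions. Hand velocity is regressed on neural activity to define a potent space, the potent component is removed, and the highest-variance remaining dimension is taken as the top null dimension.

```python
import numpy as np

def potent_null_split(activity, hand_velocity):
    """Return an orthonormal potent basis and the top-variance null dimension.

    activity: (samples, neurons); hand_velocity: (samples, outputs).
    """
    Xc = activity - activity.mean(axis=0)
    Vc = hand_velocity - hand_velocity.mean(axis=0)
    # Linear readout from activity to hand velocity: (neurons, outputs).
    readout, *_ = np.linalg.lstsq(Xc, Vc, rcond=None)
    potent, _ = np.linalg.qr(readout)  # orthonormal basis of the readout space
    # Remove the potent component, then take the first principal axis.
    X_null = Xc - Xc @ potent @ potent.T
    _, _, vt = np.linalg.svd(X_null, full_matrices=False)
    return potent, vt[0]

# Synthetic population (hypothetical): direction and speed load on
# orthonormal neural axes, so the separation is clean by construction.
rng = np.random.default_rng(2)
n_samples, n_neurons = 300, 20
theta = rng.uniform(0.0, 2.0 * np.pi, n_samples)
speed = rng.choice([-1.0, -0.5, 0.0, 0.5, 1.0], n_samples)
loadings, _ = np.linalg.qr(rng.normal(size=(n_neurons, 3)))
latents = np.column_stack([np.cos(theta), np.sin(theta), speed])
activity = latents @ loadings.T + 0.1 * rng.normal(size=(n_samples, n_neurons))
hand_velocity = latents[:, :2]

potent, top_null = potent_null_split(activity, hand_velocity)
null_projection = (activity - activity.mean(axis=0)) @ top_null
speed_corr = abs(np.corrcoef(null_projection, speed)[0, 1])
print("max |potent . top_null|:", np.max(np.abs(potent.T @ top_null)))
print("|corr(null projection, speed)|:", speed_corr)
```

      In this toy case the speed signal has large variance and its own orthogonal axis, so the top null dimension recovers it. With real recordings, where target-velocity variance is small and entangled with reach direction as the authors describe, the top null dimension need not isolate target velocity.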

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we did not clarify the definition of the “none” type, which can be misleading. The 43% of unclassifiable nodes includes inactive ones; when only active (task-related) nodes are included, the ratio of unclassifiable nodes is much lower. We recomputed the ratios with only active units and have updated Table 1. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We considered moving neural states from one ring to another without changing the directional cluster. However, we found that this perturbation design might not be fully developed: since the top two PCs are highly correlated with movement direction, such a move—similar to exchanging two states within the same cluster but under different target-motion conditions—would presumably not affect the behavior.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non-trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thank you for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We’ve highlighted the resampling results as an important control in the revised manuscript in Figure S11 and Lines 257-260 as:

      “To eliminate hand-speed effect, we resampled trials to construct a new dataset with similar distributions of hand speed in each target-motion condition and found similar orbital neural geometry. Moreover, the target-motion gain model provided a better explanation compared to the hand-speed gain model (Figure S11).”
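
      One way such a resampling control could be implemented is sketched below. The authors' exact procedure is not described here, so this is an assumption-laden example: trials are binned by hand speed and, within each bin, the same number of trials is drawn from every target-motion condition. The function name, bin width, and simulated speeds (arbitrary units) are hypothetical.

```python
import numpy as np

def match_speed_distributions(hand_speed, condition, bin_edges, rng):
    """Return sorted trial indices with equal per-bin counts in every condition."""
    hand_speed = np.asarray(hand_speed)
    condition = np.asarray(condition)
    bins = np.digitize(hand_speed, bin_edges)
    conditions = np.unique(condition)
    keep = []
    for b in np.unique(bins):
        per_cond = [np.flatnonzero((bins == b) & (condition == c)) for c in conditions]
        n_keep = min(len(idx) for idx in per_cond)
        if n_keep == 0:
            continue  # this bin is missing in at least one condition
        for idx in per_cond:
            keep.extend(rng.choice(idx, size=n_keep, replace=False))
    return np.sort(np.array(keep, dtype=int))

# Two hypothetical conditions whose hand-speed distributions differ slightly.
rng = np.random.default_rng(3)
n_per_cond = 500
hand_speed = np.concatenate([
    rng.normal(50.0, 5.0, n_per_cond),
    rng.normal(55.0, 5.0, n_per_cond),
])
condition = np.repeat([0, 1], n_per_cond)
bin_edges = np.arange(30.0, 80.0, 2.5)

matched = match_speed_distributions(hand_speed, condition, bin_edges, rng)
for c in (0, 1):
    before = hand_speed[condition == c].mean()
    after = hand_speed[matched][condition[matched] == c].mean()
    print(f"condition {c}: mean speed before {before:.1f}, after {after:.1f}")
```

      Any downstream analysis, such as the PCA of population activity, would then be repeated on the `matched` trials only, so that residual differences between conditions cannot be attributed to hand speed.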

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would therefore be interesting to study further how the details of the network setup (eg initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. In the revised manuscript, we’ve added the results of three alternative models in Table S4 and Lines 355-365 as below:

      “We also tested three alternative network models: (1) only receives motor intention and a GO-signal; (2) only receives target location and a GO-signal; (3) initialized with sparse connection (sparsity=0.1); the unmentioned settings and training strategies were as the same as those for original models (Table S4; see Methods). The results showed that the three modulations could emerge in these models as well, but with obviously distinctive distributions. In (1), the ring-like structure became overlapped rings parallel to the PC1-PC2 plane or barrel-like structure instead; in (2), the target-motion related tilting tendency of the neural states remained, but the projection of the neural states on the PC1-PC2 plane was distorted and the reach-direction clusters dispersed. These implies that both motor intention and target location seem to be needed for the proposed ring-like structure. The initialization of connection weights of the hidden layer can influence the network’s performance and neural state structure, even so, the ring-like structure”

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for these great suggestions. By perturbations, we intended to explore the contribution of interaction between certain subpopulations. We’ve included the ablation experiments in the updated manuscript in Table S3 and Lines 344-346 as below: “The ablation experiments showed that losing any kind of modulation nodes would largely deteriorate the performance, and those nodes merely with PD-shift modulation could mostly impact the neural state structure (Table S3).”

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are very interesting and open questions. In the revised manuscript, we discuss these in Lines 435-446, as below: “It would be interesting to explore whether other motor areas also allow sensory modulation during flexible interception. The functional differences between M1 and other areas lead to uncertain speculations. Although M1 has pre-movement activity, it is more related to task variables and motor outputs. Recently, a cycling task sets a good example that the supplementary motor area (SMA) encodes context information and the entire movement (Russo et al., 2020), while M1 preferably relates to cycling velocity (Saxena et al., 2022). The dorsal premotor area (PMd) has been reported to capture potential action selection and task probability, while M1 not (Cisek and Kalaska, 2005; Glaser et al., 2018; Wang et al., 2019). If the neural dynamics of other frontal motor areas are revealed, we might be able to tell whether the orbital neural geometry of mixed selectivity is unique in M1, or it is just inherited from upstream areas like PMd. Either outcome would provide us some insights into understanding the interaction between M1 and other frontal motor areas in motor planning.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      At times the writing was a little hard to parse. It could benefit from being fleshed out a bit to link sentences together better.

      There are a few grammatical errors, such as:

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact each other, so the PD-shift nodes should not be neglected."

      should be

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact WITH each other, so the PD-shift nodes should not be neglected."

      The discussion could also be more extensive to benefit non-experts in the field.

      Thank you. We have proofread and polished the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Other comments:

      - The authors mention mixed selectivity a few times, but Table 1 doesn't have a column for mixed selective neurons--this seems like an important oversight. Likewise, it would be good to see an example of a "mixed" neuron.

      - The structure of the writing in the results section often talked about the supplementary results before the main results - this seems backwards. If the supplementary results are important enough to come before the main figures, then they should not be supplementary. Otherwise, if the results are truly supplementary, they should come after the main results are discussed.

      - Line 305: Authors say "most" RNN units could be classified, and this is technically true, but only barely, according to Table 1. It might be good to put the actual percentage here in the text.

      - Figure 5a: typo ("Motion intention" rather than "Motor")

      - I couldn't find any mention of code or data availability in the manuscript.

      - There were a number of lines that didn't make much sense to me and should probably be rewritten or expanded on:

      - Lines 167-168: "These results qualitatively imply the interaction as that target speeds..."

      - Lines 178-179: "However, these neural trajectories were not yet the ideal description, because they were shaped mostly by time."

      - Lines 187-188: "...suggesting that target motion affects M1 neural dynamics via a topologically invariant transformation."

      - Lines 224-226: "Note that here we performed an linear transformation on all resulting neural state points to make the ellipse of the static condition orthogonal to the z-axis for better visualization." Does this mean that the z-axis is not PC 3 anymore?

      - Lines 272-274: "These simulations suggest that the existence of PD-shift and additive modulation would not disrupt the neural geometry that is primarily driven by gain modulation; rather it is possible that these three modulations support each other in a mixed population."

      Thank you for these detailed suggestions. By “mixed selectivity”, we mean the joint tuning to both target motion and movement. In this sense, the target-motion-modulated neurons (regardless of the modulation type) show mixed selectivity. The term “motor intention” follows Mazzoni et al., 1996, Journal of Neurophysiology. We also revised the manuscript for better readability.

      We have updated the data and code availability in Data availability as below:

      “The example experimental datasets and relevant analysis code have been deposited in Mendeley Data at https://data.mendeley.com/datasets/8gngr6tphf. The RNN relevant code and example model datasets are available at https://github.com/yunchenyc/RNN_ringlike_structure.”

      Reviewer #3 (Recommendations For The Authors):

      Minor typos:

      Line 153: “there were”

      Line 301: “network was trained to generate”

      Line 318: “interact with each other”

      Suggested reformulations :

      Line 310: “tilting angles followed a pattern similar to that seen in the data”

      Line 187: the claim of a “topologically invariant transformation” seems strong as the analysis is quite qualitative.

      Suggested changes to the paper (aside from those mentioned in the main review): It could be nice to show behaviour in a main figure panel early on in the paper. This could help with the task description (as it would directly show how the trials are separated based on endpoint) and could allow for discussing the potential caveats of the assumption that behaviour is preserved.

      Thank you. We have corrected these typos and writing problems. Because a similar task design has been reported previously, we decided not to provide extra figures or videos. Still, we thank the reviewer for this nice suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript by Thornlow Lamson et al., the authors develop a "beads-on-a-string" or BOAS strategy to link diverse hemagglutinin head domains, to elicit broadly protective antibody responses. The authors are able to generate varying formulations and lengths of the BOAS and immunization of mice shows induction of antibodies against a broad range of influenza subtypes. However, several major concerns are raised, including the stability of the BOAS, that only 3 mice were used for most immunization experiments, and that important controls and analyses related to how the BOAS alone, and not the inclusion of diverse heads, impacts humoral immunity.

      Strengths:

      Vaccine strategy is new and exciting.

      Analyses were performed to support conclusions and improve paper quality.

      Weaknesses:

      Controls for how different hemagglutinin heads impact immunity versus the multivalency of the BOAS.

      Only 3 mice were used for most experiments.

      There were limited details on size exclusion data.

      We appreciate the reviewer’s comments and have made the following changes to the manuscript.

      (1) We recognize that deconvoluting the effect of including a diverse set of HA heads and multivalency in the BOAS immunogens is necessary to understand the impact on antigenicity. Therefore, we now include a cocktail of the identical eight HA heads used in the 8-mer and BOAS nanoparticle (NP) as an additional control group. While we observed similar HA binding titers relative to the 8-mer and BOAS NP groups, the cocktail group-elicited sera was unable to neutralize any of the viruses tested; multivalency thus appears to be important for eliciting neutralizing responses.

      (2) We increased the sample size by repeated immunizations with n=5 mice, for a total of n=8 mice across two independent experiments.

      (3) We expanded the details on size exclusion data to include:

      a) extended chromatograms from Figure 2C as Supplemental Figure 3.

      b) additional details in the materials and methods section (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a "beads-on-a-string" (BOAS) immunogen, where they link, using a non-flexible glycine linker, up to eight distinct hemagglutinin (HA) head domains from circulating and non-circulating influenzas and assess their immunogenicity. They also display some of their immunogens on ferritin NP and compare the immunogenicity. They conclude that this new platform can be useful to elicit robust immune responses to multiple influenza subtypes using one immunogen and that it can also be used for other viral proteins.

      Strengths:

      The paper is clearly written. While the use of flexible linkers has been used many times, this particular approach (linking different HA subtypes in the same construct resembling adding beads on a string, as the authors describe their display platform) is novel and could be of interest.

      Weaknesses:

      The authors did not compare to individual HAs immunized as cocktails and did not compare to other mosaic NPs published earlier. It is thus difficult to assess how their BOAS compare.

      Other weaknesses include the rationale as to why these subtypes were chosen and also an explanation of why there are different sizes of the HA1 construct (apart from expression). Have the authors tried other lengths? Have they expressed all of them as FL HA1?

      We appreciate the reviewer’s comments. We responded to the concerns below and modified the manuscript accordingly.

      (1) We recognize that including a “cocktail” control is important to understand how the multivalency present in a single immunogen affects the immune response. We now include an additional control group comprised of a mixture of the same eight HA heads used in the 8-mer and the BOAS nanoparticle (NP). While this cocktail elicited similar HA binding titers relative to the 8-mer and BOAS NP immunogens (Fig. 6G), there was no detectable neutralization of any of the viruses tested (Fig. 7).

      (2) In the introduction we reference other multivalent display platforms but acknowledge that distinct differences in their immunogen design platforms make direct comparisons to ours difficult, which is ultimately why we did not use them as comparators for our in vivo studies. Perhaps most directly relevant to our BOAS platform is the mosaic HA NP from Kanekiyo et al. (PMID 30742080). Here, HA heads, with similar boundaries to ours, were selected from historical H1N1 strains. These NPs however were significantly less antigenically diverse relative to our BOAS NPs as they did not include any group 2 (e.g., H7, H9) or B influenza HAs; restricting their multivalent display to group 1 H1N1s likely was an important factor in how they were able to achieve broad, neutralizing H1N1 responses. Additionally, Cohen et al. (PMID 33661993) used similarly antigenically distinct HAs in their mosaic NP, though these included full-length HAs with the conserved stem region, which likely has a significant impact on the elicited cross-reactive responses observed. Lastly, we reference Hills et al. (PMID 38710880), where authors designed similar NPs with four tandemly-linked betacoronavirus receptor binding domains (RBDs) to make “quartets”. In contrast to our observations, the authors observed increased binding and neutralization titers following conjugation to protein-based NPs. We acknowledge potential differences between the studies, such as the antigen and larger VLP NP, that could lead to the different observed outcomes.

      (3) We intended to highlight the “plug-and-play” nature of the BOAS platform; theoretically any HA subtype could be interchanged into the BOAS. To that end, our rationale for selecting the HA subtypes in our proof-of-principle immunogen was to include an antigenically diverse set of circulating and non-circulating HAs that we could ultimately characterize with previously published subtype-specific antibodies that were also conformation-specific. In doing so, these diagnostic antibodies could confirm presence and conformation integrity of each component. We intentionally did not include HA subtypes that we did not have a conformation-specific antibody for.

      The different sizes of HA head domains were determined exclusively by expression of the recombinant protein. We have not attempted expression of full-length HA1 domains. Furthermore, we have not attempted to express the full-length HA (inclusive of HA1 and HA2) in our BOAS platform. The primary reason was to avoid including the conserved stem region of HA2 which may distract from the HA1 epitopes (e.g., receptor binding site, lateral patch) that can be engaged by broadly neutralizing antibodies. Additionally, the full-length HA is inherently trimeric and may not be as amenable to our BOAS platform as the monomeric HA1 head domain.

      Reviewer #3 (Public Review):

      This work describes the tandem linkage of influenza hemagglutinin (HA) receptor binding domains of diverse subtypes to create 'beads on a string' (BOAS) immunogens. They show that these immunogens elicit ELISA binding titers against full-length HA trimers in mice, as well as varying degrees of vaccine mismatched responses and neutralization titers. They also compare these to BOAS conjugated on ferritin nanoparticles and find that this did not largely improve immune responses. This work offers a new type of vaccine platform for influenza vaccines, and this could be useful for further studies on the effects of conformation and immunodominance on the resulting immune response.

      Overall, the central claims of immunogenicity in a murine model of the BOAS immunogens described here are supported by the data.

      Strengths included the adaptability of the approach to include several, diverse subtypes of HAs. The determination of the optimal composition of strains in the 5-BOAS that overall yielded the best immune responses was an interesting finding and one that could also be adapted to other vaccine platforms. Lastly, as the authors discuss, the ease of translation to an mRNA vaccine is indeed a strength of this platform.

      One interesting and counter-intuitive result is the high levels of neutralization titers seen in vaccine-mismatched, group 2 H7 in the 5-BOAS group that differs from the 4-BOAS with the addition of a group 1 H5 RBD. At the same time, no H5 neutralization titers were observed for any of the BOAS immunogens, yet they were seen for the BOAS-NP. Uncovering where these immune responses are being directed and why these discrepancies are being observed would constitute informative future work.

      There are a few caveats in the data that should be noted:

      (1) 20 ug is a pretty high dose for a mouse and the majority of the serology presented is after 3 doses at 20 ug. By comparison, 0.5-5 ug is a more typical range (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380945/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980174/). Also, the authors state that 20 ug per immunogen was used, including for the BOAS-NP group, which would mean that the BOAS-NP group was given a lower gram dose of HA RBD relative to the BOAS groups.

      We agree that this is on the “upper end” of recombinant protein dose. While we did not do a dose-response, we now include serum analyses after a single prime. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across days d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which reinforces our observation that the multivalency of the HA heads is necessary for eliciting robust serum responses to each component. These data are included in Supplemental Figure 5, and we’ve modified the text (lines 185-187) to include:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      Additionally, we acknowledge that there is a size discrepancy between the BOAS NP and the largest BOAS, leading to an approximately 15-fold difference on a per mole basis of the BOAS immunogen. The smallest and largest BOAS also differ by ~2.5-fold on a per mole basis; this could favor the overall amount of the smaller immunogens. However, because vaccine doses are typically calculated on a mg per kg basis, we did not calculate on a molar basis for this study. Any promising immunogens will be evaluated in a dose-response study to optimize elicited responses.
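The mass-versus-mole point above can be made concrete with a short calculation. This is only an illustrative sketch: the two molecular weights below are hypothetical placeholders chosen so that their ratio is 2.5, mirroring the fold difference quoted in the response; they are not the measured masses of any BOAS construct.

```python
def picomoles(mass_ug, mw_kda):
    """Convert a mass dose (micrograms) to picomoles for a given molecular weight (kDa).

    1 ug / 1 kDa = 1e-6 g / 1e3 g/mol = 1e-9 mol = 1000 pmol.
    """
    if mass_ug < 0 or mw_kda <= 0:
        raise ValueError("mass must be non-negative and molecular weight positive")
    return mass_ug / mw_kda * 1000.0


# Hypothetical molecular weights (kDa); placeholders, NOT values from the study.
MW_SMALL_KDA = 80.0
MW_LARGE_KDA = 200.0
DOSE_UG = 20.0  # the fixed mass dose stated in the response

pmol_small = picomoles(DOSE_UG, MW_SMALL_KDA)
pmol_large = picomoles(DOSE_UG, MW_LARGE_KDA)

# At equal mass, the molar ratio is simply the inverse ratio of molecular weights.
fold_difference = pmol_small / pmol_large
print(f"{pmol_small:.0f} pmol vs {pmol_large:.0f} pmol ({fold_difference:.1f}-fold)")
```

Under these placeholder values, a fixed mass dose delivers more molecules of the smaller construct, which is the effect the reviewer raised.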

      (2) Serum was pooled from all animals per group for neutralization assays, instead of testing individual animals. This could mean that a single animal with higher immune responses than the rest in the group could dominate the signal and potentially skew the interpretation of this data.

      We repeated the neutralization assays with data points for individual mice. There does appear to be variability in the immune response between mice. This is most noticeable for responses to the H5 component. We are currently assessing what properties of our BOAS immunogen might contribute to the variability across individual mice.

      (3) In Figure S2, it looks like an apparent increase in MW by changing the order of strains here, which may be due to differences in glycosylation. Further analysis would be needed to determine if there are discrepancies in glycosylation amongst the BOAS immunogens and how those differ from native HAs.

      There does appear to be a relatively small difference in MW between the two BOAS configurations shown in Figure S2. This could be due to differences in glycosylation, as the reviewer points out, and in future studies, we intend to assess the influence of native glycosylation on antibody responses elicited by our BOAS immunogens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Concerns

      (1) From Figure 2D-E, it looks like BOAS are forming clusters, rather than a straight line. Do these form aggregates over time? Both at 4 degrees over a few days or after freeze-thaw cycle(s)? It is unclear from the SEC methods how long after purification this was performed and stability should be considered.

      Due to the inherent flexibility of the Gly-Ser linker between each component, we do not anticipate that any rigidity would be imposed resulting in a “straight line”. Nevertheless, we appreciate the reviewer’s concern about the long-term stability of the BOAS immunogens. To address this, we include 1) the extended chromatograms from Figure 2C as Supplemental Figure 3 to show any aggregates present, 2) traces from up to 48 hours post-IMAC, and 3) chromatograms following a freeze-thaw cycle. Post-IMAC purification, there is a minor peak (<10% of total peak height) at ~9mL corresponding to aggregation. Note that we excluded this aggregate fraction from immunizations. Following a freeze-thaw cycle, the BOAS maintain a homogeneous peak immediately (<24hrs) after thawing, with no significant (<10%) aggregation or degradation peak. However, after ~1 week at 4C post-freeze-thaw, additional peaks within the chromatogram correspond to degradation of the BOAS.

      We modified the materials and methods section to state (lines 370-372)

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      We commented on BOAS stability in the results section (lines 142-148)

      “Following SEC, affinity tags were removed with HRV-3C protease; cleaved tags, uncleaved BOAS, and His-tagged enzyme were removed using cobalt affinity resin and snap frozen in liquid nitrogen before immunizations. BOAS maintained monodispersity upon thawing, though over time, degradation was observed following longer term (>1 week) storage at 4C (Supplemental Figure 3). This degradation became more significant as BOAS increased in length (Supplemental Figure 3).”

      We also included in the discussion (lines 277-279):

      “Notably, for longer BOAS we observed degradation following longer term storage at 4C, which may reflect their overall stability.”

      (2) Figures 3-4 and 6-7, to make conclusions off of 3 mice per group is inappropriate. A sample size calculation should have been conducted and the appropriate number of mice tested. In addition, two independent mouse experiments should always be performed. Moreover, the reliability of the statistical tests performed seems unlikely, given the very small sample size.

      We agree that additional mice are necessary to make assessments regarding immunogenicity and cross-reactivity differences between the immunogens. To address this, we repeated the immunization with 5 additional mice, for a total of n=8 mice over two independent experiments. We incorporated these data into Figure 3B-D, as well as an additional Figure 3E (see below). We also now report the log-transformed endpoint titer (EPT) values rather than reciprocal EC50 values and have clarified the statistical analyses used. We have added the following lines to the methods section:

      lines 427-431:

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which the blank-subtracted 450nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”

      lines 406-408:

      “C57BL/6 mice (Jackson Laboratory) (n=8 per group for 3-, 4-, 5-, 6-, 7-, and 8mer cohorts; n=5 for BOAS NP, NP, and mix cohorts) were immunized with 20µg of BOAS immunogens of varying length and adjuvanted with 50% Sigma Adjuvant for a total of 100µL of inoculum.”

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays were determined using Prism (GraphPad Prism v10.2.3). ELISAs comparing serum reactivity and microneutralization and comparing >2 samples were analyzed using a Kruskal-Wallis test with Dunn’s post-hoc test to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. ELISAs comparing two samples were analyzed using a Mann-Whitney test. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
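As an illustration of the post hoc step named in the statistics paragraph above, the following is a minimal pure-Python sketch of Dunn’s pairwise comparisons on pooled ranks. It is not a reproduction of the Prism analysis: the Bonferroni-style adjustment, the group names, and the log titer values are all assumptions made for the example, and the omnibus Kruskal-Wallis p-value is not computed here.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..N, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def dunn_test(groups):
    """Dunn's pairwise z tests on pooled ranks.

    groups: dict mapping group name -> list of observations.
    Returns {(a, b): (z, bonferroni_adjusted_p)} for every pair of groups.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Mean rank of each group, read back in the order the values were pooled.
    mean_rank, start = {}, 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + n]) / n
        start += n

    # Tie correction subtracted from the rank variance term.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12.0 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12.0 - tie_term

    pairs = list(combinations(names, 2))
    results = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return results


# Hypothetical log10 endpoint titers; placeholders, NOT data from the study.
titers = {
    "mix": [2.0, 2.1, 2.3, 2.2, 1.9],
    "8mer": [3.5, 3.8, 3.6, 4.0, 3.7],
    "BOAS_NP": [3.1, 3.4, 3.2, 3.3, 3.0],
}
results = dunn_test(titers)
for pair, (z, p_adj) in results.items():
    print(pair, round(z, 3), round(p_adj, 4))
```

With small groups such as n=5, rank-based tests have limited resolution, so adjusted p-values from a sketch like this should be read alongside the individual data points.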

      (3) One critical control that is missing is a homogenous BOAS, for example, just linking one H1 on a BOAS. Does oligomerization and increasing avidity alone improve humoral immunity?

      We agree that this is an interesting point. However, to address the impact of oligomerization and avidity on humoral immunity, we now include an additional control with a cocktail of HA heads used in the 8mer. We have incorporated this into Figure 3A, 3D and 3E, Figure 6G, and Figure 7.

      Additionally, we have added the following lines in the manuscript:

      lines 38-40:

      “Finally, vaccination with a mixture of the same HA head domains is not sufficient to elicit the same neutralization profile as the BOAS immunogens or nanoparticles.”

      lines 105-106:

      “Additionally, we showed that a mixture of the same HA head components was not sufficient to recapitulate the neutralizing responses elicited by the BOAS or BOAS NP.”

      lines 169-172:

      “To determine immunogenicity of each BOAS immunogen, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A). We compared these BOAS to a control group immunized with a mixture of the eight HA heads present in the 8mer.”

      lines 265-267:

      “There were qualitatively immunodominant HAs, notably H4 and H9, and these were relatively consistent across BOAS in which they were a component. This effect was reduced in the mix cohort.”

      (4) While some cross-reactivity is likely (Figure 6G), there is considerable loss of binding when there is a mismatch. Of the antibodies induced, how much of this is strain-specific? For example, how well do serum antibodies bind to a pre-2009 H1?

      We agree with the reviewer that there is a considerable loss of binding when there is a mismatched HA component. To better understand this and incorporate a mismatched strain into our analysis of the 8mer and BOAS NP, we looked at serum binding titers to a pre-2009 H1, H1/Solomon Islands/2006, and an antigenically distinct H3, H3/Hong Kong/1968. We have incorporated this data into Figures 3D, 3E, 6F and 6G. We observed relatively high titers against both a mismatched H1 and H3, indicating that the BOAS maintain high titers against subtype-specific strains that are conserved over considerable antigenic distance. However, this was similar in the mixture group, indicating that this may not be specific to oligomerization of BOAS immunogens.

      We added the following to the methods section:

      lines 357-361

      “Head subdomains from these HAs were used in the BOAS immunogens, and full-length soluble ectodomain (FLsE) trimers were used in ELISAs. Additional H1 (H1/A/Solomon Islands/3/2006) and H3 (H3/A/Hong Kong/1/1968) FLsEs were used in ELISAs as mismatched, antigenically distinct HAs for all BOAS.”

      Minor Concerns

      (1) Line 44-46, the deaths per year are almost exclusively due to seasonal influenza outbreaks caused by antigenically drifted viruses in humans, not those spilling over from avian sp. and swine. For accuracy, please adjust this sentence.

      We have adjusted lines 45-48 to say “This is largely a consequence of viral evolution and antigenic drift as it circulates seasonally within humans and ultimately impacts vaccine effectiveness. Additionally, the chance for spillover events from animal reservoirs (e.g., avian, swine) is increasing as population and connectivity also increase.”

      (2) Figure 4D-E, provide a legend for what the symbols indicate, or simply just put the symbol next to either the homology score and % serum competition labels on the y-axis.

      We have included a legend in Figures 4D and 4E to distinguish between homology score and % serum competition.

      (3) I am a bit confused by the data presented in Figure 7. The figure legend says the two symbols represent technical replicates. How? Is one technical replicate of all the mice in a group averaged and that's what's graphed? If so, this is not standard practice. I would encourage the authors to show the average technical replicates of each animal, which is standard.

      We thank the reviewer for their suggestion, and we have revised Figure 7 such that each symbol represents a single animal for n=5 animals. We have also adjusted the figure caption to the following:

      “Figure 7: Microneutralization titers to matched and mis-matched virus- Microneutralization of matched and mis-matched pseudoviruses: H1N1 (green, top left), H3N2 (orange, top right), H5N1 (yellow, bottom left), and H7N9 viruses (pink, bottom right) with d42 serum. Solid bars below each plot indicate a matched sub-type, and striped bars indicate a mis-matched subtype (i.e. not present in the BOAS). NP negative controls were used to determine threshold for neutralization. Upper and lower dashed lines represent the first dilution (1:32) (for H1N1, H3N2, and H5N1) or neutralization average with negative control NP serum (H7N9), and the last serum dilution (1:32,768), respectively, and points at the dashed lines indicate IC50s at or outside the limit of detection. Individual points indicate IC50 values from individual mice from each cohort (n=5). The mean is denoted by a bar and error bars are +/- 1 s.d., * = p<0.05 as determined by a Kruskal-Wallis test with Dunn’s multiple comparison post hoc test relative to the mix group.”

      (4) Paragraphs 298-313, multiple studies are referred to but not referenced.

      We have added the following references to this section:

      (38) Kanekiyo, M. et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature 498, 102–106 (2013).

      (48) Hills, R. A. et al. Proactive vaccination using multiviral Quartet Nanocages to elicit broad anti-coronavirus responses. Nat. Nanotechnol. 1–8 (2024) doi:10.1038/s41565-024-01655-9.

      (65) Jardine, J. et al. Rational HIV immunogen design to target specific germline B cell receptors. Science 340, 711–716 (2013).

      (66) Tokatlian, T. et al. Innate immune recognition of glycans targets HIV nanoparticle immunogens to germinal centers. Science 363, 649–654 (2019).

      (67) Kato, Y. et al. Multifaceted Effects of Antigen Valency on B Cell Response Composition and Differentiation In Vivo. Immunity 53, 548-563.e8 (2020).

      (68) Marcandalli, J. et al. Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus. Cell 176, 1420-1431.e17 (2019).

      (69) Bruun, T. U. J., Andersson, A.-M. C., Draper, S. J. & Howarth, M. Engineering a Rugged Nanoscaffold To Enhance Plug-and-Display Vaccination. ACS Nano 12, 8855–8866 (2018).

      (70) Kraft, J. C. et al. Antigen- and scaffold-specific antibody responses to protein nanoparticle immunogens. Cell Reports Medicine 100780 (2022) doi:10.1016/j.xcrm.2022.100780.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors define "detectable titers"?

      Maybe add a threshold value of reciprocal EC on the figure for each plot.

      We recognize the reviewer’s concern with reporting serum titers in this way, and we now report titers as endpoint titers (EPT) with a dotted line for the first detectable dilution (1:50). We have also adjusted the methods section to reflect this change:

      (lines 427-431)

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which the blank-subtracted 450nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”
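The endpoint-titer definition quoted above can be sketched as follows. This is a hedged illustration rather than the analysis actually run in Prism: it assumes one common 4PL parameterisation with x as concentration, assumes x is expressed as the reciprocal of the dilution factor, and uses hypothetical fitted parameters in place of a real regression. Only the 0.1 absorbance threshold comes from the quoted methods text.

```python
import math


def four_pl(x, bottom, top, ec50, hill):
    """4PL curve with x as concentration (one common parameterisation; an assumption)."""
    return bottom + (top - bottom) * x ** hill / (x ** hill + ec50 ** hill)


def endpoint_titer(bottom, top, ec50, hill, threshold=0.1):
    """Dilution factor at which the fitted curve crosses the absorbance threshold.

    Solves four_pl(x) = threshold for x in closed form, then returns 1 / x,
    assuming x is serum concentration given as the reciprocal of the dilution.
    """
    if hill <= 0 or ec50 <= 0:
        raise ValueError("hill slope and ec50 must be positive")
    if not (bottom < threshold < top):
        raise ValueError("threshold must lie strictly between bottom and top")
    frac = (threshold - bottom) / (top - bottom)
    x_cross = ec50 * (frac / (1.0 - frac)) ** (1.0 / hill)
    return 1.0 / x_cross


# Hypothetical fitted parameters; placeholders, NOT values from the study.
params = {"bottom": 0.02, "top": 2.5, "ec50": 1e-3, "hill": 1.2}

ept = endpoint_titer(**params)
log_ept = math.log10(ept)  # titers are reported on a log scale
print(f"EPT dilution ~ 1:{ept:.0f}, log10(EPT) = {log_ept:.2f}")
```

Because the threshold sits near the bottom of the curve, the endpoint titer falls at a higher dilution than the midpoint, which is why weak but present binding is easier to see with EPT than with EC50.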

      It also appears that not all X-mer elicits an immune response against matched HA, e.g. for the 7 and 8 -mer. Not sure why the authors do not mention this. It could be due to too many HAs, not sure.

      We apologize for the confusion, and agree that our original method of reporting EC50 values does not reflect weak but present binding titers. Upon further analysis with additional mice as well as adjusting our method of reporting titers, it is easier to see in Figure 3D that all X-mer BOAS do indeed elicit detectable binding titers to matched HA components.

      It will be nice to add a conclusion to the cross-reactivity - again it appears that past 6-mer there has been a loss in cross-reactivity even though there are more subtypes on the BOAS.

      Also, the TI seemed to be the more conserved epitope targeted here.

      (Of note these two are mentioned in the discussion)

      We have updated the results section to include the following:

      (lines 281-294)

      “Based on the immunogenicity of the various BOAS and their ability to elicit neutralizing responses, it may not be necessary to maximize the number of HA heads into a single immunogen. Indeed, it qualitatively appears that the intermediate 4-, 5-, and 6mer BOAS were the most immunogenic and this length may be sufficient to effectively engage and crosslink BCR for potent stimulation. These BOAS also had similar or improved binding cross-reactivity to mis-matched HAs as compared to longer 7- or 8mer BOAS. Notably, the 3mer BOAS elicited detectable cross-reactive binding titers to H4 and H5 mismatched HAs in all mice. This observed cross-reactivity could be due to sequence conservation between the HAs, as H3 and H4 share ~51% sequence identity, and H1 and H2 share ~46% and ~62% overall sequence identity with H5, respectively (Supplemental Figure 6). Additionally, the degree of surface conservation decreased considerably beyond the 5mer as more antigenically distinct HAs were added to the BOAS. These data suggest that both antigenic distance between HA components and BOAS length play a key role in eliciting cross-reactive antibody responses, and further studies are necessary to optimize BOAS valency and antigenic distance for a desired response.”
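For readers less familiar with how pairwise identity figures such as those quoted above are obtained, here is a minimal sketch of percent identity over a pre-aligned sequence pair. The sequences are toy strings, not HA sequences, and the choice to exclude gap-containing columns from the denominator is an assumption; identity values depend on the alignment and on which denominator is used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences; placeholders, NOT real HA head sequences.
seq_a = "MKTI-IALSY"
seq_b = "MKAIL-ALSF"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity over ungapped columns")
```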

      Figure 5E, the authors could indicate which subtype each mab is specific to for those who are not HA experts. (They have them color-coded but it is hard to see because very small).

      The authors also do not explain why 3E5 does not bind well to H1, H2, H3, H4 4-mer BOA, etc...

      We apologize for the lack of clarity in this figure. We updated Figure 5E to include the subtype it is specific for as well as listing the antibodies and their subtype and targeted epitope in the figure caption.

      Minor

      Figure 1B zoom looks like the line is hidden to the structure - should come in front

      We adjusted the figure accordingly.

      Line 127 - whether the order

      Corrected

      What is the rationale for thinking that a different order will lead to a different expression and antigenic results?

      We thank the reviewer for this question. We did not necessarily anticipate a difference in protein expression based on BOAS order. We did, however, want to verify that our platform was indeed “plug-and-play” and that we could readily exchange components and order. We do hypothesize, though, that a different order may in fact lead to different antigenic results. We think that the conformation of the BOAS as well as physical and antigenic distance of HA components may influence cross-linking efficiency of BCRs and lead to different antigenic results with different levels of cross-reactivity. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in proximity to each other or could be potentially shielded in certain conformations, and thus could affect antigenic results. We expand on this rationale in the discussion in lines 310-314:

      “Further studies with different combinations of HAs could aid in understanding how length and composition influences epitope focusing. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern could impact which HAs are in close proximity to one other or could be potentially shielded in certain conformations, and thus could affect antigenic results.”

      Maybe list HA#1 HA#2 HA#3 instead of HA1, HA2, HA3 to make sure it is not confounded with HA2 and HA2

      We agree that this may be confusing for readers, and have adjusted Figure 1C to show HA#1, HA#2, etc.

      For nsEM, do the authors have 2D classes and even 3D reconstructions? Line 148-149: maybe or just because there are more HAs.

      We did not obtain 2D classes or 3D reconstructions of these BOAS. However, we do agree with the reviewer that the collapsed/rosette structure of the 8mer BOAS may be a consequence of the additional HA heads as well as the flexible Gly-Ser linkers between the components. We have added clarity to our statement in the discussion to read:

      lines 154-156:

      “This is likely a consequence of the flexible GSS linker separating the individual HA head components as well as the addition of significantly more HA head components to the construct.”

      Line 153 " interface-directed" - what does this mean?

      We apologize for any confusion; we intend for “interface-directed” to refer to antibodies that engage the trimer interface (TI) epitope between HA protomers. We have adjusted the manuscript to use the same terminology throughout, i.e. trimer interface or its abbreviation, TI.

      For Figure 2 F - do you have a negative control? Usually one does not determine an ELISA KD, it is not very accurate but shows binding in terms of OD value.

      We did include a negative control, MEDI8852, a stem-directed antibody, though it was not shown in the figure because we observed no binding, as expected. This negative control antibody was also used in Figure 5E for characterizing the BOAS NPs, and also shows no binding. We recognize that in an ELISA the KD is an equilibrium measurement and we do not report kinetic measurements as determined by a method such as bio-layer interferometry (BLI), and have thus adjusted the figure caption to denote the values as “apparent KD values”.

      Line 169 - reads strangely, "BOAS-elicited serum, regardless of its length, reacted". The length is the one of the immunogen, not the serum.

      We agree that this statement is unclear, and we have modified the sentence to read:

      lines 177-178:

      “Each of the BOAS, regardless of its length, elicited binding titers to all matched full-length HAs representing individual components (Figure 3D).”

      What is the adjuvant used (add in results)?

      We used Sigma adjuvant for all immunizations, and have included this information in the results section:

      lines 169-171:

      “To determine immunogenicity of each BOAS, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A).”

      This information is also included in the methods section in lines 406-412.

      Line 178 - remove " across"

      We have removed the word “across” in this sentence and replaced it with “on” (line 194).

      Trimer- interface, and interface epitopes are used exchangeably - maybe keep it as trimer interface to be more precise

      As stated above, we have adjusted the manuscript to use the same term throughout, i.e., trimer interface or its abbreviation, TI.

      Line 221 - no figure 6H (6G?)

      We apologize for this typo and have corrected it to Figure 6G (line 231).

      Reviewer #3 (Recommendations For The Authors):

      (1) Since 20 ug x3 doses is quite a high amount of vaccine, differences between immunogens may become blurred. Thus, it may be informative to compare post-prime serology for all immunogens or select immunogens to compare to the post-3rd dose data.

      We agree with the reviewer that this is on the upper end of vaccine dose and thus we explored the serum responses after a single boost. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across days d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which bolsters our claim that the presentation of the HA heads is important for eliciting strong serum responses to all components. We have included this data in Supplemental Figure 5, and have acknowledged this in the text:

      lines 185-187:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      (2) Significance statistics for all immunogenicity data should be added and discussed; it is particularly absent in Figures 3D and 7.

      We have added statistical analyses to Figure 3 and Figure 7 to reflect changes in immunogenicity. We have also added the following to the methods section:

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays were determined using either a Mann-Whitney test or a Kruskal-Wallis test with Dunn’s post-hoc test in Prism (GraphPad Prism v10.2.3) to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”

      (3) Figure 2F: the figure has K03.12 listed for the H3-specific mAb and in the main text, but the caption says 3E5 - is the 3E5 in the caption a typo? 3E5 is listed for the competition ELISAs as an RBS mAb, but its binding site is distal to the RBS at residues 165-170 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787348/), H7.167 binds in the RBS periphery and not directly within the RBS, and the epitope for P2-D9 is undetermined/not presented. This could mean that there is actually a higher proportion of RBS-directed antibodies than what is determined from this serum competition data. Also, reference to these as 'RBS-directed' in the serum competition methods section should be revised for accuracy.

      We sincerely apologize for this error and the resulting confusion. 3E5 in the caption is incorrect and should be K03.12 (https://www.rcsb.org/structure/5W08), which does engage the receptor binding site. We also apologize for the oversight that H7.167 is in the RBS periphery and not directly in the RBS. The additional P2-D9 in the panel of RBS-directed antibodies was also in error, as we do not believe it is RBS-directed, but it is indeed H4-specific. We also included a reference to the paper and immunogen that elicited this antibody. We agree that this indicates that there could be a higher proportion of RBS-directed antibodies in the serum and have modified the text in the results and methods sections to read:

      lines 300-306:

      “Notably, this proportion is approximate, as at the time of reporting, antibodies that bind the receptor binding site of all components were not available. RBS-directed antibodies to the H4 and H9 component were not available, and the RBS-directed antibodies used targeting the other HA components have different footprints around the periphery of the RBS. Additionally, there are currently no reported influenza B TI-directed antibodies in the literature. Therefore, this may be an underestimate of the serum proportion focused to the conserved RBS and TI epitopes.”

      lines 435-439:

      “Following blocking with BSA in PBS-T, blocking solution was discarded and 40µL of either DPBS (no competition control), a cocktail of humanized antibodies targeting the RBS and periphery (5J8, 2G1, K03.12, H5.3, H7.167, H1209), a cocktail of humanized TI-directed antibodies (S5V2-29, D1 H1-17/H3-14, D2 H1-1/H3-1), or a negative control antibody (MEDI8852) were added at a concentration of 100µg/mL per antibody.”

      (4) Only nsEM data is shown for the 3-BOAS and 8-BOAS, where differences in morphology were seen between these longer and shorter proteins. Including nsEM images for all BOAS immunogens may show trends in morphology or organization that could correlate with immune responses, e.g. if the 5-BOAS also forms a higher proportion of rosette-like structures, while the 4-BOAS is still a mix between extended and rosette-like, this could be a factor in the better immune responses seen for 5-BOAS.

      We appreciate the reviewer’s suggestion for further analysis of morphology between the intermediate BOAS sizes. We agree that the relationship between BOAS length and morphology should be explored more in depth, and we intend to do so in future studies and to also vary linker length and rigidity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Rossi et al. asked whether gait adaptation is solely a matter of slow perceptual realignment or if it also involves fast/flexible stimulus-response mapping mechanisms. To test this, they conducted a series of split-belt treadmill experiments with ramped perturbations, revealing behavior indicative of a flexible, automatic stimulus-response mapping mechanism.

      Strengths:

      (1) The study includes a perceptual test of leg speed, which correlates with the perceptual realignment component of motor aftereffects. This indicates that there are motor performances that are not accounted for by perceptual re-alignment.

      (2) The study incorporates qualitatively distinct, hypothesis-driven models of adaptation and proposes a new framework that integrates these various mechanisms.

      Weaknesses:

      (1) The study could benefit from considering other alternative models. As the authors noted in their discussion, while the descriptive models explain some patterns of behaviour/aftereffects, they don't currently account for how these mechanisms influence the initial learning process itself.

      (1a) For example, the pattern of gait asymmetry might differ for perceptual realignment (a smooth, gradual process), structural learning (more erratic, involving hypothesis testing/reasoning to understand the perturbation, see (Tsay et al. 2024) for a recent review on Reasoning), and stimulus-response mapping (possibly through a reinforcement-based trial-and-error approach). If not formally doing a model comparison, the manuscript might benefit from clearly laying out the behavioural predictions for how these different processes shape initial learning.

      (1b) Related to the above, the authors noted that the absence of difference during initial learning suggests that the differences in Experiment 2 in the ramp-up phase are driven by two distinct processes: structural learning and memory-based processes. If the assumptions about initial learning are not clear, the logic of this conclusion is hard to follow.

      Thank you for this insightful comment. We agree that considering alternative models and clarifying their potential contributions to the initial learning process would enhance the manuscript. We performed additional analyses and revised the text to outline how the mechanisms of adaptation in our study align with the framework described by Tsay et al. (2024) regarding the initial learning process and other features of adaptation.

      First, we referenced the Tsay et al. framework in the Introduction and Discussion to highlight parallels between their description of implicit adaptation and our forward model recalibration mechanism (producing motor changes and perceptual realignment). Specifically, the features defining recalibration in our study – gradual, trial-by-trial adjustments, rigid learning that leads to aftereffects, and limited contribution to generalization – align with those described by Tsay et al.

      Second, we used the description provided by Tsay et al. to test for the presence of explicit strategies in our study. We specifically tested the criteria of reportability and intentionality, corroborating the finding that our stimulus-response mapping mechanism differs from explicit strategies.

      “A recent framework for motor learning by Tsay et al. defines explicit strategies as motor plans that are both intentional and reportable (Tsay et al., 2024). Within this framework, Tsay et al. clarify that "intentional" means participants deliberately perform the motor plan, while "reportable" means they are able to clearly articulate it.” (Experiment 2 Results, lines 515-518).

      “…the motor adjustments reported by participants consistently fail to meet the criteria for explicit strategies as outlined by Tsay et al.: reportability and intentionality (Tsay et al., 2024).” (Discussion, lines 657-660).

      Third, we interpreted the operation of stimulus-response mapping within the Tsay theoretical framework for the three stages of motor learning: 1) “reasoning” to acquire new action–outcome relationships, 2) “refinement” of the motor action parameters, and 3) “retrieval” of learnt motor actions based on contextual cues. We note that the definition of these stages closely aligns with our definition for stimulus response mapping mechanisms. Moreover, according to Tsay’s definition, both implicit and explicit learning mechanisms can involve similar reasoning and retrieval processes. This shared operational basis may explain why our stimulus-response mapping mechanism exhibits some characteristics associated with explicit strategies, such as flexibility and generalizability.

      We performed a new analysis to evaluate a prediction of Tsay’s framework: if walking adaptation includes a stimulus-response mapping mechanism following these three stages of motor learning, the learning process should initially be erratic and then stabilize as learning progresses. We assessed within-participant residual variance in step length asymmetry around a double exponential model fit during adaptation, testing the prediction that this variability would decrease between the start and end of adaptation. Experiment 1 results confirmed this prediction, showing a significant reduction in variability as adaptation progressed.

      “We finally tested whether the pattern of motor variability during adaptation aligns with predictions for learning new stimulus response maps. In contrast to recalibration, mapping mechanisms are predicted to be highly variable and erratic during early learning, and stabilize as learning progresses (Tsay et al., 2024). Consistent with these predictions, the step length asymmetry residual variance (around a double exponential fit) decreased significantly between the start and end of adaptation (residual variance at start minus end of adaptation = 0.005 [0.004, 0.007], mean [CI]; SI Appendix, Fig. S3). These control analyses corroborate the hypothesis that the “no aftereffects” region of the Ramp Down reflects the operation of a mapping mechanism.”

      (Experiment 1 Results, lines 187-194; Methods, lines 1040-1050).
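
      The variability analysis described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the fitting routine (a grid search over two time constants with a least-squares solve for the amplitudes and plateau), the 30-stride windows, the candidate time constants, and all function names are assumptions introduced here, and the response does not specify how residual variance was windowed.

```python
import math
import random
from statistics import pvariance


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_double_exponential(y, fast_taus, slow_taus):
    """Grid-search both time constants; solve the two amplitudes and the
    plateau by linear least squares. Returns the best-fitting curve."""
    strides = range(len(y))
    best_sse, best_fit = math.inf, None
    for tau_fast in fast_taus:
        for tau_slow in slow_taus:
            basis = [[math.exp(-t / tau_fast) for t in strides],
                     [math.exp(-t / tau_slow) for t in strides],
                     [1.0] * len(y)]
            # Normal equations (X'X) coef = X'y for the three linear terms.
            gram = [[sum(u * v for u, v in zip(bi, bj)) for bj in basis]
                    for bi in basis]
            rhs = [sum(u * v for u, v in zip(bi, y)) for bi in basis]
            coef = solve_linear(gram, rhs)
            fit = [sum(c * col[t] for c, col in zip(coef, basis))
                   for t in strides]
            sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
            if sse < best_sse:
                best_sse, best_fit = sse, fit
    return best_fit


def residual_variance_drop(y, fit, window):
    """Residual variance in the first window minus that in the last window."""
    residuals = [yi - fi for yi, fi in zip(y, fit)]
    return pvariance(residuals[:window]) - pvariance(residuals[-window:])


# Synthetic step length asymmetry: double exponential plus noise that shrinks
# over strides (all values hypothetical, chosen only for illustration).
random.seed(1)
n_strides = 600
asymmetry = [
    -0.30 * math.exp(-t / 10) - 0.10 * math.exp(-t / 150)
    + random.gauss(0.0, 0.02 + 0.06 * math.exp(-t / 100))
    for t in range(n_strides)
]
fitted = fit_double_exponential(asymmetry, (5, 10, 20, 40), (80, 150, 300, 600))
drop = residual_variance_drop(asymmetry, fitted, window=30)
print(f"residual variance, start minus end: {drop:.4f}")
```

      A positive value means that strides scatter more widely around the fitted curve early in adaptation than late, which is the pattern the mapping hypothesis predicts.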

      Moreover, Experiment 2 results demonstrated that the pattern of variability (its magnitude and decay in adaptation) did not differ between participants using memory-based versus structure-based stimulus-response mapping mechanisms. These findings suggest that both types of mapping operate according to Tsay’s stages of motor learning.

      “Furthermore, the pattern of step length asymmetry variability was similar between the subgroups (structure – memory difference in residual variance relative to double exponential during initial adaptation = -0.0052 [0.0161, 0.0044], adaptation plateau = -0.0007 [-0.0021, 0.0003], difference in variance decay = -0.0045 [-0.0155, 0.0052], mean [CI]; SI Appendix, Fig. S16). This confirms that the distinct performance clusters in the Ramp Up & Down task are not driven by natural variations in learning ability, such as differences in learning speed or variability. Rather, these findings indicate that the subgroups employ different types of mapping mechanisms, which perform similarly during initial learning but differ fundamentally in how they encode, retrieve, and generalize relationships between perturbations and Δ motor outputs.” (Experiment 2 Results, lines 503-511).

      “Both memory- and structure-based operations of mapping align with Tsay et al.’s framework for motor learning: first, action–outcome relationships are learned through exploration; second, motor control policies are refined to optimize rewards or costs, such as reducing error; and finally, learned mappings or policies are retrieved based on contextual cues (Tsay et al., 2024). Consistent with the proposed stages of exploration followed by refinement, we found that motor behavior during adaptation was initially erratic but became less variable at later stages of learning. Similarly, consistent with the retrieval stage, the generalization observed in the ramp tasks indicates that learned motor outputs are flexibly retrieved based on belt speed cues.” (Discussion, lines 701-708).

      Finally, we addressed the prediction outlined by Tsay et al. that repeated exposure to perturbations attenuates the magnitude of forward model recalibration, with savings being driven by stimulus-response mapping mechanisms. While we could not directly test savings for the primary perturbation used during adaptation, we were able to indirectly evaluate savings for a different perturbation through analyses of our control experiments combined with previous results from Leech et al. (Leech et al., 2018). Specifically, we examined how motor aftereffects and perceptual realignment evolved across repeated iterations of the speed-matching task post-adaptation in Ascending groups. Each task began with the right leg stationary and the left leg moving at 0.5 m/s – a configuration corresponding to a perturbation of -0.5 m/s, which is opposite in direction to the adaptation perturbation. By analyzing repeated exposures to this -0.5 m/s perturbation across iterations, we gained insights into the learning dynamics associated with this perturbation and the effect of repeated exposures on motor aftereffects and perceptual realignment. Consistent with predictions from Tsay et al., our results combined with Leech et al. demonstrate that, with repeated exposures to the same perturbation, perceptual realignment decays while the contribution of stimulus-response mapping to aftereffect savings is enhanced. We present this analysis and interpretation in Control Experiments Results, lines 429-442; Figure 8B; Table S7; and Discussion lines 709-753.

      (1c) The authors could also test a variant of the dual-rate state-space model with two perceptual realignment processes where the constraints on retention and learning rate are relaxed. This model would be a stronger test for two perceptual re-alignment processes: one that is flexible and another that is rigid, without mandating that one be fast learning and fast forgetting, and the other be slow learning and slow forgetting.

      We tested multiple variants of the suggested models and confirmed that they cannot capture the motor behavior observed in our Ramp Down task. We include Author response image 1 with the model fits, Author response table 1 with the BIC statistics, and the model equations below. Only the recalibration + mapping model captures the matching-then-divergent behavior of the Δ motor output, corroborating our interpretation that state-space-based models cannot capture the mapping mechanism (see Discussion, “Implications for models of adaptation”). Furthermore, all models fit the data significantly worse than the recalibration+mapping model according to the BIC statistic.

      Model fits:

      Author response image 1.

      Statistical results:

      Author response table 1.

      Model definitions:

      • DualStateRelaxed: same equations as the original Dual State, but no constraints dictating the relative relationship between the parameters

      • DualStateRelaxedV2: same equations as the original Dual State, but no constraints dictating the relative relationship between the parameters, and “loose” parameter bounds (parameters can take values between -10 and 10).

      • PremoOriginalRelaxed: PReMo with two states (see below), no constraints dictating the relative relationship between the parameters

      • PremoOriginalRelaxedV2: PReMo with two states (see below), no constraints dictating the relative relationship between the parameters, and “loose” parameter bounds (parameters can take values between -10 and 10).
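
      For readers less familiar with this model family, the relaxed two-state variants listed above can be sketched as follows. The sketch assumes the standard two-state state-space form (each state retains a fraction of itself and learns from the shared error) and the common least-squares form of BIC; the authors' exact equations and BIC computation are in their Methods and are not reproduced in this response. The perturbation schedule, parameter values, and synthetic "data" are hypothetical.

```python
import math


def simulate_two_state(perturbation, a1, b1, a2, b2):
    """Two-state model: each state retains a fraction a of itself and
    learns a fraction b of the current error; the output is their sum."""
    x1 = x2 = 0.0
    output = []
    for p in perturbation:
        x = x1 + x2
        output.append(x)
        error = p - x
        x1 = a1 * x1 + b1 * error
        x2 = a2 * x2 + b2 * error
    return output


def has_fast_slow_ordering(a1, b1, a2, b2):
    """Classic dual-rate constraint: state 1 learns faster and forgets faster.
    The relaxed variants drop this requirement."""
    return b1 > b2 and a1 < a2


def bic(observed, predicted, n_params):
    """Least-squares BIC assuming Gaussian residuals (lower is better)."""
    n = len(observed)
    sse = sum((o - m) ** 2 for o, m in zip(observed, predicted))
    return n * math.log(sse / n) + n_params * math.log(n)


# Hypothetical schedule: constant perturbation, then a linear ramp to zero.
perturbation = [1.0] * 300 + [1.0 - i / 60 for i in range(1, 61)]

constrained = (0.92, 0.20, 0.996, 0.02)  # fast state first, then slow state
relaxed = (0.99, 0.05, 0.90, 0.01)       # violates the fast/slow ordering

# Synthetic "data": the constrained model plus a small deterministic wobble,
# used only to show how two candidate fits are compared.
clean = simulate_two_state(perturbation, *constrained)
data = [x + 0.01 * math.sin(i) for i, x in enumerate(clean)]

bic_constrained = bic(data, simulate_two_state(perturbation, *constrained), 4)
bic_relaxed = bic(data, simulate_two_state(perturbation, *relaxed), 4)
print(has_fast_slow_ordering(*constrained), has_fast_slow_ordering(*relaxed))
print(bic_constrained < bic_relaxed)
```

      In an actual fit, the parameters would be optimized against each participant's data within the stated bounds, and the BIC values would then be compared across models; the comparison printed here concerns synthetic data only.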

      PReMo with two states – the remaining equations are the same as the original PReMo (see Methods):

      (2) The authors claim that stimulus-response mapping operates outside of explicit/deliberate control. While this could be true, the survey questions may have limitations that could be more clearly acknowledged.

      (2a) Specifically, asking participants at the end of the experiments to recall their strategies may suffer from memory biases (e.g., participants may be biased by recent events, and forget about the explicit strategies early in the experiment) and be susceptible to the framing of the questions (e.g., participants not being sure what the experimenter is asking and how to verbalize their own strategy). Moreover, it is not clear what category of explicit strategies one might enact here, which dictates what might be considered "relevant" and "accurate".

      (2b) The concept of perceptual realignment also suggests that participants are somewhat aware of the treadmill's changing conditions; therefore, as a thought experiment, if the authors had asked participants throughout/during the experiment whether they are trying different strategies, would they predict that some behaviour is under deliberate control?

      We have expanded the discussion to explicitly acknowledge that our testing methodology for assessing explicit strategies may have limitations, recognizing the factors mentioned by the reviewer. Moreover, as mentioned in response to comment (1), we leveraged the framework from Tsay et al., 2024 and its definition of explicit strategies to ensure a robust and consistent approach in interpreting the survey responses.

      We revised the Experiment 2 Results section, lines 515-518, to specify that we are evaluating the presence of explicit strategies according to the criteria of intentionality and reportability:

      “A recent framework for motor learning by Tsay et al. defines explicit strategies as motor plans that are both intentional and reportable (Tsay et al., 2024). Within this framework, Tsay et al. clarify that "intentional" means participants deliberately perform the motor plan, while "reportable" means they are able to clearly articulate it.”

      We then reorganized the Discussion to include a separate section “Mapping operates independently of explicit control”, lines 646-661, where we discuss limitations of the survey methodology and interpretation of the results according to Tsay et al., 2024:

      “Here, we show that explicit strategies are not systematically used to adapt step length asymmetry and Δ motor output: the participants in our study either did not know what they did, reported changes that did not actually occur or would not lead to symmetry. Only one person reported “leaning” on the left (slow) leg for as much time as possible, which is a relevant but incomplete description for how to walk with symmetry. Four reports mentioned pressure or weight, which may indirectly influence symmetry (Hirata et al., 2019; Lauzière et al., 2014), but they were vague and conflicting (e.g., “making heavy steps on the right foot” or “put more weight on my left foot”). All other responses were null, explicitly wrong or irrelevant, or overly generic, like wanting to “stay upright” and “not fall down”. We acknowledge that our testing methodology has limitations. First, it may introduce biases related to memory recall or framing of the questionnaire. Second, while it focuses on participants' intentional use of explicit strategies to control walking, it does not rule out the possibility of passive awareness of motor adjustments or treadmill configurations. Despite these limitations, the motor adjustments reported by participants consistently fail to meet the criteria for explicit strategies as outlined by Tsay et al.: reportability and intentionality (Tsay et al., 2024). Together with existing literature, this supports the interpretation that stimulus response mapping operates automatically.”

      We also made the following addition to the “Limitations” section of the Discussion (lines 917-919):

      “While mapping differs from explicit strategies as they are currently defined, we still lack a comprehensive framework to capture the varying levels and nuanced characteristics of intentionality and awareness of different mechanisms (Tsay et al., 2024).”

      We finally note that “Unlike explicit strategies, which are rapidly acquired and diminish over time, this mapping mechanism exhibits prolonged learning beyond 15 minutes, with a rate comparable to recalibration” (Discussion, lines 632-634).

      (3) The distinction between structural and memory-based differences in the two subgroups was based on the notion that memory-based strategies increase asymmetry. However, an alternative explanation could be that unfamiliar perturbations, due to the ramping up, trigger a surprise signal that leads to greater asymmetry due to reactive corrections to prevent one's fall - not because participants are generalizing from previously learned representations (e.g., (Iturralde & Torres-Oviedo, 2019)).

      We agree that reactive corrections could contribute to the walking pattern in response to split-belt perturbations, as detailed by Iturralde & Torres-Oviedo, 2019. We also acknowledge that reactive corrections are rapid, flexible, feedback-driven, and automatic – characteristics that make them appear similar to stimulus-response mapping. However, a detailed evaluation of our results suggests that the behaviors observed in the ramp tasks cannot be fully explained by reactive corrections. Reactive corrections occur almost immediately, quickly adjusting the walking pattern to reduce error and improve stability. This excludes the possibility that what we identified as stimulus-response mapping could instead be reactive corrections, because the stimulus-response mapping observed in our study is acquired slowly at a rate comparable to recalibration. It also excludes the possibility that the increased asymmetry in the Ramp Up & Down could be due to reactive corrections, because these would operate alongside mapping to help reduce asymmetry rather than exacerbate it.

      We made substantial revisions to the Discussion and included the section “Stimulus-response mapping is flexible but requires learning” to explain this interpretation (lines 595-622):

      “The mapping mechanism observed in our study aligns with the corrective responses described by Iturralde and Torres-Oviedo, which operate relative to a recalibrated "new normal" rather than relying solely on environmental cues (Iturralde and Torres-Oviedo, 2019). Accordingly, our findings suggest a tandem architecture: forward model recalibration adjusts the nervous system's "normal state," while stimulus-response mapping computes motor adjustments relative to this "new normal." This architecture explains the sharp transition from flexible to rigid motor adjustments observed in our Ramp Down task. The transition occurs at the configuration perceived as "equal speeds" (~0.5 m/s speed difference) because this corresponds to the recalibrated “new normal”.

      In the first half of the Ramp Down, participants adequately modulated their walking pattern to accommodate the gradually diminishing perturbation, achieving symmetric step lengths. Due to the recalibrated “new normal”, perturbations within this range are perceived as congruent with the direction of adaptation but reduced in magnitude. This allows the mapping mechanism to flexibly modulate the walking pattern by using motor adjustments previously learned during adaptation. Importantly, the rapid duration of the Ramp Down task rules out the possibility that the observed modulation may instead reflect washout, as confirmed by the fact that the aftereffects measured post-Ramp-Down were comparable to previous work (Kambic et al., 2023; Reisman et al., 2005).

      In the second half of the Ramp Down, aftereffects emerged as participants failed to accommodate perturbations smaller than the recalibrated “new normal”. These perturbations were perceived as opposite to the adaptation perturbation and, therefore, novel. Accordingly, the mapping mechanism responded as it would to a newly introduced perturbation, rather than leveraging previously learned adjustments (Iturralde and Torres-Oviedo, 2019). Due to the rapid nature of the Ramp Down, the mapping mechanism lacked sufficient time to learn the novel motor adjustments required for these perturbations – a process that typically takes several minutes, as shown by our baseline ramp tasks and control experiments. As mapping-related learning was negligible, the rigid recalibration adjustments dominated during this phase. Consequently, the walking pattern did not change to accommodate the gradually diminishing perturbation, leading to the emergence of aftereffects.”

      (4) Further contextualization: Recognizing the differences in dependent variables (reaching position vs. leg speed/symmetry in walking), could the Proprioceptive/Perceptual Re-alignment model also apply to gait adaptation (Tsay et al., 2022; Zhang et al., 2024)? Recent reaching studies show a similar link between perception and action during motor adaptation (Tsay et al., 2021) and have proposed a model aligning with the authors' correlations between perception and action. The core signal driving implicit adaptation is the discrepancy between perceived and desired limb position, integrating forward model predictions with proprioceptive/visual feedback.

      We appreciate the reviewer’s suggestion and agree that the Proprioceptive Re-alignment model (PReMo) and the Perceptual Error Adaptation model (PEA) offer valuable insights into the relationship between perception and motor adaptation. To explore whether these frameworks apply to gait adaptation, we conducted an extensive modeling analysis. This is shown in Figure 5 and Supplementary Figures S7-S8, and is detailed in the text of Experiment 1 Results section “Modelling analysis for perceptual realignment” (lines 327–375), Methods section “Proprioceptive re-alignment model (PReMo)” (lines 1181-1221), Methods section “Perceptual Error Adaptation model (PEA)” (lines 1222-1247), Methods section “Perceptuomotor recalibration + mapping (PM-ReMap)” (lines 1248-1286), and SI Appendix section “Evaluation and development of perceptual models.” (lines 99-237).

      First, we evaluated how PReMo and PEA models fitted our Ramp Down data. We translated the original variables to walking adaptation variables using a conceptual equivalence explained by one of the features explored by Tsay et al. (2022). Specifically, the manuscript provides guidance on extending the PReMo model from visuomotor adaptation in response to visual-proprioceptive discrepancies, to force-field adaptation in response to mechanical perturbations – which share conceptual similarities with split-belt treadmill perturbations. The manuscript also discusses that, if vision is removed, the proprioceptive shift decays back to zero according to a decay parameter. This description entails that proprioceptive shift cannot increase or develop in the absence of vision. We applied the models to split-belt adaptation in accordance with this information, as described in the SI Appendix: “PReMo variables equivalents for walking adaptation”. As reported in Experiment 1 Results “Modelling analysis for perceptual realignment” (lines 327–375) and Figure 5, neither PReMo nor PEA adequately captured the key features of our Ramp Down data: “The models could not capture the matching-then-divergent behavior of Δ motor output, performing significantly worse than the recalibration + mapping model (PReMo minus recalibration+mapping BIC difference = 24.591 [16.483, 32.037], PEA minus recalibration+mapping BIC difference = 6.834 [1.779, 12.130], mean [CI]). Furthermore, they could not capture the perceptual realignment and instead predicted that the right leg would feel faster than the left throughout the entire Ramp Down”.

      Second, we used simulations to confirm that PReMo and PEA cannot account for the perceptual realignment observed in our study, and to understand why. At adaptation plateau, PReMo predicts that perceived and actual step length asymmetry converge, as shown in Fig. S7A, top, and as detailed in the SI Appendix “Original PReMo simulations”. We found that this is because PReMo assumes that perceptual realignment arises specifically from mismatches between different sensory modalities. This assumption works for paradigms that introduce an actual mismatch between sensory modalities, such as visuomotor adaptation paradigms with a mismatch between vision and proprioception. This assumption also works for paradigms that indirectly introduce a mismatch between integrated sensory information from different sensory modalities. In force-field adaptation, both proprioceptive and visual inputs are present and realistic, but when these inputs are integrated with sensory predictions, the resulting integrated visual estimate is mismatched compared to the integrated proprioceptive estimate. In contrast, the assumption that perceptual realignment arises from mismatches between sensory modalities does not work for paradigms that involve a single sensory modality. Split-belt adaptation only involves proprioception as no visual feedback is given, and perceptual realignment arises from discrepancies between predicted and actual motor outcomes, rather than between integrated sensory modalities.

      To overcome this limitation, we reinterpreted the variables of the PReMo model, while keeping the original equations, to account for realignment driven by mismatches of the same nature as the perturbation driving adaptation. As reported in the SI Appendix “Iterative simulations for the development of PM-ReMap”, the simulation (Fig. S7A, middle row) “showed perceptual realignment at adaptation plateau, addressing a limitation of the original model. However, it failed to account for the Ramp Down perceptual results, inaccurately predicting that belt speeds feel equal when they are actually equal (Fig. S7A, middle row, perceived perturbation decays alongside actual perturbation and converge to zero at the end of the Ramp Down). […] This occurs because, under the retained PReMo equations, β<sub>p</sub> and β<sub>v</sub> change immediately and are proportional to the difference between and on each trial, so that they ramp down to zero in parallel with the perturbation”.

      We also noted that simulations of the original and reinterpreted PReMo models could not support the operation of the mapping mechanism observed in the Ramp Down (Fig. S7B). We describe that “This occurs because the overall motor output x<sub>p</sub>, which includes both recalibration and mapping mechanisms, changes gradually according to the learning rate 𝐾. Consequently, changes in 𝐺 take many trials to be fully reflected in x<sub>p</sub>. Hence, we found complementary limitations where PReMo assumes perceptual realignment changes immediately while mapping adjustments develop gradually – but the opposite is true in our data”.

      We therefore modified the PReMo equations and developed a new model, called perceptuomotor recalibration + mapping (PM-ReMap) that addresses these limitations and is able to capture our Ramp Down motor and perceptual results. As described in the SI Appendix “Iterative simulations for the development of PM-ReMap”, “we introduced an update equation for β<sub>p</sub> so that it changes gradually trial-by-trial according to the learning rate 𝐾. We then removed the learning rate from the update equation for x<sub>p</sub> so that it integrates two distinct types of changes: 1) the gradual changes in driven by β<sub>p</sub> and representing the recalibration mechanism, and 2) the immediate changes in 𝐺 – representing the mapping mechanism”. The final equations of the PM-ReMap model are as follows:

      As reported in Experiment 1 Results, “Modelling analysis for perceptual realignment”, and as shown in Fig. 5C, “the PM-ReMap model captured the Δ motor output in the Ramp Down with performance comparable to that of the recalibration + mapping model (BIC difference = 2.381 [-0.739, 5.147], mean [CI]). It also captured perceptual realignment, predicting that some intermediate belt speed difference in the Ramp Down is perceived as “equal speeds” (, Fig. 5C)”. We also found that the estimated aligned with the empirical measurement of the PSE in the Ramp Down both at group and individual level: “At group level, was comparable to the upper bound of compensation<sub>perceptual</sub> (difference = -7 [-15, 1]%, mean [CI]), but significantly larger than the lower bound (difference = 19 [8, 31]%, mean [CI]). Furthermore, we found a significant correlation between individual participants’ and their upper bound of compensation<sub>perceptual</sub> (r=0.63, p=0.003), but not their lower bound (r=0.30, p=0.203). Both sets of results are consistent with those observed for the recalibration + mapping model”.

      Based on these findings, we summarize that PM-ReMap “extends the recalibration + mapping model by incorporating the ability to account for forgetting – typical of state space models – while still effectively capturing both recalibration and mapping mechanisms. However, performance of the PM-ReMap model does not exceed that of the simpler recalibration + mapping model, suggesting that forgetting and unlearning do not have a substantial impact on the Ramp Down”.

      Reviewer #2 (Public review):

      Recent findings in the field of motor learning have pointed to the combined action of multiple mechanisms that potentially contribute to changes in motor output during adaptation. A nearly ubiquitous motor learning process occurs via the trial-by-trial compensation of motor errors, often attributed to cerebellar-dependent updating. This error-based learning process is slow and largely unconscious. Additional learning processes that are rapid (e.g., explicit strategy-based compensation) have been described in discrete movements like goal-directed reaching adaptation. However, the role of rapid motor updating during continuous movements such as walking has been either under-explored or inconsistent with those found during the adaptation of discrete movements. Indeed, previous results have largely discounted the role of explicit strategy-based mechanisms for locomotor learning. In the current manuscript, Rossi et al. provide convincing evidence for a previously unknown rapid updating mechanism for locomotor adaptation. Unlike the now well-studied explicit strategies employed during reaching movements, the authors demonstrate that this stimulus-response mapping process is largely unconscious. The authors show that in approximately half of subjects, the mapping process appears to be memory-based while the remainder of subjects appear to perform structural learning of the task design. The participants that learned using a structural approach had the capability to rapidly generalize to previously unexplored regions of the perturbation space.

      One result that will likely be particularly important to the field of motor learning is the authors' quite convincing correlation between the magnitude of proprioceptive recalibration and the magnitude of error-based updating. This result beautifully parallels results in other motor learning tasks and appears to provide a robust marker for the magnitude of the mapping process (by means of subtracting off the contribution of error-based motor learning). This is a fascinating result with implications for the motor learning field well beyond the current study.

      A major strength of this manuscript is the large sample size across experiments and the extent of replication performed by the authors in multiple control experiments.

      Finally, I commend the authors on extending their original observations via Experiment 2. While it seems that participants use a range of mapping mechanisms (or indeed a combination of multiple mapping mechanisms), future experiments may be able to tease apart why some subjects use memory versus structural mapping. A future ability to push subjects to learn structurally-based mapping rules has the potential to inform rehabilitation strategies.

      Overall, the manuscript is well written, the results are clear, and the data and analyses are convincing. The manuscript's weaknesses are minor, mostly related to the presentation of the results and modeling.

      Weaknesses:

      The overall weaknesses in the manuscript are minor and can likely be addressed with textual changes.

      (1) A key aspect of the experimental design is the speed of the "ramp down" following the adaptation period. If the ramp-down is too slow, then no after-effects would be expected even in the alternative recalibration-only/error-based-only hypothesis. How did the authors determine the appropriate rate of ramp-down? Do alternative choices of ramp-down rates result in step length asymmetry measures that are consistent with the mapping hypothesis?

      We thank the reviewer for their insightful comment regarding the rate of the Ramp Down following the adaptation period and its potential impact on aftereffects under different hypotheses. We added a detailed explanation for how we determined the Ramp Down design, including analyses of previous work, to the SI Appendix, “Ramp Down design”, lines 22-98. We also describe the primary points in the main Methods section, “Ramp Tasks”, lines 978-991:

      As described in SI Appendix, “Ramp Down design”, the Ramp Down task was specifically designed to measure the pattern of aftereffects in a way that ensured reliable and robust measurements with sufficient resolution across speeds, and that minimized washout to prevent confounding the results. To balance time constraints with a measurement resolution adequate for capturing perceptual realignment, we used 0.05 m/s speed decrements, matching the perceptual sensitivity estimated from our re-analysis of the baseline data from Leech et al. (Leech et al., 2018a). To obtain robust motor aftereffect measurements, we collected three strides at each speed condition, as averaging over three strides represents the minimum standard for consistent and reliable aftereffect estimates in split-belt adaptation (typically used in catch trials) (Leech et al., 2018a; Rossi et al., 2019; Vazquez et al., 2015). To minimize unwanted washout by forgetting and/or unlearning, we did not pause the treadmill between adaptation and the post-adaptation ramp tasks, and ensured the Ramp Down was relatively quick, lasting approximately 80 seconds on average. Of note, the Ramp Down design ensures that even in cases of partial forgetting, the emergence pattern of aftereffects remains consistent with the underlying hypotheses.
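
      The design arithmetic in this paragraph can be checked with a short sketch. The 0.05 m/s decrement and the three strides per speed come from the response above; the 1.0 m/s starting speed difference, the end point of 0 m/s, and the 1.2 s stride duration are assumptions made only for illustration, and the function name is hypothetical.

```python
def ramp_down_schedule(start_diff_cm_s, step_cm_s=5, strides_per_speed=3):
    """Return the belt-speed difference (m/s) for every stride of a ramp down.

    Speeds are handled as integer cm/s so that repeated 0.05 m/s
    decrements do not accumulate floating-point error.
    """
    if start_diff_cm_s % step_cm_s != 0:
        raise ValueError("start difference must be a multiple of the step")
    schedule = []
    # Step from the starting difference down to 0 (tied belts), inclusive.
    for diff in range(start_diff_cm_s, -1, -step_cm_s):
        schedule.extend([diff / 100] * strides_per_speed)
    return schedule


# Assumed inputs, not taken from the manuscript:
# a 1.0 m/s starting difference and roughly 1.2 s per stride.
schedule = ramp_down_schedule(start_diff_cm_s=100)
n_strides = len(schedule)
approx_duration_s = n_strides * 1.2
print(n_strides, round(approx_duration_s, 1))
```

      Under these assumptions the ramp spans 21 speed conditions and 63 strides, which is in the same range as the reported average duration of approximately 80 seconds.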

      In the SI Appendix, we explain that, while we did not test longer ramp-down durations directly, previous data suggest that durations of up to at least 4.5 minutes would yield step length asymmetry measures consistent with our results and the mapping hypothesis. Additionally, our control experiments replicated the behavior observed in the Ramp Down using speed match tasks lasting only 30 seconds, further supporting the robustness of our findings across varying durations.

      (2) Overall, the modeling as presented in Figure 3 (Equation 1-3) is a bit convoluted. To my mind, it would be far more useful if the authors reworked Equations 1-3 and Figure 3 (with potential changes to Figure 2) so that the motor output (u) is related to the stride rather than the magnitude of the perturbation. There should be an equation relating the forward model recalibration (i.e., Equation 1) to the fraction of the motor error on a given stride, something akin to u(k+1) = r * (u(k) - p(k)). This formulation is easier to understand and commonplace in other motor learning tasks (and likely what the authors actually fit given the Smith & Shadmehr citation and the derivations in the Supplemental Materials). Such a change would require that Figure 3's independent axes be changed to "stride," but this has the benefit of complementing the presentation that is already in Figure 5.

      We reworked these equations (now numbered 4-6, lines 207-209) so that the motor output u is related to stride k as suggested by the reviewer:

      We changed Figure 2 and Figure 3 accordingly, adding a “stride” x-axis to the Ramp Down data figure.
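
      For readers unfamiliar with stride-indexed formulations, the following is a minimal sketch of a generic single-state, error-based update of the kind commonly used in the reaching-adaptation literature. It is not the authors' Equations 4-6, which are not reproduced in this response; the retention and learning-rate values and all names are hypothetical.

```python
def simulate_error_based_learning(perturbation, retention=0.99, learning_rate=0.05):
    """Simulate x[k+1] = a * x[k] + b * (p[k] - x[k]) stride by stride.

    x is the learned compensation, p the perturbation on stride k,
    a the retention factor, and b the learning rate (both illustrative).
    Returns the compensation in effect on each stride.
    """
    state = 0.0
    outputs = []
    for p in perturbation:
        outputs.append(state)          # compensation used on this stride
        error = p - state              # residual error experienced
        state = retention * state + learning_rate * error
    return outputs


# Hypothetical schedule: 300 strides of a unit perturbation, then 50 strides without it.
perturbation = [1.0] * 300 + [0.0] * 50
motor_output = simulate_error_based_learning(perturbation)
```

      In this sketch the compensation rises gradually during the perturbation and persists when the perturbation is removed abruptly, which is the rigid, slowly changing behavior attributed to recalibration in the response.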

      Reviewer #2 (Recommendations for the authors):

      I think that some changes to the text/ordering could improve the manuscript's readability. In particular:

      (1) My feeling is that many of the equations presented in the Methods section should be moved to the Results section, particularly Equations 9-11. The introduction of these motor measures should likely precede Figure 1, as their definitions form the crux of Figure 1 and the subsequent analyses.

      (2) It is unclear to me why many of the analyses and discussion points have been relegated to Supplemental Material. I would significantly revise the manuscript to move much of the content from Supplemental Material to the Methods and Discussion (where appropriate). Even the Todorov and Herzfeld models can likely simply be referenced in the text without a need for their full description in the Supplemental material - as their implementations appear to this reviewer as consistent with those presented in the respective papers. Beyond the Supplementary Tables, my feeling is that nearly all of the content in Supplemental can either be simply cited (e.g. alternative model implementations) or directly incorporated into the main manuscript without compromising the readability of the manuscript.

      We reorganized the manuscript and SI Appendix substantially, moving content to the Results or other main text sections. The changes included those recommended by the reviewer:

      • We moved the equations describing step length asymmetry, perturbation, and Δ motor output (originally numbered Eq. 9-11) to the Results section (Experiment 1, “Motor paradigm and hypothesis”, lines 131-133, now numbered Eq. 1-3).

      • We moved Supplementary Methods to the main Methods section.

      • We moved the most relevant content of the Supplementary Discussion to the main Discussion, and removed the less relevant content altogether.

      • We moved the methods describing walking-adaptation specific implementation of the Todorov and Herzfeld models to the main Methods section and removed the portions that were identical to the original implementation.

      • We moved the control experiments to the main text (main Results and Methods sections).

      • We removed the SI Appendix section “Experiment 1 mechanisms characteristics”.

      Reviewer #3 (Public review):

      Summary:

      In this work, Rossi et al. use a novel split-belt treadmill learning task to reveal distinct sub-components of gait adaptation. The task involved following a standard adaptation phase with a "ramp-down" phase that helped them dissociate implicit recalibration and more deliberate SR map learning. Combined with modeling and re-analysis of previous studies, the authors show multiple lines of evidence that both processes run simultaneously, with implicit learning saturating based on intrinsic learning constraints and SR learning showing sensitivity to a "perceptual" error. These results offer a parallel with work in reaching adaptation showing both explicit and implicit processes contributing to behavior; however, in the case of gait adaptation the deliberate learning component does not appear to be strategic but is instead a more implicit SR learning process.

      Strengths:

      (1) The task design is very clever and the "ramp down" phase offers a novel way to attempt to dissociate competing models of multiple processes in gait adaptation.

      (2) The analyses are thorough, as is the re-analysis of multiple previous data sets.

      (3) The querying of perception of the different relative belt speeds is a very nice addition, allowing the authors to connect different learning components with error perception.

      (4) The conceptual framework is compelling, highlighting parallels with work in reaching but also emphasizing differences, especially w/r/t SR learning versus strategic behaviors. Thus the discovery of an SR learning process in gait adaptation would be both novel and also help conjoin different siloed subfields of motor learning research.

      Weaknesses:

      (1) The behavior in the ramp-down phase does indeed appear to support multiple learning processes. However, I may have missed something, but I have a fundamental worry about the specific modeling and framing of the "SR" learning process. If I correctly understand, the SR process learns by adjusting to perceived L/R belt speed differences (Figure 7). What is bugging me is why that process would not cause the SR system to still learn something in the later parts of the ramp-down phase when the perceived speed differences flip (Figure 4). I do believe this "blunted learning" is what the SR component is actually modeled with, given this quote in the caption to Figure 7: "When the perturbation is perceived to be opposite than adaptation, even if it is not, mapping is zero and the Δ motor output is constant, reflecting recalibration adjustments only." It seems a priori odd and perhaps a little arbitrary to me that a SR learning system would just stop working (go to zero) just because the perception flipped sign. Or for that matter "generalize" to a ramp-up (i.e., just learn a new SR mapping just like the system did at the beginning of the first perturbation). What am I missing that justifies this key assumption? Or is the model doing something else? (if so that should be more clearly described).

      We concur that this point was confusing, and we performed additional analyses and revised the text to improve clarity. Specifically, we clarify that the stimulus-response mapping does indeed still learn in the second portion of the Ramp Down, when the perceived speed differences flip. However, learning by the mapping mechanism proceeds slowly – at a rate comparable to that of forward model recalibration, taking several minutes. The duration of the task is relatively short, so that learning by the mapping mechanism is limited. We schematize the learning to be zero as an approximation. We have now included an additional modelling analysis (as part of our expanded perceptual modelling analyses), which shows there is no significant improvement in modelling performance when accounting for forgetting of recalibration or learning in the opposite direction by mapping in the second half of the ramp down, supporting this approximation. We explain this and other revisions in detail below.

      We include a Discussion section “Stimulus-response mapping is flexible but requires learning” where we improve our explanation of the operation of the mapping mechanism in the Ramp Down by leveraging the framework proposed by Iturralde and Torres-Oviedo, 2019. The section first explains that mapping operates relative to a new equilibrium corresponding to the current forward model calibration (lines 595-603):

      “The mapping mechanism observed in our study aligns with the corrective responses described by Iturralde and Torres-Oviedo, which operate relative to a recalibrated "new normal" rather than relying solely on environmental cues (Iturralde and Torres-Oviedo, 2019). Accordingly, our findings suggest a tandem architecture: forward model recalibration adjusts the nervous system's "normal state," while stimulus-response mapping computes motor adjustments relative to this "new normal." This architecture explains the sharp transition from flexible to rigid motor adjustments observed in our Ramp Down task. The transition occurs at the configuration perceived as "equal speeds" (~0.5 m/s speed difference) because this corresponds to the recalibrated “new normal”.”

      The following paragraph (lines 604-611) explains how this concept is reflected in the first half of the Ramp Down:

      “In the first half of the Ramp Down, participants adequately modulated their walking pattern to accommodate the gradually diminishing perturbation, achieving symmetric step lengths. Due to the recalibrated “new normal”, perturbations within this range are perceived as congruent with the direction of adaptation but reduced in magnitude. This allows the mapping mechanism to flexibly modulate the walking pattern by using motor adjustments previously learned during adaptation. Importantly, the rapid duration of the Ramp Down task rules out the possibility that the observed modulation may instead reflect washout, as confirmed by the fact the aftereffects measured post-Ramp-Down were comparable to previous work (Kambic et al., 2023; Reisman et al., 2005).”

      The last paragraph (lines 612–622) explains the second half of the Ramp Down in light of the equilibrium concept and of the slow learning rate of mapping:

      “In the second half of the Ramp Down, aftereffects emerged as participants failed to accommodate perturbations smaller than the recalibrated “new normal”. These perturbations were perceived as opposite to the adaptation perturbation and, therefore, novel. Accordingly, the mapping mechanism responded as it would to a newly introduced perturbation, rather than leveraging previously learned adjustments (Iturralde and Torres-Oviedo, 2019). Due to the rapid nature of the Ramp Down, the mapping mechanism lacked sufficient time to learn the novel motor adjustments required for these perturbations – a process that typically takes several minutes, as shown by our baseline ramp tasks and control experiments. As mapping-related learning was negligible, the rigid recalibration adjustments dominated during this phase. Consequently, the walking pattern did not change to accommodate the gradually diminishing perturbation, leading to the emergence of aftereffects.”

      We also revised the Discussion section “Mapping operates as memory-based in some people, structure-based in others”, to clarify the processes of interpolation and extrapolation (lines 689-700). This revision helps explain why mapping may generalize to a ramp-up faster than it learns a perturbation perceived in the opposite direction (when considered together with the explanation that mapping operates relative to the new recalibrated equilibrium). In the former case (generalizing to a ramp-up), a structure-based mapping can use the extrapolation computation: it leverages previous knowledge of which gait parameters should be modified and how – e.g., modulating the positioning of the right foot to be farther forward on the treadmill – but must extrapolate the specific parameter values – e.g., how much farther forward. In the latter case (learning a perturbation perceived in the opposite direction), even a structure-based mapping would need to work out completely anew which gait parameters to change – e.g., modulating the positioning of the foot in the opposite way, to be less forward, requires a different set of control policies.

      We mentioned above that this illustration of the mapping mechanism relies on the assumption that the additional learning of the mapping mechanism in the second half of the Ramp Down is negligible. As part of our revisions for the “Modelling analysis for perceptual realignment”, we developed a new model – the perceptuomotor recalibration + mapping model (PM-ReMap) – that extends the recalibration + mapping model by accounting for the possibility that Δ motor output is not constant in the second half of the Ramp Down (main points are at lines 355-275, and Figure 5; see response to Reviewer #1 (Public review), Comment 4, for a detailed explanation). We find that performance of the PM-ReMap model does not exceed that of the simpler recalibration + mapping model, suggesting that the Δ motor output does not change substantially in the second half of the Ramp Down. Note that, if the Δ motor output decayed in this phase, it could be due to forgetting or unlearning of the recalibration mechanism, or to the mapping mechanism learning in the opposite direction than it did in adaptation. In the Results section, we focused on describing recalibration forgetting/unlearning for simplicity. However, in the Discussion section “Mapping may underly savings upon re-exposure to the same or different perturbation”, we explain in detail how the motor aftereffects also depend on the mapping mechanism learning in the opposite direction, as corroborated by our Control experiments and previous work. Therefore, the finding that the PM-ReMap model performance does not exceed that of the simpler recalibration + mapping model suggests that neither effect – recalibration forgetting/unlearning or opposite-direction learning of mapping – is significant, nor is their combined effect on the Δ motor output.

      (2) A more minor point, but given the sample size it is hard to be convinced about the individual difference analysis for structure learning (Figure 5). How clear is it that these two groups of subjects are fully separable and not on a continuum? The lack of clusters in another data set seems like a somewhat less than convincing control here.

      We performed an additional analysis – a silhouette analysis – to confirm the presence of these clusters in our data (Methods, lines 1070-1072). The results, reported in Experiment 2 Results, lines 487-490, confirmed that there is strong evidence for the presence of these clusters:

      “A silhouette analysis confirmed strong evidence for these clusters: the average silhouette score was 0.90, with 19 of 20 participants scoring above 0.7 – considered strong evidence – and one scoring between 0.5 and 0.7 – considered reasonable evidence (Dalmaijer et al., 2022; Kaufman and Rousseeuw, 1990; Rousseeuw, 1987).”
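
      As an illustration of how such scores are obtained, the sketch below computes per-point silhouette values for one-dimensional data and applies the thresholds quoted above (above 0.7 strong, 0.5 to 0.7 reasonable). The data values, cluster labels, and function names are invented for the example and are not the study's data; the authors' actual implementation is not described in this response.

```python
def silhouette_scores(values, labels):
    """Per-point silhouette scores for 1-D data using absolute distance.

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster;
    the score is (b - a) / max(a, b). Requires at least two clusters.
    """
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        same = [abs(v - w) for j, (w, l) in enumerate(zip(values, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)
        b = min(
            sum(abs(v - w) for w, l in zip(values, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def evidence_label(score):
    """Map a silhouette score to the evidence categories quoted in the text."""
    if score > 0.7:
        return "strong"
    if score >= 0.5:
        return "reasonable"
    return "weak"


# Invented example: two well-separated groups on a single behavioral index.
values = [0.02, 0.05, 0.03, 0.04, 0.95, 1.00, 0.98, 0.97]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = silhouette_scores(values, labels)
mean_score = sum(scores) / len(scores)
categories = [evidence_label(s) for s in scores]
```

      Scores near 1 indicate that a participant sits much closer to their own cluster than to the other, which is the pattern expected for separable groups rather than a continuum.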

      Reviewer #3 (Recommendations for the authors):

      (1) I think there is far too much content pushed into the supplement. The other models and full model comparison should be in the main text, as should the re-analysis of previous data sets. Also, key discussion points should not be in the supplement either.

      We reorganized the manuscript and SI Appendix substantially, including the changes recommended by the reviewer. Please refer to our response to “Reviewer #2 - Recommendations for the authors” for a detailed explanation.

      (2) Line 649: in reaching the calibration system does respond to different error sizes; why not here?

      We apologize for the confusion. Similar to reaching adaptation, the recalibration in walking adaptation also scales based on the error size experienced in adaptation. What we meant to convey is that, once a calibration has been acquired in adaptation, the recalibration process is rigid in that it can only change gradually. So if we jump the perturbation to a different value, the original calibration is transiently used until the system has the time to recalibrate again. For example, if we jump abruptly from the adaptation perturbation to a perturbation of zero in post-adaptation, the adaptation calibration persists, resulting in aftereffects.

      We revised the manuscript to clarify these points. First, we explicitly report that forward model recalibration scales based on the error size experienced in adaptation:

      “We next compared Medium Descend and Small Abrupt (1m/s or 0.4m/s perturbation), and found that recalibration contributed significantly more for the smaller perturbation (larger compensation<sub>perceptual</sub> / compensation<sub>motor-total</sub> in Small Abrupt than Medium Descend, Fig. 8A middle and Table S6).” (Control experiments Results, lines 422-425)

      “the mapping described here shares some characteristics with explicit mechanisms, such as flexibility and modulation by error size” (Discussion, lines 630-631)

      Additionally, we leverage the framework proposed by Tsay et al., 2024, to improve our explanation of the characteristics of the different learning mechanisms. Please refer to our response to “Reviewer #1 (Public review)”, Comment (1).

      (3) It would be nice to see bar graphs showing model comparison results for each individual subject in the main text, and to see how many subjects are best fit by the SR+calibration model.

      We added the recommended bar graphs to Figure 3 and Figure 5.

      (4) Why exactly does the "perturbation" in Figure 3 have error bars?

      In walking adaptation, the perturbation that participants experience is closely, but not exactly, dictated by the treadmill belt speeds: participants are free to move their feet as they like, so their ankle movement may not always match the treadmill belts exactly. Therefore, we use markers to record the perturbation actually experienced by each participant’s feet, and we display the mean and standard error of this perturbation.

      We moved the equation describing the perturbation measure from the Methods to the Experiment 1 Results (lines 131-133, Eq. 1-3). We believe this change will help the reader understand the measures depicted.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. demonstrate that CD4<sup>+</sup> single positive (SP) thymocytes, CD4<sup>+</sup> recent thymic emigrants (RTE), and CD4<sup>+</sup> T naive (Tn) cells from Cd11c-p28-flox mice, which lack IL-27p28 selectively in Cd11c+ cells, exhibit a hyper-Th1 phenotype instead of the expected hyper-Th2 phenotype. Using IL-27R-deficient mice, the authors confirm that this hyper-Th1 phenotype is due to IL-27 signaling via IL-27R, rather than the effects of monomeric IL-27p28. They also crossed Cd11c-p28-flox mice with autoimmune-prone Aire-deficient mice and showed that both T cell responses and tissue pathology are enhanced, suggesting that SP, RTE, and Tn cells from Cd11c-p28-flox mice are poised to become Th1 cells in response to self-antigens. Regarding mechanism, the authors demonstrate that SP, RTE, and Tn cells from Cd11c-p28-flox mice have reduced DNA methylation at the IFN-g and Tbx21 loci, indicating 'de-repression', along with enhanced histone tri-methylation at H3K4, indicating a 'permissive' transcriptional state. They also find evidence for enhanced STAT1 activity, which is relevant given the well-established role of STAT1 in promoting Th1 responses, and surprising given that IL-27 is a potent STAT1 activator. This latter finding suggests that the Th1-inhibiting property of thymic IL-27 may not be due to direct effects on the T cells themselves.

      Strengths:

      Overall the data presented are high quality and the manuscript is well-reasoned and composed. The basic finding - that thymic IL-27 production limits the Th1 potential of SP, RTE, and Tn cells - is both unexpected and well described.

      Weaknesses:

      A credible mechanistic explanation, cellular or molecular, is lacking. The authors convincingly affirm the hyper-Th1 phenotype at epigenetic level but it remains unclear whether the observed changes reflect the capacity of IL-27 to directly elicit epigenetic remodeling in developing thymocytes or knock-on effects from other cell types which, in turn, elicit the epigenetic changes (presumably via cytokines). The authors propose that increased STAT1 activity is a driving force for the epigenetic changes and resultant hyper-Th1 phenotype. That conclusion is logical given the data at hand but the alternative hypothesis - that the hyper-STAT1 response is just a downstream consequence of the hyper-Th1 phenotype - remains equally likely. Thus, while the discovery of a new anti-inflammatory function for IL-27 within the thymus is compelling, further mechanistic studies are needed to advance the finding beyond phenomenology.

      Thank you for your insightful comments and suggestions. We appreciate your feedback and have carefully considered the concerns raised regarding the mechanistic explanation of our findings. To address the issue of whether developing thymocytes are the direct targets of IL-27, we plan to conduct further studies using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras consisting of wildtype and IL-27ra knockout cells. This approach will help us determine if IL-27 directly induces epigenetic remodeling in thymocytes or if the observed effects are secondary to influences from other cell types.

      Regarding the potential autocrine loop contributing to STAT1 hyperactivation, we have performed preliminary experiments by adding IFN-γ antibody to CD4<sup>+</sup> T cell cultures and observed no significant impact on STAT1 phosphorylation. If necessary, we will further investigate this possibility in vivo using Cd4-Ifng and CD11c-p28 double knockout mice.

      The detailed mechanisms underlying STAT1 hyperactivation remain to be elucidated. Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. Structural analyses have also demonstrated that IL-27p28 interacts with EBI3 and the two receptor subunits IL-27Rα and gp130. Given these findings and the similar phenotypes observed in p28 and IL-27ra deficient mice, we speculate that the deficiency of either p28 or IL-27ra may increase the availability of gp130 for signaling by other cytokines. We will focus our future research on gp130-related cytokines to identify potential candidates that could lead to enhanced STAT1 activation in the absence of p28. Alternatively, the release of EBI3 in p28-deficient conditions may promote its interaction with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest given the involvement of IL-27Rα in its signaling pathway.

      To narrow down the candidate cytokines, we reanalyzed single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z). Our analysis revealed that thymic dendritic cells (DCs) were categorized into two distinct clusters, with both Il12a (p35, which forms IL-35 with EBI3) and Clcf1 (CLCF1) being upregulated in CD11c-cre p28<sup>f/f</sup> mice. In CD4 single-positive (SP) thymocytes, the expression levels of gp130 and IL-12Rβ2 (the receptor for IL-35) were comparable between knockout and wild-type mice. However, the mRNA levels of Lifr and Cntfr were low in CD4 SP thymocytes.

      Author response image 1.

      Single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> (KO) and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z).

      We plan to assess the protein levels of IL-35 and CLCF1 in dendritic cells, as well as their respective receptors, to evaluate their effects on STAT1 phosphorylation in CD4<sup>+</sup> thymocytes from both wild-type and p28-deficient mice. Unfortunately, we have encountered challenges with the mouse breeding and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      Reviewer #2 (Public Review):

      Summary:

      Naïve CD4 T cells in CD11c-Cre p28-floxed mice express highly elevated levels of proinflammatory IFNg and the transcription factor T-bet. This phenotype turned out to be imposed by thymic dendritic cells (DCs) during CD4SP T cell development in the thymus [PMID: 23175475]. The current study affirms these observations, first, by developmentally mapping the IFNg dysregulation to newly generated thymic CD4SP cells [PMID: 23175475], second, by demonstrating increased STAT1 activation being associated with increased T-bet expression in CD11c-Cre p28-floxed CD4 T cells [PMID: 36109504], and lastly, by confirming IL-27 as the key cytokine in this process [PMID: 27469302]. The authors further demonstrate that such dysregulated cytokine expression is specific to the Th1 cytokine IFNg, without affecting the expression of the Th2 cytokine IL-4, thus proposing a role for thymic DC-derived p28 in shaping the cytokine response of newly generated CD4 helper T cells. Mechanistically, CD4SP cells of CD11c-Cre p28-floxed mice were found to display epigenetic changes in the Ifng and Tbx21 gene loci that were consistent with increased transcriptional activities of IFNg and T-bet mRNA expression. Moreover, in autoimmune Aire-deficiency settings, CD11c-Cre p28-floxed CD4 T cells still expressed significantly increased amounts of IFNg, exacerbating the autoimmune response and disease severity. Based on these results, the investigators propose a model where thymic DC-derived IL-27 is necessary to suppress IFNg expression by CD4SP cells and thus would impose a Th2-skewed predisposition of newly generated CD4 T cells in the thymus, potentially relevant in autoimmunity.

      Strengths:

      Experiments are well-designed and executed. The conclusions are convincing and supported by the experimental results.

      Weaknesses:

      The premise of the current study is confusing as it tries to use the CD11c-p28 floxed mouse model to explain the Th2-prone immune profile of newly generated CD4SP thymocytes. Instead, it would be more helpful to (1) give full credit to the original study which already described the proinflammatory IFNg+ phenotype of CD4 T cells in CD11c-p28 floxed mice to be mediated by thymic dendritic cells [PMID: 23175475], and then, (2) build on that to explain that this study is aimed to understand the molecular basis of the original finding.

      In its essence, this study mostly rediscovers and reaffirms previously reported findings, but with different tools. While the mapping of epigenetic changes in the IFNg and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, these are expected results, and they only reaffirm what would be assumed from the literature. Thus, there is only incremental gain in new insights and information on the role of DC-derived IL-27 in driving the Th1 phenotype of CD4SP cells in CD11c-p28 floxed mice.

      Thank you for your valuable comments and suggestions. We appreciate your input and have carefully reviewed the concerns raised regarding the premise and novelty of our study.

      Indeed, the current study is built upon the foundational work of Zhang et al. (PMID: 23175475), which first described the proinflammatory IFN-γ<sup>+</sup> phenotype of CD4 T cells in CD11c-p28 floxed mice mediated by thymic dendritic cells. We have cited this study multiple times in our manuscript to acknowledge its significance. Our goal was to expand on this original finding by exploring the functional bias of newly generated CD4<sup>+</sup> T cells, elucidating the mechanisms underlying the hyper-Th1 phenotype in the absence of thymic DC-derived IL-27, and exploring its relevance in the pathogenesis of autoimmunity.

      Our study revisits this phenomenon with a focus on the molecular and epigenetic changes that drive the Th1 bias in CD4SP cells. We demonstrated that the deletion of p28 in thymic dendritic cells leads to an unexpected hyperactivation of STAT1, which is associated with epigenetic modifications that favor Th1 differentiation. These findings provide a deeper understanding of the molecular basis behind the original observation of the Th1-skewed phenotype in CD11c-p28 floxed mice.

      However, as you pointed out, there is still a gap in understanding the precise link between p28 deficiency and STAT1 activation. We acknowledge that our study primarily reaffirms previously reported findings with different tools and approaches. While the mapping of epigenetic changes in the IFN-γ and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, they are indeed expected results based on the existing literature. This limits the novelty and incremental gain in new insights provided by our study.

      To address this gap and enhance the novelty of our findings, we plan to conduct further investigations to elucidate the detailed mechanisms connecting p28 deficiency to STAT1 hyperactivation. We will explore potential compensatory pathways or alternative signaling mechanisms that may contribute to the observed epigenetic changes and Th1 bias. Additionally, we will consider the broader impact of IL-27 deficiency on the thymic environment and its downstream effects on CD4<sup>+</sup> T cell differentiation.

      We appreciate your feedback and will work to strengthen the mechanistic underpinnings of our study. We believe that these additional efforts will provide a more comprehensive understanding of the role of DC-derived IL-27 in shaping the Th1 phenotype of CD4SP cells and contribute meaningful insights to the field.

      Altogether, the major issues of this study remain unresolved:

      (1) It is still unclear why the p28-deficiency in thymic dendritic cells would result in increased STAT1 activation in CD4SP cells. Based on their in vitro experiments with blocking anti-IFNg antibodies, the authors conclude that it is unlikely that the constitutive activation of STAT1 would be a secondary effect due to autocrine IFNg production by CD4SP cells. However, this possibility should be further tested with in vivo models, such as Ifng-deficient CD11c-p28 floxed mice. Alternatively, is this an indirect effect by other IFNg producers in the thymus, such as iNKT cells? It is necessary to explain what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells in the first place.

      Thank you for your insightful suggestions. We appreciate your feedback and are committed to addressing the critical questions raised regarding the mechanisms underlying STAT1 activation in CD4SP cells in the context of p28 deficiency in thymic dendritic cells.

      To further investigate the potential autocrine loop for IFN-γ production, we will conduct in vivo studies using Cd4-Ifng and CD11c-p28 double knockout mice. This model will allow us to directly test whether IFN-γ produced by CD4SP cells themselves contributes to the observed STAT1 activation. Additionally, this approach will help exclude the possibility of indirect effects from other IFN-γ-producing cells in the thymus, such as invariant natural killer T (iNKT) cells, as suggested by the reviewer.

      As you correctly pointed out, a key unanswered question is what drives the initial STAT1 activation in CD4SP cells of CD11c-p28 floxed mice. Our current hypothesis is that p28 deficiency enhances the responsiveness of developing thymocytes to STAT1-activating cytokines. This hypothesis is supported by several lines of evidence:

      (1) Functional Antagonism: Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. This suggests that in the absence of p28, the inhibitory effect of IL-27p28 on downstream signaling may be lost, leading to increased sensitivity to other cytokines that activate STAT1.

      (2) Structural Insights: Structural studies have demonstrated that IL-27p28 is centrally positioned within the complex formed with EBI3 and the two receptor subunits IL-27Rα and gp130. This positioning implies that p28 deficiency could disrupt the balance of cytokine signaling pathways involving these components.

      (3) Phenotypic Similarity: We have observed a similar hyper-Th1 phenotype in mice lacking either p28 or IL-27ra. This similarity suggests that the absence of p28 may lead to increased availability of gp130 for signaling by other cytokines, thereby enhancing STAT1 activation.

      Based on these considerations, we hypothesize that the deficiency of p28 results in a greater availability of gp130 to transduce signals from other cytokines, ultimately leading to enhanced STAT1 activation in CD4SP cells. To identify the specific cytokine(s) responsible for this effect, we will focus on gp130-related cytokines, as outlined in our response to Reviewer 1. This will involve reanalysis of single-cell RNA sequencing data and further experimental validation to pinpoint the candidate cytokines driving the observed STAT1 hyperactivation.

      We are confident that these additional studies will provide a clearer understanding of the mechanisms linking p28 deficiency in thymic dendritic cells to increased STAT1 activation in CD4SP cells. We appreciate your guidance and look forward to sharing our findings.

      (2) It is also unclear whether CD4SP cells are the direct targets of IL-27 p28. The cell-intrinsic effects of IL-27 p28 signaling in CD4SP cells should be assessed and demonstrated, ideally by CD4SP-specific deletion of IL-27Ra, or by establishing bone marrow chimeras of IL-27Ra germline KO mice.

      Thanks for the suggestions. Further studies will be performed to test whether developing thymocytes are the direct targets of IL-27 using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras of wildtype and IL-27ra knockout cells. Unfortunately, we have encountered challenges with the mouse breeding and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Is the hyper-STAT1 response seen in T cells from Cd11c-p28-flox mice due to increased availability and/or increased responsiveness to STAT1-activating cytokines? Studies in which SP, RTE, and Tn cells are pulsed ex vivo with IL-27 and/or other STAT1-activating cytokines would address the latter (with STAT1 phosphorylation as the major readout). Given the ability of IL-27 to activate STAT3, this pathway should also be addressed. It would be of interest if STAT1 signaling is selectively impaired, as suggested by the work of Twohig et al. (doi: 10.1038/s41590-019-0350-0), which should be cited and discussed.

      Thank you for your insightful suggestions. We appreciate your input and are committed to addressing the critical questions raised regarding the mechanisms underlying the hyper-activation of STAT1 in T cells from Cd11c-p28-flox mice.

      The detailed mechanisms driving the hyper-activation of STAT1 remain to be fully elucidated. Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. Structural analyses have also demonstrated that IL-27p28 interacts with EBI3 and the two receptor subunits IL-27Rα and gp130. Considering these findings and the similar phenotypes observed in p28 and IL-27ra deficient mice, we speculate that the deficiency of either p28 or IL-27ra may increase the availability of gp130 for signaling by other cytokines. This could potentially enhance the responsiveness of developing thymocytes to STAT1-activating cytokines. We will focus our future research on gp130-related cytokines to identify the candidate(s) responsible for the enhanced STAT1 activation in the absence of p28. Alternatively, the release of EBI3 in the absence of p28 may facilitate its coupling with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest given the involvement of IL-27Rα in its signaling pathway.

      To narrow down the candidate cytokines, we reanalyzed single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z). Our analysis revealed that thymic dendritic cells (DCs) were categorized into two distinct clusters, with both Il12a (p35, which forms IL-35 with EBI3) and Clcf1 (CLCF1) being upregulated in CD11c-cre p28<sup>f/f</sup> mice. In CD4 single-positive (SP) thymocytes, the expression levels of gp130 and IL-12Rβ2 (the receptor for IL-35) were comparable between knockout and wild-type mice. However, the mRNA levels of Lifr and Cntfr were low in CD4 SP thymocytes.

      Author response image 1. Single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> (KO) and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z).

      We have planned to assess the protein levels of IL-35 and CLCF1 in dendritic cells, as well as their respective receptors, to evaluate their effects on STAT1 phosphorylation in CD4<sup>+</sup> thymocytes from both wild-type and p28-deficient mice. Unfortunately, we have encountered challenges with the mouse crosses and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      As you correctly noted, the ability of IL-27 to activate STAT3 signaling is an important consideration. We examined this pathway in our current study, and our results indicate that neither total nor phosphorylated STAT3 or STAT4 was altered by IL-27p28 ablation (Figure 5B). This suggests that the impact is specific to the STAT1 axis. We will also consider the possibility of selective impairment of STAT1 signaling, as suggested by the work of Twohig et al. (doi: 10.1038/s41590-019-0350-0), which we will cite and discuss in our revised manuscript.

      We appreciate your guidance and will work diligently to address these questions in our future studies. We look forward to sharing our findings and contributing to a deeper understanding of the role of IL-27 in the regulation of STAT1 activation in T cells.

      (2) It may be that the hyper-Th1 phenotype is not due to cell-intrinsic differences in STAT1 signaling (see Major Point 1) but rather to hyper-responsiveness to TCR + co-stimulation (as provided in the re-stim assays used throughout). This issue is particularly relevant for the ChIP studies, where the authors note that "...we chose to treat the cells with anti-CD3 and anti-CD28 for 3 days prior to the assay". Why not treat these cells ex vivo with STAT1-activating cytokines instead of anti-CD3/CD28? The current methodology makes it impossible to distinguish between enhanced TCR/CD28 and cytokine signaling, and ultimately does not address SP, RTE, and Tn cells (since they are now activated blasts).

      Thank you for raising this important point. We appreciate your feedback and fully recognize the limitations of our current methodology, which uses anti-CD3/CD28 stimulation for ChIP experiments. This approach indeed complicates the distinction between enhanced TCR/CD28 signaling and cytokine-mediated STAT1 activation, particularly in the context of SP, RTE, and Tn cells, which become activated blasts under these conditions.

      To address these concerns and provide more precise insights into the mechanisms underlying the hyper-Th1 phenotype, we are revising our experimental strategy. Specifically, we are shifting our focus to directly investigate the role of STAT1-activating cytokines in the absence of p28. Based on our previous analysis and re-evaluation of single-cell RNA sequencing data, we have identified IL-35 and CLCF1 as the most promising candidate cytokines.

      We are now planning to perform ChIP experiments using these cytokines directly, rather than relying on TCR + co-stimulation. This approach will allow us to more accurately evaluate the impact of these cytokines on STAT1 signaling in CD4<sup>+</sup> T cells. By treating cells ex vivo with IL-35 and CLCF1, we aim to elucidate whether the observed hyper-Th1 phenotype is driven by enhanced responsiveness to these cytokines, independent of TCR/CD28 signaling.

      We regret to inform you that we have encountered unforeseen challenges with mouse crosses, which have delayed our progress. As a result, we anticipate a delay of approximately six months to obtain the appropriate genotypes necessary to complete these experiments. We understand the importance of these revisions and are committed to overcoming these challenges to provide a more robust and accurate analysis.

      (3) Studies involving STAT1-deficient mice are necessary (ideally with STAT1 deficiency restricted to the T cell compartment). At a minimum, it must be confirmed that these phenocopy Cd11c-p28-flox mice in terms of SP, RTE, and Tn cells (and their Th1-like character). If a similar hyper-Th1 phenotype is not seen, then the attendant hyper STAT1 response can only be viewed as a red herring.

      Thank you for raising this important consideration. We acknowledge the significance of addressing the role of STAT1 specifically within the T cell compartment to validate the mechanisms underlying the hyper-Th1 phenotype observed in Cd11c-p28-flox mice.

      We agree that studies involving STAT1-deficient mice, particularly with STAT1 deficiency restricted to the T cell compartment, are essential to confirm whether the hyper-Th1 phenotype is directly driven by STAT1 hyperactivation in T cells. Ideally, such studies would help determine if STAT1 deficiency in T cells phenocopies the Cd11c-p28-flox mice, particularly in terms of the SP, RTE, and Tn cells and their Th1-like characteristics.

      Unfortunately, we currently face challenges in obtaining and breeding the appropriate STAT1 conditional knockout mice with T cell-specific deletion. This has limited our ability to conduct these experiments in a timely manner. However, we recognize the importance of these studies and are actively working to secure the necessary resources and models to address this critical question.

      We understand that without these experiments, any conclusions drawn about the role of STAT1 hyperactivation in driving the hyper-Th1 phenotype must be considered with caution. If a similar hyper-Th1 phenotype is not observed in STAT1-deficient T cells, then the hyper-STAT1 response may indeed be a secondary or compensatory effect rather than a primary driver.

      We are committed to pursuing these studies and will prioritize them in our future work. We will keep you informed of our progress and will update the manuscript with the results of these experiments once completed. We appreciate your patience and understanding as we work to address this important aspect of our research.

      (4) The authors mine their RNA-seq data using a STAT1 geneset sourced from studies involving IL-21 as the upstream stimulus. Why was this geneset chosen? It is true that IL-21 can activate STAT1, but STAT3 is typically viewed as its principal signaling pathway. There are many more appropriate genesets, especially from studies where T cells are cultured with traditional STAT1 stimuli (e.g. IL-27 in Hirahara et al., Immunity 2015, or interferons in Iwata et al., Immunity 2017; doi: 10.1016/j.immuni.2015.04.014, 10.1016/j.immuni.2017.05.005).

      Thank you for your insightful comments. We appreciate your attention to the choice of the STAT1 gene set in our RNA-seq analysis.

      Initially, we selected the STAT1 gene set from a study involving IL-21 stimulation (GSE63204) because IL-21 is known to activate STAT1, despite STAT3 being its principal signaling pathway. However, we acknowledge that this choice may not have been optimal given the context of our study, which focuses on the role of IL-27 and its impact on STAT1 signaling in T cells.

      We agree that gene sets derived from studies using more canonical STAT1 stimuli, such as IL-27 or interferons, would be more relevant for our analysis. In response to your suggestion, we have revised our approach and adopted a gene set from GSE65621, which compares STAT1-/- and wild-type CD4 T cells following IL-27 stimulation. This gene set is more aligned with the focus of our study and provides a more appropriate reference for identifying STAT1-activated genes.

      Our re-analysis revealed that 270 genes (FPKM > 1, log2FC > 2) were downregulated in STAT1-/- cells compared to wild-type cells, which we defined as STAT1-activated genes. Notably, approximately 40% of the upregulated differentially expressed genes (55 out of 137) in our dataset fell into the category of STAT1-activated genes, while none were classified as STAT1-suppressed genes (Figure 4B). Furthermore, Gene Set Enrichment Analysis (GSEA) demonstrated significant enrichment of STAT1-activated genes in the transcriptome of CD4 SP thymocytes from the knockout mice (NES = 1.67, nominal p-value = 10<sup>-16</sup>, Figure 4D).
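      The filtering and overlap steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the tuple layout, and the toy gene entries are hypothetical, and the sketch assumes that log2FC is expressed as wild-type over STAT1-/- so that genes lost in the knockout score positive.

```python
def stat1_activated_genes(rows, min_fpkm=1.0, min_log2fc=2.0):
    """Return genes downregulated in STAT1-/- cells relative to wild type.

    rows: iterable of (gene, wt_fpkm, log2fc_wt_over_ko) tuples.
    The tuple layout is a hypothetical convenience, not a published format.
    """
    return {
        gene
        for gene, wt_fpkm, log2fc in rows
        if wt_fpkm > min_fpkm and log2fc > min_log2fc
    }


def overlap_fraction(upregulated, reference):
    """Fraction of upregulated DEGs that also belong to the reference gene set."""
    upregulated = set(upregulated)
    if not upregulated:
        raise ValueError("no upregulated genes supplied")
    return len(upregulated & set(reference)) / len(upregulated)


# Toy reference data (made-up values, for illustration only).
reference_rows = [
    ("GeneA", 12.0, 3.1),  # passes both thresholds
    ("GeneB", 0.4, 4.0),   # fails the FPKM threshold
    ("GeneC", 8.5, 1.2),   # fails the log2FC threshold
    ("GeneD", 5.0, 2.6),   # passes both thresholds
]
reference = stat1_activated_genes(reference_rows)

# Toy list of DEGs upregulated in the knockout (also made up).
toy_fraction = overlap_fraction(["GeneA", "GeneC", "GeneD", "GeneE"], reference)
print(sorted(reference), toy_fraction)
```

      Applied to the counts reported in the response, the same ratio is 55 / 137, or about 0.40.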

      These findings support our conclusion that IL-27p28 deficiency leads to enhanced STAT1 activity in CD4 SP thymocytes. We believe that using a more relevant gene set has strengthened our analysis and provided clearer insights into the molecular mechanisms underlying the observed phenotype.

      We have cited the relevant studies (Hirahara et al., Immunity 2015; Iwata et al., Immunity 2017) to provide context for our revised analysis and to acknowledge the importance of canonical STAT1 stimuli in T cell signaling. We appreciate your guidance and are confident that these revisions have improved the robustness and relevance of our findings.

      (5) Given the ability of IL-27 to activate STAT1 in T cells, it is surprising that SP, RTE, and Tn cells from Cd11c-p28-flox mice exhibit more STAT1 signaling than WT controls. If not IL-27, then what is the stimulus for this STAT1 activity? The authors rule out autocrine IFN-g production in vitro (not in vivo) but provide no further insight.

      Thank you for raising this important question. We appreciate your interest in understanding the source of enhanced STAT1 signaling in SP, RTE, and Tn cells from Cd11c-p28-flox mice, especially given the role of IL-27 in activating STAT1 in T cells. As previously discussed, we have identified IL-35 and CLCF1 as the most likely candidate cytokines driving the observed STAT1 activity in the absence of p28. These cytokines are of particular interest due to their potential to activate STAT1 and their relevance in the context of our study.

      To address the question of what drives the enhanced STAT1 signaling, we are planning to perform ChIP experiments using these cytokines directly. This approach will allow us to evaluate their impact on STAT1 signaling more precisely, without relying on TCR + co-stimulation. By treating cells ex vivo with IL-35 and CLCF1, we aim to determine whether these cytokines are responsible for the increased STAT1 activity observed in Cd11c-p28-flox mice.

      We acknowledge that ruling out autocrine IFN-γ production in vitro, as we have done, does not fully address the potential role of IFN-γ in vivo. Therefore, we are also considering additional in vivo experiments to further investigate this possibility. These studies will help us determine whether other sources of IFN-γ or other cytokines contribute to the observed STAT1 hyperactivation. Unfortunately, due to unforeseen challenges with mouse crosses, we anticipate a delay of approximately six months to obtain the appropriate genotypes necessary for these experiments. We are actively working to resolve these challenges and will update the manuscript with the results of these experiments upon completion.

      (6) The RNAseq data affirm that SP, RTE, and Tn cells from Cd11c-p28-flox mice exhibit more STAT1 signaling than WT controls. However, this does little to explain the attendant hyper-Th1 phenotype. Is there evidence that epigenetic machinery is deregulated (to account for changes in DNA/histone methylation)? Were IFN-g and Tbet among these few observed DEG? If so, then this should be highlighted. If not, then the authors must address why not. Are there clues as to why STAT1 signaling is exaggerated? Also, the hyper-STAT1 effect should be better described using more rigorous STAT1- and interferon-signature genesets (see the work of Virginia Pascual, Anne O'Garra).

      Thank you for your valuable feedback and suggestions. We appreciate your interest in understanding the mechanisms underlying the hyper-Th1 phenotype observed in Cd11c-p28-flox mice. Below, we address each of your points in detail:

      (1) Epigenetic Regulation:

      We have conducted a thorough analysis of the global levels of key histone modifications, including H3K4me3, H3K9me3, and H3K27me3, as well as the mRNA expression of the enzymes responsible for catalyzing these marks. Our results indicate that there are no significant differences in these histone modifications or the expression of the associated enzymes between Cd11c-p28<sup>f/f</sup> and wildtype mice (Figure 3-figure supplement 1A-C). This suggests that the enhanced STAT1 signaling is not a consequence of broad epigenetic deregulation. Instead, we hypothesize that the observed changes may be driven by more specific molecular mechanisms, such as cytokine signaling pathways.

      (2) IFN-γ and Tbx21 Expression:

      Regarding the expression of Th1-associated genes, our analysis revealed a modest induction of Ifng and Tbx21 (encoding T-bet) in the CD4SP population following TCR stimulation. However, the baseline expression levels of these genes were quite low in freshly isolated CD4SP cells. Specifically, Ifng was undetectable, and Tbx21 had an FPKM of 0.29 in wildtype mice compared to 1.05 in Cd11c-p28<sup>f/f</sup> mice. While these findings indicate some upregulation of Th1-associated genes, the overall expression levels remain relatively low, suggesting that additional factors may contribute to the hyper-Th1 phenotype.

      (3) STAT1 Signature Genesets:

      We have revised our analysis to incorporate more rigorous STAT1 and interferon-signature genesets, as suggested. We have adopted gene sets from well-established studies, including those by Virginia Pascual and Anne O'Garra, to provide a more comprehensive and accurate assessment of STAT1 signaling. This approach has enhanced our ability to identify and characterize the genes involved in the STAT1 pathway, providing clearer insights into the exaggerated STAT1 signaling observed in our model.

      We appreciate your guidance and are committed to refining our analysis to provide a more detailed understanding of the mechanisms driving the hyper-Th1 phenotype in Cd11c-p28-flox mice. We will continue to explore the potential roles of cytokines such as IL-35 and CLCF1, as well as other factors that may contribute to the observed changes in STAT1 signaling and Th1 differentiation. We look forward to sharing our updated findings and further discussing these mechanisms in our revised manuscript.

      (7) Is the hyper-Th1 phenotype of SP, RTE, and Tn cells from Cd11c-p28-flox mice unique to the CD4 compartment? Are developing CD8<sup>+</sup> cells similarly prone to increased STAT1 signaling and IFN-g production?

      Thank you for raising this important point. Our data indeed suggest that the hyper-Th1 phenotype observed in SP, RTE, and Tn cells from Cd11c-p28<sup>f/f</sup> mice is unique to the CD4<sup>+</sup> T cell compartment. Specifically, we found that while CD4<sup>+</sup> SP cells from Cd11c-p28<sup>f/f</sup> mice exhibited a significant upregulation in IL-27 receptor expression (both IL27Ra and gp130) compared to wild-type (WT) mice, CD8<sup>+</sup> SP cells from the same genotype showed markedly lower expression of these receptor subunits (Figure 1C in Sci Rep. 2016 Jul 29:6:30448. DOI: 10.1038/srep30448). This finding is further supported by our observation that the phosphorylation levels of STAT1, STAT3, and STAT4, downstream targets of IL-27 signaling, were comparable between CD8 SP cells from Cd11c-p28<sup>f/f</sup> and WT mice (Author response image 2). Additionally, we observed no significant difference in IFN-γ and granzyme B production between naïve CD8 T cells isolated from the lymph nodes of the two genotypes (Author response image 2). Taken together, these results suggest that the enhanced Th1 differentiation and IFN-γ production seen in the CD4<sup>+</sup> T cell population from Cd11c-p28<sup>f/f</sup> mice is not recapitulated in the CD8<sup>+</sup> T cell lineage.

      Author response image 2.

      (A) Intracellular staining was performed with freshly isolated thymocytes from Cd11c-p28<sup>f/f</sup> mice and WT littermates using antibodies against phosphorylated STAT1 (Y701), STAT3 (Y705), and STAT4 (Y693). The mean fluorescence intensity (MFI) for CD8 SP cells from three independent experiments is shown (mean ± SD, n=3). (B) CD8<sup>+</sup> naive T cells were cultured under Th0 conditions for 3 days. The frequency of IFN-γ- and granzyme B-producing CD8<sup>+</sup> T cells was determined by intracellular staining. Representative dot plots (left) and quantification (right, mean ± SD, n=6) are shown.

      Minor points and questions

      (1) Line 84 - Villarino et al. and Pflanz et al. are mis-referenced. Neither involves Trypanosome studies. The former is on Toxoplasma infection and, thus, should be properly referenced in the following sentence.

      Thank you for pointing out this error. You are correct that the references to Villarino et al. and Pflanz et al. were misapplied in the context of Trypanosome studies. Villarino et al. focuses on Toxoplasma infection, and we appreciate your guidance to ensure accurate citation. We will correct this in the manuscript and properly cite the studies in their appropriate contexts. Thank you for your vigilance in maintaining the accuracy of our references.

      (2) T-bet protein should also be measured by cytometry

      We sincerely thank the reviewer for the valuable suggestion regarding the measurement of T-bet protein levels. In response to this comment, we have performed additional experiments to quantify T-bet protein expression using flow cytometry. The results of these analyses have been incorporated into the revised manuscript as Figure 1F.

      Reviewer #2 (Recommendations For The Authors):

      (1) When new mouse strains are generated in this study, there is no comment on whether there are any changes in the frequency or cell number of CD4 T cells. For instance, in Aire-deficient CD11c-p28 floxed mice, it should be noted whether CD4SP, naïve CD4, and CD4 RTE are all the same in frequency and number compared to their littermate controls. Also, is there any effect on the generation of these thymocytes?

      We sincerely thank the reviewer for raising this important point regarding the potential changes in the frequency and cell numbers of CD4<sup>+</sup> T cells in the newly generated mouse strains. In response to the reviewer’s question, we would like to clarify the following:

      (1) Impact of Aire deficiency on CD4<sup>+</sup> T Cells:

      As previously reported by us and others (Aging Dis. 2019, doi: 10.14336/AD.2018.0608; Science. 2002, doi: 10.1126/science.1075958), Aire deficiency does not significantly alter the overall number or frequency of CD4 single-positive (CD4SP) thymocytes, recent thymic emigrants (RTEs), or naïve CD4<sup>+</sup> T cells. However, it profoundly affects their composition and functional properties, leading to the escape of autoreactive T cells and subsequent autoimmune manifestations.

      (2) Observations in Cd11c-p28<sup>f/f</sup>Aire<sup>-/-</sup> mice:

      In our study, we observed that the number and frequency of CD4<sup>+</sup> T cells in the spleen and lymph nodes were comparable among Cd11c-p28<sup>f/f</sup>, Aire<sup>-/-</sup>, and Cd11c-p28<sup>f/f</sup>Aire<sup>-/-</sup> mice, and WT controls. This suggests that the genetic modifications did not significantly impact the overall development or peripheral maintenance of CD4<sup>+</sup> T cells.

      Author response image 3.

      (3) Challenges in assessing RTEs in double knockout mice:

      To accurately assess RTEs in the double knockout mice, it would be necessary to cross these mice with Rag-GFP reporter mice, which specifically label RTEs. However, breeding the appropriate mouse strain for this analysis would require additional time and resources, which were beyond the scope of the current study.

      (2) There are a couple of typos throughout the manuscript. For example, line 91: IL-27Rα or line 313: phenotype.

      We apologize for the typographical errors. We have carefully reviewed the entire manuscript and corrected all identified mistakes, including those on line 91 (IL-27Rα) and line 305 (phenotype).

      (3) The authors should show each data point on their bar graphs.

      Thank you for the suggestion. We now show each data point on the bar graphs in the revised manuscript.

      (4) It should be noted from which organs the RTE and the naïve T cells were harvested.

      Thank you for the constructive suggestion. We isolated CD4<sup>+</sup> RTEs and mature naive CD4<sup>+</sup> T cells by sorting GFP<sup>+</sup>CD4<sup>+</sup>CD8<sup>-</sup>CD<sup>-</sup>NK1.1<sup>-</sup> cells (RTEs) and GFP<sup>-</sup>CD4<sup>+</sup>CD8<sup>-</sup>CD<sup>-</sup>CD44<sup>lo</sup> cells (naive T cells) from lymph nodes. This detail has been added to the manuscript on line 475.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data.

      In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph. This work provides a useful resource for studying nitrogenase evolution.

      However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

      We thank the reviewer for their thoughtful comments. We acknowledge that our current study is primarily focused on a computational exploration of the structural differences in both extant and ancestral nitrogenase variants, which allowed us to generate a comprehensive structural dataset. Although we did not carry out experimental reversion tests in this study, we agree that directly assessing the functional consequences of reverting the specific residues (lines 420 to 429) to their extant counterparts is an important next step to elucidate their functional role. Indeed, these findings provide a valuable foundation for our future work, which is designed to include experimental characterization of these variants and further elucidate the role of critical residues in nitrogenase activity and evolution. We believe that these experiments will offer the direct functional validation that the reviewer has rightly pointed out, and we look forward to reporting on these results in a future study.

      Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

      We thank the reviewer for this suggestion. Our original analysis (previously shown in Figure S9, now Figure S10) included structural alignment comparisons. In response, we have reorganized the results section (lines 351-355) to explicitly address this comparison.

      Reviewer #2 (Public review):

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in the environment, including oxygen levels and changes in metal availability. The study predicts > 5000 structures of nitrogenases, corresponding to extant, ancestral, and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive undertaking that is certain to be a resource for the community. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

      The challenge with this study is that all (or nearly all) of the quantitative analyses presented are based on RMSD calculations, many of which are under 2 angstroms. For all intents and purposes, two structures with RMSD < 2 angstroms could be considered 'structurally identical'. A lot of insight generated is based on minuscule differences in RMSD, for which it is not clear that they are significantly different. The suggestion would be to find a way to evaluate the RMSD metric and determine whether these values, as obtained for structures being compared, are reliable. Some options are provided in earlier studies: PMID: 11514933, PMID: 17218333, PMID: 11420449, PMID: 8289285 (and others). It could also be valuable to focus more on site-specific RMSDs rather than Global RMSDs. The high conservation in the nitrogenases likely ensures that the global RMSDs will remain low across the family. Focusing on specific regions might reveal interesting differences between clades that are more informative regarding the evolution of structure in tandem with environment/time.

      We thank the reviewer for their suggestions. We agree that while global RMSD values below 2 Å typically indicate high structural similarity, relying solely on these measures can mask subtle yet potentially functionally meaningful differences. Our aim was not to test for overall structural identity but rather to quantify fine-scale variations between highly conserved nitrogenase structures, including extant and ancestral variants. Nevertheless, in light of the reviewer’s suggestions, we have implemented an additional metric (rmsd<sub>100</sub>) for a more nuanced comparison. The results of our additional analyses (Figure S3) align closely with our original results (Figure 2), supporting our decision to retain the un-normalized results in the main text. As an additional measure, we also computed site-specific RMSDs for the active-site environments (Figure S6) to further delineate subtle structural variations.
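
      As an illustration of the normalization mentioned above, the following is a minimal sketch of a length-normalized RMSD. It assumes the commonly cited form RMSD / (1 + ln(sqrt(N / 100))), usually attributed to Carugo and Pongor (2001); the response does not spell out the exact definition that was used, so the formula and the function name `rmsd100` are assumptions made for illustration.

```python
import math


def rmsd100(rmsd: float, n_residues: int) -> float:
    """Rescale an RMSD (in angstroms) to a reference length of 100 residues.

    Assumed form: RMSD / (1 + ln(sqrt(N / 100))). This is an illustrative
    sketch, not necessarily the exact definition used in the study.
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    denominator = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if denominator <= 0:
        # The correction breaks down for very short alignments (N below ~14).
        raise ValueError("alignment too short for this normalization")
    return rmsd / denominator


if __name__ == "__main__":
    # A 100-residue alignment is left unchanged; longer alignments shrink.
    print(rmsd100(1.5, 100))
    print(rmsd100(1.5, 400))
```

      Under this assumed form, the same raw RMSD counts as a smaller deviation when it is obtained over a longer alignment, which is why comparing normalized and un-normalized results is a useful robustness check.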

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

      We want to thank the editors for their assessment of our work and highlighting its importance for the understanding of aperiodic neural activity. Additionally, we want to thank the three present and four former reviewers (at a different journal) whose comments and ideas were critical in shaping this manuscript to its current form. We hope that this paper opens up many more questions that will guide us - as a field - to an improved understanding of how “cortical” and “cardiac” changes in aperiodic activity are linked and want to invite readers to engage with our work through eLife’s comment function.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high-frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart variability measures, which, according to the authors, shows that ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA analysis to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.

      Strengths:

      Compared to previous investigations, the authors demonstrate the effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. Based on their findings, the authors give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the aim of the study is well-motivated and analyses rigorously conducted, the overall structure of the manuscript, as it stands now, is partially misleading. Some of the described results are not well-embedded and lack discussion.

      We want to thank the reviewer for their comments focussed on improving the overall structure of the manuscript. We agree that some results could be more clearly contextualized and have restructured the manuscript accordingly.

      Reviewer #2 (Public review):

      I previously reviewed this important and timely manuscript at a previous journal where, after two rounds of review, I recommended publication. Because eLife practices an open reviewing format, I will recapitulate some of my previous comments here, for the scientific record.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion this result practically warrants a separate paper of its own!

      The literature analysis is very clever, and expanded upon from any other prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses - including now measuring ECG when possible - because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how do I, as a relative novice, proceed? They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

      We want to thank the reviewer for agreeing to review our manuscript again and for recapitulating their previous comments and the progress the manuscript has made over the course of the last ~2 years. The reviewer's comments have been essential in shaping the manuscript into its current form. Their feedback has made the review process truly feel like a collaborative effort, focused on strengthening the manuscript and refining its conclusions and resulting recommendations.

      Reviewer #3 (Public review):

      Summary:

      Schmidt et al., aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al., used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      We want to thank the reviewer for their thorough engagement with our work and the many great ideas that resulted from it, mentioned both under Weaknesses and under Recommendations for the authors below. Their suggestions have sparked many ideas on how to better separate peripheral from neurophysiological signals, and these are likely to greatly influence our future attempts to extract both cardiac and muscle activity from M/EEG recordings. So we want to thank them for their input, time and effort!

      Weaknesses:

      Figure 4I: The regressions explained here seem to contain a very large number of potential predictors. Based on the way it is currently written, I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions?

      I'm not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including these latent contributions to the full signal back into the same regression model. It seems that there could be some circularity or redundancy in doing so. Can the authors provide a justification for why this is a valid approach?

      After observing significant effects in both the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands, we wanted to understand whether these age-related changes are statistically independent. To test this, we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models were therefore not very complex: they were built using only two predictors, namely the data (in a specific frequency range) averaged over the channels on which we noticed significant effects in the ECG rejected and ECG component data, respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section, which states: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”

      We hope this explanation solves the previous misunderstanding.
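
      To make the structure of the model age ~ 1 + ECG rejected + ECG components concrete, here is a minimal sketch of a two-predictor regression. It uses ordinary least squares solved through the normal equations, which is a simplification chosen for illustration and not the Bayesian robust regression described elsewhere in this response. The function name `fit_two_predictor_model` and the toy values are hypothetical.

```python
def fit_two_predictor_model(y, x1, x2):
    """Ordinary least-squares fit of y ~ 1 + x1 + x2.

    Returns [intercept, beta_x1, beta_x2]. Illustrative only: the study
    itself reports Bayesian robust regression, not plain least squares.
    """
    if not (len(y) == len(x1) == len(x2)):
        raise ValueError("all inputs must have the same length")
    design = [[1.0, a, b] for a, b in zip(x1, x2)]

    # Normal equations: (X'X) beta = X'y, stored as an augmented 3x4 matrix.
    aug = []
    for i in range(3):
        row = [sum(r[i] * r[j] for r in design) for j in range(3)]
        row.append(sum(r[i] * v for r, v in zip(design, y)))
        aug.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    return [aug[i][3] / aug[i][i] for i in range(3)]


if __name__ == "__main__":
    # Hypothetical channel-averaged slopes for the two conditions.
    slope_rejected = [0.1, 0.5, 0.9, 1.3, 1.7, 2.0]
    slope_component = [1.0, 0.2, 0.7, 0.1, 0.9, 0.4]
    age = [40 + 10 * a - 5 * b for a, b in zip(slope_rejected, slope_component)]
    print(fit_two_predictor_model(age, slope_rejected, slope_component))
```

      Because both predictors enter the same model, each coefficient describes the association with age after accounting for the other predictor, which is the sense in which the two effects can be judged (in)dependent.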

      I'm not sure whether there is good evidence or rationale to support the statement in the discussion that the presence of the ECG signal in reference electrodes makes it more difficult to isolate independent ECG components. The ICA algorithm will still function to detect common voltage shifts from the ECG as statistically independent from other voltage shifts, even if they're spread across all electrodes due to the referencing montage. I would suggest there are other reasons why the ICA might lead to imperfect separation of the ECG component (assumption of the same number of source components as sensors, non-Gaussian assumption, assumption of independence of source activities).

      The inclusion of only 32 channels in the EEG data might also have reduced the performance of ICA, increasing the chances of imperfect component separation and the mixing of cardiac artifacts into the neural components, whereas the higher number of sensors in the MEG data would enable better component separation. This could explain the difference between EEG and MEG in the ability to clean the ECG artifact (and perhaps higher-density EEG recordings would not show the same issue).

      The reviewer makes a good argument that our initial assumption, namely that the presence of cardiac activity on the reference electrode influences the performance of the ICA, may be wrong. After rereading and reconsidering the matter, we think that the reviewer is correct and that their explanations for why the ECG signal was not so easily separable from our EEG recordings are more plausible and better grounded in the literature than our initial suggestion. We therefore now highlight their view as a main reason why the ECG rejection was more challenging in EEG data. However, we also note that understanding the exact reason is probably an empirical question that demands further research, stating that:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources.”

      In addition to the inability to effectively clean the ECG artifact from EEG data, ICA and other component subtraction methods have also all been shown to distort neural activity in periods that aren't affected by the artifact due to the ubiquitous issue of imperfect component separation (https://doi.org/10.1101/2024.06.06.597688). As such, component subtraction-based (as well as regression-based) removal of the cardiac artifact might also distort the neural contributions to the aperiodic signal, so even methods to adequately address the cardiac artifact might not solve the problem explained in the study. This poses an additional potential confound to the "M/EEG without ECG" conditions.

      The reviewer is correct in stating that, if an “artifactual” signal is not always present but appears and disappears (e.g. eye-blinks), neural activity may be distorted in periods where the “artifactual” signal is absent. However, while this plausibly presents a problem for ocular activity, there is no obvious reason to believe that it applies to cardiac activity. While the ECG signal is non-stationary in nature, it is remarkably more stable than eye movements in the healthy populations we analyzed (especially at rest). Accordingly, the cardiac “artifact” was consistently present across the entirety of the MEG recordings we visually inspected.

      Literature Analysis, Page 23: was there a method applied to address studies that report reducing artifacts in general, but are not specific to a single type of artifact? For example, there are automated methods for cleaning EEG data that use ICLabel (a machine learning algorithm) to delete "artifact" components. Within these studies, the cardiac artifact will not be mentioned specifically, but is included under "artifacts".

      The literature analysis was largely performed automatically and focussed solely on ECG-related activity, as described in the methods section under Literature Analysis. If no ECG-related terms were used in the context of artifact rejection, a study was flagged as not having removed cardiac activity. We could indeed have highlighted this better and apologize for the oversight. We now additionally link to these details, stating that:

      “However, an analysis of openly accessible M/EEG articles (N<sub>Articles</sub>=279; see Methods - Literature Analysis for further details) that investigate aperiodic activity revealed that only 17.1% of EEG studies explicitly mention that cardiac activity was removed and only 16.5% measure ECG (45.9% of MEG studies removed cardiac activity and 31.1% of MEG studies mention that ECG was measured; see Figure 1EF).”

      The reviewer makes a fair point that there is some uncertainty here, and our results probably present a lower bound of ECG handling in M/EEG research: when I manually rechecked the studies that were not initially flagged, it was often solely mentioned that “artifacts” were rejected. This information seemed too ambiguous to assume that cardiac activity was in fact accounted for. Again, this could have been stated more clearly in writing and we apologize for this oversight. This is now included in the methods section Literature Analysis, which states:

      “All valid word contexts were then manually inspected by scanning the respective word context to ensure that the removal of “artifacts” was related specifically to cardiac and not e.g. ocular activity or the rejection of artifacts in general (without specifying which “artifactual” source was rejected in which case the manuscript was marked as invalid). This means that the results of our literature analysis likely present a lower bound for the rejection of cardiac activity in the M/EEG literature investigating aperiodic activity.”

      Statistical inferences, page 23: as far as I can tell, no methods to control for multiple comparisons were implemented. Many of the statistical comparisons were not independent (or even overlapped with similar analyses in the full analysis space to a large extent), so I wouldn't expect strong multiple comparison controls. But addressing this point to some extent would be useful (or clarifying how it has already been addressed if I've missed something).

      In the present study we tried to minimize the risk of type 1 errors by several means, such as A) weakly informative priors, B) robust regression models and C) by specifying a region of practical equivalence (ROPE, see Methods Statistical Inference for further Information) to define meaningful effects.

      Weakly informative priors can lower the risk of type 1 errors arising from multiple testing by shrinking parameter estimates towards zero (see e.g. Lemoine, 2019). Robust regression models use a Student's t-distribution to describe the distribution of the data. This distribution features heavier tails, meaning it allocates more probability to extreme values, which in turn minimizes the influence of outliers. The ROPE criterion ensures that only effects exceeding a negligible size are considered meaningful, representing a strict and conservative approach to interpreting our findings (see Kruschke, 2018; Cohen, 1988).
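
      The ROPE idea can be sketched as follows. This is a minimal illustration that assumes posterior samples of a standardized effect are available and that the quantity of interest is the share of posterior mass inside the ROPE; the bounds of ±0.1 in the usage example and the function name `fraction_in_rope` are hypothetical, and the exact decision rule used in the study is described in its Methods (Statistical Inference) rather than here.

```python
def fraction_in_rope(posterior_samples, rope_low, rope_high):
    """Return the share of posterior samples inside [rope_low, rope_high].

    A large share suggests the effect is practically equivalent to zero;
    a share near zero suggests the effect exceeds a negligible size.
    """
    if rope_low > rope_high:
        raise ValueError("rope_low must not exceed rope_high")
    samples = list(posterior_samples)
    if not samples:
        raise ValueError("posterior_samples must not be empty")
    inside = sum(1 for s in samples if rope_low <= s <= rope_high)
    return inside / len(samples)


if __name__ == "__main__":
    # Hypothetical posterior draws of a standardized regression coefficient.
    draws = [0.0, 0.05, 0.2, 0.3]
    print(fraction_in_rope(draws, -0.1, 0.1))
```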

      Furthermore, and more generally, we do not selectively report “significant” effects in situations in which multiple analyses were conducted on the same family of data (e.g. Figure 2 & 4). Instead, we provide joint inference across several plausible analysis options (akin to a specification curve analysis; Simonsohn, Simmons & Nelson, 2020) to give other researchers an overview of how different analysis choices impact the association between cardiac and neural aperiodic activity.

      Lemoine, N. P. (2019). Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912-928.

      Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.

      Methods:

      Applying ICA components from 1Hz high pass filtered data back to the 0.1Hz filtered data leads to worse artifact cleaning performance, as the contribution of the artifact in the 0.1Hz to 1Hz frequency band is not addressed (see Bailey, N. W., Hill, A. T., Biabani, M., Murphy, O. W., Rogasch, N. C., McQueen, B., ... & Fitzgerald, P. B. (2023). RELAX part 2: A fully automated EEG data cleaning algorithm that is applicable to Event-Related-Potentials. Clinical Neurophysiology, result reported in the supplementary materials). This might explain some of the lower frequency slope results (which include a lower frequency limit <1Hz) in the EEG data - the EEG cleaning method is just not addressing the cardiac artifact in that frequency range (although it certainly wouldn't explain all of the results).

      We want to thank the reviewer for suggesting this interesting paper, showing that lower high-pass filters may be preferable to the more commonly used >1Hz high-pass filters for detection of ICA components that largely contain peripheral physiological activity. However, the results presented by Bailey et al. contradict the more commonly reported finding by other researchers that a >1Hz high-pass filter is actually preferable (e.g. Winkler et al., 2015; Dimigen, 2020; Klug & Gramann, 2021) as well as recommendations in widely used packages for M/EEG analysis (e.g. https://mne.tools/1.8/generated/mne.preprocessing.ICA.html). The fact that there seems to be a discrepancy suggests that further research is needed to better understand which type of high-pass filtering is preferable in which situation. Furthermore, it is notable that all the findings on high-pass filtering in ICA component detection and removal that we are aware of relate to ocular activity. Given that ocular and cardiac activity have very different temporal and spectral patterns, it is probably worth investigating further whether the classic 1Hz high-pass filter is really also the best option for the detection and removal of cardiac activity. However, in our opinion this requires a dedicated investigation of its own.

      We therefore highlight this now in our manuscript stating that:

      “Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      I. Winkler, S. Debener, K.-R. Müller and M. Tangermann, "On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP," 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 2015, pp. 4101-4105, doi: 10.1109/EMBC.2015.7319296.

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      Klug, M., & Gramann, K. (2021). Identifying key factors for improving ICA‐based decomposition of EEG data in mobile and stationary experiments. European Journal of Neuroscience, 54(12), 8406-8420.

      It looks like no methods were implemented to address muscle artifacts. These can affect the slope of EEG activity at higher frequencies. Perhaps the Riemannian Potato addressed these artifacts, but I suspect it wouldn't eliminate all muscle activity. As such, I would be concerned that remaining muscle artifacts affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data (although I suspect it wouldn't reverse the overall conclusions given the number of converging results including in lower frequency bands). Is there a quick validity analysis the authors can implement to confirm muscle artifacts haven't negatively affected their results?

      I note that an analysis of head movement in the MEG is provided on page 32, but it would be more robust to show that removing ICA components reflecting muscle doesn't change the results. The results/conclusions of the following study might be useful for objectively detecting probable muscle artifact components: Fitzgibbon, S. P., DeLosAngeles, D., Lewis, T. W., Powers, D. M. W., Grummett, T. S., Whitham, E. M., ... & Pope, K. J. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical neurophysiology, 127(3), 1781-1793.

      We thank the reviewer for their suggestion. Muscle activity can indeed be a concern for the estimation of the spectral slope. This is precisely why we used head movements (as also noted by the reviewer) as a proxy for muscle activity. We agree with the reviewer that this is not a perfect estimate. Additionally, the Riemannian Potato would probably only capture epochs that contain transient, but not persistent, patterns of muscle activity.

      The paper recommended by the reviewer contains a clever approach of using the steepness of the spectral slope (or lack thereof) as an indicator of whether or not an independent component (IC) is driven by muscle activity. To determine an optimal threshold, Fitzgibbon et al. compared paralyzed to temporarily non-paralyzed subjects. They determined an expected “EMG-free” threshold for the spectral slope in paralyzed subjects and used this as a benchmark to detect ICs that were contaminated by muscle activity in non-paralyzed subjects.

      This is a great idea, but unfortunately it would go well beyond what we can sensibly estimate with our data, for the following reasons. The authors estimated their optimal threshold on paralyzed subjects for EEG data and show that this is a feasible threshold to be applied across different recordings. So for EEG data it might be feasible, at least as a first shot, to use their threshold on our data. However, we are measuring MEG and, as alluded to in our discussion section under “Differences in aperiodic activity between magnetic and electric field recordings”, the spectral slope differs greatly between MEG and EEG recordings for non-trivial reasons. Furthermore, the spectral slope even seems to differ across MEG devices. We noticed this when we initially tried to pool the data recorded in Salzburg with the Cambridge dataset. This means we would need to do a complete validation of this procedure for the MEG data recorded in Cambridge and in Salzburg, which is not feasible considering that we A) don’t have direct access to one of the recording sites and B) would, even if we had access, face substantial hurdles in obtaining ethical approval for the experiment performed by Fitzgibbon et al.

      However, we think the approach brought forward by Fitzgibbon and colleagues is a clever way to remove muscle activity from EEG recordings whenever EMG was not directly recorded. We therefore suggest in the Discussion section that, ideally, EMG should also be recorded, stating that:

      “It is worth noting that, apart from cardiac activity, muscle activity can also be captured in (non-)invasive recordings and may drastically influence measures of the spectral slope(72). To ensure that persistent muscle activity does not bias our results we used changes in head movement velocity as a control analysis (see Supplementary Figure S9). However, it should be noted that this is only a proxy for the presence of persistent muscle activity. Ideally, studies investigating aperiodic activity should also be complemented by measurements of EMG. Whenever such measurements are not available creative approaches that use the steepness of the spectral slope (or the lack thereof) as an indicator to detect whether or not e.g. an independent component is driven by muscle activity are promising(72,73). However, these approaches may require further validation to determine how well myographic aperiodic thresholds are transferable across the wide variety of different M/EEG devices.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) As outlined above, I recommend rephrasing the last section of the introduction to briefly summarize/introduce all main analysis steps undertaken in the study and why these were done (for example, it is only mentioned that the Cam-CAN dataset was used to study the impact of cardiac on MEG activity although the author used a variety of different datasets). Similarly, I am missing an overview of all main findings in the context of the study goals in the discussion. I believe clarifying the structure of the paper would not only provide a red thread to the reader but also highlight the efforts/strength of the study as described above.

      This is a good call! As suggested by the reviewer, we now try to give a clearer overview of what was investigated and why. We do that both at the end of the introduction, stating that: “Using the publicly available Cam-CAN dataset(28,29), we find that the aperiodic signal measured using M/EEG originates from multiple physiological sources. In particular, significant portions of age-related changes in aperiodic activity –normally attributed to neural processes– can be better explained by cardiac activity. This observation holds across a wide range of processing options and control analyses (see Supplementary S1), and was replicable on a separate MEG dataset. However, the extent to which cardiac activity accounts for age-related changes in aperiodic activity varies with the investigated frequency range and recording site. Importantly, in some frequency ranges and sensor locations, age-related changes in neural aperiodic activity still prevail. But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging. In sum, our results highlight the complexity of aperiodic activity while cautioning against interpreting it as solely “neural” without considering physiological influences.”

      and at the beginning of the discussion section:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources (see Figure 1EF). Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)). “

      (2) I found it interesting that the spectral slopes of ECG activity at higher frequency ranges (> 10 Hz) seem mostly related to HRV measures such as fractal and time domain indices and less so with frequency-domain indices. Do the authors have an explanation for why this is the case? Also, the analysis of the HRV measures and their association with aperiodic ECG activity is not explained in any of the method sections.

      We apologize for the oversight in not mentioning the HRV analysis in more detail in our methods section. We added a subsection to the Methods section entitled ECG Processing - Heart rate variability analysis to further describe the HRV analyses.

      “ECG Processing - Heart rate variability analysis

      Heart rate variability (HRV) was computed using the NeuroKit2 toolbox, a high-level tool for the analysis of physiological signals. First, the raw electrocardiogram (ECG) data were preprocessed by high-pass filtering the signal at 0.5Hz using an infinite impulse response (IIR) Butterworth filter (order = 5) and by smoothing the signal with a moving average kernel with the width of one period of 50Hz to remove the powerline noise (default settings of neurokit.ecg.ecg_clean). Afterwards, QRS complexes were detected based on the steepness of the absolute gradient of the ECG signal. Subsequently, R-Peaks were detected as local maxima in the QRS complexes (default settings of neurokit.ecg.ecg_peaks; see (98) for a validation of the algorithm). From the cleaned R-R intervals, 90 HRV indices were derived, encompassing time-domain, frequency-domain, and non-linear measures. Time-domain indices included standard metrics such as the mean and standard deviation of the normalized R-R intervals, the root mean square of successive differences, and other statistical descriptors of interbeat interval variability. Frequency-domain analyses were performed using power spectral density estimation, yielding for instance low frequency (0.04-0.15Hz) and high frequency (0.15-0.4Hz) power components. Additionally, non-linear dynamics were characterized through measures such as sample entropy, detrended fluctuation analysis and various Poincaré plot descriptors. All these measures were then related to the slopes of the low frequency (0.25 – 20 Hz) and high frequency (10 – 145 Hz) aperiodic spectrum of the raw ECG.”
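
      To make the time-domain indices mentioned above concrete, the following is a minimal, illustrative sketch of how the mean R-R interval, its standard deviation (SDNN) and the root mean square of successive differences (RMSSD) can be derived from R-peak times. It is not the NeuroKit2 implementation used in the study: the function name `time_domain_hrv`, the example peak times and the choice of the sample standard deviation are assumptions made for illustration only.

```python
import math


def time_domain_hrv(r_peak_times_s):
    """Compute basic time-domain HRV indices from R-peak times (in seconds).

    Hypothetical helper for illustration; conventions (e.g. sample vs.
    population standard deviation) may differ between toolboxes.
    """
    # R-R intervals in milliseconds between consecutive R-peaks
    rr_ms = [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    if len(rr_ms) < 2:
        raise ValueError("need at least three R-peaks")

    mean_rr = sum(rr_ms) / len(rr_ms)
    # SDNN: sample standard deviation of the R-R intervals (assumption: ddof = 1)
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (len(rr_ms) - 1))
    # RMSSD: root mean square of successive R-R interval differences
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}


# Made-up R-peak times alternating between 800 ms and 900 ms intervals
example_peaks = [0.0, 0.8, 1.7, 2.5, 3.4]
print(time_domain_hrv(example_peaks))
```

      Frequency-domain and non-linear indices require spectral estimation and longer recordings and are deliberately left out of this sketch.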

      With regard to the association between the ECG’s spectral slopes at high frequencies and frequency-domain indices of heart rate variability: common frequency-domain indices of heart rate variability fall in the range of 0.01-0.4Hz, which probably explains why we didn’t notice any association at higher frequency ranges (>10Hz).

      This is also stated in the related part of the results section:

      “In the higher frequency ranges (10 - 145 Hz) spectral slopes were most consistently related to fractal and time domain indices of heart rate variability, but not so much to frequency-domain indices assessing spectral power in frequency ranges < 0.4 Hz.”

      (3) Related to the previous point - what is being reflected in the ECG at higher frequency ranges, with regard to biological mechanisms? Results are being mentioned, but not further discussed. However, this point seems crucial because the age effects across the four datasets differ between low and high-frequency slope limits (Figure 2C).

      This is a great question that definitely requires further attention and investigation in general (see also Tereshchenko & Josephson, 2015). We investigated the change of the slope across frequency ranges that are typically captured in common ECG setups for adults (0.05 - 150Hz, Tereshchenko & Josephson, 2015; Kusayama, Wong, Liu et al. 2020). While most of the physiologically significant spectral information of an ECG recording rests between 1-50Hz (Clifford & Azuaje, 2006), meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz) that falls right into our spectral analysis window. Further meaningful information can be extracted at even higher frequencies (>100Hz). The exact physiological mechanisms underlying the so-called high-frequency QRS (HF-QRS) remain unclear (see Tereshchenko & Josephson, 2015; Qiu et al. 2024 for a review discussing possible mechanisms). At the same time, the HF-QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range (Schlegel et al. 2004; Qiu et al. 2024). All optimism aside, it is also worth noting that ECG recordings at higher frequencies can capture skeletal muscle activity with an overlapping frequency range up to 400Hz (Kusayama, Wong, Liu et al. 2020). We now highlight all of this as an outstanding research question when introducing this analysis in the results section, stating that:

      “However, substantially less is known about aperiodic activity above 0.4Hz in the ECG. Yet, common ECG setups for adults capture activity at a broad bandwidth of 0.05 - 150Hz(33,34).

      Importantly, a lot of the physiologically meaningful spectral information rests between 1-50Hz(35), similarly to M/EEG recordings. Furthermore, meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz(35)). However, that’s not all, as further meaningful information can be extracted at even higher frequencies (>100Hz). For instance, the so-called high-frequency QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range(36,37). Yet, the exact physiological mechanisms underlying the high-frequency QRS remain unclear (see (37) for a review discussing possible mechanisms).”

      Tereshchenko, L. G., & Josephson, M. E. (2015). Frequency content and characteristics of ventricular conduction. Journal of electrocardiology, 48(6), 933-937.

      Kusayama, T., Wong, J., Liu, X. et al. Simultaneous noninvasive recording of electrocardiogram and skin sympathetic nerve activity (neuECG). Nat Protoc 15, 1853–1877 (2020). https://doi.org/10.1038/s41596-020-0316-6

      Clifford, G. D., & Azuaje, F. (2006). Advanced methods and tools for ECG data analysis (Vol. 10). P. McSharry (Ed.). Boston: Artech house.

      Qiu, S., Liu, T., Zhan, Z., Li, X., Liu, X., Xin, X., ... & Xiu, J. (2024). Revisiting the diagnostic and prognostic significance of high-frequency QRS analysis in cardiovascular diseases: a comprehensive review. Postgraduate Medical Journal, qgae064.

      Schlegel, T. T., Kulecz, W. B., DePalma, J. L., Feiveson, A. H., Wilson, J. S., Rahman, M. A., & Bungo, M. W. (2004, March). Real-time 12-lead high-frequency QRS electrocardiography for enhanced detection of myocardial ischemia and coronary artery disease. In Mayo Clinic Proceedings (Vol. 79, No. 3, pp. 339-350). Elsevier.

      (4) Page 10: At first glance, it is not quite clear what is meant by "processing option" in the text. Please clarify.

      Thank you for catching this! Upon re-reading, this is indeed a bit unclear. We have now swapped “processing options” with “slope fits” to make it clearer that we are talking about the percentage of effects based on the different slope fits.

      (5) The authors mention previous findings on age effects on neural 1/f activity (References Nr 5,8,27,39) that seem contrary to their own findings such as e.g., the mostly steepening of the slopes with age. Also, the authors discuss thoroughly why spectral slopes derived from MEG signals may differ from EEG signals. I encourage the authors to have a closer look at these studies and elaborate a bit more on why these studies differ in their conclusions on the age effects. For example, Tröndle et al. (2022, Ref. 39) investigated neural activity in children and young adults, hence, focused on brain maturation, whereas the CamCAN set only considers the adult lifespan. In a similar vein, others report age effects on 1/f activity in much smaller samples as reported here (e.g., Voytek et al., 2015).

      I believe taking these points into account by briefly discussing them, would strengthen the authors' claims and provide a more fine-grained perspective on aging effects on 1/f.

      The reviewer is making a very important point, as age-related differences in (neuro-)physiological activity are not necessarily strictly comparable or entirely linear across different age cohorts (e.g. age-related changes in alpha center frequency). We therefore added the suggested discussion points to the discussion section.

      “Differences in electric and magnetic field recordings aside, aperiodic activity may not change strictly linearly as we are ageing and studies looking at younger age groups (e.g. <22; (44)) may capture different aspects of aging (e.g. brain maturation) than those looking at older subjects (>18 years; our sample). A recent report even shows some first evidence of an interesting putatively non-linear relationship with age in the sensorimotor cortex for resting recordings(59).”

      (6) The analysis of the working memory paradigm as described in the outlook-section of the discussion comes as a bit of a surprise as it has not been introduced before. If the authors want to convey with this study that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity, I recommend introducing this analysis and the results earlier in the manuscript than only in the discussion to strengthen their message.

      The reviewer is correct. This analysis really comes a bit out of the blue. However, this was also exactly the intention for placing this analysis in the discussion. As the reviewer correctly noted, the aim was to suggest “that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity”. We placed this outlook directly after the discussion of “(neuro-)physiological origins of aperiodic activity”, where we highlight the potential challenges of interpreting drug induced changes to M/EEG recordings. So the aim was to get the reader to think about whether age is the only feature affected by cardiac activity and then directly present some evidence that this might go beyond age.

      However, we have rethought this approach based on the reviewer's comments. We moved that paragraph to the end of the results section and now introduce it at the end of the introduction, stating that:

      “But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging.”

      (7) The font in Figure 2 is a bit hard to read (especially in D). I recommend increasing the font sizes where necessary for better readability.

      We agree with the Reviewer and increased the font sizes accordingly.

      (8) Text in the discussion: Figure 3B on page 10 => shouldn't it be Figure 4?

      Thank you for catching this oversight. We have now corrected this mistake.

      (9) In the third section on page 10, the Figure labels seem to be confused. For example, Figure 4 E is supposed to show "steepening effects", which should be Figure 4B I believe.

      Please check the figure labels in this section to avoid confusion.

      Thank you for catching this oversight. We have now corrected this mistake.

      (10) Figure Legend 4 I), please check the figure labels in the text

      Thank you for catching this oversight. We have now corrected this mistake.

      Reviewer #3 (Recommendations for the authors):

      I have a number of suggestions for improving the manuscript, which I have divided by section in the following:

      ABSTRACT:

      I would suggest re-writing the first sentences to make it easier to read for non-expert readers: "The power of electrophysiologically measured cortical activity decays with an approximately 1/fX function. The slope of this decay (i.e. the spectral exponent, X) is modulated..."

      Thank you for the suggestion. We adjusted the sentence as suggested to make it easier for less technical readers to understand that “X” refers to the exponent.

      Including the age range that was studied in the abstract could be informative.

      Done as suggested.

      As an optional recommendation, I think it would increase the impact of the article if the authors note in the abstract that the current most commonly applied cardiac artifact reduction approaches don't resolve the issue for EEG data, likely due to an imperfect ability to separate the cardiac artifact from the neural activity with independent component analysis. This would highlight to the reader that they can't just expect to address these concerns by cleaning their data with typical cleaning methods.

      I think it would also be useful to convey in the abstract just how comprehensive the included analyses were (in terms of artifact reduction methods tested, different aperiodic algorithms and frequency ranges, and both MEG and EEG). Doing so would let the reader know just how robust the conclusions are likely to be.

      This is a brilliant idea! As suggested we added a sentence highlighting that simply performing an ICA may not be sufficient to separate cardiac contributions to M/EEG recordings and refer to the comprehensiveness of the performed analyses.

      INTRODUCTION:

      I would suggest re-writing the following sentence for readability: "In the past, aperiodic neural activity, other than periodic neural activity (local peaks that rise above the "power-law" distribution), was often treated as noise and simply removed from the signal"

      To something like: "In the past, aperiodic neural activity was often treated as noise and simply removed from the signal e.g. via pre-whitening, so that analyses could focus on periodic neural activity (local peaks that rise above the "power-law" distribution, which are typically thought to reflect neural oscillations).

      We are happy to follow that suggestion.

      Page 3: please provide the number of articles that were included in the examination of the percentage that remove cardiac activity, and note whether the included articles could be considered a comprehensive or nearly comprehensive list, or just a representative sample.

      We stated the exact number of articles in the methods section under Literature Analysis. However, we added it to the Introduction on page 3 as suggested by the reviewer. The selection of articles was done automatically, dependent on a list of pre-specified terms and exclusively focussed on articles that had terms related to aperiodic activity in their title (see Literature Analysis). Therefore, I would personally be hesitant in calling it a comprehensive or nearly comprehensive list of the general M/EEG literature as the analysis of aperiodic activity is still relatively niche compared to the more commonly investigated evoked potentials or oscillations. I think whether or not a reader perceives our analysis as comprehensive should be up to them to decide and does not reflect something I want to impose on them. This is exacerbated by the fact that the analysis of neural aperiodic activity has rapidly gained traction over the last years (see Figure 1D orange) and the literature analysis was performed almost 2 years ago and therefore, in my eyes, only represents a glimpse in the rapidly evolving field related to the analysis of aperiodic activity.

      Figure 1E-F: It's not completely clear that the "Cleaning Methods" part of the figure indicates just methods to clean the cardiac artifact (rather than any artifact). It also seems that ~40% of EEG studies do not apply any cleaning methods even from within the studies that do clean the cardiac artifact (if I've read the details correctly). This seems unlikely. Perhaps there should be a bar for "other methods", or "unspecified"? Having said that, I'm quite familiar with the EEG artifact reduction literature, and I would be very surprised if ~40% of studies cleaned the cardiac artifact using a different method to the methods listed in the bar graph, so I'm wondering if I've misunderstood the figure, or whether the data capture is incomplete / inaccurate (even though the conclusion that ICA is the most common method is almost certainly accurate).

      The cleaning methods shown are indeed focussed specifically on cardiac activity. This was, however, also mentioned in the caption of Figure 1: “We were further interested in determining which artifact rejection approaches were most commonly used to remove cardiac activity, such as independent component analysis (ICA(22)), singular value decomposition (SVD(23)), signal space separation (SSS(24)), signal space projections (SSP(25)) and denoising source separation (DSS(26)).” and in the methods section under Literature Analysis. Nevertheless, we adjusted Figure 1EF to make it more obvious that the described cleaning methods relate only to the ECG. Aside from using blind source separation techniques such as ICA, a good number of studies mentioned that they cleaned their data based on visual inspection (which was not further considered). Furthermore, it has to be noted that studies were only marked as having separated cardiac from neural activity when this was mentioned explicitly.

      RESULTS:

      Page 6: I would delete the "from a neurophysiological perspective" clause, which makes the sentence more difficult to read and isn't so accurate (frequencies 13-25Hz would probably more commonly be considered mid-range rather than low or high). Additionally, both frequency ranges include 15Hz, but the next sentence states that the ranges were selected to avoid the knee at 15Hz, which seems to be a contradiction. Could the authors explain in more detail how the split addresses the 15Hz knee?

      We removed the “from a neurophysiological perspective” clause as suggested. With regard to the “knee” at ~15Hz, I would like to refer the reviewer to Supplementary Figure S1. The knee frequency varies substantially across subjects, so splitting the data at only one exact frequency did not seem appropriate. Additionally, we found only spurious significant age-related variations in knee frequency (i.e. in only one out of the 4 datasets; not shown).

      Furthermore, we wanted to better connect our findings to our MEG results in Figure 4 and also give the readers a holistic overview of how different frequency ranges in the aperiodic ECG would be affected by age. So to fulfill all of these objectives we decided to fit slopes with respective upper/lower bounds around a range of 5Hz above and below the average 15Hz Knee Frequency across datasets.

      The later parts of this same paragraph refer to a vast amount of different frequency ranges, but only the "low" and "high" frequency ranges were previously mentioned. Perhaps the explanation could be expanded to note that multiple lower and upper bounds were tested within each of these low and high frequency windows?

      This is a good catch; we adjusted the sentence as suggested. We now write: “... slopes were fitted individually to each subject's power spectrum in several lower (0.25 – 20 Hz) and higher (10 – 145 Hz) frequency ranges.”
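
      As a rough illustration of what fitting a slope within a bounded frequency range means, the sketch below fits a straight line to a power spectrum in log-log space between a lower and an upper bound. It is a plain least-squares fit on a synthetic spectrum, not the IRASA or FOOOF parametrization used in the manuscript; the function name `fit_spectral_slope`, the synthetic exponents and the two example ranges are assumptions chosen for illustration.

```python
import math


def fit_spectral_slope(freqs, power, f_lo, f_hi):
    """Least-squares slope of log10(power) vs. log10(frequency) within [f_lo, f_hi].

    Hypothetical helper: a simple linear fit in log-log space, used here only
    to illustrate range-restricted slope fitting.
    """
    pts = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if f_lo <= f <= f_hi and f > 0 and p > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two frequency bins inside the fitting range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Synthetic spectrum with a knee-like bend at 15 Hz (assumed values):
# power falls off as 1/f below 15 Hz and as 1/f^2 above it.
freqs = [0.25 * k for k in range(1, 581)]  # 0.25 ... 145 Hz
power = [f ** -1.0 if f <= 15.0 else 15.0 * f ** -2.0 for f in freqs]

low_slope = fit_spectral_slope(freqs, power, 0.25, 10.0)    # range below the knee
high_slope = fit_spectral_slope(freqs, power, 20.0, 145.0)  # range above the knee
print(low_slope, high_slope)
```

      Repeating such a fit over a grid of lower and upper bounds gives one slope per fitting range, which is the kind of quantity that is then related to age or HRV indices.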

      The following two sentences seem to contradict each other: "Overall, spectral slopes in lower frequency ranges were more consistently related to heart rate variability indices(> 39.4% percent of all investigated indices)" and: "In the lower frequency range (0.25 - 20Hz), spectral slopes were consistently related to most measures of heart rate variability; i.e. significant effects were detected in all 4 datasets (see Figure 2D)." (39.4% is not "most").

      The reviewer is correct in stating that 39.4% is not most. However, the 39.4% is the lowest bound and only refers to 1 dataset. In the other 3 datasets the percentage of effects was above 64% which can be categorized as “most” i.e. above 50%. We agree that this was a bit ambiguous in the sentence so we added the other percentages as well as a reference to Figure 2D to make this point clearer.

      Figure 2D: it isn't clear what the percentages in the semi-circles reflect, nor why some semi-circles are more full circles while others are only quarter circles.

      The percentages in the semi-circles reflect the proportion of effects (marked in red) and null effects (marked in green) per dataset, averaged across the different measures of HRV. Sometimes fewer effects were found for some frequency ranges, resulting in quarter circles instead of semi-circles.

      Page 8: I think the authors could make it more clear that one of the conditions they were testing was the ECG component of the EEG data (extracted by ICA then projected back into the scalp space for the temporal response function analysis).

      As suggested by the reviewer, we adjusted our wording and replaced the arguably somewhat ambiguous “... projected back separately” with “... projected back into the sensor space”. We thank the reviewer for this recommendation, as it does indeed make it easier to understand the procedure.

      “After pre-processing (see Methods) the data was split into three conditions using an ICA(22). Independent components that were correlated (at r > 0.4; see Methods: MEG/EEG Processing - pre-processing) with the ECG electrode were either not removed from the data (Figure 3ABCD - blue), removed from the data (Figure 3ABCD - orange) or projected back into the sensor space (Figure 3ABCD - green).”
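
      The selection step described in this passage can be sketched as a simple correlation threshold: each independent component's time course is correlated with the ECG channel, and components above the threshold are treated as cardiac. The sketch below is illustrative only and is not the pipeline used in the study; the helper names, the toy signals and the use of the absolute correlation (ICA component signs are arbitrary) are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def split_components(components, ecg, threshold=0.4):
    """Split component indices into ECG-related and remaining ones.

    Assumption: |r| > threshold marks a component as cardiac.
    """
    ecg_like, other = [], []
    for idx, comp in enumerate(components):
        (ecg_like if abs(pearson_r(comp, ecg)) > threshold else other).append(idx)
    return ecg_like, other


# Toy data: a spiky "ECG" and three made-up component time courses
ecg = [0.0, 0.0, 1.0] * 4
components = [
    [-2.0 * v for v in ecg],  # sign-flipped, scaled copy of the ECG
    [1.0, -1.0, 0.0] * 4,     # uncorrelated with the ECG
    [0.1, 0.0, 0.9] * 4,      # noisy copy of the ECG
]
ecg_like, other = split_components(components, ecg)
print(ecg_like, other)
```

      Under this scheme, the three conditions correspond to keeping all components, dropping the ECG-related ones, or reconstructing the sensor data from the ECG-related ones alone.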

      Figure 4A: standardized beta coefficients for the relationship between age and spectral slope could be noted to provide improved clarity (if I'm correct in assuming that is what they reflect).

      This was indeed shown in Figure 4A and noted in the color bar as “average beta (standardized)”. We do not specifically highlight this in the text, because the exact coefficients would depend both on the analyzed frequency range and on the selected electrodes.

      Figure 4I: The regressions explained at this point seem to contain a very large number of potential predictors, as I'm assuming they include all sensors for both the ECG component and ECG rejected conditions? (if that is not the case, it could be explained in greater detail). I'm also not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including them back into the same regression model. It seems that there could be some circularity or redundancy in doing so. However, I'm not confident that this is an issue, so would appreciate the authors explaining why this is a valid approach (if that is the case).

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  
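
      For readers less familiar with the Wilkinson notation, a model of the form age ~ 1 + ECG rejected + ECG components is an ordinary multiple regression with an intercept and two predictors. The following sketch solves such a model with ordinary least squares on made-up numbers. It is an illustration under stated assumptions, not the estimation procedure used in the study: the `ols` helper and all data values are hypothetical, and the predictors are not standardized here.

```python
def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Hypothetical helper using Gauss-Jordan elimination with partial pivoting;
    suitable only for small, well-conditioned problems.
    """
    k = len(design[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(design, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Made-up per-subject slopes, averaged over significant channels (assumed values)
slope_ecg_rejected = [-1.2, -0.8, -1.0, -1.5, -0.9, -1.1]
slope_ecg_component = [-2.0, -1.5, -2.4, -1.8, -2.2, -1.6]
# Noise-free toy outcome so that the true coefficients are known
age = [50.0 + 10.0 * r - 5.0 * c for r, c in zip(slope_ecg_rejected, slope_ecg_component)]

# Design matrix: intercept, ECG rejected, ECG component
design = [[1.0, r, c] for r, c in zip(slope_ecg_rejected, slope_ecg_component)]
intercept, beta_rejected, beta_component = ols(design, age)
print(intercept, beta_rejected, beta_component)
```

      Each coefficient describes the association of one predictor with age while the other predictor is held constant, which is what allows the two conditions to be assessed for unique contributions.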

      We hope this explanation solves the previous misunderstanding.

      The explanation of results for relationships between spectral slopes and aging reported in Figure 4 refers to clusters of effects, but the statistical inference methods section doesn't explain how these clusters were determined.

      The wording of “cluster” was used to describe a “category” of effects e.g. null effects. We changed the wording from “cluster” to “category” to make this clearer stating now that: “This analysis, which is depicted in Figure 4, shows that over a broad amount of individual fitting ranges and sensors, aging resulted in a steepening of spectral slopes across conditions (see Figure 4E) with “steepening effects” observed in 25% of the processing options in MEG<sub>ECG not rejected</sub> , 0.5% in MEG<sub>ECG rejected</sub>, and 60% for MEG<sub>ECG components</sub>. The second largest category of effects were “null effects” in 13% of the options for MEG<sub>ECG not rejected</sub> , 30% in MEG<sub>ECG rejected</sub>, and 7% for MEG<sub>ECG components</sub>. ”

      Page 12: can the authors clarify whether these age related steepenings of the spectral slope in the MEG are when the data include the ECG contribution, or when the data exclude the ECG? (clarifying this seems critical to the message the authors are presenting).

      We apologize for not making this clearer. We now write: “This analysis also indicates that a vast majority of observed effects irrespective of condition (ECG components, ECG not rejected, ECG rejected) show a steepening of the spectral slope with age across sensors and frequency ranges.”

      Page 13: I think it would be useful to describe how much variance was explained by the MEG-ECG rejected vs MEG-ECG component conditions for a range of these analyses, so the reader also has an understanding of how much aperiodic neural activity might be influenced by age (vs if the effects are really driven mostly by changes in the ECG).

      With regard to the explained variance, I think that the very important question of how strongly age influences changes in aperiodic activity is a topic better suited for a meta-analysis, as the effect sizes seem to vary largely depending on the sample: e.g. for EEG, results in the literature were reported at r=-0.08 (Cesnaite et al. 2023), r=-0.26 (Cellier et al. 2021), r=-0.24/r=-0.28/r=-0.35 (Hill et al. 2022) and r=0.5/r=0.7 (Voytek et al. 2015). I would refer the reader/reviewer to the standardized beta coefficients depicted in Figure 4A as a measure of effect size in the current study.

      Cellier, D., Riddle, J., Petersen, I., & Hwang, K. (2021). The development of theta and alpha neural oscillations from ages 3 to 24 years. Developmental cognitive neuroscience, 50, 100969.

      Cesnaite, E., Steinfath, P., Idaji, M. J., Stephani, T., Kumral, D., Haufe, S., ... & Nikulin, V. V. (2023). Alterations in rhythmic and non‐rhythmic resting‐state EEG activity and their link to cognition in older age. NeuroImage, 268, 119810.

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076.

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38), 13257-13265.

      Also, if there are specific M/EEG sensors where the 1/f activity does relate strongly to age, it would be worth noting these, so future research could explore those sensors in more detail.

      I think it is difficult to make a clear claim about this for MEG data, as the exact location or type of the sensor may differ across manufacturers. Such a statement could more easily be made for source-projected data or if EEG electrodes were available, where the location would be standardized, e.g. according to the 10-20 system.

      DISCUSSION:

      Page 15: Please change the wording of the following sentence, as the way it is currently worded seems to suggest that the authors of the current manuscript have demonstrated this point (which I think is not the case): "The authors demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods."

      Apologies for the oversight! The reviewer is correct: it was in fact not we who showed this, but the authors of the cited manuscript. We corrected the sentence as suggested, stating now that:

      “Bénar et al. demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods.”

      Page 16: The authors mention the results can be sensitive to the application of SSS to clean the MEG data, but not ICA. I think it would be sensitive to the application of either SSS or ICA?

      This is correct and actually also supported by Figure S7, as differences in ICA thresholds affect also the detection of age-related effects. We therefore adjusted the related sentences stating now that:

      “ In case of the MEG signal this may include the application of Signal-Space-Separation algorithms (SSS(24,55)), different thresholds for ICA component detection (see Figure S7), high and low pass filtering, choices during spectral density estimation (window length/type etc.), different parametrization algorithms (e.g. IRASA vs FOOOF) and selection of frequency ranges for the aperiodic slope estimation.”

      It would be worth clarifying that the linked mastoid re-reference alone has been proposed to cancel out the ECG signal, rather than that a linked-mastoid re-reference improves the performance of the ICA separation (which could be inferred by the explanation as it's currently written).

      This is correct and we adjusted the sentence accordingly! Stating now that:

      “Previous work(12,56) has shown that a linked mastoid reference alone was particularly effective in reducing the impact of ECG related activity on aperiodic activity measured using EEG.”

      The issue of the number of EEG channels could probably just be noted as a potential limitation, as could the issue of neural activity being mixed into the ECG component (although this does pose a potential confound to the M/EEG without ECG condition, I suspect it wouldn't be critical).

      This is indeed a very fair point, as a higher number of electrodes would probably make it easier to isolate ECG components in the EEG, which may be the reason why the separation did not work so well in our case. However, this is ultimately an empirical question, so we highlighted it in the discussion section stating that: “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources.”

      OUTLOOK:

      Page 19: Although there has been a recent trend to control for 1/f activity when examining oscillatory power, recent research suggests that this should only be implemented in specific circumstances, otherwise the correction causes more of a confound than the issue does. It might be worth considering this point with regards to the final recommendation in the Outlook section: Brake, N., Duc, F., Rokos, A., Arseneau, F., Shahiri, S., Khadra, A., & Plourde, G. (2024). A neurophysiological basis for aperiodic EEG and the background spectral trend. Nature Communications, 15(1), 1514.

      We want to thank the reviewer for recommending this very interesting paper! The authors of said paper present compelling evidence showing that, while peak detection above an aperiodic trend using methods like FOOOF or IRASA is a prerequisite to determine the presence of oscillatory activity, it’s not necessarily straightforward to determine which detrending approach should be applied to determine the actual power of an oscillation. Furthermore, the authors suggest that wrongfully detrending may cause larger errors than not detrending at all. We therefore added a sentence stating that: “However, whether or not periodic activity (after detection) should be detrended using approaches like FOOOF or IRASA still remains disputed, as incorrectly detrending the data may cause larger errors than not detrending at all(75).”

      RECOMMENDATIONS:

      Page 20: "measure and account for" seems like it's missing a word, can this be re-written so the meaning is more clear?

      Done as suggested. The sentence now states: “To better disentangle physiological and neural sources of aperiodic activity, we propose the following steps to (1) measure and (2) account for physiological influences.”

      I would re-phrase "doing an ICA" to "reducing cardiac artifacts using ICA" (this wording could be changed in other places also).

      I do not like to describe cardiac or ocular activity as artifactual per se. This is also why I used hyphens whenever I mention the word “artifact” in association with the ECG or EOG. However, I do understand that the wording of “doing an ICA” is a bit sloppy. We therefore reworded it accordingly throughout the manuscript to e.g. “separating cardiac from neural sources using an ICA” and “separating physiological from neural sources using an ICA”.

      I would additionally note that even if components are identified as unambiguously cardiac, it is still likely that neural activity is mixed in, and so either subtracting or leaving the component will both be an issue (https://doi.org/10.1101/2024.06.06.597688). As such, even perfect identification of whether components are cardiac or not would still mean the issue remains (and this issue is also consistent across a considerable range of component based methods). Furthermore, current methods including wavelet transforms on the ICA component still do not provide good separation of the artifact and neural activity.

      This is definitely a fair point and we also highlight this in our recommendations under 3 stating that:

      “However, separating physiological from neural sources using an ICA is no guarantee that peripheral physiological activity is fully removed from the cortical signal. Even more sophisticated ICA based methods that e.g. apply wavelet transforms on the ICA components may still not provide a good separation of peripheral physiological and neural activity(76,77). This turns the process of deciding whether or not an ICA component is e.g. either reflective of cardiac or neural activity into a challenging problem. For instance, when we only extract cardiac components using relatively high detection thresholds (e.g. r > 0.8), we might end up misclassifying residual cardiac activity as neural. In turn, we can’t always be sure that using lower thresholds won’t result in misinterpreting parts of the neural effects as cardiac. Both ways of analyzing the data can potentially result in misconceptions.”

      Castellanos, N. P., & Makarov, V. A. (2006). Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis. Journal of neuroscience methods, 158(2), 300-312.

      Bailey, N. W., Hill, A. T., Godfrey, K., Perera, M. P. N., Rogasch, N. C., Fitzgibbon, B. M., & Fitzgerald, P. B. (2024). EEG is better when cleaning effectively targets artifacts. bioRxiv, 2024-06.
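
      The threshold trade-off described in the quoted passage can be illustrated with a small sketch. This is not the authors' pipeline: the "ECG" and "neural" signals are toy sinusoids, the three components are hand-built rather than produced by an ICA, and the lower threshold of 0.4 and the helper names are hypothetical choices made for illustration only.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def select_cardiac(components, ecg, threshold):
    """Indices of components whose |r| with the ECG exceeds the threshold.

    The absolute value is used because the sign of an ICA component is arbitrary.
    """
    return [i for i, comp in enumerate(components)
            if abs(pearson_r(comp, ecg)) > threshold]

# Toy signals (assumptions): two orthogonal sinusoids stand in for ECG and neural activity.
n = 1000
ecg = [math.sin(2 * math.pi * t / 50) for t in range(n)]
neural = [math.sin(2 * math.pi * t / 20) for t in range(n)]

components = [
    [-2.0 * e for e in ecg],                          # purely cardiac, sign-flipped
    [0.6 * e + 0.8 * x for e, x in zip(ecg, neural)],  # mixed cardiac and neural
    neural,                                            # purely neural
]

strict = select_cardiac(components, ecg, threshold=0.8)   # text's example threshold
lenient = select_cardiac(components, ecg, threshold=0.4)  # hypothetical lower threshold
print(strict, lenient)
```

      In this toy case the strict threshold leaves the mixed component in the data (residual cardiac activity treated as neural), whereas the lenient threshold removes it together with its neural share, which mirrors the two failure modes named in the text.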

      METHODS:

      Pre-processing, page 24: I assume the symmetric setting of fastica was used (rather than the deflation setting), but this should be specified.

      The reviewer is correct: we used the default fastica setting in MNE-Python, which calls the FastICA implementation in scikit-learn, and that implementation uses the “parallel” (symmetric) algorithm by default. We added this information to the text accordingly, stating that:

      “For extracting physiological “artifacts” from the data, 50 independent components were calculated using the fastica algorithm(22) (implemented in MNE-Python version 1.2; with the parallel/symmetric setting; note: 50 components were selected for MEG for computational reasons; for the analysis of EEG data no threshold was applied).”

      Temporal response functions, page 26: can the authors please clarify whether the TRF is computed against the ECG signal for each electrode or sensor independently, or if all electrodes/sensors are included in the analysis concurrently? I'm assuming it was computed for each electrode and sensor separately, since the TRF was computed in both the forward and backwards direction (perhaps the meaning of forwards and backwards could be explained in more detail also - i.e. using the ECG to predict the EEG signal, or using the EEG signal to predict the ECG signal?).

      A TRF can also be conceptualized as a multiple regression model over time lags. This means that we used all channels to compute the forward and backward models. In the case of the forward model we predicted the signal of the M/EEG channels in a multivariate regression model using the ECG electrode as predictor. In case of the backward model we predicted the ECG electrode based on the signal of all M/EEG channels. The forward model was used to depict the time window at which the ECG signal was encoded in the M/EEG recording, which appears at 0 time lags indicating volume conduction. The backward model was used to see how much information of the ECG was decodable by taking the information of all channels.

      We tried to further clarify this approach in the methods section stating that:

      “We calculated the same model in the forward direction (encoding model; i.e. predicting M/EEG data in a multivariate model from the ECG signal) and backward direction (decoding model; i.e. predicting the ECG signal using all M/EEG channels as predictors).”
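
      The idea of a TRF as a regression over time lags can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits a forward (encoding) model for a single simulated channel with ordinary least squares, whereas the analysis described above uses all M/EEG channels. The kernel values, signal length, and function names are hypothetical.

```python
import random

def lagged_design(x, n_lags):
    """Rows [x[t], x[t-1], ..., x[t-n_lags+1]] for every t with a full history."""
    return [[x[t - k] for k in range(n_lags)] for t in range(n_lags - 1, len(x))]

def solve_linear(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit_forward_trf(ecg, channel, n_lags):
    """Least-squares weights mapping lagged ECG samples onto one channel."""
    X = lagged_design(ecg, n_lags)
    y = channel[n_lags - 1:]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n_lags)]
           for i in range(n_lags)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(n_lags)]
    return solve_linear(xtx, xty)

# Simulated data (assumption): white-noise "ECG" and a noise-free channel
# that carries the ECG most strongly at lag 0, as under volume conduction.
rng = random.Random(0)
ecg = [rng.gauss(0.0, 1.0) for _ in range(500)]
true_kernel = [1.0, 0.3, 0.1]  # hypothetical weights for lags 0, 1, 2
channel = [0.0, 0.0] + [
    sum(true_kernel[k] * ecg[t - k] for k in range(3)) for t in range(2, 500)
]

weights = fit_forward_trf(ecg, channel, n_lags=3)
peak_lag = max(range(3), key=lambda k: abs(weights[k]))
print(weights, peak_lag)
```

      A backward (decoding) model swaps the roles: the lagged samples of all channels become the predictors and the ECG becomes the target. In practice a regularized fit would usually replace the plain least-squares solve used here.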

      Page 27: the ECG data was fit using a knee, but it seems the EEG and MEG data was not.

      Does this difference pose any potential confound to the conclusions drawn? (having said this, Figure S4 suggests perhaps a knee was tested in the M/EEG data, which should perhaps be explained in the text also).

      This was indeed tested in a previous review round to ensure that our results are not dependent on the presence/absence of a knee in the data. We therefore added figure S4, but forgot to actually add a description in the text. We are sorry for this oversight and added a paragraph to S1 accordingly:

      “Using FOOOF(5), we also investigated the impact of different slope fitting options (fixed vs. knee model fits) on the aperiodic age relationship (see Supplementary Figure S4). The results that we obtained from these analyses using FOOOF offer converging evidence with our main analysis using IRASA.”
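
      The difference between the two fitting options can be sketched with the aperiodic model as we understand it from the FOOOF documentation, where log power is offset minus log10(knee + f^exponent) and the fixed mode corresponds to a knee of zero. The parameter values below are hypothetical and the function names are ours, not part of any library.

```python
import math

def aperiodic_fixed(freq, offset, exponent):
    """Log10 power of a fixed (no knee) aperiodic component."""
    return offset - exponent * math.log10(freq)

def aperiodic_knee(freq, offset, knee, exponent):
    """Log10 power of an aperiodic component with a knee parameter."""
    return offset - math.log10(knee + freq ** exponent)

def local_slope(model, f_low, f_high):
    """Slope of the model in log-log space between two frequencies."""
    return (model(f_high) - model(f_low)) / (math.log10(f_high) - math.log10(f_low))

# Hypothetical parameters chosen only to make the bend visible.
offset, knee, exponent = 1.0, 100.0, 2.0

def knee_model(f):
    return aperiodic_knee(f, offset, knee, exponent)

def fixed_model(f):
    return aperiodic_fixed(f, offset, exponent)

slope_low = local_slope(knee_model, 1.0, 2.0)      # well below the bend: nearly flat
slope_high = local_slope(knee_model, 50.0, 100.0)  # well above the bend: close to -exponent
slope_fixed = local_slope(fixed_model, 1.0, 2.0)   # -exponent at every frequency
print(slope_low, slope_high, slope_fixed)
```

      With a knee, the spectrum is flatter at low frequencies and approaches the power-law slope only at higher frequencies, so a fixed fit and a knee fit over the same range can yield different exponents. This is why checking both fitting modes is a reasonable robustness test.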

      Page 32: my understanding of the result reported here is that cleaning with ICA provided better sensitivity to the effects of age on 1/f activity than cleaning with SSS. Is this accurate? I think this could also be reported in the main manuscript, as it will be useful to researchers considering how to clean their M/EEG data prior to analyzing 1/f activity.

      The reviewer is correct in stating that we overall detected slightly more “significant” effects, when not additionally cleaning the data using SSS. However, I am a bit wary of recommending omitting the use of SSS maxfilter solely based on this information. It can very well be that the higher quantity of effects (when not employing SSS maxfilter) stems from other physiological sources (e.g. muscle activity) that are correlated with age and removed when applying SSS maxfiltering. I think that just conditioning the decision of whether or not maxfilter is applied based on the amount or size of effects may not be the best idea. Instead I think that the applicability of maxfilter for research questions related to aperiodic activity should be the topic of additional methodological research. We therefore now write in Text S1:

      “Considering that we detected fewer and weaker aperiodic effects when using SSS maxfilter, is it advisable to omit maxfilter when analyzing aperiodic signals? We don’t think that we can make such a judgment based on our current results. This is because it's unclear whether or not the reduction of effects stems from an additional removal of peripheral information (e.g. muscle activity; that may be correlated with aging) or is induced by the SSS maxfiltering procedure itself. As the use of maxfilter in detecting changes of aperiodic activity has, to our knowledge, not yet been the subject of analysis, we suggest that this should be the topic of additional methodological research.”

      Page 39, Figure S6 and Figure S8: Perhaps the caption could also briefly explain the difference between maxfilter set to false vs true? I might have missed it, but I didn't gain an understanding of what varying maxfilter would mean.

      Figure S6 shows the effect of ageing on the spectral slope averaged across all channels. The maxfilter set to false in AB) means that no maxfiltering using SSS was performed vs. in CD) where the data was additionally processed using the SSS maxfilter algorithm. We now describe this more clearly by writing in the caption:

      “Supplementary Figure S6: Age-related changes in aperiodic brain activity are most prominent on explained by cardiac components irrespective of maxfiltering the data using signal space separation (SSS) or not AC) Age was used to predict the spectral slope (fitted at 0.1-145Hz) averaged across sensors at rest in three different conditions (ECG components not rejected [blue], ECG components rejected [orange], ECG components only [green]).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines, they show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron only partially affects the performance index. The authors use calcium imaging to show that the DAN-g1 is not the only one that responds to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role in the assays tested. DAN-f1, which does not respond to salt, is able to lead to the formation of memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, silencing of DAN-f1 together with DAN-g1, enhances the memory deficit of DAN-g1.

      Strengths:

      The paper therefore reveals that also in the Drosophila larva as in the adult, rewards and punishments are processed by exclusive sets of DANs and that a complex interaction between a subset of DANs mediates salt-odor association.

      Overall, the manuscript contributes valuable results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is accessed using specific driver lines which allow for testing of their function individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association with it. Proper genetic controls are carried across the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, and the authors could improve the presentation and interpretation of the data. Specifically, optogenetics seems a better approach than apoptosis, which can affect the overall development of the system, but apoptosis experiments are used to set the grounds of the paper.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set seems to be necessary. This should be better discussed and integrated into the author's conclusion. The EM data analysis reveals a non-trivial organization of sensory inputs into DANs and it is hard to extrapolate a link to the functional data presented in the paper.

      We would like to thank reviewer 1 for the positive evaluation of our work and for the critical suggestions for improvement. In the new version of the manuscript, we have centralized the optogenetic results and moved some of the ablation experiments to the Supplement. We also discuss in detail the experimental differences in the results. In addition, we have softened our interpretation of the specificity of memory for salt. As a result, we now emphasize more the general role of DANs for aversive learning in the larva. These changes are now also summarized and explained more simply and clearly in the Discussion, along with a revised discussion of the EM data.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act redundantly, and that single-cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli were tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs, this represents a very comprehensive study linking the structural, functional, and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise, and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows us to define the cellular substrates and pathways of aversive learning down to the single-cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and integration with the higher-order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and clearly discussed in the appropriate context. The authors also implement neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      While there is certainly room for further analysis in the future, the study is very complete as it stands. Suggestions for clarification are minor in nature.

      We would like to thank reviewer 2 for the positive evaluation of our work. In fact, follow-up work is already underway to further analyze the role of the individual DL1 DANs. We have addressed the constructive and detailed suggestions for improvement in our point-by-point responses in the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. However, the authors go far beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine, and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, are necessary for salt learning when either being ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for three of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters, and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      Weaknesses:

      (1) The authors repeatedly claim that they found/proved salt-specific memories. I think this is problematic to some extent.

      (1a) With respect to the necessity of the DL-1 neurons for aversive memories, the authors' notion of salt-specificity relies on a significant reduction in salt memory after ablating DAN-f1 and g1, and the lack of such a reduction in quinine memory. However, Fig. 5K shows a quite suspicious trend of an impaired quinine memory which might have been significant with a higher sample size. I therefore think it is not fully clear yet whether DAN-f1 and DAN-g1 are really specifically necessary for salt learning, and the conclusions should be phrased carefully.

      (1b) With respect to the results of the optogenetic activation of DL-1 neurons, the authors conclude that specific salt memories were established because the aversive memories were observed in the presence of salt. However, this does not prove that the established memory is specific to salt - it could be an unspecific aversive memory that potentially could be observed in the presence of any other aversive stimuli. In the case of DAN-f1, the authors show that the neuron does not even get activated by salt, but is inhibited by sugar. Why should activation of such a neuron establish a specific salt memory? At the current state, the authors clearly showed that optogenetic activation of the neurons does induce aversive memories - the "content" of those memories, however, remains unknown.

      (2) In many figures (e.g. figures 4, 5, 6, supplementary figures S2, S3, S5), the same behavioural data of the effector control is plotted in several sub-figures. Were these experiments done in parallel? If not, the data should not be presented together with results not gathered in parallel. If yes, this should be clearly stated in the figure legends.

      We would also like to thank reviewer 3 for his positive assessment of our work. As already mentioned by reviewer 1, we understand the criticism that the salt specificity for which the individual DANs are coded is not always fully supported by the results of the work. We have therefore rewritten the relevant passages, which are also cited by the reviewer. We have also taken up the second point of criticism and incorporated it into our manuscript. As the control groups were always measured in parallel with the experimental animals, we can also present the data together in a sub-figure. We now state this clearly in the revised figure legends.

      Summary of recommendations to authors:

      Overall, the study is commendable for its systematic approach and solid methodology. Several weaknesses were identified, prompting the need for careful revisions of the manuscript:

      We thank the reviewers for the careful revision of our manuscript. In the subsequent sections, we aim to address their concerns as thoroughly as possible. A comprehensive one-to-one listing can be found below.

      (1) The authors should reconsider their assertion of uncovering a salt-specific memory, as the evidence does not conclusively demonstrate the exclusive necessity of DAN-f1 and DAN-g1 for salt learning. In particular, the optogenetic activation of DAN-f1 leads to plasticity but this might not be salt-specific. The precise nature of the memory content remains elusive, warranting a nuanced rephrasing of the conclusions.

      We only partially agree. It is true that optogenetic activation of DANs does not really allow us to comment on salt-specificity. However, we used high-salt concentrations during the test. Over the years, the Gerber lab has demonstrated in several papers that larvae recall an aversive odor-salt memory only if salt is present during the test (Gerber and Hendel, 2006; Niewalda et al 2008; Schleyer et al. 2011; Schleyer et al. 2015). The US that was used has to be present during the test. Even at the same concentration, other aversive stimuli (e.g. bitter quinine) do not allow the larvae to recall this particular type of memory. So, if the optogenetic activation of DAN-f1 establishes a memory that can be recalled on salt, we argue that it has to encode aspects of the salt information. On the other hand, we see the necessity for salt learning only for DAN-g1. And although it is very unlikely based on the current literature, we cannot fully exclude that the activation of DAN-f1 establishes a yet unknown type of memory that can also be recalled on a salt plate. Therefore, we partially agree and have rephrased the entire manuscript accordingly to avoid an over-interpretation of our data. Throughout the manuscript we now avoid the term salt-specific memory and instead describe the type of memory as aversive memory.

      (2) A thorough examination or discussion about the potential influence of blue light aversion on behavioral observations is necessary to ensure a balanced interpretation of the findings.

      To address this point, every behavioral experiment that uses optogenetic blue light activation is run with the appropriate mandatory controls. For blue light activation experiments, two genetic controls are used that either get the same blue light treatment (effector control, w1118>UAS-ChR2XXL) or no blue light treatment (dark control, XY-split-Gal4>UAS-ChR2XXL). For blue light inactivation experiments one group is added that has exactly the same genotype but did not receive food containing retinal. These experiments show that blue light exposure itself induces neither an aversive nor a positive memory and does not impair the establishment of odor-high salt memory. In addition, we used the latest established transgenes available. ChR2XXL is very sensitive to blue light. Only 220 lux (60 µW/cm²) were necessary to obtain stable results. In our hands, short-term exposure for up to 5 minutes at such low intensities does not induce blue light aversion. Following the advice of the reviewer, we also address this concern by adding several sentences to the related results and methods sections.

      (3) The authors should address the limitations associated with the use of rpr/hid for neuronal ablations, such as the effects of potential developmental compensation.

      We agree with this concern. It is well possible that the ablation experiments induce compensatory effects during larval development. Such an effect may be the reason for differences in phenotypes when comparing hid,rpr ablation with optogenetic inhibition. This is now part of the discussion. In addition, we evaluated if the ablation worked in our experiments. So far controls were missing that show that the expression of hid,rpr really leads to the ablation of DANs. We now added these experiments and clearly show anatomically that the DANs are ablated (related to figure 4-figure supplement 6).

      (4) While the connectome analysis offers valuable insights into the observed functions of specific DANs in relation to their extrinsic (sensory) and intrinsic (state) inputs, integrating this data more cohesively within the manuscript through careful rewriting would enhance the coherence of the study.

      We understand this concern. Therefore, the new version of our manuscript is now intensifying the inclusion of the EM data in our interpretation of the results. Throughout the entire manuscript we have now rewritten the related parts. We have also completely revised the corresponding section in the results chapter.

      (5) More generally, the authors are encouraged to discuss internal discrepancies in the results of their functional manipulation experiments.

      Thank you for this suggestion. We do of course understand that we have not given the different results enough space in the discussion. We have now changed this and have been happy to comprehensively address the concern. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for clarification and improvement of the manuscript:

      (1) The authors should discuss why the silencing experiment with TH-GAL4 (Fig. 1) does not abolish memory formation (I assume that the PI should go to zero). Does it mean that other non-TH neurons are involved in salt-odor memory formation? Are there other lines that completely abolish this type of learning?

      Thank you very much for highlighting this crucial point. Indeed, the functional intervention does not completely eliminate the memory. There could be several reasons, or a combination thereof, for this outcome. For instance, it's plausible that the UAS-GtACR2 effector doesn't entirely suppress the activity of dopaminergic neurons. Additionally, the memory may comprise different types, not all of which are linked to dopamine function. It's also noteworthy that TH-Gal4 doesn't encompass all dopaminergic neurons – even a neuron from the DL1 cluster is absent (as previously reported in Selcho et al., 2009). Considering we're utilizing high salt concentrations in this experiment, it's conceivable that non gustatory-driven memories are formed based solely on the systemic effects of salt (e.g., increased osmotic pressure). These possibilities are now acknowledged in the text.

      (2) The Rpr experiments in Fig. 4 do not lead to any phenotype and there is a general assumption that the system compensates during development. However, there is no demonstration that Rpr worked or that development compensated for that. What do we learn from these data? Would it make sense to move it to supplement to make the story more compact? In addition: the conclusion at L 236 "DL1.... Are not individually necessary" is later disproved by optogenetic silencing. Similarly, optogenetic silencing of f1+g1 is affecting 1X and 3X learning, but not when using Rpr. Moreover, Rpr did not give any phenotype in other data in the supplementary material. I'm not sure how valid these results are.

      We acknowledge this concern and have actively deliberated various options for restructuring the presented ablation data. Ultimately, we reached a consensus that relocating Figure 4 to the supplement is warranted. Furthermore, corresponding adjustments have been made in the text. This decision amplifies the significance of the optogenetic results. In addition, we also addressed the other part of the concern. We examined the efficacy of hid and rpr in our experiments. Indeed, we successfully ablated specific DANs, as illustrated in the new anatomical data presented in Figure 4- figure supplement 6, which strengthens the interpretation of the hid,rpr experiments.

      (3) In most figures that show data for 1X and 3X training, there is no difference between these two conditions (I would suggest moving one set as a supplement). When a difference appears (Fig.5A-D) the implications are not discussed properly. Is it known that some circuits are necessary for the 1X but not for the 3X protocol? Is that a reasonable finding? I would expect the opposite, but I might lack knowledge here. However, the optogenetic silencing of the same neurons in Figure 7 shows the same phenotype for 1X and 3X. Again, the validity of the Rpr experiments seems debatable.

      Different training protocols lead to different memory phases (STM and STM+ARM). We have shown that in the past in Widmann et al. 2016. Therefore, we are convinced that it makes sense to keep both data sets in the main manuscript. However, we agree that this was not properly introduced and discussed and therefore made the respective changes in the manuscript.

      (4) In Figure 3, it is unclear what the responses were tested against. Since they are so small and noisy there would be a need for a control. Moreover, in some cases, it looks like the DF/F is normalized to the wrong value: e.g. in DAN-c1 100mM, the activity in 0-10s is always above zero, and in pPAM with fructose is always below zero. This might not have any consequence on the results but should be adjusted.

      Thank you very much for your criticism, which we greatly appreciate. We have carefully re-examined the data and found that there was a mistake in the normalization of the values. We made the necessary adjustments to the evaluation, as per your suggestions. The updated figures, figure legends, and results have been incorporated into the new version of the manuscript. As noted by the reviewer, these corrections have not altered the interpretation of the data or the primary responses of the various DANs.
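
      For readers unfamiliar with the normalization discussed here, the following is a minimal sketch of a baseline-referenced ΔF/F calculation. It is not the authors' analysis pipeline: the function name, the choice of the first frames as the baseline window, and the sample values are all assumptions made for illustration. The point it shows is that when F0 is taken from the pre-stimulus window, the normalized trace averages zero over that window, which is the property the reviewer found missing in some panels.

```python
def delta_f_over_f(trace, baseline_frames):
    """Return dF/F for every frame of a fluorescence trace.

    F0 is the mean of the first `baseline_frames` samples
    (a hypothetical pre-stimulus window, chosen for illustration).
    """
    if baseline_frames <= 0 or baseline_frames > len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    if f0 == 0:
        raise ValueError("baseline fluorescence F0 must be non-zero")
    # dF/F = (F - F0) / F0, computed frame by frame
    return [(f - f0) / f0 for f in trace]


# Made-up raw fluorescence values: four baseline frames, then a response.
trace = [100.0, 100.0, 100.0, 100.0, 110.0, 120.0]
dff = delta_f_over_f(trace, baseline_frames=4)
baseline_mean = sum(dff[:4]) / 4
```

      A trace that sits consistently above or below zero during the baseline window, as described for DAN-c1 and pPAM, would indicate that F0 was taken from a different window or from a different trace.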

      (5) In the abstract: "Optogenetic activation of DAN-f1 and DAN-g1 alone suffices to substitute for salt punishment... Each DAN encodes a different aspect of salt punishment". These sentences might be misleading and an overstatement: only DAN-g1 shows a clear role, while the function of the other DANs in the context of salt-odor learning remains obscure.

      We have reworded the respective part of the abstract to avoid any overstatement.

      (6) The physiology is done in L1 larvae but behavior is tested in L3 larvae. There could be a change in this time that could explain the salt responses in c1 and d1 but no role in salt-odor learning?

      While we cannot dismiss the possibility of a developmental change from L1 to L3, a comparison of the anatomical data of the DL1 DANs from electron microscopy (EM) and light microscopy (LM) indicates that their overall morphology remains consistent. However, it is important to note that this comparison does not address the physiological properties of these cells. Consequently, we have incorporated this concern into the discussion of the revised version of the manuscript.

      (7) The introduction needs some editing starting at L 129, as it ends with a discussion of a previously published EM data analysis. I would rather suggest stating which questions are addressed in this paper and which methods will be used and perhaps a hint on the results obtained.

      We understand the concern. We have added a concise paragraph to the conclusion of the introduction, highlighting the biological question, technical details, and a short hint on the acquired findings.

      (8) It is clear to me that the presentation of salt during the test is necessary for recall, however in L 166 I don't understand the explanation: how is the memory used in a beneficial way in the test? The salt is present everywhere and the odor cue is actually useless to escape it.

      Extensive research, exemplified by studies such as Schleyer et al. (2015) published in Elife, clearly demonstrates that the recall of odor-high salt memory occurs exclusively when tested on a high salt plate. Even when tested on a bitter quinine plate, the aversive memory is not recalled. This phenomenon is attributed to the triggering of motivation to recall the memory by the omnipresent abundance of the unconditioned stimulus (US) during the test, which in our case is high salt. Furthermore, the concentration of the stimulus plays a crucial role (Schleyer et al. 2011). The odor cue indicates where the situation could potentially be improved; however, if high salt is absent, this motivational drive diminishes as there is no memory present to enhance the already favorable situation. Additionally, the motivation to evade the omnipresent and unpleasant high salt stimulus persists throughout the entire 5-minute test period.

      (9) L288: the fact that f1 shows a phenotype in this experiment does not mean that it encodes a salt signal, indeed it does not respond to salt. It perhaps induces a plasticity that can be recalled by salt, but not necessarily linked to salt. The synergy between f1 and g1 in the salt assay was postulated based on exp with Rpr, but the validity of these experiments is dubious. I'm not sure there is sufficient evidence from Figures 6 and 7 to support a synergistic action between f1 and g1.

      It is true that DAN-f1 alone is not necessary for mediating a high salt teaching signal based on ablation, optogenetic inhibition and even physiology. However, optogenetic activation of DAN-f1 alone establishes a memory that is expressed when tested on a salt plate. Given the logic explained above, which is accepted in several publications, we would like to keep the statement, especially as joint optogenetic activation or inactivation together with DAN-g1 gives rise to significantly higher or lower values (Figure 5E and F, Figure 6E and F in the new version). Nevertheless, we have modified the sentence. In the text we now describe these effects as “these results may suggest that DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching signal under the conditions that we tested”. We think that this is an appropriate, three-fold restricted statement and would therefore like to keep it in this restricted version. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (10) I find the EM analysis hard to read. First of all, because of the two different graphical representations used in Fig. 8, wouldn't one be sufficient to make the point? Secondly, I could not grasp a take-home-message: what do we learn from the EM data? Do they explain any of the results? It seems to me that they don't provide an explanation of why some DL1 neurons respond to salt and others don't.

      We understand that the EM analysis is hard to read and have now carefully rewritten this part of the manuscript. See also general concern 4 above. The main take-home message is not to explain why some DL1 neurons respond to salt and others do not. This cannot be resolved because information on the salt-perceiving receptor cells is missing. Unfortunately, the EM volume lacks the peripheral nervous system, the first layer of salt information processing. However, our analysis clearly shows that each of the 4 DANs has its own identity based on its connectivity. None of them is the same, although similarities exist to a certain extent. This nicely reflects the physiological and behavioral results. We have now clarified this in the Results to ease understanding for the readership. In addition, we clearly state that we do not address why some DL1 neurons respond to salt and others do not.

      (11) Do the manipulations (activation and silencing) affect odor preference in the presence of salt? Did the authors test that the two odors do not drive different behaviors on the salty plate? Or did they only test the odor preference on plain agarose? Can we exclude a role for the DAN in driving multisensory-driven innate behavior?

      Innate odor preferences are not changed by the presence of salt or even other tastants (this work, but see also Schleyer et al. 2015, Figure 3, eLife). Even the naïve choice between two odors is the same if tested in the presence of different tastants (Schleyer et al. 2015, Figure 3, eLife). This shows, at least for the tested stimuli and conditions, which are similar to the ones that we use, that there is no multisensory-driven innate odor-taste behavior. To our knowledge, this is also why experiments like the ones suggested by the reviewer have never been done in larval odor-taste learning studies. We therefore suggest that DAN activation has no effect on innate larval behavior. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (12) L 280: the authors generalize the conclusion to all DL1-DANs, but it does not apply to c1 and d1.

      Thanks for this comment. We deleted that sentence as suggested and thus no longer generalize the conclusion to all DL1 DANs.

      (13) L345: I do not see the described differences in Fig. 8F, presynaptic sites of both types seem to appear in rather broad regions: could the author try to clarify this?

      We understand that the anatomical description of the data is often hard to read, especially for readers who are not used to this kind of figure. We have therefore modified the text to ease understanding and to clarify the difference in the labeled brain regions for a broad readership.

      (14) L373: the conclusion on c1 is unsupported by data: this neuron responds to both salt and fructose (Figure 3 ) while the conclusion is purely based on EM data analysis.

      The sentence is not a conclusion but a speculation and we also list the cell's response to positive and negative gustatory stimuli. Therefore, we do not understand exactly what the reviewer means here. However, we have tried to address the criticism and have revised the sentences.

      (15) L385: the data on d1 seem to be inconsistent with Eschbach 2020, but the authors do not discuss if this is due to the differential vs absolute training, or perhaps the presence of the US during the test (which does not seem to be there in Eschbach, 2020) - is the training protocol really responsible for this inconsistency? For f1 the data seem to be consistent across these studies. The authors should clarify how the exp in Fig 6 differs from Eschbach, 2020 and how one could interpret the differences.

      True. This concern is correct. We now discuss the difference in more detail. Eschbach et al. used Cs-Crimson as a genetic tool, a one odor paradigm with 3 training cycles, and no gustatory cues in their approach. These differences are now discussed in the new version of the manuscript.

      (16) L460-475 A long part of this paragraph discusses the similarities between c1 and d1 and corresponding PPL1 neurons in the adult fly. However, c1 and d1 do not really show any phenotype in this paper, I'm not sure what we learn from this discussion and how much this paper can contribute to it. I would have wished for a discussion of how one could possibly reconcile the observed inconsistencies.

      Based on the comments of the different reviewers several paragraphs in the discussion were modified. We agree that the part on the larval-adult comparison is quite long. Thus we have shortened it as suggested by the reviewer.

      Minor corrections:

      L28 "resultant association" maybe resulting instead.

      L55 "animals derive benefit": remove derive.

      L78 "composing 12,000 neurons": composed of.

      L79 what is stable in a "stable behavioral assay"?

      L104: 2 times cluster.

      L122: "DL1 DANs are involved" in what?

      Fig. 1 please check subpanels labels, D repeats.

      L 362: "But how do individual neurons contribute to the teaching signal of the complete cluster?" I don't understand the question.

      L364 I did not hear before about the "labeled line hypothesis" in this context - could the author clarify?

      L368: edit "combinatorically".

      L390: "current suppression" maybe acute suppression.

      L 400 I'm not sure what is meant by "judicious functional configuration" and "redundancy". The functions of these cells are not redundant, and no straightforward prediction of their function can be done from their physiological response to salt.

      Thanks a lot for your detailed review of our manuscript. We welcome your well-taken concerns and have made the requested changes for all points that you have raised.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1 the reconstruction of pPAM and DL1 DANs shows the compartmentalized innervation of the larval MB. However, the images are a bit low in color contrast to appreciate the innervation well. In particular in panel B, it is hard to identify the innervated MB body structure. A schematic model of the larval MB and DAN innervation domains like in Fig. 2A would help to clarify the innervation pattern to the non-specialist.

      We understand this concern and have changed Figure 1 as suggested by the reviewer. A schematic model of the MB and DANs is now already presented in Figure 1 as well as in the corresponding supplemental figure.

      (2) Blue light itself can be aversive for larvae and thus interfere with the aversive learning paradigm. Does the given Illuminance (220 lux) used in these experiments affect the behavior and learning outcome?

      Yes, in the past high intensities of blue light were necessary to trigger first-generation optogenetic tools. The high-intensity blue light itself was able to establish an aversive memory (e.g. Rohwedder et al. 2016). The use of second-generation optogenetic tools allowed us to strongly reduce the applied light intensity. We now use 220 lux (equal to 60 µW/cm²). Please note that all Gal4 and UAS controls in the manuscript are not significantly different from zero. The mild blue light stimulation therefore does not serve as a teaching signal and has neither an aversive nor an appetitive effect. Furthermore, we use this mild light intensity for several other behavioral paradigms (locomotion, feeding, naïve preferences) and have never seen an effect on behavior.

      (3) Fig.2: Except for MB054B-Gal4 only the MB expression pattern is shown for other lines. Is there any additional expression in other cells of the brain? In the legend in line 761, the reporter does not show endogenous expression, rather it is a fluorescent reporter signal labeling the mushroom body.

      The lines were initially identified in a screen on larval MB neurons done together with Jim Truman, Marta Zlatic and Bertram Gerber. Here full brain scans were always analyzed. These images can be seen in Eschbach et al. 2020, extended figure 1. Neither in their evaluation nor in our anatomical evaluation (using a different protocol) was additional expression in brain cells detectable. We also modified the figure legend as suggested.

      (4) Fig.3: Precise n numbers per experiment should be stated in the figure legend.

      True, we now present n numbers per experiment whenever necessary.

      (5) Fig.4: Have the authors confirmed complete ablation of the targeted neuron using rpr/hid? Ablations can be highly incomplete depending on the onset and strength of Gal4 expression, leaving some functionality intact. While the ablation experiments are largely in line with the acute silencing of single DANs during high salt learning performed later on (Fig.7), there is potentially an interesting aspect of developmental compensation hidden in this data. Not a major point, but potentially interesting to check.

      We agree with this criticism. We had not tested whether the expression of hid,rpr in DL1 DANs really ablates them. We therefore did an additional experiment to show this. The new data are now presented as a supplemental figure (Figure 4- figure supplement 6). The result shows that expression of hid,rpr also ablates DL1 DANs, similar to earlier experiments where we used the same effectors to ablate serotonergic neurons (Huser et al., 2012, figure 5).

      (6) The performance index in Fig. 4 and 5 sometimes seems lower and the variability is higher than in some of the other experiments shown. Is this due to the high intrinsic variability of these particular experiments, or the background effects of the rpr/hid or splitGal4 lines?

      The general variability of these experiments is within the expected and known range. In this kind of experiment there is always some variation due to several external factors (e.g. the time of year at which the experiment is run). It is therefore always important to measure controls and experimental animals at the same time. That is what we did, and we only compare results directly within individual datasets, not between different datasets. Comparisons between datasets are further hampered because the experiments of Figure 4 (now Figure 4- figure supplement 1) and Figure 5 (now Figure 4) differ in several parameters from other learning experiments presented later in the text. Optogenetic activation uses blue light stimulation instead of “real world” high salt. Most often, direct activation of specific DANs in the brain is more stable than external high salt stimulation. Optogenetic inactivation likewise uses blue light stimulation, together with retinal-supplemented food. Both factors can affect the measurement. We thus argue that, for each experiment, it is most often the particular parameters that affect the variability of the results rather than background effects of the rpr/hid and split-Gal4 lines.

      (7) Fig.7: This is a neat experiment showing the effects of acute silencing of individual DL1 DANs. As silencing DAN-f1/g1 does not result in complete suppression of aversive learning, it would be highly interesting to test (or speculate about) additive or modulatory effects by the other DANs. DAN-c1/d1 also respond to high salt but do not show a function on their own in these assays. I am aware that this is currently genetically not feasible. It would however be a nice future experiment.

      True, we were intensively screening for DL1 cluster specific driver lines that cover all 4 DL1 neurons or other combinations than the ones we tested. Unfortunately, we did not succeed in identifying them. Nevertheless, we will further screen new genetic resources (e.g. Meissner et al., 2024, bioRxiv) to expand our approach in future experiments. Please also see our comment on concern 1 of reviewer 1 for further technical limitations and biological questions that can also potentially explain the absence of complete suppression of high salt learning and memory. Some of these limitations are now also mentioned and discussed in the new version of the manuscript.

      (8) The discussion is excellent. I would just amend that it is likely that larval DAN-c1, which has high interconnectivity within the larval CNS, is likely integrating state-dependent network changes, similar to the role of some DANs in innate and state-dependent preference behavior. This might contribute to modulating learned behavior depending on the present (acute) and previous environmental conditions.

      Thanks a lot for bringing this up. We rewrote this part and added a discussion on recent work on DAN-c1 function in larvae as well as results on DAN function in innate and state-dependent preference behavior.

      (9) Citation in line 1115 missing access information: "Schnitzer M, Huang C, Luo J, Je Woo S, Roitman L, et al. 2023. Dopamine signals integrate innate and learned valences to regulate memory dynamics. Research Square".

      Unfortunately this escaped our notice. The paper is now published in Nature: Huang, C., Luo, J., Woo, S.J. et al. Dopamine-mediated interactions between short- and long-term memory dynamics. Nature 634, 1141–1149 (2024). https://doi.org/10.1038/s41586-024-07819-w. We have now changed the citation. The new citation includes the missing access information.

      Reviewer #3 (Recommendations For The Authors):

      Regarding my issue about salt specificity in the public review, I want to make clear that I do not suggest additional experiments, but to be very careful in phrasing the conclusions, in particular whenever referring to the experiments with optogenetic activation. This includes presenting these experiments as "(salt) substitution" experiments - inferring that the optogenetic activation would substitute for a natural salt punishment. As important and interesting as the experiments are, they simply do not allow such an interpretation at this point.

      Results, line 140ff: When presenting the results regarding TH-Gal4 crossed to ChR2-XXL, please cite Schroll et al. 2006 who demonstrated the same results for the first time.

      Thanks for mentioning this. We now cite Schroll et al. 2006 here in the text of the manuscript.

      Figure 3: The subfigure labels (ABC) are missing.

      Unfortunately this escaped our notice. Thanks a lot – we have now corrected this mistake.

      Figure 5: For I and L, it reads "salt replaced with fru", but the sketch on the left shows salt in the test. I assume that fructose was not actually present in the test, and therefore the figure can be misleading. I suggest separate sketches. Also, I and L are not mentioned in the figure legend.

      True, this is rather confusing. Based on the well taken concern we have changed the figure by adding a new and correct scheme for sugar reward learning that does not symbolize fructose during test.

      Figure S1: The experimental sketches for E,F and G,H seem to be mixed up.

      We thank the reviewer for bringing this up. In the new version we corrected this mistake.

      Figure S5: There are three sub-figures labelled with B. Please correct.

      Again, thanks a lot. We made the suggested correction in Figure S5.

      Discussion, line 353ff: this and the following sentences can be read as if the authors have discovered the DL-1 neurons as aversive teaching mediators in this study. However, Eschbach et al. 2020 already demonstrated very similar results regarding the optogenetic activation of single DL-1 DANs. I suggest to rephrase and cite Eschbach et al. 2020 at this point.

      That is correct. Our focus was on the gustatory pathway. The original discovery was made by Eschbach et al. We have now corrected this in the discussion and clarified our contribution. It was never our intention to hide this work, as the laboratory was also involved. Nevertheless, this is an annoying omission on our side.

      Line 385-387: this sentence is only correct with respect to Eschbach et al. 2020. Weiglein et al. 2021 used ChR2-XXL as an effector, but another training regimen.

      We understand this criticism. Therefore, we changed the sentence as suggested by the reviewer. See also our response on concern 15 of reviewer 1.

      Line 389ff: I do not understand this sentence. What is meant by persistent and current suppression of activity? If this refers to the behavioural experiments, it is misleading as in the hid, reaper experiments neurons are ablated and not suppressed in activity.

      We made the requested changes in the text. It is true that the ablation of a neuron throughout larval life is different from constantly blocking the output of a persisting neuron.

      Methods, line 615 ff: the performance index is said to be calculated as the difference between the two preferences, but the equation shows the average of the preferences.

      Thanks a lot. We are sorry for the confusion. We have carefully rewritten this part of the methods section to avoid any misunderstanding.

      When discussing the organization of the DL1 cluster, on several occasions I have the impression the authors use the terms "redundant" and "combinatorial" synonymously. I suggest to be more careful here. Redundancy implies that each DAN in principle can "do the job", whereas combinatorial coding implies that only a combination of DANs together can "do the job". If "the job" is establishing an aversive salt memory, the authors' results point to redundancy: no experimental manipulation totally abolished salt learning, implying that the non-manipulated neurons in each experiment sufficed to establish a memory; and several DANs, when individually activated, can establish an aversive memory, implying that each of them indeed can "do the job".

      Based on this concern we have rewritten the discussion as suggested to be more precise when talking about redundancy or combinatorial coding of the aversive teaching signal. Basically, we have removed all the combinatorial terms and replaced them by the term “redundancy”.

      The authors mix parametric and non-parametric statistical tests across the experiments dependent on whether the distribution of the data is normal or not. It would help readers if the authors would clearly state for which data which tests were used.

      We understand the criticism and have now added a supplemental file that includes all the information on the statistical tests applied and the distribution of the data.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study experimentally examined diet-microbe-host interactions through a complex systems framework, centered on dietary oxalate. Multiple, independent molecular, animal, and in vitro experimental models were introduced into this research. The authors found that microbiome composition influenced multiple oxalate-microbe-host interfaces. Oxalobacter formigenes were only effective against a poor oxalate-degrading microbiota background and give critical new insights into why clinical intervention trials with this species exhibit variable outcomes. Data suggest that, while heterogeneity in the microbiome impacts multiple diet-host-microbe interfaces, metabolic redundancy among diverse microorganisms in specific diet-microbe axes is a critical variable that may impact the efficacy of bacteriotherapies, which can help guide patient and probiotic selection criteria in probiotic clinical trials.

      Thank you. The main message of this research is that, through complex modelling, we believe we have identified the critical variable (metabolic redundancy) that is responsible for the efficacy of probiotics designed to reduce oxalate levels, thus allowing for improved patient selection in clinical trials. We also believe that this process and the critical features identified can be translated to other critical microbial functions such as short chain fatty acid synthesis, secondary bile acid synthesis, and others.

      Strengths:

      The paper has made significant progress in both the depth and breadth of scientific research by systematically comparing multiple experimental methods across multiple dimensions. Particularly through in-depth analysis from the enzymatic perspective, it has not only successfully identified several key strains and redundant genes, which is of great significance for understanding the functions of enzymes, the characteristics of strains, and the mechanisms of genes in microbial communities, but also provided a valuable reference for subsequent experimental design and theoretical research.

      More importantly, the establishment of a novel research approach to probiotics and gut microbiota in this paper represents a major contribution to the current research field. The proposal of this new approach not only breaks through the limitations of traditional research but also offers new perspectives and strategies for the screening, optimization of probiotics, and the regulation of gut microbiota balance. This holds potential significant value for improving human health and the prevention and treatment of related diseases.

      Thank you for the comments. We believe that the approach taken here, which contrasts with conventional reductionist techniques, will be critical for translating gut microbiome research into actionable therapeutic approaches.

      Weaknesses:

      While the study has excellently examined the overall changes in microbial community structure and the functions of individual bacteria, it lacks a focused investigation on the metabolic cross-feeding relationships between oxalate-degrading bacteria and related microorganisms, failing to provide a foundational microbial community or model for future research. Although this paper conducts a detailed study on oxalate metabolism, it would be beneficial to visually present the enrichment of different microbial community structures in metabolic pathways using graphical models.

      Thank you for this critique. In the current study, we broadly examined the response of the gut microbiota to dietary oxalate. Based on initial shotgun metagenomic results, we focused on specific taxa and metabolic functions. Through metagenomic and multiple culture-based studies, we quickly homed in on redundancy in oxalate-degrading function as a key feature for oxalate homeostasis. We believe that the defined microbial community we used for microbial transplants (particularly the taxonomic cohort) provides a strong, minimal community to explore oxalate homeostasis further. In fact, we are using this consortium in multiple follow-up studies to fully understand the cross-feeding that may occur among these microorganisms, as you suggest. We note that Figure 3 shows the change of species and metabolic pathways with oxalate exposure.

      Furthermore, the authors have done a commendable job in studying the roles of key bacteria. If the interactions and effects of upstream and downstream metabolically related bacteria could be integrated, it would provide readers with even more meaningful information. By illustrating how these bacteria interact within the metabolic network, readers can gain a deeper understanding of the complex ecological and functional relationships within microbial communities. Such an integrated approach would not only enhance the scientific value of the study but also facilitate future research in this area.

      Thank you. We note that, based on the collective data obtained in this study, redundancy in oxalate degradation is the critical feature that maintains oxalate homeostasis. However, we are interested in potential metabolic interactions between microbes in our defined community and are currently investigating these interactions.

      Reviewer #2 (Public review):

      Summary:

      Using the well-studied oxalate-microbiome-host system, the authors propose a novel conceptual and experimental framework for developing targeted bacteriotherapies using a three-phase pre-clinical workflow. The third phase is based on a 'complex system theoretical approach' in which multi-omics technologies are combined in independent in vivo and in vitro models to successfully identify the most pertinent variables that influence specific phenotypes in diet-host-microbe systems. The innovation relies on the third phase since phase I and phase II are the dominant approaches everyone in the microbiome field uses.

      Thank you. As you note, the proposed phases I and II are the predominant approaches used. In fact, many clinical trials have been conducted to try to reduce urine oxalate in patients, based solely on mechanistic studies with Oxalobacter formigenes. As noted in our manuscript, only 43% of those studies resulted in the intended outcome, necessitating the approach we took in the current study. Our results suggest that the reason for the high rate of failure, despite well established mechanisms, is insufficient patient selection that focused only on the presence or absence of O. formigenes, a species that normally exhibits very low prevalence and abundance in the human gut microbiota.

      Strengths:

      The authors used a multidisciplinary approach which included:

      (1) fecal transplant of two distinct microbial communities into Swiss-Webster mice (SWM) to characterize the host response (hepatic response-transcriptomics) and microbial activity (untargeted metabolomics of the stool samples) to different oxalate concentrations;

      (2) longitudinal analysis of the N. albigula gut microbiome composition in response to varying concentrations of oxalate by shotgun metagenomics, with deep bioinformatic analyses of the genomes assembled; and

      (3) development of synthetic microbial communities around oxalate metabolisms and evaluation of these communities' activity in oxalate degradation in vivo.

      Thank you for these comments.  In the complex modelling approach, we focused on complete microbiota from host species known to have high and low capacities for oxalate tolerance, combined with targeting specific metabolic functions vs. specific taxa that may include unknown functions important for oxalate metabolism.  Further, we examined the influence of our target communities on oxalate metabolism through multiple in vitro and in vivo studies.

      Weaknesses:

      However, I have concerns about the frame the authors tried to provide for a 'complex system theoretical approach' and how the data are interpreted within this frame. Several of the conclusions the authors provide do not seem to have sufficient data to support them.

      Thank you.  We have tried to address these concerns by adding an exhaustive figure that broadly represents our complex modelling approach that includes potential complex system-based hypotheses, how they were tested, and the host-microbiome-oxalate interactions found in our study.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      (1) The authors argue about the importance of bringing 'Complex System Theory' to the microbiome field systematically and consistently. However, the authors fail to introduce this theory throughout the entire manuscript. For example, the authors tried to describe key elements and their nomenclature, such as nodes and fractal layers, in the first part of the result section. But the description is wordy and not precise. It would be more useful if the authors connected the model description with a visual representation, such as a figure. Unfortunately, these elements are not emphasized or carried across the results section and are not mentioned in the discussion section.

      We have now added a figure (Figure 7) that details this process extensively and ties each of our findings to the complex system model and nomenclature.  We have also reiterated how our results fit in the complex system model in the discussion.

      In addition, there is no straightforward approach to integrating multi-omics datasets to identify the variables that are determinants of the system. For example, Figure 1 focuses on the impact of the host, hepatic activity, to oxalate exposure on fecal transplants into Swiss Webster mice; Figure 2 focuses on the effects of oxalate exposure on stool metabolic activity, not only microbial metabolic activity, on fecal transplants into Swiss Webster mice; and Figure 3 focuses on microbiome responses to different oxalate concentration in Neotoma albigula. There is no "model" to really integrate the host, the microbiome activity, and the microbiome composition information. And, unfortunately, the data generated between experiments cannot directly integrate; see major concern # 2.

      Thank you. We have clarified the experimental approach and how it was applied to understand the critical factors that maintain oxalate homeostasis. Specifically, Figure 1 established that the effect of oxalate on the host was dependent on the microbiota, rather than host genetics. Figure 2 established that the effect of oxalate on the gut microbiota was again dependent on the whole gut microbiota and that these oxalate-microbe effects also influenced oxalate-host effects, through a direct multi-omic data integration. Once we established that the oxalate effects on host and microbiota were dependent on the whole microbiota composition, Figure 3 then sought to determine how oxalate impacted the gut microbiota, using our model of high oxalate tolerance (N. albigula). With the finding in Figure 3 that there were multiple genes attributed to the degradation of oxalate, or to acetogenic, methanogenic, and sulfate-reducing pathways, Figure 4 and the relevant supplemental figures sought to quantify the redundancy of these pathways. After establishing a very high degree of redundancy, we used a culturomic approach to determine what environmental factors impacted oxalate metabolism and to evaluate oxalate metabolism using our defined, hypothesized communities of microorganisms. Finally, Figure 6 sought to validate our metagenomic, metabolomic, and culturomic results from multiple animal and in vitro models using targeted microbial transplants in mice. While we did have some direct multi-omic data integration (Figures 2 and 3), the process employed here sought to systematically determine which factors were most important for the oxalate-microbiota-host relationship, and then to use those results to design the subsequent experiments. We have added this description to the discussion, which helps to contextualize the complex system modelling approach we took here.

      Finally, the authors did not provide a novel variable that successfully influences oxalate degradation in the oxalate-microbiome-host system. The authors argue that "both resource availability and community composition impact oxalate metabolism," which is already inferred from the failure of the clinical trials, and do not provide a clear intervention strategy to develop functional bacteriotherapy. The identification of composition as an important variable that was predictable without any multi-omics approach was highlighted by the development of synthetic microbial communities. Synthetic microbial communities are critical to characterizing complex microbiomes. Still, the authors did not explain how this strategy can be used in their theoretical framework (that is their goal), and these communities are not well introduced across the manuscript; see major concern # 4.

      As stated, it is clear from the failed clinical trials that we do not fully understand what microbial features dictate oxalate homeostasis.  We have specifically identified, through fecal transplant studies, that microbial composition is critical for oxalate homeostasis and that diverse oxalate-degrading bacteria exist.  However, ours is the first study that explicitly shows that it is this diversity that controls oxalate homeostasis.  This is specifically ascertained through the targeted microbial transplants in mice whereby O. formigenes was given alone or with different combinations of other microorganisms.  In other words, we were able to replicate both successful and failed studies by manipulating which specific species were introduced into animals.  This is unprecedented in the literature.

      (2) The authors provide several conclusions that are not completely supported by the data available. For example:

      (a) Lines 236-239: "Within the framework of complex systems, results show microbe-host cooperation whereby oxalate effectively processed within the SW-NALB gut microbiota reduced overall liver activity, indicative of a beneficial impact." - The authors did not provide data related to oxalate levels or oxalate processing for this dataset.

      While we did not specifically quantify oxalate degradation for this specific study, as cited in the text when describing this Swiss-Webster, Neotoma albigula system, we have previously published multiple animal studies explicitly showing that the N. albigula animals were highly effective oxalate degraders, which is transferable to Swiss-Webster mice through fecal transplants. Since the gut microbiota’s impact on oxalate has been well established through experiments by our group, the purpose of these specific experiments was to look the other way and examine the effect of oxalate on the gut microbiota of these two animal models. In the referenced text, we again cited our studies showing that the SW-NALB system effectively degrades oxalate.

      (b) Lines 239-243: "Data also suggest that both the gut microbiota and the immune system are involved in oxalate remediation (redundancy), such that if oxalate cannot be neutralized in the gut microbiota or liver, then the molecule will be processed through host immune response mechanisms (fractality), in this case indicated through an overall increase in hepatic activity and specifically in mitochondrial activity." - The authors did not provide any evidence related to the immune system and oxalate metabolism.

      We corrected that statement as follows: “…in this case indicated through an overall increase in inflammatory cytokines with oxalate exposure combined with an ineffective oxalate-degrading microbiota (Figures S6a,b; S9a,b).”  In other words, if the liver and gut microbiota can’t eliminate a toxin, then the immune system must deal with it through inflammatory pathways.  Oxalate is a well established, pro-inflammatory compound.  Our data show that this is dependent on the gut microbiota.

      (c) Lines 250-252: "Following the diet trial, colon stool was collected post-necropsy and processed for untargeted metabolomics, which is a measure of total microbial metabolic output." - Although most metabolites in stool samples are indeed microbial, there are also host metabolites. So, it is not technically correct to relate the metabolomic analysis of stool samples to only microbial metabolic analysis. In addition, the authors discussed compounds such as alkaloids and cholesterol as microbial metabolites, although these compounds are more related to the diet and host, respectively.

      We have corrected this to state: “total metabolites present in stool from the diet, microbial activity, and host activity”

      (d) Lines 270-273. "Specifically, the SW-NALB mice exhibit hallmarks of homeostatic feedback with oxalate exposure to maintain a consistent metabolic output, defined by the relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice." - How do the authors define oxalate homeostasis? In addition, do the authors imply feedback between the liver and the microbiome in which the microbiome responds to a liver response related to oxalate levels? Or could the observation in Figure 1 be explained just by microbial consumption of oxalate that would reduce the impact of oxalate that arrives at the liver?

      Oxalate homeostasis is defined in that sentence: “relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice” – in other words, for SW-NALB mice, oxalate did not produce a considerable change to either microbial or hepatic metabolic activity.  We did not really test the liver impact on gut microbiota and can’t speak to that.  We believe, based on Figure 2 data, that it is not just the degradation of oxalate that explains the lack of change in hepatic activity in SW-NALB mice, rather that the oxalate-induced shift in the gut microbiota metabolic activity broadly altered hepatic activity, as inferred from Figure 2 c.  We made this more clear in the results: “suggests that the oxalate-induced change in microbial metabolism is responsible for the change in hepatic activity”.

      (e) Lines 297-301: "The oxalate-dependent metagenomic divergence of the NALB gut microbiota (Figure 3), combined with the lack of change in the microbial metabolomic profile with oxalate exposure (Figure 2), suggest that oxalate stimulates taxonomically diverse, but metabolically redundant microorganisms, in support of maintaining homeostasis." - The authors cannot conclude anything related between taxonomic changes and microbial activity since the taxonomic data presented is for microbial enrichment in N. albigula, and the "microbial activity data" is from the fecal transplantation experiment in SWM. These are two completely different systems with two completely different experimental designs.

      We have shown very similar results, in that oxalate induces taxonomic divergence of the NALB gut microbiota, in multiple previous studies. The experiment in which a minimal, positive increase in microbial metabolites was seen with oxalate was based on the SW-NALB model, whereby Swiss-Webster mice have an NALB microbiota. We show throughout the manuscript that the impact of oxalate is very microbiota dependent, which supports our claim. However, the claim that metabolic redundancy is important for oxalate homeostasis is hypothesis generating. We modified our statement to make all of this more clear.

      Related to microbial composition, the authors did not show data validating the efficiency of the fecal transplantations (allograft or xenograft) in the SWM after antibiotic treatment. They also did not show evidence of microbial composition dynamics in response to oxalate exposure.

      Again, the efficacy of fecal transplants, used in the way they were here, has been shown in multiple past studies by our group. In past studies, we have extensively characterized the microbiota from fecal transplants and which taxa were associated with oxalate levels. Therefore, that topic was not the focus of the current study, which instead focused on the impact of oxalate on gut microbiota activity. Our past studies, referenced multiple times throughout the current manuscript, were used in large part to help determine which microbes to include in our taxonomic cohort, as described in the manuscript.

      (f) Lines 301-303: "Given that data came from the same hosts sampled longitudinally, these data also reflect a microbiota that is adaptive to oxalate exposure, which is another important characteristic of complex systems." - In their dataset, what is the evidence that the microbiota of N. albigula is adapted to oxalate exposure? Is the increase in genomes with pathways related to oxalate metabolism related to an increase of oxalate in the diet? If so, does the microbiota exposure with a higher oxalate concentration decrease the systemic level of oxalate? In none of the experiments related to Figures 1 to 3 did the authors show a correlation of systemic oxalate levels with microbial composition, hepatic host response, or stool metabolism.

      Figure 3 explicitly shows the longitudinal impact of increasing levels of oxalate, with an increase in oxalate-degrading genes (Figure 3d). The specific samples selected for analysis here come from a previous study in which we explicitly quantified changes to the gut microbiota composition and both stool and urine oxalate for every time point listed in Figure 3a. This information is explicitly stated in the methods, coupled with the fact that “neither fecal nor urinary oxalate levels increased significantly.” Again, the effect of the gut microbiota on oxalate in these model systems has been extensively studied by our group and provides the foundation for the current study to look at the effect of oxalate on the gut microbiota and host.

      Considering my last two points, the authors do not present substantial evidence to support their hypothesis that oxalate stimulates taxonomically diverse, metabolically redundant communities.

      As stated above, that oxalate stimulates taxonomically diverse taxa was ascertained through multiple past studies, as well as the current study (Figure 3e).  The metabolically redundant part is ascertained both through untargeted metabolomics (Figure 2a,b) and shotgun metagenomics (Figure 3c,d).  Further evidence for the metabolic redundancy with oxalate comes from our culturomic approach, which showed that 14.58% of isolates could grow on oxalate as a carbon and energy source, in addition to the high proportion of isolates that could grow on other carbon and energy sources, at least much more than can be ascribed to a single species  (Figure 5c).  We made this more clear in the discussion.

      (g) Lines 330-335. "Additionally, the broad diversity of species that contain oxalate-related genes suggests that the distribution of metabolic genes is somewhat independent of the distribution of microbial species, which suggests that microbial genes exist in an autonomous fractal layer, to some degree. This hypothesis is supported by studies which show a high degree of horizontal gene transfer within the gut microbiota as a means of adaptation." - This conclusion is highly speculative, especially since the authors did not do any analysis to directly evaluate a relationship between the oxalate metabolic pathways and the microbial species where these pathways are present.

      Figure 3c,d,e explicitly show the metabolic pathways and species enriched by oxalate exposure. Figure 4d, generated using the same data as Figure 3, explicitly shows the taxa that harbor oxalate-degrading genes.

      (h) Lines 364-366. "Collectively, data show that both resource availability and community composition impacts oxalate metabolism, which helps to define the adaptive nature of the NALB gut microbiota." - The authors indeed showed evidence that community composition impacts oxalate metabolism. However, the authors did not show any evidence to directly evaluate the resource availability to impact oxalate metabolism.

      This is explicitly shown through in vitro community-based and single species assays varying multiple different carbon and energy sources to quantify changes to oxalate degradation (chosen based on shotgun metagenomic results; Figure 5a,b).

      (3) Lines 321-325. "Acetogenic genes were also present in 97.18% of genomes, dominated by acetate kinase and formate-tetrahydrofolate ligase (Figure S3A-C). Methanogenic genes were present in 100% of genomes, dominated by phosphoserine phosphatase, ATP-dependent 6-phosphofructokinase, and phosphate acetyltransferase (Figure S4A-C)." - The authors spent much time analyzing the adjacent pathways related to oxalate and oxalate-related products of oxalate metabolism. However, my understanding is that the genes used to analyze these pathways (formate metabolism, acetogenesis, methanogenesis), such as the ones named above, are not unique/specific for those pathways but participate in other "housekeeping" pathways. What is the relevance of these analyses when those genes are not unique/specific to the function/pathways that the authors describe? If I infer correctly, these bioinformatic analyses aim to evaluate the hypothesis of whether oxalate metabolism could be a social/cooperation metabolism and whether other species could participate in the metabolism of oxalate subproducts. However, these analyses did not explicitly evaluate this hypothesis.

      The reviewer is correct in that we aimed to evaluate the potential that oxalate metabolism could benefit from metabolic cooperation. The specific genes chosen for this analysis were those explicitly listed in the target metabolic pathways in KEGG, as described. While the analyses do show, as intended, the strong potential that the CO2 and formate produced from oxalate degradation could be used in these other pathways, we acknowledge that the genes can also be used in other metabolic pathways. We did, however, explicitly test the hypothesis that formate, produced from oxalate degradation, could be utilized by the gut microbiota. While the targeted transplants with the taxonomic cohort did not clearly show the use of formate in this way, those from the metabolic cohort did (Figures 6d and S8d). This question is the subject of ongoing investigation in our group.

      We have made it more clear that our genome analyses provide the potential for metabolic redundancy rather than definitive proof for metabolic redundancy, which was evaluated more extensively in other experiments from this study.

      (a) Lines 481-484. "Collectively, data offer strong support for the hypothesis that metabolic redundancy among diverse taxa, is the primary driver of oxalate homeostasis, rather than metabolic cooperation in which the by-products of oxalate degradation are used in downstream pathways such as acetogenesis, methanogenesis, and sulfate reduction." - Although the authors recognize that their data about the metabolic cooperation hypothesis is inconclusive, they never tested the hypothesis related to metabolic cooperation, as mentioned above. This is highly speculative.

      As stated above, the targeted microbial transplants to animals and in vitro studies (Figure 5e,f) did explicitly test the cooperation hypothesis, but the results did not support it and instead pointed much more strongly to metabolic redundancy.

      (4) Lines 355-359. "Cohorts, defined in the STAR methods, were used to delineate hypotheses that either carbon and energy substrates are sufficient to explain known effects of the oxalate-degrading microbial network or that additional aspects of taxa commonly stimulated by dietary oxalate are required to explain past results (taxa defined through previous meta-analysis of studies)." - The definition of the metabolic cohorts and the taxonomic cohorts should not be hidden in the material and methods section. It should be explicit and clearly explained in the main text. Related, the table presented in Figure 5D is exceptionally confusing and does not help to understand and differentiate between the metabolic and the taxonomic cohorts. The authors need to explicitly identify the synthetic communities used in each cohort and each group by their members and their characteristics in supplementary tables.

      In the sentences before those referenced, we state: “Culturomic data recapitulates molecular data to show a considerable amount of redundancy surrounding oxalate metabolism (Fig. 5C). Isolates generated from this assay were used for subsequent study (metabolic cohort; Figure 5D). Additionally, a second cohort was defined and commercially purchased based both on known metabolic functions and the proportion of studies that saw an increase in their taxonomic population with oxalate consumption (Fig. 5D; taxonomic cohort). Where possible, isolates from human sources were obtained.”  Figure 5d explicitly shows the specific species used in each cohort along with the groups they were in for transplant studies, the explicit metabolic pathways we were targeting, along with the % of studies that these species were associated with oxalate metabolism.  All of this information is both in the main text of the results and in the figure legends.  It is not hidden in the methods, but the methods do reiterate what was also placed in the results.   

      In Figures 5 and 6, the authors used the following groups with the corresponding nomenclature: 'Group 1, No_bact; Group 2, Ox; Group 3, Ox_form; Group 4, All; Group 5, No_ox'. Although the information related to these groups is present in the material and method section in lines 1139-1143, the authors also need to explicitly explain the groups and their nomenclature in the main text.

      Since this information is explicitly and succinctly given in the referenced figures, I believe that adding the same information in the text would be too redundant.

      Related to the development of the synthetic communities. How did the authors prepare the synthetic communities or 'cohort' for the in vitro experiments? 

      We added more information for the preparation of microbes and execution of the in vitro assays, as needed.  

      Also, it is unclear in the material and method section how the metabolic profile of each isolate was evaluated (Figure 5C). Related to the bacteria isolated from the culturomic assays, including Figure 5C and metabolic cohort, the authors indeed reported the isolation methodology in lines 1262-1275. However, there is no information about the sequencing of these isolates. The authors should present these isolates as a list (supplementary table) with their names, taxonomy, metabolic profile, and Genome ID if these genomes were submitted to NCBI.

      We added additional information for how metabolic cohort isolates were chosen and how they were taxonomically identified.  The taxonomy and substrate utilization of isolates are in Figure 5D.  We did not sequence the genomes of metabolic cohort bacteria.  However, the ATCC isolates, which comprise the taxonomic cohort, are publicly available.

      The authors presented the 248 metagenomic assemblies in Figure S1 in a circular chart in context with other genomes. However, the metagenomic assemblies should be presented in a table form, with their name, taxonomy, coverage, completeness, and Genome ID, if these genomes were submitted to NCBI.

      The information for the genomes submitted to the NCBI is provided in the data availability statement.  However, we added a table (Table S9) that includes the requested information.   

      (5) Lines 371-374: "To delineate hypotheses of metabolic redundancy or cooperation for mitigating the negative effects of oxalate on the gut microbiota and host, two independent diet trials were conducted with analogous microbial communities derived from the metabolic and taxonomic cohorts".

      Lines 494-496: "we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present" - What is the evidence that oxalate has a negative effect on the gut microbiota? The authors clearly showed the negative effect of oxalate on the host. Although there are reports in the literature of oxalate consumers with a negative effect on the microbiome, such as Lactobacilli and Bifidobacteria, there is no evidence in this manuscript about a negative effect of oxalate on the microbiome, and there is not an experimental design to evaluate it.

      These data are presented in Figure 2A and B. As stated, oxalate led to a net reduction of 34 in the total microbial metabolites produced, with a significant shift in the overall metabolome, indicative of metabolic inhibition. This is in comparison to the net gain of 9 metabolites, with no significant shift overall, in the mice with the NALB microbiota. The positive and negative effects of oxalate on the whole gut microbiota here are bolstered by previous studies on the effect of oxalate on pure cultures, as discussed and cited on lines 623-624.

      (6) Related to the last section, it is hard to really compare the results of the taxonomic cohort versus the metabolic cohort when the data of one cohort is in the main figure and the other in a supplementary figure. In addition, all the comparisons between the two cohorts seem to be qualitative. For any comparisons, the authors need to do a statistical comparison between the groups of the two cohorts.

      The comparison of the two sets of data is indeed qualitative. This is because these mouse models were run in separate experiments to test separate hypotheses (whether utilization of specific substrates is enough to improve oxalate metabolism or whether specific taxa previously responsive to dietary oxalate were better, as stated in the manuscript). Given that these experimental models were tested separately, it would not be statistically valid to do a direct statistical comparison, even though the experimental procedures were the same and the only difference was the transplanted bacteria. The separation of the experiments into a main and a supplemental figure was done out of necessity, given the very large amount of data and the many experimental mouse models that were run in this study overall.

      Minor Comments.

      (1) The authors should define 'antinutrients'. This term is not a familiar concept and could create confusion.

      This is defined in line 104 “molecules produced in plants to deter herbivory, disrupt homeostasis by targeting the function of the microbiome, host, or both”

      (2) The authors should explicitly describe the N. albigula, aka White-throated woodrat system, as early as possible in the result section.

      We added some statements describing the Swiss-Webster and N. albigula gut microbiota as poor and effective oxalate degraders, respectively, in the second section of the results.

      (3) SW-SW mice exhibited an oxalate-dependent alteration of 219 hepatic genes, with a net increase in activity. In comparison, the SW-NALB mice exhibited an oxalate-dependent alteration of 21 genes with a net decrease in activity. However, the visual representation of the PCoA in Figure 1B showed that the most different samples are the SW-NALB 0% and 1.5%. Could you please explain this difference?

      In Figure 1b, the SW-NALB data are represented by the blue and black data points, which directly overlap with each other.  The SW-SW data are the orange and purple data points, which exhibit very little overlap.  

      (4) Is Table S7 the same as Table S6? If not, there is a missing supplementary table.

      These tables are different.  We ensured that both are present.

      (5) How did the authors test bacterial growth in in vivo studies (Figure 5B)?

      We added a statement to the culturomic section of the methods – we used media with or without oxalate and quantified colony-forming units.

      (6) A section of 16S rRNA metagenomics in the material and method section is not used across the main manuscript.

      These data are presented in Figures S7 and S10, as stated in the results. We added statements in the results to clarify that these figures show the 16S sequencing data.

      (7) Lines 506-511: "Collectively, data from the current and previous studies on the effect of oxalate exposure on the gut microbiota support the hypothesis that the gut microbiota serves as an adaptive organ in which specific, metabolically redundant microbes respond to and eliminate dietary components, for the benefit of themselves, but which can residually protect or harm host health depending on the dietary molecules and gut microbiota composition." - What is the benefit to bacteria in eliminating oxalate? This is highly speculative to this system.

      The benefit to bacteria is stated earlier in that paragraph – “In the current (Figs. 2B, 5B) and previous studies(33,34,64,65), we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present.”

      (8) Lines 504 -506: "Importantly, the near-universal presence of formate metabolism genes suggest that formate may be an even greater source of ecological pressure (Figures S2-S5)."

      - Formate is primarily produced by fermentative anaerobic bacteria, such as Bacteroides, Clostridia, and certain species of Escherichia coli, since formate would be present in anaerobic communities independently of oxalate. How is formate an even greater source of ecological pressure?

      We added a statement about the toxicity of formate to both bacteria and mammalian hosts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary

      In this study, the authors build upon previous research that utilized non-invasive EEG and MEG by analyzing intracranial human ECoG data with high spatial resolution. They employed a receptive field mapping task to infer the retinotopic organization of the human visual system. The results present compelling evidence that the spatial distribution of human alpha oscillations is highly specific and functionally relevant, as it provides information about the position of a stimulus within the visual field.

      Using state-of-the-art modeling approaches, the authors not only strengthen the existing evidence for the spatial specificity of the human dominant rhythm but also provide new quantification of its functional utility, specifically in terms of the size of the receptive field relative to the one estimated based on broad band activity.

      We thank the reviewer for their positive summary.

      Weakness 1.1

      The present manuscript currently omits the complementary view that the retinotopic map of the visual system might be related to eye movement control. Previous research in non-human primates using microelectrode stimulation has clearly shown that neuronal circuits in the visual system possess motor properties (e.g. Schiller and Stryker 1972, Schiller and Tehovnik 2001). More recent work utilizing Utah arrays, receptive field mapping, and electrical stimulation further supports this perspective, demonstrating that the retinotopic map functions as a motor map. In other words, neurons within a specific area responding to a particular stimulus location also trigger eye movements towards that location when electrically stimulated (e.g. Chen et al. 2020).

      Similarly, recent studies in humans have established a link between the retinotopic variation of human alpha oscillations and eye movements (e.g., Quax et al. 2019, Popov et al. 2021, Celli et al. 2022, Liu et al. 2023, Popov et al. 2023). Therefore, it would be valuable to discuss and acknowledge this complementary perspective on the functional relevance of the presented evidence in the discussion section.

      The reviewer notes that we do not discuss the oculomotor system and alpha oscillations. We agree that the literature relating eye movements and alpha oscillations is relevant.

      At the Reviewer’s suggestion, we added a paragraph on this topic to the first section of the Discussion (section 3.1, “Other studies have proposed … “).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Yuasa et al. aimed to study the spatial resolution of modulations in alpha frequency oscillations (~10Hz) within the human occipital lobe. Specifically, the authors examined the receptive field (RF) tuning properties of alpha oscillations, using retinotopic mapping and invasive electroencephalogram (iEEG) recordings. The authors employ established approaches for population RF mapping, together with a careful approach to isolating and dissociating overlapping, but distinct, activities in the frequency domain. In doing so, the authors dissociate genuine changes in alpha oscillation amplitude from other superimposed changes occurring over a broadband range of the power spectrum. Together, the authors used this approach to test how spatially tuned estimated RFs were when based on alpha range activity, vs. broadband activities (focused on 70-180Hz). Consistent with a large body of work, the authors report clear evidence of spatially precise RFs based on changes in alpha range activity. However, the size of these RFs was far larger than those reliably estimated using broadband range activity at the same recording site. Overall, the work reflects a rigorous approach to a previously examined question, for which improved characterization leads to improved consistency in findings and some advance of prior work.

      We thank the reviewer for the summary.

      Strengths:

      Overall, the authors take a careful and well-motivated approach to data analyses. The authors successfully test a clear question with a rigorous approach and provide strong supportive findings. Firstly, well-established methods are used for modeling population RFs. Secondly, the authors employ contemporary methods for dissociating unique changes in alpha power from superimposed and concomitant broadband frequency range changes. This is an important confound in estimating changes in alpha power not employed in prior studies. The authors show this approach produces more consistent and robust findings than standard band-filtering approaches. As noted below, this approach may also account for more subtle differences when compared to prior work studying similar effects.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Weakness 2.1 Theoretical framing:

      The authors frame their study as testing between two alternative views on the organization, and putative functions, of occipital alpha oscillations: i) alpha oscillation amplitude reflects broad shifts in arousal state, with large spatial coherence and uniformity across cortex; ii) alpha oscillation amplitude reflects more specific perceptual processes and can be modulated at local spatial scales. However, in the introduction this framing seems mostly focused on comparing some of the first observations of alpha with more contemporary observations. Therefore, I read their introduction to more reflect the progress in studying alpha oscillations from Berger's initial observations to the present. I am not aware of a modern alternative in the literature that posits alpha to lack spatially specific modulations. I also note this framing isn't particularly returned to in the discussion.

      This was helpful feedback. We have rewritten nearly the entire Introduction to frame the study differently. The emphasis is now on the fact that several intracranial studies of spatial tuning of alpha (in both human and macaque) tend to show increases in alpha due to visual stimulation, in contrast to a century of MEG/EEG studies, from Berger to the present, showing decreases. We believe that the discrepancy is due to an interaction between measurement type and brain signals. Specifically, intracranial measurements sum decreases in alpha oscillations and increases in broadband power on the same trials, and both signals can be large. In contrast, extracranial measures are less sensitive to the broadband signals and mostly just measure the alpha oscillation. Our study reconciles this discrepancy by removing the baseline broadband power increases, thereby isolating the alpha oscillation, and showing that with iEEG spatial analyses, the alpha oscillation decreases with visual stimulation, consistent with EEG and MEG results.

      Weakness 2.2 A second important variable here is the spatial scale of measurement.

      It follows that EEG based studies will capture changes in alpha activity up to the limits of spatial resolution of the method (i.e. limited in ability to map RFs). This methodological distinction isn't as clearly mentioned in the introduction, but is part of the author's motivation. Finally, as noted below, there are several studies in the literature specifically addressing the authors question, but they are not discussed in the introduction.

      The new Introduction now explicitly contrasts EEG/MEG with intracranial studies and refers to the studies below.

      Weakness 2.3 Prior studies:

      There are important findings in the literature preceding the author's work that are not sufficiently highlighted or cited. In general terms, the spatio-temporal properties of the EEG/iEEG spectrum are well known (i.e. that changes in high frequency activity are more focal than changes in lower frequencies). Therefore, the observations of spatially larger RFs for alpha activities is highly predicted. Specifically, prior work has examined the impact of using different frequency ranges to estimate RF properties, for example ECoG studies in the macaque by Takura et al. NeuroImage (2016) [PubMed: 26363347], as well as prior ECoG work by the author's team of collaborators (Harvey et al., NeuroImage (2013) [PubMed: 23085107]), as well as more recent findings from other groups (Luo et al., (2022) BioRxiv: https://doi.org/10.1101/2022.08.28.505627). Also, a related literature exists for invasively examining RF mapping in the time-voltage domain, which provides some insight into the author's findings (as this signal will be dominated by low-frequency effects). The authors should provide a more modern framing of our current understanding of the spatial organization of the EEG/iEEG spectrum, including prior studies examining these properties within the context of visual cortex and RF mapping. Finally, I do note that the author's approach to these questions do reflect an important test of prior findings, via an improved approach to RF characterization and iEEG frequency isolation, which suggests some important differences with prior work.

      Thank you for these references and suggestions. Some of the references were already included, and the others have been added.

      There is one issue where we disagree with the Reviewer, namely that “the observations of spatially larger RFs for alpha activities is highly predicted”. We agree that alpha oscillations and other low frequency rhythms tend to be less focal than high frequency responses, but there are also low frequency non-rhythmic signals, and these can be spatially focal. We show this by demonstrating that pRFs solved using low frequency responses outside the alpha band (both below and above the alpha frequency) are small, similar to high frequency broadband pRFs, but differing from the large pRFs associated with alpha oscillations. Hence we believe the degree to which signals are focal is more related to the degree of rhythmicity than to the temporal frequency per se. While some of these results were already in the supplement, we now address the issue more directly in the main text in a new section called, “2.5 The difference in pRF size is not due to a difference in temporal frequency.”

      We incorporated additional references into the Introduction, added a new section on low frequency broadband responses to the Results (section 2.5), and expanded the Discussion (section 3.2) to address these new references.

      Weakness 2.4 Statistical testing:

      The authors employ many important controls in their processing of data. However, for many results there is only a qualitative description or summary metric. It appears very little statistical testing was performed to establish reported differences. Related to this point, the iEEG data is highly nested, with multiple electrodes (observations) coming from each subject, how was this nesting addressed to avoid bias?

      We reviewed the primary claims made in the manuscript and, for each claim, we specify the supporting analyses and, where appropriate, how we address the issue of nesting. Although some of these analyses were already in the manuscript, many of them are new, including all of the analyses concerning nesting. We believe that putting this information in one place will be useful to the reader, and we now include this text as a new section in the supplement, “Graphical and statistical support for primary claims”.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 2.1:

      Data presentation: In several places, the authors discuss important features of cortical responses as measured with iEEG that need to be carefully considered. This is totally appropriate and a strength of the author's work, however, I feel the reader would benefit from more depiction of the time-domain responses, to help better understand the authors frequency domain approach. For example, Figure 1 would benefit from showing some form of voltage trace (ERP) and spectrogram, not just the power spectra. In addition, part (a) of Figure 1 could convey some basic information about the timing of the experimental paradigm.

      We changed panel A of Figure 1 to include the timing of the experimental paradigm, and we added panels C and D to show the electrode time series before and after regression out of the ERP.

      Recommendation 2.2

      Update introduction to include references to prior EEG/iEEG work on spatial distribution across frequency spectrum, and importantly, prior work mapping RFs with different frequencies.

      We have addressed this issue and re-written our introduction. Please refer to our response in Public Review for further details.

      Recommendation 2.3

      Figure 3 has several panels and should be labeled to make it easier to follow. The dashed line in lower power spectra isn't defined in a legend and is missing from the upper panel - please clarify.

      We updated Figure 3 and reordered the panels to clarify how we computed the summary metrics in broadband and alpha for each stimulus location (i.e., the “ratio” values plotted in panel B). We also simplified the plot of the alpha power spectrum. It now shows a dashed line representing a baseline-corrected response to the mapping stimulus, which is defined in the legend and explained in the caption.

      Recommendation 2.4

      Power spectra are always shown without error shading, but they are mean estimates.

      We added error shading to Figures 1, 2 and 3.

      Recommendation 2.5

      The authors deal with voltage transients in response to visual stimulation by subtracting out the trial-averaged mean (commonly performed). However, the efficacy of this approach depends on signal quality and so some form of depiction for this processing step is needed.

      We added a depiction of the processing steps for regressing out the averaged responses in Figure 1 in an example electrode (panels C and D). We also show in the supplement the effect of regressing out the ERP on all the electrode pRFs. We have added Supplementary Figure 1-2.
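
      As an illustration only (this is not the code used for the paper), the idea of regressing the trial-averaged response out of each trial can be sketched as follows. The sketch assumes a per-trial least-squares scaling of the ERP with no intercept term; the function name `regress_out_erp` and the list-of-lists data layout are hypothetical choices made for the example.

```python
def regress_out_erp(trials):
    """Remove the trial-averaged response (ERP) from every trial.

    trials: list of equal-length lists, one voltage time series per trial.
    Returns (erp, residuals). Assumption for this sketch: each trial is
    modeled as beta * ERP, with beta fit by least squares and no intercept.
    """
    n_trials = len(trials)
    n_samples = len(trials[0])
    # ERP = mean across trials at each time point
    erp = [sum(trial[i] for trial in trials) / n_trials for i in range(n_samples)]
    erp_energy = sum(v * v for v in erp)

    residuals = []
    for trial in trials:
        # Least-squares weight of the ERP in this trial (0 if the ERP is flat)
        if erp_energy > 0:
            beta = sum(a * b for a, b in zip(trial, erp)) / erp_energy
        else:
            beta = 0.0
        residuals.append([a - beta * b for a, b in zip(trial, erp)])
    return erp, residuals
```

      By construction, each residual time series is orthogonal to the ERP, so a trial that is a scaled copy of the ERP leaves a residual of zero and only the non-phase-locked activity is carried forward to the spectral analysis.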

      Recommendation 2.6

      I have a similar request for the authors latency correction of their data, where they identified a timing error and re-aligned the data without ground truth. Again, this is appropriate, but some depiction of the success of this correction is very critical for confirming the integrity of the data.

      We now report more detail on the latency correction, and also point out that any small error in the estimate would not affect our conclusions (4.6 ECoG data analysis | Data epoching). The correction was important for a prior paper on temporal dynamics (Groen et al, 2022), which used data from the same participants and estimated the latency of responses. In this paper, our analyses are in the spectral domain (and discard phase), so small temporal shifts are not critical. We now also link to the public code associated with that paper, which implemented the adjustment and quantified the uncertainty in the latency adjustment.

      More details on latency adjustment provided in section 4.6.

      Recommendation 2.7

      In many places the authors report their data shows a 'summary' value, please clarify if this means averaging or summation over a range.

      For both broadband and alpha, we derive one summary value (a scalar) per trial for each stimulus. For broadband, the summary metric is the ratio of power during a given trial to power during blanks, where power in a trial is the geometric mean of the power at each frequency within the defined band. This is equation 3 in the methods, which is now referred to the first time that summary metrics are mentioned in the results. For alpha, the summary metric is the height of the Gaussian from our model-based approach. This is in equations 1 and 2, and is also now referred to the first time summary metrics are mentioned in the results.

      We added explanation of the summary metrics in the figure captions and results where they are first used, and also referred to the equations in the methods where they are defined.
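
      To make the broadband summary metric concrete, a minimal sketch is given below. It is not a reproduction of equation 3: the function names are hypothetical, the 70-180 Hz default band is taken from the reviewer's summary, and we assume here that the blank power is summarized with the same geometric mean as the trial power.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def broadband_ratio(trial_power, blank_power, freqs, band=(70.0, 180.0)):
    """Scalar broadband summary for one trial.

    trial_power, blank_power: power at each frequency in `freqs`.
    Returns geometric-mean power in `band` during the trial divided by
    geometric-mean power in `band` during blanks (assumed form).
    """
    in_band = [i for i, f in enumerate(freqs) if band[0] <= f <= band[1]]
    trial = geometric_mean([trial_power[i] for i in in_band])
    blank = geometric_mean([blank_power[i] for i in in_band])
    return trial / blank
```

      Because the geometric mean averages log power, each frequency in the band contributes equally to the ratio, rather than the value being dominated by the lowest frequencies, where raw power is largest.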

      Recommendation 2.8

      The authors conclude: "we have discovered that spectral power changes in the alpha range reflect both suppression of alpha oscillations and elevation of broadband power." It might not have been the intention, but 'discovered' seems overstated.

      We agree and changed this sentence.

      Recommendation 2.9

      Supp Fig 9 is a great effort by the authors to convey their findings to the reader, it should be a main figure.

      We are glad you found Supplementary Figure 9 valuable. We moved this figure to the main text.

      Reviewer #3 (Public Review):

      Summary:

      This study tackles the important subject of sensory driven suppression of alpha oscillations using a unique intracranial dataset in human patients. Using a model-based approach to separate changes in alpha oscillations from broadband power changes, the authors try to demonstrate that alpha suppression is spatially tuned, with similar center location as high broadband power changes, but much larger receptive field. They also point to interesting differences between low-order (V1-V3) and higher-order (dorsolateral) visual cortex. While I find some of the methodology convincing, I also find significant parts of the data analysis, statistics and their presentation incomplete. Thus, I find that some of the main claims are not sufficiently supported. If these aspects could be improved upon, this study could potentially serve as an important contribution to the literature with implications for invasive and non-invasive electrophysiological studies in humans.

      We thank the reviewer for the summary.

      Strengths:

      The study utilizes a unique dataset (ECOG & high-density ECOG) to elucidate an important phenomenon of visually driven alpha suppression. The central question is important and the general approach is sound. The manuscript is clearly written and the methods are generally described transparently (and with reference to the corresponding code used to generate them). The model-based approach for separating alpha from broadband power changes is especially convincing and well-motivated. The link to exogenous attention behavioral findings (figure 8) is also very interesting. Overall, the main claims are potentially important, but they need to be further substantiated (see weaknesses).

      We thank the reviewer for the positive comments.

      Weaknesses:

      I have three major concerns:

      Weakness 3.1. Low N / no single subject results/statistics:

      The crucial results of Figure 4,5 hang on 53 electrodes from four patients (Table 2). Almost half of these electrodes (25/53) are from a single subject. Data and statistical analysis seem to just pool all electrodes, as if these were statistically independent, and without taking into account subject-specific variability. The mean effect per each patient was not described in text or presented in figures. Therefore, it is impossible to know if the results could be skewed by a single unrepresentative patient. This is crucial for readers to be able to assess the robustness of the results. N of subjects should also be explicitly specified next to each result.

      We have added substantial changes to deal with subject specific effects, including new results and new figures.

      • Figure 4 now shows variance explained by the alpha pRF broken down by each participant for electrodes in V1 to V3. We also now show a similar figure for dorsolateral electrodes in Supplementary Figure 4-2.

      • Figure 5, which shows results from individual electrodes in V1 to V3, now includes color coding of electrodes by participant to make it clear how the electrodes group by participant. Similarly, for dorsolateral electrodes, we show electrodes grouped by participant in Supplementary Figure 5-1. The same applies to Supplementary Figure 6-2.

      • Supplementary Figure 7-2 now shows the benefits of our model-based approach for estimating alpha broken down by individual participants.

      • We also now include a new section in the supplement that summarizes for every major claim, what the supporting data are and how we addressed the issue of nesting electrodes by participant, section Graphical and statistical support for primary claims.

      Weakness 3.2. Separation between V1-V3 and dorsolateral electrodes:

      Out of 53 electrodes, 27 were doubly assigned as both V1-V3 and dorsolateral (Table 2, Figures 4,5). That means that out of 35 V1-V3 electrodes, 27 might actually be dorsolateral. This problem is exacerbated by the low N. For example, all the 20 electrodes in patient 8 assigned as V1-V3 might as well be dorsolateral. This double assignment didn't make sense to me and I wasn't convinced by the authors' reasoning. I think it needlessly inflates the N for comparing the two groups and casts doubts on the robustness of these analyses.

      Electrode assignment was probabilistic to reflect uncertainty in the mapping between location and retinotopic map. The probabilistic assignment is handled in two ways.

      (1) For visualizing results of single electrodes, we simply go with the maximum probability, so no electrode is visualized for both groups of data. For example, Figure 5a (V1-V3) and supplementary Figure 5-1a (dorsolateral electrodes) have no electrodes in common: no electrode is in both plots.

      (2) For quantitative summaries, we sample the electrodes probabilistically (for example Figures 4, 5c). So, if for example, an electrode has a 20% chance of being in V1 to V3, and 30% chance of being in dorsolateral maps, and a 50% chance of being in neither, the data from that electrode is used in only 20% of V1-V3 calculations and 30% of dorsolateral calculations. In 50% of calculations, it is not used at all. This process ensures that an electrode with uncertain assignment makes no more contribution to the results than an electrode with certain assignment. An electrode with a low probability of being in, say, V1-V3, makes little contribution to any reported results about V1-V3. This procedure is essentially a weighted mean, which the reviewer suggests in the recommendations. Thus, we believe there is not a problem of “double counting”.

      The alternative would have been to use maximum probability for all calculations. However, we think that doing so would be misleading, since it would not take into account uncertainty of assignment, and would thus overstate differences in results between the maps.

      We now clarify in the Results that for probabilistic calculations, the contribution of an electrode is limited by the likelihood of assignment (Section 2.3). We also now explain in the methods why we think probabilistic sampling is important.
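
      The probabilistic sampling described above can be sketched as follows. This is a simplified illustration, not the analysis code: the function names are hypothetical, each electrode is represented only by its two assignment probabilities (the remainder being "neither"), and a single uniform draw per electrode decides its group for that resample.

```python
import random


def sample_assignments(probabilities, rng):
    """Draw one assignment per electrode.

    probabilities: list of (p_v1v3, p_dorsolateral) tuples, one per electrode;
    the remaining probability mass means the electrode is used in neither group.
    Returns a dict mapping group name to the electrode indices drawn into it.
    """
    groups = {"V1-V3": [], "dorsolateral": []}
    for idx, (p_early, p_dorsal) in enumerate(probabilities):
        u = rng.random()  # uniform in [0, 1)
        if u < p_early:
            groups["V1-V3"].append(idx)
        elif u < p_early + p_dorsal:
            groups["dorsolateral"].append(idx)
        # otherwise: electrode is left out of this resample entirely
    return groups


def inclusion_rates(probabilities, n_draws=20000, seed=0):
    """Fraction of resamples in which each electrode lands in each group."""
    rng = random.Random(seed)
    counts = [[0, 0] for _ in probabilities]
    for _ in range(n_draws):
        groups = sample_assignments(probabilities, rng)
        for idx in groups["V1-V3"]:
            counts[idx][0] += 1
        for idx in groups["dorsolateral"]:
            counts[idx][1] += 1
    return [(a / n_draws, b / n_draws) for a, b in counts]
```

      With probabilities of 20% and 30%, the electrode in the example above enters roughly 20% of V1-V3 resamples and 30% of dorsolateral resamples, and never both in the same resample, which is why its contribution behaves like a probability-weighted mean rather than a double count.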

      Weakness 3.3. Alpha pRFs are larger than broadband pRFs:

      First, as broadband pRF models were on average better fit to the data than alpha pRF models (dark bars in Supp Fig 3. Top row), I wonder if this could entirely explain the larger Alpha pRF (i.e. worse fits lead to larger pRFs). There was no analysis to rule out this possibility.

      We addressed this question in a new paragraph in Discussion section 3.1 (“What is the function of the large alpha pRFs?”, paragraph beginning… “Another possible interpretation is that the poorer model fit in the alpha pRF is due to lower signal-to-noise”). This paragraph both refers to prior work on the relationship between noise and pRF size and to our own control analyses (Supplementary Figure 5-2).

      Weakness 3.4 Statistics

      Second, examining closely the entire 2.4 section there wasn't any formal statistical test to back up any of the claims (not a single p-value is mentioned). It is crucial in my opinion to support each of the main claims of the paper with formal statistical testing.

      We agree that it is important for the reader to be able to link specific results and analyses to specific claims. We are not convinced that null hypothesis statistical testing is always the best approach. This is a topic of active debate in the scientific community.

      We added a new section that concisely states each major claim and explicitly annotates the supporting evidence (Section 4.7). Please also refer to our responses to Reviewer #2 regarding statistical testing (Reviewer weakness 2.4 “Statistical testing”).

      Weakness 3.5 Summary

      While I judge these issues as crucial, I can also appreciate the considerable effort and thoughtfulness that went into this study. I think that addressing these concerns will substantially raise the confidence of the readership in the study's findings, which are potentially important and interesting.

      We again thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for how to address the three major concerns:

      Suggestion 3.1.

      I am very well aware that it's very hard to have n=30 in a visual cortex ECOG study. That's fine. Best practice would be to have a linear mixed effects model with patients as a random effect. However, for some figures with just 3-4 patients (Figure 4,5) the sample size might be too small even for that. At the very minimum, I would expect to show in figures/describe in text all results per patient (perhaps one can do statistics within each patient, and show for each patient that the effect is significant). Even in primate studies with just two subjects it is expected to show that the results replicate for subject A and B. It is necessary to show that your results don't depend on a single unrepresentative subject. And if they do, at least be transparent about it.

      We have addressed this thoroughly. Please see response to Weakness 3.1 (“Low N / no single subject results/statistics”).

      Suggestion 3.2.

      I just don't get it. I would simply assign an electrode to V1-V3 or dorsolateral cortex based on which area has the highest probability. It doesn't make sense to me that an electrode that has 60% of being in dorsolateral cortex and only 10% to be in V1-V3 would be assigned as both V1-V3 and dorsolateral. Also, what's the rationale to include such electrode in the analysis for let's say V1-V3 (we have weak evidence to believe it's there)? I would either assign electrodes based on the highest probability, or alternatively do a weighted mean based on the probability of each electrode belonging to each region group (e.g. electrode with 40% to be in V1-V3, will get twice the weight as an electrode who has 20% to be in V1-V3) but this is more complicated.

      We have addressed this issue. Please refer to our response in Public Review (“Weakness 3.2 Separation between V1-V3 and dorsolateral”) for details.

      Suggestion 3.3.

      First, to exclude the possibility that alpha pRF are larger simply because they have a worse fit to the neural data, I would show if there is a correlation between the goodness-of-fit and pRF size (for alpha and broadband signals, separately). No [negative] correlation between goodness-of-fit and pRF size would be a good sign. I would also compare alpha & broadband receptive field size when controlling for the goodness-of-fit (selecting electrodes with similar goodness-of-fit for both signals). If the results replicate this way it would be convincing.

      Second, there are no statistical tests in section 2.4, possibly also in others. Even if you employ bootstrap / Monte-Carlo resampling methods you can extract a p-value.

      We have addressed this issue. Please refer to our response in Public Review Point 3.3 (“Alpha pRFs are larger than broadband pRFs”) for further details.

      Suggestion 3.4.

      Also, I don't understand the resampling procedure described in lines 652-660: "17.7 electrodes were assigned to V1-V3, 23.2 to dorsolateral, and 53 to either " - but 17.7 + 23.2 doesn't add up to 53. It also seems as if you assign visual areas differently in this resampling procedure than in the real data - "and randomly assigned each electrode to a visual area according to the Wang full probability distributions". If you assign in your actual data 27 electrodes to both visual areas, the same should be done in the resampling procedure (I would expect exactly 35 V1-V3 and 45 dorsolateral electrodes in every resampling, just the pRFs will be shuffled across electrodes).

      We apologize for the confusion.

      We fixed the sentence above, clarified the caption to Table 2, and also explained the overall strategy of probabilistic resampling better. See response to Public Review point 3.2 for details.

      Suggestion 3.5.

      These are rather technical comments but I believe they are crucial points to address in order to support your claims. I genuinely think your results are potentially interesting and important but these issues need to be first addressed in a revision. I also think your study may carry implications beyond just the visual domain, as alpha suppression is observed for different sensory modalities and cortical regions. Might be useful to discuss this in the discussion section.

      Agree. We added a paragraph on this point to the Discussion (very end of 3.2).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      We thank the reviewer for the balanced and informative summary.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Neuroligins 1, 2 and 3 specifically from astrocytes, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses an important and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, no alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes, are observed.

      We are also grateful for this reviewer’s constructive comments.

      One caveat to this study is that the authors do not directly provide evidence that their Tamoxifen-inducible conditional deletion paradigm does indeed result in efficient deletion of all three Neuroligins from astrocytes. Using a Cre-dependent tdTomato reporter line, they show that tdTomato expression is efficiently induced by the current paradigm, and they refer to a prior study showing efficient deletion of Neuroligins from neurons using the same conditional Nlgn1-3 mouse lines but a different Cre driver strategy. However, neither of these approaches directly provide evidence that all three Neuroligins are indeed deleted from astrocytes in the current study. In contrast, Stogsdill et al. employed FACS and qPCR to directly quantify the loss of Nlgn2 mRNA from astrocytes. This leaves the current Golf et al. study somewhat vulnerable to the criticism, however unlikely, that their lack of synaptic effects may be a consequence of incomplete Neuroligin deletion, rather than a true lack of effect of astrocytic Neuroligins.

      The concern is valid. In the original submission of this paper, we did not establish that the Cre recombinase we used actually deleted neuroligins in astrocytes. We have now addressed this issue in the revised paper with new experiments as described below.

      However, the reviewer’s impression that the Stogsdill et al. paper confirmed full deletion of Nlgn2 is a misunderstanding of the data in that paper. The reviewer is correct that Stogsdill et al. performed FACS to test the efficacy of the GLAST-Cre mediated deletion of Nlgn2-flox mice, followed by qRT-PCR comparing heterozygous with homozygous mutant mice. With their approach, no wild-type control could be used, as these would lack reporter expression. However, this experiment does NOT allow conclusions about the degree of recombination, either overall (i.e. recombination in all astrocytes regardless of TdT expression) or within TdT+ astrocytes, because it does not quantify recombination. To quantify the degree of recombination, the paper would have had to perform genomic PCR measurements.

      The problem with the data on the degree of recombination in the Stogsdill et al. (2017) paper, as we understand them, is two-fold.

      First, the GLAST-Cre line only targets ~40-70% of astrocytes, at least as evidenced by highly sensitive Cre-reporter mice in a variety of studies using this Cre line. The 40-70% variation is likely due to differences in the reporter mice and the tamoxifen injection schedule used. In comparison, we are targeting most astrocytes using the Aldh1l1-CreERT2 mice. Moreover, GLAST-Cre mice exhibit neuronal off-targeting, consistent with at least some of the remaining Nlgn2 qRT-PCR signal in the FACS-sorted cells. As we describe next, this signal also likely comes from astrocytes where recombination was incomplete. This is the reason why we, like everyone else, are now using the Aldh1l1-Cre line that has been shown to be more efficient both in terms of the overall targeting of astrocytes (i.e. nearly complete) and the level of recombination observed in reporter(+) astrocytes.

      Second, Stogsdill et al. detected a significant decrease in the Nlgn2 qRT-PCR signal in the FACS-sorted homozygous Nlgn2 KO cells compared to the heterozygous Nlgn2 KO cells, but the Nlgn2 qRT-PCR signal was still quite large. The data are presented as normalized to the HET condition. As a result, we don’t know the true level of gene deletion (i.e. compared to TdT- astrocytes). For example, based on the Stogsdill et al. data the HET manipulation could have induced only a 20% reduction in Nlgn2 mRNA levels in TdT(+) astrocytes, in which case the KO would have produced a 40% reduction in Nlgn2 mRNA in TdT(+) astrocytes. Moreover, it is possible, based on our own experience with the GLAST-Cre line, that the reporter may also not turn on in some astrocytes where other alleles have been independently recombined, just as some astrocytes that are TdT(+) would still be wild-type or heterozygous for Nlgn2. Thus, it is impossible to calculate the actual percentage of recombination from these data, even in TdT(+) cells, absent PCR of genomic DNA from isolated cells. Alternatively, comparison of mRNA levels using primers sensitive to floxed sequences in wild-type controls versus cKO mice would have also yielded a much better idea of the recombination efficiency.
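
      The normalization problem can be made concrete with a short numerical sketch. The reduction values below are hypothetical and chosen only for illustration (the first pair is the 20%/40% example given above); they are not measurements from either study, and the function name is an assumption of this sketch. It shows that quite different absolute knockdown levels can produce the same HET-normalized qRT-PCR ratio.

```python
def het_normalized_ratio(het_reduction, ko_reduction):
    """Return the KO signal expressed relative to the HET signal.

    Both arguments are fractional reductions in mRNA relative to an
    unrecombined (wild-type) baseline, which is exactly the quantity
    that a HET-normalized data set does not report.
    """
    het_signal = 1.0 - het_reduction
    ko_signal = 1.0 - ko_reduction
    return ko_signal / het_signal


# Hypothetical (HET reduction, KO reduction) pairs, for illustration only.
scenarios = [(0.20, 0.40), (0.00, 0.25), (0.60, 0.70)]
ratios = [het_normalized_ratio(h, k) for h, k in scenarios]

for (h, k), r in zip(scenarios, ratios):
    print(f"HET -{h:.0%}, KO -{k:.0%}: KO/HET = {r:.2f}")
```

      All three hypothetical scenarios give the same KO/HET ratio, so the ratio alone cannot distinguish a marginal deletion from a substantial one without a wild-type reference.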

      In summary, it is unclear whether the Nlgn2 deletion in the Stogsdill et al. paper was substantial or marginal – it is simply impossible to tell.

      Reviewer #3 (Public Review):

      This study investigates the roles of astrocytes in the regulation of synapse development and astrocyte morphology using conditional KO mice carrying mutations of three neuroligins1-3 in astrocytes with the deletion starting at two different time points (P1 and P10/11). The authors use morphological, electrophysiological, and cell-biological approaches and find that there are no differences in synapse formation and astrocyte cytoarchitecture in the mutant hippocampus and visual cortex. These results differ from the previous results (Stogsdill et al., 2017), although the authors make several discussion points on how the differences could have been induced. This study provides important information on how astrocytes and neurons interact with each other to coordinate neural development and function. The experiments were well-designed, and the data are of high quality.

      We also thank this reviewer for helpful comments!

      Recommendations for the authors:

      This project was meant to rigorously test the intriguing overall question whether neuroligins, which are abundantly expressed in astrocytes, regulate synapse formation as astrocytic synapse organizers. The goal of the paper was NOT to confirm or dispute the conclusion by Stogsdill et al. (Nature 2017) that Nlgn2 expressed in astrocytes is essential for excitatory synapse formation and that astrocytic Nlgn1-3 are required for proper astrocyte morphogenesis. Instead, the project was meant to address the much broader question whether the abundant expression of any neuroligin, not just Nlgn2, in astrocytes is essential for neuronal excitatory or inhibitory synapse formation and/or for the astrocyte cytoarchitecture. We felt that this was an important question independent of the Stogsdill et al. paper. We analyzed in our experiments young adult mice, a timepoint that was chosen deliberately to avoid the possibility of observing a possible developmental delay rather than a fundamental function that extends beyond development.

      We do recognize that the conclusion by Stogsdill et al. (2017) that Nlgn2 expression in astrocytes is essential for excitatory synapse formation was very exciting to the field but contradicted a large literature demonstrating that Nlgn2 protein is exclusively localized to inhibitory synapses and absent from excitatory synapses (to name just a few papers, see Graf et al., Cell 2004; Varoqueaux et al., Eur. J. Cell Biol. 2004; Patrizi et al., PNAS 2008;  Hoon et al., J. Neurosci. 2009). In addition, the conclusion of Stogsdill et al. that astrocytic Nlgn2 specifically drove excitatory synapse formation was at odds with previous findings documenting that the constitutive deletion of Nlgn2 in all cells, including astrocytes, has no effect on excitatory synapse numbers (again, to name a few papers, see Varoqueaux et al., Neuron 2006; Blundell et al., Genes Brain Behav. 2008; Poulopoulos et al., Neuron 2009; Gibson et al., J. Neurosci. 2009). These contradictions conferred further urgency to our project, but please note that this project was primarily driven by our curiosity about the function of astrocytic neuroligins, not by a fruitless desire to test the validity of one particular Nature paper.

      The general goal of our paper notwithstanding, few papers from our lab have received as much attention and as many negative comments on social media as this paper when it was published as a preprint. Because we take these criticisms seriously, we have over the last year performed extensive additional experiments to ensure that our findings are well founded. We feel that, on balance, our data are incompatible with the notion that astrocytic neuroligins play a fundamental role in excitatory synapse formation but are consistent with other prior findings obtained with neuroligin KO mice. In the new data we added to the paper, we not only characterized the Cre-mediated deletion of neuroligins in depth, but also employed an independent second system (human neurons cultured on mouse glia) to further validate our conclusions as described below. Although we believe that our results are incompatible with the notion that astrocytic neuroligins fundamentally regulate excitatory or inhibitory synapse formation, we also conclude with regret that we still don’t know what astrocytic neuroligins actually do. Thus, the function of astrocytic neuroligins, as there surely must be one, remains a mystery.

      Finally, there are many possible explanations for the discrepancies between our conclusions and those of Stogsdill et al. as described in our paper. Most of these explanations are technical and may explain why not only our results, but also those of many previous studies from multiple labs, are inconsistent with the conclusions by Stogsdill et al. (2017), as discussed in detail in the revised paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper is very clear and well written. I have only one comment and that is to increase the sizes of Figs 2, 4 and 6 so that the imaging panels can be seen more clearly. Also, although I know the n numbers are provided in the figure legends, the authors may help the reader by providing them in the results when key data and findings are reported.

      We agree and have followed the reviewer’s suggestions as best as we could.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the strength and importance of the claims that the authors make, I would highly recommend adding some quantitative evidence regarding the efficacy of deletion in astrocytes, e.g. using the same strategy as in Stogsdill et al. As unlikely as it may be that Neuroligin deletion is in fact incomplete, this possibility cannot be excluded unless directly measured. To avoid future discussions on this subject, it seems that the onus is on the authors to provide this information.

      We concur that this is an important point and have devoted a year-long effort to address it. Note, however, that the strategy employed by Stogsdill et al. does not actually allow conclusions about their recombination efficiency. As described above, it only allows the conclusion that some recombination took place. The Stogsdill et al. Nature paper (2017) is a bit confusing on this point. This approach is thus not appropriate to address the question raised by the reviewer.

      We have performed two experiments to address the issue raised by the reviewer.

      First, we used a viral (i.e. AAV2/5) approach to express Rpl22 with a triple HA-tag, also known as Ribotag, which allows us to purify ribosome-bound mRNA from targeted cells for downstream gene expression analysis. The novel construct is driven by the GfaABC1D promoter and includes two additional features which make it particularly useful. One is that upstream of Ribotag is a membrane-targeted Lck-mVenus followed by a self-cleaving P2A sequence, which allows easy visualization of targeted astrocytes. The other is that we have incorporated a cassette of four copies of six miRNA targeting sequences (4x6T) for miR-124, as was recently published (Gleichman et al., 2023), to eliminate off-target expression in neurons. Based on qPCR analysis, the updated construct allowed >95% de-enrichment of neuronal mRNA and slightly improved observed recombination rates (~10% per gene) relative to an earlier version without 4x6T. Mice that were injected with tamoxifen at P1, similar to other experiments in the paper, were then stereotactically injected at ~P35-40 within the dorsal hippocampus with AAV2/5-GfaABC1D-Lck-mVenus-P2A-Rpl22-HA-4x6T. Approximately 3 weeks later, acute slices were prepared, visualized for fluorescence, and both CA1 and nearby cortex that was partially targeted were isolated for downstream ribosome affinity purification with HA antibodies. Total RNA was saved as input. qPCR was performed using assays that are sensitive to the exons that are floxed in the Nlgn123 cKO mice, so that our quantifications are not confounded by potential differences in nonsense-mediated decay. Our control data reveal a striking enrichment of an astrocyte marker gene (e.g. aquaporin-4) and de-enrichment of genes for other cell types. In the CA1, we observed robust loss of Nlgn3 (~96%), Nlgn2 (~86%), and Nlgn1 (65%) gene expression. In the cortex, we observed a similarly robust loss of Nlgn3 (93%), Nlgn2 (83%), and Nlgn1 (72%) expression.

      Given that our targeting of astrocytes based on Ai14 Cre-reporter mice was ~90-99%, these reductions are striking and definitive. The existence of some residual transcript reflects the presence of a small population of astrocytes heterozygous for Nlgn2 and Nlgn3. In contrast, Nlgn1 appears more difficult to recombine and it is likely that some astrocytes are either heterozygous or homozygous knockout cells. Although it is thus possible that Nlgn1 could provide some compensation in our experiments, it is worth noting that Stogsdill et al. found that only Nlgn2 and Nlgn3 knockdown with shRNAs resulted in impaired astrocyte morphology by P21. Moreover, they found that Nlgn2 cKO in astrocytes with PALE of a Cre-containing pDNA impaired astrocyte morphology in a gene-dosage dependent manner and suppressed excitatory synapse formation at P21. Thus, our inability to delete all of Nlgn1 doesn’t readily explain contradictions between our findings and theirs.
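
      For readers who want to relate qPCR cycle shifts to percentages of this kind, the sketch below uses the common 2^(-ΔΔCt) approximation. This is an assumption made for illustration: the quantification method actually used is not specified in this response, the Ct values are hypothetical, and the function name is invented for the example. The approximation also assumes roughly 100% amplification efficiency for both the target and the reference assay.

```python
def percent_loss(ct_target_cko, ct_ref_cko, ct_target_ctrl, ct_ref_ctrl):
    """Percent loss of a target transcript in cKO relative to control.

    Uses the 2^(-ddCt) approximation, which assumes ~100% amplification
    efficiency for both the target assay and the reference assay.
    """
    d_ct_cko = ct_target_cko - ct_ref_cko        # target vs. reference, cKO
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl     # target vs. reference, control
    dd_ct = d_ct_cko - d_ct_ctrl                 # shift attributable to the cKO
    relative_expression = 2.0 ** (-dd_ct)
    return 100.0 * (1.0 - relative_expression)


# Hypothetical Ct values: the target crosses threshold one cycle later in cKO.
loss = percent_loss(ct_target_cko=26.0, ct_ref_cko=20.0,
                    ct_target_ctrl=25.0, ct_ref_ctrl=20.0)
print(f"{loss:.0f}% loss")
```

      Under the same assumption, a one-cycle shift corresponds to a 50% loss, and a loss of about 96% corresponds to a shift of roughly 4.6 cycles.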

      Second, in an independent approach we have cultured glia from mouse quadruple conditional Nlgn1234 KO mice and infected the glia with lentiviruses expressing inactive (DCre, control) or active Cre-recombinase. We confirmed complete recombination by PCR. We then cultured human neurons forming excitatory synapses on the glia expressing or lacking neuroligins and measured the frequency and amplitude of mEPSCs as a proxy for synapse numbers and synaptic function. As shown in the new Figure 9, we detected no significant changes in mEPSCs, demonstrating in this independent system that the glial neuroligins do not detectably influence excitatory synapse formation.

      (2) Along the same lines, the authors should be careful not to overstate their findings in this direction. For example, the figure caption for Figure 2 reads 'Nlgn1-3 are efficiently and selectively deleted in astrocytes by crossing triple Nlgn1-3 conditional KO mice with Aldh1l1-CreERT2 driver mice and inducing Cre-activity with tamoxifen early during postnatal development'. This is not technically correct and should be modified to reflect that the authors are not in fact assessing deletion of Nlgn1-3, but only expression of a tdTomato reporter.

      We agree – this is essentially the same criticism as comment #1.

      (3) In general, the animal numbers used for the experiments are rather low. With an n = 4 for most experiments, only large abnormalities would be detected anyway, while smaller alterations would not reach statistical significance due to the inherent biological and technical variance. For the most part, this is not a concern, since there really is no difference between WTs and Nlgn1-3 cKOs. However, trends are observed in some cases, and it is conceivable that these would become significant changes with larger n's, e.g. Figure 3H (Vglut2); Figure 4E (VGlut2 S.P., D.G.); Figure 6D (Vglut2). Increasing the numbers to n = 6 here would greatly strengthen the claims that no differences are observed.

      We concur that small differences would not have been detected in our experiments but feel that given the very large phenotypes of the neuroligin deletions in neurons and of the phenotypes reported by Stogsdill et al. (2017), which also did not employ a large number of animals, a very small phenotype in astrocytes would not have been very informative.
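
      The sample-size trade-off discussed in this exchange can be illustrated with a small Monte Carlo sketch. It is not an analysis from the paper: it assumes normally distributed measurements with equal variance, a two-sample Student's t-test at a two-sided alpha of 0.05, and effect sizes expressed in units of the standard deviation. The function name and the chosen effect sizes are hypothetical.

```python
import random
import statistics

# Two-sided critical t values for alpha = 0.05 with df = 2n - 2.
T_CRIT = {4: 2.447, 6: 2.228}


def simulated_power(n, effect_size, trials=20000, seed=0):
    """Estimate the power of a two-sample t-test with n animals per group."""
    rng = random.Random(seed)
    crit = T_CRIT[n]
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2.0
        t = (statistics.mean(b) - statistics.mean(a)) / (2.0 * pooled_var / n) ** 0.5
        if abs(t) > crit:
            hits += 1
    return hits / trials


for n in (4, 6):
    for d in (1.0, 3.0):
        print(f"n = {n}, effect = {d} SD: power ~ {simulated_power(n, d):.2f}")
```

      Under these assumptions, small groups are well powered only for effects of several standard deviations, which is the regime both the reviewer and the response refer to when they speak of large phenotypes.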

      Minor points:

      (1) Please state the exact genetic background for the mouse lines used.

      Our lab generally uses hybrid CD1/Bl6 mice to avoid artifacts produced by inbred genetic mutations in so-called ‘pure’ lines, especially Bl6 mice. This standard protocol was followed in the present study. Thus, the mice are on a mixed CD1/Bl6 hybrid background.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 4 demonstrates that neuroligin 1-3 deletions restricted to astrocytes do not affect the number of excitatory and inhibitory synapses in layer IV of the primary visual cortex. This conclusion could be further strengthened if the authors could provide electrophysiological evidence such as mE/IPSCs.

      We agree but have chosen a different avenue to further test our conclusions because slice electrophysiological experiments are time-consuming, labor intensive, and difficult to quantitate, especially in cortex.

      Specifically, we have co-cultured human neurons with astrocytes that either contain or lack neuroligins (new Fig. 9). With this experimental design, we have total control over ALL neuroligins in astrocytes. Electrophysiological recordings then demonstrated that the complete deletion of all glial neuroligins has no effect on mEPSC frequencies and amplitudes. Although clearly much more needs to be done, the new results confirm in an independent system that glial neuroligins have no effect on synapse formation in the neurons, even though neurons depend on astrocytes for synaptogenic factors as Ben Barres brilliantly showed a decade ago. However, it is important to note that dissociated glia in culture, while synaptogenic, are reactive and may not faithfully recapitulate all roles of astrocytes in synaptogenesis.

      (2) It would help readers if the images showing the punctate double marker stainings of excitatory/inhibitory synapses are presented in merged colors (i.e., yellow colors for red and green puncta colors).

      We have tried to improve the visualization of the rather voluminous studies we performed and illustrate in the figures as best as we could.

      (3) The resolutions of the images in the figures are not good, although I guess it is because the images are for review processes.

      We apologize and would like to assure the reviewer that we are supplying high-resolution images to the journal.

      (4) Typos in lines 82 and 274.

      We have corrected these errors.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their thoughtful feedback. We have made substantial revisions to the manuscript to address each of their comments, as we detail below. We want to highlight one major change in particular that addresses a concern raised by both reviewers: the role of the drift rate in our models. Motivated by their astute comments, we went back through our models and realized that we had made a particular assumption that deserved more scrutiny. We previously assumed that the process of encoding the observations made correct use of the objective, generative correlation, but then the process of calculating the weight of evidence used a mis-scaled, subjective version of the correlation. These assumptions led us to scale the drift rate in the model by a term that quantified how the standard deviation of the observation distribution was affected by the objective correlation (encoding), but to scale the bound height by the subjective estimate of the correlation (evidence weighing). However, we realized that encoding may also depend on the subjective correlation experienced by the participant. We have now tested several alternative models and found that the best-fitting model assumes that a single, subjective estimate of the correlation governs both encoding and evidence weighing. An important consequence of updating our models in this way is that we can now account for the behavioral data without needing the additional correlation-dependent drift terms (which, as reviewer #2 pointed out, were difficult to explain).

      We also note that we changed the title slightly, replacing “weighting” with “weighing” for consistency with our usage throughout the manuscript.

      Please see below for more details about this important point and our responses to the reviewers’ specific concerns. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants misestimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, it might be the case that the observed underestimates of the correlation magnitude correspond to the normative strategy.

      We thank the reviewer for raising this interesting point, which we now address directly with new analyses including model fits (pp. 15–24). These analyses show that the participants were computing correlation-dependent weights of evidence from observation distributions that reflected suboptimal misestimates of correlation magnitudes. This strategy is normative in the sense that it is the best that they can do, given the encoding suboptimality. However, as we note in the manuscript, we do not know the source of the encoding suboptimality (pp. 23–24). We thus do not know if there might be a strategy they could have used to make the encoding more optimal.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways:

      (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and

      (ii) by the correlation magnitude. Both vary pseudo-randomly across trials, and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g. Kiani & Shadlen (2009), Drugowitsch et al. (2012)), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear if the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.

      We appreciate this subtle, but important, point. We now clarify that the DDM we use includes degrees of freedom that are consistent with normative decision processes that rely on the imperfect knowledge that participants have about the generative process on each trial, specifically: 1) a single drift-rate parameter that is fit to data across different values of the mean of the generative distribution, which is based on the standard assumption for these kinds of task conditions in which stimulus strength is varied randomly from trial-to-trial and thus prevents the use of exact logLR (which would require stimulus strength-specific scale factors; Gold and Shadlen, 2001); 2) the use of a collapsing bound, which in certain cases (including our task) is thought to support a stimulus strength-dependent calibration of the decision variable to optimize decisions (Drugowitsch et al, 2012); and 3) free parameters (one per correlation) to account for subjective estimates of the correlation, which affected the encoding of the observations that are otherwise weighed in a normative manner in the best-fitting model.

      Also, to clarify our terminology, we define the objective evidence strength as the expected logLR in a given condition, which for our task is dependent on both the distance of the mean from the decision boundary and the correlation (p. 7). 
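
      The dependence of the expected logLR on both the mean and the correlation can be sketched as follows. The sketch assumes that each pair of observations is drawn from a bivariate Gaussian with means of either +mu or -mu, a common standard deviation sigma, and correlation rho; this is a reading of the task description rather than the manuscript's own code, and all names and numerical values are illustrative. Under that assumption the logLR of a pair is 2·mu·(x1 + x2) / (sigma²·(1 + rho)), so negative correlations increase the weight of each pair and positive correlations decrease it.

```python
import math


def pair_log_lr(x1, x2, mu, sigma, rho):
    """LogLR (mean +mu vs. mean -mu) for one pair of correlated Gaussian samples."""
    return 2.0 * mu * (x1 + x2) / (sigma ** 2 * (1.0 + rho))


def expected_log_lr(mu, sigma, rho):
    """Expected logLR per pair when the true mean is +mu."""
    return 4.0 * mu ** 2 / (sigma ** 2 * (1.0 + rho))


def matched_mu(mu0, rho):
    """Mean that keeps the expected logLR equal to the rho = 0 case."""
    return mu0 * math.sqrt(1.0 + rho)


# Illustrative values only; not the parameters used in the experiment.
mu0, sigma = 0.1, 1.0
for rho in (-0.6, 0.0, 0.6):
    mu = matched_mu(mu0, rho)
    print(f"rho = {rho:+.1f}: mu = {mu:.4f}, "
          f"expected logLR = {expected_log_lr(mu, sigma, rho):.4f}")
```

      With the mean adjusted in this way, the expected logLR per pair is the same at every correlation, which is why an observer who weighs evidence correctly should behave identically across correlation conditions.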

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      These points are now addressed directly in Results (pp. 23–24) and Figure 7 supplemental figures 1–3. Specifically, we show that, as the reviewer correctly surmised above, empirical correlations computed on each trial tended to be biased towards zero (Fig 7–figure supplement 1). However, two other analyses were not consistent with the idea that participants’ decisions were based on trial-by-trial estimates of the empirical correlations: 1) those with the shortest RTs did not have the most-biased estimates (Fig 7–figure supplement 2), and 2) there was no systematic relationship between objective and subjective fit correlations across participants (Fig 7–figure supplement 3).

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      We appreciate this valuable comment, and we agree that we previously neglected the potential impact of correlation misestimates on evidence strength. As we now clarify, the correlation enters these models in two ways: 1) via its effect on how the observations are encoded, which involves scaling both the drift and the bound; and 2) via its effect on evidence weighing, which involves scaling only the bound (pp. 15–18). We previously assumed that only the second form of scaling might involve a subjective (mis-)estimate of the correlation. We now examine several models that also include the possibility of either or both forms using subjective correlation estimates. We show that a model that assumes that the same subjective estimate drives both encoding and weighing (the “full-rho-hat” model) best accounts for the data. This model provides better fits (after accounting for differences in numbers of parameters) than models with: 1) no correlation-dependent adjustments (“base” model), 2) separate drift parameters for each correlation condition (“drift” model), 3) optimal (correlation-dependent) encoding but suboptimal weighing (“bound-rho-hat” model, which was our previous formulation), 4) suboptimal encoding and weighing (“scaled-rho-hat” model), and 5) optimal encoding but suboptimal weighing and separate correlation-dependent adjustments to the drift rate (“bound-rho-hat plus drift” model). We have substantially revised Figures 5–7 and the associated text to address these points.
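
      For readers less familiar with this model class, the following is a generic drift-diffusion simulation with an optional collapsing bound. It is a minimal sketch and not the fitting code used in the study: the exponential form of the collapse, the parameter names, and the parameter values are all assumptions made for illustration, and the correlation-dependent scaling of drift and bound described above is not reproduced here. In the models under discussion, such scaling would be applied to the `drift` and `bound` arguments before simulation.

```python
import math
import random


def simulate_ddm(drift, bound, collapse_tau=None, dt=0.005, max_t=10.0, rng=None):
    """Simulate one trial of a drift-diffusion process with unit noise.

    Returns (choice, rt), where choice is +1 (upper bound), -1 (lower bound),
    or 0 if no bound is reached before max_t.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    sqrt_dt = math.sqrt(dt)
    while t < max_t:
        # Euler-Maruyama step: deterministic drift plus Gaussian noise.
        x += drift * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        # Assumed collapse: the bound decays exponentially with time constant tau.
        b = bound if collapse_tau is None else bound * math.exp(-t / collapse_tau)
        if x >= b:
            return 1, t
        if x <= -b:
            return -1, t
    return 0, max_t


rng = random.Random(1)
trials = [simulate_ddm(drift=1.0, bound=1.0, collapse_tau=2.0, rng=rng)
          for _ in range(1000)]
p_upper = sum(choice == 1 for choice, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(f"P(upper) = {p_upper:.2f}, mean decision time = {mean_rt:.2f} s")
```

      In a sketch like this, lowering the bound shortens decision times at the cost of accuracy, while lowering the drift both slows decisions and reduces accuracy, which is why choices and reaction times together help separate the two kinds of correlation-dependent adjustment.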

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

      We now include new analyses in Fig. 7 that demonstrate how much participants' choices and RT deviate from: 1) an ideal observer using the objective correlations, and 2) an observer who failed to adjust for the fit subjective correlation when weighing the evidence (i.e., using the subjective correlation for encoding but a correlation of zero for weighing). We now indicate that participants’ performance was quite close to that predicted by the ideal observer (using the true, objective correlation) for many conditions. Thus, we agree that they might not have had the impetus to optimize the decision process further, assuming it were possible under these task conditions.

      Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for misestimation of the underlying correlations.

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. 

      We appreciate this valuable feedback. We have revised the text in several places to make these important points more clearly. For example, in the Introduction we now clarify that “The weight of evidence is computed as a scaled version of each observation (the scaling can be applied to the observations or to the bound, which are mathematically equivalent; Green and Swets, 1966) to form the logLR” (p. 3). We also provide more details and intuition in the Results section for how and why we implemented the DDM the way we did. In particular, we now emphasize that the correlation enters these models in two ways: 1) via its effect on encoding the observations, which scales both the drift and the bound; and 2) via its effect on evidence weighing, which scales only the bound (pp. 15–18).

      Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      We again appreciate this valuable feedback and have made a number of revisions to try to clarify these points. In addition to addressing the equivalence of scaling the evidence and the bound in the Introduction, we have added the following section to Results (Results, p.18):

      “Note that scaling the bound in these formulations follows conventions of the DDM, as detailed above, to facilitate interpretation of the parameters. These formulations also raise an apparent contradiction: the “predefined” bound is scaled by subjective estimates of the correlation, but the correlation was randomized from trial to trial and thus could not be known in advance. However, scaling the bound in these ways is mathematically equivalent to using a fixed bound on each trial and scaling the observations to approximate logLR (see Methods). This equivalence implies that in the brain, effectively scaling a “predefined” bound could occur when assigning a weight of evidence to the observations as they are presented.”

      We also note in Methods (pp. 40–41):

      “In the DDM, this scaling of the evidence is equivalent to assuming that the decision variable accumulates momentary evidence of the form (x1 + x2) and then dividing the bound height by the appropriate scale factor. An alternative approach would be to scale both the signal and noise components of the DDM by the scale factor. However, scaling the bound is both simpler and maintains the conventional interpretation of the DDM parameters in which the bound reflects the decision-related components of the evidence accumulation process, and the drift rate represents sensory-related components.”
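
      The equivalence described in the two quoted passages can be checked numerically. The sketch below is only an illustration under stated assumptions, not the authors' fitting code: it draws correlated Gaussian pairs, then compares (a) scaling each summed observation and using a fixed bound with (b) leaving the observations unscaled and dividing the bound by the same factor. The parameter values, the discrete-time accumulation, the non-collapsing bound, and the helper name `first_crossing` are all hypothetical.

```python
import numpy as np

def first_crossing(increments, bound):
    """Return (step index, choice sign) of the first bound crossing, or (None, 0)."""
    dv = np.cumsum(increments)                 # decision variable over time
    hits = np.nonzero(np.abs(dv) >= bound)[0]  # steps at or beyond either bound
    if hits.size == 0:
        return None, 0
    step = int(hits[0])
    return step, int(np.sign(dv[step]))

# Hypothetical generative parameters (not taken from the manuscript).
rng = np.random.default_rng(0)
rho, mu, sigma = -0.6, 0.05, 1.0
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
pairs = rng.multivariate_normal([mu, mu], cov, size=5000)
raw = pairs.sum(axis=1)                        # unscaled evidence x1 + x2

# Weight that converts x1 + x2 to logLR for equal-variance Gaussian pairs
# with means +mu versus -mu (an assumed generative model).
scale = 2 * mu / (sigma**2 * (1 + rho))
bound = 3.0                                    # fixed bound in logLR units

scaled_evidence = first_crossing(scale * raw, bound)   # scale the observations
scaled_bound = first_crossing(raw, bound / scale)      # scale the bound instead

print(scaled_evidence, scaled_bound)
```

      Both formulations terminate on the same sample with the same choice, which is the sense in which scaling a “predefined” bound can be implemented as a weight applied to the observations as they arrive.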

      We believe we provide strong evidence that participants adjust their evidence weighing to account for the correlations (see response below), but we remain agnostic as to how exactly this weighing is implemented in the brain.

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p.20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions - but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a posthoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations arise from aspects of the task, which are not mutually exclusive. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). 
There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex) which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed; and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

      We thank the reviewer for this thoughtful and valuable feedback. We have thoroughly updated the modeling section to include new analysis and clearer descriptions and interpretations of our findings (including Figs. 5–7 and additional references to the Usher, Luyckx, and other studies that identified decision suboptimalities). The comment about “an additional computational step” in converting the observations to evidence was particularly useful, in that it made us realize that we were making what we now consider to be a faulty assumption in our version of the DDM. Specifically, we assumed that subjective misestimates of the correlation affected how observations were converted to evidence (logLR) to form the decision (implemented as a scaling of the bound height), but we neglected to consider how suboptimalities in encoding the observations could also lead to misestimates of the correlation. We have retained the previous best-fitting models in the text, for comparison (the “bound-rho-hat” and “bound-rho-hat + drift” models). In addition, we now include a “full-rho-hat” model that assumes that misestimates of rho affect both the encoding of the observations, which affects the drift rate and bound height, and the weighing of the evidence, which affects only the bound height. This was the best-fitting model for most participants (after accounting for different numbers of parameters associated with the different models we tested). Note that the full-rho-hat model predicts the lack of correlation-dependent choice effects and the substantial correlation-dependent RT effects that we observed, without requiring any additional adjustments to the drift rate (as we resorted to previously).

      In summary, we believe that we now have a much more parsimonious account of our data, in terms of a model in which subjective estimates of the correlation are alone able to account for our patterns of choice and RT data. We fully agree that more work is needed to better understand the source of these misestimates but also think those questions are outside the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A few minor comments:

      (1) Evidence can be correlated in multiple ways. It could be correlated within individual pieces of evidence in a sequence, or across elements in that sequence (e.g., across time). This distinction is important, as it determines how evidence ought to be accumulated across time. In particular, if evidence is correlated across time, simply summing it up might be the wrong thing to do. Thus, it would be beneficial to make this distinction in the Introduction, and to mention that this paper is only concerned with the first type of correlation.

      We now clarify this point in the Introduction (pp. 5–6).

      (2) It is unclear without reading the Methods how the blue dashed line in Figure 4c is generated. To my understanding, it is a prediction of the naive DDM model. Is this correct?

      We now specify the models used to make the predictions shown in Fig. 4c (which now includes an additional model that uses unscaled observations as evidence).

      (3) In Methods, given the importance of the distribution of x1 + x2, it would be useful to write it out explicitly, e.g., x1 + x2 ~ N(2 mu_g, ..), specifying its mean and its variance.

      Excellent suggestion, added to p. 38.
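
      For readers without the revised Methods to hand, the distribution requested in comment (3) can be written out under a simple assumed setup: each observation is Gaussian with mean mu_g and a common variance, and the two observations in a pair have correlation rho. The symbols sigma and rho below are assumptions for illustration, and the manuscript's own notation may differ. The second line gives the corresponding log-likelihood ratio for sources with means +mu_g versus -mu_g.

```latex
\begin{aligned}
x_1 + x_2 &\sim \mathcal{N}\!\big(2\mu_g,\; 2\sigma^2(1+\rho)\big), \\
\log \mathrm{LR} &= \frac{2\mu_g}{\sigma^2(1+\rho)}\,(x_1 + x_2).
\end{aligned}
```

      The factor 1/(1 + rho) shows why negative correlations call for stronger weighting of each pair and positive correlations for weaker weighting.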

      (4) From Methods and the caption of Figure 6 - Supplement 1 it becomes clear that the fitted DDM features a bound that collapses over time. I think that this should also be mentioned in the main text, as it is a not-too-unimportant feature of the model.

      Excellent suggestion, added to p. 15, with reference to Fig. 6-supplement 1 on p. 20.

      (5) The functional form of the bound is 2 (B - tb t). To my understanding, the effective B changes as a function of the correlation magnitude. Does tb as well? If not, wouldn't it be better if it does, to ensure that 2 (B - tb t) = 0 independent of the correlation magnitude?

      In our initial modeling, we also considered whether the correlation-dependent adjustment, which is a function of both correlation sign and magnitude, should be applied to the initial bound or to the instantaneous bound (i.e., after collapse, affecting tb as well). In a pilot analysis of data from 22 participants in the 0.6 correlation-magnitude group, we found that this choice had a negligible effect on the goodness-of-fit (deltaAIC = -0.9, protected exceedance probability = 0.63, in favor of the instantaneous bound scaling). We therefore used the instantaneous bound version in the analyses reported in the manuscript but doubt this choice was critical based on these results. We have clarified our implementation of the bound in Methods (p. 43–44).
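
      The two options compared in this reply can be made concrete with a short sketch. It assumes a linearly collapsing bound of total height 2 (B - tb t) and a multiplicative correlation-dependent factor `k`. The function names, the floor at zero, and the direction in which `k` is applied are illustrative assumptions, not the authors' implementation.

```python
def bound_height(t, B, tb, k=1.0, scale_instantaneous=True):
    """Total bound separation at time t for a linearly collapsing bound.

    scale_instantaneous=True applies k to the whole bound, 2*k*(B - tb*t),
    so the collapse rate is scaled as well. False applies k to the initial
    bound only, 2*(k*B - tb*t). The height is floored at zero.
    """
    if scale_instantaneous:
        height = 2.0 * k * (B - tb * t)
    else:
        height = 2.0 * (k * B - tb * t)
    return max(height, 0.0)

def collapse_time(B, tb, k=1.0, scale_instantaneous=True):
    """Time at which the bound reaches zero under each scaling option."""
    return B / tb if scale_instantaneous else k * B / tb

# Hypothetical values: with instantaneous scaling the bound reaches zero at
# the same time for every k, which is the property the reviewer asked about.
for k in (0.5, 1.0, 2.0):
    print(k, collapse_time(1.0, 0.5, k), collapse_time(1.0, 0.5, k, False))
```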

      Reviewer #2 (Recommendations for the authors):

      In addition to the points raised above, I have some minor suggestions/open questions that arose from my reading of the manuscript:

      (1) Are the predictions outlined in the paper specific to cases where the two sources are symmetric around zero? If distributions are allowed to be asymmetric then one can imagine cases (i.e. when distribution means are sufficiently offset from one another) where positive correlations can increase evidence strength and negative correlations decrease evidence strength. There's absolutely still value and much elegance in what the authors are showing with this work, but if my intuition is correct, it should ideally be acknowledged that the predictions are restricted to a specific set of generative circumstances.

      We agree that there are a lot of ways to manipulate correlations and their effect on the weight of evidence. At the end of the Discussion, we emphasize that our results apply to this particular form of correlation (p. 32).

      (2) Isn't Figure 4C misleading in the sense that it collapses across the asymmetry in the effect of negative vs positive correlations on RT, which is clearly there in the data and which simply adjusting the correlation-dependent scale factor will not reproduce?

      We agree that this analysis does not address any asymmetries in suboptimal estimates of positive versus negative correlations. We believe that those effects are much better addressed using the model fitting, which we present later in the Results section. We have now simplified the analyses in Fig. 4c, reporting the difference in RT between positive and negative correlation conditions instead of a linear regression.

      (3) I found the transition on p.17 of the Results section from the scaling of drift rate by correlation to scaling of bound height to be quite abrupt and unclear. I suspect that many readers coming from a typical DDM modelling background will be operating under the assumption that drift rate and bound height are independent, and I think more could be done here to explain why scaling one parameter by correlation in the present case is in fact directly equivalent to scaling the other.

      Thank you for the very useful feedback. We have substantially revised this text to make these points more clearly.

      (4) P.3, typo: Alan *Turing*

      That’s embarrassing. Fixed.

      (5) P.27, typo: "participants adopt a *fixed* bound"

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The manuscript suggests the zebrafish homolog of ctla-4 and generates a new mutant in it. However, the locus that is mutated is confusingly annotated as both CD28 (current main annotation in ZFIN) and CTLA-4/CD152 (one publication from 2020), see: https://zfin.org/ZDB-GENE-070912-128. Both human CTLA-4 and CD28 align with relatively similar scores to this gene. There seem to be other orthologs of these receptors in the zebrafish genome, including CD28-like (https://zfin.org/ZDB-GENE-070912-309) which neighbors the gene annotated as CD28 (exhibiting similar synteny as human CD28 and CTLA-4). It would be helpful to provide more information to distinguish between this family of genes and to further strengthen the evidence that this mutant is in ctla-4, not cd28. Also, is one of these genes in the zebrafish genome (e.g. cd28l) potentially a second homolog of CTLA-4? Is this why this mutant is viable in zebrafish and not mammals? Some suggestions:

      (a) A more extensive sequence alignment that considers both CTLA-4 and CD28, potentially identifying the best homolog of each human gene, especially taking into account any regions that are known to produce the functional differences between these receptors in mammals and effectively assigns identities to the two genes annotated as "cd28" and "cd28l" as well as the gene "si:dkey-1H24.6" that your CD28 ORF primers seem to bind to in zebrafish.

      In response to the reviewer's insightful suggestions, we have conducted more extensive sequence alignment and phylogenetic analyses that consider CTLA-4, CD28, and CD28-like molecules, taking into account key regions crucial for the functionalities and functional differences between these molecules across various species, including mammals and zebrafish.

      Identification of zebrafish Ctla-4: We identified zebrafish Ctla-4 as a homolog of mammalian CTLA-4 based on key conserved structural and functional characteristics. Structurally, the Ctla-4 gene shares similar exon organization compared to mammalian CTLA-4. Ctla-4 is a type I transmembrane protein with typical immunoglobulin superfamily features. Multiple amino acid sequence alignments revealed that Ctla-4 contains a <sup>113</sup>LFPPPY<sup>118</sup> motif and a <sup>123</sup>GNGT<sup>126</sup> motif in the ectodomain, and a tyrosine-based <sup>206</sup>YVKF<sup>209</sup> motif in the distal C-terminal region. These motifs closely resemble MYPPPY, GNGT, and YVKM motifs in mammalian CTLA-4s, which are essential for binding to CD80/CD86 ligands and molecular internalization and signaling inhibition. Despite only 23.7% sequence identity to human CTLA-4, zebrafish Ctla-4 exhibits a similar tertiary structure with a two-layer β-sandwich architecture in its extracellular IgV-like domain. Four cysteine residues responsible for the formation of two pairs of disulfide bonds (Cys<sup>20</sup>-Cys<sup>91</sup>/Cys<sup>46</sup>-Cys<sup>65</sup> in zebrafish and Cys<sup>21</sup>-Cys<sup>92</sup>/Cys<sup>48</sup>-Cys<sup>66</sup> in humans) that connect the two-layer β-sandwich are conserved. Additionally, a separate cysteine residue (Cys<sup>120</sup> in zebrafish and Cys<sup>120</sup> in humans) involved in dimerization is also present, and Western blot analysis under reducing and non-reducing conditions confirmed Ctla-4’s dimerization. Phylogenetically, Ctla-4 clusters with other known CTLA-4 homologs from different species with high bootstrap probability, while zebrafish Cd28 groups separately with other CD28s. Functionally, Ctla-4 is predominantly expressed on CD4<sup>+</sup> T and CD8<sup>+</sup> T cells in zebrafish. 
It plays a pivotal inhibitory role in T cell activation by competing with CD28 for binding to CD80/86, as validated through a series of both in vitro and in vivo assays, including microscale thermophoresis assays which demonstrated that Ctla-4 exhibits a significantly higher affinity for Cd80/86 than Cd28 (KD = 0.50 ± 0.25 μM vs. KD = 2.64 ± 0.45 μM). These findings confirm Ctla-4 as an immune checkpoint molecule, reinforcing its identification within the CTLA-4 family.

      Comparison between zebrafish Cd28 and "Cd28l": Zebrafish Cd28 contains an extracellular SYPPPF motif and an intracellular FYIQ motif. The extracellular SYPPPF motif is essential for binding to Cd80/CD86, while the intracellular FYIQ motif likely mediates kinase recruitment and co-stimulatory signaling. In contrast, the "Cd28l" molecule lacks the SYPPPF motif, which is critical for Cd80/CD86 binding, and exhibits strong similarity in its C-terminal 79 amino acids to Ctla-4 rather than Cd28. Consequently, "Cd28l" resembles an atypical Ctla-4-like molecule but fails to exhibit Cd80/CD86 binding activity.

      We have incorporated the relevant analysis results into the main text of the revised manuscript and updated Supplementary Figure 1. Additionally, we provide key supplementary analyses here for the reviewer's convenience.  

      Author response image 1.

      Alignment of Ctla-4 (XP_005167576.1) and Ctla-4-like (XP_005167567.1, previously referred to as "Cd28l") in zebrafish, generated using ClustalX and Jalview. Conserved and partially conserved amino acid residues are highlighted in color gradients ranging from carnation to red, respectively. The B7-binding motif is boxed in red.

      (b) Clearer description in the main text of such an analysis to better establish that the mutated gene is a homolog of ctla-4, NOT cd28.

      We appreciate the reviewer's advice. Additional confirmation of zebrafish Ctla-4 is detailed in lines 119-126 of the revised manuscript.

      (c) Are there mammalian anti-ctla-4 and/or anti-cd28 antibodies that are expected to bind to these zebrafish proteins? If so, looking to see whether staining is lost (or western blotting is lost) in your mutants could be additionally informative. (Our understanding is that your mouse anti-Ctla-4 antibody is raised against recombinant protein generated from this same locus, and so is an elegant demonstration that your mutant eliminates the production of the protein, but unfortunately does not contribute additional information to help establish its homology to mammalian proteins).

      This suggestion holds significant value. However, a major challenge in fish immunology research is the limited availability of antibodies suitable for use in fish species; antibodies developed for mammals are generally not applicable. We attempted to use human and mouse anti-CTLA-4 and anti-CD28 antibodies to identify Ctla-4 and Cd28 in zebrafish, but the results were inconclusive, with no expected signals. This outcome likely arises from the low sequence identity between human/mouse CTLA-4 and CD28 and their zebrafish homologs (ranging from 21.3% to 23.7% for CTLA-4 and 21.2% to 24.0% for CD28). Therefore, developing specific antibodies against zebrafish Ctla-4 is essential for advancing this research.

      The methods section is generally insufficient and doesn't describe many of the experiments performed in this manuscript. Some examples:

      (a) No description of antibodies used for staining or Western blots (Figure1C, 1D, 1F).

      (b) No description of immunofluorescence protocol (Figure 1D, 1F).

      (c) No description of Western blot protocol (Figure 1C, 2C).

      (d) No description of electron microscopy approach (Figure 2K).

      (e) No description of the approach for determining microbial diversity (Entirety of Figure 6).

      (f) No description of PHA/CFSE/Flow experiments (Figure 7A-E).

      (g) No description of AlphaFold approach (Figures 7F-G).

      (h) No description of co-IP approach (Figure 7H).

      (i) No description of MST assay or experiment (Figure 7I).

      (j) No description of purification of recombinant proteins, generation of anti-Ctla-4 antibody, or molecular interaction assays (Figures S2 and S6).

      We apologize for this oversight. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been addressed in the revised manuscript. We appreciate your understanding.

      Figure 5 suggests that there are more Th2 cells 1, Th2 cells 2, and NKT cells in ctla-4 mutants through scRNA-seq. However, as the cell numbers for these are low in both genotypes, there is only a single replicate for each genotype scRNA-seq experiment, and dissociation stress can skew cell-type proportions, this finding would be much more convincing if another method that does not depend on dissociation was used to verify these results. Furthermore, while Th2 cells 2 are almost absent in WT scRNA-seq, KEGG analysis suggests that a major contributor to their clustering may be ribosomal genes (Fig. 5I). Since no batch correction was described in the methods, it would be beneficial to verify the presence of this cluster in ctla-4 mutants and WT animals through other means, such as in situ hybridization or transgenic lines.   

      We are grateful for the insightful comments provided by the reviewer. Given that research on T cell subpopulations in fish is still in its nascent stages, the availability of specific marker antibodies and relevant transgenic strains remains limited. Our single-cell RNA sequencing (scRNA-seq) analysis revealed that a distinct Th2 subset 2 was predominantly observed in Ctla-4 mutants but was rare in wild-type zebrafish, which suggests that this subset may primarily arise under pathological conditions associated with Ctla-4 mutation. Due to the near absence of Th2 subset 2 in wild-type samples, KEGG enrichment analysis was performed exclusively on this subset from Ctla-4-deficient intestines. The ribosome pathway was significantly enriched, suggesting that these cells may be activated to fulfill their effector functions. However, confirming the presence of Th2 subset 2 using in situ hybridization or transgenic zebrafish lines is currently challenging due to the lack of lineage-specific markers for detailed classification of Th2 cell subsets and the preliminary nature of scRNA-seq predictions.

      To address the reviewer's suggestion to confirm compositional changes in Th2 and NKT cells using dissociation-independent methods, we quantified mRNA levels of Th2 (il4, il13, and gata3) and NKT (nkl.2, nkl.4, and prf1.1) cell marker genes via RT-qPCR in intestines from wild-type and mutant zebrafish. As shown in Figures S7B and S7C, these markers were significantly upregulated in Ctla-4-deficient intestines compared to wild-type controls. This indicates an overall increase in Th2 and NKT cell activity in mutant zebrafish, which aligns with our scRNA-seq analysis and supports the validity of our initial findings.

      Before analyzing the scRNA-seq data, we performed batch correction using the Harmony algorithm via cloud-based Cumulus v1.0 on the aggregated gene-count matrices. This methodological detail has been included in the “Materials and Methods” section of the revised manuscript. Moreover, the RT-qPCR results are presented in Supplementary Figures S7B and S7C.

      Quality control (e.g., no. of UMIs, no. of genes, etc.) metrics of the scRNAseq experiments should be presented in the supplementary information for each sample to help support that observed differential expression is not merely an outcome of different sequencing depths of the two samples.

      As illustrated in Fig. S5, the quality control data have been supplemented to include the effective cell number of the sample, along with pre- and post-filtering metrics such as nFeature_RNA, nCount_RNA and mitochondrial percentage (percent.mito). Furthermore, scatter plots comparing the basic information of the sample cells before and after filtering are provided.
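
      The per-cell metrics named here can be illustrated with a minimal sketch. It is not the authors' pipeline: it recomputes quantities analogous to nFeature_RNA, nCount_RNA and percent.mito from a small genes-by-cells count matrix using NumPy. The function name `qc_metrics`, the placeholder gene names, the toy counts, and the `mt-` prefix used to flag mitochondrial genes are all assumptions made for illustration.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Per-cell QC metrics from a genes x cells UMI count matrix.

    Returns (n_feature, n_count, percent_mito):
      n_feature    number of genes with at least one count in the cell
      n_count      total UMI counts in the cell
      percent_mito percentage of counts from genes whose name starts
                   with mito_prefix (assumed naming convention)
    """
    counts = np.asarray(counts)
    n_count = counts.sum(axis=0)
    n_feature = (counts > 0).sum(axis=0)
    is_mito = np.array([g.lower().startswith(mito_prefix) for g in gene_names])
    mito = counts[is_mito].sum(axis=0)
    # Guard against division by zero for empty cells.
    percent_mito = np.where(n_count > 0, 100.0 * mito / np.maximum(n_count, 1), 0.0)
    return n_feature, n_count, percent_mito

# Toy data: 3 placeholder genes (rows) x 2 cells (columns).
genes = ["geneA", "geneB", "mt-geneC"]
toy_counts = [[5, 0],
              [0, 3],
              [5, 1]]
n_feature, n_count, percent_mito = qc_metrics(toy_counts, genes)
print(n_feature, n_count, percent_mito)
```

      Comparing these distributions between the two samples before and after filtering is one way to judge whether differences in sequencing depth could account for apparent differential expression.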

      Some references to prior research lack citations. Examples:

      (a) "Given that Ctla-4 is primarily expressed on T cells (Figure 1E-F), and its absence has been shown to result in intestinal immune dysregulation, indicating a crucial role of this molecule as a conserved immune checkpoint in T cell inhibition."

      The references were incorporated into line 71 of the revised manuscript.

      (b) Line 83: Cite evidence/review for the high degree of conservation in adaptive immunity.

      The references were incorporated into line 93 of the revised manuscript.

      (c) Lines 100-102: Cite the evidence that MYPPPY is a CD80/86 binding motif.

      The references were incorporated into line 117 of the revised manuscript.

      The text associated with Figure 8 (Lines 280-289) does not clearly state that rescue experiments are being done in mutant zebrafish.

      We have provided a clear explanation of the rescue experiments conducted in Ctla-4-deficient zebrafish. This revision has been incorporated into line 319.

      Line 102: Is there evidence from other animals that LFPPPY can function as a binding site for CD80/CD86? Does CD28 also have this same motif?

      The extracellular domains of CTLA-4 and CD28, which bind to CD80/CD86, are largely conserved across various species. This conservation is exemplified by a central PPP core motif, although the flanking amino acids exhibit slight variations. In mammals, both CTLA-4 and CD28 feature the conserved MYPPPY motif. By contrast, in teleost fish, such as rainbow trout, CTLA-4 contains an LYPPPY motif, while CD28 has an MYPPPI motif (Ref. 1). Grass carp CTLA-4 displays an LFPPPY motif, whereas its CD28 variant bears an IYPPPF motif. Yeast two-hybrid assays confirm that these motifs facilitate interactions between grass carp CTLA-4 and CD28 with CD80/CD86 (Ref. 2). Similarly, zebrafish Ctla-4 contains the LFPPPY motif observed in grass carp, while Cd28 exhibits a closely related SYPPPF motif.

      References:

      (1) Bernard, D et al. (2006) Costimulatory Receptors in a Teleost Fish: Typical CD28, Elusive CTLA-4. J Immunol. 176: 4191-4200.

      (2) Lu T Z et al. (2022) Molecular and Functional Analyses of the Primordial Costimulatory Molecule CD80/86 and Its Receptors CD28 and CD152 (CTLA-4) in a Teleost Fish. Frontiers in Immunology. 13:885005.

      Line 110-111: Suggest adding citation of these previously published scRNAseq data to the main text in addition to the current description in the Figure legend.

      The reference has been added in line 129 in the main text.

      Figure 3B: It would be helpful to label a few of the top differentially expressed genes in Panel B?

      The top differentially expressed genes have been labeled in Figure 3B.

      Figure 3G: It's unclear how this analysis was conducted, what this figure is supposed to demonstrate, and in its current form it is illegible.

      Figure 3G displays a protein-protein interaction network constructed from differentially expressed genes. The densely connected nodes, representing physical interactions among proteins, provide valuable insights for basic scientific inquiry and biological or biomedical applications. As proteins are crucial to diverse biological functions, their interactions illuminate the molecular and cellular mechanisms that govern both healthy and diseased states in organisms. Consequently, these networks facilitate the understanding of pathogenic and physiological processes involved in disease onset and progression.

      To construct this network, we first utilized the STRING database (https://string-db.org) to generate an initial network diagram using the differentially expressed genes. This diagram was subsequently imported into Cytoscape (version 3.9.1) for visualization and further analysis. Node size and color intensity reflect the density of interactions, indicating the relative importance of each protein. Figure 3G illustrates that IL1β was a central cytokine hub in the disease process of intestinal inflammation in Ctla-4-deficient zebrafish.
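
      As a rough sketch of how "density of interactions" can be turned into a node ranking, the snippet below counts the degree of each node in an edge list and picks the most connected one. The edge list and node names are hypothetical placeholders rather than the STRING output, and Cytoscape may map node size from a different network statistic than plain degree.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many interactions each node takes part in (undirected edges)."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Toy network with placeholder names; not derived from the study's data.
edges = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("p1", "p2")]

degrees = node_degrees(edges)
top_node, top_degree = degrees.most_common(1)[0]
print(top_node, top_degree)
```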

      Expression scale labeling:

      (a) Most gene expression scales are not clearly labeled: do they represent mean expression or scaled expression? Has the expression been log-transformed, and if so, which log (natural log? Log10? Log2?). See: Figure 3E, 3I, 4D, 4E, 5B, 5G, 5H, 6I.

      The gene expression scales are detailed in the figure legends. Specifically, Figures 3E, 3I, and 6I present heatmaps depicting row-scaled expression levels for the corresponding genes. In contrast, Figures 4D and 4E display heatmaps illustrating the mean expression of these genes. Additionally, the dot plots in Figures 5B, 5G, and 5H visualize the mean expression levels of the respective genes.
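
      To illustrate the difference between row-scaled and mean expression, the sketch below standardizes each gene (row) across groups. It assumes that "row-scaled" means a per-gene z-score using the sample standard deviation, which is a common heatmap convention but is not stated in the response; the gene names and values are placeholders.

```python
from statistics import mean, stdev


def row_scale(values):
    """Z-score one gene's expression across groups (assumed scaling method)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [(v - m) / s for v in values]


# Placeholder mean-expression values for two genes across three groups.
mean_expression = {
    "gene_x": [1.0, 2.0, 3.0],
    "gene_y": [100.0, 200.0, 300.0],
}

row_scaled = {gene: row_scale(vals) for gene, vals in mean_expression.items()}
print(row_scaled)
```

      After row scaling, both placeholder genes show the same pattern even though their mean expression differs by two orders of magnitude, which is why the two kinds of scale need to be labeled separately.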

      (b) For some plots, diverging color schemes (i.e. with white/yellow in the middle) are used for non-diverging scales and would be better represented with a sequential color scale. See: 4D, 4E, and potentially others (not fully clear because of the previous point).

      The color schemes in Figures 4D and 4E have been updated to a sequential color scale. The gene expression data depicted in these figures represent mean expression values and have not undergone log transformation. This information has been incorporated into the figure legend for clarity.

      Lines 186-187: Though it is merely suggested, apoptotic gene expression can be upregulated as part of the dissociation process for single-cell RNAseq. This would be much stronger if supported by a staining, such as anti-Caspase 3.

      Following the reviewer's insightful recommendations, we conducted a TUNEL assay to evaluate apoptosis in the posterior intestinal epithelial cells of both wild-type and Ctla-4-deficient zebrafish. As expected, our results demonstrate a significant increase in epithelial cell apoptosis in Ctla-4-deficient zebrafish compared with wild-type fish. The corresponding data are presented in Figure S6D and have been incorporated into the manuscript. Detailed protocols for the TUNEL assay have also been included in the Materials and Methods section.

      Author response image 2.

      Quantification of TUNEL-positive cells per 1 × 10<sup>4</sup> μm<sup>2</sup> in the posterior intestines of wild-type (WT) and ctla-4<sup>⁻/⁻</sup> zebrafish (n = 5), comparing apoptotic cell density between the two genotypes.
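
      The normalization used in this legend (cells per 1 × 10<sup>4</sup> μm<sup>2</sup>) amounts to a simple density calculation, sketched below. The function name and the example count and area are hypothetical and are not measurements from the study.

```python
def cells_per_1e4_um2(positive_cells, area_um2):
    """Convert a raw cell count in a measured area to cells per 1 x 10^4 square micrometers."""
    if area_um2 <= 0:
        raise ValueError("area_um2 must be positive")
    return positive_cells * 1e4 / area_um2


# Hypothetical example: 12 TUNEL-positive cells counted in a 40,000 um^2 field.
density = cells_per_1e4_um2(12, 40_000)
print(density)
```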

      Lines 248-251: This manuscript demonstrates gut inflammation and also changes in microbial diversity, but I don't think it demonstrates an association between them, which would require an experiment that for instance rescues one of these changes and shows that it ameliorates the other change, despite still being a ctla-4 mutant.

      We appreciate the valuable comments from the reviewer. Recently, the relationship between inflammatory bowel disease (IBD) and gut microbial diversity has garnered considerable attention, with several key findings emerging from human IBD studies. For instance, patients with IBD (including ulcerative colitis and Crohn's disease) exhibit reduced microbial diversity, which is correlated with disease severity. This decrease in microbial richness is thought to stem from the loss of normal anaerobic bacteria, such as Bacteroides, Eubacterium, and Lactobacillus (Refs. 1-6). Research using mouse models has shown that inflammation increases oxygen and nitrate levels within the intestinal lumen, along with elevated host-derived electron acceptors, thereby promoting anaerobic respiration and overgrowth of Enterobacteriaceae (Ref. 7). Consistent with these findings in mice, our study observed a significant enrichment of Enterobacteriaceae in the inflamed intestines of Ctla-4-deficient zebrafish. Despite this progress, the zebrafish model for intestinal inflammation remains under development, with limitations in available techniques for manipulating intestinal inflammation and reconstructing gut microbiota. These challenges hinder investigations into the association between intestinal inflammation and changes in microbial diversity. We plan to address these issues through ongoing technological advancements and further research. We thank the reviewer for their understanding.

      References:

      (1) Ott S J, Musfeldt M, Wenderoth D F, Hampe J, Brant O, Fölsch U R et al. (2004) Reduction in diversity of the colonic mucosa associated bacterial microflora in patients with active inflammatory bowel disease. Gut 53:685-693.

      (2) Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E, Frangeul L et al. (2006) Reduced diversity of faecal microbiota in Crohn's disease revealed by a metagenomic approach. Gut 55:205-211.

      (3) Qin J J, Li R Q, Raes J, Arumugam M, Burgdorf K S, Manichanh C et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59-U70.

      (4) Sha S M, Xu B, Wang X, Zhang Y G, Wang H H, Kong X Y et al. (2013) The biodiversity and composition of the dominant fecal microbiota in patients with inflammatory bowel disease. Diagn Micr Infec Dis 75:245-251.

      (5) Ray K. (2015) IBD. Gut microbiota in IBD goes viral. Nat Rev Gastroenterol Hepatol 12:122.

      (6) Papa E, Docktor M, Smillie C, Weber S, Preheim S P, Gevers D et al. (2012) Non-Invasive Mapping of the Gastrointestinal Microbiota Identifies Children with Inflammatory Bowel Disease. Plos One 7: e39242-39254.

      (7) Hughes E R, Winter M G, Duerkop B A, Spiga L, de Carvalho T F, Zhu W H et al. (2017) Microbial Respiration and Formate Oxidation as Metabolic Signatures of Inflammation-Associated Dysbiosis. Cell Host Microbe 21:208-219.

      Lines 270-272 say that interaction between Cd28/ctla-4 and Cd80/86 was demonstrated through bioinformatics, flow-cytometry, and Co-IP. Does this need to reference Fig S6D for the flow data? Figures 7F-G are very hard to read or comprehend as they are very small. Figure 7H is the most compelling evidence of this interaction and might stand out better if emphasized with a sentence referencing it on its own in the manuscript. 

      In this study, we utilized an integrated approach combining bioinformatics prediction, flow cytometry, and co-immunoprecipitation (Co-IP) to comprehensively investigate and validate the interactions between Cd28/Ctla-4 and Cd80/86. Flow cytometry analysis, as depicted in Supplementary Figure 6D (revised as Supplementary Figure 8F), demonstrated the surface expression of Cd80/86 on HEK293T cells and quantified their interactions with Cd28 and Ctla-4. These experiments not only validated the interactions between Cd80/86 and Cd28/Ctla-4 but also revealed a dose-dependent relationship, providing robust supplementary evidence for the molecular interactions under investigation. Furthermore, in Figure 7F-G, the axis font sizes were enlarged to improve readability. Additionally, in response to reviewers' feedback, we have emphasized Figure 7H, which presents the most compelling evidence for molecular interactions, by including a standalone sentence in the text to enhance its prominence.

      For Figure 7A-E, for non-immunologists, it is unclear what experiment was performed here - it would be helpful to add a 1-sentence summary of the assay to the main text or figure legend.

      We apologize for this oversight. Figures 7A–E illustrate the functional assessment of the inhibitory role of Ctla-4 in Cd80/86 and Cd28-mediated T cell activation. A detailed description of the methodologies associated with Figures 7A–E is provided in the ‘Materials and Methods’ section of the revised manuscript.

      For Figure 7F-G, it is extremely hard to read the heat map legends and the X and Y-axis. Also, what the heatmaps show and how that fits the overall narrative can be elaborated significantly.

      We regret this oversight. To enhance clarity, we have increased the font size of the heatmap legends and the X and Y-axes, as shown in the following figure. Additionally, a detailed analysis of these figures is provided in lines 299–306 of the main text.

      In general, the main text that accompanies Figure 7 should be expanded to more clearly describe these experiments/analyses and their results.

      We have conducted a detailed analysis of the experiments and results presented in Figure 7. This analysis is described in lines 278-314.

      Reviewer #2:

      The scRNASeq assay is missing some basic characterization: how many WT and mutant fish were assayed in the experiment? how many WT and mutant cells were subject to sequencing? Before going to the immune cell types, are intestinal cell types comparable between the two conditions? Are there specific regions in the tSNE plot in Figure 4A abundant of WT or ctla-4 mutant cells?

      In the experiment, we analyzed 30 wild-type and 30 mutant zebrafish for scRNA-seq, with an initial dataset comprising 8,047 cells in the wild-type group and 8,321 cells in the mutant group. Sample preparation details are provided on lines 620-652. Due to the relatively high expression of mitochondrial genes in intestinal tissue, quality control filtering yielded 3,263 cells in the wild-type group and 4,276 cells in the mutant group. Given that the intestinal tissues were dissociated using identical protocols, the resulting cell types are comparable between the two conditions. Both the wild-type and Ctla-4-deficient groups contained enterocytes, enteroendocrine cells, smooth muscle cells, neutrophils, macrophages, B cells, and a cluster of T/NK/ILC-like cells. Notably, no distinct regions were enriched for either condition in the tSNE plot (Figure 4A).
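
      The effect of quality control on the dataset can be checked with a short calculation. The cell counts below are the ones reported in the response; the function and variable names are illustrative only.

```python
def retained_fraction(cells_before, cells_after):
    """Fraction of cells that remain after quality-control filtering."""
    return cells_after / cells_before


# (cells before filtering, cells after filtering), as reported in the response.
counts = {
    "wild-type": (8047, 3263),
    "ctla-4 mutant": (8321, 4276),
}

retention = {group: retained_fraction(before, after)
             for group, (before, after) in counts.items()}

for group, fraction in retention.items():
    print(f"{group}: {fraction:.1%} of cells retained")
```

      Under these numbers, roughly 40.5% of wild-type cells and 51.4% of mutant cells pass filtering.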

      The cell proliferation experiment using PHA stimulation assay demonstrated the role of Ctla-4 in cell proliferation, while the transcriptomic evidence points towards activation rather than an overall expansion of T-cell numbers. This should be discussed towards a more comprehensive model of how subtypes of cells can be differentially proliferating in the disease model.

      In the PHA-stimulated T cell proliferation assay, we aimed to investigate the regulatory roles of Ctla-4, Cd28, and Cd80/86 in T cell activation, focusing on validating Ctla-4's inhibitory function as an immune checkpoint. While our study examined general regulatory mechanisms, it did not specifically address the distinct roles of Ctla-4 in different T cell subsets. We appreciate the reviewer's suggestion to develop a more comprehensive model that elucidates differential T cell activation across various subsets in disease models. However, due to the nascent stage of research on fish T cell subsets and limitations in lineage-specific antibodies and transgenic strains, such investigations are currently challenging. We plan to pursue these studies in the future. Despite these constraints, our single-cell RNA sequencing data revealed an increased proportion of Th2 subset cells in Ctla-4-deficient zebrafish, and RT-qPCR confirmed elevated expression of the Th2 markers Il4, Il13, and Gata3 (see Figure S7B). Notably, studies in mouse models have shown that naïve T cells from CTLA-4-deficient mice tend to differentiate into Th2 cells post-proliferation, with activated Th2 cells secreting higher levels of cytokines like IL-4, IL-5, and IL-13, thereby exerting their effector functions (Refs. 1-2). Consequently, our findings align with observations in mice, suggesting conserved CTLA-4 functions across species. We have expanded the "Discussion" section to clarify these points.

      References:

      (1) Bour-Jordan H, Grogan J L, Tang Q Z, Auger J A, Locksley R M, Bluestone J A et al. (2003) CTLA-4 regulates the requirement for cytokine-induced signals in T<sub>H</sub>2 lineage commitment. Nature Immunology 4: 182-188.

      (2) Khattri Roli, Auger, Julie A, Griffin Matthew D, Sharpe Arlene H, Bluestone Jeffrey A et al. (1999) Lymphoproliferative Disorder in CTLA-4 Knockout Mice Is Characterized by CD28-Regulated Activation of Th2 Responses. The Journal of Immunology 162:5784-5791.

      It would be nice if the authors could also demonstrate whether other tissues in the zebrafish have an inflammation response, to show whether the model is specific to IBD.

      In addition to intestinal tissues, we also performed histological analysis on the liver of Ctla-4-deficient zebrafish. The results showed that Ctla-4 deficiency led to mild edema in a few hepatocytes, and lymphocyte infiltration was not significant. Compared to the liver, we consider intestinal inflammation to be more pronounced.

      Some minor comments on terminology

      (a) "multiomics" usually refers to omics experiments with different modalities (e.g. transcriptomics, proteomics, metabolomics etc), while the current paper only has transcriptomics assays. I wouldn't call it "multiomics" analysis.

      We appreciate the reviewer's attention to this issue. The "multi-omics" has been revised to "transcriptomics".

      (b) In several parts of the figure legend the author mentioned "tSNE nonlinear clustering" (Figures 4A and 5A). tSNE is an embedding method rather than a clustering method.

      The "tSNE nonlinear clustering" has been revised to "tSNE embedding".

      (c) Figure 1E is a UMAP rather than tSNE.

      The "tSNE" has been revised to "UMAP" in the figure legend in line 1043.

      Reviewer #3: 

      Line 28: The link is not directly reflected in this sentence describing CTLA-4 knockout mice.

      We appreciate the reviewer for bringing this issue to our attention. We have expanded our description of CTLA-4 knockout mice on lines 77-84.

      Line 80-83: There is a lack of details about the CTLA-4-deficient mice. The factor that Th2 response could be induced has been revealed in mouse model. See the reference entitled "CTLA-4 regulates the requirement for cytokine-induced signals in TH2 lineage commitment" published in Nature Immunology.

      We thank the reviewer for providing valuable references. We have added descriptions detailing the differentiation of T cells into Th2 cells in CTLA-4-deficient mice on lines 78–81, and the relevant references have been cited in the revised manuscript.

      To better introduce the CTLA-4 immunobiology, the paper entitled "Current Understanding of Cytotoxic T Lymphocyte Antigen-4 (CTLA-4) Signaling in T-Cell Biology and Disease Therapy" published in Molecules and Cells should be referred.

      We have provided additional details on CTLA-4 immunology (lines 75-84) and have included the relevant reference in the revised manuscript.

      In current results, there are many sentences that should be moved to the discussion, such as lines 123-124, lines 152-153, lines 199-200, and lines 206-207. So, the result sections just describe the results, and the discussions should be put together in the discussion.

      We have relocated these sentences to the 'Discussion' section and refined the writing.

      In the discussion, the zebrafish enteritis model, such as DSS/TNBS and SBMIE models, should also be compared with the current CTLA-4 knockout model. Also, the comparison between the current fish IBD model and the previous mouse model should also be included, to enlighten the usage of CTLA-4 knockout zebrafish IBD model.

      We compared the phenotypes of our current Ctla-4-knockout zebrafish IBD model with other models, including DSS-induced IBD models in zebrafish and mice, as well as TNBS- and SBM-induced IBD models in zebrafish. The details are included in the "Discussion" section (lines 353-365).

      As to the writing, the structure of the discussion is poor. The paragraphs are very long and hard to follow. Many findings from current results were not yet discussed. I just can't find any discussion about the alteration of intestinal microbiota.

      In response to the reviewers' constructive feedback, we have revised and enhanced the discussion section. Furthermore, we have integrated the most recent research findings relevant to this study into the discussion to improve its relevance and comprehensiveness.

      In the discussion, the aerobic-related bacteria in 16s rRNA sequencing results should be focused on echoing the histopathological findings, such as the emptier gut of CTLA-4 knockout zebrafish.

      As mentioned above, the discussion section has been revised and expanded to provide a better understanding of the potential interplay among intestinal inflammatory pathology, gut microbiota alterations, and immune cell dysregulation in Ctla-4-deficient zebrafish. Furthermore, promising avenues for future research that warrant further investigation were also discussed.

      In the current method, there are no descriptions for many used methods, which already generated results, such as WB, MLR, MST, Co-IP, AlphaFold2 prediction, and how to make currently used anti-zfCTLA4 antibody. Also, there is a lack of description of the method of the husbandry of knockout zebrafish line.

      We regret these flaws. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been rectified in the revised manuscript. Additionally, Ctla-4-deficient zebrafish were reared under the same conditions as wild-type zebrafish, and the rearing methods are now described in the "Generation of Ctla-4-deficient zebrafish" section of the Materials and Methods.

      Line 360: the experimental zebrafish with different ages could be a risk for unstable intestinal health. See the reference entitled "The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis" published in Water Biology and Security. The age-related differences in zebrafish could be observed in the gut.

      We appreciate the reviewers' reminders. The Ctla-4 mutant zebrafish used in our experiments were 4 months old, while the wild-type zebrafish ranged from 4 to 6 months old. These experimental fish were relatively young and uniformly distributed in age. During our study, we examined the morphological structures of the intestines in zebrafish aged 4 to 6 months and observed no significant abnormalities. These findings align with previous research indicating no significant difference in intestinal health between 3-month-old and 6-month-old wild-type zebrafish (Ref. 1). Consequently, we conclude that there is no notable aging-related change in the intestines of zebrafish aged 4 to 6 months. This reduces the risk associated with age-related variables in our study. We have added an explanation stating that the Ctla-4 mutant zebrafish used in the experiments were 4 months old (Line 449) in the revised manuscript.

      Reference

      (1) Shan Junwei, Wang Guangxin, Li Heng, Zhao Xuyang et al. (2023) The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis. Water Biology and Security 2: 100131-100144.

      Section "Generation of Ctla-4-deficient zebrafish": There is a lack of description of PCR condition for the genotyping.

      The target DNA sequence was amplified at 94 °C for 4 min, followed by 35 cycles of 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 30 s, with a final extension at 72 °C for 10 min. The polymerase chain reaction (PCR) conditions are described in lines 458-460.
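
      The cycling program above can be written out as data, which also makes the total programmed hold time easy to verify. The temperatures, durations, and cycle number come from the response; the step labels (denaturation, annealing, extension) follow the conventional reading of such a program and are an assumption, as are the variable names. Ramp times between steps are not included.

```python
# Each step: (label, temperature in degrees Celsius, hold time in seconds)
initial_step = ("initial denaturation", 94, 4 * 60)
cycle_steps = [
    ("denaturation", 94, 30),
    ("annealing", 58, 30),
    ("extension", 72, 30),
]
final_step = ("final extension", 72, 10 * 60)
n_cycles = 35


def total_hold_seconds(initial, cycle, cycles, final):
    """Sum the programmed hold times, ignoring temperature ramping."""
    per_cycle = sum(duration for _, _, duration in cycle)
    return initial[2] + cycles * per_cycle + final[2]


total_seconds = total_hold_seconds(initial_step, cycle_steps, n_cycles, final_step)
print(f"{total_seconds} s = {total_seconds / 60} min")
```

      With these settings the programmed hold time is 3,990 s, or 66.5 min, before accounting for ramping.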

      How old of the used mutant fish? There should be a section "sampling" to provide the sampling details.

      The "Sampling" information has been incorporated into the "Materials and Methods" section of the revised manuscript. Wild-type and Ctla-4-deficient zebrafish of different ages were housed in separate tanks, each labeled with the corresponding birth date. Experiments used Ctla-4-deficient zebrafish aged 4 months and wild-type zebrafish aged 4 to 6 months.

      Line 378-380: The index for the histopathological analysis should be detailed, rather than just provide a reference. I don't think these indexes are good enough to specifically describe the pathological changes of intestinal villi and mucosa. It is suggested to improve with detailed parameters. As described in the paper entitled "Pathology of Gastric Intestinal Metaplasia: Clinical Implications" published in Am J Gastroenterol., histochemically, normal gastric mucins are pH neutral, and they stain magenta with periodic acid-Schiff (PAS). In an inflamed gut, acid mucins replace the original gastric mucins and are stained blue with Alcian blue (AB). So, to reveal the pathological changes of goblet cells and involved mucin components, AB staining should be added. Also, for the number of goblet cells in the inflammatory intestine, combining PAS and AB staining is the best way to reveal all the goblet cells. In Figure 2, there were very few goblet cells. The infiltration of lymphocytes and the empty intestinal lumen could be observed. Thus, the ratio between the length of intestinal villi and the intestinal ring radius should be calculated.

      In response to the reviewers’ valuable suggestions, we have augmented the manuscript by providing additional parameters related to the pathological changes observed in the Ctla-4-deficient zebrafish intestines, including the mucin component changes identified through PAS and AB-PAS staining, the variations in the number of goblet cells evaluated by AB-PAS staining, and the ratio of intestinal villi length to the intestinal ring radius, as illustrated in the following figures. These new findings are detailed in the "Materials and Methods" (lines 563-566) and "Results" (lines 143-146) sections, along with Supplementary Figure S3 of the revised manuscript.

      Section "Quantitative real-time PCR": What's the machine used for qPCR? How about the qPCR validation of RNA seq data? I did not see any related description of data and methods for qPCR validation. In addition, beta-actin is not a stable internal reference gene, to analyze inflammation and immune-related gene expression. See the reference entitled "Actin, a reliable marker of internal control?" published in Clin Chim Acta. Other stable housekeeping genes, such as EF1alpha and 18s, could be better internal references.

      RT-qPCR experiments were conducted using a PCR thermocycler device (CFX Connect Real-Time PCR Detection System with Precision Melt Analysis<sup>TM</sup> Software, Bio-Rad, Cat. No. 1855200EM1). This information has been incorporated into lines 608-610 of the "Materials and Methods" section. In these experiments, key gene sequences of interest, including il13, mpx, and il1β, were extracted from RNA-seq data for RT-qPCR validation. To ensure accurate normalization, potential internal controls were evaluated, and β-actin was identified as a suitable candidate due to its consistent expression levels in the intestines of both wild-type and Ctla-4-deficient zebrafish. The use of β-actin as an internal control is further supported by its application in recent studies on intestinal inflammation (Refs 1–2).

      References:

      (1) Tang Duozhuang, Zeng Ting, Wang Yiting, Cui Hui et al. (2020) Dietary restriction increases protective gut bacteria to rescue lethal methotrexate-induced intestinal toxicity. Gut Microbes 12: 1714401-1714422.

      (2) Malik Ankit, Sharma Deepika et al. (2023) Epithelial IFNγ signaling and compartmentalized antigen presentation orchestrate gut immunity. Nature 623: 1044-1052.

      How to generate sCtla-4-Ig, Cd28-Ig and Cd80/86? No method could be found.

      We apologize for the omission of these methods. The detailed protocols have now been added to the "Materials and Methods" section of the revised manuscript (lines 464-481).

      Figure 5: As reviewed in the paper entitled "Teleost T and NK cell immunity" published in Fish and Shellfish Immunology, two types of NK cell homologues have been described in fish: non-specific cytotoxic cells and NK-like cells. There is no NKT cell identified in the teleost yet. Therefore, "NKT-like" could be better to describe this cell type.

      We refer to "NKT" cells as "NKT-like" cells, as suggested.

      For the supplementary data of scRNA-seq, there lacks the details of expression level.

      The expression levels of the corresponding genes are provided in Supplemental Table 4.

      Supplemental Table 1: There are no accession numbers of amplified genes.

      The accession numbers of the amplified genes are included in Supplemental Table 1.

      The English needs further editing.

      We have made efforts to enhance the English to meet the reviewers' expectations.

      Line 32: The tense should be the past.

      This tense error has been corrected.

      Line 363-365: The letter of this approval should be provided as an attachment.

      The approval document is provided as an attachment.

      Line 376: How to distinguish the different intestinal parts? Were they judged as the first third, second third, and last third parts of the whole intestine?

      The differences among the three segments of zebrafish intestine are apparent. The intestinal tube narrows progressively from the anterior to the mid-intestine and then to the posterior intestine. Moreover, the boundaries between the intestinal segments are well-defined, facilitating the isolation of each segment.

      Line 404: Which version of Cytoscape was used?

      The version of Cytoscape used in this study is 3.9.1. Information about the Cytoscape version is provided on line 603.

      The product information of both percoll and cell strainer should be provided.

      The information regarding Percoll and cell strainers has been added on lines 626 and 628, respectively.

      Line 814: Here should be a full name to tell what is MST.

      The acronym MST stands for "Microscale Thermophoresis", a technique that has been referenced on lines 1157-1158.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. However, the authors should be careful when applying the term adaptive to the gene expression changes they observe; it would be challenging to demonstrate the differential fitness effects of these gene expression changes. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

      We appreciate the reviewers’ suggestions and comments. The phylogenetic ANOVA we used (EVE) tests for a separate RNA expression optimum specific to the shrew lineage, which is consistent with expectations for adaptive evolution of gene expression. But, as you noted, while this analysis highlights many candidate genes evolving in a manner consistent with positive selection, further functional validation is required to confirm if and how these genes contribute to Dehnel’s phenomenon. In the discussion, we now emphasize that inferred adaptive expression of these genes is putative and outline that future studies are needed to test the function of proposed adaptations. For example, our cell line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illustrates this limitation. We also have refined our discussion to focus more on pathway-level analyses rather than on individual genes, and have addressed other issues presented, including clarity of methods and using sex as a covariate in our analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Thank you for your interest in our manuscript and for your insights. We addressed your comments below: we now highlight the limitations of our study design in the discussion and emphasize that, while a second optimum of gene expression in shrews is consistent with adaptive evolution, we recognize that not all sources of variation in gene expression can be fully accounted for. We highlight the putative nature of these results in our revisions, especially in our new limitations section (lines 541-555).

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      We are glad to hear that you found our manuscript fascinating and clearly written. While we hoped to see an effect of BCL2L1 on apoptosis as proposed, we agree that reporting null results is valuable when validating evolutionary inferences.

      Weaknesses:

      While the paper successfully identifies differentially expressed seasonal genes, the real question is (as explained by the authors) whether these are evolved adaptations in the shrews or whether they reflect plastic changes that also exist in other species. This question was the motivation for the inter-species analyses in the paper, but in my view, these cannot rigorously address this question. Presumably, the data from the other species were not collected in comparable environments as those experienced by the shrews studied here. Instead, they likely (it is not specified, and might not be knowable for the public data) reflect baseline gene expression. To see why this is problematic, consider this analogy: if we were to compare gene expression in the immune system of an individual undergoing an acute infection to other, uninfected individuals, we would see many, strong expression differences. However, it would not be appropriate to claim that the infected individual has unique features - the relevant physiological changes are simply not triggered in the other individuals. The same applies here: it is hard to draw conclusions from seasonal expression data in the shrews to non-seasonal data in the other species, as shrew outlier genes might still reflect physiological changes that weren't active in the other species.

      There is no solution for this design flaw given the public data available to the authors except for creating matched data in the other species, which is of course not feasible. The authors should acknowledge and discuss this shortcoming in the paper.

      Thank you for taking the time to provide such insightful feedback. As you noted, while shrews experience seasonal size changes, their environments may differ from those of the other species used in this experiment, leading to increased or decreased expression of certain genes and reducing our ability to accurately detect selection across the phylogeny. Although we sought to control for as many sources of variation as possible, such as using only post-pubescent, wild, or non-domesticated individuals when feasible, we recognize that not all sources of variation can be fully accounted for within a practical experiment. We agree that these sources of variation can introduce both false positives and negatives into our results, and we have now highlighted this limitation within our discussion (lines 538-552).

      Related to the point above: in the section "Evolutionary Divergence in Expression" it is not clear which of the shrew samples were used. Was it all of them, or only those from winter, fall, etc? One might expect different results depending on this. E.g., there could be fewer genes with inferred adaptive change when using only summer samples. The authors should specify which samples were included in these analyses, and, if all samples were used, conduct a robustness analysis to see which of their detected genes survive the exclusion of certain time points.

      Thank you for this attention to detail. We used spring adults for this analysis. We made this decision because we used only post-pubescent individuals for all species in the analysis, and spring was the only season in which adult shrews were undergoing Dehnel’s phenomenon. We have now clarified this in both the methods and results (line 247 and line 667).

      In the same section, were there also genes with lower shrew expression? None are mentioned in the text, so did the authors not test for this direction, or did they test and there were no significant hits?

      We did test for decreased shrew expression compared to the rest of the species, but no genes showed significant decreases. We hypothesize two potential reasons for this result: 1) if a gene were selected for decreased expression, selection for constitutive expression of the gene across all species may be weak, and decreased expression may thus be found in other lineages as well, or 2) decreased or absent expression may relax selection on the coding regions, and thus these genes are not recovered because we identify only 1:1 orthologs. This is consistent with results provided from the original methods manuscript. Thank you for pointing out that we did not discuss this information in the text, and we now include it in our results (lines 250-251).

      The Discussion is too long and detailed, given that it can ultimately only speculate about what the various expression changes might mean. Many of the specific points made (e.g. about the blood-brain-barrier being more permissive to sensing metabolic state, about cross-organ communication, the paragraphs on single, specific genes) are a stretch based on the available data. Illustrating this point, the one follow-up experiment the authors did (on BCL2L1) did not give the expected result. I really applaud the authors for having done this experiment, which goes beyond typical studies in this space. At the same time, its result highlights the dangers of reading too much into differential expression analyses.

      We agree with your point. While our extensive discussion is useful for testing future hypotheses, some of it may ultimately be too speculative for our readers. To amend this, we have reduced some portions of our discussion and focused more on pathways than individual genes, including removing mechanisms related to HRH2, FAM57B, GPR3, and GABAergic neurons. We hope that this highlights to the reader the speculative nature of many of our results.

      There is no test of whether the five genes observed in both analyses (seasonal change and inter-species) exceed the number expected by chance. When two gene sets are drawn at random, some overlap is expected randomly. The expected overlap can be computed by repeated draws of pairs of random sets of the same size as seen in real data and by noting the overlap between the random pairs. If this random distribution often includes sets of five genes, this weakens the conclusions that can be drawn from the genes observed in the real data.

      Thank you for highlighting this much-needed approach. After running this test, we found that the observed overlap exceeded the expected overlap, but not significantly. We now show this in our methods (lines 277-278) and results (lines 719-720).
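
      The random-draw procedure the reviewer describes can be sketched as follows. This is a minimal illustration in Python, not the code used for the manuscript: the function name, the size of the gene universe, and the gene set sizes are hypothetical placeholders, and only the observed overlap of five genes comes from the review.

```python
import random


def overlap_permutation_test(universe_size, size_a, size_b, observed_overlap,
                             n_draws=10_000, seed=0):
    """Estimate the expected overlap of two random gene sets and an empirical p-value.

    Genes are represented by integer indices 0..universe_size-1 (an assumption
    made for illustration; real gene identifiers would work the same way).
    """
    rng = random.Random(seed)
    universe = range(universe_size)
    overlaps = []
    for _ in range(n_draws):
        # Draw two independent random sets with the same sizes as the real lists.
        set_a = set(rng.sample(universe, size_a))
        set_b = set(rng.sample(universe, size_b))
        overlaps.append(len(set_a & set_b))
    expected = sum(overlaps) / n_draws
    # Fraction of random draws with an overlap at least as large as observed,
    # with a +1 correction so the estimate is never exactly zero.
    at_least_observed = sum(o >= observed_overlap for o in overlaps)
    p_value = (1 + at_least_observed) / (n_draws + 1)
    return expected, p_value


if __name__ == "__main__":
    # Hypothetical sizes: 10,000 testable orthologs, 200 seasonal genes,
    # 150 lineage-specific genes; 5 is the observed overlap named in the review.
    expected, p_value = overlap_permutation_test(10_000, 200, 150, 5)
    print(f"expected overlap: {expected:.2f}, empirical p: {p_value:.3f}")
```

      An observed overlap can exceed the mean of the random distribution and still return a large empirical p-value, which is the situation reported in the reply above. The same question can also be answered analytically with a hypergeometric test.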

      Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrow them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood-brain barrier and calcium signaling, as well as used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      We are glad you found our manuscript interesting and thank you for your feedback. We hope that we have addressed all of your concerns as described below.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Weaknesses:

      I think that there should be a bit more of an introduction to Dehnel's phenomenon, given how much it is used throughout.

      Thank you for this insight. With a lengthy introduction and discussion, we agree that the importance of Dehnel’s phenomenon may have been overshadowed. We have shortened both sections and emphasized the background on Dehnel’s phenomenon in the first two paragraphs of the introduction, allowing this extraordinary seasonal size plasticity to stand out.

      Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine developmental and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through combined transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of developmental and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals (evolutionary, constitutive, and plastic) can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling, as it not only exemplifies an extreme case of Dehnel's phenotype, but the metabolic requirements of the shrew suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      (1) In a number of places (described in detail below), the motivation for the experimental, analytical, or visualization approach is unclear and may obscure or prevent discoveries.

      Thank you for finding our research and manuscript compelling, and for the valuable feedback that will drastically improve our manuscript. We hope that we have alleviated your concerns by following your suggestions below.

      (2) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text:

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Figure 1B).

      - The authors do not indicate whether they perform cluster-specific GO or KEGG pathway enrichment analyses. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses.

      Thank you for this valuable feedback. We did not want to include clusters we deemed to be related to development, as this should not be attributed to changes associated with Dehnel’s phenomenon. We did this through qualitative, visual inspection, which we realize can differ between parties (i.e., clusters 2, 8, and 12 appeared to be seasonal). Qualitatively, we were looking for extreme divergence between Stage 1 and Stage 5 individuals: if expression were related to season and not development, then the averages of these stages within a cluster should be relatively similar. We have now quantified this as large differences in z-score (abs(summer juvenile - summer adult) > 1.25) without meaningful interseason variation determined by a second local maximum (abs(autumn - winter) < 0.5 and abs(winter - summer) < 0.5), and added it to both our methods (lines 699-702) and results (line 192).
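
      As a rough illustration of the quantitative criterion described above, the rule can be written as a small predicate. This is a sketch under stated assumptions, not the manuscript's code: the function name and dictionary keys are hypothetical, each value is taken to be a cluster's mean z-score for one stage, and "summer" in the winter comparison is assumed to mean summer adults, which the response does not specify.

```python
def is_developmental_shift(stage_z, dev_threshold=1.25, season_threshold=0.5):
    """Flag a cluster whose main signal is the juvenile-to-adult difference.

    stage_z maps hypothetical stage labels to the cluster's mean z-score.
    Thresholds 1.25 and 0.5 are the values quoted in the response.
    """
    developmental_gap = abs(stage_z["summer_juvenile"] - stage_z["summer_adult"])
    autumn_vs_winter = abs(stage_z["autumn"] - stage_z["winter"])
    # Assumption: "summer" here refers to summer adults.
    winter_vs_summer = abs(stage_z["winter"] - stage_z["summer_adult"])
    return (developmental_gap > dev_threshold
            and autumn_vs_winter < season_threshold
            and winter_vs_summer < season_threshold)


if __name__ == "__main__":
    # Made-up z-scores for two illustrative clusters.
    developmental = {"summer_juvenile": -1.5, "autumn": 0.3,
                     "winter": 0.2, "summer_adult": 0.4}
    seasonal = {"summer_juvenile": 0.5, "autumn": 0.4,
                "winter": -1.2, "summer_adult": 0.6}
    print(is_developmental_shift(developmental))
    print(is_developmental_shift(seasonal))
```

      Clusters for which the predicate is false would be retained as candidate seasonal clusters under this reading.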

      Regarding the combination of clusters for pathway enrichment compared to individual pathways, we agree that combining clusters may be more informative for overall homeostasis, compared to individual clusters, which may inform us on processes directly related to Dehnel’s phenomenon. Initially, we were hesitant to conduct this analysis, as clusters contain small gene sets, reducing the ability to detect pathway enrichments. We have now included this analysis, which is reported in our methods (lines 703-704), results (lines 203-204), and a new supplemental table.

      (3) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We agree that our rationale for validating BCL2L1 function in neural cell lines was not clearly explained in the manuscript. We selected BCL2L1 because it is the furthest downstream gene in the apoptotic pathway, thus making it the most directly involved gene in programmed cell death, whereas upstream genes could influence additional genes or alternative processes. We have clarified this choice in the revised methods section (lines 748-750).

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis, but it is unclear if this was also done for the S. araneus analysis. If not, why? If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis? Sex-specific expression elevates within-group variation and could impact the discovery of differentially expressed genes.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio within species when possible. EVE models handle the effect of sex on gene expression as intraspecific variation. In shrews, however, we used males exclusively, as females were only found among juvenile individuals. Including those juvenile females would have introduced age effects, with perhaps a larger effect on our results. For the seasonal data, we have now included sex as a covariate in differential expression analyses. However, our design is imbalanced in relation to sex, which we have now discussed in our methods (lines 713-714) and discussion limitations (lines 544-548).

      (4) Discussion: The term "adaptive" is used frequently and liberally throughout the discussion. The interpretation of seasonal changes in gene expression as indicators of adaptive evolution should be done cautiously as such changes do not necessarily imply causal or adaptive associations.

      Thank you for this insight. We have reviewed our discussion and clarified that adaptations are putative (e.g., lines 146, 285, and 332), and highlighted this in our limitations section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would recommend always spelling out "Dehnel's phenomenon" or even replacing this term (after crediting the DP term) with the more informative "seasonal size plasticity". Every time I saw "DP", I had to remind myself what this referred to. If the authors choose not to do so, please use the acronym consistently (e.g. line 186 has it spelled out).

      We have replaced the acronym DP with either the full term or the more informative “seasonal size plasticity” throughout the text.

      (2) Line 202: "DEG" has not been defined. Simply add to the line before.

      Thank you for this attention to detail. We have added this to the line above (210).

      (3) Please add a reference for the "AnAge" tool that was used to determine if samples were pubescent.

      Thank you for identifying this oversight. We have now cited the proper paper in line 634.

      (4) In the BCL2L1 section in the results, add a callout to Figure 2D.

      We have now added a callout to Figure 2D within the results (line 234).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 122: is associated? These adaptations?

      Thank you for identifying that we were missing the words “associated with” here. We have fixed this in the revision.

      (2) The first paragraph of the Results should be moved to the methods, except maybe the number of orthologs.

      Thank you for this insight. We have removed this portion from the results section.

      (3) Why a Bonferroni correction on line 188? That seems too strict.

      We agree the Bonferroni correction is strict. Results obtained with other, less strict methods for controlling the false discovery rate are also not significant after correction. These corrections can be found within the data; however, we report only the Bonferroni correction.
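
      To make the difference in strictness concrete, the sketch below compares Bonferroni adjustment with the Benjamini-Hochberg procedure, a commonly used false discovery rate method. The response does not name which less strict methods were tried, so Benjamini-Hochberg is used here only as an example, and the p-values are made up.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
    print("Bonferroni:", bonferroni(raw))
    print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

      Benjamini-Hochberg adjusted values are never larger than the Bonferroni ones, so a result that fails the former also fails the latter.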

      (4) Line 427: "is a novel candidate gene for several neurological disorders" needs some references. I see them a couple of sentences later, but that's quite a sentence with no references at the end.

      We have added the proper citations for this sentence (line 524).

      Reviewer #3 (Recommendations for the authors):

      (1) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text Line176-193:

      - The authors report the total number of genes meeting inclusion criteria (>0.5-fold change between any two stages and 2 samples >10 normalized reads), but it would be more informative to also provide the number of genes within each temporal cluster. This would offer a clearer understanding of how gene expression patterns are distributed over time.

      Unfortunately, this information is difficult to depict on our figure and would use too much space in the text. We have thus added a description of the range of genes in a new supplemental table depicting this information.
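
      For readers who want to reproduce the gene counts, the inclusion criteria quoted by the reviewer could be applied per gene roughly as sketched here. Several details are assumptions rather than facts from the manuscript: the fold change is taken to be an absolute log2 ratio of stage means, a pseudocount of 1 is added to avoid division by zero, and the function and argument names are hypothetical.

```python
import math
from itertools import combinations


def passes_inclusion(stage_means, sample_counts,
                     min_abs_log2fc=0.5, min_reads=10, min_samples=2):
    """Return True if a gene meets both illustrative inclusion criteria.

    stage_means: mean normalized expression per stage (one value per stage).
    sample_counts: normalized read counts for the same gene in every sample.
    """
    # Criterion 1: at least `min_samples` samples exceed `min_reads` reads.
    expressed = sum(count > min_reads for count in sample_counts) >= min_samples
    # Criterion 2: some pair of stages differs by more than the threshold.
    # Assumption: fold change is measured on the log2 scale with a pseudocount.
    changed = any(
        abs(math.log2((a + 1) / (b + 1))) > min_abs_log2fc
        for a, b in combinations(stage_means, 2)
    )
    return expressed and changed


if __name__ == "__main__":
    # Made-up values for a single gene across five stages and three samples.
    print(passes_inclusion([100, 100, 100, 100, 200], [50, 60, 0]))
```

      Counting the genes that pass this filter within each cluster would give the per-cluster numbers the reviewer asks for.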

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Fig. 1B). Using a differential gene expression criterion might be more suitable. For example, do excluded genes show significant log-fold differences between late-stage comparisons?

      As previously mentioned, we have now quantified seasonal shifts as large differences in z-score (abs(summer juveniles - summer adults) > 1.25) without meaningful interseason variation determined by a second local maximum (abs(autumn - winter) < 0.5 and abs(winter - summer) < 0.5), and added it to our methods (lines 699-702). We then follow this up with differential expression analyses as described in Figure 2.

      - Did the authors perform cluster-specific GO or KEGG pathway enrichment analyses instead of focusing on the combined set of genes across the season shift clusters? While I understand that the small number of genes in each cluster may be limiting, if pathways emerge from cluster-specific analysis, they could provide more detailed insights into the functional significance of these temporal expression patterns. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses. Additionally, no corrections for multiple hypothesis testing were applied, as noted in the results. A more refined gene set (e.g., using differential expression criteria, described above) could be more appropriate for these analyses.

      We have now included cluster-specific KEGG enrichments as previously described.

      (2) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets - Figure 2 and lines195-227:

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We have now included the reasoning for further validation of BCL2L1 as described above.

      - The relevance of the "higher degree" differentially expressed genes needs more explanation. Although this group of genes is highlighted in the results, they are not featured in any subsequent analyses, leaving their importance unclear.

      Thank you for this insight. We have removed this from the methods as it is not relevant to subsequent analyses or conclusions.

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis (Line 525), but it is unclear if this was also done for the S. araneus analysis. If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis?

      We have now incorporated information on sex as described above.

      (3) Discussion:

      The term "adaptive" is used frequently and liberally throughout the discussion, but the authors should be cautious in interpreting seasonal changes in gene expression as indicators of adaptive evolution. Such changes do not necessarily imply causal or adaptive associations, and this distinction should be clearly stated when discussing the results.

      Thank you for this feedback. We agree with your conclusion: while a second expression optimum in the shrew lineage is indicative of adaptive expression, we cannot fully determine whether these shifts are caused by genetic or environmental factors, despite careful attention to experimental design. We have highlighted this as a limitation in the discussion.

      (4) Minor Editorial Comment:

      Line 105: "... maintenance of an energy budgets..." delete "an"

      We have removed this grammatical error.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) Line 201: The threshold of 0.25 was maintained to select enriched genes, which minimizes the value of the GO term enrichment analyses. It may notably explain why the term phagosome is enriched in cluster 7, while experimental data indicate that cluster 7 is not phagocytic. In addition, the authors mentioned in their first response to the reviewers that they would include a DotPlot to illustrate the specificity of the genes corresponding to the main GO terms. This should notably include the ribosomal genes found enriched in cluster 4, which constitute the basis used by the authors to call cluster 4 the progenitor cluster.

      We appreciate the reviewer’s concern regarding our chosen log2FC threshold (0.25) for GO term enrichment. To assess the robustness of our approach, we tested more stringent thresholds (e.g., 0.5) and verified that our overall interpretations remain consistent. However, we acknowledge that certain GO terms, such as phagosome, may appear in clusters that are not primarily phagocytic. This is likely due to the fact that genes involved in vesicle trafficking, endo-lysosomal compartments and intracellular degradation processes overlap with those classically associated with phagocytosis.

      Therefore, the KEGG-based enrichment of phagosome in cluster 7 does not necessarily imply active phagocytosis but could instead reflect these alternative vesicular processes. As we show, cluster 7 corresponds to vesicular cells, which we named, based on cytology, for their very high content of vesicular structures. As functional annotation based solely on transcriptomic data can sometimes lead to overinterpretation, we emphasize the importance of biological validation, which we have partially addressed through functional assays in this study.

      Regarding the specificity of ribosomal gene expression in cluster 4, we analyzed the distribution of ribosomal genes expressed across all clusters, as shown in Supplementary Figure S1-J. This analysis demonstrates that cluster 4 is specifically enriched in ribosome-related genes, reinforcing its characterization as a transcriptionally active population. Given that ribosomal gene expression is a key feature often associated with proliferative or metabolically active cells, these findings support our initial interpretation that cluster 4 may represent an undifferentiated or progenitor-like population.

      We acknowledge the reviewer’s suggestion to include a DotPlot to further illustrate the specificity of these genes in cluster 4. However, we believe that Supplementary Figure S1-J already effectively demonstrates this enrichment by presenting the percentage of ribosomal genes per cluster. A DotPlot representation would primarily convey the same information in a different format, but without providing additional insight into the specificity of ribosomal gene expression within cluster 4.

      (2) The lineage analysis is highly speculative and based on weak evidence. Initiating the hemocyte lineage at C4 is based on rRNA expression levels. C6 would constitute a better candidate, notably given its expression of PU-1, ELF2 and GATA3, which regulate progenitor differentiation in mammals (doi: 10.3389/fimmu.2019.00228, doi:10.1128/microbiolspec.mchd-0024-2, doi: 10.1098/rsob.180152), while C4 does not display any specific transcription factors (Figure 7I). In addition, the representation and interpretation of the transcriptome dynamics in the different lineages are erroneous. There are major inconsistencies between the data shown in the heatmaps in Fig 7C-H and Fig S10 and the dotplot in Fig 7I. Gata3 (G31054) and CgTFEB (G30997) illustrate the inconsistency. Fig S10C shows GATA3 going down from cluster 4 to cluster 6, while Fig 7I shows an increased level of expression in cluster 6 compared to cluster 4. CgTFEB (G30997) decreases from C4 to VC in Fig 7F, while it increases according to Fig 7I. Finally, in Figure 7D, the UMAP shows a transition from C4 to C5 while the heatmap mentions C4 to C6 (I believe there is a mix-up with Figure 7E).

      We sincerely apologize for the inconsistencies noted between the different panels of Figure 7. These discrepancies resulted from using an incorrect matrix dataset during the initial representation. To address this issue, we have fully reprocessed the data and now provide a corrected and improved depiction of gene expression dynamics along the pseudotime trajectory. We are grateful to the reviewer for having helped us correct these mistakes.

      In the revised version, we offer a comprehensive and consistent representation of expression level variations for key genes identified by the Monocle3 algorithm. Supplementary Figure S10 now presents the average expression variation of these significant genes as a function of pseudotime. Based on this dataset, we carefully selected representative genes to construct panels C to H of Figure 7, ensuring coherence across all figures. These updated panels show both average expression levels and the percentage of expressing cells along the pseudotime trajectory, providing a clearer interpretation of transcriptomic dynamics.

      We appreciate the reviewer’s helpful feedback regarding our lineage analysis and the suggestion that cluster 6 might be a more appropriate progenitor based on the expression of mammalian-like transcription factors such as PU-1, ELF2, and GATA3. Below, we clarify our rationale for choosing cluster 4 as the root of the pseudotime and discuss the functional implications of the identified transcription factors.

      We can hypothesize that clusters 4, 5, or 6 could each potentially represent early progenitor-like states, as these three clusters are transcriptionally close (Lines 539-541). These clusters have not yet been conclusively identified in terms of classical hemocyte morphology, and they appear to arise from ABL- or BBL-type cells. Our decision to root the pseudotime at cluster 4 was motivated by its strong expression of core transcription and translation genes, suggesting a particular stage of translation activity that was not observed for cluster 5 or cluster 6. Clusters 5 and 6 may correspond to a similar population of cells, most probably Blast-Like cells at different stages of cell cycle or differentiation engagement.

      Although cluster 6 expresses PU-1, ELF2, and GATA3, which are known regulators of haematopoietic progenitor differentiation in vertebrates, it is essential to highlight that structural homology does not necessarily imply functional equivalence. Moreover, the expression of PU-1, ELF2, and GATA3 does not strictly characterize a population as “undifferentiated” or progenitor-like. Studies such as those by Buenrostro et al. (Cell, 2018) have demonstrated that these transcription factors can remain active in or reemerge during more lineage-committed stages. For instance, PU-1 is essential for myeloid and B-cell differentiation, GATA3 is involved in T-lymphocyte lineage commitment (though transiently expressed in early progenitors), and ELF2 participates in lineage-specific pathways. Thus, their presence does not imply a primitive state but rather highlights their broader functional roles in guiding and refining lineage decisions. Functional annotation of these transcription factors in invertebrate systems remains speculative, particularly as morphological or molecular markers specific to these early hemocyte lineages are not yet fully established. Further functional assays (e.g., knockdown/overexpression or lineage tracing using cells (ABL and BBL) from clusters 4, 5 and 6) will be necessary to determine which hemocyte population harbors progenitor properties and differentiation potential.

      To further address the reviewer’s concern, we performed complementary pseudotime analyses by initiating Monocle 3 trajectories from clusters 4, 5, and 6 individually, as well as collectively (4/5/6). These analyses (see attached figure) confirm that the overall differentiation topology remains unchanged regardless of the selected root, consistently revealing two main pathways: one leading to hyalinocytes and the other to the granular lineage (ML, SGC, and VC). This consistency strongly suggests that clusters 4, 5, and 6 represent related pools of progenitor-like cells. Therefore, choosing cluster 4 based on its transcription/translation readiness does not alter the inferred branching architecture of hemocyte differentiation.

      We appreciate the reviewer’s suggestions, which have helped us improve our manuscript and clarify our rationale.

      Author response image 1.

      Representation of the trajectories obtained from Monocle3 analysis using different pseudotime origins, showing that changing the rooting did not alter the overall differentiation topology. (A) Pathways identified with cluster 4, (B) cluster 5, (C) cluster 6, and (D) cluster 4/5/6 origins.

      (3) Concerning the AMP expression analysis in Figure 6: the qPCR data show that Cg-BPI and Cg-Defh are expressed broadly in all fractions including 6 and 7, which is in conflict with the statement Line 473 indicating that SGC (fractions 6 and 7) is not expressing AMP. In addition, this analysis should be combined with the expression profile of all AMP in the scRNAseq data (list available in 10.1016/j.fsi.2015.02.040).

      We thank the reviewer for highlighting this point. We acknowledge that the qPCR data show expression of Cg-BPI and Cg-Defh across all fractions, including fractions 6 and 7 corresponding to SGC. However, our conclusion that SGCs do not express antimicrobial peptides (AMPs) was based on a correlation analysis rather than direct detection of AMPs in granular cells. Specifically, the qPCR experiments were designed to measure AMP expression levels in fractionated hemocyte populations relative to a control sample of whole hemolymph. We then performed a correlation analysis between AMP expression levels and the proportion of each hemocyte type in the fractions. This approach allowed us to infer a lower expression of AMP in granular cells, as reflected in the heatmap presented in Figure 6.
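
      The correlation step described above can be sketched as follows. This is a minimal illustration rather than the script used for Figure 6: the fraction compositions and relative expression values are invented placeholders, the cell-type names are hypothetical labels, and the use of the Pearson coefficient is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical proportion of each cell type in seven gradient fractions.
proportions = {
    "hyalinocyte": [0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05],
    "small_granule_cell": [0.05, 0.10, 0.15, 0.25, 0.40, 0.60, 0.70],
}

# Hypothetical expression of one AMP in the same fractions,
# relative to whole hemolymph (placeholder values).
amp_expression = [2.0, 1.8, 1.5, 1.2, 0.9, 0.6, 0.5]

# One coefficient per cell type; a heatmap would show one such value
# for every (AMP, cell type) pair.
correlations = {
    cell: pearson(fractions, amp_expression)
    for cell, fractions in proportions.items()
}
```

      With these placeholder values, a transcript can be detected in every fraction and still correlate negatively with the proportion of one cell type, which is the kind of inference the heatmap is meant to support.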

      Regarding the suggestion to integrate AMP expression profiles from scRNA-seq data, we wrote that the limited sequencing depth of our scRNA-seq analysis was insufficient to accurately detect AMP expression (lines 472-473: “However, due to the limited sequencing depth, the scRNA-seq analysis was not sensitive enough to reveal AMP expression.”). Additionally, many of the known AMPs of Crassostrea gigas are not annotated in the genome, further complicating their identification within the scRNA-seq dataset. As a result, we were unable to perform the requested integration of AMP expression profiles from scRNA-seq data.

      (4) The transcription factor expression analysis is descriptive and the interpretation too partial. These data should be compared with other systems. Most transcription factors show functional conservation, notably in the inflammatory pathways, which can provide valuable information to understand the function of the clusters 5 and 6 for which limited data are available.

      We appreciate the reviewer’s suggestion to compare the identified transcription factors with other systems. However, since we did not perform a detailed phylogenetic analysis of the transcription factors identified in our dataset, we refrain from making assumptions about their functional conservation across species. Our analysis aims to provide a descriptive overview of transcription factor expression patterns in hemocyte clusters, which serves as a foundation for future functional studies. While transcription factor profiles may provide insights into the potential roles of clusters 5 and 6, assigning precise functions based solely on bioinformatic predictions remains speculative. Further experimental validation, including functional assays and evolutionary analyses, would be necessary to confirm the roles of these transcription factors, which is beyond the scope of the present study.

      Minor comments

      Line 212-213: the text should be reformulated. In the result part, it is more important to mention that the reannotation is based on conserved proteins functions than to mention the tool Orson.

      We have reworded this section to emphasize that the updated annotation is function-based, with Orson serving as the bioinformatics tool for improved GO annotation. We now place the emphasis on the conserved protein functions underlying the reannotation. Lines 212-215: “Using the Orson pipeline (see Materials and Methods), these files were used to extract and process the longest CDSs for GO-term annotation, and we then reannotated each predicted protein by sequence homology, assigning putative functions and improving downstream GO-term analyses.”

      Figure 2: I would recommend homogenizing the two Dotplot representation with the same color gradient and representing the gene numbers in both case.

      We appreciate the reviewer’s suggestion to improve the clarity and consistency of Figure 2. In response, we have homogenized the color gradients across the two DotPlot representations and have included gene numbers in both cases to ensure a more uniform and informative visualization.

      Table 2: pct1 and pct2 should be presented individually like in table 1

      We now present these columns separately (pct1, pct2) as in Table 1, so readers can compare the fraction of expressing cells in each cluster more transparently.

      Line 403-414: how many cells were quantified for the phagocytic experiments ?

      We have added the exact number of cells that were counted to determine phagocytic indices and the number of technical/biological replicates. At line 411, the text was modified: “Macrophage-like cells and small granule cells showed a phagocytic activity of 49 % and 55 %, respectively, and a phagocytosis index of 3.5 and 5.2 particles per cell respectively (Fig. 5B and Supp. Fig. 7B), as confirmed in 3 independent experiments examining a total of 2,807 cells.”

      Line 458: for copper staining, how many cells and how many replicates were done for the quantification ?

      We have specified the number of hemocytes and the number of independent replicates used when quantifying rhodanine-stained (copper-accumulating) cells. At line 458, the following text was added: “and a total of 1,562 cells were examined across three independent experiments.”

      Line 461: what are the authors referring to when mentioning the link between copper homeostasis and scRNAseq?

      Single-cell RNA sequencing (scRNA-seq) analysis revealed an upregulation of several copper transport-related genes, including G4790 (a copper transporter) with a 2.7 log2FC and a pct ratio of 42, as well as the divalent cation transporters G5864 (zinc transporter ZIP10) and G4920 (zinc transporter 8), specifically in cluster 3 cells identified as small granule cells. These findings reinforce a potential role for this cluster in metal homeostasis.

      We modified lines 462-467 as follows: “These results provide functional evidence that small granule cells (SGCs) are specialized in metal homeostasis in addition to phagocytosis, as suggested by the scRNA-seq data identifying cluster 3. Specifically, single-cell RNA sequencing revealed an upregulation of copper transport-related genes, including G4790 (a copper transporter) with a 2.7 log2FC and a pct ratio of 42, reinforcing the role of SGCs in copper homeostasis (see Supp. File S1).”
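
      For readers unfamiliar with these marker statistics, the sketch below shows one common way a log2 fold change and a pct ratio are derived. It assumes that the pct ratio is the fraction of expressing cells inside the cluster (pct1) divided by the fraction among all other cells (pct2), and that the fold change compares mean expression values after adding a pseudocount of 1. The input numbers are invented for illustration and are not taken from the dataset.

```python
from math import log2

def log2_fold_change(mean_in, mean_out, pseudocount=1.0):
    """log2 ratio of mean expression inside vs. outside a cluster.

    The pseudocount (an assumed convention) avoids division by zero
    when a gene is undetected outside the cluster.
    """
    return log2((mean_in + pseudocount) / (mean_out + pseudocount))

def pct_ratio(pct1, pct2):
    """Fraction of expressing cells in the cluster over the fraction elsewhere."""
    if pct2 <= 0:
        raise ValueError("pct2 must be positive to form a ratio")
    return pct1 / pct2

# Invented example values for a single gene.
lfc = log2_fold_change(mean_in=12.0, mean_out=1.0)   # log2(13 / 2)
ratio = pct_ratio(pct1=0.84, pct2=0.02)
```

      Under these assumptions, a gene detected in 84 % of cluster cells but only 2 % of the remaining cells has a pct ratio of about 42, which conveys how cluster-restricted the expression is independently of the fold change.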

      Line 611: it would be nice to display the enrichment of the phagocytic receptor in cluster 3 (dotplot or feature plot) to illustrate the comment.

      We appreciate the reviewer’s insightful suggestion regarding a more comprehensive analysis of phagocytic receptors. While a full inventory is beyond the scope of this study, we acknowledge the value of such an approach and hope that our findings will serve as a foundation for future investigations in this direction.

      Although we have highlighted certain phagocytic receptors (e.g., a scavenger receptor domain-containing gene) in our scRNA-seq dataset, it is beyond the scope of the current study to inventory all phagocytosis-related receptors in the C. gigas genome, which itself would be a substantial undertaking. Moreover, single-cell RNA sequencing captures only about 15–20% of each cell’s mRNA, so we inherently lose a significant portion of the transcriptome, further limiting our ability to pinpoint all relevant phagocytic receptor genes. Adding more figures to cover every candidate receptor would risk overloading this paper, thus we focus on the most prominent examples. A promising approach for more exhaustive analysis would involve efficiently isolating granulocytes (e.g., via Percoll gradient) and performing targeted RNA-seq on this cell population to thoroughly explore genes involved in phagocytosis.

      Line 640-644: the authors mentioned that ML may be able to perform ETosis based on the oxidative burst. This hypothesis requires further evidence. Are other markers of ETosis expressed in this cell type?

      We agree that additional experimental evidence (e.g., detection of histone citrullination, extracellular DNA networks) is necessary to confirm ETosis in molluscan immune cells. We present ML-mediated ETosis only as a speculative possibility based on oxidative burst capacity, because previous work has shown that ETosis is inhibited by NADPH inhibitors (Poirier et al. 2014). Nevertheless, the expression of histones in the macrophage-like cluster (cluster 1) reinforces this possibility, as histone modifications play a key role in chromatin decondensation during ETosis.

      Reviewer #2 (Recommendations for the authors):

      Figure 1: In Figure 1B, the cell clusters are named 1 to 7, whereas in Figure 1C they are displayed as clusters 0 to 6. There is a mismatch between the identification of the clusters.

      We thank the reviewer for identifying this inconsistency. The cluster numbering has been corrected to ensure consistency between Figures 1B and 1C.

      Figure 2B: the font size could be increased for greater clarity.

      We thank the reviewer for this suggestion. The font size in Figure 2B has been increased to improve clarity and readability.

      Line 221: "Figures 2B, C and D" appears to refer to Figure S2 rather than the main Figure 2.

      The text has been corrected to properly reference the figure.

      Line 754: "Anopheles gambiae" should be italicised

      We thank the reviewer for pointing this out. "Anopheles gambiae" has been italicized accordingly.

      Bibliography

      Buenrostro, Jason D. et al. Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation. Cell, Volume 173, Issue 6, 1535-1548.e16

      Poirier, Aurore C. et al. Antimicrobial Histones and DNA Traps in Invertebrate Immunity. Journal of Biological Chemistry, Volume 289, Issue 36, 24821-24831

    1. Author response:

      Reviewer #1:

      Strengths:

      (1) Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise, and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data are consistent with a higher influence of species interactions (relative to demographic noise and immigration) in a disease microbiota state than in healthy ones. (2) The method is novel, original, and it improves the state-of-the-art methodology for the inference of ecologically relevant parameters. The analysis provides solid evidence for the conclusions. 

      Weaknesses:

      In the way it is written, this work might be mostly read by physicists. We believe that, with some rewriting, the authors could better highlight the ecological implications of the results and make the method more accessible to a broader audience.

      We thank the reviewer for their positive and constructive feedback. We particularly appreciate the recognition of the novelty and robustness of our method, as well as the insight that it sheds light on the shifting ecological forces between healthy and diseased microbiomes. In response to the concern about the manuscript’s accessibility, we aim to revise key sections – including the Introduction, Results, and Discussion – to more clearly articulate the ecological relevance of our theoretical findings. We would like to emphasize that our approach offers a novel perspective for analyzing individual species' abundances, as well as for understanding interaction patterns and stability at the community level. By placing our results within a broader context accessible to readers from diverse backgrounds, we aim for the revised version to appeal to a wider audience, including ecologists and microbiome scientists, while preserving the rigor of our underlying statistical physics framework.

      Reviewer #2:

      Strengths:

      A well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inferring procedure, which bypasses the issue of having only compositional data. 

      Weaknesses:

      (1) This sentence in the introduction seems key to me: "Focusing on single species properties as species abundance distribution (SAD), it fails to characterise altered states of microbiome." Yet it is not explained what is meant by 'fail', and thus what the proposed approach 'solves'.

      (2) Lack of validation, following arbitrary modelling choices made (symmetry of interactions, weak-interaction limit, uniform carrying capacity). Inconsistent interpretation of instability. Here, instability is associated with the transition to the marginal phase, which becomes chaotic when interaction symmetry is broken. But as the authors acknowledge, the weak interaction limit does not reproduce fat-tailed abundance distributions found in data. On the other hand, strong interaction regimes, where chaos prevails, tend to do so (Mallmin et al, PNAS 2024). Thus, the nature of the instability towards which unhealthy microbiomes approach is unclear.

      (3) Three technical points about the methodology and interpretation. a) How can the order parameters h and q0 be inferred, if in the compositional data they are fixed by definition? b) How is it possible that weaker interaction variance is associated with an approach to instability, when the opposite is usually true? c) Having an idea of how the empirical data compare to the theoretical fits would be valuable.

      Implications: As the authors say, this is a proof of concept. They point at limits and ways to go forward, in particular pointing at ways in which species abundance distributions could be better reproduced by the predicted dynamical models. One implication that is missing, in my opinion, is the interpretability of the results, and what this work achieves that was missing from other approaches (see weaknesses section above): what do we learn from the fact that changes in microbial interactions characterise healthy from unhealthy microbiota? For instance, what does this mean for medical research?

      We greatly appreciate the reviewer’s thoughtful analysis highlighting both the strengths and areas of ambiguity in our work.

      (1) To clarify the sentence on the limitations of species abundance distributions (SADs), we aim to explain in the revised version that while SADs summarize the relative abundance of individual species, they fail to capture the species-species correlations that we have shown (Seppi et al., Biomolecules 2023) to be more sensitive to the health state of the host. Our method thus focuses on the interaction statistics among species, providing insights into the underlying dynamics and stability of the microbiomes and their differences between healthy and unhealthy hosts.

      (2) Regarding model assumptions, we acknowledge that the weak interaction regime and symmetry hypotheses simplify the analysis and may not capture all empirical richness, such as fat-tailed distributions of species abundance. However, we interpret instability not as a path to chaos per se, but as a transition toward a multi-attractor phase, where each microbiome reaches a different fixed point. This is consistent with prior empirical findings invoking the “Anna Karenina principle”, where healthy microbiomes resemble one another, but disease states tend to deviate from this picture (see Pasqualini et al., PLOS Comp. Bio. 2024). We consider our framework as a starting point and agree that further extensions incorporating strong interaction regimes (as suggested by Mallmin et al., PNAS 2024) or relaxing other model assumptions could reveal even richer dynamical patterns. The computational pipeline we present can, in fact, be easily generalized to include different population dynamics models.

      On the technical questions: (a) While compositional data constrain relative abundances, we can still estimate diversity-dependent parameters (h and q0) using alpha-diversity statistics across samples, which show meaningful variation; (b) The counter-intuitive instability that the reviewer pointed out arises from the interplay between demographic stochasticity and quenched disorder. It is the combined contribution of these two factors in phase space – not either one alone – that drives the transition. For clarity, see Figure 1 in Altieri et al., Phys. Rev. Lett. 2021; (c) We plan to include plots that compare empirical data to theoretical model fits. This will help visualize how well the model captures observed microbial community properties and how, even with larger species interaction heterogeneity (σ) and larger demographic noise (T), healthy communities are more stable (i.e., more distant from the critical line), as measured by the replicon eigenvalue.

      Finally, regarding interpretability and implications: by showing that ecological interaction networks – not just species identities – differ between healthy and unhealthy states, our work suggests a conceptual shift. This could inform medical strategies aimed at restoring community-level stability rather than targeting individual microbes. In the revised Discussion section, we will elaborate on this point to better highlight its practical implications and outline potential directions for future research.

      Reviewer #3:

      Strengths:

      The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems, and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out. 

      Weaknesses:

      The authors use metagenomic data of diseased and healthy patients that were first processed in Pasqualini et al. (2024). The use of metagenomic data leads me to a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as h. This parameter is equal to the average of 1/# species across samples because the data are compositional in nature. My understanding is that it was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort, so it would be useful if the number of reads were plotted against the number of species for healthy and diseased subjects. However, the role of sampling effort can depend on the type of data, and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigate allele frequency changes within species, etc.), sometimes it is driven by the false equivalence "more data = better". The outcome, though, is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge, there has not been a systematic evaluation of the macroecological patterns that do and do not vary by one's choice in 16S vs. metagenomics. 
      Several of the authors in this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al., but moving forward, a more comprehensive study seems necessary.

      We thank the reviewer for this insightful and nuanced comment, which particularly highlights the broader methodological context of our data sources. Indeed, metagenomic sequencing introduces different biases with respect to 16S data. First, we would like to emphasize that we estimated the order parameters from the data by using relative abundances. Second, while the concern regarding the influence of sequencing depth and species diversity on the estimation of the order parameters is valid, we refer to a previous publication by some of the authors (Pasqualini et al., 2024; see Figure 4, panels g and h). There, we pointed out that the observed outcome is weakly influenced by sequencing depth in our dataset, while the main impact on the order parameters estimate comes from the species diversity of the two groups. In the same publication, we showed that other well-known patterns (species abundance distribution, mean abundance distribution) are also observed. Also, to mitigate the effect of the number of samples and sequencing depth, we estimated the order parameters by a bootstrap procedure (90% of samples for healthy and diseased groups, 5000 resamples), which resulted in the error bars in Figure 2.
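
      The bootstrap described above can be sketched as follows. This is an illustrative reading rather than the actual pipeline: it assumes that h is the sample-averaged mean relative abundance over detected species (which equals the average of 1 over the number of species for compositional data), that q0 is the sample-averaged mean squared relative abundance, and that each resample draws 90% of the samples without replacement. The data are synthetic, the function names are hypothetical, and fewer resamples are used than the 5000 quoted above.

```python
import random
from statistics import mean, stdev

def order_parameters(samples):
    """Return (h, q0) for a group of samples.

    Each sample is a list of relative abundances (summing to 1) of the
    species detected in it. The definitions of h and q0 are assumptions.
    """
    h = mean(mean(s) for s in samples)
    q0 = mean(mean(x * x for x in s) for s in samples)
    return h, q0

def bootstrap(samples, fraction=0.9, n_resamples=500, seed=0):
    """Estimate h and q0 with error bars by repeated subsampling."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(samples)))
    estimates = [order_parameters(rng.sample(samples, k))
                 for _ in range(n_resamples)]
    hs, qs = zip(*estimates)
    return (mean(hs), stdev(hs)), (mean(qs), stdev(qs))

def toy_sample(rng, n_species):
    """Synthetic compositional sample with n_species detected species."""
    raw = [rng.expovariate(1.0) for _ in range(n_species)]
    total = sum(raw)
    return [x / total for x in raw]

rng = random.Random(1)
toy_group = [toy_sample(rng, rng.randint(20, 60)) for _ in range(40)]
(h_mean, h_err), (q0_mean, q0_err) = bootstrap(toy_group)
```

      In this sketch the spread of the resampled estimates plays the role of the error bars, and it makes explicit that h varies across resamples only through the number of detected species, which is why species diversity rather than sequencing depth dominates the estimate.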

      We also fully agree with the broader call for a systematic comparison of macroecological patterns derived from 16S and metagenomic data. While some of us have already begun exploring this direction (e.g., Pasqualini et al., 2024), the reviewer’s comment highlights its significance and motivates us to pursue a more comprehensive, integrative analysis across data types. While we found qualitative agreement of these patterns with previous publications (e.g., Grilli, Nature Comm. 2020), we will acknowledge this as an important future direction in the Discussion section.

      References

      (1) Seppi, M., Pasqualini, J., Facchin, S., Savarino, E.V. and Suweis, S., 2023. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1), p.5.

      (2) Pasqualini, J., Facchin, S., Rinaldo, A., Maritan, A., Savarino, E. and Suweis, S., 2024. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9), p.e1012482.

      (3) Mallmin, E., Traulsen, A. and De Monte, S., 2024. Chaotic turnover of rare and abundant species in a strongly interacting model community. Proceedings of the National Academy of Sciences, 121(11), p.e2312822121.

      (4) Altieri, A., Roy, F., Cammarota, C. and Biroli, G., 2021. Properties of equilibria and glassy phases of the random Lotka-Volterra model with demographic noise. Physical Review Letters, 126(25), p.258301.

      (5) Grilli, J., 2020. Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), p.4743.

    1. Author response:

      Reviewer 1:

      (1) Clarification of axon mistargeting patterns and model interpretation

      We will clarify the apparent discrepancy between chick and mouse axon mistargeting data. Specifically, we will expand the explanation in the main text and Figure 7 legend and/or revise the model in Figure 7 to better reflect observed phenotypes and clarify how Sp1 overexpression contributes to mistargeting.

      (2) Evidence for Sp1-dependent ephrin expression

      We agree that demonstrating ephrin expression changes in motor neurons is essential. We will:

      Conduct in situ hybridization and/or immunostaining for ephrins in control and Sp1 mutant spinal cords from both chick and mouse embryos.

      Clarify and expand the methodological details of the NSC-34 cell experiments shown in Figure 4G.

      (3) RNA-seq experiment details

      We will revise the Methods section to provide additional experimental details.

      (4) Use of Syn1-cre

      We acknowledge concerns about the broad expression of Syn1-cre. To address this:

      We will clarify our rationale for using Syn1-cre and describe its expression pattern in the spinal cord.

      We are evaluating the feasibility of additional experiments using a motor neuron-specific Cre driver to confirm cell-type specificity.

      We will include a new paragraph in the Discussion addressing potential contributions from other neuronal populations.

      Reviewer 2:

      (1) & (2) Clarification and localization of RNA-seq data

      We will expand the Methods section to provide greater detail on the RNA-seq approach. In addition, we will validate ephrin downregulation in LMC neurons using in situ hybridization and/or immunostaining.

      (3) Integration of ChIP and RNA-seq data

      We will:

      Report additional ChIP peaks for ephrinA5 and other differentially expressed genes such as Sema7a.

      Add a summary figure that integrates ChIP and RNA-seq results to strengthen the link between Sp1 binding and transcriptional regulation.

      (4) Clarification of the cis-attenuation model

      We recognize that our data do not yet directly demonstrate Sp1’s role in cis-attenuation. To address this:

      We will revise the abstract and main text to frame Sp1's role in cis-attenuation as a hypothesis.

      We are exploring the feasibility of ephrinA5 and B2 rescue experiments in Sp1-deficient embryos to test specificity.

      (5) Behavioral phenotypes and cell-type specificity

      We will clarify that behavioral phenotypes may result from combined effects across neuron populations due to Syn1-cre expression. To address this:

      We are planning rescue experiments with Sp1 expression in chick embryos to test for rescue of axon misrouting.

      We will include a new paragraph in the Discussion to highlight this limitation and discuss alternative interpretations.

      Reviewer 3:

      We appreciate your positive evaluation and support for the rigor of our study.

      In response to your suggestions:

      We are revising the manuscript to improve clarity and flow, particularly the transitions between datasets.

      We will update Figure 7 and the associated text to more clearly convey the working model and avoid overinterpretation.

      We thank all reviewers for their constructive feedback and are committed to addressing each point thoroughly. All revisions will be clearly marked in the resubmitted manuscript.

    1. Author response:

      (This author response relates to the first round of peer review by Biophysics Colab. Reviews and responses to both rounds of review are available here: https://sciety.org/articles/activity/10.1101/2023.10.23.563601.)

      General Assessment:

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although several structures have now been solved of different Panx subtypes from different species, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation.
      In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and importantly provides clinical relevance.

      We thank the reviewers for their time and effort in reviewing our manuscript. Based on their valuable comments and suggestions, we have made substantial revisions. The updated manuscript now includes two new experiments supporting that lysophospholipid-triggered channel activation promotes the release of signaling molecules critical for immune response and demonstrates that this novel class of agonist activates the inflammasome in human macrophages through endogenously expressed Panx1. To better highlight the significance of our findings, we have excluded the cryo-EM panel from this manuscript. We believe these changes address the main concerns raised by the reviewers and enhance the overall clarity and impact of our findings. Below, we provide a point-by-point response to each of the reviewers’ comments.

      Recommendations:

      (1) The authors present a tremendous amount of data using different approaches, cells and assays along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed so that the 2 key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.

      We appreciate and agree with the reviewers’ observation. Our initial manuscript may have been challenging to follow due to our use of both wild-type and GS-tagged versions of Panx1 from human and frog origins, combined with different fluorescence techniques across cell types. In this revision, we used only human wild-type Panx1 expressed in HEK293S GnTI- cells, except for activity-guided fractionation experiments, where we used GS-tagged Panx1 expressed in HEK293 cells (Fig. 1). For functional reconstitution studies, we employed YO-PRO-1 uptake assays, as optimizing the Venus-based assay was challenging. We have clarified these exceptions in the main text. We think these adjustments simplify the narrative and ensure an appropriate balance between main and supplemental figures.

      (2) It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors’ earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so-called instantaneous IV).

      We fully agree with the reviewer on the importance of ion selectivity experiments. However, comparing the properties of LPC-activated channels with those activated by membrane depolarization presented technical challenges, as LPC appears to stimulate Panx1 in synergy with voltage. Prolonged LPC exposure destabilizes patches, complicating G-V curve acquisition and kinetic analyses. While such experiments could provide mechanistic insights, we think they are beyond the scope of the current study.

      (3) Data is presented for expression of Panx channels in different cell types (HEK vs HEKS GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D it would be immediately clear that a GnTI- vector alone control for WT Panx1 is missing as the vector control shown is for HEK cells and formally that is only a control for Panx2 and 3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293 GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.

      As mentioned above, we believe our revised version is more straightforward to digest. We have improved labeling and provided explanations where necessary to clarify the manuscript. While Panx1 expression levels are indeed higher in GnTI- than in HEK293 cells, we are uncertain whether the absence of detectable currents in HEK293 cells is solely due to expression levels. Some post-translational modifications that inhibit Panx1, such as lysine acetylation, may also impact activity. Future studies are needed to explore these mechanisms further.

      (4) The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%) but when the channel is closed at time = 0 the value of fluorescence is 0 rather than 100 %, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate calculated that is plotted in Fig 2C? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent maximum, what is the maximum and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments so anything that could be done to bolster confidence in what it reports on would be valuable to readers. Bringing in as many control experiments that have been done, including any that are already published, would be helpful.

      We modified the Y-axis in Figure 2 to “Quench (%)” for clarity. The data reflect the reduction in fluorescence over time, starting from LPC addition, normalized to the maximal decrease observed after Triton X-100 addition (3 minutes), enabling consistent comparison of quenching values. Although the quenching values appear small, normalization against complete cell solubilization provides reproducible comparisons. We do not fully understand why CBX effects vary in Venus quenching experiments, but we speculate that its steroid-like pentacyclic structure may influence the agonistic effects of lysophospholipids. As noted in prior studies (DOI: 10.1085/jgp.201511505; DOI: 10.7554/eLife.54670), CBX likely acts as an allosteric modulator rather than a simple pore blocker, potentially contributing to these variations.
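
      The normalization described above can be illustrated with a minimal sketch. It assumes that the first reading of each trace is the fluorescence at the moment of LPC addition and that a single plateau value is taken after Triton X-100; the function name, argument names, and example numbers are hypothetical and are not taken from the manuscript.

```python
def quench_percent(trace, f_triton):
    """Convert raw fluorescence readings into Quench (%).

    trace    -- fluorescence readings, trace[0] taken at LPC addition (assumption)
    f_triton -- fluorescence after Triton X-100 addition (assumed maximal quench)
    """
    f0 = trace[0]
    max_drop = f0 - f_triton
    if max_drop <= 0:
        raise ValueError("Triton X-100 reading must be lower than the baseline")
    # 0 % at LPC addition, 100 % at the Triton X-100 plateau
    return [100.0 * (f0 - f) / max_drop for f in trace]


# Illustrative numbers only
example = quench_percent([1000.0, 980.0, 950.0], f_triton=500.0)
```

      With this convention a closed channel reads 0 % and the values rise as iodide enters and quenches mVenus, which matches the direction of the relabelled axis.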

      (5) Could the authors provide more information to help rationalize how Yo-Pro-1, which has a charge of +2, can permeate what are thought to be anion favouring Panx channels? We appreciate that the biophysical properties of Panx channel remain mysterious, but it would help to hear a bit more about the authors’ thinking. It might also help to cite other papers that have measured Yo-Pro-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?

      Our recent study suggests that the electrostatic landscape along the permeation pathway may influence its ion selectivity (DOI: 10.1101/2024.06.13.598903). However, we have not yet fully elucidated how Panx1 permeates both anions and cations. Based on our findings, ion selectivity may vary with activation stimulus intensity and duration. Cation permeation through Panx1 is often demonstrated with YO-PRO-1, which measures uptake over minutes, unlike electrophysiological measurements conducted over milliseconds to seconds. We referenced two representative studies employing YO-PRO-1 to assess Panx1 activity. Whole-cell current measurements from a similar construct with an intracellular loop insertion indicate that our STREP-tagged construct likely retains functional capacity.

      (6) In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.

      We used the ratio of LPC-induced current to the current measured at +110 mV to determine whether any of the mutants disproportionately affect LPC-induced current relative to depolarization-activated current. Since the mutants that did not respond to LPC also exhibited smaller voltage-stimulated currents than those that did respond, we reasoned that using this ratio would better capture the information the reviewer is suggesting to gauge. Showing the zero current level may be helpful if the goal was to compare basal currents, which in our experience vary significantly from patch to patch. However, since we are comparing LPC- and voltage-induced currents within the same patch, we believe that including basal current measurements would not add useful information to our study.

      Given that new experiments were included to further highlight the significance of the discovery of Panx1 agonists, we opted to separate the structure-based mechanistic studies from this manuscript and removed this experiment along with the docking and cryo-EM studies.
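
      The reasoning behind the ratio in the response to point (6) can be shown with a small sketch. It assumes that both currents come from the same patch and are compared as absolute magnitudes; the function name and the numbers are hypothetical.

```python
def lpc_to_voltage_ratio(i_lpc_pa, i_voltage_pa):
    """Ratio of the LPC-induced current (-60 mV) to the voltage-induced
    current (+110 mV, no LPC) within one patch, using magnitudes."""
    if i_voltage_pa == 0:
        raise ValueError("voltage-induced current must be non-zero")
    return abs(i_lpc_pa) / abs(i_voltage_pa)


# Hypothetical patches (currents in pA)
reference = lpc_to_voltage_ratio(-150.0, 600.0)       # robust expression
low_expression = lpc_to_voltage_ratio(-15.0, 60.0)    # both currents 10x smaller
lpc_insensitive = lpc_to_voltage_ratio(-3.0, 60.0)    # LPC response selectively lost
```

      A construct that simply expresses less keeps the same ratio as the reference, whereas a construct that selectively loses its LPC response gives a lower ratio.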

      (7) The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.

      We agree with the reviewer and attempted 3D classification using multiple smaller masks including the suggested region. However, the maps remained poorly defined, and we were unable to confidently assign the NTD.

      (8) The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."

      Both maps show that the narrowest constriction, formed by W74, has a diameter of approximately 9 Å. Previous steered molecular dynamics simulations suggest that ATP can permeate through such a constriction, implying an ion selection mechanism distinct from a simple steric barrier.

      (9) The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?

      We appreciate the reviewer’s insight. We incubated LPC and nanodisc-reconstituted protein for 30 minutes, speculating that LPC approaches the pore similarly to other lipids in prior structures. In separate studies, we are optimizing conditions to capture more defined conformations.

      (10) In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.

      We agree that the density corresponding to the NTD extended into the cytoplasm is relatively weak. In our recent study, we compared two Panx1 structures with or without the mentioned C-terminal helix and found evidence suggesting the likelihood of NTD extension (DOI: 10.1101/2024.06.13.598903). Nevertheless, to prevent potential confusion, we have removed the cryo-EM panel from this manuscript.

      (11) Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.

      Thank you for the suggestion. We have removed this section to avoid potential confusion.

      Optional suggestions:

      (1) The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both ECD and ICD with substantial sizes that could interact with water soluble compounds? Also, the use of whole-cell recordings to screen fractions would not likely identify polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.

      We have added a rationale in line 90, stating: “The soluble fractions were excluded from this study, as the most polar fraction induced strong channel activities in the absence of exogenously expressed pannexins.” Additionally, we have included a figure to support this rationale (Fig. S1A).

      (2) The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and 2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC without asking what happens when you depolarize the membrane after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V relations and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It’s hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.

      Unfortunately, in our hands, the prolonged application of lysolipids at concentrations necessary to achieve significant currents tends to destabilize the patch. This makes it challenging to obtain G-V curves or perform the previously mentioned kinetic analyses. We believe this destabilization may be due to lysolipids’ surfactant-like qualities, which can disrupt the giga seal. Additionally, prolonged exposure seems to cause channel desensitization, which could be another confounding factor.

      (3) From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panel A-C, the authors introduced double mutants on TM1 and TM2 to interfere with LPC binding, however, the double mutants may also disrupt the interaction network formed within NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC induced current, also exhibited lower currents through voltage activation in Fig 5S, raising the possibility the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.

      Thank you for the comment, with which we agree. Our initial intention was to use the mutagenesis studies to experimentally support the docking study. Due to uncertainties associated with the presented cryo-EM maps, we have decided to remove this study from the current manuscript. We will consider the proposed experiments in a future study.

      (4) Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from “primed” state to the “active” state. In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (isn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.

      We believe the uncoordinated arrangement of NTDs is dynamic, allowing for potential variations in pore size during the activated conformation. Alternatively, NTD movement may be coupled with conformational changes in TM1 and the extracellular domain, which in turn could alter the electrostatic properties of the permeation pathway. We believe a functional study exploring this mechanism would be more appropriately presented as a separate study.

      (5) Can the authors provide a positive control for these negative results presented in Fig S1B and C?

      The positive results are presented in Fig. 1D and E.

      (6) Raw images in Fig S6 and Fig S7 should contain units of measurement.

      Thank you for pointing this out.

      (7) It may be beneficial to show the superposition between primed state and activated state in both protomer and overall structure. In addition, superposition between primed state and PDB 7F8J.

      We attempted to superimpose the cryo-EM maps; however, visually highlighting the differences in figure format proved challenging. Higher-resolution maps would allow for model building, which would more effectively convey these distinctions.

      (8) Including the number of particles in each class in Fig S8 panel C and D would help in evaluating the quality of classification.

      Noted.

      (9) A table for cryo-EM statistics should be included.

      Thanks, noted.

      (10) n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.

      While we agree that transparency is essential, adding n-values to each graph would make some figures less clear and potentially harder to interpret in this case. We believe that the dot plots, n-value range, and statistical analysis provide adequate support for our claims.

      (11) The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.

      Thank you for this comment. Although we intended to support the caspase activation mechanism in our introduction, we understand that the reviewer’s interpretation indicates a need for clarification. We hope the revised introduction removes any perception of dismissiveness.

      (12) Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.

      For Fig. 4F, we used a distinct normalization method to account for substantial day-to-day variation in experiments involving body fluids. Notably, we did not apply this normalization to other experimental panels due to their considerably lower day-to-day variation.

      (13) What was the rationale for using the structure from ref 35 in the docking task?

      The docking task utilized the human orthologue with a flipped-up NTD. We believe that this flipped-up conformation is likely the active form that responds to lysolipids. As our functional experiments primarily use the human orthologue for biological relevance, this structure choice is consistent. Our docking data show that LPC does not dock at this site when using a construct with the downward-flipped NTD.

      (14) Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.

      Done.

      (15) From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it’s worth noting that to readers somewhere.

      We have added the molecular weight of each subtype in the figure legend.

      (16) Please provide holding voltages and zero current levels in all figures presenting currents.

      We provided holding voltages. However, the zero current levels vary among the examples presented, making direct comparisons difficult. Since we are comparing currents with and without LPC, we believe that indicating zero current levels is unnecessary for this study.

      (17) While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      We are uncertain whether Panx3 is unaffected by lysophospholipids, as we have not observed activation of this subtype under any tested conditions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aims to understand the malaria antigen-specific cTfh profile of children and adults living in a malaria holoendemic area. PBMC samples from children and adults were unstimulated or stimulated with PfSEA-1A or PfGARP in vitro for 6h and analysed by a cTfh-focused panel. Unsupervised clustering and analysis on cTfh were performed.

      The main conclusions are:

      (1) the cohort of children has more diverse (cTfh1/2/17) recall responses compared to the cohort of adults (mainly cTfh17) and

      (2) Pf-GARP stimulates better cTfh17 responses in adults, thus a promising vaccine candidate.

      Strengths:

      This study is in general well-designed and with excellent data analysis. The use of unsupervised clustering is a nice attempt to understand the heterogeneity of cTfh cells. Figure 9 is a beautiful summary of the findings.

      Weaknesses:

      (1) Most of my concerns are related to using PfSEA-1A and PfGARP to analyse the cTfh in vitro stimulation response. In vitro stimulation of cTfh cells has been frequently used (e.g. Dan et al, PMID: 27342848), usually with antigen stimulation for 9h followed by analysis of CD69/CD40L expression, or for 18h followed by analysis of CD25/OX40. However, the authors use a different strategy that has not been validated to analyse in vitro stimulated cTfh. Also, they excluded CD25+ cells which might be activated cTfh. I am concerned about whether the conclusions based on these results are reliable.

      Dan et al have shown that cTfh cells can hardly produce cytokines. However, in this paper, the authors report the significant secretion of IL-4 and IFNg on some cTfh clusters after 6h stimulation. If the stimulation is antigen-specific through TCR, why do cTfh1 cells upregulate IL-4 but not IFNg in Figure 6? I believe including the representative FACS plots of IL-4, IFNg, IL21 staining, and using %positive rather than MFI can make the conclusion more convincing. Similarly, the author should validate whether TCR stimulation under their system for 6h can induce robust BCL6/cMAF expression in cTfh cells. Moreover, there is no CD40L expression. Does this mean TCR stimulation mediated BCL6/cMAF upregulation and cytokine secretion precede CD40L expression?

      In summary, I am particularly concerned about the method used to analyse PfSEA-1A and PfGARP-specific cTfh responses because it lacks proper validation. I am unsure if the conclusions related to PfSEA-1A/PfGARP-specific responses are reliable.

      An unfortunate reality of these types of complex immunologic studies is that it takes time to optimize a multiparameter flow cytometry panel, run this number of samples, and then conduct the analysis (not to mention the time it takes for a manuscript to be accepted for peer-review). An unexpected delay, frankly, was the COVID-19 pandemic when non-essential research lab activities were put on hold. We designed our panel in 2019 and referred to the “T Follicular Helper Cells” Methods and Protocols book from Springer 2015. Obviously the field of human immunology took a huge leap forward during the pandemic as we sought to characterize components of protective immunity, and as a result there are several new markers we will choose for future studies of Tfh subsets. We agree with the reviewer that cytokine expression kinetics differ depending on the in vitro stimulation conditions. Due to small blood volumes obtained from healthy children, we were limited in the number of timepoints we could test. However, since we were most interested in IL21 expression, we found 6 hrs to be the best in combination with the other markers of interest during our optimization experiments. We did find IFNg expression from non-Tfh cells, therefore we believe our stimulation conditions worked.

      Dan et al used stimulated tonsil cells to assess the CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh and CXCR5<sup>neg</sup> CD45RA<sup>neg</sup> non-Tfh whereas in our study, we evaluated CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh from PBMCs. The PBMC work by Dan et al used EBV/CMV or other pathogen product stimuli and only gated on CD25<sup>pos</sup>OX40<sup>pos</sup> cells, which are not the cells we are assessing in our study. This might explain in part the differences in cytokine kinetics, as we evaluated CD25<sup>neg</sup> PBMCs only. However, we agree that more recent studies focused on CXCR5<sup>pos</sup>PD1<sup>pos</sup> cells included more activation-induced markers (AIM), which are missing in our study and limit the depth of our analysis.

      Percentage of positive cells and MFI are complementary data. Indeed, the percentage of positive cells only indicates which cells express the marker of interest without giving a quantitative value of this expression. MFI indicates how much the marker of interest is expressed by cells which is important as it can indicate degree of activation or exhaustion per cell. Meta-cluster analysis is not ideal to assess the percentage of positivity whereas it does provide essential information regarding the intensity of expression. We added supplemental figures 14 (Bcl6 and cMAF), 15 (IFNg and IL21) and 16 (IL4 and IL21) where the percentage of positive cells was manually gated directly from the total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Results from the manual gating are consistent with the results we show using clustering. However, it helps to better visualize that antigen-specific IL21 expression was statistically significant in children whereas the high background observed for adults did not reveal higher expression after stimulation, perhaps suggesting an upper threshold of cytokine expression (supplemental figure 15). The following sentence has been added in the methods at the end of the “OMIQ analysis” section: “However, the percentage of positive IFN𝛾, IL-4, IL-21, Bcl6, or cMAF using manual gating can be found in Supplemental Figures 14, 15, and 16 along with the overlay of the gated positive cells on the CD4<sup>pos</sup>CXCR5<sup>pos</sup>CD25<sup>neg</sup> UMAP and the cytoplots of the gated positive cells for each meta-cluster (Supplemental Figures 14, 15, and 16).”
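
      The complementarity of the two readouts can be seen in a minimal sketch. It assumes that a single positivity threshold has been derived from the FMO or negative control and that MFI is computed as the median intensity (it could equally be a mean); the function names and values are hypothetical.

```python
from statistics import median


def percent_positive(intensities, threshold):
    """Share of events above a gate threshold (assumed to come from the FMO)."""
    positives = [x for x in intensities if x > threshold]
    return 100.0 * len(positives) / len(intensities)


def mfi(intensities):
    """MFI taken here as the median fluorescence intensity (assumption)."""
    return median(intensities)


# Two hypothetical clusters with the same fraction of positive cells
dim_cluster = [50, 120, 150, 200]
bright_cluster = [50, 900, 1500, 3000]
gate = 100

same_fraction = (percent_positive(dim_cluster, gate),
                 percent_positive(bright_cluster, gate))
different_intensity = (mfi(dim_cluster), mfi(bright_cluster))
```

      Both clusters are 75 % positive, yet their intensities differ roughly ninefold, which is the per-cell information that the percentage alone does not carry.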

      Indeed, cMAF can be induced by TCR signaling, ICOS and IL6 (Imbratta et al., 2020). However, in our study populations, ICOS was expressed (see Author response image 1, panel A) in the absence of any stimulation, suggesting that CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> cells were already capable of expressing cMAF. Indeed, after gating Bcl6 and cMAF positive cells based on their FMOs (Author response image 1, panel B and C, respectively), we overlaid positive cells on the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> cells UMAP and we can see that most of our cells already express cMAF alone (Author response image 1, panel D) or co-express cMAF and Bcl6 (Author response image 1, panel E), confirming that they are TfH cells, whereas very few cells expressed Bcl6 alone (Author response image 1, panel F). Because we knew that cT<sub>FH</sub> already express Bcl6 and cMAF, we focused our analysis on the intensity of their expression to assess if our vaccine candidates were inducing more expression of these transcription factors.

      Author response image 1.

      (2) The section between lines 246-269 is confusing. Line 249, comparing the abundance after antigen stimulation is improper because 6h stimulation (under Golgi stop) should not induce cell division. I think the major conclusions are contained in Figure 5e, that (A) antigen stimulation will not alter cell number in each cluster and (B) children have more MC03, 06 and fewer MC02, etc. The authors should consider removing statements between lines 255-259 because the trends are the same regardless of stimulations.

      We agree that there is no cell division after 6h and that the different meta clusters did not proliferate after such a short in vitro stimulation. The use of the word ‘abundance’ in the context of cluster analysis is in reference to comparing the contribution of events by each group to the concatenated data. After the meta clusters are defined and then deconvoluted by study group, certain meta clusters could be more abundant in one group compared to another - meaning they contributed more events to a particular metacluster.

      Dimensionality reduction is more nuanced than manual gating and reveals a continuum of marker expression between the cell subsets, as there is no hard “straight line” threshold of the kind used in 2D gating. Because of this, differences in marker expression levels after stimulation can shift cells from one cluster to another, thereby changing the abundance of those clusters.

      To clarify how this type of analysis is interpreted, we have modified lines 255-259 as follows:

      “In contrast, the quiescent PfSEA-1A- and PfGARP-specific cT<sub>FH</sub>2-like cluster (MC02) was significantly more abundant in adults compared to children (Figure 5c and 5d, pf<0.05). Interestingly, following PfGARP stimulation, the activated cT<sub>FH</sub>1/17-like subset (MC09) became more abundant in children compared to adults (Figure 5d, pf<0.05 with a False Discovery Rate=0.08), but no additional subsets shifted phenotype after PfSEA-1A stimulation (Figure 5c).”
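
      The sense of ‘abundance’ used in this response, namely the share of each study group’s events that falls into a given meta cluster, can be sketched as follows. The sketch assumes each event carries a group label and a meta cluster label after clustering of the concatenated data; the function name and counts are hypothetical, and no statistical testing is shown.

```python
from collections import Counter


def cluster_abundance(events):
    """events: list of (group, metacluster) pairs from the concatenated data.
    Returns {group: {metacluster: percent of that group's events}}."""
    totals = Counter(group for group, _ in events)
    counts = Counter(events)
    result = {}
    for (group, cluster), n in counts.items():
        result.setdefault(group, {})[cluster] = 100.0 * n / totals[group]
    return result


# Hypothetical events; no cell needs to divide for these shares to differ
events = ([("children", "MC03")] * 3 + [("children", "MC02")] * 1
          + [("adults", "MC03")] * 1 + [("adults", "MC02")] * 3)
abundance = cluster_abundance(events)
```

      Here MC02 holds 75 % of the adult events but only 25 % of the children’s events, so it is more abundant in adults without implying any proliferation.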

      Reviewer #2 (Public Review):

      Summary:

      Forconi et al explore the heterogeneity of circulating Tfh cell responses in children and adults from malaria-endemic Kenya, and further compare such differences following stimulation with two malaria antigens. In particular, the authors also raised an important consideration for the study of Tfh cells in general, which is the hidden diversity that may exist within the current 'standard' gating strategies for these cells. The utility of multiparametric flow cytometry as well as unbiased clustering analysis provides a potentially potent methodology for exploring this hidden depth. However, the current state of analysis presented does not aid the understanding of this heterogeneity. This main goal of the study could hopefully be achieved by putting all the parameters used in one context, before dissecting such differences into their specific clinical contexts.

      Strengths:

      Understanding the full heterogeneity of Tfh cells in the context of infection is an important topic of interest to the community. The study included clinical groupings such as age group differences and differences in response to different malaria antigens to further highlight context-dependent heterogeneity, which offers new knowledge to the field. However, improvements in data analyses and presentation strategies should be made in order to fully utilize the potential of this study.

      Weaknesses:

      In general, most studies using multiparameter analysis coupled with an unbiased grouping/clustering approach aim to describe differences between all the parameters used for defining groupings, prior to exploring differences between these groupings in specific contexts. However, the authors have opted to separate these into sections using "subset chemokine markers", "surface activation markers" and then "cytokine responses", yet nuances within all three of these major groups were taken into account when defining the various Tfh identities. Thus, it would make sense to show how all of these parameters are associated with one another within one specific context to first logically establish to the readers how we can better define Tfh heterogeneity. When presented this way, some of the identities such as those that are less clear such as "MC03/MC04/ MC05/ MC08" may even be better revealed. Once established, all of these clusters can then be subsequently explored in further detail to understand cluster-specific differences in children vs adults, and in the various stimulation conditions. Since the authors also showed that many of the activation markers were not significantly altered post-stimulation, there is no real obstacle to merging the entire dataset for the first part of this study, which is to define Tfh heterogeneity in an unbiased manner regardless of age groups or stimulation conditions. Other studies using similar approaches such as Mathew et al 2020 (doi: 10.1126/science.abc8) or Orecchioni et al 2017 (doi: 10.1038/s41467-017-01015-3) can be referred to for more effective data presentation strategies.

      Accordingly, the expression of cytokines and transcription factors can only be reliably detected following stimulation. However, the underlying background responses need to be taken into account for understanding "true" positive signals. The only raw data for this was shown in the form of a heatmap where no proper ordering was given to ensure that readers can easily interpret the expression of these markers following stimulation relative to no stimulation. Thus, it is difficult to reliably interpret any real differences reported without this. Finally, the authors report differences in either cluster abundance or cluster-specific cytokine/ transcription factor expression in Tfh cell subsets when comparing children vs adults, and between the two malaria antigens. The comparisons of cytokine/transcription factor between groups will be more clearly highlighted by appropriately combining groupings rather than keeping them separate as in Figures 6 and 7.

      Thank you for sharing these references. Similar to the SPADE clustering and ViSNE dimensionality reduction algorithms used in Orecchioni et al, we used all the extracellular markers from our panel in our FlowSOM algorithm with consensus meta-clustering, which includes both the chemokine receptors and activation markers even though they are presented separately in our manuscript across figures 3 and 4. This was explained in the methods section (lines 573 - 587). We then chose the UMAP algorithm for visual dimensionality reduction of the meta-clusters generated by FlowSOM-consensus meta-clustering, as explained under the “OMIQ analysis” subpart of our methods (lines 588- 604). Therefore, we believe we have conducted the analysis as this reviewer suggests, even if we chose to show the figures that were informative to our story. The heatmaps make it possible to see which combinations of markers respond or not to the different conditions and between groups; all the raw data are presented in supplemental figures 10 to 13, which show, using bar plots, the differences expressed in the heatmaps. We believe this strengthens our interpretation of the results.

      Regarding the transcription factor and cytokine background, we added supplemental figures 14, 15 and 16 where we used manual gating to select Bcl6, cMAF, IFNg, IL21 or IL4 positive cells directly from total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH cells based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Moreover, all the dot plots (with their statistics) used for the heatmaps in figures 6 and 7 can be found in the supplemental figures 10, 11, 12 and 13. These supplemental figures address the concerns above by showing the difference in signal between unstimulated and stimulated conditions.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study was to carry out an in-depth granular and unbiased phenotyping of peripheral blood circulating Tfh specific to two malaria vaccine candidates, PfSEA-1A and PfGARP, and correlate these with age (children vs adults) and protection from malaria (antibody titers against Plasmodium antigens.). The authors further attempted to identify any specific differences in the Tfh responses to these two distinct malaria antigens.

      Strengths:

      The authors had access to peripheral blood samples from children and adults living in a malaria-endemic region of Kenya. The authors studied these samples using in vitro restimulation in the presence of specific malaria antigens. The authors generated a very rich data set from these valuable samples using cutting-edge spectral flow cytometry and a 21-plex panel that included a variety of surface markers, cytokines, and transcription factors.

      Weaknesses:

      - Quantifying antigen-specific T cells by flow cytometry requires the use of either 1- tetramers or 2- in vitro restimulation with specific antigens followed by identification of TCR-activated cells based on de-novo expression of activation markers (e.g. intracellular cytokine staining and/or surface marker staining). Although authors use an in vitro restimulation strategy, they do not focus their study on cells de-novo expressing activation markers as a result of restimulation; therefore, their study is not really on antigen-specific cTfh. Moreover, the authors report no changes in the expression of activation markers commonly used to identify antigen-specific T cells upon in vitro restimulation (including IFNg and CD40L); therefore, it is not clear if their in vitro restimulation with malaria antigens actually worked.

      We understand the reviewer’s point of view and apologize for any confusion. IFNg was expressed but not statistically different between groups. Indeed, looking at the CD8 T cells and using manual gating, we were able to show that, upon stimulation, IFNg from CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells was increased but not to a statistically significant degree (supplemental figure 15, panel C), confirming our primary observation using clustering analysis. These results showed that our malaria antigens induced an IFNg response in some participants, but not all of them, revealing heterogeneity in this response among individuals within the same group.

      Regarding CD40L, in supplemental figure 7, we can see that some of our meta-clusters expressed more CD40L upon stimulation, but again without leading to statistical differences between groups. Combined with the increased expression of other cytokines and transcription factors, this shows that our stimulation did indeed work. However, because of the high variation within groups, there were no statistical differences across our groups. Because CD40L is not the only marker showing specific T cell activation, and not all T cells respond using this marker alone, a more comprehensive multimarker AIM panel might have highlighted differences between groups. We recognize the limitations of our study and believe that future studies will benefit from more activation markers commonly used to identify antigen-specific T cells such as CD69, OX40, 4-1BB (AIM panel), among other markers.

      - CXCR5+CD4+ memory T cells have been shown to present multi-potency and plasticity, capable of differentiating to non-Tfh subsets upon re-challenge. Although authors included in their flow panel a good number of markers commonly used in combination to identify Tfh (CXCR5, PD-1, ICOS, Bcl-6, IL-21), they only used one single marker (CXCR5) as their basis to define Tfh, thus providing a weak definition for Tfh cells and follow up downstream analysis.

      Sorry for the confusion. Even though we subsampled on the CD4<sup>pos</sup>CXCR5<sup>pos</sup> CD25<sup>neg</sup> cells to run our FlowSOM, we showed the different levels of expression across meta-clusters (figure 4 panels A and B) of PD1 (Tfh being PD1 positive cells) and ICOS (indicating the activation stage of the Tfh, “T Follicular Helper Cells” Methods and Protocols book from Springer 2015). We also included an overlay of the manually gated double positive Bcl6-cMAF cells on the CXCR5<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> CD4 T cell UMAP plot to show that most of them express Bcl6 (supplemental figure 14). Interestingly, the manually gated IL21 positive cells were less abundant, particularly for children (supplemental figure 15). Because we were not able to include all the markers that are now used to define Tfh cells, we referred to our cell subsets as “TFH-like”. This is an acknowledged limitation of our study. Due to the limited blood volume obtained from children and the cost of running multiplex flow cytometry assays, our results showing antigen-specific heterogeneity of Tfh subsets will have to be validated in future studies that include these additional defining markers.

      - Previous works have used FACS-sorting and in vitro assays for cytokine production and B cell help to study the functional capacity of different cTfh subsets in blood from Plasmodium-infected individuals. In this study, authors do not carry out any such assays to isolate and evaluate the functional capacity of the different Tfh subsets identified. Thus, all the suggestions for the role that these different cTfh subsets may have in vivo in the context of malaria remain highly hypothetical.

      Unfortunately, low blood volumes obtained from children prevented us from running in vitro functional assays and the study design did not allow us to correlate them with protection. However, since the function of identified Tfh subsets from malaria-exposed individuals has been evaluated using Pf lysates in other studies, we referenced them when interpreting the differences we reported in Tfh subset recognition between malaria antigens. If either of these antigens move forward into vaccine trials, then evaluating their function would be important.

      - The authors have not included malaria unexposed control groups in their study, and experimental groups are relatively small (n=13).

      This study design did not include the recruitment of malaria-naive negative controls, as its goal was to assess malaria antigen-specific responses to the potential new vaccine targets PfSEA-1A and PfGARP, comparing their quality and abundance between malaria-exposed children and adults. We did however test 3 malaria-naive adults and found no non-specific activation after stimulation with these two malaria antigens. Since this was done as part of our assay optimization, we did not feel the need to show these negative findings.

      And even with our small sample size, we demonstrated significant age-associated differences in malaria antigen-specific responses from cT<sub>FH</sub>-like subsets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor points are:

      (1) Line 88, cTfh cells are not only from GC-Tfh, they have GC-independent origin (He et al, PMID: 24138884).

      The following sentence was added at line 88: “Interestingly, cT<sub>FH</sub> cells can also come from peripheral cT<sub>FH</sub> precursor CCR7<sup>low</sup>PD1<sup>high</sup>CXCR5<sup>pos</sup> cells; thus, they also have a GC-independent origin (He, Cell, 2013 PMID: 24138884).”

      (2) I believe all participants were free of blood-stage infection upon enrolment. But can authors clearly state this information between lines 151-159?

      We mentioned in the methods, lines 495-496: “Participants were eligible if they were healthy and not experiencing any symptoms of malaria at the time venous blood was collected”. However, using qPCR we found 5 children with blood-stage malaria. As shown in Author response image 2, comparing malaria-free to blood-stage children, no differences were observed without any stimulation. However, MC03 is more abundant upon malaria antigen stimulation in the blood-stage group, whereas MC04 is more abundant in the malaria-free group upon PfGARP stimulation only, confirming that our stimulation worked.

      Author response image 2.

      Reviewer #3 (Recommendations For The Authors):

      (1) The strategy for gating on antigen-specific cTfh cells needs to be revised. The correct approach would be to gate on those cells that respond by de-novo expression of activation markers upon antigen restimulation (also termed activation-induced markers. e.g. CD69, CD40L, CXCL13 and IL-21, Niessl 2020; CD69, CD40L, CD137 and OX40, Lemieux 2023; CD137 and OX40, Grifoni 2020). As it stands, the study is not really on antigen-specific T cells, but rather on the overall CD4 T cell compartment plus or minus antigenic stimulation.

      We recognize the limitation in our flow panel design which prevents us from performing this gating. We originally based our panel design on the “T follicular helper cells methods and protocols” book (Springer 2015) which used CD45RA, CD25, CXCR5, CCR6, CXCR3, CCR7, ICOS and PD1 to define cT<sub>FH</sub>. We had already optimized our 21-color panel, purchased reagents and started to run our experiments by the time the publications by Niessl, Lemieux and Grifoni modified how TFH cells are defined. Indeed, we optimized and performed our assay from November 2019 to March 2020, finishing running the samples during the first quarantine. Because of the urgent need for research on SARS-CoV-2, in which we were involved from that time onward, the analysis of our TFH work was substantially postponed. Moreover, 2020 is also the year in which many TFH papers came out with better ways to define cT<sub>FH</sub> and responses to antigen stimulation. In our future studies, our panel will include AIM.

      (2) It is not clear if the antigenic stimulation actually worked. Does the proportion of IFNg+ or IL-4+ or IL-21+ or CD40L+ or CD25+ CD4 or CD8 T cells increase following in vitro antigen restimulation?

      Yes, using manual gating, we are able to show an increase in IL4 (supplemental figure 16 panel B and C) and IL21 (supplemental figure 15 panel J and K) production in both children and adults. However, we did not observe significant production of IFNg (supplemental figure 15, panel C) or changes in CD40L expression (supplemental figure 7) after malaria antigen stimulation, although our positive control SEB worked. So, yes, our stimulation assay worked, but these 2 malaria antigens did not significantly induce these cytokines. It could be that they are too low to detect in every participant since these are single antigens and not whole parasite lysates, as other studies have used. It could also be that these antigens don’t stimulate CD40L or IFNg in all our participants. We brought up this limitation as follows in the discussion, line 473: “Although the heterogeneity in the response of CD40L and IFNγ suggests that our tested malaria antigens did not induce significant differences in the expression of these markers in all our participants, our panel did not include other activated induced markers, such as OX40, 4-1BB, and CD69”.

      (3) It is not clear what is the proportion of cTfh over the total CD4 T cell compartment among the different groups. Does this vary among different groups? It would be valuable to display this as an old-fashioned combination of contour plots with outliers for illustrating flow cytometry and bar graphs for the cumulative data.

      The proportion of CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CXCR5<sup>pos</sup> cTfh cells did not differ within the total number of CD4 T cells between groups (figure 2).

      (4) The gating strategy could be refined and become more robust if adding additional markers in combination with CXCR5 for identifying cTfh (e.g. CXCR5+Bcl6+).

      Thank you for this suggestion. An overlay of Bcl6 expression can be found in supplemental figure 14 where we confirm that our CXCR5+ cT<sub>FH</sub>-like subsets express cMAF and Bcl6.

      (5) The protocols for intracellular and intranuclear staining seem to be incomplete in Materials and Methods. In particular, cell permeabilization strategies seem to be missing.

      Our apologies for this oversight, we added the following sentences in the methods line 545: “Cells were fixed and permeabilized for 45 mins using the transcription factor buffer set (BD Pharmingen) followed by a wash with the perm-wash buffer. Intracellular staining was performed at 4 °C for 45 more mins followed by two washes using the kit’s perm-wash buffer”.

      (6) In Materials and Methods, the authors mention they have used fluorescence minus one control to set their gating strategy. It would be valuable to show these, either on the main body or as part of supplementary figures.

      We added the cytoplots of the FMOs and/or negative controls as appropriate in the supplemental figures 14 (cMAF and Bcl6), 15 (IFNg and IL21) and 16 (IL4 and IL21).

      (7) Line 194 and Figure 3, it is not clear the criteria that the authors used for down-sampling events before FlowSOM analysis. Was this random? Was this done with unstimulated or stimulated samples?

      We chose to down-sample on CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> and CXCR5<sup>pos</sup> cells prior to our FlowSOM so that the cluster analysis would focus only on the differences among those cells. The down-sampling used 1,000 CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup> CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells from each fcs file (unstimulated and stimulated samples). If the fcs file had more than 1,000 CXCR5<sup>pos</sup> cells, the down-sampling was done randomly by the OMIQ platform algorithm to select only 1,000 CXCR5<sup>pos</sup> cells within this specific fcs file. The last sentence was added to the methods, line 593.

      (8) Lines 201, 202, As it stands, the take of the authors on the role of different cTfh subsets during infection remains highly speculative. Are these differences in cTfh phenotypes actually reflected in their in vitro capacity to provide B cell help (e.g. as in the Obeng-Adjei 2015 paper) or to produce IL-21, express co-stimulatory molecules, or any other characteristic that would allow them to better infer their functional roles during infection? Any additional in vitro analysis of the functional capacity of isolated cTfh subsets identified in this research would greatly increase its value.
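
      The down-sampling rule described above can be sketched as follows. This is a hedged illustration of equal random subsampling per file, not the OMIQ implementation; the function name, the `seed` parameter, and the behaviour for files with fewer than 1,000 gated events (keep all of them) are assumptions.

```python
import random

def downsample(event_ids, n=1000, seed=None):
    """Randomly keep at most `n` gated events from one fcs file.

    Files with `n` or fewer gated events are returned unchanged (assumption).
    """
    event_ids = list(event_ids)
    if len(event_ids) <= n:
        return event_ids
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(event_ids, n)  # sampling without replacement

# Hypothetical file with 5,000 gated CXCR5-positive events.
kept = downsample(range(5000), n=1000, seed=0)
```

      Drawing the same number of events from every file means that each unstimulated and stimulated sample contributes equally to the concatenated data used for clustering.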

      We agree with the reviewer that this sentence is speculative, and we rephrased it as follows: “First, we found different CXCR5 expression levels between meta-clusters (Figure 3b); CXCR5 is essential for cT<sub>FH</sub> cells to migrate to the lymph nodes and interact with B-cells”. We would have liked to perform in vitro functional assays. However, as explained above, we did not have sufficient cells collected from children to do so.

      (9) It is not clear why authors omitted IL-17 and did not use IFNg and IL-4 to refine their definition of Th1, Th2 and Th17 cTfh.

      We would have liked to include IL-17; however, we were constrained by only having access to a 4-laser cytometer at the time we ran our assay. We therefore needed to prioritize markers, and when we were designing our flow panel, cTfh1 had been shown to be preferentially activated during episodes of acute febrile malaria in children (Obeng-Adjei). Therefore, we chose to focus on IFNg and IL4 to differentiate Tfh1 from Tfh2, in addition to other markers as surrogates of functional potential. We did not use IFNg and IL4 to refine our definition of Tfh1, Tfh2 and Tfh17, as recent publications have shown that IL4 is not only expressed in Tfh2 but also in the other Tfh subsets, at lower intensity (Gowthaman among others). Therefore IFNg and IL4 by themselves were not sufficient to properly define the different Tfh subsets. In future studies, we plan to include transcription factor profiles (T-bet, BATF, GATA3) to further refine definitions of Tfh subsets.

      (10) Lines, 226, 228, based on the combination of markers that the MC03 subset expresses, it is tempting to think that this is the only "truly" committed Tfh subset from the entire analysis. Please, discuss.

      If the reviewer is referring to changes in marker expression levels that indicate the other subsets have not reached a level of differentiation that would make them reliable (i.e., “true”) Tfh cells, we agree that this is an important question now that we have technology that can measure and analyse so many phenotypic markers at once. This brings forward the need for the scientific method - to replicate study findings to determine whether they are consistent given the same study design and experimental conditions.

      (11) Lines 243 244, Again, is this reflected in functional capacity?

      The study described in this manuscript did not include functional assays. However, this did not change the key finding that different malaria antigens behaved differently, demonstrating heterogeneity in Tfh recognition of malaria antigens. Regarding CD40L expression, we did not observe differences between groups; however, some individuals had an increase in their CD40L (supplemental figure 7). It is possible that some individuals responded through other activation-induced markers (CD69, ICOS, OX40, 4-1BB among others) and that our stimulation was not long enough to assess CD40L expression upon malaria antigen stimulation. This limitation has been addressed by editing lines 243-244 as follows: “we were unable to find statistical differences in the CD40L expression between groups as only few individuals responded through it (supplemental figure 7).”

      (12) Lines 243, 244, Are these cTfh subsets exclusively detected in malaria-exposed individuals? This is confounded by the lack of a malaria unexposed control group in this study, which would have been highly valuable.

      We agree with the reviewer that having malaria-naive children would have been valuable as a negative control group. However, this study was conducted in Kenya where all children are suspected to have had at least one malaria infection. We also did not have ethical approval or the means to enroll children in the USA who would not have been exposed to malaria as a negative control group. Since we were also evaluating differences by age group, comparing US adults would not have helped to address this point. Therefore, this remains an open question that might be addressed by another study recruiting children in non-malaria endemic areas.

      (13) Line 267, as the authors have not gated on T cells de-novo expressing activation markers in response to antigen restimulation, how do they know these are indeed antigen-specific cTfh?

      Omiq analysis accounts for marker expression levels in the resting cells (unstimulated well) for each individual compared to each experimental/stimulated well. The algorithm computationally determines whether that expression level changed without an arbitrary positive threshold, keeping the expression levels as a continuous variable, not dichotomous - which is the power of unbiased cluster analyses. Therefore, we know that these cells are antigen-specific based on the statistical difference in intensity expression between the resting cells and the stimulated ones. Nevertheless, manual gating to show “de-novo” responding cells, produced the same results as assessing the MFI of each meta-cluster (supplemental figures 14, 15 and 16).
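
      The idea of comparing each participant's stimulated well with that same participant's unstimulated well can be illustrated with a simple paired difference. This is only a sketch of background subtraction on a continuous scale, not the algorithm used by OMIQ; the function name, participant IDs, and MFI values are hypothetical.

```python
def net_mfi(stimulated, unstimulated):
    """Per-participant change in MFI: stimulated minus unstimulated.

    Both arguments map participant ID -> MFI of one marker in one meta cluster.
    Participants missing from either condition are skipped.
    """
    return {
        pid: stimulated[pid] - unstimulated[pid]
        for pid in stimulated
        if pid in unstimulated
    }

# Hypothetical MFI values (not study data).
unstim = {"P01": 400, "P02": 520, "P03": 610}
stim = {"P01": 550, "P02": 500}

delta = net_mfi(stim, unstim)
```

      Keeping the difference as a continuous value avoids imposing a positive/negative threshold, and the per-participant deltas can then be compared between groups with a paired or group-wise test.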

      (14) Lines, 292-295, it is very surprising that Tfh cells would not produce IL-21 upon restimulation. Have the authors observed upregulation of IL-21 following SEB restimulation?

      Yes, we observed IL21 positive cells upon SEB stimulation (supplemental figure 15, panel J and K). However we found unexpectedly high background levels of IL21, specifically within the adult group (supplemental figure 15, panel K and M) making it challenging to find antigen-specific increases above background. Interestingly, an increase in IL21 using manual gating was observed upon PfSEA-1A or PfGARP stimulation in children (supplemental figure 15, panel J and L).

      (15) In Figures 3 and 4, it is not clear if there are any significant differences in expression of different markers between different cTfh subsets and/or different conditions. Moreover, the lack of differences in response to antigen stimulation seems to suggest that it did not work adequately.

      We intentionally chose a 6-hour stimulation to better assess changes in cytokines, which we did. However, because it is a short stimulation, we did not expect dramatic changes in the extracellular markers presented in figures 3 and 4. A longer stimulation, such as 24h, would properly highlight these changes.

      (16) Figure 5b would benefit from bar graphs.

      Please find below the bar graphs for the highlighted meta-clusters in figure 5b. We did not include these bar graphs in figure 5 as they do not bring new information. They repeat the information already presented through the EdgeR plot.

      Author response image 3.

      (17) Figures 6 and 7 would greatly benefit from showing individual examples of old-fashioned contour with outliers flow plots to illustrate the different cTfh subsets identified in the study.

      The different cT<sub>FH</sub> subsets can be found with a contour plot with outliers in the supplemental figure 4.

      (18) Figures 3,4, 6, and 7, the authors exclusively focused on the study of MFI to measure the expression of cytokine and transcription factors among different groups/stimulations. Have the authors observed any differences in the percentage or absolute counts of cytokine+ and/or TF+ between different subsets of cTfh and/or different conditions?

      Yes. We added the supplemental figures 14 (transcription factors) and 15/16 (cytokines) where cytokines and transcription factors were assessed using manual gating. We found that total CD4<sup>pos</sup>CXCR5<sup>pos</sup> IL4 was significantly increased upon stimulation in both adults and children while IFNg was not. However, we found significantly higher IFNg on total CD8<sup>pos</sup> cells showing that the stimulation worked, but the total CD4<sup>pos</sup>CXCR5<sup>pos</sup> did not express IFNg. Finally, we observed a trend of higher IL21<sup>pos</sup>CD4<sup>pos</sup>CXCR5<sup>pos</sup> in adults, not significant due to high background whereas IL21 was significantly increased upon stimulation in children. Regarding cMAF and Bcl6, both transcription factors were significantly increased upon stimulation within children only.

      (19) Figure 8, the definition for high and low PfGARP antibody titers seems rather arbitrary. Are these associations still significant when attempting a regular correlation analysis between Ab values (i.e. Net MFI) and different cTfh subsets?

      Yes, the definition for high and low PfGARP antibody levels is arbitrary, but when looking at the antibody data (figure 1b), it was naturally bimodal. Therefore, as a sub-analysis, we assessed the association between PfGARP antibody levels and cT<sub>FH</sub> subsets, see Author response image 4. We checked the correlation between the abundance of the meta-clusters and the level of IgG anti-PfGARP and anti-PfSEA after PfGARP and PfSEA stimulation. We also checked the correlation between the MFI expression of Bcl6 and cMAF after stimulation (PfGARP or PfSEA-1A minus the unstimulated) by the meta-clusters and the level of IgG anti-PfGARP and anti-PfSEA. However, we believe that because of our small sample size, our results are not robust enough and that we risk over-interpreting the data. Therefore, we chose not to include this analysis in the manuscript.

      Author response image 4.

      (20) The comprehensive 21-plex panel that authors used in this study could generate insights on additional immune cells beyond cTfh (e.g. additional CD4 T cell subsets, CD8 T cells, CD19 B cells). It is not clear why the authors limited their analysis to cTfh only.

      The primary goal of the study was to assess the cT<sub>FH</sub> response to malaria vaccine candidates. However, we were able to assess the IFNg expression for CD8 T cells upon stimulation using the manual gating as indicated in the supplemental figure 15. Without additional markers to more clearly define other CD4 T cell or B cell subsets, we do not believe this dataset would go deep enough into characterizing antigen-specific responses to malaria antigens that would yield new insight.

      (21) Minor point, the punctuation should be revised throughout the manuscript.

      Punctuation was revised throughout the manuscript by our departmental scientific writer Dr. Trombly, as per reviewer request.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using a combination of microscopy, fluorescent fusion constructs, and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functionalities of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have added a better clarification on this in the discussion of the revised manuscript. The lipid classes that are impacted by the depletion of Wag31 differ from those impacted by its overexpression. Wag31 is an adaptor protein that interacts with proteins of the ACCase complex (Meniche et al., 2014; Xu et al., 2014), which synthesize fatty acid precursors, and regulates their activity (Habibi Arejan et al., 2022).

      The varied response on lipid homeostasis could be attributed to a change in the stoichiometry of these interactions of Wag31. While Wag31 depletion would prevent such interactions from occurring and might affect lipid synthesis that directly depends on Wag31-protein partner interactions, its overexpression would lead to promiscuous interactions and a change in the stoichiometry of native interactions that would ultimately modulate lipid synthesis pathways.

      (2) The pulldown assays results are interesting, but links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-Wag31 complemented at an integrative locus in Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates but not in the control were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.
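
      The selection logic described above (present in all replicates, absent from the control, then filtered by PSM and unique-peptide cut-offs) can be sketched with set operations. This is an illustrative sketch, not the authors' analysis pipeline: the function name is hypothetical, the toy protein lists and counts are invented apart from the names AccD5 and Rne, and treating the cut-offs as "greater than or equal to" is an assumption.

```python
def select_interactors(replicates, control, psm, unique_peptides,
                       min_psm=18, min_unique=5):
    """Return proteins found in every replicate, absent from the control,
    and passing both cut-offs (applied here as >=, which is an assumption)."""
    shared = set.intersection(*(set(r) for r in replicates))
    specific = shared - set(control)
    return sorted(
        p for p in specific
        if psm.get(p, 0) >= min_psm and unique_peptides.get(p, 0) >= min_unique
    )

# Toy input (hypothetical values; ProtX, ProtY and ProtZ are placeholder names).
replicates = [
    ["AccD5", "Rne", "ProtX", "ProtY"],
    ["AccD5", "Rne", "ProtX", "ProtY"],
    ["AccD5", "Rne", "ProtY", "ProtZ"],
]
control = ["ProtY"]                        # also pulled down with FLAG-GFP
psm = {"AccD5": 40, "Rne": 25, "ProtX": 30}
unique_peptides = {"AccD5": 12, "Rne": 4, "ProtX": 9}

hits = select_interactors(replicates, control, psm, unique_peptides)
```

      In this toy input, ProtX is dropped because it is missing from one replicate, ProtY because it appears in the control, and Rne because its invented unique-peptide count falls below the cut-off.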

      As mentioned in line 139 of the previous version of the manuscript, we agree that the interactions can either be direct or through a third partner. The fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pull-down experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were quite stringent: the wash buffer contained 1% Triton X-100, which is expected to eliminate non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. As mentioned above, this caveat was stated in the previous version of the manuscript.

      (3) The authors may perhaps like to rephrase claims of effects lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids in intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Public review):

      Summary:

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the role of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence.

      (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction was convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions. However, the pulldown experiments lack a valid negative control.

      We thank the reviewer for the comment. As negative controls, we have included two non-interactors of Wag31, MmpL4 and MmpS5, which were not identified in our interactome database. As shown in Figure S3, we performed His pull-down experiments with both of them independently twice, each time with a positive control (Msm2092, a known interactor of Wag31). Revised Fig. S3b shows the E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing FLAG-tagged MmpL4, MmpS5 or Msm2092 (revised Fig. S3c). The mixed lysates were pulled down with Cobalt beads, which bind to the His-tagged protein, and analysed by Western blot with an anti-FLAG antibody (revised Fig. S3d). The data confirm that the interactions validated through the pull-down assay were indeed specific.

      (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

      We thank the reviewer for the comment. Wag31<sub>Msm</sub> is a 272-amino-acid protein. The N-terminal of Wag31, which houses the DivIVA-domain, comprises the first 60 amino acids. Previously, we attempted to express the N-terminal (60 aa long) and the C-terminal (212 aa long) truncated proteins in various mycobacterial shuttle vectors to perform MS/MS experiments. Despite numerous efforts, neither could be expressed with an N/C-terminal FLAG tag or with no tag, in episomal or integrative vectors, owing to instability of the protein. Eventually, we successfully expressed the C-terminal Wag31 with N- and C-terminal hexa-His tags. However, this expression was not sufficient or stable enough for us to perform Ni<sup>2+</sup>-affinity pull-down experiments for mass spectrometry. The N-terminal of Wag31 could not be expressed in M. smegmatis even with N- and C-terminal hexa-His tags.

      To rule out the role of the N-terminal in mediating protein-protein interactions, we cloned the N-terminal of Wag31, which comprises the DivIVA-domain, into the pET28b vector (Fig. 7a revised). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> (Wag31<sub>∆N</sub> in the revised manuscript) were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7e-g). Thus, we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing either FLAG-MurG, -SepIVA, -Msm2092 or -AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in Fig. 7d revised, His-Wag31 could bind to all four interactors whereas His-Wag31<sub>∆C</sub> could not, strengthening the conclusion that interactions of Wag31 with other proteins are mediated by its C-terminal. However, we cannot ignore the possibility of other interactors binding to the N-terminal of Wag31. Unfortunately, due to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we are unable to perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • Authors use 10-N-Nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind to various anionic phospholipids, how do the authors know that what they are seeing is specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in mycobacterial plasma membrane.

      We thank the reviewer for the comment. Despite its promiscuous binding to other anionic phospholipids, 10-N-Nonyl-acridine orange is widely used to stain Cardiolipin and determine its localisation in bacterial cells and in the mitochondria of eukaryotes (Garcia Fernandez et al., 2004; Mileykovskaya & Dowhan, 2000; Renner & Weibel, 2011). This is because it has a stronger affinity for Cardiolipin than for other anionic phospholipids: the affinity constant is 2 × 10<sup>6</sup> M<sup>−1</sup> for association with Cardiolipin and 7 × 10<sup>4</sup> M<sup>−1</sup> for association with phosphatidylserine and phosphatidylinositol (Petit et al., 1992). Additionally, no other stain is yet available for detecting Cardiolipin. Our protein-lipid binding assays suggest that Wag31 preferentially binds to Cardiolipin over other anionic phospholipids (Fig. 4b). Hence, it is likely that most of the redistribution of NAO fluorescence that we observe reflects Cardiolipin mislocalization due to altered Wag31 levels, with a smaller share coming indirectly from other anionic phospholipids displaced from the membrane by the loss of membrane integrity and the cell shape changes associated with altered Wag31 levels.
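
      As a rough worked check of the selectivity implied by these constants, using only the two values quoted above from Petit et al. (1992), the ratio can be written out as below. The symbols K_CL and K_PS/PI are labels introduced here for illustration and do not appear in the manuscript.

```latex
\[
\frac{K_{\mathrm{CL}}}{K_{\mathrm{PS/PI}}}
  = \frac{2 \times 10^{6}\ \mathrm{M^{-1}}}{7 \times 10^{4}\ \mathrm{M^{-1}}}
  \approx 28.6
\]
```

      On these figures, NAO associates with Cardiolipin roughly 29-fold more strongly than with phosphatidylserine or phosphatidylinositol, which is a preference rather than strict specificity.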

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements or provide additional data to support their claims.

      We agree with the reviewer that the N-terminal may have another function that contributes to sustaining mycobacterial physiology and survival. We will revise our statements in the paper to reflect the data. The results shown suggest that the tethering activity of the N-terminal region may contribute to mycobacterial morphology and survival. However, additional functions of this region cannot be ruled out. Similarly, the maintenance of lipid homeostasis by Wag31 may be associated with its tethering activity, although other mechanisms could also contribute to this process.

      • Authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et al. has shown that MurG as well as GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar while Wag31 is located to the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of protein-protein interaction shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 and Msm lysate expressing Wag31 interactors like MurG, it is possible that the interactions are not direct. Authors should interpret their data more cautiously. If authors cannot provide additional data and sufficient justifications, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

      In the literature, MurG and GlfT2 have been shown to have polar localisation (Freeman et al., 2023; Hayashi et al., 2016; Kado et al., 2023), and two groups have shown slightly sub-polar localisation of MurG (García-Heredia et al., 2021; Meniche et al., 2014). Additionally, Freeman et al. (2023) showed SepIVA to be a spatio-temporal regulator of MurG. MS/MS analysis of Wag31 immunoprecipitation data identified both MurG and SepIVA as interactors of Wag31 (Fig. 3). Given that Wag31 also displays polar localisation, it is likely that it associates with the polar MurG. However, since a sub-polar localisation of MurG has also been reported, it is possible that they do not interact directly and that another protein mediates their interaction. Based on the above, we will modify the model proposed in Fig. 8.

      To validate the interactions, we performed pull-down experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were quite stringent: the wash buffer contained 1% Triton X-100, which is expected to eliminate non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript and propose a model that reflects the results we obtained.

      References:

      Freeman, A. H., Tembiwa, K., Brenner, J. R., Chase, M. R., Fortune, S. M., Morita, Y. S., & Boutte, C. C. (2023). Arginine methylation sites on SepIVA help balance elongation and septation in Mycobacterium smegmatis. Mol Microbiol, 119(2), 208-223. https://doi.org/10.1111/mmi.15006

      Garcia Fernandez, M. I., Ceccarelli, D., & Muscatello, U. (2004). Use of the fluorescent dye 10-N-nonyl acridine orange in quantitative and location assays of cardiolipin: a study on different experimental models. Anal Biochem, 328(2), 174-180. https://doi.org/10.1016/j.ab.2004.01.020

      García-Heredia, A., Kado, T., Sein, C. E., Puffal, J., Osman, S. H., Judd, J., Gray, T. A., Morita, Y. S., & Siegrist, M. S. (2021). Membrane-partitioned cell wall synthesis in mycobacteria. eLife, 10. https://doi.org/10.7554/eLife.60263

      Habibi Arejan, N., Ensinck, D., Diacovich, L., Patel, P. B., Quintanilla, S. Y., Emami Saleh, A., Gramajo, H., & Boutte, C. C. (2022). Polar protein Wag31 both activates and inhibits cell wall metabolism at the poles and septum. Front Microbiol, 13, 1085918. https://doi.org/10.3389/fmicb.2022.1085918

      Hayashi, J. M., Luo, C. Y., Mayfield, J. A., Hsu, T., Fukuda, T., Walfield, A. L., Giffen, S. R., Leszyk, J. D., Baer, C. E., Bennion, O. T., Madduri, A., Shaffer, S. A., Aldridge, B. B., Sassetti, C. M., Sandler, S. J., Kinoshita, T., Moody, D. B., & Morita, Y. S. (2016). Spatially distinct and metabolically active membrane domain in mycobacteria. Proc Natl Acad Sci U S A, 113(19), 5400-5405. https://doi.org/10.1073/pnas.1525165113

      Kado, T., Akbary, Z., Motooka, D., Sparks, I. L., Melzer, E. S., Nakamura, S., Rojas, E. R., Morita, Y. S., & Siegrist, M. S. (2023). A cell wall synthase accelerates plasma membrane partitioning in mycobacteria. eLife, 12, e81924. https://doi.org/10.7554/eLife.81924

      Meniche, X., Otten, R., Siegrist, M. S., Baer, C. E., Murphy, K. C., Bertozzi, C. R., & Sassetti, C. M. (2014). Subpolar addition of new cell wall is directed by DivIVA in mycobacteria. Proc Natl Acad Sci U S A, 111(31), E3243-E3251. https://doi.org/10.1073/pnas.1402158111

      Mileykovskaya, E., & Dowhan, W. (2000). Visualization of phospholipid domains in Escherichia coli by using the cardiolipin-specific fluorescent dye 10-N-nonyl acridine orange. J Bacteriol, 182(4), 1172-1175. https://doi.org/10.1128/JB.182.4.1172-1175.2000

      Petit, J. M., Maftah, A., Ratinaud, M. H., & Julien, R. (1992). 10N-nonyl acridine orange interacts with cardiolipin and allows the quantification of this phospholipid in isolated mitochondria. Eur J Biochem, 209(1), 267-273. https://doi.org/10.1111/j.1432-1033.1992.tb17285.x

      Renner, L. D., & Weibel, D. B. (2011). Cardiolipin microdomains localize to negatively curved regions of Escherichia coli membranes. Proc Natl Acad Sci U S A, 108(15), 6264-6269. https://doi.org/10.1073/pnas.1015757108

      Schägger, H. (2006). Tricine-SDS-PAGE. Nat Protoc, 1(1), 16-22. https://doi.org/10.1038/nprot.2006.4

      Xu, W. X., Zhang, L., Mai, J. T., Peng, R. C., Yang, E. Z., Peng, C., & Wang, H. H. (2014). The Wag31 protein interacts with AccA3 and coordinates cell wall lipid permeability and lipophilic drug resistance in Mycobacterium smegmatis. Biochem Biophys Res Commun, 448(3), 255-260. https://doi.org/10.1016/j.bbrc.2014.04.116

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have included a clarification for this in the discussion section.

      (2) The pulldown assays results are interesting, but the links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-tagged Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, applying cut-offs of 18 for PSMs and 5 for unique peptides. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.

      Though we agree that the interactions can either be direct or through a third partner, the fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pull-down experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were quite stringent: the wash buffer contained 1% Triton X-100, which is expected to eliminate non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript.

      (3) The authors may perhaps like to rephrase claims of effects lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids in intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Recommendations for the authors):

      I recommend the following experiments to strengthen the data presented:

      (1) Include a non-interacting FLAG-tagged protein as a negative control in the pull-down experiment to strengthen this data.

      We thank the reviewer for the comment. As suggested, we have included non-interacting FLAG-tagged proteins as negative controls in the pull-down experiment. We chose MmpL4 and MmpS5, which were not found in the Wag31 interactome data. We performed pull-down experiments with both of them and included an interactor of Wag31, Msm2092, as a positive control. Fig. S3b revised shows the E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing FLAG-tagged MmpL4, MmpS5 or Msm2092 (Fig. S3c revised). The mixed lysates were pulled down with Cobalt beads, which bind to the His-tagged protein, and analysed by Western blot with an anti-FLAG antibody. The pull-down experiments were performed independently twice, each time with Msm2092 as the positive control (Fig. S3d revised).

      (2) Perform the pull-down experiments using only the Wag31 N-terminus to rule out any role that it may have in the protein-protein interactions.

      We thank the reviewer for the comment. To rule out the possibility that the N-terminal of Wag31 mediates protein-protein interactions, we cloned the N-terminal of Wag31, which comprises the DivIVA-domain, into the pET28b vector (Fig. 7a revised). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> or Wag31<sub>∆N</sub> were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7 previous), so we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing either FLAG-MurG, -SepIVA, -Msm2092 or -AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in Fig. 7d revised, His-Wag31 could bind to all four interactors whereas His-Wag31<sub>∆C</sub> could not, strengthening the conclusion that interactions of Wag31 with other proteins are mediated by its C-terminal. However, we cannot ignore the possibility of other proteins binding to the N-terminal of Wag31. Unfortunately, due to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we could not perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Minor comments:

      - Please check the legend of Fig. 1g, it appears to be labelled incorrectly.

      We have checked the legend and it is correct. Fig. 1g shows the percentages of cells of the three strains, i.e. Msm+ATc, Δwag31-ATc and Δwag31+ATc, that display rod, round or bulged morphology.

      - For MS/MS analysis, a GFP control is mentioned but it is not indicated how this was incorporated in the data analysis. This information should be added.

      We have incorporated that in the revised methodology.

      - The information presented in Fig. 3a, e and f could be combined in one table.

      We appreciate the idea of the reviewer but we prefer a pictorial representation of the data. It allows readers to consume the information in parts, make quicker comparisons and understand trends easily.

      - Fig. 4c Wag31K20A appears smaller in size than the wild-type protein - why is this the case? Is this not a single amino acid substitution?

      Though K20A is a single amino acid substitution, it alters the mobility of Wag31 on SDS-PAGE gel. The sequence analysis of the plasmid expressing Wag31<sub>K20A</sub> doesn’t show additional mutations other than the desired K20A. The change in mobility could be due to a change in the conformation of Wag31<sub>K20A</sub> or its ability to bind to SDS or both that modify its mobility under the influence of electric field.

      - Please clarify what is contained in the first panel of fig 4e. compared to what is in the second panel.

      The first panel represents CL-Dil-Liposomes before incubation with Wag31-GFP and the second panel shows CL-Dil-Liposomes after incubation with Wag31-GFP. The third panel shows the mixture as observed in the green channel to investigate the localisation of Wag31-GFP in the liposome-protein mix. The fourth panel shows the merge of the second and third panels.

      - The data in Fig 6d suggests higher levels of CL in the ∆wag31 compared to wild-type - how do the authors reconcile this with the MS data in Fig. 2g showing lower CL levels?

      Fig. 6d represents the distribution of CL localisation in the tested strains of mycobacteria, whereas Fig. 2g shows the absolute levels of CL in the various strains. We place greater confidence in the lipidomics data, which suggest downregulation of CL species. NAO staining and microscopy serve only to study the localization of CL along the cell and cannot be used to reliably quantify CL levels. Staining with a probe such as NAO depends on factors such as the hydrophobicity and permeability of the cell wall, which we expect to be severely altered in a Wag31 mutant. Therefore, the increased NAO staining seen in the Wag31 mutant could simply reflect increased uptake of the dye rather than absolute levels of CL. The specificity of staining and localization, however, can be expected to be unaltered.

      Reviewer #3 (Recommendations for the authors):

      Following are suggestions for improving the writing and presentation.

      • Figure 1, the meaning of the yellow arrows present in f and h should be mentioned in the figure legend.

      We have incorporated that in the revised legend. In Fig.1f, the yellow arrowhead represents the bulged pole morphology whereas in Fig. 1h, it indicates intracellular lipid inclusions.

      • Figure 7 legend refers to panels g, h, and i. However, Figure 7 only has panels a-c. The legend lacks a description of panel c.

      We have corrected the typos and the legend.

      • Figure S1, F2-R2 and F3-R3 expected sizes should be stated in the legend of the figure.

      We have updated the legends.

      • Figure S5, is this the same figure as 5e? If so, there is no need for this figure.

      We have removed Fig. S5.

      • Methods need to be written more carefully with enough details. I listed some of the concerns below.

      Detailed methodology was previously provided in the supplementary material and now we have moved it to the materials and methods in the revised manuscript.

      • Line 392, provide more details on western blotting. What is the secondary antibody? What image documentation system was used?

      We have updated the methodology.

      • Line 400, while the methods may be the same as the reference 64, authors should still provide key details such as the way samples were fixed and processed for SEM and TEM.

      We have provided a detailed description of the same in methodology in the revised version.

      • Line 437, how do authors calculate the concentration of liposome to be 10 µM? Do they possibly mean the concentration of phospholipids used to make the liposomes?

      Yes, this is the concentration of total lipids used to make liposomes. 1 μM of Wag31 or its mutants was mixed with 100 nm extruded liposomes containing 10 μM total lipid in separate Eppendorf tubes.

      • Supplemental Line 9, "turns of" should read "turns off".

      We have edited this.

      • Supplemental Line 13, define LHS and RHS.

      LHS or left hand sequence and RHS or right hand sequence refers to the upstream and downstream flanking regions of the gene of interest.

      • Supplemental Line 20, indicate the manufacturer of the microscope and type of the objective lens.

      We have added these details now.

      • Supplemental Line 31, define MeOH, or use a chemical formula like chloroform.

      MeOH is methanol. We have provided a chemical formula in the revised version.

      • Supplemental Line 53, indicate the concentration of trypsin.

      We have included that in the revised version.

      • Supplemental Line 72, g is not a unit. "30,000 g" should be "30,000x g".

      We have revised this in the manuscript.

      • Supplemental Line 114, provide more details on western blotting. What is the manufacturer of anti-FLAG antibody? What is the secondary antibody? How was the antibody binding visualized? What image documentation system was used?

      We have provided these details in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Weaknesses:

      (1) Only Experiment 1 of Rademaker et al (2019) is reanalyzed. The previous study included another experiment (Expt 2) using different types of distractors which did result in distractor-related costs to neural and behavioral measures of working memory. The Rademaker et al (2019) study uses these two results to conclude that neural WM representations are protected from distraction when distraction does not impact behavior, but conditions that do impact behavior also impact neural WM representations. Considering this previous result is critical for relating the present manuscript's results to the previous findings, it seems necessary to address Experiment 2's data in the present work.

      We thank the reviewer for the proposal to analyze Experiment 2 where subjects completed the same type of visual working memory task, but instead had either a flashing orientation distractor or a naturalistic (gazebo or face) distractor present during two-thirds of the trials. As the reviewer points out, unlike Experiment 1, these two conditions in Experiment 2 had a behavioral impact on recall accuracy, when compared to the blank delay. We have now run the temporal cross-decoding analysis, temporally-stable neural subspace analysis, and condition cross-decoding analysis in Experiment 2. The results from the stable subspace analysis are present in Figure 3, while the results from the temporal cross-decoding analysis and condition cross-decoding analysis are present in the Supplementary Data.

      First, we are unable to draw strong conclusions from the temporal cross-decoding analysis, as the decoding accuracies across time in Experiment 2 are much lower compared to Experiment 1. In some ROIs of the naturalistic distractor condition we see that some diagonal elements are not part of the above-chance decoding cluster, making it difficult to draw any conclusions regarding dynamic clusters. We do see some dynamic coding in the naturalistic condition in V3 where the off-diagonals do not show above-chance decoding. Since the temporal cross-decoding provides low accuracies, we do not examine the dynamics of neural subspaces across time.

      We do, however, run the stable subspace analysis on the flashing orientation distractor condition. Just like in Experiment 1, we examine temporally stable target and distractor subspaces. When projecting the distractor onto the working memory target subspace, we see a higher overlap between the two as compared to Experiment 1. A similar pattern is seen also when projecting the target onto the distractor subspace. We still see an above-chance principal angle between the target and distractor; however, this angle is qualitatively smaller compared to Experiment 1. This shows that the degree of separation between the two neural subspaces is impacted by behavioral performance during recall.
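
      For readers less familiar with the measure, the principal angle between two subspaces can be sketched as follows. This is a generic NumPy illustration, not our analysis code: the function name and the toy four-dimensional bases are assumptions, and in the actual analysis the two bases would be the first two PCs of the target and distractor subspaces, with chance level estimated separately.

```python
import numpy as np


def principal_angles_deg(A, B):
    """Principal angles (in degrees, ascending) between the column spaces of A and B."""
    # Orthonormalise each basis; columns of A and B are assumed linearly independent.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    cosines = np.clip(cosines, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    return np.degrees(np.arccos(cosines))


# Toy example: two planes in a 4-D space that share one axis and are tilted by 30 degrees.
theta = np.radians(30.0)
plane_a = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0],
                    [0.0, 0.0]])
plane_b = np.array([[1.0, 0.0],
                    [0.0, np.cos(theta)],
                    [0.0, np.sin(theta)],
                    [0.0, 0.0]])
angles = principal_angles_deg(plane_a, plane_b)
```

      Identical subspaces give angles of 0° and fully orthogonal subspaces give 90°, so a smaller angle in Experiment 2 corresponds to greater overlap between the target and distractor subspaces.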

      (2) Primary evidence for 'dynamic coding', especially in the early visual cortex, appears to be related to the transition between encoding/maintenance and maintenance/recall, but the delay period representations seem overall stable, consistent with previous findings.

      We agree with the reviewer that we primarily see dynamic coding between the encoding/maintenance and at the end of the maintenance periods, implying the WM representations are stable in most ROIs. The only place where we argue that we might see more dynamic coding during the delay itself is in V1 during the noise distractor trials in Experiment 1.

      (3) Dynamicism index used in Figure 1f quantifies the proportion of off-diagonal cells with significant differences in decoding performance from the diagonal cell. It's unclear why the proportion of time points is the best metric, rather than something like a change in decoding accuracy. This is addressed in the subsequent analysis considering coding subspaces, but the utility of the Figure 1f analysis remains weakly justified.

      We agree that other metrics can also provide a summary of dynamics; here, the dynamicism index just acts as a summary visualizing the dynamic elements. It offers an intuitive way to visualize peaks and troughs of the dynamic code across the extent of the trial.
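      As an illustration only, one plausible reading of such an index can be sketched as follows. The sketch assumes that a boolean matrix of "dynamic" cells (off-diagonal cells whose decoding is significantly lower than the corresponding diagonal cells) has already been obtained from the statistical test; the function name and the row-plus-column normalization are assumptions, not the authors' implementation.

```python
import numpy as np

def dynamicism_index(dynamic_mask):
    """Per-time-point proportion of flagged off-diagonal cells.

    dynamic_mask[i, j] is True when decoding for train time i and test
    time j was judged significantly lower than the diagonal (the test
    itself is assumed to have been run beforehand).
    """
    mask = np.asarray(dynamic_mask, dtype=bool)
    n = mask.shape[0]
    diag = np.diag(mask).astype(int)
    # Count flagged cells in row t and column t, excluding the diagonal cell.
    flagged = mask.sum(axis=1) + mask.sum(axis=0) - 2 * diag
    # Each time point has (n - 1) off-diagonal cells in its row and in its column.
    return flagged / (2 * (n - 1))
```

      Plotting the returned vector against time would give a trace of where in the trial the flagged cells concentrate.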

      (4) There is no report of how much total variance is explained by the two PCs defining the subspaces of interest in each condition, and timepoint. It could be the case that the first two principal components in one condition (e.g., sensory distractor) explain less variance than the first two principal components of another condition.

      We thank the reviewer for this comment. We have now included the percent variance explained for the two PCs in both the temporally-stable target and distractor subspace and the dynamic subspace analysis. The percent-explained is comparable across analyses; the first PC ranges from 43-50% and the second ranges from 28-37%. The PCs within each analysis (dynamic no-distractor, orientation and noise distractor; temporally-stable target and distractor) are even closer in range (Figure 2c and 3d).
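      For readers who want to check such numbers on their own data, a minimal sketch of how the percent variance explained by the first two PCs can be computed is given below. It assumes a samples-by-voxels matrix and plain PCA via singular value decomposition; the function name is hypothetical and this is not the authors' pipeline.

```python
import numpy as np

def explained_variance_ratio(data, n_components=2):
    """Fraction of total variance captured by each of the leading PCs.

    data has shape (n_samples, n_features).
    """
    centered = data - data.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:n_components] / variances.sum()
```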

      (5) Converting a continuous decoding metric (angular error) to "% decoding accuracy" serves to obfuscate the units of the actual results. Decoding precision (e.g., sd of decoding error histogram) would be more interpretable and better related to both the previous study and behavioral measures of WM performance.

      We thank the reviewer for the comments. FCA is a linear function of the angular error that uses the following equation:

      We think that the FCA does not obfuscate the results, but instead provides an intuitive scale where 0% accuracy corresponds to a 180° error, 50% to a 90° error and so on. This also makes it easy to reverse-calculate the absolute error if need be. Our lab has previously used this method in other neuroimaging papers with continuous variables (Barbieri et al. 2023, Weber et al. 2024).
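      The equation itself is not reproduced in this response. A linear mapping consistent with the anchor points stated above (0% at a 180° error and 50% at a 90° error, hence 100% at a 0° error) would be the following; this is an inference from those anchor points, not the authors' stated formula, and the symbol for the absolute angular error is an assumption:

```latex
\mathrm{FCA} = \left(1 - \frac{|\Delta\theta|}{180^{\circ}}\right) \times 100\%
```

      Under this form the absolute error is recovered as |Δθ| = 180° × (1 − FCA/100%).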

      We do, however, agree that “% decoding accuracy” does not provide an accurate reflection of the metric used. We have thus now changed “% decoding accuracy” to “Accuracy (% FCA)”.

      (6) This report does not make use of behavioral performance data in the Rademaker et al (2019) dataset.

      We have now analyzed Experiment 2 which, as previously mentioned by the reviewer and unlike Experiment 1, showed a decrease in recall accuracy during the two distractor conditions. We address the results from Experiment 2 in a previous response (please see Weakness 1).

      We do not, however, relate single subject behavioral performance to neural measurements, as we do not think there is enough power to do so with a small number of subjects in both Experiment 1 and 2. 

      (7) Given there were observed differences between individual retinotopic ROIs in the temporal cross-decoding analyses shown in Figure 1, the lack of data presented for the subspace analyses for the corresponding individual ROIs is a weakness

      We have now included an additional supplementary figure that shows individual plots of each ROI for the temporally stable subspace analysis for both Experiment 1 and Experiment 2 (Supplementary Figure 5). 

      Reviewer #1 (Recommendations For The Authors):

      (1) Is there any relationship between stable/dynamic coding properties and aspects of behavioral performance? This seems like a major missed opportunity to better understand the behavioral relevance or importance of the proposed dynamic and orthogonal coding schemes. For example, is it the case that participants who have more orthogonal coding subspaces between orientation distractor and remembered orientation show less of a behavioral consequence to distracting orientations? Less induced bias? I know these differences weren't significant at the group level in the original study, but maybe individual variability in the metrics of this study can explain differences in performance between participants in the reported dataset

      As mentioned in the previous response, we do not run individual correlations between dynamic or orthogonal coding metrics and behavioral performance, because of the small number of subjects in both experiments. We believe that for a brain-behavior correlation between average behavioral error of subjects and an average brain measure, we would need a larger sample size.  

      (2) The voxel selection procedure differs from the original study. The authors should add additional detail about the number of voxels included in their analyses, and how this number of voxels compares to that used in the original study.

      We have now added a figure summarizing the number of voxels selected across participants. We do select fewer voxels compared to Rademaker et al. 2019 (see their Supplementary Tables 9 and 10 and our Supplementary Figure 8). For example, we have ~500 voxels on average in V1 in Experiment 1, while the original study had ~1000. As mentioned in the methods, we aimed to select voxels that reliably responded to both the perception localizer conditions and the working memory trials.

      (3) Lines 428-436 specify details about how data is rescaled prior to decoding. The procedure seems to estimate rescaling factors according to some aspect of the training data, and then apply this rescaling to the training and testing data. Is there a possibility of leakage here? That is - do aspects of the training data impact aspects of the testing data, and could a decoder pick up on such leakage to change decoding? It seems this is performed for each training/testing timepoint pair, and so the temporal unfolding of results may depend on this analysis choice.

      Thank you for the suggestion. To prevent data leakage, the mean and standard deviation are computed exclusively from the training set. These scaling parameters are then applied to the test set, ensuring that no information from the test set influences the training process. This transformation simply adjusts the test set to the same scale as the training data, without exposing the model to unseen test data during training.
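      The train-only rescaling described above can be sketched as follows. This is a minimal illustration assuming per-voxel z-scoring; the function names are hypothetical and the guard against zero variance is an added assumption.

```python
import numpy as np

def fit_scaler(train):
    """Estimate per-feature mean and standard deviation from training data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Assumption: features with zero variance are left unscaled.
    std = np.where(std == 0, 1.0, std)
    return mean, std

def apply_scaler(data, mean, std):
    """Rescale any data set with parameters estimated elsewhere."""
    return (data - mean) / std
```

      In use, `fit_scaler` would be called once per training time point, and the returned parameters passed to `apply_scaler` for both the training and the test data, so that no test-set statistic enters the fit.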

      (4) Figure 1d, V1: it looks like the 'dynamics' are a bit non-symmetric - perhaps the authors could comment on this detail of the results? Why would we expect there would be a dynamic cluster on one side of the diagonal, but not the other? Given that this region, condition is the primary evidence for a dynamic code that's not related to the beginning/end of delay (see other comments), figuring this out is of particular importance.

      We thank the reviewer for this question. We think that this is just due to small numerical differences in the upper and lower triangles of the matrix, rather than a neuroscientifically interesting effect. However, this is only a speculative observation.

      (5) I think it's important to address the issue I raised in "weaknesses" about variance explained by the top N principal components in each condition. What are we supposed to learn from data projected into subspaces fit to different conditions if the subspaces themselves are differently useful?

      Thank you, this has now been addressed in a previous comment (please see Weakness 4). 

      Reviewer #2:

      Weaknesses:

      (1) An alternative interpretation of the temporal dynamic pattern is that working memory representations become less reliable over time. As shown by the authors in Figure 1c and Figure 4a, the on-diagonal decoding accuracy generally decreased over time. This implies that the signal-to-noise ratio was decreasing over time. Classifiers trained with data of relatively higher SNR and lower SNR may rely on different features, leading to poor generalization performance. This issue should be addressed in the paper.

      We thank the reviewer for raising this issue and we have now run three simulations that aim to address whether a changing SNR across time might create dynamic clusters. 

      In the first simulation we created a dataset of 200 voxels that have a sine or cosine response function to orientations between 1° and 180°, the same orientations as the remembered target. A circular shift is applied to each voxel to vary the preferred (or maximal) response of each simulated voxel. We then assess the decoding performance under different SNR conditions during training and testing. For each of the seven iterations we selected 108 responses (out of 180) to train on and 108 to test on. To increase variability, the selected trials differed in each iteration. Random white noise was applied to the data, with the SNR scaled independently to the specified levels for the train and test data. We then use the same pSVR decoder as in the temporal cross-decoding analysis to train and test.
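      The data-generation step of such a simulation can be sketched as below. This is a rough illustration under stated assumptions: a cosine tuning curve with one cycle over 180°, integer circular shifts drawn uniformly, and SNR defined as signal variance divided by noise variance. The function names are hypothetical, and the pSVR decoder and the train/test selection are not reproduced.

```python
import numpy as np

def simulate_voxels(n_voxels=200, rng=None):
    """Noise-free responses of simulated voxels to orientations 1..180 degrees."""
    rng = np.random.default_rng(rng)
    orientations = np.arange(1, 181)
    # Assumed tuning curve: one cosine cycle across the 180-degree orientation space.
    base = np.cos(2 * np.pi * orientations / 180.0)
    # A circular shift per voxel varies the preferred orientation.
    shifts = rng.integers(0, 180, size=n_voxels)
    responses = np.stack([np.roll(base, s) for s in shifts], axis=1)
    return orientations, responses  # shapes (180,) and (180, n_voxels)

def add_noise(signal, snr, rng=None):
    """Add white Gaussian noise so that var(signal) / var(noise) is about snr."""
    rng = np.random.default_rng(rng)
    noise_sd = np.sqrt(signal.var() / snr)
    return signal + rng.normal(0.0, noise_sd, size=signal.shape)
```

      Training and test sets at different SNR levels would then be obtained by calling `add_noise` twice on the same noise-free responses with different `snr` values.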

      The second and third simulations more directly address whether increased noise levels would induce the decoder to rely on different features of the no-distractor and noise distractor data. We use empirical data from the primary visual cortex (V1; where dynamic coding was seen in the noise distractor trials) under the no-distractor and noise distractor conditions for the second and third simulations, respectively. Data from time points 5.6–8.8 seconds after stimulus onset are averaged across five TRs. As in the first simulation, SNR is systematically manipulated by adding white noise. Additionally, to see whether the initial decrease in SNR and subsequent increase would result in dynamic coding clusters, we initially increased and subsequently decreased the amplitude of added noise. The same pSVR decoder was used to train and test on the data with different levels of added noise.

      We see an absence of dynamic elements in the SNR cross-decoding matrices, as the decoding accuracy primarily depends on the training data rather than test data. This results in some off-diagonal values in the decoding matrix that are higher, rather than smaller, than corresponding on-diagonal elements.

      We have now added a Methods section explaining the simulations in more detail and Supplementary Figure 9 showing the SNR cross-decoding matrices. 

      (2) The paper tests against a strong version of stable coding, where neural spaces representing WM contents must remain identical over time. In this version, any changes in the neural space will be evidence of dynamic coding. As the paper acknowledges, there is already ample evidence arguing against this possibility. However, the evidence provided here (dynamic coding cluster, angle between coding spaces) is not as strong as what prior studies have shown for meaningful transformations in neural coding. For instance, the principal angle between coding spaces over time was smaller than 8 degrees, and around 7 degrees between sensory distractors and WM contents. This suggests that the coding space for WM was largely overlapping across time and with that for sensory distractors. Therefore, the major conclusion that working memory contents are dynamically coded is not well-supported by the presented results.

      We thank the reviewer for this comment. The principal angles we calculate are above-baseline, meaning that we subtract the within-subspace principal angles from the between-subspace principal angles and take the average. Thus a 7 degree difference does not imply that there are only 7 degrees separating e.g. the sensory distractor from the target; it just indicates that the separation is 7 degrees above chance. 
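      To make the above-baseline logic concrete, a minimal sketch is given below. It assumes that each subspace is given as a matrix whose columns span it and that principal angles are obtained from the singular values of the product of orthonormal bases; the function names are hypothetical and the details of the authors' baseline estimation are not reproduced.

```python
import numpy as np

def principal_angles_deg(basis_a, basis_b):
    """Principal angles (degrees, ascending) between two column spaces."""
    qa, _ = np.linalg.qr(basis_a)
    qb, _ = np.linalg.qr(basis_b)
    # Singular values of qa.T @ qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

def above_baseline_angle(between, within):
    """Average of between-subspace angles minus within-subspace angles."""
    between = np.asarray(between, dtype=float)
    within = np.asarray(within, dtype=float)
    return float(np.mean(between - within))
```

      With this convention, a reported value of 7 degrees means that between-subspace angles exceed the within-subspace (split-half style) angles by 7 degrees on average, not that the raw angle between subspaces is 7 degrees.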

      (3) Relatedly, the main conclusions, such as "VWM code in several visual regions did not generalize well between different time points" and "VWM and feature-matching sensory distractors are encoded in separable coding spaces" are somewhat subjective given that cross-condition generalization analyses consistently showed above chance-level performance. These results could be interpreted as evidence of stable coding. The authors should use more objective descriptions, such as 'temporal generalization decoding showed reduced decoding accuracy in off-diagonals compared to on-diagonals'.

      Thank you, we agree that our previous claims might have been too strong. We have now toned down our statements in the Abstract and use “did not fully generalize” and “VWM and feature-matching sensory distractors are encoded in coding spaces that do not fully overlap.”

      Reviewer #2 (Recommendations For The Authors):

      Weakness 1 can potentially be addressed with data simulations that fix the signal pattern, vary the noise pattern, and perform the same temporal generalization analysis to test whether changes in SNR can lead to seemingly dynamic coding formats.

      Thank you for the great suggestion. We have now run the suggested simulations. Please see above (response to Weakness 1).

      There are mismatches in the statistical symbols shown in Figure 4 and Supplementary Table 2. It seems that there was a swap between the symbols for the noise between-condition and noise within-condition.

      Thank you, this has now been fixed.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight on the structural basis for the pharmacology of G protein-coupled receptors.

      We thank the Reviewer for the positive comment on the manuscript and the proposed methods.

      Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original also thanks to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at the atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics; the higher the resolution of such metrics, the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

      We thank the Reviewer for the positive comment on the revised manuscript and mwSuMD. We agree that the choice of supervised metrics is user- and system-dependent. We aim to improve this aspect in the future with the aid of interpretable machine learning.

      Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has potential to provide novel insight into GPCR functionality. An example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally, for instance through an in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of the previously reported structures.

      The method at the focus of our study (mwSuMD) is an enhancement of supervised molecular dynamics that allows supervising two metrics at the same time and uses a score, rather than a tabù-like algorithm, for handling the simulation. Further changes are the seeding of parallel short replicas (walkers) rather than a series of short simulations, and the software implementation on different MD engines (e.g. Acemd, OpenMM, NAMD, Gromacs).

      We agree with the Reviewer that experimental validation of the findings would be advisable, in line with any computational prediction. We are positive that future studies from our group employing mwSuMD will inform mutagenesis and BRET-based experiments.

      Reviewer #2 (Recommendations for the authors):

      As for GLP1R, I remain convinced that the 7LCI would have been better as a reference for all simulations than 7LCJ, also because 7LCI holds a slightly more complete ECD.

      We agree that 7LCI would have been a better starting point than 7LCJ for simulations because it presents the stalk region, contrary to 7LCJ. However, we do not think it might have influenced the output because the stalk is the most flexible segment of GLP1R, and any initial conformation is usually not retained during MD simulations.

      Please, correct everywhere the definition of the 6LN2 structure of GLP1R as a ligand-free or apo, because that structure is indeed bound to a negative allosteric modulator docked on the cytosolic end of helix-6.

      We thank the reviewer for this precision. The text has been modified accordingly.

      As for the beta2-AR, the "full-length" AlphaFold model downloaded from the GPCRdb is not an intermediate active state because it is very similar to the receptor in the 3SN6 complex with Gs. Please, eliminate the inappropriate and speculative adjective "intermediate".

      We have changed “intermediate” to “not fully active”, which is less speculative since full activation can be achieved only in the presence of the G protein.

      Incidentally, in that model, the C-tail, eliminated by the authors, is completely wrong and occupies the G protein binding site. It is not clear to me the reason why the authors preferred to use an AlphaFold model as an input of simulations rather than a high-resolution structural model, e.g. 4LDO. Perhaps, the reason is that all ICL regions, including ICL3, were modeled by AlphaFold even if with low confidence. I disagree with that choice.

      We understand the reviewer’s point of view. Should we have simulated an “equilibrium” receptor-ligand complex, we would have made the same choice. However, the conformational changes occurring during G protein binding are so substantial that the starting conformation of the receptor becomes almost irrelevant as long as a sensible structure is used.

      Reviewer #3 (Recommendations for the authors):

      The revised version of the manuscript is more concise, focusing only on two systems. However, the authors have responded superficially to the reviewers' comments, merely deleting sections of text, making minor corrections, or adding small additions to the text. In particular, the authors have not addressed the main critical points raised by both Reviewer 2 and Reviewer 3. 

      For example, the RMSD values for the binding of PF06882961 to GLP-1R remain high, raising doubts about the predictive capabilities of the method, at least for this type of system.

      What is the RMSD of the ligand relative to the experimental pose obtained in the simulations? This value must be included in the text.

      We have added this piece of information about PF06882961 RMSD in the text, which on page 6 now reads “We simulated the binding of PF06882961, reaching an RMSD to its bound conformation in 7LCJ of 3.79 ± 0.83 Å (computed on the second half of the merged trajectory, superimposing on GLP-1R Cα atoms of TMD residues 150 to 390), using multistep supervision on different system metrics (Figure 2) to model the structural hallmark of GLP-1R activation (Video S5, Video S6).”
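      As a generic illustration of this kind of measurement (not the authors' analysis script), the ligand RMSD after superposition on receptor atoms can be sketched with the Kabsch algorithm. The sketch assumes that matching receptor Cα atoms and ligand atoms have already been extracted as coordinate arrays in the same atom order; the function names are hypothetical.

```python
import numpy as np

def kabsch(mobile, reference):
    """Rotation and translation that best superimpose mobile onto reference."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    # Covariance between the centered coordinate sets (rows are atoms).
    cov = (mobile - mob_center).T @ (reference - ref_center)
    u, _, vt = np.linalg.svd(cov)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = ref_center - rot @ mob_center
    return rot, trans

def ligand_rmsd(frame_receptor, frame_ligand, ref_receptor, ref_ligand):
    """Ligand RMSD after fitting the frame onto the reference receptor atoms."""
    rot, trans = kabsch(frame_receptor, ref_receptor)
    moved = frame_ligand @ rot.T + trans
    return float(np.sqrt(np.mean(np.sum((moved - ref_ligand) ** 2, axis=1))))
```

      Averaging `ligand_rmsd` over the selected frames of a trajectory, together with its standard deviation, gives a value of the "mean ± SD" form quoted above.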

      Similarly, the activation mechanism of GLP-1R is only partially simulated.

      Furthermore, it is not particularly meaningful to justify the high RMSD values of the SuMD simulations for the binding of Gs to GLP-1R by comparing them with those reported under unbiased MD conditions. "Replica 2, in particular, well reproduced the cryo-EM GLP-1R complex as suggested by RMSDs to 7LCI of 7.59 ± 1.58 Å, 12.15 ± 2.13 Å, and 13.73 ± 2.24 Å for Gα, Gβ, and Gγ respectively. Such values are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1 (ref. 49) (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R."

      Without delving into the accuracy of the various calculations, the authors should acknowledge that comparing protein structures with such high RMSD values has no meaningful significance in terms of convergence toward the same three-dimensional structure.

      The text has been edited to accommodate the reviewer’s suggestion and still give the readers the measure of the high flexibility of Gs bound to GLP-1R. It now reads “Such values do not support convergence with the static experimental structure but are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1 (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R.”

      Have the authors simulated the binding of the Gs protein using the experimentally active structure of GLP-1R in complex with the ligand PF06882961 (PDB ID 7LCJ)? Such a simulation would be useful to assess the quality of the binding simulation of Gs to the GLP1R/PF06882961 complex obtained from the previous SuMD.

      We considered performing the Gs binding simulation to the active structure of GLP-1R. However, the GLP-1R (and other class B receptors) fully active state, as reported in 7LCJ, depends on the presence of the Gs and can be reached only upon effector coupling. Since it is unlikely that the unbound receptor is already in the fully active state, we reasoned that considering it as a starting point for Gs binding simulations would have been an artifact.

      An example of the insufficient depth of the authors' replies can be seen in their response: "We note that among the suggested references, only Mafi et al report about a simulated G protein (in a pre-formed complex) and none of the work sampled TM6 rotation without input of energy."

      This statement is inaccurate. For instance, D'Amore et al. (Chem 2024, doi: 10.1016/j.chempr.2024.08.004) simulated Gs coupling to A2A as well as TM6 rotation, as did Maria-Solano and Choi (eLife 2023, doi: 10.7554/eLife.90773.1). The former employed path collective variables metadynamics, which is not cited in the introduction or the discussion, despite its relevance to the methodologies mentioned.

      Respectfully, our previous reply is correct, as all of the mentioned articles used enhanced (energy-biased) approaches, so the claim “none of the work sampled TM6 rotation without input of energy” stands. The reference to D’Amore et al. (published after the previous round of reviews of this manuscript) has been added to the introduction; we thank the reviewer for pointing it out. 

      Additionally, SuMD employs a tabu algorithm that applies geometric supervision to the simulation, serving as an alternative approach to enhancing sampling compared to the "input of energy" techniques as called by the authors. A fair discussion should clearly acknowledge this aspect of the SuMD methodology.

      We have now specified in the Methods that a tabù-like algorithm is part of SuMD, which, despite being the parent technique of mwSuMD, is not the focus of the present work. We provide extended references for readers interested in SuMD. mwSuMD, on the other hand, does not use a tabù-like algorithm but rather a continuative approach based on a score to select the best walker for each batch, as described in the Methods.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Minor comments:

      In the results section (lines 498-499), the authors describe free kinetochores in many cells without associated spindle microtubules. However, some nuclei appear to have kinetochores, as presented in Figure 6. Could the authors clarify how this conclusion was derived using transmission electron microscopy (TEM) without serial sectioning, as this is not explicitly mentioned in the materials and methods?

      We observed free kinetochores in the ALLAN-KO parasites with no associated spindle microtubules (see Fig. 6Gh), while kinetochores are attached to spindle microtubules in WT-GFP cells (see Fig. 6Gc). To provide further evidence we analysed additional images and found that ALLAN-KO cells have free kinetochores in the centre of nucleus, unattached to spindle microtubules. We provide some more images clearly showing free kinetochores in these cells (new supplementary Fig. S11).

      However, in the ALLAN mutant, this difference is not absolute: in a search of over 50 cells, one example of a cell with a “normal” nuclear spindle and attached kinetochores was observed.

      The use of serial sectioning has limitations for examining small structures like kinetochores in whole cells. The limitations of the various techniques (for example, SBF-SEM vs tomography) are highlighted in our previous study (Hair et al 2022; PMID: 38092766), and we consider that examining a population of randomly sectioned cells provides a better understanding of the overall incidence of specific features.

      Discussion Section:

      Could the authors expand on why SUN1 and ALLAN are not required during asexual replication, even though they play essential roles during male gametogenesis?

      We observed no phenotype in asexual blood stage parasites associated with the sun1 and allan gene deletions. Several other Plasmodium berghei gene knockout parasites with a phenotype in sexual stages, for example CDPK4 (PMID: 15137943), SRPK (PMID: 20951971), PPKL (PMID: 23028336) and kinesin-5 (PMID: 33154955) have no phenotype in blood stages, so perhaps this is not surprising. One explanation may be the substantial differences in the mode of cell division between these two stages. Asexual blood stages produce new progeny (merozoites) over 24 hours with closed mitosis and asynchronous karyokinesis during schizogony, while male gametogenesis is a rapid process, completed within 15 min to produce eight flagellated gametes. During male gametogenesis the nuclear envelope must expand to accommodate the increased DNA content (from 1N to 8N) before cytokinesis. Furthermore, male gametogenesis is the only stage of the life cycle to make flagella, and axonemes must be assembled in the cytoplasm to produce the flagellated motile male gametes at the end of the process. Thus, these two stages of parasite development have some very different and specific features.

      Lines 611-613 states: "These loops serve as structural hubs for spindle assembly and kinetochore attachment at the nuclear MTOC, separating nuclear and cytoplasmic compartments." Could the authors elaborate on the evidence supporting this statement?

      We observed the loops/folds in the nuclear envelope (NE) as revealed by SUN1-GFP and 3D TEM images during male gametogenesis. These folds/loops occur mainly in the vicinity of the nuclear MTOC where the spindles are assembled (as visualised by EB1 fluorescence) and attached to kinetochores (as visualised by NDC80 fluorescence). These loops/folds may form due to the contraction of the spindle pole back to the nuclear periphery, inducing distortion of the NE. Since there is no physical segregation of chromosomes during the three rounds of mitosis (DNA increasing from 1N to 8N), we suggest that these folds provide additional space for spindle and kinetochore dynamics within an intact NE to maintain separation from the cytoplasm (as shown by location of kinesin-8B).

      In lines 621-622, the authors suggest that ALLAN may have a broader role in NE remodelling across the parasite's lifecycle. Could they reflect on or remind readers of the finding that ALLAN is not essential during the asexual stage?

      ALLAN-GFP is expressed throughout the parasite life cycle but as the reviewer points out, a functional role is more pronounced during male gametogenesis. This does not mean that it has no role at other stages of the life cycle even if there is no obvious phenotype following deletion of the gene during the asexual blood stage. The fact that ALLAN is not essential during the asexual blood stage is noted in lines 628-29.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Introduction

      Line 63: The authors state: "NE is integral to mitosis, supporting spindle formation, kinetochore attachment, and chromosome segregation...". Seemingly at odds, they also say (Line 69) that 'open' mitosis is "characterized by complete NE disassembly".

      The authors could explain better the ideas presented in their quoted review from Dey and Baum, which points out that truly 'open' and 'closed' topologies may not exist and that even in 'open' mitosis, remnants of the NE may help support the mitotic spindle.

      We have modified the sentence in which we discuss current opinions about ‘open’ and ‘closed’ mitosis. It is believed that there is no complete disassembly of the NE during open mitosis and no completely intact NE during closed mitosis, respectively. In fact, the NE plays a critical role in the different modes of mitosis during MTOC organisation and spindle dynamics. Please see the modified lines 64-71.

      Results

      Fig 7 is the final figure; but would be more useful upfront.

      We have provided a new introductory figure (Fig 1) showing a schematic of conventional/canonical LINC complexes and evidence of SUN protein functions in model eukaryotes, and comparing them to what is known in apicomplexans.

      Fig 1D. The authors generated C-terminal GFP-tagged SUN1 transfectants and used ultrastructure expansion microscopy (U-ExM) and structured illumination microscopy (SIM) to examine SUN1-GFP in male gametocytes post-activation. The immuno-labelling of SUN1-GFP in these fixed cells appears very different to the live cell images of SUN1-GFP. The labelling profile comprises distinct punctate structures (particularly in the U-ExM images), suggesting that the paraformaldehyde fixation process, followed by the addition of the primary and secondary antibodies, has caused coalescing of the SUN1-GFP signal into particular regions within the NE.

      We agree with the reviewer. Fixation with paraformaldehyde (PFA) results in a coalescence of the SUN1-GFP signal. We have also tried methanol fixation (see new Fig. S2), but a similar problem was encountered.

      Given these fixation issues, the suggestion that the SUN1-GFP signal is concentrated at the BB/nuclear MTOC and "enriched near spindle poles" needs further support.

      These statements seem at odds with the data for live cell imaging, where the SUN1-GFP seems evenly distributed around the nuclear periphery. Can the observation be quantitated by calculating the percentage of BB/nuclear MTOC structures with associated SUN1-GFP puncta? If not, I am not convinced these data help understand the molecular events.

      We agree with the reviewer that whilst the live cell imaging showed an even distribution of SUN1-GFP signal, after fixation with either PFA or methanol SUN1-GFP puncta are observed in addition to the peripheral location around the stained DNA (Hoechst) (see Fig. S2; puncta are indicated by arrows). These SUN1-GFP labelled puncta were observed at the junction of the nuclear MTOC and the basal body (Fig. 2F). Quantification of the distribution showed that these SUN1-GFP puncta are associated with the nuclear MTOC in more than 90% of cells (18 cells examined). Live cell imaging of the dual-labelled parasites, SUN1xkinesin-8B (Fig. 2H) and SUN1xEB1 (Fig. 2I), provides further support for the association of SUN1-GFP puncta with the BB (kinesin-8B)/nuclear MTOC (EB1).

      The authors then generated dual transfectants and examined the relative locations of different markers in live cells. These data are more informative.

      The authors state; " ..SUN1-GFP marked the NE with strong signals located near the nuclear MTOCs situated between the BB tetrads". The nuclear MTOCs are not labelled in this experiment. The SUN1-GFP signal between the kinesin-8B puncta is evident as small puncta on regions of NE distortion. I would prefer to not describe this signal as "strong". The signal is stronger in other regions of the NE.

      We have modified the sentence on line 213 to accommodate this suggestion.

      Line 219. The authors state; "..SUN1-GFP is partially colocalized with spindle poles as indicated by EB1,.. it shows no overlap with kinetochores (NDC80)." The authors should provide an analysis of the level of overlap at a pixel by pixel level to support this statement.

      We now provide the overlap at a pixel-by-pixel level for representative images, and we have quantified more cells (n>30), as documented in the new Fig. S4A. We have also modified the sentence on line 219 to reflect these additions.
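
The pixel-by-pixel overlap requested above is commonly summarised with a Pearson correlation coefficient or a Manders overlap fraction. The response does not state which measure or software was used, so the following Python sketch is only an illustration under assumed conventions: the function names, the threshold parameter, and the input layout (two equally sized 2D lists of pixel intensities) are hypothetical and are not taken from the manuscript.

```python
from math import sqrt


def _flatten(image):
    """Flatten a 2D list of pixel intensities into a 1D list."""
    return [value for row in image for value in row]


def pearson_overlap(ch1, ch2):
    """Pearson correlation between two equally sized intensity images.

    Values near +1 indicate that the two signals rise and fall together,
    values near 0 indicate no linear relationship, and values near -1
    indicate mutually exclusive signals.
    """
    a, b = _flatten(ch1), _flatten(ch2)
    if not a or len(a) != len(b):
        raise ValueError("channels must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_a * var_b)


def manders_fraction(ch1, ch2, threshold2=0):
    """Fraction of total ch1 intensity found in pixels where ch2 > threshold2.

    This corresponds to the Manders M1 coefficient; swapping the arguments
    gives M2. The threshold is an assumed, user-chosen background cut-off.
    """
    a, b = _flatten(ch1), _flatten(ch2)
    if not a or len(a) != len(b):
        raise ValueError("channels must be non-empty and the same size")
    total = sum(a)
    if total == 0:
        raise ValueError("ch1 contains no signal")
    return sum(x for x, y in zip(a, b) if y > threshold2) / total
```

In such a sketch, a low Pearson value together with a non-zero Manders fraction would be one way to describe a partial overlap, for example between a NE-lumen signal and a spindle-pole marker, although the choice of background threshold strongly affects the Manders value.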

      The SUN1 construct is C-terminally GFP-tagged. By analogy with human SUN1, the C-terminal SUN domain is expected to be in the NE lumen. That is in a different compartment to EB1, which is located in the nuclear lumen (on the spindle). Thus, the overlap of signal is expected to be minimal.

      We agree with the reviewer that the overlap between EB1 and Sun1 signals is expected to be minimal. We have quantified the data and included it in Supplementary Fig. S4A.

      Similarly, given that EB1 and NDC80 are known to occupy overlapping locations on the spindle, it seems unlikely that SUN1 can overlap with one and not the other.

      We agree with the reviewer’s analysis that EB1 and NDC80 occupy overlapping locations on the spindle, although the length of NDC80 is less at the ends of spindles (see Author response image 1A) as shown in our previous study where we compared the locations of two spindle proteins, ARK2 and EB1, with that of NDC80 (Zeeshan et al, 2022; PMID: 37704606). In the present study we observed that Sun1-GFP partially overlaps with EB1 at the ends of the spindle, but not with NDC80. Please see Author response image 1B.

      Author response image 1.

      I note on Line 609, the authors state "Our study demonstrates that SUN1 is primarily localized to the nuclear side of the NE.." As per Fig 7D, and as discussed above, the bulk of the protein, including the SUN1 domain, is located in the space between the INM and the ONM.

      We appreciate the reviewer’s correction; we have now modified the sentence to indicate that the protein is largely localized in the space between the INM and the ONM on line 617.

      Interestingly, as the authors point out, nuclear membrane loops are evident around EB1 and NDC80 focal regions. The data suggests that the contraction of the spindle pole back to the nuclear periphery induces distortion of the NE.

      We agree with the reviewer’s suggestion that the data indicate that contraction of spindle poles back to the nuclear periphery may induce distortion of the NE.

      The authors should discuss further the overlap of findings of this study with that from a recent manuscript (https://doi.org/10.1016/j.cels.2024.10.008). That Sayers et al. study identified a complex of SUN1 and ALLC1 as essential for male fertility in P. berghei. Sayers et al. also provide evidence that this complex participates in the linkage of the MTOC to the NE and is needed for correct mitotic spindle formation during male gametogenesis.

      We thank the reviewer for this suggestion. The study by Sayers et al. (2024) was published while our manuscript was under preparation. It was interesting to see that these complementary studies have similar findings about the role of SUN1 and the novel complex of SUN1-ALLAN. Our study contains a more detailed, in-depth analysis of SUN1 by both expansion microscopy and TEM. We include additional studies on the role of ALLAN. We discuss the overlap in the findings of the two studies in lines 590-605.

      While the work is interesting, the conclusions may need to be tempered. The authors suggest that, in the absence of KASH-domain proteins, the SUN1-ALLAN complex forms a non-canonical LINC complex (that is, a connection across the NE) that "achieves precise nuclear and cytoskeletal coordination".

      We have toned down the wording of this conclusion in lines 665-677.

      In other organisms, KASH interacts with the C-terminal domain on SUN1, which as mentioned above is located between the INM and ONM. By contrast, ALLAN interacts with the N-terminal domain of SUN1, which is located in the nuclear lumen. The SUN1-ALLAN interaction is clearly of interest, and ALLAN might replace some of the roles of lamins. However, the protein that functionally replaces KASH (i.e. links SUN1 to the ONM) remains unidentified.

      We agree with the reviewer, and future studies will need to focus on identifying the KASH replacement that links SUN1 to the ONM.

      It may also be premature to suggest that the SUN1-ALLAN complex is a promising target for blocking malaria transmission. How would it be targeted?

      We have deleted the sentence that raised this suggestion.

      While the above datasets are interesting and internally consistent, there are two other aspects of the manuscript that need further development before they can usefully contribute to the molecular story.

      The authors undertook a transcriptomic analysis of Δsun1 and WT gametocytes, at 8 and 30 min post-activation, revealing moderate changes (~2-fold change) in different genes. GO-based analysis suggested up-regulation of genes involved in lipid metabolism. Given the modest changes, it may not be correct to conclude that "lipid metabolism and microtubule function may be critical functions for gametogenesis that can be perturbed by sun1 deletion." These changes may simply be a consequence of the stalled male gametocyte development.

      Following the reviewer’s suggestion we have moved these data to the supplementary information (Fig. S5D-I) and toned down their discussion in the results and discussion sections.

      The authors have then undertaken a detailed lipid analysis of the Δsun1 and WT gametocytes, before and after activation. Substantial changes in lipid metabolites might not be expected in such a short period of time. And indeed, the changes appear minimal. Similarly, there are only minor changes in a few lipid sub-classes between Δsun1 and WT gametocytes. In my opinion, the data are not sufficient to support the authors' conclusion that "SUN1 plays a crucial role, linking lipid metabolism to NE remodelling and gamete formation."

      In agreement with the reviewer’s comments, we have moved these data to supplementary information (Fig. S6) and substantially toned down the conclusions based on these findings.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Major comments:

      My main concern with this manuscript is that the authors conclude not only that SUN1 is important for spindle formation and basal body segregation, but also that it influences lipid metabolism and NE dynamics. I don't think the data support this conclusion, for several reasons listed below. I would suggest removing this claim from the manuscript, or at least toning it down, unless more supporting data are provided, in particular showing any change in NE dynamics in the SUN1-KO. Instead I would recommend focusing on the more interesting role of SUN1-ALLAN in bipartite MTOC organisation, which likely explains all observed phenotypes (including those in later stages of the parasite life cycle). In addition, some aspects of the knockout phenotype should be quantified in more depth.

      In more detail:

      - The lipidomics analysis is clearly the weakest point of the manuscript: The authors state that there are significant changes in some lipid populations between WT and sun1-KO, and between activated and non-activated cells, yet no statistical analysis is shown and the error bars are quite high compared to only minor changes in the means. For some discussed lipids, the result text does not match the graphs, e.g. PA, where the increase upon activation is more pronounced in the SUN1-KO vs WT (contrary to the text), or MAG, which is reduced in the SUN1-KO vs WT (contrary to the text). I don't see the discussed changes in arachidonic acid levels and myristic acid levels in the data either. Even if the authors find after analysis some statistically significant differences between some groups, they should carefully discuss the biological significance of these differences. As it is, I do not think the presented data warrants the conclusion that deletion of SUN1 changes lipid homeostasis, but rather shows that overall lipid homeostasis is not majorly affected by gametogenesis or SUN1 deletion. As a minor comment, if you decide to keep the lipidomics analysis in the manuscript, please state how many replicates were done.

      As detailed above we have moved the lipidomics data to supplementary information (Fig. S6) and substantially toned down the discussion of these data in the results and discussion sections.

      - I can't quite follow the logic why the authors performed transcriptomic analysis of the SUN1 and how they chose their time points. Their data up to this point indicate that SUN1 has a structural or coordinating role in the bipartite MTOC during male gametogenesis. Based on that it is rather unlikely that SUN1 KO directly leads to transcriptional changes within the 8 min of exflagellation. Isn't it more likely that transcriptional differences are purely a downstream effect of incomplete/failed gametogenesis? This is particularly true for the comparison at 30 min, which compares a mixture of exflagellated/emerged gametes and zygotes in WT to a mixture of aberrant, arrested gametes in the knockout, which will likely not give any meaningful insight. The by far most significant GO-term is then also nuclear-transcribed mRNA catabolic process, which is likely not related at all to SUN1 function (and the authors do not even comment on this in the main text). I would therefore suggest removing the 30 min data set from this manuscript. As a minor point, I would suggest highlighting some of the top de-regulated gene IDs in the volcano plots and stating their function. Also, please state how you prepared the cells for the transcriptomes and in how many replicates this was done.

      As suggested by the reviewer we have removed the 30 min post activation data from the manuscript. We have also moved the rest of the transcriptomics data to supplementary information (Fig. S5) and toned down the presentation of this aspect of the work in the results and discussion sections.

      - Live-cell imaging of SUN1-GFP does nicely visualise the NE during gametogenesis, showing a highly dynamic NE forming loops and folds, which is very exciting to see. It would be beneficial to also show a video from the live-cell imaging.

      We have now added videos to the manuscript as suggested by the reviewer. Please see the supplementary Videos S1 and S2.

      In their discussion, the authors state multiple times that NE dynamics are changed upon SUN1 KO. Yet, they do not provide data supporting this claim, i.e. that the extended loops and folds found in the nuclear envelope during gametogenesis are affected in any way by the knockout of SUN1 or ALLAN. What happens to the NE in the absence of SUN1? Are there fewer loops and folds? In the absence of a reliable NE marker this may not be entirely easy to address, but at least some SBF-SEM images of the sun1-KO gametocytes could provide insight.

      It was difficult to provide SBF-SEM images as that work is beyond the scope of this manuscript. We will consider this approach in our future work. We re-examined many of our TEM images of SUN1-KO and ALLAN-KO parasites and did find some micrographs showing aberrant nuclear membrane folding (<5%) (Please see Author response image 2). However, we also observed similar structures in some of the WT-GFP samples (<5%), so we do not think this is a strong phenotype of the SUN1 or ALLAN mutants.

      Author response image 2.

       

      - I think the exciting part of the manuscript is the cell biological role of SUN1 on male gametogenesis, which could be carved out a bit more by a more detailed phenotyping. Specifically it would be good to quantify

      (1) If DNA replication to an octoploid state still occurs in SUN1-KO and ALLAN-KO,

      DNA replication is not affected in the SUN1-KO and ALLAN-KO mutants: DNA content increases to 8N (data added in Fig. 3J and Fig. S10F).

      (2) The proportion of anucleated gametes in WT and the KO lines

      We have added these data in Fig. 3K and Fig. S10G

      (3) A quantification of the BB clustering phenotype (in which proportion of cells do the authors see this phenotype). This could be addressed by simple fixed immunofluorescence images of the respective WT/KO lines at various time points after activation (or possibly by reanalysis of the already obtained images) and would really improve the manuscript.

      We have reanalysed the BB clustering phenotype and added the quantitative data in Fig. 4E and Fig. S7.

      Especially the claim that emerged SUN1-KO gametes lack a nucleus is currently only based on single slices of a few TEM cells and would benefit from a more thorough quantification in both SUN1- and ALLAN-KOs.

      We have examined many microgametes (100+ sections). In WT parasites a small proportion of gametes can appear to lack a nucleus if it does not extend all the way to the apical and basal ends (Hair et al. 2022). However, the proportion of microgametes that appear to lack a nucleus (no nucleus seen in any section) was much higher in the SUN1 mutant. In contrast, this difference was not as clear cut in the ALLAN mutant with a small proportion of intact (with axoneme and nucleus) microgametes being observed.

      We have done additional analysis of male gametes, looking for the presence of the nucleus by live cell imaging after DNA staining with Hoechst. These data are added in Fig. 3K (for Sun1-KO) and Fig. S10G (for Allan-KO).

      - The TEM suggests that in the SUN1-KO, kinetochores are free in the nucleus. Are all kinetochores free or do some still associate to a (minor/incorrectly formed) spindle? The authors could address this by tagging NDC80 in the KO lines.

      Our observation and quantification of the data indicated that 100% of kinetochores were attached to spindle microtubules and that 0% were unattached kinetochores in the WT parasites. However, the exact opposite was found for the SUN1 mutant with 100% unattached kinetochores and 0% attached. The result was not quite as clear cut in the ALLAN mutant, with 98% unattached and 2% attached. An important observation was the lack of separation of the nuclear poles and any spindle formation. Spindle formation was never or very rarely observed in the mutants.

      - Finally, I think it is curious that in contrast to SUN1, ALLAN seems to be less important, with some KO parasites completing the life cycle. Maybe a more detailed phenotyping as above gives some more hints as to where the phenotypic difference between the two proteins lies. I would assume some ALLAN-KO cells can still segregate the basal body. Can the authors speculate/discuss in more detail why these two proteins seem to have slightly different phenotypes?

      We agree with the reviewer. Overall, the ALLAN-KO has a less prominent phenotype than that of the Sun1-KO. The main difference is that in the ALLAN-KO mutant some basal body segregation can occur, leading to the production of some fertile microgametocytes and to ookinete and oocyst formation (Fig. 8). Approximately 5% of oocysts sporulated to release infective sporozoites that could infect mice in bite back experiments and complete the life cycle. In contrast, the Sun1-KO mutant made no healthy oocysts or infective sporozoites and could not complete the life cycle in bite back experiments. We have analysed the phenotype in detail and provide quantitative data for gametocyte stages by EM and ExM in Figs. 4 and S8 (SUN1) and Figs. 7 and S11 (ALLAN). We have also performed detailed analysis of oocyst and sporozoite stages and included the data in Fig. 3 (SUN1) and S10 (ALLAN).

      Based on the location, and functional and interactome data, we think that SUN1 plays a central role in coordinating nucleoplasm and cytoplasmic events as a key component of the nuclear membrane lumen, whereas ALLAN is located in the nucleoplasm. Deleting the SUN1 gene may disrupt the connection between INM and ONM whereas the deletion of ALLAN may affect only the INM.

      Some additional points where the data is not entirely sound yet or could be improved:

      - Localisation of SUN1: There seems to be a discrepancy between SUN1-GFP location as observed by live cell microscopy and by Expansion Microscopy (ExM), and similarly for ALLAN-GFP. By live-cell microscopy, the SUN1 localisation is much more evenly distributed around the NE, while the localisation in ExM is much more punctate, and e.g. in Figure 1E seems to be within the nucleus. Do the authors have an explanation for this? Also, in Fig. 1D there are two GFP foci at the cell periphery (bottom left of the image), which I would think are not SUN1-Foci, as they seem to be outside of the cell. Is the antibody specific? Was there a negative control done for the antibody (WT cells stained with GFP antibodies after ExM)?

      High resolution SIM and expansion microscopy showed that the SUN1-GFP molecules coalesce to form puncta, in contrast to the more uniform distribution observed by live cell imaging. This apparent difference may be due to a better resolution that could not be achieved by live cell imaging. We agree with the reviewer that the two green foci are outside of the cell. As a negative control we have used WT-ANKA cells (which contain no GFP) and the anti-GFP antibody, which gave no signal. This confirms the specificity of the antibody (please see the new Fig. S3). 

      - The authors argue that SIM gave unexpected results due to PFA fixation leading to collapse of the NE loops. However, they also fix their ExM cells and their EM cells with PFA and do not observe a collapse, at least from what I see in the two presented images and in the 3D reconstruction. Is there something else different in the sample preparation?

      There was no difference in the fixation process for samples examined by SIM and ExM, but we used an anti-GFP antibody in ExM to visualise the SUN1-GFP, while in SIM the images of GFP signal were collected directly after fixation. We used both PFA and methanol as fixatives, and both methods showed a coalescing of the SUN1-GFP signal (please see the new Fig. S2 and S3).

      Can the authors trace their NE in ExM according to the NHS-Ester signal?

      We could trace the NE in the ExM by the NHS-ester signal and observed that the SUN1-GFP signal was largely coincident with the NE (Please see the new Fig. S3B).

      - Fig 2D: It would be good to not just show images of oocysts but actually quantify their size from images. Also, have the authors determined the sporozoite numbers in SUN1-KO?

      We have measured oocyst size (data added in new Fig. 3) and added the sporozoite quantification data in Fig. 3D.

      - Line 481-483: the authors state that oocyst size is reduced in ALLAN-KO but do not show the data. Please quantify oocyst size or at least show representative images. Also the drastic decrease in sporozoite numbers (Fig. 6D, E) is not mentioned in the text. Please add reference to Fig S7D when talking about the bite back data.

      We have added the oocyst size data in Fig. S10. We mention the changes in sporozoite numbers (now shown in Fig. 7D, E), and refer to the bite back data shown in current Fig. 7E.

      - Fig S1C, 6C: Both WB images are stitched, but this is not clearly indicated e.g. by leaving a small gap between the lanes. Also please show a loading control along with the western blots. Also there seems to be a (unspecific?) band in the control, running at the same height as Allan-GFP WB. What exactly is the control?

      We have provided the original blot showing the bands of ALLAN-GFP and SUN1-GFP. As a positive control, we used an RNA associated protein (RAP-GFP) that is highly expressed in Plasmodium and regularly used in our lab for this purpose.

      - Regarding the crossing experiment: The authors conclude from this cross that SUN1 is only needed in males, yet for this conclusion they would need to also show that a cross with a female line does not rescue the phenotype. The authors should repeat the cross with a male-deficient line to really test if the phenotype is an exclusively male phenotype. In addition, line 270-272 states that no oocysts/sporozoites were detected in sun1-ko and nek4-ko parasites. However, the figure 2E shows only oocysts, not sporozoites, and shows also that sun1-ko does form oocysts, albeit dead ones.

      We have now performed the experiment of crossing the Sun1-KO parasite line with a male deficient line (Hap2-KO) and added the data in Fig. 3I. We have added images showing sporozoites in oocysts.

      - In Fig S1 the authors show that they also generated a SUN1-mCherry line, yet they do not use it in any of the presented experiments (unless I missed it). Would it be beneficial to cross the SUN1-mCherry line with the Allan1-GFP line to test colocalisation (possibly also by expansion microscopy)?

      We did generate a SUN1-mCherry line, with the intent to cross ALLAN-GFP and SUN1-mCherry lines and observe the co-location of the proteins. Despite multiple attempts this cross was unsuccessful. This may be because the two proteins lie in such close proximity that the addition of both GFP and mCherry tags hinders a proper protein-protein interaction between them.

      - Line 498: "In a significant proportion of cells" - What was the proportion of cells, and what does significant mean in this context?

      Approximately 67% of cells showed the clumping of BBs. We have now added the numbers in Figs. 6H and S11I.

      - The authors should discuss a bit more how their work relates to the work of Sayers et al. 2024, which also identified the SUN1-ALLAN complex. The paper is cited, but only very briefly commented on.

      We have extended this discussion now in lines 590-605.

      Suggestions how to improve the writing and data presentation.

      - General presentation of microscopy images: Considering that large parts of the manuscript are based on microscopy data, their presentation could be improved. Single-channel microscopy images would benefit from being depicted in gray scale instead of color, which would make it easier to see the structures and intensities (especially for blue channels).

      Whilst we agree with the reviewer, sometimes it is difficult to see the features in the merged images. Therefore, we would like to request to be allowed to retain the colours, which can be easily followed in both individual and merged images.

      Also, it would be good to harmonize in which panels arrows are shown (e.g. Fig 1G, where some white arrows are in the SUN1-GFP panel, while others are in the merge panel, but they presumably indicate the same thing). At the same time, Fig 1H doesn't have any white arrows, even though the figure legend states so.

      We apologise for this lack of consistency, and we have now added arrows wherever they were missing to harmonise the presentation.

      Fig 3A and S4 show the same experiment but are coloured in different colours (NHS-Ester in green vs grey scale).

      - Are the scale bars of all expansion microscopy images adjusted for the expansion factor?

      Yes, the scale bars are adjusted accordingly.

      - The figure legends would benefit from streamlining, as they have very different style between figures (eg Fig. 6 which has a concise figure legend vs microscopy figures where figure legends are very long and describe not only the figure but the results)

      The figure legends have been streamlined, with removal of the description of results.

      - Line 155-156: The text makes it sound like the expression only happens after activation. Is that the case? Are these images activated or non-activated gametocytes?

      They are expressed before activation, but the signal intensifies after activation. Images from before and after activation of gametocytes have been added in Fig. S1F.

      - Line 267: Reference to the original nek4-KO paper missing

      This reference is now included.

      - Line 301: The reference to Figure 2J seems to be a bit arbitrarily placed. Also, this schematic of lipid metabolism is never discussed in relation to the transcriptomic or lipidomic data.

      We have moved these data to supplementary information and modified the text.

      - Line 347-349 states that gametes emerged, but the referenced figure shows activated gametocytes before exflagellation.

      We have corrected the text to the start of exflagellation.

      - Line 588: Spelling mistake in SUN1-domain

      Corrected.

      - Line 726/731: i missing in anti-GFP

      Corrected.

      - Line 787-789: statement of scale bar and number of cells imaged is not at the right position in the figure legend.

      Moved to the right place.

      - Line 779, 783: "shades of green" should be just "green". Same goes for line 986, 989 with "shades of grey"

      Changed.

      - Line 974, 976: please correct to WT-GFP and dsun1

      Corrected.

      - Line 1041, 1044: WT-GFP instead of WTGFP.

      Corrected to WT-GFP.

      - Fig 1B, D, E, Fig S1G, H: What are the time points of imaging?

      We have added the time points to the images in these figures.

      - Fig 1D/Line 727: the scale of the scale bar on the inset is missing.

      We have added the scale bar.

      - Fig 3 E-G and 6H-J: Please indicate total number of cells/images analysed per quantification, either in the graphs themselves or in the figure legend.

      We indicate now the number of cells analysed in individual figures and also in Fig. S5C and S8C, respectively.

      - Fig 5B: What is NP?

      Nuclear Pole (NP), also known as the nuclear/acentriolar MTOC (Zeeshan et al 2022; PMID: 35550346).

      - Fig S1B/D: The legend states that there is an arrow indicating the band, but there is none.

      We have added the arrow.

      - Fig S2C: Is the scale bar really the same for the zygote and the ookinete?

      We have checked this and used the same for both zygote and ookinete.

      - Fig S3C, S7C: which stages was qRT-PCR done on?

      Gametocytes activated for 8 min.

      - Fig. S3D, S7D: According to the figure legend, three independent experiments were performed. How many mice were used per experiment? It would be good to depict the individual data points instead of the bar graph. For S7D, 3 data points are depicted (one in WT, two in allan-KO), what do they mean?

      The bite back experiment was performed using 15-20 mosquitoes infected with WT-GFP and gene knockout lines to feed on one naïve mouse each, in three different experiments. We have now included the data points in the bar diagrams.

      - Fig S3: Panel letters E and G are missing

      We have updated the lettering in current Fig. S5

      - Fig 3D: Please indicate what those boxes are. I presume that these are the insets shown in b, e and j, but it is never mentioned. J is not even larger than i. Also, f is quite cropped, it would be good to see the large-scale image it comes from to see where in the nucleus these kinetochores are placed. Were there unbound kinetochores found in WT?

      We mention the boxes in the figure legends. It is rare to find unbound kinetochores in WT parasites. We provide large scale and zoomed-in images of free kinetochores in Fig. S8.

      - Fig S4: Insets are not mentioned in the figure legend. Please add scale bar to zoom-ins

      We now describe the insets in the figure legends and have added scale bars to the zoomed-in images.

      - Fig S5A, B: Please indicate which inset belongs to which sub-panel. Where does Ac stem from?

      We have now included the full image showing the inset (new Fig. S8).

      - Fig S5C and S8C: Change "DNA" to "Nucleus".

      We have changed “DNA” to “Nucleus”. Now they are Fig. S8K and S11I.

      Reviewer #3 (Significance):

      Yet, the statement that SUN1 is also important for lipid homoeostasis and NE dynamics is currently not backed up by sufficient data. I believe that the manuscript would benefit from removing the less convincing transcriptomic and lipidomic datasets and rather focus on more deeply characterising the cell biology of the knockouts. This way, the results would be interesting not only for parasitologists, but also for more general cell biologists.

      We have moved the lipidomics and transcriptomics data to supplementary information and toned down the emphasis on these data to make the manuscript more focused on the cell biology and analysis of the genetic KO data.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of taste cells of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi" (in terms of human taste); these kokumi stimuli appear to enhance other canonical tastes, increasing what are essentially hedonic attributes of other stimuli. The mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model, and comes to a similar conclusion, albeit with some small differences between the two rodent species.

      Strengths:

      The data show effects of ornithine on taste/intake in laboratory rats: In two-bottle and briefer intake tests, adding ornithine results in higher intake of most, but not all, stimuli tested. Bilateral chorda tympani (CT) nerve cuts or the addition of GPRC6A antagonists decreased or eliminated these effects. Ornithine also evoked responses by itself in the CT nerve, but mainly at higher concentrations; at lower concentrations it potentiated the response to monosodium glutamate. Finally, immunocytochemistry of taste cell expression indicated that GPRC6A was expressed predominantly in the anterior tongue, and co-localized (to a small extent) with only IP3R3, indicative of expression in a subset of type II taste receptor cells.

      Weaknesses:

      As the authors are aware, it is difficult to assess a complex human taste with complex attributes, such as kokumi, in an animal model. In these experiments they attempt to uncover mechanistic insights about how ornithine potentiates other stimuli by using a variety of established experimental approaches in rats. They partially succeed by finding evidence that GPRC6A may mediate effects of ornithine when it is used at lower concentrations. In the revision they have scaled back their interpretations accordingly. A supplementary experiment measuring certain aspects of the effects of ornithine added to Miso soup in human subjects is included for the express purpose of establishing that the kokumi sensation of a complex solution is enhanced by ornithine; however, they do not use any such complex solutions in the rat studies. Moreover, the sample size of the human experiment is (still) small - it really doesn't belong in the same manuscript with the rat studies.

      Despite the reviewer’s suggestion, we would like to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, which is then followed by basic animal experiments to investigate the underlying mechanisms of kokumi in humans.

      We did not present the additive effects of ornithine on miso soup in the present rat study because our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26) already confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred to plain miso soup by mice.

      Furthermore, we believe that our sample size (n = 22) is comparable to those employed in other studies. For example, the representative kokumi studies by Ohsu et al. (Ref. #9), Ueda et al. (Ref. #10), Shibata et al. (Ref. #20), Dunkel et al. (Ref. #37), and Yang et al. (Ref. #44) used sample sizes of 20, 19, 17, 9, and 15, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors provide compelling evidence that ornithine enhances the palatability of several chemical stimuli (i.e., IMP, MSG, MPG, Intralipos, sucrose, NaCl, quinine). Ornithine also increases CT nerve responses to MSG. Additionally, the authors provide evidence that the effects of ornithine are mediated by GPRC6A, a G-protein-coupled receptor family C group 6 subtype A, and that this receptor is expressed primarily in fungiform taste buds. Taken together, these results indicate that ornithine enhances the palatability of multiple taste stimuli in rats and that the enhancement is mediated, at least in part, within fungiform taste buds. This is an important finding that could stand on its own. The question of whether ornithine produces these effects by eliciting kokumi-like perceptions (see below) should be presented as speculation in the Discussion section.

      Weaknesses:

      I am still unconvinced that the measurements in rats reflect the "kokumi" taste percept described in humans. The authors conducted long-term preference tests, 10-min avidity tests and whole chorda tympani (CT) nerve recordings. None of these procedures specifically model features of "kokumi" perception in humans, which (according to the authors) include increasing "intensity of whole complex tastes (rich flavor with complex tastes), mouthfulness (spread of taste and flavor throughout the oral cavity), and persistence of taste (lingering flavor)." While it may be possible to develop behavioral assays in rats (or mice) that effectively model kokumi taste perception in humans, the authors have not made any effort to do so. As a result, I do not think that the rat data provide support for the main conclusion of the study--that "ornithine is a kokumi substance and GPRC6A is a novel kokumi receptor."

      Kokumi can be assessed in humans, as demonstrated by the enhanced kokumi perception observed when miso soup is supplemented with ornithine (Fig. S1). Currently, we do not have a method to measure the same kokumi perception in animals. However, in the two-bottle preference test, our previous companion paper (Fig. 1B in Mizuta et al. 2021, Ref. #26) confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred over plain miso soup by mice.

      Of the three attributes of kokumi perception in humans, the “intensity of whole complex tastes (rich flavor with complex tastes)” was partly demonstrated in the present rat study. In contrast, “mouthfulness (the spread of taste and flavor throughout the oral cavity)” could not be directly detected in animals and had to be inferred in the Discussion. “Persistence of taste (lingering flavor)” was evident at least in the chorda tympani responses; however, because the tongue was rinsed 30 seconds after the onset of stimulation, the duration of the response was not fully recorded.

      It is well accepted in sensory physiology that the stronger the stimulus, the larger the tonic response—and consequently, the longer it takes for the response to return to baseline. For example, Kawasaki et al. (2016, Ref. #45) clearly showed that the duration of sensation increased proportionally with the concentration of MSG, lactic acid, and NaCl in human sensory tests. The essence of this explanation has been incorporated into the Discussion (p. 12).

      Why are the authors hypothesizing that the primary impacts of ornithine are on the peripheral taste system? While the CT recordings provide support for peripheral taste enhancement, they do not rule out the possibility of additional central enhancement. Indeed, based on the definition of human kokumi described above, it is likely that the effects of kokumi stimuli in humans are mediated at least in part by the central flavor system.

      We agree with the reviewer’s comment. Our CT recordings indicate that the effects of kokumi stimuli on taste enhancement occur primarily at the peripheral taste organs. The resulting sensory signals are then transmitted to the brain, where they are processed by the central gustatory and flavor systems, ultimately giving rise to kokumi attributes. This central involvement in kokumi perception is discussed on page 12. Although kokumi substances exert their effects at low concentrations—levels at which the substance itself (e.g., ornithine) does not become more favorable or (in the case of γ-Glu-Val-Gly) exhibits no distinct taste—we cannot rule out the possibility that even faint taste signals from these substances are transmitted to the brain and interact with other taste modalities.

      The authors include (in the supplemental data section) a pilot study that examined the impact of ornithine on a variety of subjective measures of flavor perception in humans. The presence of this pilot study within the larger rat study does not really make sense. While I agree with the authors that there is value in conducting parallel tests in both humans and rodents, I think that this can only be done effectively when the measurements in both species are the same. For this reason, I recommend that the human data be published in a separate article.

      Despite the reviewer’s suggestion, we intend to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, and then follow up with basic animal experiments to investigate the potential underlying mechanisms of kokumi in humans.

      In our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26), we confirmed with statistical significance (P < 0.001) that mice preferred miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) over plain miso soup. However, as explained in our response to Reviewer #2’s first concern (in the Public review), it is difficult to measure two of the three kokumi attributes—aside from the “intensity of whole complex tastes (rich flavor with complex tastes)”—in animal models.

      The authors indicated on several occasions (e.g., see Abstract) that ornithine produced "synergistic" effects on the CT nerve response to chemical stimuli. "Synergy" is used to describe a situation where two stimuli produce an effect that is greater than the sum of the response to each stimulus alone (i.e., 2 + 2 = 5). As far as I can tell, the CT recordings in Fig. 3 do not reflect a synergism.

      We appreciate your comments regarding the definition of synergy. In Fig. 5 (not Fig. 3), please note the difference in the scaling of the ordinate between Fig. 5D (ornithine responses) and Fig. 5E (MSG responses). When both responses are presented on the same scale, it becomes evident that the response to 1 mM ornithine is negligibly small compared to the MSG response, which clearly indicates that the response to the mixture of MSG and 1 mM ornithine exceeds the sum of the individual responses to MSG and 1 mM ornithine. Therefore, we have described the effect as “synergistic” rather than “additive.” The same observation applies to the mouse experiments in our previous companion paper (Fig. 8 in Mizuta et al. 2021, Ref. #26), where synergistic effects are similarly demonstrated by graphical representation. We have also added the following sentence to the legend of Fig. 5:

      “Note the different scaling of the ordinate in (D) and (E).”
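
      The criterion discussed in this exchange (a mixture response that exceeds the sum of the responses to each component alone) can be stated as a small check. The Python sketch below is illustrative only: the function name `classify_mixture`, the `tol` parameter, and any magnitudes passed to it are assumptions introduced here, not values measured in the study.

```python
def classify_mixture(r_a, r_b, r_mix, tol=0.0):
    """Compare a mixture response with the sum of its component responses.

    r_a, r_b : responses to each stimulus alone (same units, same scale)
    r_mix    : response to the mixture of both stimuli
    tol      : hypothetical tolerance for measurement noise (an assumption)
    """
    expected_sum = r_a + r_b
    if r_mix > expected_sum + tol:
        return "synergistic"      # greater than the sum of the parts
    if r_mix < expected_sum - tol:
        return "sub-additive"     # less than the sum of the parts
    return "additive"             # equal to the sum, within tolerance
```

      With the reviewer's own illustration, `classify_mixture(2, 2, 5)` returns "synergistic", whereas `classify_mixture(2, 2, 4)` returns "additive". The comparison is only meaningful when all three responses are expressed on the same scale, which is the point of the added legend sentence.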

      Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste. The researchers confirmed in rats their previous work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      Strengths:

      The overall experimental design is solid based on two bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants including: inosine 5'-monophosphate; monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl; salt); citric acid (sour) and quinine hydrochloride (bitter). Robust effects of ornithine were observed in the cases of IMP, MSG, MPG and sucrose; and little or no effects were observed in the cases of sodium chloride, citric acid and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. Inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify a role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half-two thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally they performed immunohistochemistry on sections of rat tongue demonstrating C6A positive spindle-shaped cells in fungiform papillae that partially overlapped in its distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). These alternatives are appropriately discussed and, taken together, the experimental results favor the authors' interpretation that C6A mediates the Ornithine responses. The authors provide preliminary data in Suppl. 3 for the possibility of co-expression of C6A with the CaSR.

      Weaknesses:

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9).

      Ornithine and umami substances interact to produce synergistic effects in both directions—ornithine enhances responses to umami substances, and vice versa. These effects may depend on the concentrations used, as described in the Discussion (pp. 9–10). Further studies are required to clarify the precise nature of this interaction.

      One issue that is not addressed, and could be usefully addressed in the Discussion, relates to the potential effects of kokumi substances on the threshold concentrations of key tastants such as glutamate. Thus, an extension of taste distribution to additional areas of the mouth (previously referred to as 'mouthfulness') and persistence of taste/flavor responses (previously referred to as 'continuity') could arise from a reduction in the threshold concentrations of umami and other substances that evoke taste responses.

      Thank you for this important suggestion. If ornithine reduces the threshold concentrations of tastants—including glutamate—and enhances their suprathreshold responses, then adding ornithine may activate additional taste cells. This effect could explain kokumi attributes such as an “extension of taste distribution” and possibly the “persistence of responses.” As shown in Fig. 2, the lowest concentrations used for each taste stimulus are near or below the thresholds, which indicates that threshold concentrations are reduced—especially for MSG and MPG. We have incorporated this possibility into the Discussion as follows (p.12):

      “Kokumi substances may reduce the threshold concentrations as well as they increase the suprathreshold responses of tastants. Once the threshold concentrations are lowered, additional taste cells in the oral cavity become activated, and this information is transmitted to the brain. As a result, the brain perceives this input as coming from a wider area of the mouth.”

      The status of one of the compounds used as an inhibitor of C6A, the gallate derivative EGCG, as a potential inhibitor of the CaSR or T1R1/T1R3 is unknown. It would have been helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response.

      Thank you for this important comment. We attempted to identify a specific inhibitor of CaSR. Although we considered using NPS-2143—a commonly used CaSR inhibitor—it is known to also inhibit GPRC6A. We agree that using a specific CaSR inhibitor would be beneficial and plan to pursue this in future studies.

      It would have been helpful to include a positive control kokumi substance in the two bottle preference experiment (e.g., one of the known gamma glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      We agree with this comment. In retrospect, it may have been advantageous to directly compare the potencies of CaSR and GPRC6A agonists in enhancing taste preferences—and to evaluate the sensitivity of these preferences to CaSR and GPRC6A antagonists. However, we did not include γ-Glu-Val-Gly in the present study because we have already reported its supplementation effects on the ingestion of basic taste solutions in rats using the same methodology in a separate paper (Yamamoto and Mizuta, 2022, Ref. #25). The results from both studies are compared in the Discussion (p. 11).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      I am not convinced by the Author's arguments for including the human data. I appreciate their efforts in adding a few (5) subjects and improving the description, but it still feels like it is shoehorned into this paper, and would be better published as a different manuscript.

      This human study is short, but it is complete rather than preliminary. The rationale for us to include the human data as supplementary information is shown in responses to the reviewer’s Public review.

      Minor concerns:

      Page 3 paragraph 1: Suggest "contributing to palatability".

      Thank you for this suggestion. We have rewritten the text as follows:

      “…, the brain further processes these sensations to evoke emotional responses, contributing to palatability or unpleasantness”.

      Page 4 paragraph 2: The text still assumes that "kokumi" is a meaningful descriptor for what rodents experience. Re-wording the following sentence like this could help:

      "Neuroscientific studies in mice and rats provide evidence that glutathione and y-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi taste as experienced by humans. However, to our..."

      Or something similar.

      Thank you for this suggestion. We have rewritten the sentence according to your suggestion as follows:

      "Neuroscientific studies (23,25,30) in mice and rats provide evidence that glutathione and y-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi as experienced by humans”.

      Page 7 paragraph 1 - put the concentrations of Calindol and EGCG used (in the physiology exps) in the text.

      We have added the concentrations: “300 µM calindol and 100 µM EGCG”.

      Reviewer #2 (Recommendations for the authors):

      I have included all of my recommendations in the public review section.

      Reviewer #3 (Recommendations for the authors):

      Although the definitions of 'thickness', 'mouthfulness' and 'continuity' have been revised very helpfully in the Introduction, 'mouthfulness' reappears at other points in the MS e.g., Page 4, Results, Line 3; Page 9, Line 3. It is best replaced by the new definition in these other locations too.

      We wish to clarify that our revised text stated, “…to clarify that kokumi attributes are inherently gustatory, in the present study we use the terms ‘intensity of whole complex tastes (rich flavor with complex tastes)’ instead of ‘thickness,’ ‘mouthfulness (spread of taste and flavor throughout the oral cavity)’ instead of ‘mouthfulness,’ and ‘persistence of taste (lingering flavor)’ instead of ‘continuity.’” The term “mouthfulness” was retained in our text, though we provided a more specific explanation. In the re-revised version, we have added “(spread of taste in the oral cavity)” immediately after “mouthfulness.”

      I doubt that many scientific readers will be familiar with the term 'intragemmal nerve fibres' (Page 8, Line 4). It is used appropriately but it would be helpful to briefly define/explain it.

      We have added an explanation as follows:

      “… intragemmal nerve fibers, which are nerve processes that extend directly into the structure of the taste bud to transmit taste signals from taste cells to the brain.”

      I previously pointed out the overlap between the CaSR's amino acid (AA) and gamma-glutamyl-peptide binding site. I was surprised by the authors' response which appeared to miss the point being made. It was based on the impacts of selected mutations in the receptor's Venus FlyTrap domain (Broadhead JBC 2011) on the responses to AAs and glutathione analogs. The significantly more active analog, S-methylglutathione is of additional interest because, like glutathione itself, it is present in mammalian body fluids. My apologies to the authors for not more carefully explaining this point.

      Thank you for this comment. Both CaSR and GPRC6A are recognized as broad-spectrum amino acid sensors; however, their agonist profiles differ. Aromatic amino acids preferentially activate CaSR, whereas basic amino acids tend to activate GPRC6A. For instance, among basic amino acids, ornithine is a potent and specific activator of GPRC6A, while γ-Glu-Val-Gly in addition to amino acids is a high-potency activator of CaSR. It remains unclear how effectively ornithine activates CaSR and whether γ-glutamyl peptides also activate GPRC6A. These questions should be addressed in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumour microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is in being descriptive, non-mechanistic, and requiring experimental validation.

      We sincerely thank the editors for their valuable and constructive feedback. We are grateful for the recognition of our findings and the importance of identifying transcriptional components in high-grade serous ovarian cancers.

      We acknowledge the editors’ observation regarding the descriptive nature of our study and its limited mechanistic depth. We agree that additional experimental validation would further strengthen our conclusions. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study. In addition, recent reviews focused on the emerging field of cancer neuroscience emphasize the early stage the field is in, specifically in terms of a mechanistic understanding of the contributions of tumor-infiltrating nerves in tumor initiation and progression (Amit et al., 2024; Hwang et al., 2024). Nonetheless, we wish to emphasize that emerging mechanistic preclinical studies have demonstrated the influence of tumour-infiltrating nerves on disease progression (Allen et al., 2018; Balood et al., 2022; Darragh et al., 2024; Globig et al., 2023; Jin et al., 2022; Restaino et al., 2023; Zahalka et al., 2017). Several of these studies include contributions from our co-authors and feature in vitro and in vivo research on head and neck squamous cell carcinoma as well as high-grade serous ovarian carcinoma samples. This study further strengthens the preclinical work by showing, in patient data, the potential relevance of neuronal signaling to disease outcome.

      For instance, Restaino et al. (2023) demonstrated that substance P, released from tumour-infiltrating nociceptors, potentiates MAP kinase signaling in cancer cells, thereby driving disease progression. Crucially, this effect was shown to be reversible in vivo by blocking the substance P receptor (Restaino et al., 2023). These findings offer compelling evidence of the role of tumour innervation in cancer biology.

      Our current study in tumor samples of patients with high-grade serous ovarian cancer identifies a transcriptional component that is enriched for genes for which the protein is located in the synapse. We believe that the previously published mechanistic insights support our findings and suggest that this transcriptional component could serve as a valuable screening tool to identify innervated tumours based on bulk transcriptomes. Clinically, this information is highly relevant, as patients with innervated tumours may benefit from alternate therapeutic strategies targeting these innervations.

      Reviewer #1 (Public review)

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumour microenvironment. Given notable weaknesses like lack of validation cohort or validation using another platform (other than the 11 samples with ST), the data is considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      We thank the reviewer for recognizing the strengths and novelty of our study. We appreciate the positive feedback on using consensus-independent component analysis (c-ICA) to decompose bulk transcriptomes, which allowed us to detect subtle transcriptional signals often overlooked in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      We thank the reviewer for recognizing the robustness of our study through comprehensive data integration. We appreciate the acknowledgment of our efforts to leverage a large, multi-source dataset, as well as the additional insights gained from spatially resolved transcriptomes. We believe that this integrative approach enhances the depth of our analysis and contributes to a more nuanced understanding of the tumour microenvironment.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumour microenvironment in cancer progression.

      We appreciate the recognition of the clinical implications of our findings. The identification of a synaptic signaling-related transcriptional component associated with poor prognosis underscores the potential for novel therapeutic targets within the tumour microenvironment. We agree that this insight could open new avenues for intervention and further highlights the role of neuronal interactions in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      We acknowledge the point regarding the limited mechanistic insights provided in our study. We agree that further experimental validation would significantly enhance our understanding of how the biological processes captured by these transcriptional components influence cancer progression. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Our analyses were performed on publicly available bulk and spatially resolved expression profiles. To investigate the mechanistic insights in future studies, we plan to integrate spatial transcriptomic data with immunohistochemical analysis of the same tumour samples to validate our findings. Additionally, we have initiated efforts to set up in vitro co-cultures of neurons and ovarian cancer cells. These co-cultures will enable us to investigate how synaptic signaling impacts ovarian cancer cell behavior.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      To respond to this remark, we utilized survival data from Bolton et al. (2022) and TCGA to investigate associations between TC activity scores and overall survival of patients with ovarian clear cell carcinoma, the second most common subtype of epithelial ovarian cancer, and other cancer types, respectively. However, we acknowledge the limitations of TCGA survival data, as highlighted in the referenced article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8726696/). Additionally, as shown in Figure 5, we provided evidence of TC121 activity across various cancer types, suggesting broader relevance. For the results of the analyses mentioned above, please refer to our response to remark 1.3 of the recommendation section (page 4).

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      We acknowledge the value of validating our results with alternative platforms such as IHC. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Regarding data quality control, we implemented the following measures to ensure the reliability of our analysis:

      Bulk Transcriptional Profiles: To assess data quality, we conducted principal component analysis (PCA) on the sample Pearson product-moment correlation matrix. The first principal component (PCqc), which explains approximately 80-90% of the variance, was used to distinguish technical variability from biological signals (Bhattacharya et al., 2020). Samples with a correlation coefficient below 0.8 relative to PCqc were identified as outliers and excluded. Additionally, MD5 hash values were generated for each CEL file to identify and remove duplicate samples. Expression values were standardized to a mean of zero and a variance of one for each gene to minimize probeset- or gene-specific variability across datasets (GEO, CCLE, GDSC, and TCGA).

      Spatial Transcriptional Profiles: PCA was also applied to spatial transcriptomic data for quality control. Only samples with consistent loading factor signs for the first principal component across all individual spot profiles were retained. Samples failing this criterion were excluded from further analyses.
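
      As an illustration only (not the code used in the study), the quality-control steps above can be sketched in plain Python. The function names, the toy data layout, and the exact way a sample is correlated with PCqc are assumptions made for this sketch; only the 0.8 threshold, the MD5-based duplicate check, the per-gene standardization, and the sign-consistency criterion come from the description above.

```python
import hashlib
import math


def zscore(values):
    """Standardize a profile to mean zero and variance one."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def correlation(x, y):
    """Pearson correlation computed from standardized profiles."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def first_eigenvector(matrix, iterations=200):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pcqc_correlations(samples):
    """Correlate each sample with the first PC of the sample correlation matrix.

    Assumption: PCqc is taken as the loading-weighted sum of the
    standardized sample profiles.
    """
    z = [zscore(s) for s in samples]
    n = len(z)
    corr = [[correlation(z[i], z[j]) for j in range(n)] for i in range(n)]
    loadings = first_eigenvector(corr)
    pcqc = [sum(loadings[j] * z[j][g] for j in range(n)) for g in range(len(z[0]))]
    return [correlation(s, pcqc) for s in z]


def find_duplicates(files):
    """Return (duplicate, first_seen) name pairs for files with identical MD5 digests."""
    seen, duplicates = {}, []
    for name, data in files.items():
        digest = hashlib.md5(data).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates


def consistent_signs(loadings):
    """True if all first-PC loadings share one sign (spatial QC criterion)."""
    return all(x > 0 for x in loadings) or all(x < 0 for x in loadings)
```

      In this sketch, a sample would be excluded when its value returned by pcqc_correlations falls below 0.8, and zscore would be applied to each gene across samples before the datasets are combined.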

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Probably given the lack of some QA/QC procedures it'll be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

      Regarding clinical applications, we acknowledge the importance of further exploring strategies targeting synaptic signaling and neurotransmitter release in the tumour microenvironment (TME). As partially discussed in the first version of the manuscript, drugs such as ifenprodil and lamotrigine—commonly used to treat neuronal disorders—can block glutamate release, thereby inhibiting subsequent synaptic signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine blocks the formation of synaptic vesicles (Reid et al., 2013; Williams et al., 2001). Previous in vitro studies with HGSOC cell lines demonstrated that ifenprodil significantly reduced cancer cell proliferation, while reserpine triggered apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). The findings highlight the potential of such approaches to disrupt synaptic neurotransmission in the TME.

      To address potential translation of our findings into clinical practice more comprehensively, we have included additional details in the manuscript:

      Section discussion, page 16, lines 338-341:

      “This interaction can be targeted with pan-TRK inhibitors such as entrectinib and larotrectinib. Both drugs are showing promising results in multiple phase II trials, including ovarian cancer and breast cancer patients. Furthermore, a TRKB-specific inhibitor was developed (ANA-12), but has not been subjected to any clinical trials in cancer so far (Ardini et al., 2016; Burris et al., 2015; Drilon et al., 2018, 2017).”

      On page 17, lines 361-374:

      “Strategies to disrupt neuronal signaling and neurotransmitter release in neurons target key elements of excitatory neurotransmission, such as calcium flux and vesicle formation. Drugs like ifenprodil and lamotrigine, commonly used to treat neuronal disorders, block glutamate release and subsequent neuronal signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine prevents synaptic vesicle formation (Reid et al., 2013; Williams, 2001). In vitro studies with HGSOC cell lines have demonstrated that ifenprodil significantly inhibits tumour proliferation, while reserpine induces apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). These approaches hold promise for inhibiting neuronal signaling and interactions in the TME.”

      Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      (1) Overall, this study describes a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      We thank the reviewer for acknowledging the strength of our data-driven approach and the use of consensus-independent component analysis (c-ICA) to identify transcriptional components within HGSOC microarray data. We aimed to provide comprehensive methodological detail and supplementary documentation to support the reproducibility and robustness of our findings. We believe this approach allows for the identification of subtle transcriptional signals that might have been overlooked by traditional analysis methods.

      (2) The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      We appreciate the positive feedback on the biological interpretation of our transcriptional components. We are pleased that our approach, which includes data-driven permutation testing and analyses of associations with copy-number alterations, gene sets, and prognostic outcomes, was found to be convincing. These analyses were integral to enhancing our findings’ robustness and biological relevance.

      (3) The resulting annotated transcriptional components have been made available in a searchable online format.

      Thank you for this important positive remark.

      (4) For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Thank you for acknowledging the accessibility of our annotated transcriptional components. We prioritized making these data available in a searchable online format to facilitate further research and enable the community to explore and validate our findings.

      Weaknesses:

      (1) This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g. Smith et al Nat Comms 2023; TCGA Nature 2011; Engqvist et al Sci Rep 2020) which would enable a further assessment of the additional contribution of c-ICA - whether the cICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      We acknowledge the reviewer’s insightful suggestion to compare our c-ICA-derived transcriptional components with previously reported ovarian cancer classifications, such as those from Smith et al. (2023), TCGA (2011), and Engqvist et al. (2020). To address this, we incorporated analyses comparing the activity scores of our transcriptional components with these published landscapes and classifications, particularly focusing on any associations with overall survival. Additionally, we evaluated correlations between gene signatures from a subset of these studies and our identified TCs, enhancing our understanding of the unique contributions of the c-ICA approach. Please refer to our response to remark 10 for the results of these analyses.

      (2) Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics due to the presence of cells from tumour cells and the tumour microenvironment.

      However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

      We acknowledge that c-ICA, unlike traditional deconvolution methods, is not specifically designed for cell-type deconvolution and does not rely on a predefined cell signature matrix. While we explored the transcriptional components in the context of tumour and microenvironmental interactions, we agree that these components may not correspond directly to distinct cell types but rather reflect complex patterns of dysregulation, potentially within individual cell populations.

      Our goal with c-ICA was to uncover hidden transcriptional patterns possibly influenced by cellular heterogeneity. However, we recognize these patterns may also arise from regulatory processes within a single cell type. To investigate further, we used single-cell transcriptional data (~60,000 cell-type-annotated profiles from GSE158722) and projected our transcriptional components onto these profiles to obtain activity scores. After removing the first principal component to minimize background effects, this allowed us to assess each TC’s behavior across diverse cellular contexts. Please refer to our response to remark 2.2 in the recommendations to the authors (page 14) for the results of this analysis.

      References

      Allen JK, Armaiz-Pena GN, Nagaraja AS, Sadaoui NC, Ortiz T, Dood R, Ozcan M, Herder DM, Haemerrle M, Gharpure KM, Rupaimoole R, Previs R, Wu SY, Pradeep S, Xu X, Han HD, Zand B, Dalton HJ, Taylor M, Hu W, Bottsford-Miller J, Moreno-Smith M, Kang Y, Mangala LS, Rodriguez-Aguayo C, Sehgal V, Spaeth EL, Ram PT, Wong ST, Marini FC, Lopez-Berestein G, Cole SW, Lutgendorf SK, diBiasi M, Sood AK. 2018. Sustained adrenergic signaling promotes intratumoral innervation through BDNF induction. Cancer Res 78 (12):3233-3242.

      Ardini E, Menichincheri M, Banfi P, Bosotti R, Ponti CD, Pulci R, Ballinari D, Ciomei M, Texido G, Degrassi A, Avanzi N, Amboldi N, Saccardo MB, Casero D, Orsini P, Bandiera T, Mologni L, Anderson D, Wei G, Harris J, Vernier J-M, Li G, Felder E, Donati D, Isacchi A, Pesenti E, Magnaghi P, Galvani A. 2016. Entrectinib, a Pan–TRK, ROS1, and ALK Inhibitor with activity in multiple molecularly defined cancer Indications. Mol Cancer Ther 15:628–639.

      Balood M, Ahmadi M, Eichwald T, Ahmadi A, Majdoubi A, Roversi Karine, Roversi Katiane, Lucido CT, Restaino AC, Huang S, Ji L, Huang K-C, Semerena E, Thomas SC, Trevino AE, Merrison H, Parrin A, Doyle B, Vermeer DW, Spanos WC, Williamson CS, Seehus CR, Foster SL, Dai H, Shu CJ, Rangachari M, Thibodeau J, Rincon SVD, Drapkin R, Rafei M, Ghasemlou N, Vermeer PD, Woolf CJ, Talbot S. 2022. Nociceptor neurons affect cancer immunosurveillance. Nature 611:405–412.

      Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, Vries EGE de, Vugt MATM van, Fehrmann RSN. 2020. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun 11:715.

      Burris HA, Shaw AT, Bauer TM, Farago AF, Doebele RC, Smith S, Nanda N, Cruickshank S, Low JA, Brose MS. 2015. Abstract 4529: Pharmacokinetics (PK) of LOXO-101 during the first-in-human Phase I study in patients with advanced solid tumors: Interim update. Cancer Res 75:4529–4529.

    1. Author response:

      We thank the reviewers for their evaluation, for helpful suggestions to improve clarity and accuracy, and for their positive reception of the manuscript. We will incorporate their suggestions in a revised manuscript. Here, we respond to their major comments. 

      The reviewers suggest that a molecular study of Hofstenia’s reproductive systems would be beneficial, as would mechanistic explanations for its unusual reproductive behavior. We agree with the reviewers that both of these would be interesting avenues, although we think this is outside the scope of this current manuscript. This manuscript studies growth and reproductive dynamics in acoels, and establishes a foundation to study its underlying molecular, developmental, and physiological machinery. 

      Our previous molecular work, using scRNAseq and FISH, identified several germline markers. Here, we show that two of them are specific markers of testes and ovaries, respectively. This, together with our new anatomical data, allows us to identify the expression domains of most of the other markers more clearly. Some markers may be expressed in a presumptive common germline that eventually splits into an anterior male germline and a posterior female germline. We agree with the reviewers that understanding the dynamics of germline differentiation and its molecular genetic underpinnings would be very interesting, and we hope to address this in future work. 

      As the reviewers note, we do not understand how sperm is stored, how the worm’s own sperm can travel to its ovaries to enable selfing, or how eggs in the ovaries travel within the body. We agree with the reviewers that understanding these processes would be very interesting. Our histological and molecular work so far has been unable to find tube-like structures or other cavities for storage and transport. Potentially, cells could move within the parenchyma. Explaining these events will require substantial effort (including mechanistic studies of cell behavior and ultrastructural studies that the reviewers suggest), and we hope to do this in future work. 

      We agree with Reviewer 1 that it is interesting that Piwi-1 expression is only observed in the ovaries and not in the testes, which is unusual given its broad germline expression in many taxa. There are several possible explanations for this finding (e.g., Piwi-1 could be expressed at low levels in the male germline, other Piwi proteins could be expressed in the male germline, or Piwi may play roles in male germline progenitors that are not co-located with maturing sperm). We do not currently know which of these applies, and we will discuss these possibilities in our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the role of a novel gene Aff3ir-ORF2 in flow-induced atherosclerosis. They show that the gene is anti-inflammatory in nature. It inhibits the IRF5-mediated athero-progression by inhibiting the causal factor (IRF5). Furthermore, the authors show a significant connection between shear stress and Aff3ir-ORF2 and its connection to IRF5 mediated athero-progression in different established mice models which further validates the ex vivo findings.

      Strengths:

      (1) An adequate number of replicates were used for this study.

      (2) Both in vitro and in vivo validation was done.

      (3) The figures are well presented.

      (4) In vivo causality is checked with cleverly designed experiments.

      We thank you for your positive remarks.

      Weaknesses:

      (1) Inflammatory proteins must be measured with standard methods e.g ELISA as mRNA level and protein level does not always correlate.

      Thanks. We have followed your advice and performed ELISA experiments to measure the concentrations of inflammatory cytokines, including IL-6 and IL-1β. The newly acquired results have been included in Figure 2E (Lines 160-163) in the revised manuscript.

      (2) RNA seq analysis has to be done very carefully. How does the euclidean distance correlate with the differential expression of genes. Do they represent the neighborhood?

      If they do how does this correlation affect the conclusion of the paper?

      We thank the reviewer for these professional comments and apologize for the confusion. The heatmap using Euclidean distance was generated based on the expression levels of all differentially expressed genes (calculated with DESeq2). Since its interpretation overlaps with the volcano plot presented in Figure 4B, we have moved the heatmap to Figure S5A in the revised manuscript and provided a detailed description in the figure legend (Lines 106-108 in the supporting information). Additionally, to better illustrate the variation among all samples, we have performed PCA and included the new results in Figure 4A of the revised manuscript.

      (3) The volcano plot does not indicate the q value of the shown genes. It is advisable to calculate the q value for each of the genes which represents the FDR probability of the identified genes.

      Thank you for your careful review. We apologize for the incorrect labeling: the value shown was the adjusted P value (P.adj). The label for Figure 4B has been corrected in the revised manuscript.
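
      For readers less familiar with adjusted P values, the following is a minimal sketch of the Benjamini-Hochberg procedure, which DESeq2 applies by default when reporting adjusted P values. It is given for illustration only; the function name and the example P values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four genes.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      Each adjusted value estimates the false discovery rate that would result from calling that gene, and every gene with a smaller P value, significant.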

      (4) GO enrichment was done against the Global gene set or a local geneset? The authors should provide more detailed information about the analysis.

      Thank you. We performed GO enrichment analysis against the global gene set. The description of the results has been updated in the revised manuscript (Lines 222–224).

      (5) If the analysis was performed against a global gene set. How does that connect with this specific atherosclerotic microenvironment?

      Thank you for your insightful comments. We have followed your advice and investigated the functional characteristics of these differentially expressed genes in the context of the atherosclerotic microenvironment. The RNA-seq differential gene list was further mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), resulting in 363 overlapping genes. The 363 genes were subjected to bioinformatics enrichment analysis using Gene Ontology (GO) databases. GO analysis of these genes revealed enrichment in processes related to cell−cell adhesion and leukocyte activation involved in immune response (Figure S5B), which is highly consistent with the observed effects of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B and the description of the results is included in the revised manuscript (Lines 227-233).

      (6) What was the basal expression of genes and how did the DGE (differential gene expression) values differ?

      Thanks for the comments. The RNA-sequencing data has been submitted to GEO datasets (GSE286206), making the basal gene expression data available to readers.

      The differential expression analysis was performed using DESeq2 (v1.4.5) (PMID: 25516281) with a criterion of 1.5-fold change and P<0.05. We have included the description in the revised manuscript (Lines 220-222 and Lines 575-576).
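
      The selection criterion can be written out as a short sketch. This is illustrative only: the tuple layout, the gene names, and the choice to apply the fold-change cut-off symmetrically on the log2 scale are assumptions, and the P value used may be either the raw or the adjusted one depending on the analysis.

```python
import math

# A 1.5-fold change corresponds to |log2 fold change| >= log2(1.5), about 0.585.
LOG2_FC_CUTOFF = math.log2(1.5)


def select_de_genes(results, p_cutoff=0.05):
    """Keep genes passing the fold-change and P value thresholds.

    results: iterable of (gene, log2_fold_change, p_value) tuples.
    """
    return [
        gene
        for gene, log2_fc, p_value in results
        if abs(log2_fc) >= LOG2_FC_CUTOFF and p_value < p_cutoff
    ]


# Hypothetical results table.
toy_results = [
    ("GeneA", 1.0, 0.01),   # passes both thresholds
    ("GeneB", 0.3, 0.001),  # fold change too small
    ("GeneC", -0.8, 0.04),  # passes (downregulated)
    ("GeneD", 2.0, 0.2),    # not significant
]
selected = select_de_genes(toy_results)
```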

      (7) How was IRF5 picked from GO analysis? was it within the 20 most significant genes?

      Sorry for the confusion. IRF5 was not identified through GO analysis. To determine the upstream transcriptional regulators, we used the ChEA3 database to predict potential upstream transcription factors based on all differentially expressed genes. The top 20 transcription factors were selected based on their scores. To further explore their relationship with atherosclerosis, these top 20 transcription factors were mapped to the atherosclerosis-related gene list in the DisGeNET database. IRF5 and IRF8 were the only two overlapping genes. To clarify this process, we have included a more detailed description of the IRF prediction approach in the revised manuscript (Lines 234–239).
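
      The prioritization logic described above amounts to ranking, truncating, and intersecting. The sketch below is illustrative only and does not call the ChEA3 or DisGeNET services; the function name, the toy scores, the placeholder factors TFX and TFY, and the assumption that a lower score means a better rank are all hypothetical.

```python
def prioritize_regulators(tf_scores, disease_genes, top_n=20, lower_is_better=True):
    """Rank transcription factors by score, keep the top_n, and return
    those that also appear in a disease-associated gene set."""
    ranked = sorted(tf_scores, key=tf_scores.get, reverse=not lower_is_better)
    top = ranked[:top_n]
    return [tf for tf in top if tf in disease_genes]


# Toy input: scores for four factors and a small disease gene set.
toy_scores = {"IRF5": 3.0, "IRF8": 5.0, "TFX": 1.0, "TFY": 40.0}
toy_disease_genes = {"IRF5", "IRF8", "TFY"}
candidates = prioritize_regulators(toy_scores, toy_disease_genes, top_n=3)
```

      With these toy inputs, TFY is dropped because it falls outside the top three, and TFX is dropped because it is absent from the disease gene set.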

      (8) Microscopic studies should be done more carefully? There seems to be a global expression present on the vascular wall for Aff3ir-ORF2 and the expression seems to be similar to AFF3 in Figure 1.

      We thank the reviewer for the valuable suggestion. We have followed your advice and provided more representative images in Figure 1F.

      Reviewer #2 (Public review):

      Summary:

      The authors recently uncovered a novel nested gene, Aff3ir, and this work sets out to study its function in endothelial cells further. Based on differences in expression correlating with areas of altered shear stress, they investigate a role for the isoform Aff3ir-ORF2 in endothelial activation and development of atherosclerosis downstream of disturbed shear stress. Using a knockout mouse model and in vivo overexpression experiments, they demonstrate a strong potential for Aff3ir-ORF2 to alleviate atherosclerosis. They find that Aff3ir-ORF2 interacts with the pro-inflammatory transcription factor IRF5 and retains it in the cytoplasm, hence preventing upregulation of inflammation-associated genes. The data expands our knowledge of IRF5 regulation which could be relevant to researchers studying various inflammatory diseases as well as adding to our understanding of atherosclerosis development.

      Strengths:

      The in vivo data is solid using immunofluorescence staining to assess AFF3ir-ORF2 expression, a knockout mouse model, overexpression and knockdown studies, and rescue experiments in combination with two atherosclerotic models to demonstrate that Aff3ir-ORF2 can lessen atherosclerotic plaque formation in ApoE<sup>-/-</sup> mice.

      We thank you for your positive remarks.

      Weaknesses:

      While the in vivo data is generally convincing, a few data panels have issues and will need addressing. Also, the knockout mouse model will need to be described, since the paper referred to in the manuscript does not actually report any knockout mouse model. Hence it is unclear how Aff3ir-ORF2 is targeted, but Figure S2B shows that targeting is partial, since about 30% expression remains at the RNA level in MEFs isolated from the knockout mice.

      We thank you for the valuable comments. 

      First, we have followed your advice and included detailed information regarding the construction of the knockout mice in the revised manuscript (Lines 405-415). Additionally, the genotyping results have been included in new Figure S3A.

      Second, we acknowledge your concern about the knockout efficiency of ORF2 in mice. While the PCR assay indicated approximately 30% residual expression, our Western blot analysis of aorta samples demonstrated that ORF2 protein was barely detectable in knockout mice, as shown in new Figure S3B-C. In addition, our in vitro experiments using MEFs from WT and AFF3ir-ORF2<sup>-/-</sup> mice (Figure 4I) further confirmed successful knockout. 

      Third, we have included a discussion addressing the discrepancies between the PCR and Western blot results. In addition to technical differences between the two methods, the nature of AFF3ir-ORF2 may also contribute to these inconsistencies. The parent gene AFF3 is located in a genetically variable region and can be excised via intron 5 to form a replicable transposon, which translocates to other chromosomes and has been linked to leukemia (PMID: 34995897, 12203795, 12743608, and 17968322). AFF3ir is located in intron 6 and therefore lies within the transposon, which may complicate the measurement of its expression. Replicable transposons can exist as extrachromosomal elements, allowing them to be inherited across generations. We have included this discussion in the revised manuscript (Lines 188-196).

      While the effect on atherosclerosis is clear, the conclusion that this is the result of reduced endothelial cell activation is not supported by the data. The mouse model is described as a global knockout and the shRNA knockdowns (Figure 5) and overexpression data in Figure 2 are not cell type-specific. Only the overexpression construct in Figure 6 uses an ICAM-2 promoter construct, which drives expression in endothelial cells, though leaky expression of this promoter has been reported in the literature. Therefore, other cell types such as smooth muscle cells or macrophages could be responsible for the effects observed.

      Thank you for your critical comment. To address your concern, we have made the following three revisions:

      First, we have analyzed the expression of AFF3ir-ORF2 in the vascular wall with or without intima in WT and AFF3ir-ORF2 knockout mice. As shown in Figure 1B and Figure S1A, while the expression of AFF3ir-ORF2 was notably downregulated in the aortic intima of athero-prone regions compared to the protective region, it remained largely unchanged in the aortic wall without intima across different regions of the aorta. This suggested that AFF3ir-ORF2 might play a predominant role in endothelial cells rather than other cell types in the context of shear stress.

      Second, we have used human endothelial cells (HUVECs) to further confirm our findings. As shown in Figure 2C and Figure S2B, we found that AFF3ir-ORF2 overexpression could attenuate disturbed shear stress-induced IRF5 nuclear translocation and the expression of inflammatory genes in HUVECs, suggesting the potential anti-inflammatory effects of AFF3ir-ORF2 in endothelial cells.

      Third, we agree with the reviewer’s comment that we cannot completely exclude the potential involvement of other cell types. Hence, we have included a limitation statement in the discussion part in Lines 341-344.

      The weakest part of the manuscript is the in vitro experiment using some nonidentifiable expression differences. The data is used to hypothesise on a role for IRF5 in the effects observed with Aff3ir-ORF2 knockout.

      Thank you for the comments. To address your concerns, we have made the following two changes:

      First, we have further investigated the functional features of the differential genes from the RNA-seq in the context of the atherosclerotic microenvironment. The differential gene list was mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), and a total of 363 genes overlapped. These 363 genes were subjected to bioinformatics enrichment analysis using Gene Ontology (GO) databases. GO analysis showed that these genes were mainly enriched in cell−cell adhesion and leukocyte activation involved in immune response, which aligns with the effect of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B and the description of the results has been updated in the revised manuscript (Lines 227-233).

      Second, we have further verified the RNA-seq results in vitro. Several classical inflammatory factors, including ICAM-1, CCL5, and CXCL10, whose mRNA levels were significantly downregulated in RNA-seq and which were also identified as target genes of IRF5, were analyzed. We found that AFF3ir-ORF2 deficiency aggravated, while AFF3ir-ORF2 overexpression attenuated, the expression of ICAM-1, CCL5, and CXCL10 induced by disturbed shear stress (New Figure S5D). In addition, the regulation of ICAM-1 by AFF3ir-ORF2 was confirmed at both protein and mRNA levels in HUVECs (Figure 2C-D and Figure S2B). 

      Overall, the paper succeeds in demonstrating a link between Aff3ir-ORF2 and atherosclerosis, but the cell types involved and mechanisms remain unclear. The study also shows a functional interaction between Aff3ir-ORF2 and IRF5 in embryonic fibroblasts, but any relevance of this mechanism for atherosclerosis or any cell types involved in the development of this disease remains largely speculative.

      Thank you for all the valuable comments. The specific responses have been provided above. Briefly, we have followed your advice and further confirmed the regulation of AFF3ir-ORF2 on IRF5 in endothelial cells. Besides, the RNA-seq results have been further analyzed, and partial results have been verified in endothelial cells to support the anti-inflammatory role of AFF3ir-ORF2. We greatly appreciate the reviewer’s insightful comments, which guided our revisions and contributed to significantly improving the paper.

      Reviewer #3 (Public review):

      This study is to demonstrate the role of Aff3ir-ORF2 in the atheroprone flow-induced EC dysfunction and ensuing atherosclerosis in mouse models. Overall, the data quality and comprehensiveness are convincing. In silico, in vitro, and in vivo experiments and several atherosclerosis were well executed. To strengthen further, the authors can address human EC relevance.

      We thank you for your positive remarks and insightful comments.

      Major comments:

      (1) The tissue source in Figures 1A and 1B should be clarified, the whole aortic segments or intima? If aortic segment was used, the authors should repeat the experiments using intima, due to the focus of the current study on the endothelium.

      We thank you for the suggestion. The tissue used in Figures 1A and 1B was from aortic intima. The description has been updated for clarity in the revised manuscript on Lines 114-125. 

      (2) Why were MEFs used exclusively in the in vitro experiments? Can the authors repeat some of the critical experiments in mouse or human ECs?

      Thank you for this insightful comment. Isolation and culture of mouse primary aortic ECs are notoriously difficult, and shear stress experiments require a large number of cells. Because MEFs exhibit responses consistent with those of ECs, as has been carefully demonstrated (PMID: 23754392), we used MEFs in our in vitro experiments.

      However, following your valuable advice, we have now employed human ECs (HUVECs) to confirm our findings. Consistent with our results in MEFs, we found that AFF3ir-ORF2 overexpression reduced the expression of inflammatory genes induced by disturbed shear stress at both protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). Notably, despite the significant anti-inflammatory effects of AFF3ir-ORF2, the sequence of this gene is not conserved in Homo sapiens and lacks an initiation codon, which is why we did not proceed with loss-of-function experiments.

      (3) The authors should explain why AFF3ir-ORF2 overexpression did not affect the basal level expression of ICAM-1, VCAM-1, IL-1b, and IL-6 under ST conditions (Figure 2A-C).

      We thank you for raising this critical question. Indeed, we found that AFF3ir-ORF2 overexpression did not affect the basal level of inflammatory genes under ST conditions, while it exerted anti-inflammatory effects under OSS conditions. One underlying reason might be the relatively low expression of inflammatory genes under ST compared to OSS conditions. Additionally, as our findings suggest, AFF3ir-ORF2 exerts its anti-inflammatory role by binding to IRF5 and inhibiting IRF5 nuclear translocation. However, as shown in Figure 4I, IRF5 might be predominantly localized in the cytoplasm rather than the nucleus under ST conditions, leaving little nuclear translocation for AFF3ir-ORF2 to inhibit.

      We have included the description in the revised manuscript on Lines 157-163.

      (4) Please include data from sham controls, i.e., right carotid artery in Figure 2E.

      Thank you for the suggestion. We have followed your advice and included sham controls (staining of the right carotid arteries) in Figure S2E.

      (5) Given that the merit of the study lies in the effect of different flow patterns, the lesion areas in AA and TA (Figure 3B, 3C) should be separately compared.

      We have followed your valuable suggestion and included the additional statistical results in Figure 3C in the revised manuscript.

      (6) For confirmatory purposes for the variations of IRF5 and IRF8, can the authors mine available RNA-seq or even scRNA-seq data on human or mouse atherosclerosis? This approach is important and could complement the current results that are lacking EC data.

      Thank you for your valuable suggestion. In the present study, we found that disturbed flow did not alter the protein level of IRF5 but promoted its nuclear translocation. Following your advice, we analyzed the expression of IRF5 in human ECs (GSE276195) and atherosclerotic mouse arteries (GSE222583) using public databases. Consistently, IRF5 did not show significant changes in mRNA levels under these conditions (Figure S5E-F), suggesting that the regulation of IRF5 in the context of disturbed flow or atherosclerosis is primarily post-translational.

      (7) With the efficacy of using AAV-ICAM2-AFF3ir-ORF2 in atherosclerosis reduction (Figure 6), the authors are encouraged to use lung ECs isolated from the AFF3ir-ORF2<sup>-/-</sup> mice to recapitulate its regulation of IRF5.

      We greatly appreciate your valuable suggestion to use lung ECs from mice. We have observed that AFF3ir-ORF2 deficiency enhanced the nuclear translocation of IRF5 induced by OSS. Notably, the transcriptional levels of IRF5 were minimally affected by AFF3ir-ORF2 deficiency. Hence, recapitulating the regulation of IRF5 in lung ECs isolated from the AFF3ir-ORF2<sup>-/-</sup> mice would require treating the lung ECs with OSS and then isolating subcellular fractions. However, both in vitro shear stress treatment and subcellular fractionation require a large number of cells, and mouse lung ECs are difficult to culture and to maintain over several passages. Therefore, we hope the reviewer understands that these experiments were not performed. As an alternative, we have confirmed the transcriptional activity changes of IRF5 due to AFF3ir-ORF2 manipulation by analyzing the expression of its target genes identified from the RNA-seq results in both the intima of mouse aorta (Figure S5C-D) and HUVECs (Figure 2C-D and Figure S2B). Our findings show that AFF3ir-ORF2 deficiency increases, while its overexpression decreases, the expression levels of IRF5-targeted genes in endothelial cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2H - As I understand it, this is MFI measurement of VCAM. Please change accordingly.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      My major concern is the use of MEFs for all in vitro experiments. All experiments should be done in endothelial cells if the aim is to show a mechanism relevant to endothelial activation and atherosclerosis. Lines 314-316 of the conclusion are absolutely not supported by the data.

      Thank you for the insightful comment. Following your advice, we have employed human ECs (HUVECs) to confirm our findings. Consistent with the findings in MEFs, we found that AFF3ir-ORF2 decreased the expression of inflammatory genes induced by disturbed shear stress, both at protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). 

      Since the in vivo experiments are not cell type-specific, it would be important to test and compare the expression of Aff3ir-ORF2 in endothelial cells as well as smooth muscle and macrophages to support any claim of cell type involvement in the effects observed.

      We thank you for the valuable suggestion. In the revised manuscript, we have followed your suggestion and analyzed the expression pattern of AFF3ir-ORF2 in different regions of the aorta with or without endothelium. We observed a marked reduction in AFF3ir-ORF2 expression in the intima of the aortic arch compared to that in the intima of the thoracic aorta (Figure 1B-C). In contrast, the expression of AFF3ir-ORF2 in the media and adventitia was comparable between the aortic arch and thoracic aorta (Figure S1A-B). These findings provide further evidence supporting the predominant role of endothelial cells. The description has been modified accordingly in the revised manuscript on Lines 121-134.

      The results of the RNA-seq experiment should be disclosed. The experiment should be deposited on GEO or similar and a table of differentially expressed genes added to the manuscript.

      Thank you for the suggestion. We have followed your advice and submitted the RNA-sequencing data to GEO (GSE286206). In addition, a table of differentially expressed genes has been included in the revised manuscript as Table S3.

      Minor comments:

      (1) Figure 1A. Missing the labels of the target.

      Thanks. Corrected. 

      (2) Figure 1D. Cell alignment in AA compared to TA suggests that the image is of the outer curvature, but Figure 1F is showing that the outer curvature is expressing more ORF2 than the inner. Why was the outer curvature chosen for this panel and is it true to conclude on that assumption that expression of ORF2 compares as TA > Outer > Inner curvature?

      We thank you for the insightful suggestion. We have followed your advice and performed en-face immunofluorescence staining of AFF3ir-ORF2 and quantification of AFF3ir-ORF2 expression in AA inner, AA outer, and TA regions. As shown in new Figure 1D-E, the results indeed indicated that expression of AFF3ir-ORF2 ranks as TA > AA outer > AA inner.

      (3) Figure 2H. Target mislabelled as ICAM-1 instead of VCAM-1.

      Thanks. Corrected. 

      (4) Figure S1A. VE-cad staining and cell shape differ between control and overexpression. Is this a phenotype or are different areas of the vasculature shown, which would make it hard to interpret since Aff3ir-ORF2 levels differ in different vessel areas?

      We thank the reviewer for raising this important question. For Figure S1A, only common carotid arteries were used for the staining. The potential differences in cell shape observed might be due to variations in the procedure during immunofluorescence staining. To avoid any misinterpretation, more representative images have been provided in the revised Figure S2C.

      (5) Figure 3D-G. Images are not representative of the quantification results.

      Thank you. The images in the revised Figure 3D and Figure 3F have been replaced with more representative ones.

      (6) Line 220. Data for IRF8 are not shown in the figure to support this claim.

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C.

      (7) Figure 6F. AAV-AFF3ir-ORF2 panel order inverted.

      Thanks. Corrected. 

      (8) Line 401. Typo: "hat" instead of "h at".

      Sorry for the typo. Corrected.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) The rationale for the following sentence (lines 126-128) is lacking: "Moreover, we observed the expression of AFF3ir-ORF2 in longitudinal sections of the mouse aorta (B. Li et al., 2019)".

      Thanks. The rationale for these experiments has been included in the revised manuscript on Lines 127-129.

      (2) The source of the antibodies against AFF3ir-ORF1 and AFF3ir-ORF2 used in western blot and immunostaining experiments was not mentioned in the manuscript.

      Thanks. The antibody information has been included in the Methods section on Lines 456-457 and 510-511.

      (3) The rationale and data interpretation are not clear for the following sentence (lines 220-221): "In addition, neither IRF5 nor IRF8 expression was regulated by AFF3ir-ORF2 (Figure 4F)".

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C. The sentence has been modified accordingly on Lines 253-254.

      (4) The quality of AFF3ir-ORF2 blot in Figure 4I needs improvement.

      Thanks. More representative images have been included in Figure 4I.

      (5) It appears that AFF3ir-ORF2 was present in both cytoplasm and nucleus. Does AFF3ir-ORF2 have a nuclear entry peptide? Also, the nuclear entry of AFF3ir-ORF2 can be enhanced by an immunofluorescence staining experiment.

      Thank you for your insightful comments. Indeed, although we did not observe any significant changes in the subcellular localization of AFF3ir-ORF2 under shear stress conditions, our immunostaining results revealed that AFF3ir-ORF2 is localized in both the cytoplasm and nucleus. To explore whether AFF3ir-ORF2 contains nuclear localization signals, we utilized the NLStradamus tool (http://www.moseslab.csb.utoronto.ca/NLStradamus/) to analyze its sequence. The prediction indicated that AFF3ir-ORF2 lacks a nuclear localization signal.

    1. Author response:

      Reviewer 1: “The authors over-emphasized this study's relevance to RP disease (i.e. patients and mammals are not capable of regeneration like zebrafish).”

      It is true that humans and other mammals are not capable of regeneration. This is why we and many other groups study zebrafish to identify mechanisms of regeneration that successfully form new rods. That said, our previous paper on the molecular basis of retinal remodeling in this zebrafish model system (Santhanam et al., 2023; Cell Mol Life Sci. 2023;80(12):362) revealed remarkable similarities in the stress and physiological responses of rods, cones, RPE and inner retinal neurons to those in mammalian RP models. Thus, we believe this zebrafish is an adequate model of RP and an excellent model to study rod regeneration.

      Reviewer 1: “They under-explained this regeneration's relevance or difference to normal developmental process, which is pretty much conserved in evolution.”  and:

      Reviewer 3: “It would also benefit from integration with single-cell multiome data from developing retinas (Lyu, et al. 2023).”

      It is an excellent suggestion to compare the regenerative response we have studied in a chronic degeneration/regeneration model to the trajectory of developmental rod formation. Lyu et al. 2023 found that while retinal regeneration has similarities to retinal development, it does not precisely recapitulate the same transcription factors and processes. Any differences between the regenerative trajectory and that revealed in developmental studies would be enlightening. We intend to perform such analyses and add them to a revised manuscript in the future.

      Reviewer 2: “Perhaps the authors can consider explaining why the Prdm1a knock-down cells would have a higher Retp1 signal per cell in Fig 9B. Is this a representative picture? This appears to contradict Figure 8's conclusion, although I could tell that the number of Retp1+ cells in the ONL appears to be lower.”

      These are different experimental paradigms.  Figure 8 shows knockdown 48 hours after injection, at which time prdm1a knockdown is affecting rhodopsin expression directly.  That experiment investigated whether prdm1a knockdown affected progenitor proliferation.  Figure 9 shows a time point 6 days after injection, at which time we were asking if prdm1a knockdown affected differentiation of progenitors into rods. 

      Reviewer 2: “The authors noted "Surprisingly, the knockdown of prdm1a resulted in a significantly higher number of rhodopsin-positive cells in the INL (p=0.0293)", while it appears in Figure 9B, 9C that the difference is 2 cells vs 0 in a rightly broader field. It seems to be too strong of a statement for this effect.”

      This was a very unexpected finding.  We included statistics (Figure 9D) to support the finding, so we don’t think it is too strong a statement to make.  Speculation as to what might cause this is fascinating.  Are Muller cells producing progenitors that fail to migrate to the ONL before differentiating into rods?  The lack of BrdU labeling does not support this idea.  Do neurogenic progenitor cells in the INL differentiate towards rods via a pathway that does not require prdm1a?  Perhaps.  Perhaps there are other explanations.

      Reviewer 2: “It appears to this reviewer that the proteomic data didn't reveal much in line with the overall hypothesis or the mechanism, and it's unclear why the authors went for proteomics rather than bulk RNA-seq or ChIP-seq for a transcription factor knock-down experiment. Overall this is a minor point.”

      We agree that bulk RNA sequencing would provide a similar answer, possibly with greater sensitivity. We chose proteomics for two reasons: 1) We wanted an independent assessment of the knockdown effects that could evaluate whether the knockdowns worked and what pathways were affected. Since our pathway comparison is to single-cell RNA-seq data, bulk RNA-seq did not seem to be fully independent. 2) Because we used translation-blocking antisense oligos for most knockdown experiments, we did not expect the transcript abundance of the targeted gene to be affected, although these oligos can lead to target transcript degradation. Thus, we were not likely to be able to validate that our knockdown worked using bulk RNA sequencing.

      Reviewer 3: “The gene regulatory network analysis here would also benefit from the addition of matched scATAC-Seq data, …”

      This is certainly true, and the reviewer points to several studies that have made excellent use of this strategy.  Given the 1-2 year timeline to obtain and analyze such data, it is unlikely that we will be able to incorporate such data in our revised manuscript, but we hope to do so for follow-up studies.

      Reviewer 3: “The description of the time points analyzed is vague, stating only that "fish from 6 to 12 months of age were analyzed". Since photoreceptor degeneration is progressive, it is unclear how progenitor behavior changes over time, or how the gene expression profile of other cell types such as microglia, cones, or surviving rods is altered by disease progression.”

      We have shown in a previous study (Santhanam et al. Cells. 2020;9(10)) that rod degeneration and regeneration are in a steady state from at least 4 to 8 months of age, and in other experiments in the lab at least to 12 months of age.  In this age range, regeneration keeps up with the pace of degeneration, both of which are very fast.  This encompasses the cell types that we specifically study in this manuscript.  The reviewer is right that other cell types could undergo changes.  This is a separate topic of study in the lab.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      We thank Reviewer #1 for the thoughtful assessment and recognition of the extensive datasets and computational approaches employed in our study. We appreciate the acknowledgment that our efforts in compiling data-rich resources for identifying putative regulators of key selector transcription factors (TFs)—Tal1, Gata2, and Gata3—are valuable for future hypothesis-driven research.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

      We acknowledge the concerns raised regarding the clarity and interpretability of certain figures, particularly those related to expression analyses of candidate upstream regulators such as Insm1, E2f1, Ebf1, and Tead2 in relation to Tal1. We agree that clearer visualization and improved annotation of fluorescence signals are crucial to accurately support our conclusions. In our revised manuscript, we will enhance image clarity and clearly indicate sites of co-expression for Tal1 and its putative regulators, ensuring the results are more readily interpretable. Additionally, we will expand explanatory narratives within the figure legends to better align the figures with the results section.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      We also appreciate the thoughtful comments from Reviewer #2, highlighting the strengths of our approach in elucidating gene regulatory interactions that govern neuronal fate decisions in the embryonic mouse brainstem. We are pleased that our focus on a critical cell-fate decision point and the integration of diverse data modalities, combined with computational analyses, has been recognized as a key strength.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

      Reviewer #2 correctly points out that while our study provides extensive data on predicted and validated transcription factor binding sites, clearly illustrating how these factors collectively interact to regulate terminal neuronal differentiation programs remains challenging. We acknowledge the inherently descriptive nature of the current interpretation of our combined datasets.

      In our revision, we will clarify how the different data types support and corroborate one another, highlighting what we consider the most reliable observations of TF activity. Additionally, we will revise the discussion to address the challenges associated with interpreting the highly complex networks of interactions within the gene regulatory landscape.

      We sincerely thank both reviewers for their constructive feedback, which we believe will significantly enhance the quality and accessibility of our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general, the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance, in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Figure 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 appears to no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced insulin secretion fold increase under these circumstances, as shown in Figure 1F. We interpret these results as a defect in the GLP-1R capacity to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. With the increasing effect of cholesterol-loaded MβCD on basal secretion at 11 mM glucose, exendin-4 stimulation would then appear to work less well.
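
      The arithmetic behind this alternative (ceiling) explanation can be sketched as follows. This is a minimal illustration only: the function name and all numbers are hypothetical and are not measurements from the study. It shows that, if stimulated secretion were capped at a fixed maximum, a rise in basal secretion alone would lower the fold increase.

```python
def fold_increase(basal, stimulated):
    """Fold increase of agonist-stimulated over basal secretion.

    Both arguments are percentages of total insulin content secreted.
    """
    if basal <= 0:
        raise ValueError("basal secretion must be positive")
    return stimulated / basal


# Hypothetical values (% of insulin content), chosen only for illustration.
# Stimulated secretion is held at the same assumed ceiling in both cases.
assumed_ceiling = 1.5
vehicle_fold = fold_increase(basal=0.5, stimulated=assumed_ceiling)
loaded_fold = fold_increase(basal=1.0, stimulated=assumed_ceiling)

print(f"vehicle fold increase: {vehicle_fold}")
print(f"cholesterol-loaded fold increase: {loaded_fold}")
```

      Under these assumed numbers the fold increase drops even though the receptor response itself is unchanged, which is why a direct test of whether secretion had reached its maximum is needed to distinguish the two explanations.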

      We have performed a simple experiment to investigate this possibility: insulin secretion following stimulation with a secretagogue cocktail (20 mM glucose, 30 mM KCl, 10 µM FSK and 100 µM IBMX) in islets +/- MβCD/cholesterol loading to determine if maximal stimulation had been reached or not in our original experiment. This experiment, now included in Supplementary Figure 1C, demonstrates that insulin secretion can increase up to ~4% (from ~2%) in our islets, supporting our initial conclusion. We have also included absolute insulin concentrations as well as percentages of secretion for all the experiments included in the study in the new Supplementary File 1 to improve the completeness of the report.

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Figure 1H, I) are relatively variable. We have therefore performed 2 extra biological repeats of this experiment (for a total n of 7). Results now show a significant increase in exendin-4-stimulated secretion with no change in basal secretion in islets pre-incubated with LPDS/simvastatin.  

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1R agonist responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We have highlighted this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. However, we agree that it is of interest to investigate if cholesterol loading affects GLP-1R diffusion. To this end, we have performed further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol (new Supplementary Figures 1D and 1E). Interestingly, results show significantly increased plasma membrane diffusion of exendin-4-stimulated receptors, with no change in basal diffusion, following MβCD/cholesterol loading. This behaviour differs from that of the V229A mutant receptor, which shows reduced diffusion under basal conditions, a pattern that mimics that of the WT receptor under low cholesterol conditions (by pre-treatment with LPDS/simvastatin).

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      We have expanded on the rationale behind the use of Laurdan to assess behaviours of lipid-packed membrane nanodomains in the methods, results and discussion of the revised manuscript.

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F.

      Figures 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We have included a schematic of this assay in the new Supplementary Figure 3 to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP-1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, we have included ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Figure 1E in the new Supplementary Figure 4.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We have included these data in Supplementary Figure 1A.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we have performed binding affinity experiments, which show no differences, in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells (new Supplementary Figure 2D).

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we have added this worthy point to the discussion and hypothesise on possible effects of GLP-1R mutants with modified cholesterol interactions on central GLP-1R actions in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      There are no line numbers

      These have now been added.

      Abstract: "Cholesterol is a plasma membrane enriched lipid" - sorry for being finicky, but shouldn't this read; "a lipid often enriched in plasma membranes"

      We have modified the abstract to state that: “Cholesterol is a lipid enriched at the plasma membrane”.

      p. 4 "Moreover, islets extracted from high cholesterol-fed mice". How do you "extract islets"?

      We have replaced the term “extracted” with “isolated”. Islet isolation is described in the methods section of the paper.

      p. 4 The sentence "These effects were accompanied by decreased GLP-1R plasma membrane diffusion under vehicle conditions, measured by Raster Image Correlation Spectroscopy (RICS) in rat insulinoma INS-1 832/3 cells with endogenous GLP-1R deleted [INS-1 832/3 GLP-1R KO cells (27)] stably expressing SNAP/FLAG-tagged human GLP-1R (SNAP/FLAG-hGLP-1R), an effect that is normally triggered by agonist binding (28), as also observed here (Supplementary Figure 1C, D)" is a masterpiece of complexity. Perhaps breaking up would facilitate reading?

      This paragraph has now been modified in the revised manuscript.

      p. 5. I cannot evaluate the "coarse grain molecular dynamics" studies.

      Reviewer #2 (Recommendations for the authors):

      I view this as an excellent manuscript with very comprehensive work and clear translational relevance. I don't think any further experiments are needed for the scope outlined in this manuscript. The discussion is already long but a short postulation on how this may translate to GLP-1R-cholesterol interactions in other cell types, specifically neurons with the intent on manipulating satiation and nausea, could be worthwhile.

      This has now been added.

      The only thing for readability I would suggest is a sentence in the results mentioning why you're doing the Laurdan analysis, and what is the output for assessing 'receptor activity' in the membrane and endosomes.

      Both points have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply well-established approaches in studies of acute to chronic HIV infection for studies of HCV infection. This lends rigor to the authors' work.

      Weaknesses:

      (1) The Discussion could be strengthened by a direct discussion of the parallels/differences in results between HIV and HCV infections in terms of T cell selection, entropy, and fitness.

      We have added a direct discussion of the parallels/differences between HIV and HCV throughout the discussion, including at lines 308-310 and 315-327.

      Lines 308-310: “In fact, many parallels can be drawn between HIV infections and HCV infections in the context of emerging viral species that escape T cell immune responses.”

      Lines: 315-327: “One major difference between HCV and HIV infection is the event where patients infected with HCV have an approximately 25% chance to naturally clear the infection as opposed to just achieving viral control in HIV infections. Here, we probed the underlying mechanism, and questioned how the host immune response and HCV mutational landscape can allow the virus to escape the immune system. To understand this process, taking inspiration from HIV studies (24), a quantitative analysis of viral fitness relative to viral haplotypes was conducted using longitudinal samples to investigate whether a similar phenomenon was identified in HCV infections for our cohort for patients who progress to chronic infection. We observed a decrease in population average relative fitness in the period of <90DPI with respect to the T/F virus in chronic subjects infected with HCV. The decrease in fitness correlated positively with IFN-γ ELISPOT responses and negatively with SE indicating that CD8+ T-cell responses drove the rapid emergence of immune escape variants, which initially reduced viral fitness. This is similarly reflected in HIV infected patients where strong CD8+ T-cell responses drove quicker emergence of immune escape variants, often accompanied by compensatory mutations (24).”

      (2) In the Results, please describe the Barton model functionality and why the fitness landscape model was most applicable for studies of HCV viral diversity.

      This has been added to the introduction section rather than Results as we feel that it is more appropriate to show why it is most applicable to HCV viral diversity in the background section of the manuscript. We write at lines 77-90:

      “Barton et al.’s [23] approach to understand HIV mutational landscape resulting in immune escape had two fundamental points: 1) replicative fitness depends on the virus sequence and the requirement to consider the effect of co-occurring mutations, and 2) evolutionary dynamics (e.g. host immune pressure). Together they pave the way to predict the mutational space in which viral strains can change given the unique immune pressure exerted by individuals infected with HIV. This model fits well with the pathology of HCV infection. For instance, HIV and HCV are both RNA viruses with rapid rate of mutation. Additionally, like HIV, chronic infection is an outcome for HCV infected individuals, however, unlike HIV, there is a 25% probability that individuals infected with HCV will naturally clear the virus. Previously published studies [9] have shown that HIV also goes through a genetic bottleneck which results in the T/F virus losing dominance and replaced by a chronic subtype, identified by the immune escape mutations. The concepts in Barton’s model and its functionality to assess the fitness based on the complex interaction between viral sequence composition and host immune response is also applicable to early HCV infection.”

      (3) Recognize the caveats of the HCV mapping data presented.

      We have now recognized the caveats of the HCV mapping data at lines 354-356: “While our findings here are promising, it should be recognized that although the bioinformatics tool (iedb_tool.py) proved useful for identifying potential epitopes, there could be epitopes that are not predicted or false-positive from the output which could lead to missing real epitopes”

      (4) The authors should provide more data or cite publications to support the authors' statement that HCV-specific CD8 T cell responses decline following infection.

      We have now clarified at lines 352-353 that the decline was toward “selected epitopes that showed evidence of escape”.

      Furthermore, we have cited two publications at line 352 that support our statement.

      (5) Similarly, as the authors' measurements of HCV T and humoral responses were not exhaustive, the text describing the decline of T cells with the onset of humoral immunity needs caveats or more rigorous discussion with citations (Discussion lines 319-321).

      We have now added a caveat in the discussion at lines 357-360 which reads

      “In conclusion, this study provides initial insights into the evolutionary dynamics of HCV, showing that an early, robust CD8+ T-cell response without nAbs strongly selects against the T/F virus, enabling it to escape and establish chronic infection. However, these findings are preliminary and not exhaustive, warranting further investigation to fully understand these dynamics.”

      (6) What role does antigen drive play in these data - for both T cell and antibody induction?

      It is possible that HLA-adapted mutations could limit CD8 T cell induction if the HLAs were matched between transmission pairs, as has been shown previously for HIV (https://doi.org/10.1371/journal.ppat.1008177) with some data for HCV (https://journals.asm.org/doi/10.1128/jvi.00912-06). However, we apologise as we are not entirely sure that this is what the reviewer is asking for in this instance.

      (7) Figure 3 - are the X and Y axes wrongly labelled? The Divergent ranges of population fitness do not make sense.

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axis were wrongly labelled. This has now been resolved.

      (8) Figure S3 - is the green line, average virus fitness?

      This has now been clarified in Figure S3.

      (9) Use the term antibody epitopes, not B cell epitopes.

      We now use the term antibody epitopes throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) Introduction:

      Line 52: 'carry mutations B/T cell epitopes'. Two points

      i) These are antibody epitopes (and antibody selection) not B cell epitopes

      We have corrected this sentence at line 55 which now reads: “carry mutations within epitopes targeted by B cells and CD8+ T cells”.

      ii) To avoid confusion, add text that mutations were generated following selection in the donor.

      For HCV, it is unclear if mutations are generated following selection or have been occurring in low frequencies outside detection range. Only when selection by host immune pressure arises do the potentially low-frequency variants become dominant. However, we do acknowledge it is potentially misleading to only mention new variants replacing the transmitted/founder population. We have modified the sentence at line 52 to read:

      “At this stage either an existing variant that was occurring in low-frequency outside detection range or an existing variant with novel mutations generated following immune selection is observed in those who progress to chronic infection”

      - Lines 51-56: Human studies of escape and progression are associative, not causative as implied.

      Correct, the evidence linking escape and progression is currently associative. We have now corrected these lines so that they no longer suggest causation.

      - Line 65: Suggest you clarify your meaning of 'easier'?

      This sentence, now at line 72, has been modified to: “subtype 1b viruses have a higher probability to evade immune responses”

      (2) Results:

      - Line 147: Barton model (ref'd in Intro) is directly referred to here but not referenced.

      The reference has been added.

      - The authors should cite previous HIV literature describing associations between the rate of escape and Shannon Entropy e.g. the interaction between immunodominance, entropy, and rate of escape in acute HIV infection was described in Liu et al JCI 2013 but is not cited.

      We have now cited previous HIV research at lines 147-151, adding Liu et al:

      “Additionally, the interaction between immunodominance, entropy, and escape rate in acute HIV infection has been described, where immunodominance during acute infection was the most significant factor influencing CD8+ T cell pressure, with higher immunodominance linked to faster escape (27). In contrast, lower epitope entropy slowed escape, and together, immunodominance and entropy explained half of the variability in escape timing (27).”

      - Line 319: The authors suggest that HCV-specific CD8 T cell response declines following early infection. On what are they basing this statement? The authors show their measured T cell responses decline but their approach uses selected epitopes and they are therefore unable to assess total HCV T cell response in participants (Where there is no escape, are T cell magnitudes maintained or do they still decline?). Can the authors cite other studies to support their statement?

      We have now clarified that the decline was toward “selected epitopes that showed evidence of escape”. Furthermore, we also cite two studies to support our findings.

      - Throughout the authors talk in terms of CD8 T cells but the ELISpot detects both CD4 and CD8 T cell responses. I suggest the authors be more explicit that their peptide design (9-10mers) is strongly biased to only the detection of CD8 T cells.

      To make this clearer and more explicit, we have now added to the methods section at lines 433-435:

      “While the ELISpot assay detects responses from both CD4 and CD8 T cells, our peptide design (9-10mers) is strongly biased toward CD8 T-cell detection. We have therefore interpreted ELISpot responses primarily in terms of CD8 T-cell activity.”

      - The points made in lines 307-321 could be more succinct

      We have now edited the discussion (lines 307 – 321) to make the points more succinct (now lines 307-323).

      Minor corrections to text, figures:

      - Figure 2: suggest making the Key bigger and more obvious.

      We have now made the key bigger and more obvious

      - Figure 3 A & D....is there an error on the X-axis...are you really reporting ELISpot data of < 1 spot/10^6? Perhaps the X and Y axes are wrongly labelled?

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axis were wrongly labelled. This has now been resolved.

      - Figure 5: As this is PBMC, remove CD8 from the description of ELISpot. 

      We have now removed CD8 from the description of ELISpot in both Figure 5 and Figure S3

      Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that viral fitness declines as the virus mutates to escape T-cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes but also on the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      One notable weakness of the present version of the manuscript is a lack of clarity in the description of the method of fitness estimation. In the previous studies of HIV and HCV cited by the authors, fitness models were derived by fitting the model (equation between lines 435 and 436) to viral sequence data collected from many different individuals. In the section "Estimating survival fitness of viral variants," it is not entirely clear if Walker and collaborators have used the same approach (i.e., fitting the model to viral sequences from many individuals), or whether they have used the sequence data from each individual to produce models that are specific to each subject. If it is the former, then the authors should describe where these sequences were obtained and the statistics of the data.

      If the fitness models were inferred based on the data from each subject, then more explanation is needed. In prior work, the use of these models to estimate fitness was justified by arguing that sequence variants common to many individuals are likely to be well-tolerated by the virus, while ones that are rare are likely to have high fitness costs. This justification is less clear for sequence variation within a single individual, where the viral population has had much less time to "explore" the sequence landscape. Nonetheless, there is precedent for this kind of analysis (see, e.g., Asti et al., PLoS Comput Biol 2016). If the authors took this approach, then this point should be discussed clearly and contrasted with the prior HIV and HCV studies.

      We thank the reviewer for pointing out the weakness in our explanation and description of the fitness model. The model has been generated using publicly released viral sequences and this has been described in a previous publication by Hart et al. 2015. T/F virus from each of the subjects chronically infected with HCV in our cohort were given to the model by Hart et al. to estimate the initial viral fitness of the T/F variant. Subsequent time points of each subject containing the subvariants of the viral population were also estimated using the same model (each subtype). For each subject, these subvariant viral fitness values were divided by the fitness value of the initial T/F virus (hence relative fitness of the earliest time points with no mutations in the epitope regions were a value of 1.000). All other fitness values are therefore relative fitness to the T/F variant.
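
      The normalisation described in this response can be sketched as follows. This is only a minimal illustration, not the authors' code: the function name `relative_fitness` and the input layout (a T/F fitness value plus a dictionary mapping days post-infection to the model fitness values of the variants sampled at that time point) are assumptions made for the example.

```python
def relative_fitness(tf_fitness, fitness_by_timepoint):
    """Divide each variant's model fitness by the fitness of the T/F virus.

    tf_fitness: model fitness of the transmitted/founder (T/F) variant.
    fitness_by_timepoint: hypothetical dict {days_post_infection: [fitness, ...]}.
    Returns a dict with the same keys holding fitness relative to T/F.
    """
    if tf_fitness == 0:
        raise ValueError("T/F fitness must be non-zero to normalise against it")
    return {
        day: [value / tf_fitness for value in values]
        for day, values in fitness_by_timepoint.items()
    }


# Illustrative numbers only: the earliest time point contains just the T/F virus,
# so its relative fitness is 1.0 by construction.
example = relative_fitness(2.0, {0: [2.0], 30: [1.0, 3.0]})
```

      With this convention, values below 1 indicate variants estimated to be less fit than the T/F virus and values above 1 indicate variants estimated to be fitter.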

      We have further clarified this point in the methods section “Estimating survival fitness of viral variant” to better describe how the data of the model was sourced (Lines 465-499).

      To add to the reviewer’s point, we agree that sequence variants common to many individuals are likely to be well-tolerated by the virus, and this was observed in our findings, as our data suggested that immune escape variants tended to revert to variants that were closer to the global consensus strain. Our previous publications have indicated that T/F viruses during transmission were variants that were “fit” for transmission between hosts; especially in cases where the donor was a chronic progressor, a single T/F is often observed. Progression to immune escape and adaptation to chronic infection in the new host has an in-between process of genetic expansion via replication followed by a bottleneck event under immune pressure, where overall fitness (overall survivability including replication and exploring immune escape pathways) can change. Under this assumption we questioned whether the observation reported in HIV studies (i.e. mutation landscapes that allow HIV adaptation to host) also happens in HCV infections. Furthermore, the cohort used in this study is a rare cohort in which patients were tracked from uninfected, to HCV RNA+, to seroconversion and finally to either clearing the virus or progressing to chronic infection. Thus, it is important to understand the difference between clearance and chronic progression.

      Another important point for clarification is the definition of fitness. In the abstract, the authors note that multiple studies have shown that viral escape variants can have reduced fitness, "diminishing the survival of the viral strain within the host, and the capacity of the variant to survive future transmission events." It would be helpful to distinguish between this notion of fitness, which has sometimes been referred to as "intrinsic fitness," and a definition of fitness that describes the success of different viral strains within a particular individual, including the potential benefits of immune escape. In many cases, escape variants displace variants without escape mutations, showing that their ability to survive and replicate within a specific host is actually improved relative to variants without escape mutations. However, escape mutations may harm the virus's ability to replicate in other contexts. Given the major role that fitness plays in this paper, it would be helpful for readers to clearly discuss how fitness is defined and to distinguish between fitness within and between hosts (potentially also mentioning relevant concepts such as "transmission fitness," i.e., the relative ability of a particular variant to establish new infections).

      Thank you for pointing out the weakness of our definition of fitness. We have now clarified this in multiple sections of the paper: in the abstract at lines 18-21 and in the introduction at lines 64-69.

      These read:

      Lines 18-21: “However, this generic definition can be further divided into two categories where intrinsic fitness describes the viral fitness without the influence of any immune pressure and effective fitness considers both intrinsic fitness with the influence of host immune pressure.”

      Lines 64-69: “This generic definition of fitness can be further divided into intrinsic fitness (also referred to as replicative fitness), where the fitness of sequence composition of the variant is estimated without the influence of host immune pressure. On the other hand, effective fitness (from here on referred to as viral fitness) considers fundamental intrinsic fitness with host immune pressure acting as a selective force to direct mutational landscape (19)[REF], which subsequently influences future transmission events as it dictates which subvariants remain in the quasispecies.”

      One concern about the analysis is in the test of Shannon entropy as a way to quantify the rate of escape. The authors describe computing the entropy at multiple time points preceding the time when escape mutations were observed to fix in a particular epitope. Which entropy values were used to compare with the escape rate? If just the time point directly preceding the fixation of escape mutations, could escape mutations have already been present in the population at that time, increasing the entropy and thus drawing an association with the rate of escape? It would also be helpful for readers to include a definition of entropy in the methods, in addition to a reference to prior work. For example, it is not clear what is being averaged when "average SE" is described.

      We thank the reviewer for pointing out the ambiguity in describing average SE. This has been rectified by adding more information in the methods section (lines 397-400):

      “Briefly, SE was calculated using the frequency of occurrence of SNPs based on per codon position, this was further normalized by the length of the number of codons in the sequence which made up respective protein. An average SE value was calculated for each time point in each protein region for all subjects until the fixation event.”

      To answer the reviewer’s question, we computed entropy at multiple time points preceding the observation of the escape mutation. The escape rate was calculated for the epitopes targeted by the immune response. We compared the average SE, based on the change at each codon position and normalised by protein length, for the region containing the epitope with the time it took to reach fixation. We observed that if the protein region had a higher rate of variation (i.e. higher average SE), then an immune escape epitope also emerged more quickly. Since we took SE from the very first time point and all subsequent time points until fixation, we do not think that escape mutations already being present in the population would alter the association with the rate of escape, especially as these escape mutations were rarely observed at early time points. It is likely that the escape variant became observable because of host immune pressure; the SE therefore reflects the freedom to explore the mutational landscape. If the region were highly restrictive, such that any mutation would result in a failed variant, we would observe relatively lower values of average SE. In other words, the more variability the region tolerates, the greater the probability that the virus will find a solution to achieve immune escape.
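
      For readers who want a concrete picture of the "average SE" described above, a minimal sketch is given below. It is not the authors' implementation: the function names, the representation of each sequence as a list of codon strings from an alignment, and the choice of log base 2 are all assumptions made for illustration.

```python
import math
from collections import Counter


def codon_entropy(column):
    """Shannon entropy (in bits) of the codons observed at one alignment position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def average_se(sequences):
    """Sum per-codon entropies and normalise by the number of codons in the region.

    sequences: hypothetical list of aligned sequences from one time point,
    each given as an equal-length list of codon strings.
    """
    n_codons = len(sequences[0])
    if any(len(seq) != n_codons for seq in sequences):
        raise ValueError("all sequences must cover the same number of codons")
    per_position = [
        codon_entropy([seq[i] for seq in sequences]) for i in range(n_codons)
    ]
    return sum(per_position) / n_codons


# Toy alignment: the first codon is invariant, the second is split evenly
# between two codons, so the per-position entropies are 0 and 1 bit.
toy = [["ATG", "GCT"], ["ATG", "GCC"]]
toy_average = average_se(toy)
```

      In this sketch a fully conserved region yields an average SE of zero, and the value rises as more codon positions tolerate variation, which is the sense in which a higher average SE reflects a less constrained region.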

      Reviewer #2 (Recommendations for the authors):

      In addition to the main points above, there are a few minor comments and suggestions about the presentation of the data.

      (1) It's not clear how, precisely, the model-based fitness has been calculated and normalized. It would be helpful for the authors to describe this explicitly. Especially in Figure 3, the plotted fitness values lie in dramatically different ranges, which should be explained (maybe this is just an error with the plot?).

      We have now clarified how the model-based fitness has been calculated and normalized in the method section “Estimating survival fitness of viral variants” at lines 465-472.

      “The model used for estimating viral fitness has been previously described by Hart et al. (19). Briefly, the original approach used HCV subtype 1a sequences to generate the model for the NS5B protein region. To update the model for other regions (NS3 and NS2) as well as other HCV subtypes in this study, subtype 1b and subtype 3a sequences were extracted from the Los Alamos National Laboratory HCV database. An intrinsic fitness model was first generated for each subtype for NS5B, NS3 and NS2 region of the HCV polyprotein. Then, using longitudinally sequenced data from patients chronically infected with HCV as well as clinically documented immune escape to describe high viral fitness variants, we generated estimates of the viral fitness for subjects chronically infected with HCV in our cohort.”

      Our apologies, there was an error with the plot in Figure 3. This has now been resolved.

      (2) In different plots, the authors show every pairwise comparison of ELISPOT values, population fitness, average SE, and rate of escape. It may be helpful to make one large matrix of plots that shows all of these pairwise comparisons at the same time. This could make it clear how all the variables are associated with one another. To be clear, this is a suggestion that the authors can consider at their discretion.

      Thank you for the suggestion to create a matrix of plots for pairwise comparisons. While this approach could indeed clarify variable associations, implementing it is outside the scope of this project. We appreciate the idea and may consider it in future studies as we continue to expand on this work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. describe a delicate relationship between Tet2 and FBP1 in the regulation of hepatic gluconeogenesis.

      Strengths:

      The studies are very mechanistic, indicating that this interaction occurs via demethylation of HNF4a. Phosphorylation of HNF4a at ser 313 induced by metformin also controls the interaction between Tet2 and FBP1.

      We are grateful for the reviewer's praise on the manuscript.

      Weaknesses:

      The results are briefly described, and oftentimes, the necessary information is not provided to interpret the data. Similarly, the methods section is not well developed to inform the reader about how these experiments were performed. While the findings are interesting, the results section needs to be better developed to increase confidence in the interpretation of the results.

      Thanks very much for pointing out the shortcomings of the manuscript. We apologize that we did not provide a detailed description of some experimental methods and results. Following the reviewer’s suggestion, we added details to the methods section, including the generation of whole-body Tet2 KO mice and liver-specific Tet2 knockdown mice (AAV8-shTet2), the missing information on reagents, antibodies, primer sequences and mutant generation, and the methods for chromatin immunoprecipitation (ChIP) and immunofluorescence. The interpretation of the results was also further developed according to the reviewer’s comments.

      Reviewer #2 (Public review):

      Summary:

      This study reveals a novel role of TET2 in regulating gluconeogenesis. It shows that fasting and a high-fat diet increase TET2 expression in mice, and TET2 knockout reduces glucose production. The findings highlight that TET2 positively regulates FBP1, a key enzyme in gluconeogenesis, by interacting with HNF4α to demethylate the FBP1 promoter in response to glucagon. Additionally, metformin reduces FBP1 expression by preventing TET2-HNF4α interaction. This identifies an HNF4α-TET2-FBP1 axis as a potential target for T2D treatment.

      Strengths:

      The authors use several methods in vivo (PTT, GTT, and ITT in fasted and HFD mice; and KO mice) and in vitro (in HepG2 and primary hepatocytes) to support the existence of the HNF4alpha-TET-2-FBP-1 axis in the control of gluconeogenesis. These findings uncovered a previously unknown function of TET2 in gluconeogenesis.

      We are grateful for the reviewer's praise on the manuscript.

      Weaknesses:

      Although the authors provide evidence of an HNF4α-TET2-FBP1 axis in the control of gluconeogenesis, which contributes to the therapeutic effect of metformin on T2D, its role in the pathogenesis of T2D is less clear. The mechanisms by which TET2 is up-regulated by glucagon should be more explored.

      Thanks very much for pointing out the shortcomings of the manuscript. We agree with the reviewer that the manuscript is focused on the function of the HNF4α-TET2-FBP1 axis in the control of gluconeogenesis, but not on its role in the pathogenesis of T2D. Following the reviewer’s suggestion, we changed the title of the manuscript to “HNF4α-TET2-FBP1 axis contributes to gluconeogenesis and type 2 diabetes”. To explore the mechanisms by which TET2 is up-regulated by glucagon, we examined TET2 mRNA levels at different time points after a single dose of glucagon treatment in HepG2 cells. Interestingly, the results showed that TET2 mRNA levels increased significantly, by 6-fold, at 30 min, and that the effect of glucagon on TET2 mRNA levels persisted for more than 48 hours (refer to Fig. 3E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors indicate that they have overexpressed TET2 in HepG2 cells and primary mouse hepatocytes. The degree of overexpression should be shown. Is this similar to an increase in TET2 with fasting or HFD treatment?

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we examined the protein levels of overexpressed TET2 in HepG2 cells and primary mouse hepatocytes. The results revealed that the degree of TET2 overexpression (refer to Fig. 3J) is similar to the increase of TET2 under fasting or HFD treatment (Fig. 1C, D).

      In Figures 2E-2G, the authors report results in Tet2-KO mice. Information on how these mice were generated is lacking. There is limited information about how Tet2-KO cells were generated, but again, I could not find anything about these mice in the methods section or figure legend. Is this whole-body or liver-specific Tet2-KO? How old were the mice at the time of PTT, GTT, or ITT?

      Were these mice on chow or HFD? Are there any differences in body weight between WT and Tet2-KO mice?

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we provided detailed information about the Tet2-KO mice, including how the mice were generated, in the methods section. Moreover, the details of the Tet2-KO mice used in each figure are now clearly described in the figure legends. In this study, two mouse models were employed: whole-body Tet2-KO mice and liver-specific Tet2 knockdown mice (AAV8-shTet2). The mice used for PTT, GTT and ITT were 8 weeks old and on HFD. To address the reviewer’s concern, we compared the body weight of WT and Tet2-KO mice, and the results revealed no significant differences in body weight between WT and Tet2-KO mice at 8 and 10 weeks old on a normal chow diet, as depicted in Figure 2I.

      Figures 3A-C shows that 48 hours after glucagon treatment, Tet2 and FBP1 mRNA increased. It's surprising that a single dose of glucagon would have effects that last that long. The peak rise in glucose following glucagon treatment occurs in 30 minutes. How do authors explain such a long effect of glucagon on Tet2 mRNA and protein?

      We thank the reviewer for the constructive comment. To address the reviewer’s concern, we examined the mRNA levels of TET2 and FBP1 at different time points following a single dose of glucagon in HepG2 cells. Interestingly, TET2 mRNA levels increased significantly, by 6-fold, at 30 min, and the effect of glucagon on Tet2 mRNA levels persisted for more than 48 hours (Fig. 3E). The mechanism underlying this prolonged effect of glucagon on Tet2 mRNA and protein requires further exploration.

      It's interesting that in Figure 3F, Fbp1 and Tet2 mRNA expression correlated positively in both ad libitum and fasting conditions. I would expect that during fed conditions, gluconeogenesis would not be activated and thus would expect no correlation.

      We thank the reviewer for the constructive comment. According to the results in the new Fig. 3H, the mRNA levels of Fbp1 and Tet2 are indeed positively correlated under both ad libitum and fasting conditions, although the r value is higher and the p value is lower under fasting than under ad libitum feeding. Notably, the expression levels of both Fbp1 and Tet2 increased under fasting, which is consistent with Fig. 1C and Fig. 4K.

      The authors state that "Our results demonstrated that HNF4α recruits TET2 to the FBP1 promoter and activates FBP1 expression through demethylation" What data points out that this is mediated through demethylation?

      We thank the reviewer for the constructive comment. Following the reviewer’s suggestion, we conducted new ChIP experiments. These data demonstrated that HNF4α recruits TET2 to the FBP1 promoter and activates FBP1 expression through demethylation, as shown in Fig. 4F-H.

      For Figures 5B, 4D, and 3L-N y-axes are labeled as fold enrichment. The authors should clearly indicate what was being measured on y-axes.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, we have clearly labeled the y-axes in every figure.

      The authors indicate that metformin increases phosphorylation of Hnf4a at ser 313 Figure 5C. How do we know that ser 313 is involved? Only one antibody is listed for Hnf4a (SAB, 32591).

      We thank the reviewer for pointing this out. We determined the phosphorylation level of HNF4α at S313 using an anti-HNF4α (phospho S313) antibody (ab78356); we apologize for not labeling it clearly. This is now indicated in Fig. 5C, and the antibody details have been added to the Methods section “Western Blot and Immunoprecipitation”.

      How did the authors make phosphomimetic mutation (S313D) and phosphoresistant mutation (S313A) of HNF4α? This is not described.

      We thank the reviewer for pointing this out. Following the reviewer’s suggestion, the detailed method for generating the phosphomimetic (S313D) and phosphoresistant (S313A) mutants of HNF4α was added to the Methods section “Gene Knockout Cells and Mutagenesis”.

      Reviewer #2 (Recommendations for the authors):

      Major points:

      (1) Other key gluconeogenesis genes (e.g. PEPCK and G6Pase) should have been investigated to demonstrate whether or not the regulation of TET-2 is specific on FBP-1.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, we used qPCR to assay other key gluconeogenesis genes, including PEPCK and G6Pase. The results showed that glucagon treatment had no effect on PEPCK and G6Pase expression (Fig. 3D), suggesting that the regulation by TET2 is specific to FBP1.

      (2) The methods are not well defined and more details should be given, for example, to explain how the Tet2 KO mice were generated. Since these animals are not KO liver-specific and TET2 is expressed in a variety of tissues and organs and is predominantly found in hematopoietic cells, including bone marrow and blood cells, the phenotype of these mice should be better characterized.

      We thank the reviewer for the helpful comment. The Tet2 knockout (Tet2 KO) mice were originally purchased from the Jackson Laboratory (strain No. 023359), and we added this information to the Methods section “Animal”. The previously reported phenotypes of Tet2 KO mice mainly concern the bone marrow, spleen, islets and heart. Specifically, Tet2 KO led to an increase in total cell numbers in the bone marrow and spleen (PMID: 21873190), as well as an elevated white blood cell (WBC) count (PMID: 37541212). Additionally, Tet2 KO mice exhibited splenomegaly (PMID: 37541212, PMID: 21723200, PMID: 38773071). In contrast, the morphology of the islets (PMID: 34417463), as well as the anatomical chamber volumes and ventricular functions (PMID: 38357791), were indistinguishable between Tet2 KO and wild-type (WT) mice.

      (3) An experiment showing the co-localization of TET2 and HNF4α in the mouse liver in fasted mice and/or in HFD-mice would strengthen the data shown in Figure 3.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, we conducted experiments showing the co-localization of TET2 and HNF4α in the livers of fasted mice and HFD mice, as shown in the new Fig. 4B and C.

      Minor points:

      (1) Given that the manuscript does not focus on the role of TET2 in the pathogenesis of T2D, its title should be changed.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, we changed the title of the manuscript to “HNF4α-TET2-FBP1 axis contributes to gluconeogenesis and type 2 diabetes”.

      (2) Please indicate the molecular weight of bands in all figures.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, the molecular weights of the bands are now indicated in all figures.

      (3) Why do the control values of the y-axis in Figure 1 A and B are so different? Please maintain the same scale in both figures.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, we recalculated and normalized the control value in Fig. 1A so that both figures use the same scale.

      (4) In Figure 2F, do the plasma insulin levels have altered in response to GTT in Tet2-KO mice? If so, please show the data and discuss.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, we measured plasma insulin levels during the GTT assay. The results revealed that Tet2-KO mice showed lower insulin levels after glucose administration, which reflects higher insulin sensitivity, as shown in the new Fig. 2H.

      (5) The increase of TET2 hepatic protein levels in response to fasting occur in other tissues and hematopoietic cells?

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, we examined Tet2 protein levels under fasting conditions in other tissues and in hematopoietic cells, and found that fasting also increased Tet2 protein levels in kidney, brain, and hematopoietic cells, but not in heart.

      Author response image 1.

      (6) Please indicate the glucagon concentration and metformin dose in all figures in which they are mentioned.

      We thank the reviewer for the helpful comment. Following the reviewer’s suggestion, the glucagon concentration (20 nM) and the metformin dose (10 mM for HepG2 cell treatment and 300 mg/kg per day for mouse treatment) were added to the corresponding figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a novel contribution that significantly advances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides insights into the intermediate steps of CMG formation. The study builds upon previously known structures of Sld3 and Cdc45 and offers new perspectives into how Cdc45 is loaded onto MCM DH through Sld3-Sld7. The most notable finding is the structural difference in Sld3CBD when bound to Cdc45, particularly the arrangement of the α8-helix, which is essential for Cdc45 binding and may also pertain to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a possible mechanism for its binding to MCM2NTD.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of experimental validation for the proposed Sld3-Sld7-Cdc45 model. Specifically, the claim that Sld3 binding to Cdc45-MCM does not inhibit GINS binding, a finding that contradicts previous research, is not sufficiently substantiated with experimental evidence. To strengthen their model, the authors must provide additional experimental data to support this mechanism. Also, the authors have not compared the recently published Cryo-EM structures of the metazoan CMG helicases with their predicted models to see if Sld3/Treslin does not cause any clash with the GINS when bound to the CMG. Still, the work holds great potential in its current form but requires further experiments to confirm the authors' conclusions.

      We appreciate the reviewers’ careful reading and the comments.

      Our structural analysis of Sld3CBD-Cdc45 showed the detailed interaction map between Sld3CBD and Cdc45 at 2.6 Å resolution. The Sld3-, MCM- and GINS-binding sites of Cdc45 are completely different, suggesting that Sld3CBD, Cdc45 and GINS could bind to MCM together. The SCMG-DNA model confirmed such a binding manner, although our study does not show how this binding manner affects GINS loading by other initiation factors (Dpb11, Sld2, etc.). The competition between Sld3 and GINS for binding to Cdc45 or Cdc45-MCM reported in previous studies (Bruck et al.) may be caused by the conformational change of the Cdc45 DHHA1 domain between Sld3CBD-Cdc45 and CMG. We modified our manuscript and discussed this point (P7/L168-173 and P10/L282-286). Following the comment, we compared the recently published cryo-EM structure of the metazoan CMG helicase (PDB ID: 8Q6O) with our predicted models (P7/L198-P8/L202) and added Cdc45 mutation experiments to confirm our conclusion ([Recommendations for the authors] Q18).

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. However, the work remains incomplete as several main claims are only partially supported by experimental data, particularly the proposed model for Sld3 interaction with GINS on the CMG. Additionally, the single-stranded DNA binding data from different species do not convincingly advance the manuscript's central arguments.

      Strengths

      (1) The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      (2) The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      (3) The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      (4) The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      (5) The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      Weaknesses

      (1) The proposed model for Sld3 interacting with GINS on the CMG needs more experimental validation and conflicts with published findings. These discrepancies need more detailed discussion and exploration.

      Our structural analysis of Sld3CBD-Cdc45 showed the detailed interaction information between Sld3CBD and Cdc45 at 2.6 Å resolution. The Sld3CBD-binding site of Cdc45 is completely different from the GINS- and MCM-binding sites of Cdc45, suggesting that Sld3CBD, Cdc45, and GINS could bind to MCM together. The SCMG-DNA model confirmed such a binding manner. Following the comment, we added an analysis of Cdc45 mutants that disrupt the binding to MCM and GINS but do not affect Sld3CBD binding (Supplementary Figure 9). Our model is consistent with the GINS-loading requirement (the phosphorylation of Sld3 on Cdc45-MCM) and has no discrepancies with the stepwise loading fashion (please see the responses to [Recommendations for the authors] Reviewer #1, Q14-15). Regarding the previous in vitro binding experiments showing competition between Sld3 and GINS for binding to Cdc45 or Cdc45-MCM (Bruck et al.), please see the responses to [Recommendations for the authors] Q6.

      (2) The section on the binding of Sld3 complexes to origin single-stranded DNA needs significant improvement. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      As suggested, we tried to improve the ssDNA-binding section (please see the responses to [Recommendations for the authors] Q4 and Q5). We used Sld7-Sld3CBD-Cdc45 from a different source because of limitations in protein expression. The two source organisms belong to the same family, and the proteins Sld7, Sld3 and Cdc45 show sequence conservation and have similar structures as predicted by AlphaFold3 (RMSD = 0.356, 1.392, and 0.891 for the Cα atoms of Sld7CTD, Sld7NTD-Sld3NTD, and Sld3CBD-Cdc45, respectively). This similarity at the source and protein levels allows us to make the comparison.
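For readers unfamiliar with the RMSD values quoted above, the following is a minimal sketch of how a Cα RMSD is computed once two structures have been superposed and their atoms paired. It is not the authors' procedure: the function name `rmsd` and the coordinates are hypothetical, invented purely for illustration, and the superposition step itself (for example a Kabsch alignment) is assumed to have been done beforehand.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumption: both coordinate sets are already superposed and listed in the
    same atom order (e.g. matching C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical C-alpha positions in angstroms; not taken from any real structure.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]
print(round(rmsd(model_a, model_b), 3))
```

A smaller value means the paired atoms sit closer together after superposition, which is the sense in which the response describes the predicted structures as similar.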

      (3) The authors' model proposing the release of Sld3 from CMG based on its binding to single-stranded DNA is unclear and needs more elaboration.

      Considering that ssDNA (ssARS1) is produced by CMG, ssDNA binding by Sld3 should happen after an active CMG has formed. Therefore, the results of the ssDNA-binding experiments imply that Sld3 release could be coupled to its binding to the ssDNA produced by CMG. We tried to elaborate on this point in the revised version (please see the responses to [Recommendations for the authors] Q4 and Q5).

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of Sld3-Cdc45-binding domain (CBD) with Cdc45 and a model of the dimer of an Sld3-binding protein, Sld7, with two Sld3-CBD-Cdc45 for the tethering. In addition, the authors showed the genetic analysis of the amino acid substitution of residues of Sld3 in the interface with Cdc45 and biochemical analysis of the protein interaction between Sld3 and Cdc45 as well as DNA binding activity of Sld3 to the single-strand DNAs of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamers at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be carefully evaluated with more quantitative ways to strengthen the authors' conclusion.

      We thank you for your positive assessment. We provided more quantitative information and tried to quantify the experiments as suggested (please see the responses to [Recommendations for the authors]).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have several concerns that I will outline below, accompanied by my suggestions.

      (1) The title of the paper, "Structural and functional insights into Cdc45 recruitment by Sld7-Sld3 for CMG complex Formation," appears misleading because it suggests that the authors present a structure of Sld3-Sld7 in complex with Cdc45, which is not the case here. If the authors can provide additional structures proving the function of this complex, then this title is justified. Otherwise, I recommend a title that reflects the presented work in its current form.

      Following the comment, we changed the title to “Sld3CBD-Cdc45 structural insights into Cdc45 recruitment for CMG complex formation”.

      (2) In lines 70-72, where the authors mention the known structures of different proteins, intermediates, and complexes, I recommend including PDB IDs of the described structures and reference citations. This will help the readers to analyze what is missing in the pathway and why this structure is essential.

      Following the comment, we added PDB IDs and references (P3/L72-74).

      (3) The representation of Figure 1A is unclear and looks clumsy. If the structure were rotated to another orientation, with α8 and α9 displayed on the front side, it would be easier to understand the complex-forming regions by looking at the structure. Also, I recommend highlighting α8 and α9 in a contrasting color so that they are easily visible and attract readers' attention. Similarly, it would be helpful if DHHA1 were shown in a different color.

      Following the comment, we modified Figure 1 in the revised version to show α8 and α9 of Sld3CBD and DHHA1 of Cdc45 clearly.

      (4) Can authors add a supplementary figure showing the probability of disorderness of the α8 helix region in the Sld3? Also, highlight what region became ordered in their structure.

      Yes, we have shown the disordered α8 helix region and highlighted the ordered α8 of Sld3 in Figure S4A.

      (5) Can you compare the Cdc45 long distorted helix (Supplementary Figure 3B) in the Sld3-Cdc45 complex with the Xenopus and Drosophila Cdc45 from their CMG structures? Also, can the authors explain why this helix is destabilized in their structure but is relatively stable in other Cdc45 structures (in CMG and HuCdc45)?

      We checked Cdc45 in all published cryo-EM CMG structures, including Xenopus CMG-donson (8Q6O) and Drosophila CMG (6RAW); the long helix is ordered in all of these CMG complexes, whereas it is disordered in the crystal structures of Sld3CBD-Cdc45 and Entamoeba histolytica Cdc45. Among the crystal structures, the long helix appears to be stabilized by crystal packing only in huCdc45; we therefore suggest that this long helix is prone to disorder under crystallization conditions.

      (6) I recommend adding the following parameters to Supplementary Table 2: 1. Rmerge values, 2. Wilson B factor, 3. Average B factor, and 4. Total number of molecules in ASU.

      We apologize for the mistake regarding Rmerge in Table 2, which we have corrected. We added the Wilson B factor, the average B factor, and the total number of Sld3CBD-Cdc45 molecules in the ASU.

      (7) Can authors provide the B factor values of the α8 helix of Sld3?

      We checked the B factor values of the helix α8CTP of Sld3 in Sld3CBD-Cdc45. Since this helix binds to Cdc45 stably, the average B factor of the main chain is 45 Å<sup>2</sup> less than that of the whole structure. We added the average B factor of helix α8CTP into the Supplementary Figure 4A legend.

      (8) Can authors explain why higher Ramachandran outliers exist in their structure? Can it be reduced below 1% during refinement?

      There are 13 outliers (1.67%) in different places: four are close to the disordered regions (poor electron density map), four are in a loop with a poor map, and the remainder are in turns or a loop. For the residues with poor electron density, we could not move them into the allowed Ramachandran region without raising Rfree, so we could not reduce the outliers to below 1% during refinement while keeping the current Rfree value.
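To make the reviewer's 1% threshold concrete, here is a small worked calculation. It is only a sketch: the total number of residues assessed is not stated in the response, so it is inferred here from the reported 13 outliers and 1.67%, and the variable names are hypothetical.

```python
import math

outliers_reported = 13
outlier_fraction_reported = 0.0167  # 1.67%, as stated in the response

# Assumption: the total residue count is back-calculated, not taken from the paper.
residues_assessed = round(outliers_reported / outlier_fraction_reported)

# Largest outlier count that is still strictly below 1% of the residues.
max_outliers_below_1pct = math.ceil(0.01 * residues_assessed) - 1
outliers_to_fix = outliers_reported - max_outliers_below_1pct

print(residues_assessed, max_outliers_below_1pct, outliers_to_fix)
```

Under this inferred residue count, getting below 1% would mean resolving several of the outliers, a number comparable to the eight that the authors attribute to regions with poor electron density.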

      (9) In Supplementary Figure 8, please show the CD spectra of the Sld3WT. Why is the Sld3-3S peak relatively flat? Was the sample precipitating while doing the measurements, or does it have less concentration than others?

      To check the folding of the mutants, we performed CD experiments and estimated the secondary structure elements. Because WT Sld3CBD was prepared in complex with Cdc45, whereas the Sld3CBD mutants existed alone, we calculated the secondary structure elements of the WT from the crystal structure of Sld3CBD-Cdc45. The sample concentrations were adjusted to the same level for the CD measurements. The relatively flat Sld3-3S peak may have been caused by precipitation during the measurement.

      (10) Can authors generate the AlphaFold 3 models of the Sld3CBD-Cdc45-MCM-dsDNA and SCMG-dsDNA and compare them with the models they have generated?

      We tried to predict the Sld3CBD-Cdc45-MCM-dsDNA and SCMG-dsDNA structures using AlphaFold3. Although the results were similar to our models, many parts were disordered, so we did not use the predicted structures.

      (11) The authors say that the overall molecular mass of the Sld7-Sld3ΔC-Cdc45 was >400kDa on the SEC column. However, the column used for purifying this complex and the standards that were run on it for molecular weight calculations have not been written anywhere. If the Superdex 200 column was used, then the sample of more than 400kDa should not elute at the position shown in Supplementary Figure 2B. I recommend showing the standard MW plot and where the elution volume of the Sld7-Sld3ΔC-Cdc45 lies on the standard curve. Also, add how molecular weight calculations were done and the calculated molecular mass.

      Following the comment, we added a calibration measurement of the Superdex 200 16/60 column (SEC) with a standard sample kit to Supplementary Figure 2, showing that the molecular weight at the peak position was estimated to be > 400 kDa.

      (12) I also recommend using at least one of the techniques, either SEC-MALS or AUC, to calculate the actual molecular mass of the Sld7-Sld3ΔC-Cdc45 complex and to find its oligomeric state. If the authors want to prove their hypothesis that a dimer of this complex binds to MCMDH, it is essential to show that it exists as a dimer. Based on the current SEC profile, it appears as a monomer peak if the S200 SEC column is being used.

      As described in the response to (11), we added the standard MW plot (measured on a Superdex 200 16/60 column with a standard sample kit). The molecular weight at the peak elution position of Sld7-Sld3ΔC-Cdc45 was estimated to be 429 kDa. Considering that the Sld7-Sld3ΔC-Cdc45 dimer should be a flexible, elongated molecule, it could elute at a position corresponding to a larger molecular weight than the real one (158 × 2 kDa). We also tried to confirm the particle size using SEC-SAXS, as described in the response to the next question (13).
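The molecular weight estimate discussed above relies on a column calibration. As a hedged illustration only, the sketch below fits the commonly used linear relation between log10(MW) and elution volume to a set of standards and reads off an apparent molecular weight. The function names, the elution volumes of the standards, and the sample elution volume are all hypothetical; only the 429 kDa estimate and the 158 × 2 kDa expected dimer mass come from the response.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw_kda(elution_ml, standards):
    """Apparent MW (kDa) from a log10(MW) versus elution volume calibration.

    standards: list of (elution volume in mL, molecular weight in kDa).
    """
    volumes = [volume for volume, _ in standards]
    log_mw = [math.log10(mw) for _, mw in standards]
    slope, intercept = fit_line(volumes, log_mw)
    return 10 ** (slope * elution_ml + intercept)


# Hypothetical calibration points; the elution volumes are invented.
standards = [(50.0, 670.0), (62.0, 158.0), (74.0, 44.0), (84.0, 17.0)]
apparent_mw = estimate_mw_kda(54.0, standards)  # 54.0 mL is a made-up sample peak

# Values stated in the response: 429 kDa apparent versus a 158 x 2 kDa dimer.
apparent_over_expected = 429.0 / (2 * 158.0)
print(round(apparent_mw), round(apparent_over_expected, 2))
```

Because this kind of calibration assumes roughly globular standards, an elongated or flexible particle tends to elute earlier and so appear larger than its true mass, which is the argument the authors make for the gap between 429 kDa and the expected dimer mass.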

      (13) Dynamic light scattering is not the most accurate method for calculating intermolecular distance. I recommend using another technique that calculates the accurate molecular distances between two Cdc45 if Sld7-Sld3ΔC-Cdc45 is forming a dimer. Techniques such as FRET could be used. Otherwise, some complementary methods, such as SAXS, could also be used to generate a low-resolution envelope and fit the speculated dimer model inside, or authors could try negative staining the purified Sld7-Sld3ΔC-Cdc45 and generate 2D class averages and low-resolution ab initio models to see how the structure of this complex appears and whether it satisfies the speculated model of the dimeric complex.

      We tried both negative-staining TEM and SEC-SAXS experiments. The negative-staining TEM images were not good enough to generate 2D class averages and low-resolution ab initio models. The SEC-SAXS results gave a molecular weight of 370-420 kDa and an Rg > 85 Å, consistent with our conclusion from the SEC and DLS results but with a large error because the measurement temperature was 10-15°C (a limitation of the measuring equipment). The SEC-SAXS peak under the measurement conditions was not as sharp as during purification at 4°C, and the SAXS data are not good enough to build a molecular model, so we did not add them to our manuscript.

      (14) Authors mentioned in the introduction section (lines 72-73) that based on the single-molecule experiments, Cdc45 is recruited in a stepwise manner to MCMDH. If this is true and if Sld7-Sld3ΔC-Cdc45 forms a dimer, this is also true, then for stepwise recruitment, the dimer will have to break into monomers, and this will be an energy-expensive process for the cell. So, would such a process occur physiologically? Can the authors explain how this would physiologically happen inside the cell?

      Sld7-Sld3-Cdc45 consists of domains linked by long loops, so the dimer Cdc45-Sld3-[Sld7]2-Sld3-Cdc45 is flexible and elongated. Such a flexible dimer does not require the two Cdc45 molecules to bind to MCM DH simultaneously; they may bind in a stepwise manner. Dimer formation of Sld7-Sld3-Cdc45 is advantageous for efficient recruitment and for saving energy. Moreover, the Cdc45-Sld3-[Sld7]2-Sld3-Cdc45 on MCM DH that we propose could be one stage during CMG formation in the cell. Following the comment, we added these descriptions (P7/L194 and P10/L276-279).

      (15) Can authors show experimentally that a dimer of Sld7-Sld3ΔC-Cdc45 is binding to MCMDH and not a monomer in a stepwise fashion?

      In our study, we provided particle-size experiments showing the dimer of Sld7-Sld3-Cdc45 off MCM DH, and a model of SCMG indicating the dimer of Sld7-Sld3ΔC-Cdc45 on MCM DH. This question should be addressed in the future by cryo-EM of Sld7-Sld3-Cdc45-MCM DH or Sld7-Sld3-CMG. As noted in the response to Q14, the binding of the flexible Sld7-Sld3ΔC-Cdc45 dimer to MCM DH does not contradict the stepwise loading fashion; the dimer bound to MCM DH represents one stage.

      (16) Can authors highlight where Sld7 will lie on their model shown in Figures 3A and 3C, considering their model shown in 3B is true?

      We predict that the Sld7-Sld3-Cdc45 should be in a dimer form of Cdc45-Sld3-[Sld7]2-Sld3-Cdc45 based on the structures and the particle size analysis. The Sld7 dimer could be across MCM DH on the top of Figure 3A right and 3C right. However, we could not add the Sld7 molecule to the models because there is no interaction data between Sld7 and MCM.

      (17) In Supplementary Figure 10, can authors show the residues between the loop region highlighted in the dotted circle to show that there is no steric clash between the residues in that region of their predicted model?

      Following the comment, we added the residues in Supplementary Figure 10 (Supplementary Figure 11 in the revised version) to show no steric clash in our predicted model.

      (18) It is essential to show experimentally that Sld3CBD neighbors MCM2 and binds Cdc45 on the opposite side of the GINS binding site. I recommend that the authors design an experiment that proves this statement. Mutagenesis experiments for the predicted residues that could be involved in interaction with proper controls might help to prove this point. Since this is the overall crux of the paper, it has to be demonstrated experimentally.

      We thank the reviewer for the recommendation. Our structural analysis shows the interaction information between Sld3CBD and Cdc45 at 2.6 Å resolution. The Sld3CBD-binding site, GINS-binding site, and MCM-binding site of Cdc45 are completely different, indicating that Sld3CBD, Cdc45 and GINS could bind to MCM together. The SCMG model confirmed such a binding manner. Following the recommendation, we added an analysis of the Cdc45 mutants G367D and W481R, which were reported to disrupt the binding to MCM and GINS, respectively. Neither mutant affects the binding to Sld3CBD, as we predicted (Supplementary Figure 9B). We modified our manuscript and discussed this point more clearly (P7/L170-173).

      (19) I recommend rewriting the sentence in lines 208-210. During EMSA experiments, new bands do not appear; instead, there is no shift at lower ratios, so you see a band similar to the control for Sld3CBD-Cdc45. So, re-write the sentence correctly to avoid confusion when interpreting the result.

      Following the comment, we rewrote this sentence to "The ssDNA band remained (Figure 4B) and new bands corresponding to the ssDNA–protein complex appeared in CBB staining PAGE (Supplementary Figures 13) when the Sld3CBD–Cdc45 complex was mixed with ssDNA at the same ratio, indicating that the binding affinity of Sld3CBD–Cdc45 for ssDNA was lower than that of Sld3CBD alone” (P8/L226-229)

      (20) Since CDK-mediated phosphorylation of Sld3 is known to be required for GINS loading, the ssDNA binding affinity of phosphorylated Sld3 remains the same. I wonder what would happen if phosphorylated Sld3 were used for the experiment shown in Figure 4B.

      The CDK phosphorylation site is located at Sld3CTD and our ssDNA-binding experiment did not include the Sld3CTD, so phosphorylated Sld3 does not affect the results shown in Figure 4B.

      (21) Sld3CBD-Cdc45 has a reduced binding affinity for ssDNA, and Sld7-Sld3ΔC-Cdc45 and Sld7-Sld3ΔC have a similar binding affinity to Sld3CBD based on Figure 4B. It appears that Sld3CBD reduces the DNA binding affinity of Cdc45 or vice versa. Is it correct to say so?

      Our opinion is “vice versa”. Cdc45 reduces the ssDNA-binding affinity of Sld3CBD. Although we could not point out the ssDNA-binding sites of Sld3CBD, the surface charge of Sld3CBD implies that α8CTP could contribute to ssDNA-binding (Supplementary Figures 15).

      (22) Cdc45 binds to the ssDNA by itself, but in the case of Sld3CBD-Cdc45, the binding affinity is reduced for Sld3CBD and Cdc45. Based on their structure, can authors explain what leads to this complex's reduced binding affinity to the ssDNA? Including a figure showing how Sld7-Sld3CBD-Cdc45 interacts with the DNA would be a nice idea.

      Previous studies showed that Cdc45 binds more tightly to long ssDNA (> 60 bases) and that the C-terminus of Cdc45 is responsible for the ssDNA-binding activity. The structure of Sld3CBD-Cdc45 shows that the C-terminal DHHA1 domain of Cdc45 binds to Sld3CBD, which may reduce the ssDNA-binding affinity of Cdc45 in the Sld3CBD-Cdc45 complex. We agree that a figure showing how Sld7-Sld3CBD-Cdc45 interacts with ssDNA would be a nice idea. However, there is no detailed interaction information between Sld7-Sld3Δ-Cdc45 and ssDNA, so we could not provide a figure showing the ssDNA-binding manner. We added a figure showing the surface charges of Sld3CBD in Sld3CBD-Cdc45 and of Sld3NTD-Sld7NTD (Supplemental Figure 15).

      (23) Based on the predicted model of Sld7-Sld3 and Cdc45 complex, can authors explain how Sld7 would restore the DNA binding ability of the Sld3CBD?

      It can be considered that Sld7 and Sld3NTD could bind ssDNA. Although we did not perform the ssDNA-binding assay of Sld7, the Sld3NTD-Sld7NTD surface shows a large positive charge area which may contribute to ssDNA-binding (Supplemental Figure 15). We added the explanation (P9/L245-248).

      (24) It would be important to show binding measurements and Kd values of all the different complexes shown in Figure 4B with ssDNA to explain the dissociation of Cdc45 from Sld7-Sld3 after the CMG formation. I also recommend describing the statement from lines 224-227 more clearly how Sld7-Sld3-Cdc45 is loading Cdc45 on CMG.

      As the reviewer mentioned, binding measurements and Kd values for all the different complexes are important for explaining the dissociation of Sld7-Sld3 from CMG. A pull-down assay using chromatography may be affected by the balance between binding affinity and chromatography conditions. Therefore, we used EMSA with native PAGE, which is closest to the natural state. The disadvantage is that Kd values could not be estimated. Regarding lines 224-227, the ssARS1-binding affinity of Sld3 and its complexes should relate to the dissociation of Sld7–Sld3 from the CMG complex rather than to Cdc45 loading, because ssARS1 is unwound from dsDNA by the CMG complex after Cdc45 and GINS loading. We modified the description (P9/L248-251).

      (25) Can authors explain why SDS-PAGE was used to assess the ssDNA (See line 420)?

      We are sorry for making this mistake and corrected it to “polyacrylamide gel electrophoresis”.

      (26) In line 421, can the authors elaborate on a TMK buffer?

      We are sorry for this omission and added the content of the TMK buffer (P16/L453).

      (27) I am curious to know if the authors also attempted to crystallize the Sld7-Sld3CBD-Cdc45 complex. This complex structure would support the authors' hypothesis in this article.

      We tried to crystallize Sld7-Sld3Δ-Cdc45 but could not get crystals. We also tried using cryo-EM but failed to obtain data.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript would be strengthened if the authors acknowledged in greater detail how their work agrees with or disagrees with Itou et al. (PMID: 25126958 DOI: 10.1016/j.str.2014.07.001). The introduction insufficiently described the findings of that previous work in lines 63-64.

      We compared Sld3CBD in Sld3CBD-Cdc45 with the monomer reported by Itou et al. (PMID: 25126958 DOI: 10.1016/j.str.2014.07.001) in the section [The overall structure of Sld3CBD-Cdc45] and pointed out the structural similarities and differences (P5/L105-106), in particular the conformational change of Sld3CBD α8 on binding to Cdc45, which agrees with the mutant experiments of Itou et al. (P3/L126-127). The other Cdc45-binding site of Sld3CBD in the Sld3CBD-Cdc45 complex is α9, not the residues predicted in previous studies.

      (2) Figure 2. Could you please perform and present data from multiple biological replicates (e.g., at least two independent experiments) for each mutant strain? This would help ensure that the observed pull-downs (2A-B) and growth patterns (2C) are consistent and reproducible.

      We performed the pull-downs three times, each from co-expression through purification to the pull-down assay. We added descriptions to the method [Mutant analysis of Sld3 and Cdc45]. The growth experiments in Figure 2C were performed twice.

      (3) Figure 3B. The match between the predicted complex length and particle size measured by dynamic light scattering (DLS) is striking. Did the authors run the analysis with vehicle controls and particle size standards? There is no mention of these controls.

      Following the comment, we added the control data of buffer and standard protein lysozyme, and the descriptions to the method of [Dynamic light scattering].

      (4) Figure 4. In lines 216-217, the authors write that the binding of the K. marxianus complex "demonstrates that the presence of Sld7 could restore the single-stranded DNA binding capacity of Sld3." Another explanation is that complexes from each species bind differently. If the authors want to make a strong claim, they should compare the binding of complexes containing the same proteins.

      We agree with the comment that using samples from the same source would support a stronger claim. Due to limitations in protein overexpression, we used Sld7-Sld3ΔC-Cdc45 from a different source. The two sources belong to the same family (Saccharomycetaceae), and the proteins Sld7, Sld3 and Cdc45 are conserved in sequence and have similar structures as predicted by AlphaFold3 (RMSD = 0.356, 1.392, and 0.891 for Cα atoms of Sld7CTD, Sld7NTD-Sld3NTD, and Sld3CBD-Cdc45, respectively). Such similarity at the source and protein levels allows us to make the comparison. Moreover, we modified the description to “indicates that the presence of Sld7 and Sld3NTD could increase the ssDNA-binding affinity to a level comparable to that of Sld3CBD”.

      (5) The logic of the following is unclear: "Considering that ssDNA is unwound from dsDNA by the helicase CMG complex, Sld7-Sld3ΔC-Cdc45, and Sld7-Sld3C having a stronger ssDNA-binding capacity than Sld3CBD-Cdc45 may imply a relationship between the dissociation of Sld7-Sld3 from the CMG complex and binding to ssDNA unwound by CMG." (Lines 224-227). How do the authors imagine that the binding affinity difference due to Sld7 contributes to the release of Sld3? Please explain.

      Considering that ssARS1 is unwound from dsARS1 by the activated helicase CMG complex formed after loading Cdc45 and GINS, Sld3–Sld7 having a stronger ssARS1-binding affinity may provide an advantage for the dissociation of Sld7–Sld3 from the CMG complex. We modified the sentence of Lines 224-227 (P9/L248-251).

      (6) The authors suggest that the release of Sld3 from the helicase is related to its association with single-stranded ARS1 DNA. They refer to the work of Bruck et al. (doi: 10.1074/jbc.M111.226332), which demonstrates that single-stranded origin DNA inhibits the interaction between Sld3 and MCM2-7 in vitro. The authors selectively choose data from this previous work, only including data that supports their model while disregarding other data. This approach hinders progress in the field. Specifically, Bruck proposed a model in which the association of Sld3 and GINS with MCM2-7 is mutually exclusive, explaining how Sld3 is released upon CMG assembly. In Figure 3 of the authors' model, they suggest that Sld3 can associate with MCM2-7 through CDC45, even when GINS is bound. Furthermore, Bruck's work showed that ssARS1-2 does not disrupt the Sld3-Cdc45 interaction. Instead, Bruck's data demonstrated that ssARS1-2 disrupts the interaction between MCM2-7 and Sld3 without Cdc45. While we do not expect the authors to consider all data in the literature when formulating a model, we urge them to acknowledge and discuss other critical data that challenges their model. Additionally, it would be beneficial for the field if the authors include both modes of Sld3 interaction with MCM2-7 (i.e., directly with MCM or through CDC45) when proposing a model for how CMG assembly and Sld3 release occurs.

      In our discussion, we referred to Bruck’s data (doi: 10.1074/jbc.M111.226332) but did not discuss them further because we did not perform similar in vitro experiments, and we do not think that this omission hinders progress in the field. To promote research progress, a new experiment should provide a new proposal and updated knowledge. Although we do not know the exact positional relationship between Sld3 and Dpb11-Sld2 on MCM during GINS recruitment, the Sld3CBD-Cdc45 structure clearly shows that the Sld3CBD-binding site of Cdc45 is completely different from the sites at which GINS and MCM bind to Cdc45. The SCMG model confirmed this binding mode: Sld3, Cdc45 and GINS could bind together. The competition between Sld3 and GINS for binding to Cdc45 or Cdc45-MCM reported by Bruck et al. may be caused by the conformational change of Cdc45 DHHA1 between Sld3CBD-Cdc45 and CMG, or by the absence of other initiation factors (CMG formation is regulated by the initiation factors). We modified the discussion (P10/L282-286). Regarding ssARS1 binding, we did not discuss Bruck's finding that ARS1-2 does not disrupt the Sld3-Cdc45 interaction, because it does not conflict with our proposal, although it does not strengthen it either. We propose that the release of Sld3 and Sld7 from CMG could be associated with binding to ssARS1 unwound by CMG, but the dissociation of Sld3-Sld7 does not depend only on ssARS1 binding. The appearance of unwound ssARS1 causes a conformational change of CMG, which may be another event leading to Sld3-Sld7 dissociation. However, we do not have further experiments to confirm this, and Bruck’s ssDNA-binding experiment did not use all of Sld3, Cdc45 and MCM, so we do not discuss Bruck’s data further in the revised version (P11/L303-305).

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) Figure 1, Sld3CBD-Cdc45 complex: Please indicate the number of critical residues and those of alpha-helixes and beta-sheets in this Figure or Supplemental Figure to confirm the authors' claim.

      Following the comment, we added the number of alpha-helixes and beta-sheets with residue numbers in Figure 1, and Supplemental Figures 4 and 5. We also added a topology diagram (Supplemental Figure 3).

      (2) Figure 2A and B: Please quantify the interaction here with a proper statistical comparison.

      In the experiments of Figures 2A and 2B, we used a co-expression system to co-purify the complexes and check their binding. For quantifying, we added the concentrations of the samples used in the Method of [Mutant analysis of Sld3 and Cdc45].

      (3) Figure 3B, EMSA: If these are from the EMSA assay, at least free DNAs and protein-bound DNAs are present on the gel. However, the authors showed one band, which seems to be free DNA in Figure 3B and separately the smear band of the protein complex in Supplementary Figure 12, and judged the DNA binding by the disappearance of the band (line 207). Interestingly, in the case of Sld3CBD, there are few smear bands (Supplementary Figure 12). Where is DNA in this case? The disappearance could be due to the contaminated nucleases (need a control non-specific DNA). Without showing the Sld3CBD-DNA complex in the gel, the conclusion that the DNA binding activity of Sld3CBD-Cdc45 to DNA is lower than Sld3CBD alone (line 210) is very much speculative. The same is true for Sld7-Sld3dC-Cdc45.

      Please explain the method (EMSA) briefly in the main text and show a whole gel in both Figures. If the authors insist that the Sld3 DNA-binding activity is altered with Cdc45 (and MCM), it is better to perform a more quantitative DNA binding assay such as BIAcore (surface plasmon), etc.

      In the EMSA, we used SYBR (Figure 4B) and CBB (Supplementary Figure 13) staining to show the bands of ssDNA and protein, respectively. As the reviewer mentioned, the disappearance of the bands could be due to contaminating nucleases, so we performed control experiments with non-specific ssDNA using the same proteins, shown in Supplementary Figure 14. We are therefore convinced that the disappearance or persistence of the ssDNA bands reflects whether or not the ssDNA is bound to protein. We added these explanations to the text (P9/L242-244). As mentioned in the legend of Supplementary Figure 13, Sld3CBD could not enter the gel, even when bound to ssDNA, because its pI value exceeded the pH of the running buffer.

      Following the reviewer's comments, we attempted a pull-down experiment using a His-tag (C-terminal His-tag of Sld3CBD/Sld3ΔC). Unfortunately, we could not find conditions that balanced binding against the chromatography requirements.

      (4) Figure 3B: Please quantify the DNA binding here with a proper statistical comparison with triplicate.

      For EMSA (Figure 3B), we used samples at ssDNA:protein molar ratios of 1:0, 1:1, 1:2, 1:4 and 0:1, with 10 pM as 1 unit. We added the concentrations of the samples to the Method [Electrophoretic mobility shift assay for ssDNA binding].

      Following the comment, we tried to quantify the binding strength by integrating the grayscale of the bands in gel photos. However, we are concerned because this quantitative calculation through grayscale could not provide an accurate representation of results. Many sample groups cannot be run on one gel. Therefore, the gel differences in parameters cause large errors in the calculation as shown in Author response image 1. Although the calculated integral grayscale chart is consistent with our conclusion, we do not want to add this to our manuscript.
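
      To illustrate the grayscale integration described above, the following is a minimal sketch and not the procedure used for Author response image 1. It assumes a gel photo already loaded as a 2D list of pixel values with bright bands on a dark background; the function names, the rectangular band region, and the flat background value are all hypothetical.

```python
def integrated_band_intensity(image, rows, cols, background):
    """Sum background-subtracted pixel values over a rectangular band region.

    image: 2D list of grayscale values (assumed bright band on dark background).
    rows, cols: (start, stop) index pairs, stop exclusive (hypothetical region).
    background: flat background level to subtract (an assumption of this sketch).
    """
    r0, r1 = rows
    c0, c1 = cols
    total = 0.0
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Clip at zero so pixels darker than background do not cancel signal.
            total += max(image[r][c] - background, 0.0)
    return total


def fraction_bound(free_band_signal, dna_only_signal):
    """Estimate the bound fraction from the loss of the free-ssDNA band.

    Both signals must come from lanes on the same gel.
    """
    if dna_only_signal <= 0:
        raise ValueError("DNA-only reference signal must be positive")
    return 1.0 - free_band_signal / dna_only_signal


# Toy 4x4 "gel" with a 2x2 band of value 60 on a background of 10.
toy_gel = [
    [10, 10, 10, 10],
    [10, 60, 60, 10],
    [10, 60, 60, 10],
    [10, 10, 10, 10],
]
band_signal = integrated_band_intensity(toy_gel, (1, 3), (1, 3), background=10)
```

      Normalizing each lane to the DNA-only lane of the same gel, as `fraction_bound` does, is one way to limit the gel-to-gel differences mentioned above, but it does not remove them when sample groups are spread over several gels.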

      Author response image 1.

      (5) Because of poor writing, the authors need to ask for English editing.

      We are very sorry for the language. We asked a company (Editage, https://www.editage.jp) to have a native speaker revise the manuscript and used AI to recheck the English.

      Minor points:

      (1) Lines 47-58, Supplementary Figure 1: Although the sentences describe well how CMG assembles on the replication origin, the figure does not reflect what is written, but rather shows a simple schematic figure related to the work. However, for the general readers, it is very useful to see a general model of the CMG assembly. Then, the authors need to emphasize the steps focused in this study.

      Thank you for your thoughtful comments. We optimized Figure 1 and hope it will be more understandable to general readers.

      (2) Line 50, DDK[6F0L](superscript): what is 5F0L?

      We are sorry for this mistake; 6F0L is the PDB ID of the DDK structure. We deleted it.

      (3) Lines 68 and 69, ssDNA and dsDNA: should be "single-stranded DNA (ssDNA)" and "double-stranded DNA (dsDNA)" when these words appear for the first time.

      Following the comment, we modified it to “single-stranded DNA (ssDNA)” and “double-stranded DNA (dsDNA)” (P3/L68,70).

      (4) Line 84, Cdc45s: What "s" means here?

      We are sorry for this mistake; we modified it to “Cdc45”.

      (5) Line 87, Sld3deltaC: What is Sld3deltaC? This is the deletion of either the Cdc45-binding domain or the C-terminal domain.

      Sld3ΔC is a deletion of the C-terminal domain of Sld3. We added the residue range and explanation (P4/L91).

      (6) Line 103: Although the authors mentioned beta-sheets 1-14 in the text, there is no indication in Figures. It is impossible to see the authors' conclusion.

      The secondary structure elements of Sld3CBD-Cdc45 are shown in Supplementary Figures 4 and 5. Following the comment, we added a topology diagram of Sld3CBD and Cdc45 in the Sld3CBD-Cdc45 complex as Supplementary Figure 3 and added citations when describing structural elements.

      (7) Line 106, huCdc45: Does this mean human Cdc45? If so, it should be "human CDC45 (huCDC45)". Is the CMG form from budding yeast? Please specify the species.

      Yes, huCdc45 is human Cdc45. We modified it to “human CDC45 (huCdc45)”.

      (8) Line 107, Supplemental Figure 3B, black ovals: Please add "alpha7" in the Figure.

      Following the comment, we added a label of Cdc45 α7 to Supplemental Figure 3B and 3C (Supplemental Figure 4B and 4C in revised version).

      (9) Line 128, DHHA1: What is this? Please explain it in the text.

      Following the comment, we added the information on DHHA1 (P3/L75-77).

      (10) Line 130, beta13, and beta14: If the authors would like to point out these structures, please indicate where these sheets are in Figures.

      We added a topology diagram as Supplementary Figure 3 to show the β-sheet in DHH and added a citation in the text.

      (11) Line 133: Please add (Figure 1B) after the a8CTP.

      Following the comment, we added “(Figure 1C)” (1B is 1C in revised version) after the α8CTP (P6/L133).

      (12) Line 140: After DHHA1, please add (Figure 1C).

      Following the comment, we added the figure citation after the DHHA1 (P6/L140).

      (13) Line 142: After DHHA1, please add (Figure 1D).

      Following the comment, we added the figure citation after the DHHA1 (P6/L142).

      (14) Line 149, Sld3-Y seemed to retain a faint interaction with Cdc45. The Cdc45 band is too faint here. Moreover, as shown above, without the quantification with proper statistics, it is hard to draw this kind of conclusion.

      We agree that the Cdc45 band corresponding to Sld3-Y in the pull-down assay was very faint, so we performed an in vivo experiment (Figure 2C) to confirm this result.

      (15) Line 149, Figure 2A and B: What kind of interaction assay was used here? Simple pull-down. It seems to eluate from the column. If so, how do the authors evaluate the presence of the proteins in different fractions? Please explain the method briefly in the main text.

      Figure 2 shows a co-expression pull-down binding assay. To describe the co-expression pull-down experiments clearly, we added more explanations to the Methods [Mutation analysis of Sld3 and Cdc45].

      (16) Line 154-155: Please show the quantification to see if the reduced binding is statistically significant.

      Here, we explain why Cdc45-A retained its Sld3CBD-binding ability. Although mutant Cdc45-A has lost three hydrogen bonds with D344 of Sld3CBD, the remaining hydrogen-bond network maintains the contact between Sld3CBD and Cdc45.

      (17) Line 158, cell death: "No growth" does not mean cell death. Please rephrase here.

      Following the comment, we modified it to “no growth” (P6/L158).

      (18) Line 166: After CMG dimer, please add "respectively".

      Following the comment, we added the word “, respectively” after CMG dimer (P7/L178).

      (19) Line 194-195: I can not catch the meaning. Please rephrase here to clarify the claim. What are ssARS1-2 and ARS1-5?

      Following the comment, we added more information about ssDNA fragments at the beginning of this section (P8/L210-214).

      (20) Figure 4A and Supplemental Figure 12 top, schematic figure of ARS region. It is hard to catch. More explanation of the nature of the DNA substrates and much better schematic presentations would be appreciated.

      Following the comment, we added more information about ARS1 to the figure legend.

      (21) Figure 1A, dotted ovals should be dotted squares as shown in the enlarged images on the bottom.

      Following the comment, we modified Figure 1A and the legend to change the dotted ovals into dotted squares.

    1. Author response:

      We have reviewed the helpful feedback from the reviewers and would like to thank them for their careful consideration of our manuscript. By way of provisional response, we agree with many of the above points and plan to revise our manuscript accordingly.

      In an effort to replicate some of the heme trafficking-related experiments in the original paper using a C. elegans model of TDD, we were either unable to do so or demonstrated an alternative explanation for the findings we could partially reproduce. As the reviewers correctly point out, there were some methodological and reagent-related differences between the study by Sun et al. and our own that we will more directly highlight in a subsequent manuscript version. Additionally, where possible, we will attempt to replicate these experiments using the same protocol(s).

      We observed several phenotypic traits in the C. elegans model of TDD that were not described in prior studies. While we believe these features to be consistent with a bioenergetic problem in the worm, direct evidence for this is admittedly lacking in our original manuscript. We are actively engaged in experiments examining potential functions of HRG-9 and HRG-10 unrelated to heme trafficking and will consider which data best align with the scope of this study, thus warranting inclusion in a subsequent manuscript version. We will also provide a more comprehensive review of relevant data generated by other groups (e.g., lipid dysregulation, impaired autophagy, mitochondrial dysfunction in the absence of TANGO2) in the discussion section.

      Recommended improvements related to figure legends, terminology, and formatting will also be executed in our forthcoming version. On behalf of my co-authors and myself, thank you again for your time and effort improving this work.

    1. Author response:

      We thank both reviewers for their time and effort in considering our manuscript. We are pleased that the reviewers recognised the strength of our theoretical analysis and found it "elegant" and "reasonably accessible". We also acknowledge the suggestions made by both reviewers that the manuscript could be improved by more discussion of potential experiments. We were concerned not to make the original manuscript too long but, in the light of the reviewers' comments, we will submit a revised version with more details of the kinds of experiments that would build on the results that we have presented.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing, predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      There are some errors in the methodology, that require revisions.

      In particular, the main conclusions drawn by the authors refer to the Mendelian Randomization analyses. However, the authors made a few errors here that need to be reconsidered:

      (1) Many of the outcomes investigated by the authors are continuous outcomes, while the authors report odds ratios. This is not correct and should be revised.

      Thank you for your observation. We have revised the manuscript to ensure that the results for continuous outcomes are appropriately reported using beta coefficients, which indicate the change in the outcome per unit increase in exposure. This will accurately reflect the nature of the analysis and provide a clearer interpretation of continuous outcomes (lines 56-109).

      (2) Some of the odds ratios (for example the one for osteoporosis) are really small, while still reaching the level of statistical significance. After some checking, I found the GWAS data used to generate these MR estimates were processed by the program BOLT-LMM. This program is a linear mixed model program, which requires the transformation of the beta estimates to be useful for dichotomous outcomes. The authors should check the manual of BOLT-LMM and recalculate the beta estimates of the SNP-outcome associations prior to the Mendelian Randomization analyses. This should be checked for all outcomes as it doesn't apply to all.

      Thank you for your detailed feedback. We have reviewed all the GWAS data used in our MR analyses and confirmed that all GWAS of continuous traits had already been processed using BOLT-LMM, including age at menarche, age at first birth, BMI, frailty index, father's age at death, mother's age at death, DNA methylation GrimAge acceleration, age at menopause, eye age, and facial aging. Most of the dichotomous outcomes were not processed by BOLT-LMM, including late-onset Alzheimer's disease, type 2 diabetes, chronic heart failure, essential hypertension, cirrhosis, chronic kidney disease, early onset chronic obstructive pulmonary disease, breast cancer, ovarian cancer, endometrial cancer, and cervical cancer; the exception is osteoporosis. We have reprocessed the GWAS beta values for osteoporosis and repeated the MR analysis (lines 74-75; lines 366-373).
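
      For readers unfamiliar with the transformation the reviewer refers to, the sketch below shows the approximation given in the BOLT-LMM documentation for case-control traits: a linear-scale effect is divided by μ(1 − μ), where μ is the case fraction, to approximate a log odds ratio. The function name and the numerical inputs are hypothetical and are not taken from the osteoporosis GWAS used in the manuscript.

```python
import math


def linear_to_log_odds(beta_linear, se_linear, case_fraction):
    """Approximate a log odds ratio from a linear-mixed-model effect estimate.

    Uses log(OR) ~ beta / (mu * (1 - mu)), with mu the case fraction.
    The standard error is rescaled by the same factor.
    """
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    scale = case_fraction * (1.0 - case_fraction)
    return beta_linear / scale, se_linear / scale


# Hypothetical SNP: linear-scale beta 0.001, SE 0.0005, 2% of samples are cases.
log_or, log_or_se = linear_to_log_odds(0.001, 0.0005, 0.02)
odds_ratio = math.exp(log_or)
```

      With a low case fraction the rescaling factor is large, which is why untransformed linear-scale estimates can look like implausibly small effects when read as log odds ratios.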

      (3) The authors should follow the MR-Strobe guidelines for presentation.

      Thank you for your suggestion to follow the MR-STROBE guidelines for the presentation of our study. We appreciate the importance of adhering to these standardized guidelines to ensure clarity and transparency in reporting Mendelian Randomization (MR) analyses. We confirm that the MR components of our research are structured and presented following the MR-STROBE checklist. In addition to the MR analyses, our study also integrates Colocalization analysis, Genetic correlation analysis, Ingenuity Pathway Analysis (IPA), and population validation to provide a more comprehensive understanding of the genetic and biological context. While these analyses are not strictly covered by MR-STROBE guidelines, they complement the MR results by offering additional validation and mechanistic insights.

      We have structured our manuscript to separate these complementary analyses from the core MR results, maintaining alignment with MR-STROBE for the MR-specific components. The additional analyses are discussed in dedicated sections to highlight their unique contributions and avoid conflating them with the MR findings.

      (4) The authors should report data in the text with a 95% confidence interval.

      Thank you for your feedback. We have added the 95% confidence intervals for the reported data within the main text to enhance clarity and provide comprehensive context (lines 56-109). Additionally, the complete analysis data, including all detailed results, can be found in Table S3.
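
      As a brief illustration of the reporting convention discussed in points (1) and (4), the sketch below computes a 95% confidence interval for a beta coefficient (continuous outcome) and for an odds ratio (binary outcome, where the estimate is on the log-odds scale). It assumes a normal approximation; the function names and inputs are hypothetical and do not reproduce any value in Table S3.

```python
import math

# Two-sided 95% critical value of the standard normal distribution.
Z_95 = 1.959963984540054


def beta_ci(beta, se):
    """95% CI for an effect on a continuous outcome, reported as a beta."""
    return beta - Z_95 * se, beta + Z_95 * se


def odds_ratio_ci(log_or, se):
    """Odds ratio and its 95% CI from a log-odds estimate and standard error."""
    lower = math.exp(log_or - Z_95 * se)
    upper = math.exp(log_or + Z_95 * se)
    return math.exp(log_or), lower, upper


# Hypothetical estimates for illustration only.
beta_low, beta_high = beta_ci(0.10, 0.02)
or_point, or_low, or_high = odds_ratio_ci(0.0, 0.1)
```

      The interval for a beta is symmetric around the estimate, whereas the interval for an odds ratio is symmetric only on the log scale.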

      (5) The authors should consider correction for multiple testing

      Thank you for your comment regarding the need to consider correction for multiple testing. We agree that correcting for multiple comparisons is an important step to control for the possibility of false-positive findings, particularly in studies involving large numbers of statistical tests. In our study, we carefully considered the issue of multiple testing and adopted the following approach:

      Context of Multiple Testing: The tests we conducted were hypothesis-driven, focusing on specific relationships (e.g., genetic correlation, colocalization, and Mendelian Randomization). These analyses are based on a priori hypotheses supported by existing literature or biological relevance.

      Statistical Methods: Where applicable, we applied appropriate measures to account for multiple tests. For instance, in Mendelian Randomization, sensitivity analyses serve to validate the robustness of the results.

      We believe that the methodology and corrections applied in our study appropriately address concerns about multiple testing, given the hypothesis-driven nature of our analyses and the rigorous steps taken to validate our findings. If you feel that additional corrections are required for specific parts of the analysis, we would be happy to further clarify or revise as needed.
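
      For context on what the reviewer's request could involve, the following sketch implements two standard corrections, Bonferroni and Benjamini-Hochberg. It is illustrative only: the response above does not state that either procedure was applied, and the p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four exposure-outcome tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
bonf_p = bonferroni(raw_p)
bh_p = benjamini_hochberg(raw_p)
```

      Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate.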

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging, and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identified 128 fertility-related SNPs that are associated with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      Points that have to be clarified/addressed:

      (1) The antagonistic pleiotropy is an evolutionary theory pointing to the possibility that mutations that are beneficial for fitness (early life health and reproduction) may be detrimental later in life. As it concerns an evolutionary process and the authors focus on contemporary data from a single generation, more context is necessary on how this theory is accurately testable. For example, why and how much natural variation is there for fitness outcomes in humans?

      Thank you for these insightful questions. We appreciate the opportunity to clarify how we approach the testing of AP theory within a contemporary human cohort and address the evolutionary context and comparative considerations with the disposable soma theory.

      We recognize that modern human populations experience selection pressures that differ from those in the past, which may affect how well certain genetic variants reflect historical fitness benefits. Nonetheless, the genetic variation present today still offers valuable insights into potential AP mechanisms through statistical associations in contemporary cohorts. We believe that AP can indeed be explored in current populations by examining genetic links between reproductive traits and age-related health outcomes. In our study, we investigate whether certain genetic variants linked to reproductive timing—such as age at menarche and age at first birth—also correlate with late-life health risks. By identifying SNPs associated with both early-life reproductive success and adverse aging outcomes, we aim to capture the evolutionary trade-offs that AP theory suggests.

      Despite contemporary selection pressures that differ from historical conditions, there remains natural genetic variation in traits like reproductive timing and longevity in humans today. This diversity allows us to apply MR to test causal relationships between reproductive traits and aging outcomes, providing insights into potential AP mechanisms. Prior studies have demonstrated that reproductive behaviors exhibit significant heritability and have identified genetic loci associated with reproductive timing (1,2). This genetic variation facilitates causal inference in modern cohorts, despite environmental and healthcare advances that might modulate these associations (3). By leveraging genetic risk scores for reproductive timing, our study captures the necessary variability to assess potential AP effects, thus providing valuable insights into how evolutionary trade-offs may continue to influence human health outcomes.

      What do the genetic risk score distributions of the exposure data look like?

      Thank you for your question. Our study is focused on Mendelian Randomization (MR) analysis, which aims to infer causal relationships between exposures and outcomes. While genetic risk scores (GRS) provide valuable insights at an individual level, they do not directly align with our study's objective, which is centered on population-level causal inference rather than individual-level genetic risk assessment. In MR, we use genetic variants as instrumental variables to determine the causal effect of an exposure on an outcome. GRS analysis typically focuses on summarizing an individual's risk based on multiple genetic variants, which is outside the scope of our current research. Therefore, we did not perform or analyze the distribution of genetic risk scores, as our primary goal was to understand broader causal relationships using established genetic instruments.
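
      To make the instrumental-variable idea above concrete, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics. IVW is a commonly used MR estimator, but the response does not specify which estimator the authors used, so this is an assumption of the example. It also assumes independent instruments and ignores uncertainty in the SNP-exposure estimates; all inputs are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and standard error.

    beta_exposure: SNP-exposure effects, one per instrument.
    beta_outcome: SNP-outcome effects for the same SNPs.
    se_outcome: standard errors of the SNP-outcome effects.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2
        numerator += bx * by * weight
        denominator += bx ** 2 * weight
    if denominator == 0.0:
        raise ValueError("instruments have no effect on the exposure")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Two hypothetical SNPs whose outcome effects are half their exposure effects.
causal_beta, causal_se = ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.02])
```

      Each SNP contributes a ratio of its outcome effect to its exposure effect, and the IVW estimate is the precision-weighted average of these ratios across instruments.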

      Also, how can the authors distinguish in their data between the antagonistic pleiotropy theory and the disposable soma theory, which considers a trade-off between investment in reproduction and somatic maintenance and can be used to derive similar hypotheses? There is just a very brief mention of the disposable soma theory in lines 196-198.

      In our manuscript, we test AP theory specifically by examining genetic variants associated with reproductive timing and their association with age-related health risks in later life. MR and genetic risk scores allow us to assess these associations, directly testing the hypothesis that certain alleles enhancing reproductive success might have adverse effects on aging outcomes. This gene-centered approach aligns with AP’s premise of genetic trade-offs, enabling us to observe whether alleles associated with early-life reproductive traits correlate with increased risks of age-related diseases. Distinguishing from disposable soma theory, which would predict a general trade-off in energy allocation affecting somatic maintenance and not specific genetic effects, our data focuses on how certain alleles have differential impacts across life stages. Our findings thus support AP theory over disposable soma by highlighting the effects of specific genetic loci on both reproductive and aging phenotypes. However, future research could indeed explore the intersection of these theories, for example, by examining how resource allocation and genetic predispositions interact to influence longevity in various environmental contexts.

      (2) The antagonistic pleiotropy theory, used to derive the hypothesis, does not necessarily distinguish between male and female fitness. Would the authors expect that their results extrapolate to males as well? And can they test that?

      Emerging evidence suggests that early puberty in males is linked to adverse health outcomes, such as an increased risk of cardiovascular disease, type 2 diabetes, and hypertension in later life (4). A Mendelian randomization study also reported a genetic association between the timing of male puberty and reduced lifespan (5). These findings support the hypothesis that genetic variants associated with delayed reproductive timing in males might similarly confer health benefits or improved longevity, akin to the patterns observed in females. This would suggest that similar mechanisms of antagonistic pleiotropy could operate in males as well.

      In our study, BMI was identified as a mediator between reproductive timing and disease risk. Given that BMI is a common risk factor for age-related diseases in both males and females (6-9), it is plausible that similar mechanisms involving BMI, reproductive timing, and disease risk could exist in males. This shared mediator points to the possibility that, while reproductive timelines may differ, the pathways through which these traits influence aging outcomes may be consistent across genders.

      AP theory could potentially be tested in males, as the principles of the theory may extend to analogous reproductive traits in males, such as age at puberty and testosterone levels, which could similarly influence health outcomes later in life. However, as our current study focuses specifically on female reproductive traits, testing the AP theory in males is outside the scope of this work. We acknowledge the importance of exploring these mechanisms in males, and we hope that future research will address this by investigating male-specific reproductive traits and their relationship to aging and health outcomes.

      (3) There is no statistical analyses section providing the exact equations that are tested. Hence it's not clear how many tests were performed and if correction for multiple testing is necessary. It is also not clear what type of analyses have been done and why they have been done. For example in the section starting at line 47, Odds Ratios are presented, indicating that logistic regression analyses have been performed. As it's not clear how the outcomes are defined (genotype or phenotype, cross-sectional or longitudinal, etc.) it's also not clear why logistic regression analysis was used for the analyses.

      Thank you for your thoughtful comments regarding the statistical analyses and the clarification of methods and variables used in the study.

      Statistical Analyses Section: We have included a detailed explanation of all statistical analyses in the Methods section (lines 291–408), specifying the rationale for the choice of methods, the variables analyzed, and their relationships. Additionally, we have provided the relevant equations or statistical models used where appropriate to ensure transparency.

      Beta Values and Odds Ratios: In the Results section (starting at line 56), both Beta values and Odds Ratios are presented: Beta values were used for analyses of continuous outcomes to quantify the linear relationship between predictors and outcomes. Odds Ratios (ORs) were calculated for binary or categorical disease outcomes to describe the relative odds of an outcome given specific exposures or independent variables.

      Validation and Regression Analyses: For further validation of the MR results, we conducted analyses using the UK Biobank dataset (starting at line 162). Logistic regression analysis was then employed for disease risk assessments involving categorical outcomes (e.g., diseased or not).

      We hope that this clarifies the methods and their applicability to our study, as well as the rationale for the presentation of Beta values and Odds Ratios. If further details or refinements are required, we are happy to incorporate them.
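
      As a minimal illustration of how an Odds Ratio relates to a logistic regression coefficient, the sketch below exponentiates a log-odds estimate and its Wald confidence limits. The function name and the numeric inputs are hypothetical and are not taken from the manuscript.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with an approximate 95% Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient from a logistic model of a binary disease outcome.
or_est, ci_low, ci_high = odds_ratio_with_ci(beta=-0.105, se=0.03)
print(f"OR = {or_est:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

      An OR below 1 corresponds to a negative coefficient, and the interval excludes 1 exactly when the Wald interval for the coefficient excludes 0.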

      (4) Mendelian Randomization is an important part of the analyses done in the manuscript. It is not clear to what extent the MR assumptions are met, how the assumptions were tested, and if/what sensitivity analyses are performed; e.g. reverse MR, biological knowledge of the studied traits, etc. Can the authors explain to what extent the genetic instruments represent their targets (applicable expression/protein levels) well?

      Thank you for your insightful comments regarding the Mendelian Randomization (MR) analysis and the evaluation of its assumptions. Below, we provide additional clarification on how the MR assumptions were addressed, sensitivity analyses performed, and the representativeness of the genetic instruments (starting at line 314):

      Relevance Assumption (Genetic instruments are associated with the exposure): “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000).” “During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12).”

      Independence Assumption (Genetic instruments are not associated with confounders, Genetic instruments affect the outcome only through the exposure): Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded.

      Sensitivity Analyses Performed: A pleiotropy test was used to check whether the IVs influence the outcome through pathways other than the exposure of interest. A heterogeneity test was applied to assess whether the causal effect estimates vary across different IVs. Significant heterogeneity test results indicate that some instruments are invalid or that the causal effect varies depending on the IVs used. MRPRESSO was applied to detect and correct potential outliers of IVs with NbDistribution = 10,000 and threshold p = 0.05. Outliers would be excluded for repeated analysis. The causal estimates were given as odds ratios (ORs) and 95% confidence intervals (CI). A leave-one-out analysis was conducted to ensure the robustness of the results by sequentially excluding each IV and confirming the direction and statistical significance of the estimate from the remaining SNPs.

      Supplemental post-GWAS analysis: Colocalization analysis (starting at line 356), Genetic correlation analysis (starting at line 366).

      Our MR analysis adheres to the guidelines for causal inference in MR studies. By combining multiple sensitivity analyses and ensuring the quality of genetic instruments, we demonstrate that the results are robust and unlikely to be driven by confounding or pleiotropy.
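
      The heterogeneity and leave-one-out checks described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' pipeline: it uses a fixed-effect inverse-variance weighted (IVW) estimate built from per-SNP Wald ratios with first-order weights, and all function names and summary statistics are hypothetical.

```python
import math

def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate from per-SNP summary statistics.
    Returns (estimate, standard error, Cochran's Q)."""
    ratios = [y / x for x, y in zip(beta_exp, beta_out)]        # Wald ratios
    weights = [(x / s) ** 2 for x, s in zip(beta_exp, se_out)]  # 1 / var(ratio), first order
    est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations of SNP ratios from the pooled estimate
    q = sum(w * (r - est) ** 2 for w, r in zip(weights, ratios))
    return est, se, q

def leave_one_out(beta_exp, beta_out, se_out):
    """Recompute the IVW estimate with each SNP dropped in turn."""
    estimates = []
    for i in range(len(beta_exp)):
        keep = [j for j in range(len(beta_exp)) if j != i]
        est, _, _ = ivw([beta_exp[j] for j in keep],
                        [beta_out[j] for j in keep],
                        [se_out[j] for j in keep])
        estimates.append(est)
    return estimates

# Hypothetical SNP effects on the exposure and on the outcome (log-odds scale).
bx = [0.10, 0.20, 0.15, 0.08]
by = [0.052, 0.095, 0.080, 0.038]
sy = [0.010, 0.020, 0.015, 0.012]

est, se, q = ivw(bx, by, sy)
print("IVW estimate:", est, "SE:", se, "Q:", q, "df:", len(bx) - 1)
print("Leave-one-out estimates:", leave_one_out(bx, by, sy))
```

      Q is compared against a chi-square distribution with (number of SNPs minus 1) degrees of freedom; a leave-one-out estimate that shifts sharply when one SNP is removed flags that SNP as influential.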

      (5) It is not clear what reference genome is used and if or what imputation panel is used. It is also not clear what QC steps are applied to the genotype data in order to construct the genetic instruments of MR.

      The SNP selection steps are described in the Methods section, starting at line 314: “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000). Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded. During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12). If the effect allele frequency (EAF) was missing in the primary dataset, EAF would be collected from dbSNP (https://www.ncbi.nlm.nih.gov/snp/) based on the population to calculate the F value.” The SNP numbers of exposures for each outcome and the F statistics results are listed in supplemental table S2.
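
      To make the role of EAF in the F value concrete, the sketch below uses a commonly applied approximation: the variance explained by a SNP is taken as R² = 2 · EAF · (1 − EAF) · β² (with β in standard-deviation units of the exposure), and F = R²(N − 2)/(1 − R²). The response does not state the exact formula used, so this choice, the function name, and the input values are assumptions for illustration only.

```python
def snp_f_statistic(beta, eaf, n):
    """Approximate per-SNP F statistic (assumed formula, see lead-in).
    beta: SNP effect on the exposure in SD units
    eaf:  effect allele frequency
    n:    sample size of the exposure GWAS"""
    r2 = 2.0 * eaf * (1.0 - eaf) * beta ** 2   # variance explained by the SNP
    return r2 * (n - 2) / (1.0 - r2)

# Hypothetical instruments: (rsid, beta, eaf, n)
candidates = [
    ("rs_hypothetical_1", 0.050, 0.30, 100_000),
    ("rs_hypothetical_2", 0.005, 0.30, 100_000),
]

# Keep only instruments passing the conventional F > 10 threshold.
strong = [rsid for rsid, b, f, n in candidates if snp_f_statistic(b, f, n) > 10]
print(strong)
```

      Because R² depends on EAF, a missing allele frequency has to be filled from an external source before this form of the F statistic can be computed.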

      (6) A code availability statement is missing. It is understandable that data cannot always be shared, but code should be openly accessible.

      We have added it to the manuscript (starting at line 410).

      Reviewer #2 (Recommendations for the authors):

      (1) The outcomes seem to be genotypes (lines 274-288). In MR, genotypes are used as an instrument, representing an exposure, which is then associated with an outcome that is typically observed and measured at a later moment in time than the predictors. If both exposure and outcome are genotypes it is not clear how this works in terms of causality; it would rather reflect a genetic correlation. One would expect the genotypes that function as instruments for the exposure to have a functional cascade of (age-related) effects, leading to an (age-related) outcome. From line 149 the outcomes seem to be phenotypes. Can the authors please clearly explain in each section what is analyzed, how the analyses were done, and why the analyses were done that way?

      Thank you for your insightful comment. We understand the concern regarding the use of genotypes as both exposures and outcomes and the implications this has for interpreting causality versus genetic correlation. To clarify, in our study, the outcomes analyzed in the MR framework are indeed genotypes, starting from line 47. We use genotypes as instrumental variables for exposures, which are then linked to phenotypic outcomes observed at a later stage, in line with standard MR principles.

      To improve the robustness of the MR results, we validated the genetic associations in the population with phenotype data from UK Biobank (lines 162-203), and the detailed methods were listed in lines 385-408.

      (2) Overall, the English writing is good. However, some small errors slipped in. Please check the manuscript for small grammar mistakes like in sentences 10 (punctuation) and 33 (grammar).

      Thank you for your feedback. We appreciate your careful review and attention to detail. We thoroughly rechecked the manuscript for any grammatical errors, including punctuation and sentence structure, especially in sentences 11 and 35 in the revised manuscript, as suggested.

      (3) There is currently no results and discussion section.

      The manuscript was submitted as Short Reports article type with a combined Results and Discussion section. We have added the section title of Discussion.

      (4) Why did the authors not include SNPs associated with age at menopausal onset? See for example: https://www.nature.com/articles/s41586-021-03779-7.

      Thank you for this information. Our manuscript focuses on the antagonistic pleiotropy theory, which posits an inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (such as those influencing menarche and childbirth) may have costly consequences later in life. We therefore included only age at menarche and age at first childbirth as exposures in our research.

      (5) Can the authors include genetic correlations between menarche, age at first child, BMI, and preferably menopause?

      Thank you for your suggestion. We acknowledge that including genetic correlations between age at menarche, age at first childbirth, BMI, and menopause can provide valuable context to our analysis. While our current MR study sets age at menarche and age at first childbirth as exposures and menopause as the outcome, and we have already included results that account for BMI-related SNPs before and after correction, we recognize the importance of assessing genetic correlations.

      To address this, we calculated the genetic correlations between these traits to provide insight into their shared genetic architecture. This analysis helps clarify whether there is a significant genetic overlap between the two exposures and between exposure and outcome, which can inform and support the interpretation of our MR results. We appreciate your suggestion and have included these calculations to enhance the robustness and comprehensiveness of our study. In the genetic correlation analysis, LDSC software was applied and the genetic correlation values for all pairwise comparisons among age at menarche, age at first birth, BMI, and age at menopause onset were calculated (15,16). The results are listed in Table S6.

      (6) Line 39-40: that is not entirely true. There is also mounting evidence that socioeconomic factors cause earlier onset of menarche through stress-related mechanisms: https://doi.org/10.1016/j.annepidem.2010.08.006

      Thank you so much for your information. We changed it to “Considering reproductive events are partly regulated by genetic factors that can manifest the physiological outcome later in life”.

      (7) Why did the authors choose to work with studies derived from IEU Open GWAS, given that it often does not contain the most recent and relevant GWAS for a specific trait?

      We chose to work with studies derived from the IEU Open GWAS database after careful consideration of several sources, including the GWAS Catalog database and recently published GWAS papers. Our selection criteria focused on publicly available GWAS with large sample sizes and a higher number of SNPs to ensure robust analysis. For specific traits such as late-onset Alzheimer's disease and eye aging, we used GWAS data published in scientific articles to ensure that our research reflects the latest findings in the field.

      (1) Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat Genet 48, 1462-1472 (2016). https://doi.org/10.1038/ng.3698

      (2) Tropf, F. C. et al. Hidden heritability due to heterogeneity across seven populations. Nat Hum Behav 1, 757-765 (2017). https://doi.org/10.1038/s41562-017-0195-1

      (3) Stearns, S. C., Byars, S. G., Govindaraju, D. R. & Ewbank, D. Measuring selection in contemporary human populations. Nat Rev Genet 11, 611-622 (2010). https://doi.org/10.1038/nrg2831

      (4) Day, F. R., Elks, C. E., Murray, A., Ong, K. K. & Perry, J. R. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK Biobank study. Sci Rep 5, 11208 (2015). https://doi.org/10.1038/srep11208

      (5) Hollis, B. et al. Genomic analysis of male puberty timing highlights shared genetic basis with hair colour and lifespan. Nat Commun 11, 1536 (2020). https://doi.org/10.1038/s41467-020-14451-5

      (6) Field, A. E. et al. Impact of overweight on the risk of developing common chronic diseases during a 10-year period. Arch Intern Med 161, 1581-1586 (2001). https://doi.org/10.1001/archinte.161.13.1581

      (7) Singh, G. M. et al. The age-specific quantitative effects of metabolic risk factors on cardiovascular diseases and diabetes: a pooled analysis. PLoS One 8, e65174 (2013). https://doi.org/10.1371/journal.pone.0065174

      (8) Kivimaki, M. et al. Obesity and risk of diseases associated with hallmarks of cellular ageing: a multicohort study. Lancet Healthy Longev 5, e454-e463 (2024). https://doi.org/10.1016/S2666-7568(24)00087-4

      (9) Kivimaki, M. et al. Body-mass index and risk of obesity-related complex multimorbidity: an observational multicohort study. Lancet Diabetes Endocrinol 10, 253-263 (2022). https://doi.org/10.1016/S2213-8587(22)00033-X

      (10) Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat Genet 50, 912-919 (2018). https://doi.org/10.1038/s41588-018-0152-6

      (11) Gao, X. et al. The bidirectional causal relationships of insomnia with five major psychiatric disorders: A Mendelian randomization study. Eur Psychiatry 60, 79-85 (2019). https://doi.org/10.1016/j.eurpsy.2019.05.004

      (12) Burgess, S., Small, D. S. & Thompson, S. G. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res 26, 2333-2355 (2017). https://doi.org/10.1177/0962280215597579

      (13) Staley, J. R. et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207-3209 (2016). https://doi.org/10.1093/bioinformatics/btw373

      (14) Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 35, 4851-4853 (2019). https://doi.org/10.1093/bioinformatics/btz469

      (15) Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015). https://doi.org/10.1038/ng.3406

      (16) Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015). https://doi.org/10.1038/ng.3211

    1. Author response:

      We thank the reviewers for their thoughtful comments and suggestions. We plan to make a number of revisions to the manuscript to address their feedback.

      Firstly, we plan to incorporate feedback related to our modeling approach. We will provide justification for the chosen models and why this dataset is not appropriate for an in-depth exploration of other models. In particular, we will highlight that the models included in this manuscript were taken from Langdon et al. (2019) with a minor extension. Model development and validation in the Langdon et al. (2019) paper required a dataset with >100 rats per task. As the current n per variant is 28-32, and behavioral performance on this task is highly variable, it would be difficult to sufficiently test the validity of models that majorly depart from the previously tested RL models. Nevertheless, we will acknowledge this as a limitation in the discussion section. Additionally, we will test some alternatives suggested by reviewers that fall within the scope of the current RL modeling framework (e.g., comparison to a standard delta-rule update for unrewarded choices). We will address other concerns brought up by reviewers by a.) providing a rationale for why we constrained our analyses to the first five sessions, b.) simulating data for sessions that match those that were analyzed in the real data (i.e., sessions 35-40 instead of 18-20), and c.) including a figure of the simulated choice probabilities rather than just risk score.
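
      For readers unfamiliar with the comparison mentioned above, a standard delta-rule update moves the value of the chosen option toward the obtained outcome by a fraction set by the learning rate. The sketch below is a generic textbook form with a softmax choice rule; it is not the Langdon et al. (2019) model, and the option count, parameter values, and function names are hypothetical.

```python
import math

def delta_rule_update(q, outcome, alpha):
    """Standard delta rule: shift value q toward the outcome by learning rate alpha."""
    return q + alpha * (outcome - q)

def softmax(values, beta=1.0):
    """Convert option values into choice probabilities (inverse temperature beta)."""
    m = max(values)                                  # subtract max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical task with four options, all starting at value 0.
q_values = [0.0, 0.0, 0.0, 0.0]

# One rewarded choice of option 1, then one unrewarded choice of option 2.
q_values[1] = delta_rule_update(q_values[1], outcome=1.0, alpha=0.1)
q_values[2] = delta_rule_update(q_values[2], outcome=0.0, alpha=0.1)

print(q_values)
print(softmax(q_values, beta=3.0))
```

      Under this rule an unrewarded choice with outcome 0 simply decays the chosen value toward zero, which is the baseline against which alternative treatments of unrewarded trials can be compared.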

      Secondly, we will include additional analyses and clarify the current statistical approach to address comments on how the data were analyzed. We will include an analysis of task acquisition to investigate when choice preferences emerge across the different variants. We will justify the statistical approach used for detecting behavioral differences between task variants, including a better explanation of the inclusion of the risky/optimal label as a between-subjects factor in the ANOVAs. We will also expand the section on parameters predicting risk preference on the rGT to fully explain the statistical method used and provide a figure of the results.

      Lastly, we will provide a more detailed rationale for the reinforcer devaluation test, and describe the hypothesis it tests. We will also expand on how the results from the devaluation test support our conclusions, and address alternative explanations suggested by the reviewers.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1: 

      (1) As discussed in review and nicely simulated by the authors, the large figure error indicated by profilometry (~10 um in some cases on average) is inconsistent with the optical performance improvements observed, suggesting that those measurements are inaccurate.

      I see no reason to include these inaccurate measurements.  

      We agree with the Referee and removed the indicated figure (old Supplementary Fig. 4) and data.

      Reviewer #3:

      (1) It would be interesting to comment on how the addition of a coverslip changes the performance of the uncorrected microendoscope compared to the use of bare grin lenses. 

      We modified the discussion section (page 18) and added a new reference (#36) to include the request of the Referee.

      (2) In Figure 6C-H, the authors can indeed show data corresponding to all detected cells, but I still think that the statistics should be calculated using the same effective FOV. 

      We modified Figure 6 legend to include the request of the Referee.

      (3) Authors could present the images in Figures 4-6 as in the original version, with a scale bar in the centre of the FOV that is different for the two types of objectives (corrected vs uncorrected). They could add a short justification for this choice, and perhaps present the other version for Figure 4 in a supplementary information sheet (with similar scale bars at the centre of the FOV for both types of objectives). It would allow readers to appreciate that the FOV still appears significantly enlarged with this other presentation.

      As requested by the Referee, we modified the text in the Result section (page 11) and added the additional version of Figure 4 as Figure 4-figure supplement 1.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including unclear efficacy of longer-duration climbing fiber activity suppression.

      We sincerely appreciate the thoughtful feedback provided by the reviewer regarding our study on the role of climbing fibers in cerebellar learning. Each point raised has been carefully considered, and we are committed to addressing them comprehensively. We acknowledge the importance of addressing methodological concerns, particularly regarding the efficacy of long-term suppression of CF activity, as well as ensuring clarity regarding the penetrance and selectivity of our manipulation. To this end, we have outlined plans for substantial revisions to the manuscript to adequately address these issues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term manipulation of CF activity is effective, thus undermining confidence in the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding regarding prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by the demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulations (especially when using the yellow laser, as is done in this work; see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC Biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript of phase-specific involvement of CF activity in OKR learning cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      We appreciate the reviewer’s thorough evaluation, which thoughtfully highlights the strengths and areas for improvement in our study.

      We agree with the reviewer’s recognition of the novelty of our approach, particularly in specifically perturbing climbing fiber (CF) activity in the flocculus and examining its effects across distinct phases of learning. Additionally, our use of the well-established OKR behavior paradigm provides a robust framework for investigating cerebellar learning processes, further strengthening our study.

      To address concerns regarding the efficacy of long-term optogenetic inhibition and the specificity of viral targeting, we conducted additional experiments. These include in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase. To ensure precise targeting and mitigate potential side effects, such as unintended modification of Purkinje cell (PC) simple spike activity, we demonstrated that optogenetic suppression of CF transmission did not affect simple spike firing. Furthermore, we made additional characterizations to confirm the specificity of viral targeting.

      Lastly, we recognize the importance of exploring alternative mechanisms underlying CF involvement in cerebellar learning. Accordingly, we expanded the manuscript to provide a more comprehensive discussion of these mechanisms, offering a clearer perspective on the broader implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weakens the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the reviewer’s recognition of the significance of our study in addressing the fundamental question of the role of CF in adaptive learning within the cerebellar field. The use of optogenetic tools indeed provides a direct means to investigate the causal relationship between CF activity and learning outcomes.

      To address concerns regarding the effectiveness of CF suppression during consolidation, we plan to conduct further in-vivo recordings. These will demonstrate how reliably CF transmission can be suppressed through optogenetic manipulation over an extended period.

      In response to the concern about potential tissue damage from laser stimulation, we believe that our optogenetic manipulation was not strong enough to induce significant heat-induced tissue damage in the flocculus. According to Cardin et al. (2010), light applied through an optic fiber may cause critical damage if the intensity exceeds 100 mW, which is eight times stronger than the intensity we used in our OKR experiment. Furthermore, if there had been tissue damage from chronic laser stimulation, we would expect to see impaired long-term memory reflected in abnormal gain retrieval results tested the following day. However, as shown in Figures 2 and 3, there were no significant abnormalities in consolidation percentages even after the optogenetic manipulation.

      Finally, we appreciate the reviewer’s recognition of the challenges involved in pinpointing specific neural mechanisms. We plan to expand the discussion to address these complexities and outline future research directions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Inhibitory optogenetic actuators are generally problematic, especially in time frames longer than seconds. If the authors wish to be able to inhibit activity in the flocculus-targeting CFs for a long time, maybe it would make sense to try to retrogradely transfect the IO neurons from the flocculus (using a cre-lox approach) with inhibitory DREADDs. This approach is also full of problems, so the absence or significant decrease in CS activity throughout the period of manipulation must be demonstrated.

      In addition to re-examining the strength of the evidence regarding the role of CFs in the consolidation and retrieval phases, the manuscript would benefit from significant reworking of the details in the manuscript and figures. Below is a possibly incomplete list of things we would want to highlight:

      (1) While the text states the authors "... verified the potential reduction of Cs firing rate in PCs of awake mice in vivo by inhibiting CF signals", neither the data nor a figure is shown. This is of critical importance when judging the reliability of the following results. The data presented in panels Figure 1D-E should also be improved to be more informative; specifically, the waveforms of EPSCs should be shown in higher resolution. We are not informed about how many cells/slices/animals the results are obtained from, nor how many trials were done per condition. Finally, the in vitro data is from vermal Purkinje neurons, while the focus of the work is in the flocculus. Please provide these verifications for the flocculus.

      To verify the suppression of complex spike (Cs) activity, we conducted additional in-vivo experiments and added Figure 2, which presents recordings of Cs firing rates from Purkinje cells (PCs) during optogenetic suppression of climbing fiber (CF) activity. These data demonstrate that the suppression specifically and robustly targets Cs activity without affecting simple spike firing, as shown in Figure 2C. The results presented in Figure 2 were acquired at 40 minutes of optostimulation, consistently showing effective suppression of Cs activity throughout this period. While continuous recordings over several hours were not performed, the stability and sustained suppression observed at the 40-minute mark strongly suggest that the manipulation remains effective during the extended durations required for the behavioral tests.

      Additionally, we have improved Figure 1D by enhancing the resolution of EPSC waveforms and including more detailed information in the figure legend regarding the number of cells and animals analyzed. For the current-clamp mode data (Figures 1E and F), we clarified the experimental conditions to provide additional context. While the in vitro data were collected from vermal PCs, these experiments were intended to illustrate the fundamental properties of CF-PC transmission.

      (2) It is challenging to get a homogenous transfection of all CFs in a given region. To be able to judge the significance of the results, the readers should be provided with material allowing assessing the transfection quality. The images shown in panels Bi-ii are spatially restricted and of too low quality to make judgements. Also, it is not stated whether the images shown are from GFP or NpHR-transfected animals. These different payloads are delivered using different viral capsids (AAV1 vs. AAV9) that have significantly different transfection capacities and results from AAV9-CamKIIGFP cannot be generalized to AAV1-CamKII-NpHR. Please show the expression for the capsid used with NpHR.

      To clarify, the images in Figure Bi-ii are representative of GFP expression in animals transfected using AAV1-CamKII-EGFP. The purpose of these panels is to confirm the successful targeting of the region of interest rather than to evaluate viral tropism or capsid-specific transfection efficiency. Moreover, while the transfection characteristics of AAV1 and AAV9 may differ, the key experimental parameter of effective CF suppression was validated through in-vivo electrophysiological recordings, which robustly confirm the efficacy of NpHR expression.

      (3) Finally, please show the location of the optic fiber implant in the flocculus from post-mortem images.

      In Figure 3a of our revised manuscript, we added post-mortem histological images showing the exact location of the optic fiber implants in the flocculus. These images confirm that the optogenetic stimulation was targeted to the correct anatomical region, ensuring that the observed effects are attributable to CF manipulation in the flocculus.

      Reviewer #2 (Recommendations For The Authors):

      (1) The efficacy of CF suppression is questionable. The histology in Figure 1 shows that only a handful of CFs are transduced in their approach. This observation casts doubt on the claimed complete suppression of CF-evoked EPSCs in every recorded PC in the same figure. This necessitates a more detailed explanation for this apparent discrepancy. Also, the absence of current-clamp recordings to measure the effect on CF-evoked complex spiking in PCs and the lack of detail regarding the timing of optogenetic actuation (continuous or pulsed) during these slice experiments are also significant omissions.

      We are providing additional in vivo electrophysiological recordings showing sustained CF suppression in awake animals (Figure 2). These recordings directly demonstrate the extent of CF-evoked complex spike (Cs) suppression.

      Moreover, we have included additional data of current-clamp recordings to measure the impact of CF suppression on Cs activity (Figures 1E and 1F). Regarding the timing of the optogenetic actuation, the stimulation was applied continuously in the slice experiments.

      (2) The authors claim that their method effectively suppresses CF activity in vivo, yet they do not present any supporting data. Given the histological evidence provided, it's questionable whether their approach truly impacts the CF population broadly, casting doubts on the efficacy of their suppression approach to identify the role of CFs during behavior. To address these concerns, further experiments and detailed quantification are essential to validate the extent and uniformity of CF suppression achieved.

      As we responded earlier, we conducted additional in-vivo experiments with continuous recordings of CF-evoked complex spike (Cs) activity during optogenetic suppression (Figure 2). These data directly demonstrate effective and sustained inhibition of CF transmission throughout the behavioral experiments. Quantification of CF suppression revealed consistent inhibition across the manipulation period, with no observable alterations in Purkinje cell simple spike firing rates, confirming that our intervention specifically targeted CF activity without off-target effects. In addition to the in-vivo data, the in-vitro data presented in Figure 1 (lines 107~116) further validate the efficacy of our optogenetic manipulation, showing consistent suppression of CF transmission without any failures. These findings collectively confirm the reliability and specificity of our suppression approach for studying CF contributions to behavior.

      (3) To optogenetically test the role of CFs in memory consolidation, the authors deliver continuous, high-power light to the flocculus (13 mW for 6 hrs). This extends well beyond typical experimental conditions. The sustained nature of the light exposure thus brings into question the consistency and reliability of CF suppression over time. Firstly, it is imperative to determine whether CF activity is suppressed throughout this extended period. Secondly, the intensity and duration of light exposure carry a significant risk of causing extensive damage to the surrounding tissue. Given these concerns, a thorough histological examination is warranted to assess the potential adverse effects on tissue integrity. Such an analysis is crucial not only for validating the experimental outcomes but also for ensuring that the observed effects are not confounded by light-induced tissue damage.

      To address whether CF activity is suppressed throughout the extended period, we included new in-vivo recordings demonstrating robust suppression of CF transmission, as evidenced by inhibited complex spikes sustained at 40 minutes of optostimulation. Regarding potential tissue damage, our optogenetic protocol used a light intensity (13 mW), which is much lower than the 75 mW threshold reported by Cardin et al. (2010) as sufficient to maintain normal neuronal activity. Moreover, critical damage typically requires intensities exceeding 100 mW for several hours (Cardin, Jessica A., et al. "Targeted optogenetic stimulation and recording of neurons in vivo using cell-type-specific expression of Channelrhodopsin-2." Nature protocols 5.2 (2010): 247-254.). Finally, we observed no abnormalities in long-term memory consolidation or gain retrieval (Figures 3C, 4C, 4F), further supporting that our light stimulation did not induce tissue damage.

      (4) The generalizability of their findings to various learning behaviors remains uncertain. Given that the flocculus plays a role in vestibulo-ocular reflex (VOR) adaptation, which encompasses both CF-dependent and CF-independent learning types (gain increase and gain decrease, respectively), this system could offer a more feasible approach for investigating hypotheses about the role of CFs in guiding distinct learning processes.

      In response to the reviewer’s comment on the generalizability of our findings to learning behaviors involving both CF-dependent and CF-independent mechanisms, we acknowledge the importance of examining these dynamics in cerebellar motor adaptation systems, such as the OKR. Although our study used an OKR task, findings from VOR studies apply here. Ke et al. (2009) demonstrated that VOR gain increases (CF-dependent) and gain decreases (CF-independent) involve distinct plasticity processes (Ke, Michael C., Cong C. Guo, and Jennifer L. Raymond. "Elimination of climbing fiber instructive signals during motor learning." Nature neuroscience 12.9 (2009): 1171-1179), suggesting that CF engagement is task-dependent, particularly for larger error signals that require CF-guided adaptation.

      Similarly, our OKR findings suggest that CF-dependent pathways are likely used for large, persistent errors, whereas CF-independent mechanisms may drive more gradual adjustments. This alignment between OKR and VOR systems supports the generalizability of CF-selective adaptation across cerebellar learning tasks. We have elaborated on this point in our revised manuscript (lines 219~237), clarifying how CF-dependent and CF-independent mechanisms can generalize across motor learning contexts in the cerebellum.

      (5) The acute effect of CF suppression on OKR eye movements warrants investigation. If OKR eye movements are altered by their method, this could complicate the interpretation of their results.

      During our experiments, we monitored ocular movements during CF optogenetic manipulation and found no aberrant effects, such as nystagmus. As shown in Figures 4G and 4H, disrupting CF signaling during gain retrieval did not alter the gain, confirming that our manipulation neither acutely affects ocular reflexes nor induces abnormal eye movement. We therefore conclude that the observed effects are specific to learning and memory processes.

      (6) The authors raise the potential issue of inducing presynaptic LTD in CFs. Can they be sure that their manipulation doesn't generate a similar effect? Additional controls or techniques to accurately interpret the results are needed considering this concern.

      Our discussion does not claim that optogenetic suppression directly induces CF-LTD. Instead, we posit that CF suppression may have mimicked the functional consequences of CF-LTD, such as reduced complex spike (Cs) activity and associated calcium signaling. This, in turn, may have indirectly interfered with the induction of parallel fiber-Purkinje cell (PF-PC) LTD, thereby preventing gain enhancement during learning.

      This hypothesis is consistent with previous studies highlighting the interplay between CF and PF synaptic plasticity in cerebellar motor learning. For example, Hansel and Linden (2000) and Weber et al. (2003) discuss how changes at CF synapses can modulate Cs waveforms and calcium dynamics, which are critical for PF-PC LTD. Coesmans et al. (2004) and Han et al. (2007) further elaborate on the necessity of CF input for effective PF-PC LTD induction during learning tasks such as retinal slip correction.

      While our experiments were not designed to directly measure CF-LTD, the observed prevention of gain enhancement aligns with the hypothesis that CF suppression functionally disrupted downstream PF-PC LTD. We have clarified these points in our revised manuscript (lines 250~258) to avoid misunderstanding.

      (7) The specific timeframe for OKR consolidation remains uncertain, with evidence from numerous studies indicating that cerebellar memory consolidation unfolds over several days. Therefore, a more thorough investigation into these extended durations, supported by control experiments to validate the outcomes, would significantly strengthen the study's conclusions, and provide clearer insights into the consolidation process of OKR learning.

      Our current study specifically focused on the early phase of the post-learning period, as supported by findings from several studies: Cooke et al. (2004); Titley et al. (2007); Steinmetz et al. (2016); Seo et al. (2024).

      These studies collectively indicate that cerebellar-dependent memory consolidation—including OKR—can occur rapidly during the early consolidation phase. While the specific mechanisms examined in these studies vary (e.g., synaptic plasticity, intrinsic plasticity, or circuit-level changes), they consistently demonstrate that modifications in the cerebellum after the early consolidation period no longer influence memory storage or performance. This evidence strongly supports the relevance of our experimental focus and the timing of our interventions.

      We acknowledge the importance of investigating extended consolidation periods, which could indeed provide additional insights. However, given our current aims, the rapid consolidation dynamics observed in the early phase are most relevant to the questions addressed in this study. We have elaborated on these matters in our revised manuscript (lines 273~283).

      (8) Issues around whether the authors have control over CF activity with their optogenetic intervention raise questions of whether learning can be recovered during the training procedure if the optogenetic stimuli are halted. Specifically, if suppression is applied for three blocks (what the authors refer to as "sessions") during the training procedure and then ceases, does learning rapidly recover in the immediately following blocks?

      While we did not directly examine the restoration of learning capability within the same training session following the cessation of optogenetic inhibition, we believe several aspects of our experimental design and insights from prior studies support our interpretation.

      Our optogenetic intervention specifically targeted Purkinje cells (PCs) in the flocculus and was applied continuously during designated training sessions to modulate cerebellar activity. Notably, Medina et al. (2001) demonstrated that transient inactivation of the cerebellar cortex impairs the expression of learned responses but does not disrupt the underlying plasticity mechanisms (Medina, Javier F., Keith S. Garcia, and Michael D. Mauk. "A mechanism for savings in the cerebellum." Journal of Neuroscience 21.11 (2001): 4081-4089.). This finding suggests that cerebellar plasticity remains intact and functional even after transient perturbations.

      Therefore, it is plausible that once optogenetic inhibition is lifted, the cerebellar network regains its capacity for learning and adaptation, as the intrinsic plasticity and memory encoding processes remain preserved. While we acknowledge that direct experimental confirmation of rapid recovery in our setup was not performed, this interpretation is consistent with our experimental framework and the broader literature.

      (9) The study does not fully explore the instructive signals/mechanisms underlying the memory consolidation process. A detailed investigation into potential instructive signals for consolidation beyond CF-induced signaling, like the simple spiking of PCs, could significantly enhance the study's conclusions. Indeed, there is currently no evidence to suggest that CFs play a role in the consolidation phase anyway so testing their role seems a bit of a strawman argument.

      While our study primarily focused on characterizing CF-dependent pathways, we acknowledge that memory consolidation is likely driven by a multifaceted interplay of instructive signals beyond CF-induced mechanisms. In particular, Purkinje cell (PC) simple spiking may act as a critical signal during the consolidation phase, either complementing or functioning independently of CF input. Emerging evidence suggests that simple spiking can modulate downstream circuitry in ways that stabilize and strengthen memory traces.

      To address this, we have expanded the discussion in the revised manuscript to explore potential instructive signals for consolidation, including PC simple spiking, local circuit plasticity within the cerebellar cortex, and its interaction with the cerebellar nuclei. We propose that these mechanisms collectively contribute to the transfer and stabilization of motor memory, offering a more comprehensive framework for understanding consolidation. We have elaborated on these matters in our revised manuscript (lines 238~250).

      (10) Previous reports have highlighted the necessity of CF activity for extinction/memory maintenance (Medina et al. 2002; Kim et al. 2020). That is, the absence of CF activity is consequential for cerebellar function. These results present a potential contrast to the findings reported in this current study. This discrepancy raises important questions about the experimental conditions, methodologies, and interpretations of CF function across different studies. A thorough discussion comparing these divergent outcomes is essential, as it could elucidate the specific contexts or conditions under which CF activity influences memory processes.

      We acknowledge that previous studies (Medina et al., 2002; Kim et al., 2020) have suggested a role for climbing fiber (CF) activity in extinction. However, our study specifically focuses on the acquisition phase of motor learning and does not extend to extinction or maintenance. As such, we have revised our discussion to limit interpretations strictly to the scope of our findings and removed references to extinction.

      The discrepancies between our results and prior work may arise from differences in methodologies and behavioral paradigms. For instance, we utilized optogenetic inhibition to achieve precise temporal and spatial control of CF activity, whereas previous studies employed pharmacological or lesion methods that may have broader effects on the cerebellar circuitry. Additionally, differences in behavioral paradigms, such as the optokinetic reflex (OKR) task used in our study compared to the eye-blink conditioning tasks in prior studies, may demand distinct roles for CF signaling depending on the specific requirements for error correction and adaptation.

      This clarification is now incorporated into our revised manuscript, and the discussion has been streamlined to focus on the phase-specific role of CF activity during acquisition without extending to extinction or maintenance (lines 259~270).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      The article emphasizes vocal social behavior but none of the experiments involve a social element. Marmosets are recorded in isolation, which could be sufficient for examining the development of vocal behavior in that particular context. However, the early-life maturation of vocal behavior is strongly influenced by social interactions with conspecifics. For example, the transition of cries and subharmonic phees, which are high-entropy calls, to more low-entropy mature phees is affected by social reinforcement from the parents. This effect extends across contexts, where differences in these interaction patterns extend to vocal behavior when the marmosets are alone. From the chord diagrams, cries still constitute a significant proportion of call types in lesioned animals. Additionally, though it is an intriguing finding that the infants' phee calls have acoustic differences being 'blunted of variation, less diverse and more regular,' the suggestion that the social message conveyed by these infants was 'deficient, limited, and/or indiscriminate' is not tested but could be, for example with playback experiments.

      We recognize that our definition of vocal social behavior is not within the normal realm of direct social interactions. We were particularly interested in marmoset vocalizations as a social signal, such as phees, cries and twitter, even when their family members or conspecifics are not visibly present. Generally speaking, in the laboratory, infant marmosets make few calls when in the presence of another conspecific, but when isolated they naturally make phee calls to reach out to their distantly located relatives. In this context, while we did not assess the animals interacting directly, we assessed what are normally referred to as ‘social contact calls,’ hence the term ‘social vocalizations.’ Playback recordings might provide potential evidence of antiphonal calling as a means of social interaction and might reveal the poor quality of the social message conveyed by the infant, but even here, the vocalizing marmoset would be calling to a non-visible conspecific. Thus, although our experiment lacked a direct social element, our data suggest that in the absence of a functioning ACC in early life, infant calls that convey social information, and which would elicit feedback from parents and other family members, may be compromised, and this could potentially influence how that infant develops its social interactive skills. We have now commented on the significance of social vocalizations in the introductory text (page 3) and discussion (page 15).

      The manuscript would benefit from the addition of more details to be able to better determine if the conclusions are well supported by the data. Understanding that this is very difficult data to get, the number of marmosets and some variability in the collection of the data would allow for the plotting of each individual across figures. For example, in the behavioral figures, which is the marmoset that is in the behavioral data that has a sparing of the ACC lesion in one hemisphere? Certain figures, described below in the recommendations for the authors, could also do with additional description.

      Thanks for these suggestions. We have plotted the individual animals in the relevant figures and addressed the comments and recommendations listed below.

      Reviewer #1 (Recommendations For The Authors):

      Given the number of marmosets, variability in the collected data, lesion extent, and different controls, I would like to see more plots with individuals indicated (perhaps with different symbols). More details could also be added for several plots.

      Figure 2D (new) and 2E now have plots that represent the individual animals, each represented by a different symbol.

      Figure 2A) Since lesions are bilateral, could you also show the extent of the lesions on the other side for completeness?

      Our intention was to process one hemisphere of each brain for Golgi staining to examine changes in cell morphology in the ACC and associated brain regions following the lesion. Unfortunately, the Golgi stain was unsuccessful. Consequently, we were unable to use the tissue to reconstruct the bilateral extent of the lesion. We did, however, first establish the bilateral nature of the lesion through coronal slices of the animals' MRI scans before processing the intact hemisphere to confirm the bilateral extent of the lesion. The MRI scans (every 5th section) for each control and lesioned animal are compiled in a figure in the supplementary materials (Fig. S1). These scans show that the ACC-lesioned animals have bilateral lesions with one animal (ACC1) showing some sparing in one hemisphere, as we noted in the text. We have now made reference to this supplemental figure in the text (page 5).

      Figure 2B/C) In Figure 2B, control and ACC lesions are in the columns while right next to it in 2C, ACC lesion and control are in the rows. Could these figures be adjusted so that they are consistent?

      We have now adjusted these figures and updated the figure legends accordingly.

      Figure 2C) Is there quantification of the 'loss of neurons and respective increase in glial cells at the lesioned site especially at the interface between gray and white matter'? There are multiple slices for each animal.

      Thanks for suggesting this. We have now quantified these data which are presented as a new graph as Fig. 2D. These data revealed a significant loss of neurons (NeuN) in the ACC group as well as an increase in glial cells (GFAP and Iba1) relative to the controls. The figure legend and results have also been updated.

      Figure 2C) It is difficult for me to distinguish between white and purple - could you show color channels independently since images were split into separate channels for each fluorophore?

      Fig. 2C has been revised to better visualize the neurons and glia at the gray and white matter interface. We found that grayscale images for each channel offered a better contrast than separating the channels for each fluorophore.

      Figure 2C/D) I like how there are individual dots here for the individual marmosets. Since there are four in each group, could they be represented throughout with symbols (with a key indicating the pair and also the control condition)? For example, were there changes in the histology for control animals that got saline injections as opposed to those that didn't get any surgery?

      We have highlighted the individual animals with different symbols in the figures. Although some animals were twin pairs, it was not possible to have twins in all cases. Only two sets were twins. We have indicated the symbols that represent the twin pair in Fig. 2 as well as the MRI scans of the twin pairs in Fig. S1. There were no observed changes in histology for the sham animals relative to the other non-sham controls. The MRI scan for one sham animal (CON2) shows herniated tissue in the right hemisphere, which is a normal consequence of brain exposure caused by a craniotomy.

      Figure 3D-E) Here, individual data points could be informative especially given that some animals are missing data past the third week.

      To prevent cluttering the figure with too many data points, we have added the sample size for each group in the figure legend (page 33).

      Figure 3D/F) What exactly is the period that goes into this analysis? In the text, 'Further analysis showed that the ACC lesion had minimal effects on the rate of most call types during this period'. Is this period from weeks 3 to 6 relative to the proportions in week 2? I think I also don't quite understand the chord diagram. The legend says 'the numbers around each chord diagram represents relative probability value for each call type transition' so how does that relate to the proportion of these call types? It looks like there is a wider slice for cries for ACC-lesioned animals each week. I also don't see in the week 4 chord diagram, the text description of 'elevation in the rate of 'other' calls, which comprised tsik, egg, eck, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion."

      We apologize for the confusion. Fig 3D and Fig 3F are not directly related. Fig. 3D shows the different types of emitted calls. The figure shows the averaged data per group pooled from post-surgery weeks (week 3 – week 6). It represents the proportion of individual call types relative to the total number of calls during each recording period. The only major finding here was the increased rate of ‘other’ calls comprising tsik, egg, ock, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion.

      While Fig. 3D represents the differences in the proportion of calls, the chord diagrams in Fig. 3F represent the probability of call-to-call transition obtained from a probability matrix. At postnatal week 6, marmosets with ACC lesions showed a higher likelihood of transitions between all call types, but less frequent transitions between social contact calls relative to sham controls. The chord diagrams visualize the weighted probabilities and directionality of these transitions between the different call types. Weighted probabilities were used to account for variations in call counts. The thickness of the arrows or links indicates the probability of a call transition, while the numbers surrounding each chord diagram represent the relative probability value for each specific transition. We have now reworded the text and clarified these details in the figure legend (pages 32-33).
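
      As an illustration only, a call-to-call transition probability matrix of the kind described above could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the function name, the example call sequences, and the choice to normalise each transition count by the total number of transitions leaving the same call type are all assumptions, and the exact weighting scheme used for Fig. 3F is not specified in this response.

```python
from collections import Counter

def transition_probabilities(sequences):
    """Return first-order transition probabilities between call types.

    Each count of (call_a -> call_b) is divided by the total number of
    transitions that start from call_a, so frequent call types do not
    dominate purely because of their raw counts (an assumed weighting).
    """
    counts = Counter()
    for seq in sequences:
        # Pair every call with the call that immediately follows it.
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1

    row_totals = Counter()
    for (a, _), n in counts.items():
        row_totals[a] += n

    return {(a, b): n / row_totals[a] for (a, b), n in counts.items()}

# Hypothetical recordings, each an ordered list of labelled calls.
recordings = [
    ["phee", "phee", "twitter", "cry"],
    ["cry", "phee", "phee"],
]
probs = transition_probabilities(recordings)
```

      In a chord diagram, each entry of such a matrix would set the thickness of the link drawn from the first call type to the second.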

      Figure 3E) How is the ratio on the y-axis calculated here?

      The y-axis represents the averaged value of the ratios of the number of social contact calls relative to non-social contact calls in each recording per subject per group (i.e., x̄(# social calls / # non-social calls)). This is now included in the figure legend and the axis is updated (page 32).
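
      The calculation described above can be sketched as follows. The counts are hypothetical, the function name is illustrative, and skipping recordings with no non-social calls is an assumption made here to avoid division by zero; the response does not say how such recordings were handled.

```python
from statistics import mean

def mean_social_ratio(recordings):
    """Average the per-recording ratio of social to non-social calls.

    `recordings` is a list of (n_social, n_non_social) count pairs.
    Recordings with zero non-social calls are skipped (an assumption).
    """
    ratios = [social / non_social
              for social, non_social in recordings
              if non_social > 0]
    return mean(ratios)

# Hypothetical counts for three recordings from one group.
example = [(30, 10), (12, 6), (8, 8)]
result = mean_social_ratio(example)
```

      Note that this averages the per-recording ratios rather than dividing pooled totals, which matches the "ratio first, then mean" wording of the response.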

      Also, cries could be considered a 'social contact call' since they are produced by infants to elicit responses from the parents. There is also the hypothesis in the literature that cries transition into phees.

      The reviewer is correct. Cries are often considered a social contact call because they elicit parental feedback. We decided to separate cry-calls from other social contact calls for two reasons. First, in our sample, we found cry behavior to be highly variable across the animals. For example, one control infant cried incessantly whereas another control infant cried less than normal. This extreme variability in animals of the same group masked the features between animals that reliably differentiated between them. Second, cry-calls elicit feedback from parents who are normally within the vicinity of the infant whereas phee calls elicit antiphonal phee calls from any distantly located conspecific. In other words, the contexts in which these calls are often elicited are very different.

      The use of 'syntactical' is a bit jarring to me because outside of linguistics, its use in animal communication generally refers to meaning-bearing units that can be combined into well-formed complexes such as pod-specific whale songs or predator alarm calls with concatenated syllable types in some species of monkeys. To my knowledge, individual phee syllables have not been currently shown to carry information on their own and may be better described as 'sequential' rather than 'syntactical'.

      We agree. We have made this change accordingly.

      Figure 4B) How many phee calls with differing numbers of syllables are present each week? How equal is the distribution given that later analyses go up to 5 syllables?

      The total number of phee calls with differing number of syllables ranged between 20-40 phees. This number varied between subjects, per week. The most common were 3- and 4-syllable phee calls which ranged from 7-15. Due to this variability, Fig. 4B presents the average syllable count. The axis is now updated.

      Figure 4C-E) How is the data combined here? Is the 2nd syllable, for example, the combined data from the 2nd syllables of phee calls of all lengths (1-5)? If so, are there differences based on how long the total sequence is?

      The combined data represent the specific syllable (e.g., the 1st syllable in a 2-syllable phee, in a 3-syllable phee and in a 4-syllable phee) irrespective of the length of the sequence. No differences were observed between the 2nd syllable in a 2-syllable phee and the 2nd syllable in a 3- or 4-syllable phee. We have included this detail in the figure legend (pages 33-34).

      So duration is a vocal parameter that is highly dependent on physical factors such as body size and lung volume. Were there differences in physical growth between the pairs of ACC-lesioned marmosets and their twins? Entropy is less closely tied to these physical factors but has previously been shown to decrease as phee calls mature, which we can also see in the negative relationship of the control animals. Do you know of experiments that show that lower entropy calls are more 'blunted'?

      Thank you for raising the important issue of physical growth factors. For twin pairs, it is not uncommon for one infant to be slightly bigger, heavier or stronger than the other presumably because one gets more access to food. With increasing age, we did not observe significant changes in bodyweight between the groups. We examined grip strength in all infants as a means of assessing how well the infant was able to access food during nursing. Poor grip strength would indicate a lower propensity to ‘hang on’ to the mother for nursing which could lead to lower weight gain and reduced physical growth. We found that both grip strength and body weight increased as the infants got older and both parameters were equivalent. We have included an additional figure to show the normal increase in both weight and grip strength to the supplemental materials (Fig. S3) and have made reference to this in the text (page 8).

      As for entropy, its impact on the emotional quality of vocalizations has not been systematically explored. Generally speaking, high entropy relates to high randomness and distortion in the signal. Accordingly, one view posits low-entropy phee calls represent mature sounding calls relative to noisy and immature high-entropy calls (e.g., Takahashi et al. 2017). In the current study, the reduction in syllable entropy observed for both groups of animals with increasing age is consistent with this view. At the same time entropy can relate to vocal complexity; high entropy refers to complex and variable sound patterns whereas low entropy sounds are predictable, less diverse and simple vocal sequences (Kershenbaum, A. 2013. Entropy rate as a measure of animal vocal complexity. Bioacoustics, 23(3), 195–208). One possibility is that call maturity does not equate directly to emotional quality. In other words, a low-entropy mature call can also be lacking in emotion as observed in humans with ACC damage; these patients show mature speech, but they lack the variations in rhythms, patterns and intonation (i.e., prosody) that would normally convey emotional salience and meaning. Our observation of a reduction in phee syllable entropy in the ACC group in the context of being short and loud with reduced peak frequency is consistent with this view. Our use of the word ‘blunt’ was to convey how the calls exhibited by the ACC group were potentially lacking emotional meaning. Beyond this speculation, we are not aware of any papers that have examined the relationship between entropy and blunted calls directly. We have now included this speculation in the discussion (pages 12-13).

      Reviewer #2 (Public Review):

      The authors state that the integrity of white matter tracts at the injection site was impacted but do not show data.

      We have added representative micrographs of a control and ACC-lesioned animal in a new supplementary figure which shows the neurotoxin impacted the integrity of white matter tracts local to the site of the lesion (Fig. S2).

      The study only provides data up to the 6th week after birth. Given the plasticity of the cortex, it would be interesting to see if these impairments in vocal behavior persist throughout adulthood or if the lesioned marmosets will recover their social-vocal behavior compared to the control animals.

      We agree. Our original intention was to examine behavior into adulthood. Unfortunately, the COVID-19 pandemic compromised the continuation of the study. We were limited by the data that we were allowed to acquire due to imposed restrictions. Some non-vocalization data collected when the animals were young adults is currently being prepared for another paper.

      Even though this study focuses entirely on the development of social vocalizations, providing data about altered social non-vocal behaviors that accompany ACC lesions is missing. This data can provide further insights and generate new hypotheses about the exact role of ACC in social vocal development. For example, do these marmosets behave differently towards their conspecifics or family members and vice versa, and is this an alternate cause for the observed changes in social-vocal development?

      We agree. At the time, however, apparatus for assessing behavior between the infant’s family and non-family members was not available. Assessing such behaviors in the animals’ holding room posed some difficulty since marmosets are easily distracted by other animals as well as the presence of an experimenter, amongst other things. This is an area of investigation we are currently pursuing.

      Reviewer #3 (Public Review):

      It is striking to find that the vocal repertoire of infant marmosets was not significantly affected by ACC lesions. During development, the neural circuits are still maturing and the role of different brain regions may evolve over time. While the ACC likely contributes to vocalizations across the lifespan, its relative importance may vary depending on the developmental stage. In neonates, vocalizations may be more reflexive or driven by physiological needs. At this stage, the ACC may play a role in basic socioemotional regulation but may not be as critical for vocal production. Since the animals lived for two years, further analysis might be helpful to elucidate the precise role of ACC in the vocal behavior of marmosets.

      Figure 3D. According to the Introduction "...infant ACC lesions abolish the characteristic cries that infants normally issue when separated from its mother". Are the present results in marmosets showing the opposite effect? Please discuss.

      To date, the work of MacLean (1985) is the only publication that describes the effect of early cingulate ablation on the spontaneous production of ‘separation calls’ largely construed as cries, coos and whimpers in response to maternal separation. All of this work was largely performed in rhesus macaques or squirrel monkeys. In addition to ablating the cingulate cortex, MacLean found that it was necessary to ablate the subcallosal (area 25) and preseptal cingulate cortex (presumably referring to prelimbic area 32) to permanently eliminate the spontaneous production of separation cry calls. Our ablation of the ACC was more circumscribed to area 24 and is therefore consistent with MacLean’s earlier work that removal of ACC alone does not eliminate cry behavior. In adults, ACC ablation is also insufficient to eliminate vocalization. We make reference to this on pages 13-14 of the discussion.

      Figure 3E and Discussion. Phees are mature contact calls and cries immature contact calls (Zhang et al, 2019, Nat Commun). Therefore, I would rather say that the proportion of immature (cries) contact calls increases vs the mature (phee, trill, twitters) contact calls in the ACC group. Cries are also "isolated-induced contact calls" to attract the attention of the caregivers.

      The reviewer is correct in that cries are directed towards caregivers but in our sample, cry behavior was highly variable between the infants. Consequently, in Fig. 3E social contact calls include phee, twitter and trill calls but do not include cries, which were separated (see also response to reviewer #1). Many of the calls made during babbling were immature in their spectral pattern (compare phee calls between Fig. 3A and 3B). Cries typically transitioned into phees, twitters or trills before they fully matured. Fig. 3E shows that the controls made more isolation-induced social contact calls at postnatal week 6 which were presumably maturing at this time point. Thus, if anything, there was an increase in the proportion of mature contact calls vs immature contact calls with increasing age.

      Figure 4D. Animal location and head direction within the recording incubator can have significant effects on the perceived amplitude of a call. Were these factors taken into account?

      The reviewer makes an excellent observation. Unfortunately, we did not account for location and head direction because the infants were quite mobile in the incubator. The directional microphone was hidden from view because the infants were distracted by it, and positioned ~12 cm from the marmoset, and placed in the exact same location for every recording. In addition, calls with phantom frequencies were eliminated during visual inspection of spectrograms. Beyond these details, location and head direction were not taken into account.

      Figure 4E. When a phee call has a higher amplitude, as is the case for the ACC group (Figure 4D), the energy of the signal will be concentrated more strongly at the phee call frequency ~8 kHz. This concentration of the energy reduces the variability in the frequency distribution, leading to lower entropy. The interpretation of the results should be reconsidered. A faint call (control group) can exhibit more variability in the frequency content since the energy is distributed across a wider range of frequencies contributing to higher entropy. It can still be "fixed, regular, and stereotyped" if the behavior is consistent or predictable with little variation. Also, to define ACC calls as "monotonic" I would rather search for the lack of frequency modulation, amplitude variation, or narrower bandwidth.

      We very much appreciate this explanation. We were able to identify the maximum frequency that closely matched the pitch of a sound for each syllable in a multisyllabic phee. New Fig. 4E shows that the peak frequency for each phee syllable was lower in the ACC-lesioned monkeys, which may directly translate to the low entropy observed in this group. The term “monotonic” was used to relate our data to the classical and long-standing evidence of human ACC lesions causing monotonous intonation of speech. When all factors are taken into account, it is evident that the vocal phee signature of the ACC-lesioned animals was structurally different to that of the controls, implicating a less complex and more stereotyped ACC signal. Further studies are needed to systematically explore the relationship between entropy and emotional quality of vocalizations.
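
      The reviewer's point about amplitude and entropy can be illustrated with a toy calculation. The sketch below assumes entropy is measured as the Shannon entropy of the normalised power spectrum and that a call sits on a fixed noise floor; both are assumptions made for illustration, the spectra are hypothetical, and this is not the analysis used in the manuscript.

```python
import math


def spectral_entropy(power):
    """Shannon entropy (bits) of a power spectrum normalised to sum to 1."""
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)


def spectrum(tone_power, noise_power=1.0, n_bins=64, tone_bin=32):
    """Toy spectrum: flat noise floor plus one tonal peak (hypothetical)."""
    bins = [noise_power] * n_bins
    bins[tone_bin] += tone_power
    return bins


# Same noise floor, different tone power (arbitrary units).
flat_noise = spectral_entropy(spectrum(tone_power=0.0))
faint_call = spectral_entropy(spectrum(tone_power=10.0))
loud_call = spectral_entropy(spectrum(tone_power=1000.0))
```

      Under these assumptions the louder tone yields the lower entropy because more of the total energy falls in a single frequency bin. Scaling the whole spectrum by a constant leaves the value unchanged, so the effect depends on the call's level relative to the background and not on absolute amplitude.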

      Apart from the changes in the vocal behavior, did the ACC lesions manifest in any other observable cognitive, emotional, or social behavior? ACC plays a role in processing pain and modulating pain perception. Could that be the reason for the observed increase in the proportion of cries in the ACC group and the increase in the phee call amplitude? Did the cries in the ACC group also display a higher amplitude than the cries in the control group?

      It was our intention to acquire as much data as possible from these infants as they matured from a cognitive, social and emotional perspective. Unfortunately, our study was hampered by a variety of factors, including the COVID-19 pandemic, which imposed major restrictions on our ability to continue with the experiment in a time-sensitive manner. In addition, the development and construction of the custom apparatus to measure these behaviors was stalled during this period, further preventing us from collecting behavioral data at regular time intervals. As for the cry behavior, the number of cries in the ACC group was very low, especially at postnatal weeks 5 and 6. Consequently, there were very few data points to work with.

      Discussion. Louder calls have the potential to travel longer distances compared to fainter calls, possess higher energy levels, and can propagate through the environment more effectively. If the ACC group produced louder phee syllables, how could the message conveyed over long distances be "deficient, limited, and/or indiscriminate"?

      Thanks for raising this interesting concept. Not all calls emitted by the animals were loud. We specifically examined the long-distance phee call in this regard. The phee syllables emitted by the ACC group were high amplitude with low frequencies, short duration and low entropy. Taking these factors into account, it is conceivable that the phee calls produced by the ACC group could not effectively convey their message over long distances despite their propagation through the environment. We have made reference to this in the discussion, where we focus specifically on the phee calls (page 12).

      Abstract: Do marmosets have syntax? Consider replacing "syntactical" with a more appropriate term (maybe "syntax-like").

      Thanks for this suggestion. We have replaced the term syntactical with ‘sequential’ as per the recommendation of reviewer #1.

      Introduction: "...cries that infants normally issue when separated from its mother". Please replace "its" with "their".

      This has been corrected.

      Results: Is the reference to Fig 1B related to the text?

      We have included and referred to Fig. 1B in the text (results and methods) to show other researchers how they can use this technique as a reliable and safe means of monitoring tidal volume under anesthesia in small infant marmosets without intubation.

      I understand that both "spectrograph" and "spectrogram" are used to analyze the frequency content of a signal. Nevertheless, "spectrogram" refers to the visual representation of the frequency content of a signal over time, and this term is commonly used in audio signal processing and specifically in the vocal communication field. I would recommend replacing "spectrograph" with "spectrogram".

      Thanks for this suggestion. We have corrected this throughout the manuscript.

      (Concerning the previous comment in the public review). Cries are uttered to attract the attention of the caregivers. The increase in the proportion of cries in the ACC group does not match the sentence: "...these infants appeared to make little effort in using vocalizations to solicit social contact when socially isolated".

      We apologize for the confusion. It is not the case that the ACC animals make more cries. Cry calls were highly variable amongst the animals. Consequently, although Fig. 3D gives the impression that the proportion of cries is higher in ACC animals, they did not differ significantly from the controls. Due to their high variability, cries were removed in the measurement of social contact. Accordingly, Fig. 3E does not include cry behavior; it shows that the ACC animals engage less in social contact calls.

      Related to Figure 3. What is the difference between "egg" and "eck" calls? Do you mean "ock"?

      We apologize. This was a typo. It should be ock calls.

      Figure 4B. Is the sample size five animals per group and per week? Overlapping data points seem to be placed next to each other. Why in some groups (e.g. ACC 6 weeks) less than five dots are visible?

      The sample size differed per week because of the lack of recording during the COVID restrictions. In Fig. 4B, we have now separated the overlapping dots. We have also added the sample size of the groups in the figure legends.

      Would the authors expect to see stronger differences between the lesioned and the control groups when comparing a later developmental stage? The animals were euthanized at the age of

      This speculation is certainly feasible and yes, we were hoping to establish this level of detail with testing at later developmental stages. This is an aspect of development we are currently pursuing.

      Could these experiments be conducted?

      I’m afraid these animals are no longer available, but we are currently conducting experiments in other animals with early life neurochemical manipulations who show behavioral changes into early adulthood.

      ACC lesion: It is reported that the lesions extended past 24b into motor area 6M. Did the animal display any motor control disability?

      Surprisingly, despite the lesion encroaching into 6M, these animals showed no observable motor impairment. We assessed the animals’ grip strength and body weight and discovered normal strength and growth in weight in both controls and the lesioned group. We have added these data as supplemental information (Fig. S3).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 323-330 in untracked version). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 315-333), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 323-330).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Per the reviewer’s suggestion, we have added several new supplementary figures. We now show the F-statistic for discriminability over time for the LFP timecourse (Fig. S2), and as a function of power in various frequencies (Fig. S4). We have added before/after inactivation comparisons of the LFP and spiking activity, and their respective F-statistics for discrimination between contrasts and orientations in Fig. S9. Lastly, we added a supplementary figure evaluating the impact of FEF inactivation on beta phase coding in the OUT condition, showing no significant change (Fig. S11).

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We have updated the STA slope measurement, excluding the low contrast condition which lacks a clear peak in the STA. Additionally, we equalized the bin widths and aligned the x-axes for better visual comparability. Then, we performed a two-way ANOVA, analyzing the effects of spatial features (IN vs. OUT) and visual conditions (contrast and orientation). The results showed a significant effect of the visual feature on both orientation (F = 3.96, p=0.046) and contrast (F = 14.26, p<10⁻³). However, neither the spatial feature nor the spatial-visual interaction exhibited significant effects for orientation (F = 0.52, p=0.473, F=1.56, p=0.212) or contrast (F = 2.19, p=0.139, F=1.15, p=0.283).
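
      For readers unfamiliar with the two-way design described here (spatial condition × visual feature), the following minimal Python sketch computes the F statistics for the two main effects and their interaction. It assumes a balanced design with equal replicates per cell, which may not match the actual data, and the values are hypothetical rather than measured STA slopes.

```python
from statistics import mean


def two_way_anova(cells):
    """F statistics for a balanced two-factor design with replication.

    `cells[(a, b)]` is the list of observations for level `a` of factor A
    and level `b` of factor B; every cell must hold the same number (>1)
    of observations.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    assert n > 1 and all(len(v) == n for v in cells.values()), "balanced design required"

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for main effects, interaction and residual error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a) / ms_err,
        "B": (ss_b / df_b) / ms_err,
        "AxB": (ss_ab / (df_a * df_b)) / ms_err,
    }


# Hypothetical slopes: factor A = spatial condition, factor B = contrast.
slopes = {
    ("IN", "high"): [5.0, 6.0, 7.0],
    ("IN", "low"): [1.0, 2.0, 3.0],
    ("OUT", "high"): [5.0, 6.0, 7.0],
    ("OUT", "low"): [1.0, 2.0, 3.0],
}
f_stats = two_way_anova(slopes)
```

      In this toy data set only the visual feature (factor B) differs between cells, so its F statistic is large while the spatial and interaction terms are zero. p-values would follow from the F distribution with the corresponding degrees of freedom, for example via `scipy.stats.f.sf`.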

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) The authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 185-186).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      While the shift in peak frequency across contrasts is more prominent than that across orientations (Fig. S3A-B), the relationship between orientation and peak frequency is also significant (one-way ANOVA for peak frequency across contrasts, F_Contrast=10.72, p<10⁻⁴; or across orientations, F_Orientation=3, p=0.030; stats have been added to Fig. S3 caption). This finding also aligns with previous studies, which reported slight peak frequency shifts (~1–2 Hz) in the context of attention (Fries, 2015). To address the question of whether the frequency-firing rate correlation generalizes to orientation-driven changes, we now examine the relationship between peak frequency and firing rate separately for each contrast level (Fig. S14). The average normalized response as a function of peak frequency, pooled across subsamples of trials from each of 145 V4 neurons (100 subsamples/neuron), IN vs. OUT conditions, shows a significant correlation during the delay period for each contrast: contrast low (F_Condition=0.03, p=0.867; F_Frequency=141.86, p<10⁻¹⁸; F_Interaction=10.70, p=0.002, ANCOVA), contrast middle (F_Condition=7.18, p=0.009; F_Frequency=96.76, p<10⁻¹⁴; F_Interaction=0.13, p=0.716, ANCOVA), contrast high (F_Condition=12.51, p=0.001; F_Frequency=333.74, p<10⁻²⁹; F_Interaction=7.91, p=0.006, ANCOVA).

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 315-333). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      Looking at the SNR as a ratio of power in the beta band to all other bands, there is no significant drop in SNR between conditions (SNR<sub>IN</sub> = 4.074±984, SNR<sub>OUT</sub> = 4.333±0.834, p=0.341, Wilcoxon signed-rank). Therefore, we do not think that the change in phase coding is merely a result of less reliable phase estimates.
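
      A comparison of this kind can be sketched as follows. The sketch assumes Welch power spectra, a beta band of 14 to 22 Hz, and a 1 to 100 Hz broadband range; the band edges, the function name `beta_snr`, and the synthetic signals are illustrative assumptions, not the values or code used in the study.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import wilcoxon


def beta_snr(lfp, fs, beta=(14.0, 22.0), broadband=(1.0, 100.0)):
    """Mean PSD inside the beta band divided by mean PSD in the rest of the broadband range.

    The band edges are assumptions chosen for illustration.
    """
    freqs, psd = welch(lfp, fs=fs, nperseg=int(fs))  # 1 s segments give 1 Hz resolution
    in_range = (freqs >= broadband[0]) & (freqs <= broadband[1])
    in_beta = (freqs >= beta[0]) & (freqs <= beta[1])
    return psd[in_beta].mean() / psd[in_range & ~in_beta].mean()


# Synthetic stand-in for paired recordings (one IN and one OUT value per site).
rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(0.0, 5.0, 1.0 / fs)
beta_wave = np.sin(2 * np.pi * 18.0 * t)

snr_noise = beta_snr(rng.standard_normal(t.size), fs)  # no beta component at all
snr_in = np.array([beta_snr(beta_wave + rng.standard_normal(t.size), fs) for _ in range(12)])
snr_out = np.array([beta_snr(beta_wave + rng.standard_normal(t.size), fs) for _ in range(12)])

# Paired, non-parametric comparison across sites.
stat, p_value = wilcoxon(snr_in, snr_out)
print(f"SNR IN = {snr_in.mean():.2f}, SNR OUT = {snr_out.mean():.2f}, p = {p_value:.3f}")
```

      Expressing the SNR as a ratio makes it insensitive to overall changes in signal amplitude, which is why it is a reasonable proxy for how reliably beta phase can be estimated in each condition.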

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 291-293). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL when remembering the V4 RF location) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, the two points I raised in the public review merit a bit of development in the Discussion. In addition, the authors should revise some of their conclusions.

      For instance (L217):

      "The finding that WM mainly modulates phase coded information within extrastriate areas fundamentally shifts our understanding of how the top-down influence of prefrontal cortex shapes the neural representation, suggesting that inducing oscillations is the main way WM recruits sensory areas."

      In my opinion, this one is over-the-top on various counts.

      Here is another exaggerated instance (L298):

      "...leading us to conclude that representations based on the average firing rate of neurons are not the primary way that top-down signals enhance sensory processing."

      Again, as noted above, the problem is that one could make the case that the top-down signals are, in fact, highly effective, since they are completely quashing any distracter-related modulation in firing rate across RFs. There is only so much that one can conclude from responses to stimuli that are task-irrelevant, uniform across space, and constant over the course of a trial.

      I think even the title goes too far. What the work shows, by all accounts, is that the sustained activity in FEF has a definitive impact on V4 *even* with respect to a sustained, irrelevant background stimulus. The result is very robust in this sense. However, this is quite different from saying that the *primary* means of functional control for FEF is via phase coding. Establishing that would require ruling out other forms of control (i.e., rate coding) in all or a wide range of experimental conditions. That is far from the restricted set of conditions tested here and is also at variance with many other experiments demonstrating effects of attention or even FEF microstimulation on V4 firing activity.

      To reiterate, in my opinion, the work is carefully executed and the data are interesting and largely unambiguous. I simply take issue with what can be reliably concluded, and how the results fit with the rest of the literature. Revisions along these lines would improve the readability of the paper considerably.

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      Reviewer #3 (Recommendations for the authors):

      (1) My primary comment that came up multiple times as I read the manuscript (and which is summarized above) is that I wasn't ever sure why the authors are focused on analyzing neural coding of task-irrelevant sensory information during a WM task as a function of WM contents (remembered location). Most studies of neural codes supporting WM often focus on coding the remembered information - not other information. Conceptually, it seems that the brain would want to suppress - or at least not enhance - representations of task-irrelevant information when performing a demanding task, especially when there is no search requirement, and when there is no feature correspondence between the remembered and viewed stimuli. (i.e., the interaction between WM and visual input is more obvious for visual search for a remembered target). Why, in theory, would a visual region need to improve its coding of non-remembered information as a function of WM? This isn't meant to detract from the results, which are indeed very interesting and I think quite informative. The authors are correct that this is certainly relevant for sensory recruitment models of WM - there's clear evidence for a role of feedback from PFC to extrastriate cortex - but what role, specifically, each region plays in this task is critical to describe clearly, especially given the task-irrelevance of the input. Put another way: what if the animal was remembering an oriented grating? In that case, MI between spike-based measures and orientation would be directly relevant to questions of neural WM representations, as the remembered feature is itself being modeled. But here, the focus seems to be on incidental coding.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Whether similar phase coding is also used to represent the content of object WM (for example, if the animal was remembering an oriented grating), or whether phase coding is only observed for WM’s modulation of the representation of incoming sensory signals, is an important question to be addressed in future work.

      (2) Related to the above, the phrasing of the second sentence of the Discussion (lines 291-292) is ambiguous - do the authors mean that the FEF sends signals that carry WM content to V4, or that FEF sends projections to V4, and V4 has the WM content? As presently phrased, either of these are reasonable interpretations, yet they're directly opposing one another (the next sentence clarifies, but I imagine the authors want to minimize any confusion).

      We have edited this sentence to read, “Within prefrontal areas, FEF sends direct projections to extrastriate visual areas, and activity in these projections reflects the content of WM.”

      (3) I'm curious about how the authors consider the spatial WM task here different from a cued spatial attention task. Indeed, both require sustained use of a location for further task performance. The section of the Discussion addressing similar results with attention (lines 307-311) presently just summarizes the similarities of results but doesn't offer a theoretical perspective for how/why these different types of tasks would be expected to show similar neural mechanisms.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      (4) As far as I can tell, there is no consideration of behavioral performance on the memory-guided saccade task (RT, precision) across the different stimulus background conditions. This should be reported for completeness, and to determine whether there is an impact of the (likely) task-irrelevant background on task performance. This analysis should also be reported for Figure 3's results characterizing how FEF inactivation disrupts behavior (if background conditions were varied, see point 7 below).

      We have added the effect of inactivation on behavioral RT and % correct across the different stimulus background conditions (Fig. S8). Background contrast and orientation did not impact either RT or % correct.

      (5) Results from Figure 2 (especially Figures 2A-B) concerning phase-locked spiking in V4 should be shown for 0%-contrast trials as well, as these trials better align with 'typical' WM tasks.

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (6) The magnitude of SPL difference in aggregate (Figure 2B) is much, much smaller than that of the example site shown (Figure 2A), such that Figure 2A's neuron doesn't appear to be visible on Figure 2B's scatterplot. Perhaps a more representative sample could be shown? Or, the full range of x/y axes in Figure 2B could be plotted to illustrate the full distribution.

      We have updated Fig. 2A with a more representative sample neuron.

      (7) I'm a bit confused about the FEF inactivation experiments. In the Methods (lines 512-513), the authors mention there was no background stimulus presented during the inactivation experiment, and instead, a typical 8-location MGS task was employed. However, in the results on pg 8 (Lines 201-214), and Figure 3G, the authors quantify a phase code MI. The previous phase code MI analysis was looking at MI between each spike's phase and the background stimulus - but if there's no background, what's used to compute phase code MI? Perhaps what they meant to write was that, in addition to the primary task with a manipulation of background properties, an 8-location MGS task was additionally employed.

      The reviewer is correct that both tasks were used after inactivation (the 8-location task to assess the spread of the behavioral effect of inactivation, and the MGS-background task for measuring MI). We have edited the methods text to clarify.

      (8) How is % Correct defined for the MGS task? (what is the error threshold? Especially for the results described in lines 192-193).

      The % correct is defined as correct completed trials divided by the total number of trials; the target window was a circle with radius of 2 or 4 dva (depending on cue eccentricity). These details have been added to the Methods.
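
      As an illustration of this definition, the sketch below scores a trial as correct when the saccade endpoint falls inside a circular window around the target. The function name, the example coordinates, and the assumption that every trial has a recorded endpoint are hypothetical; the rule mapping cue eccentricity to a 2 or 4 dva radius is not specified here, so the radius is passed in per trial.

```python
import numpy as np


def percent_correct(endpoints, targets, radius_dva):
    """Percentage of trials whose saccade endpoint lies within radius_dva of the target.

    endpoints, targets: arrays of shape (n_trials, 2), in degrees of visual angle.
    radius_dva: scalar or per-trial array of window radii.
    """
    endpoints = np.asarray(endpoints, dtype=float)
    targets = np.asarray(targets, dtype=float)
    error = np.linalg.norm(endpoints - targets, axis=1)  # Euclidean distance to target
    correct = error <= np.asarray(radius_dva, dtype=float)
    return 100.0 * correct.mean()


# Hypothetical trials: two near targets (2 dva window), two far targets (4 dva window).
targets = [[5, 0], [5, 0], [12, 0], [12, 0]]
endpoints = [[5.5, 0.5], [8, 0], [12, 3], [12, 5]]
radii = [2, 2, 4, 4]
pc = percent_correct(endpoints, targets, radii)
print(pc)
```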

      (9) The paragraph from lines 183-200 describes a number of behavioral results concerning "scatter" and "RT" - the RT shown seems extremely high, and perhaps is normalized. Details of this normalization should be included in the Methods. The "scatter" is listed as dva, but it's not clear how scatter is quantified (std dev of endpoint distribution? Mean absolute error), nor how target eccentricity is incorporated (as scatter is likely higher for greater target eccentricity).

      We have renamed ‘scatter’ to ‘saccade error’ in the text to match the figure. Both RT and saccade error are normalized within each session; details of this normalization are now provided in the Methods section. Since error was normalized for each session before performing population statistics, no other adjustment for eccentricity was made.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Line numbers are missing.

      Added

      (2) VR classroom. Was this a completely custom design based on Unity, or was this developed on top of some pre-existing code? Many aspects of the VR classroom scenario are only introduced (e.g., how was the lip-speech synchronisation done exactly?). Additional detail is required. Also, is or will the experiment code be shared publicly with appropriate documentation? It would also be useful to share brief example video-clips.

      We have added details about the VR classroom programming to the methods section (p. 6-7), and we have now included a video-example as supplementary material.

      “Development and programming of the VR classroom were done primarily in-house, using assets (avatars and environment) sourced from pre-existing databases. The classroom environment was adapted from assets provided by Tirgames on TurboSquid (https://www.turbosquid.com/Search/Artists/Tirgames) and modified to meet the experimental needs. The avatars and their basic animations were sourced from the Mixamo library, which at the time of development supported legacy avatars with facial blendshapes (this functionality is no longer available in current versions of Mixamo). A brief video example of the VR classroom is available at: https://osf.io/rf6t8.

      “To achieve realistic lip-speech synchronization, the teacher’s lip movements were controlled by the temporal envelope of the speech, adjusting both timing and mouth size dynamically. His body motions were animated using natural talking gestures.”

      While we do intend to make the dataset publicly available for other researchers, at this point we are not making the code for the VR classroom public. However, we are happy to share it on an individual basis with other researchers who might find it useful for their own research in the future.

      (3) "normalized to the same loudness level using the software Audacity". Please specify the Audacity function and parameters.

      We have added these details (p.7)

      “All sound-events were normalized to the same loudness level using the Normalize function in the audio-editing software Audacity (theaudacityteam.org, ver 3.4), with the peak amplitude parameter set to -5 dB, and trimmed to a duration of 300 milliseconds.”
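
      For readers who want to reproduce this step outside Audacity, the peak-scaling part can be sketched as below. This is a stand-in, not Audacity's implementation: the function name, the sampling rate, and the test tone are hypothetical, and the order of operations (normalize first, then trim) is assumed from the order in which the steps are listed.

```python
import numpy as np


def normalize_peak_and_trim(samples, fs, peak_db=-5.0, duration_s=0.3):
    """Scale so the absolute peak sits at peak_db (dB re full scale), then keep the first duration_s seconds."""
    x = np.asarray(samples, dtype=float)
    peak = np.max(np.abs(x))
    if peak > 0:  # leave silent input untouched
        x = x * (10.0 ** (peak_db / 20.0) / peak)
    return x[: int(round(duration_s * fs))]


fs = 44100  # hypothetical sampling rate
t = np.arange(0, 1.0, 1.0 / fs)
tone = 0.2 * np.sin(2 * np.pi * 440.0 * t)  # hypothetical 1 s sound event
clip = normalize_peak_and_trim(tone, fs)
print(len(clip), np.max(np.abs(clip)))
```

      A peak of -5 dB corresponds to a linear amplitude of 10^(-5/20), roughly 0.56 of full scale. Peak normalization equates maximum amplitude rather than perceived loudness, which is worth keeping in mind when sound events differ in spectral content.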

      (4) Did the authors check if the participants were already familiar with some of the content in the mini-lectures?

      This is a good point. Since the mini-lectures spanned many different topics, we did not pre-screen participants for familiarity with the topics, and it is possible that some of the participants had some pre-existing knowledge.

      In hindsight, it would have been good to have added some reflective questions regarding participants' prior knowledge as well as other questions such as level of interest in the topic and/or how well they understood the content. These are elements that we hope to include in future versions of the VR classroom.

      (5) "Independent Component Analysis (ICA) was then used to further remove components associated with horizontal or vertical eye movements and heartbeats". Please specify how this selection was carried out.

      Selection of ICA components was done manually based on visual inspection of their time-course patterns and topographical distributions, to identify components characteristic of blinks, horizontal eye-movements and heartbeats. Examples of these distinct components are provided in Author response image 1 below. This is now specified in the methods section.

      Author response image 1.

      (6) "EEG data was further bandpass filtered between 0.8 and 20 Hz". If I understand correctly, the data was filtered a second time. If that's the case, please do not do that, as that will introduce additional and unnecessary filtering artifacts. Instead, the authors should replace the original filter with this one (so, filtering the data only once). Please see de Cheveigné and Nelken, Neuron, 2019 for an explanation. Also, please provide an explanation of the rationale for further restricting the cut-off bands in the methods section. Finally, further details on the filters should be included (filter type and order, for example).

      Yes, the data was indeed filtered twice. The first filter is done as part of the preprocessing procedure, in order to remove extremely high- and low- frequency noise but retain most activity within the range of “neural” activity. This broad range is mostly important for the ICA procedure, so as to adequately separate between ocular and neural contribution to the recorded signal.

      However, since both the speech tracking responses and ERPs are typically less broadband and are comprised mostly of lower frequencies (e.g., those that make up the speech-envelope), a second narrower filter was applied to improve TRF model-fit and make ERPs more interpretable.

      In both cases we used a fourth-order zero-phase Butterworth IIR filter with 1 second of padding, as implemented in the Fieldtrip toolbox. We have added these details to the manuscript.
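
      A filter of this type can be sketched with SciPy as shown below. This is an approximation under stated assumptions rather than the Fieldtrip implementation: the sampling rate and test signal are hypothetical, and SciPy's odd-extension padding is used as a stand-in for Fieldtrip's padding scheme.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_zero_phase(data, fs, low_hz, high_hz, order=4, pad_s=1.0):
    """Butterworth band-pass applied forward and backward (zero phase) along the last axis."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    # padlen pads each end with pad_s seconds of extended signal to limit edge transients
    return sosfiltfilt(sos, data, axis=-1, padtype="odd", padlen=int(pad_s * fs))


fs = 500.0  # hypothetical sampling rate
t = np.arange(0.0, 10.0, 1.0 / fs)
# 5 Hz component of interest, 50 Hz line noise, and a DC offset
raw = np.sin(2 * np.pi * 5.0 * t) + 0.5 * np.sin(2 * np.pi * 50.0 * t) + 3.0
filtered = bandpass_zero_phase(raw, fs, 0.8, 20.0)
```

      Because the filter is run in both directions, its magnitude response is applied twice, so attenuation at the band edges is steeper than for a single pass. This is one reason the reviewer's request to report filter type and order matters for reproducibility.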

      (7) "(~ 5 minutes of data in total), which is insufficient for deriving reliable TRFs". That is a bit pessimistic and vague. What does "reliable" mean? I would tend to agree when talking about individual subject TRFs, while 5 min per participant can be enough at the group level. Also, this depends on the specific speech material, on whether the features are univariate or multivariate, etc. Please narrow down and clarify this statement.

      We determined that the data in the Quiet condition (~5 min) was insufficient for performing reliable TRF analysis by assessing whether its predictive power was significantly better than chance. As shown in Author response image 2 below, the predictive power achieved using this data was not higher than values obtained in permuted data (p = 0.43). Therefore, we did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10).

      Author response image 2.

      (8) "Based on previous research by our group (Kaufman & Zion Golumbic 2023), we chose to use a constant regularization ridge parameter (λ= 100) for all participants and conditions". This is an insufficient explanation. I understand that there is a previous paper involved. However, such an unconventional choice that goes against the original definition and typical use of these methods should be clearly reported in this manuscript.

      We apologize for not clarifying this point sufficiently, and have added an explanation of this methodological choice (p.11):

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, which is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
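
      The selection procedure described in this passage can be sketched as follows. This is a plain closed-form ridge regression on synthetic data, not the mTRF toolbox implementation; all function names, the number of lags, the response kernel, and the decision to average correlations over electrodes to obtain one score per λ are assumptions made for illustration.

```python
import numpy as np


def lagged(stim, n_lags):
    """Design matrix whose column k is the stimulus delayed by k samples."""
    X = np.zeros((stim.size, n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[: stim.size - k]
    return X


def cv_predictive_power(stims, eegs, lam, n_lags):
    """Leave-one-trial-out Pearson r between predicted and actual EEG.

    r is computed per electrode, then averaged over electrodes and folds
    (averaging over electrodes is an assumption, used to get one score per lambda).
    """
    designs = [lagged(s, n_lags) for s in stims]
    fold_scores = []
    for left_out in range(len(designs)):
        X_train = np.vstack([X for i, X in enumerate(designs) if i != left_out])
        Y_train = np.vstack([Y for i, Y in enumerate(eegs) if i != left_out])
        # Closed-form ridge solution: (X'X + lam * I)^-1 X'Y
        weights = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n_lags), X_train.T @ Y_train)
        pred = designs[left_out] @ weights
        actual = eegs[left_out]
        r = [np.corrcoef(pred[:, c], actual[:, c])[0, 1] for c in range(actual.shape[1])]
        fold_scores.append(np.mean(r))
    return float(np.mean(fold_scores))


def group_level_lambda(participants, lambdas, n_lags):
    """Return the lambda with the highest predictive power averaged across participants."""
    power = np.array([[cv_predictive_power(s, e, lam, n_lags) for lam in lambdas]
                      for s, e in participants])
    return lambdas[int(np.argmax(power.mean(axis=0)))], power


# Synthetic data: 3 participants x 4 trials, 2 electrodes, z-scored EEG.
rng = np.random.default_rng(1)
kernel = np.array([[0.0, 0.5], [1.0, 0.8], [0.5, -0.3], [-0.4, 0.2], [0.1, 0.0]])
participants = []
for _ in range(3):
    stims = [rng.standard_normal(400) for _ in range(4)]
    eegs = []
    for s in stims:
        y = lagged(s, 5) @ kernel + rng.standard_normal((400, 2))
        eegs.append((y - y.mean(axis=0)) / y.std(axis=0))
    participants.append((stims, eegs))

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best_lambda, power = group_level_lambda(participants, lambdas, n_lags=5)
print(best_lambda, power.mean(axis=0).round(3))
```

      Choosing λ from the group-averaged curve keeps the regularization identical across participants, which is the comparability argument made above. Normalizing the EEG beforehand matters here because the effect of a given λ depends on the scale of the inputs and outputs.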

      Assuming that the explanation will be sufficiently convincing, which is not a trivial case to make, the next issue that I will bring up is that the lambda value depends on the magnitude of input and output vectors. While the input features are normalised, I don't see that described for the EEG signals. So I assume they are not normalized. In that case, the lambda would have at least to be adapted between subjects to account for their different magnitude.

      We apologize for omitting this detail – yes, the EEG signals were normalized prior to conducting the TRF analysis. We have updated the methods section to explicitly state this pre-processing step (p.10).

      Another clarification, is that value (i.e., 100) would not be comparable either across subjects or across studies. But maybe the authors have a simple explanation for that choice? (note that this point is very important as this could lead others to use TRF methods in an inappropriate way - but I understand that the authors might have specific reasons to do so here). Note that, if the issue is finding a reliable lambda per subject, a more reasonable choice would be to use a fixed lambda selected on a generic (i.e., group-level) model. However selecting an arbitrary lambda could be problematic (e.g., would the results replicate with another lambda; and similarly, what if a different EEG system was used, with different overall magnitude, hence the different impact of the regularisation).

      We fully agree that selecting an arbitrary lambda is problematic (especially across studies). As clarified above, the group-level lambda chosen here for the encoding model was data-driven, optimized based on group-level predictive power.

      (9) "L2 regularization of the model, to reduce its complexity". Could the authors explain what "reduce its complexity" refers to?

      Our intention here was to state that the L2 regularization constrains the model’s weights so that it can better generalize to left-out data. However, for clarity we have now removed this statement.

      (10) The same lambda value was used for the decoding model. From personal experience, that is very unlikely to be the optimal selection. Decoding models typically require a different (usually larger) lambda than forward models, which can be due to different reasons (different SNR of "input" of the model and, crucially, very different dimensionality).

      We agree with the reviewer that treatment of regularization parameters might not be identical for encoding and decoding models. Our initial search of lambda parameters was limited to λ= 0.01 - 100, with λ= 100 showing the best reconstruction correlations. However, following the reviewer’s suggestion we have now broadened the range and found that, in fact, reconstruction correlations are further improved and the best lambda is λ= 1000 (see Author response image 3 below, left panel). Importantly, the difference in decoding reconstruction correlations between the groups is maintained regardless of the choice of lambda (although the effect-size varies; see Author response image 3, right panel). We have now updated the text to reflect results of the model with λ= 1000.

      Author response image 3.

      (11) Skin conductance analysis. Additional details are required. For example, how was the linear interpolation done exactly? The raw data was downsampled, sure. But was an anti-aliasing filter applied? What filter exactly? What implementation for the CDA was run exactly?

      We have added the following details to the methods section (p. 14):

      “The Skin Conductance (SC) signal was analyzed using the Ledalab MATLAB toolbox (version 3.4.9; Benedek and Kaernbach, 2010; http://www.ledalab.de/) and custom-written scripts. The raw data was downsampled to 16Hz using FieldTrip's ft_resampledata function, which applies a built-in anti-aliasing low-pass filter to prevent aliasing artifacts. Data were inspected manually for any noticeable artifacts (large ‘jumps’), and if present were corrected using linear interpolation in Ledalab. A continuous decomposition analysis (CDA) was employed to separate the tonic and phasic SC responses for each participant. The CDA was conducted using the 'sdeco' mode (signal decomposition), which iteratively optimizes the separation of tonic and phasic components using the default regularization settings.”
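
      The role of the anti-aliasing filter in the downsampling step can be illustrated with SciPy's `resample_poly`, used here as a stand-in for FieldTrip's `ft_resampledata`. The original acquisition rate is not stated, so the 512 Hz value and the synthetic signal below are hypothetical.

```python
import numpy as np
from scipy.signal import resample_poly

fs_raw = 512   # hypothetical acquisition rate (not given in the text)
fs_new = 16    # target rate stated in the methods
t = np.arange(60 * fs_raw) / fs_raw

slow = 5.0 + np.sin(2 * np.pi * 0.05 * t)    # slow tonic-like drift
fast = 0.2 * np.sin(2 * np.pi * 100.0 * t)   # fast component above the new Nyquist (8 Hz)
sc_raw = slow + fast

# resample_poly applies a low-pass FIR filter before keeping every 32nd sample,
# so the 100 Hz component is suppressed instead of aliasing down to 4 Hz.
sc_16hz = resample_poly(sc_raw, up=1, down=fs_raw // fs_new)
t_16hz = np.arange(sc_16hz.size) / fs_new
```

      Without the low-pass stage, simply keeping every 32nd sample would fold the 100 Hz component into the retained band, where it could be mistaken for phasic activity.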

      (12) "N1- and P2 peaks of the speech tracking response". Have the authors considered using the N1-P2 complex rather than the two peaks separately? Just a thought.

      This is an interesting suggestion, and we know that this has been used sometimes in more traditional ERP literature. In this case, since neither peak was modulated across groups, we did not think this would yield different results. However, it is a good point to keep in mind for future work.

      (13) Figure 4B. The ticks are missing. From what I can see (but it's hard without the ticks), the N1 seems later than in other speech-EEG tracking experiments (where it is closer to ~80 ms). Could the authors comment on that? Or maybe this looks similar to some of the authors' previous work?

      We apologize for this and have added ticks to the figure.

      In terms of time-course, a N1 peak at around 100ms is compatible with many of our previous studies, as well as those from other groups.

      (14) Figure 4C. Strange thin vertical grey bar to remove.

      Fixed.

      (15) Figure 4B: What about the topographies for the TRF weights? Could the authors show that for the main components?

      Yes. The topographies of the main TRF components are similar to those of the predictive power and are compatible with auditory responses. We have added them to Figure 4B.

      (16) Figure 4B: I just noticed that this is a grand average TRF. That is ok (but not ideal) only because the referencing is to the mastoids. The more appropriate way of doing this is to look at the GFP, instead, which estimates the presence of dipoles. And then look at topographies of the components. Averaging across channels makes the plotted TRF weaker and noisier. I suggest adding the GFP to the plot. Also, the colour scale in Figure 4A is deceiving, as blue is usually used for +/- in plots of the weights. While that is a heatmap, where using a single colour or even yellow to red would be less deceiving at first look. Only cosmetics, indeed. The result is interesting nonetheless!

      We apologize for this, and agree with the reviewer that it is better not to average across EEG channels. In the revised Figure, we now show the TRFs based on the average of electrodes FC1, FC2, and FCz, which exhibited the strongest activity for the two main components.

      Following the previous comment, we have also included the topographical representation of the TRF main components, to give readers a whole-head perspective of the TRF.

      We have also fixed the color-scales.

      We are glad that the reviewer finds this result interesting!
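
      To illustrate the global field power (GFP) that the reviewer raises in comment (16), here is a minimal sketch. It assumes the TRF is stored as a channels-by-lags array. The function names and the toy dipolar pattern are hypothetical and are not taken from the manuscript. GFP is computed as the standard deviation across channels at each time lag, so a dipolar response that cancels in a channel average still yields a clear GFP.

```python
import numpy as np

def global_field_power(trf):
    """GFP: standard deviation across channels at each time lag.

    trf: array of shape (n_channels, n_lags).
    """
    trf = np.asarray(trf, dtype=float)
    return trf.std(axis=0)

def channel_average(trf):
    """Plain average across channels at each time lag."""
    return np.asarray(trf, dtype=float).mean(axis=0)

# Toy dipolar pattern (hypothetical): half the channels are positive,
# half are negative, and all share the same time course.
polarity = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
time_course = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
toy_trf = np.outer(polarity, time_course)

gfp = global_field_power(toy_trf)   # follows the time course
avg = channel_average(toy_trf)      # cancels to zero at every lag
```

      In this toy case the channel average is zero at every lag while the GFP tracks the underlying time course, which is the cancellation problem the reviewer describes.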

      (17) Figure 4C. This looks like a missed opportunity. That metric shows a significant difference overall. But is that underpinned by a generally lower envelope reconstruction correlation, or by a larger deviation in those correlations (so, that metric is as for the control in some moments, but it drops more frequently due to distractibility)?

      We understand the reviewer’s point here, and ideally would like to be able to address this in a more fine-grained analysis, for example on a trial-by-trial basis. However, the design of the current experiment was not optimized for this, in terms of (for example) number of trials, the distribution of sound-events and behavioral outcomes. We hope to be able to address this issue in our future research.

      (18) I am not a fan of the term "accuracy" for indicating envelope reconstruction correlations. Accuracy is a term typically associated with classification. Regression models are typically measured through errors, loss, and sometimes correlations. 'Accuracy' is inaccurate (no joke intended).

      We accept this comment and now use the term “reconstruction correlation”.

      (19) Discussion. "The most robust finding in". I suggest using more precise terminology. For example, "largest effect-size".

      We agree and have changed the terminology (p. 31).

      (20) "individuals who exhibited higher alpha-power [...]". I probably missed this. But could the authors clarify this result? From what I can see, alpha did not show an effect on the group. Is this referring to Table 2? Could the authors elaborate on that? How does that reconcile with the non-significant effect of the group? In that same sentence, do you mean "and were more likely"? If that's the case, and they were more likely to report attentional difficulties, how is it that there is no group-effect when studying alpha?

      Yes, this sentence refers to the linear regression models described in Figure 10 and in Table 2. As the reviewer correctly points out, this is one place where there is a discrepancy between the results of the between-group analysis (ADHD diagnosis yes/no) and the regression analysis, which treats ADHD symptoms as a continuum, across both groups. The same is true for the gaze-shift data, which also did not show a significant between-group effect but was identified in the regression analysis as contributing to explaining the variance in ADHD symptoms.

      We discuss this point on pages 30-31, noting that “although the two groups are clearly separable from each other, they are far from uniform in the severity of symptoms experienced”, which motivated the inclusion of both analyses in this paper.

      At the bottom of p. 31 we specifically address the similarities and differences between the between-group and regression-based results. In our opinion, this pattern emphasizes that while neither approach is ‘conclusive’, looking at the data through both lenses contributes to an overall better understanding of the contributing factors, as well as highlighting that “no single neurophysiological measure alone is sufficient for explaining differences between the individuals – whether through the lens of clinical diagnosis or through report of symptoms”.

      (21) "why in the latter case the neural speech-decoding accuracy did not contribute to explaining ASRS scores [...]". My previous point 1 on separating overall envelope decoding from its deviation could help there. The envelope decoding correlation might go up and down due to SNR, while you might be more interested in the dynamics over time (i.e., looking at the reconstructions over time).

      Again, we appreciate this comment, but believe that this additional analysis is outside the scope of what would be reliably feasible with the current dataset. However, since the data will be made publicly available, perhaps other researchers will have better ideas as to how to do this.

      (22) Data and code sharing should be discussed. Also, specific links/names and version numbers should be included for the various libraries used.

      We are currently working on organizing the data to make it publicly available on the Open Science Project.

      We have updated links and version numbers for the various toolboxes/software used, throughout the manuscript.

      Reviewer #2:

      (1) While it is highly appreciated to study selective attention in a naturalistic context, the readers would expect to see whether there are any potential similarities or differences in the cognitive and neural mechanisms between contexts. Whether the classic findings about selective attention would be challenged, rebutted, or confirmed? Whether we should expect any novel findings in such a novel context? Moreover, there are some studies on selective attention in the naturalistic context though not in the classroom, it would be better to formulate specific hypotheses based on previous findings both in the strictly controlled and naturalistic contexts.

      Yes, we fully agree that comparing results across different contexts would be extremely beneficial and important.

      The current paper serves as an important proof-of-concept demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility, but is also the basis for formulating specific hypotheses that will be tested in follow-up studies.

      In fact, a follow-up study is already ongoing in our lab, where we are looking into this point by testing users in different VR scenarios (e.g., classroom, café, office etc.), and assessing whether similar neurophysiological patterns are observed across contexts and to what degree they are replicable within and across individuals. We hope to share these data with the community in the near future.

      (2) Previous studies suggest handedness and hemispheric dominance might impact the processing of information in each hemisphere. Whether these issues have been taken into consideration and appropriately addressed?

      This is an interesting point. In this study we did not specifically control for handedness/hemispheric dominance, since most of the neurophysiological measures used here are sensory/auditory in their nature, and therefore potentially invariant to handedness. Moreover, the EEG signal is typically not very sensitive to hemispheric dominance, at least for the measures used here. However, this might be something to consider more explicitly in future studies. Nonetheless, we have added handedness information to the Methods section (p. 5): “46 right-handed, 3 left-handed”.

      (3) It would be interesting to know how students felt about the Virtual Classroom context, whether it is indeed close to the real classroom or to some extent different.

      Yes, we agree. Obviously, the VR classroom differs in many ways from a real classroom, in terms of the perceptual experience, social aspects and interactive possibilities. We did ask participants about their VR experience after the experiment, and most reported feeling highly immersed in the VR environment and engaged in the task, with a strong sense of presence in the virtual-classroom.

      We note that, in parallel to the VR studies in our lab, we are also conducting experiments in real classrooms, and we hope that the cross-study comparison will be able to shed more light on these similarities/differences.

      (4) One intriguing issue is whether neural tracking of the teacher's speech can index students' attention, as the tracking of speech may be relevant to various factors such as sound processing without semantic access.

      Another excellent point. While separating the ‘acoustic’ and ‘semantic’ contributions to the speech tracking response is non-trivial, we are currently working on methodological approaches to do this (again, in future studies) following, for example, the hierarchical TRF approach used by Brodbeck et al. and others.

      (5) There are many results associated with various metrics, and many results did not show a significant difference between the ADHD group and the control group. It is difficult to find the crucial information that supports the conclusion. I suggest the authors reorganize the results section and report the significant results first, and to which comparison(s) the readers should pay attention.

      We apologize if the organization of the results section was difficult to follow. This is indeed a challenge when collecting so many different neurophysiological metrics.

      To facilitate this, we have now added a paragraph at the beginning of the result section, clarifying its structure (p.16):

      The current dataset is extremely rich, consisting of many different behavioral, neural and physiological responses. In reporting these results, we have separated between metrics that are associated with paying attention to the teacher (behavioral performance, neural tracking of the teacher’s speech, and looking at the teacher), those capturing responses to the irrelevant sound-events (ERPs and event-related changes in SC and gaze); as well as more global neurophysiological measures that may be associated with the listeners’ overall ‘state’ of attention or arousal (alpha- and beta-power and tonic SC).

      Moreover, within each section we have ordered the analysis such that the ones with significant effects are first. We hope that this contributes to the clarity of the results section.

      (6) The difference between artificial and non-verbal humans should be introduced earlier in the introduction and let the readers know what should be expected and why.

      We have added this to the Introduction (p. 4).

      (7) It would be better to discuss the results against a theoretical background rather than majorly focusing on technical aspects.

      We appreciate this comment. In our opinion, the discussion does contain a substantial theoretical component, both regarding theories of attention and attention-deficits, and also regarding their potential neural correlates. However, we agree that there is always room for more in-depth discussion.

      Reviewer #3:

      Major:

      (1) While the study introduced a well-designed experiment with comprehensive physiological measures and thorough analyses, the key insights derived from the experiment are unclear. For example, does the high ecological validity provide a more sensitive biomarker or a new physiological measure of attention deficit compared to previous studies? Or does the study shed light on new mechanisms of attention deficit, such as the simultaneous presence of inattention and distraction (as mentioned in the Conclusion)? The authors should clearly articulate their contributions.

      Thanks for this comment.

      We would not say that this paper is able to provide a ‘more sensitive biomarker’ or a ‘new physiological measure of attention’ – in order to make those types of grand statements we would need to have much more converging evidence from multiple studies and using both replication and generalization approaches.

      Rather, from our perspective, the key contribution of this work is in broadening the scope of research regarding the neurophysiological mechanisms involved in attention and distraction.

      Specifically, this work:

      (1) Offers a significant methodological advancement of the field – demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility in contexts that ‘mimic’ real-life situations (rather than highly controlled computerized tasks).

      (2) Provides a solid basis for formulating specific mechanistic hypotheses regarding the neurophysiological metrics associated with attention and distraction, the interplay between them, and their potential relation to ADHD-symptoms. Rather than being an end-point, we see these results as a start-point for future studies that emphasize ecological validity and generalizability across contexts, that will hopefully lead to improved mechanistic understanding and potential biomarkers of real-life attentional capabilities (see also response to Rev #2 comment #1 above).

      (3) Highlights differences and similarities between the current results and those obtained in traditional ‘highly controlled’ studies of attention (e.g., in the way ERPs to sound-events differ between ADHD and controls; variability in gaze and alpha-power; and more broadly about whether ADHD symptoms do or don’t map onto specific neurophysiological metrics). Again, we do not claim to give a definitive ‘answer’ to these issues, but rather to provide a new type of data that can expand the conversation and address the ecological validity gap in attention research.

      (2) Based on the multivariate analyses, ASRS scores correlate better with the physiological measures rather than the binary deficit category. It may be worthwhile to report the correlation between physiological measures and ASRS scores for the univariate analyses. Additionally, the correlation between physiological measures and behavioral accuracy might also be interesting.

      Thanks for this. The beta-values reported for the regression analysis reflect the correlations between the different physiological measures and the ASRS scores (p. 30). From a statistical perspective, it is better to report these values rather than the univariate correlation-coefficients, since these represent the ‘unique’ relationship with each factor, after controlling for all the others.

      The univariate correlations between the physiological measures themselves, as well as with behavioral accuracy, are reported in Figure 10.
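
      The distinction drawn above between regression beta-values and univariate correlation coefficients can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the two simulated predictors, the simulated outcome, and the helper names are hypothetical and do not correspond to the measures or ASRS scores in the manuscript. The second predictor is correlated with the first but has no effect of its own on the outcome, so its univariate correlation is high while its standardized beta is close to zero.

```python
import numpy as np

def zscore(x):
    """Standardize each column (or a 1-D vector) to zero mean, unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def standardized_betas(predictors, outcome):
    """Least-squares betas on z-scored data (no intercept needed after z-scoring)."""
    betas, *_ = np.linalg.lstsq(zscore(predictors), zscore(outcome), rcond=None)
    return betas

def univariate_correlations(predictors, outcome):
    """Pearson correlation of each predictor with the outcome, taken one at a time."""
    return np.array([np.corrcoef(predictors[:, j], outcome)[0, 1]
                     for j in range(predictors.shape[1])])

rng = np.random.default_rng(2)
n = 500
x1 = rng.standard_normal(n)                    # predictor with a real effect
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)   # correlated with x1, no own effect
outcome = x1 + 0.5 * rng.standard_normal(n)    # depends on x1 only
predictors = np.column_stack([x1, x2])

betas = standardized_betas(predictors, outcome)
corrs = univariate_correlations(predictors, outcome)
```

      Comparing `betas` with `corrs` shows why the two summaries can disagree: the correlation for the second predictor reflects variance it shares with the first, whereas its beta reflects only its unique contribution.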

      (3) For the TRF and decoding analysis, the authors used a constant regularization parameter per a previous study. However, the optimal regularization parameter is data-dependent and may differ between encoding and decoding analyses. Furthermore, the authors did not conduct TRF analysis for the quiet condition due to the limited ~5 minutes of data. However, such a data duration is generally sufficient to derive a stable TRF with significant predictive power (Mesik and Wojtczak, 2023).

      The reviewer raises two important points, also raised by Rev #1 (see above).

      Regarding the choice of regularization parameters, we have now clarified that although we used a common lambda value for all participants, it was selected in a data-driven manner, so as to achieve an optimal predictive power at the group-level.

      See revised methods section:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
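
      The procedure quoted above can be sketched as follows. This is a simplified illustration under stated assumptions, not the mTRF toolbox's API or the authors' code: it uses a single stimulus feature and a single response channel, a closed-form ridge solution, synthetic data, and hypothetical function names. It keeps the two elements the quoted text describes: leave-one-trial-out cross-validation scored with Pearson's correlation, and a single λ chosen by the highest mean predictive power across participants.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """Stack time-lagged copies of a 1-D stimulus into an (n_samples, n_lags) matrix."""
    stim = np.asarray(stim, dtype=float)
    X = np.zeros((stim.size, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stim[:stim.size - lag]
    return X

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam * I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def loo_predictive_power(trials, lam, n_lags):
    """Mean Pearson r over leave-one-trial-out folds for one participant."""
    designs = [(lagged_design(s, n_lags), np.asarray(r, dtype=float)) for s, r in trials]
    scores = []
    for held_out in range(len(designs)):
        X_train = np.vstack([X for i, (X, _) in enumerate(designs) if i != held_out])
        y_train = np.concatenate([y for i, (_, y) in enumerate(designs) if i != held_out])
        w = ridge_fit(X_train, y_train, lam)
        X_test, y_test = designs[held_out]
        scores.append(np.corrcoef(X_test @ w, y_test)[0, 1])
    return float(np.mean(scores))

def select_group_lambda(participants, lambdas, n_lags):
    """Return the one lambda with the highest mean predictive power across participants."""
    mean_scores = [float(np.mean([loo_predictive_power(t, lam, n_lags) for t in participants]))
                   for lam in lambdas]
    return lambdas[int(np.argmax(mean_scores))], mean_scores

# Synthetic data (hypothetical): 3 participants x 4 trials, one response channel.
rng = np.random.default_rng(0)
true_kernel = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
participants = []
for _ in range(3):
    trials = []
    for _ in range(4):
        stim = rng.standard_normal(300)
        resp = lagged_design(stim, 5) @ true_kernel + rng.standard_normal(300)
        trials.append((stim, resp))
    participants.append(trials)

lambdas = [0.01, 1.0, 100.0, 10000.0]
best_lambda, mean_scores = select_group_lambda(participants, lambdas, n_lags=5)
```

      Selecting λ at the group level, as in `select_group_lambda`, means every participant's TRF is estimated with the same amount of shrinkage, which is the comparability argument made in the quoted passage.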

      Regarding whether data was sufficient in the Quiet condition for performing TRF analysis – we are aware of the important work by Mesik & Wojtczak, and had initially used this estimate when designing our study. However, when assessing the predictive-power of the TRF model trained on data from the Quiet condition, we found that it was not significantly better than chance (see Author response image 2, ‘real’ predictive power vs. permuted data). Therefore, we ultimately did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10).
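
      A comparison of ‘real’ predictive power against permuted data, as mentioned above, can be sketched in several ways. The version below is one common option and is an assumption, since the response does not specify the permutation scheme: it builds a null distribution by circularly shifting the predicted signal relative to the actual one. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])

def circular_shift_null(predicted, actual, n_perm, min_shift, rng):
    """Null distribution of r from circularly shifting the prediction in time."""
    n = len(predicted)
    shifts = rng.integers(min_shift, n - min_shift, size=n_perm)
    return np.array([pearson_r(np.roll(predicted, s), actual) for s in shifts])

def permutation_p_value(observed, null):
    """One-sided p-value with the +1 correction, so p is never exactly zero."""
    return (np.sum(null >= observed) + 1) / (len(null) + 1)

# Synthetic example (hypothetical): the prediction explains part of the response.
rng = np.random.default_rng(1)
predicted = rng.standard_normal(2000)
actual = 0.3 * predicted + rng.standard_normal(2000)

observed = pearson_r(predicted, actual)
null = circular_shift_null(predicted, actual, n_perm=500, min_shift=100, rng=rng)
p_value = permutation_p_value(observed, null)
```

      A predictive power that is "not significantly better than chance" corresponds to an observed correlation that falls inside the bulk of `null`, giving a large `p_value`.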

      (4) As shown in Figure 4, for ADHD participants, decoding accuracy appears to be lower than the predictive power of TRF. This result is surprising because more data (i.e., data from all electrodes) is used in the decoding analysis.

      This is an interesting point – however, in our experience it is not necessarily the case that decoding accuracy (i.e., reconstruction correlation with the stimulus) is higher than encoding predictive-power. While both metrics use Pearson’s correlations, they quantify the similarity between two different types of signals (the EEG and the speech-envelope). Although the decoding procedure does use data from all electrodes, many of them don’t actually contain meaningful information regarding the stimulus, and thus could just as well hinder the overall performance of the decoding.

      (5) Beyond the current analyses, the authors may consider analyzing inter-subject correlation, especially for the gaze signal analysis. Given that the area of interest during the lesson changes dynamically, the teacher might not always be the focal point. Therefore, the correlation of gaze locations between subjects might be better than the percentage of gaze duration on the teacher.

      Thanks for this suggestion. We have tried to look into this, however working with eye-gaze in a 3-D space is extremely complex and we are not able to calculate reliable correlations between participants.

      (6) Some preprocessing steps relied on visual and subjective inspection. For instance, "Visual inspection was performed to identify and remove gross artifacts (excluding eye movements)" (P9); "The raw data was downsampled to 16Hz and inspected for any noticeable artifacts" (P13). Please consider using objective processes or provide standards for subjective inspections.

      We are aware of the possible differences between objective methods of artifact rejection vs. use of manual visual inspection, however we still prefer the manual (subjective) approach. As noted, in this case only very large artifacts were removed, exceeding ~ 4 SD of the amplitude variability, so as to preserve as many full-length trials as possible.
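
      For readers who want an objective counterpart to the ~4 SD criterion mentioned above, a minimal sketch follows. It is an assumed interpretation, not the authors' procedure (which was manual): a sample is flagged when its deviation from the signal mean exceeds a multiple of the signal's standard deviation. The function name, the threshold default, and the toy signal are hypothetical.

```python
import numpy as np

def flag_large_artifacts(signal, n_sd=4.0):
    """Boolean mask of samples deviating from the mean by more than n_sd standard deviations."""
    signal = np.asarray(signal, dtype=float)
    deviation = np.abs(signal - signal.mean())
    return deviation > n_sd * signal.std()

# Toy signal (hypothetical): unit-variance noise with one gross artifact injected.
rng = np.random.default_rng(3)
toy_signal = rng.standard_normal(1000)
toy_signal[500] = 50.0

mask = flag_large_artifacts(toy_signal)
flagged_indices = np.flatnonzero(mask)
```

      Because the mean and standard deviation are themselves inflated by large artifacts, a median-based variant would be more robust. The simple version is shown here only to make the stated threshold concrete.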

      (7) Numerous significance testing methods were employed in the manuscript. While I appreciate the detailed information provided, describing these methods in a separate section within the Methods would be more general and clearer. Additionally, the authors may consider using a linear mixed-effects model, which is more widely adopted in current neuroscience studies and can account for random subject effects.

      Indeed, there are many statistical tests in the paper, given the diverse types of neurophysiological data collected here. We actually thought that describing the statistics per method rather than in a separate “general” section would be easier to follow, but we understand that readers might diverge in their preferences.

      Regarding the use of mixed-effect models – this is indeed a great approach. However, it requires deriving reliable metrics on a per-trial basis, and while this might be plausible for some of our metrics, the EEG and GSR metrics are less reliable at this level. This is why we ultimately chose to aggregate across trials and use a regular regression model rather than mixed-effects.

      (8) Some participant information is missing, such as their academic majors. Given that only two lesson topics were used, the participants' majors may be a relevant factor.

      To clarify – the mini-lectures presented here actually covered a large variety of topics, broadly falling within the domains of history, science, social science, and technology. Regarding participants’ academic majors, these were relatively diverse, as can be seen in Author response table 1 and Author response image 4.

      Author response table 1.

      Author response image 4.

      (9) Did the multiple regression model include cross-validation? Please provide details regarding this.

      Yes, we used a leave-one-out cross validation procedure. We have now clarified this in the methods section which now reads:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”

      Minor:

      (10) Typographical errors: P5, "forty-nine 49 participants"; P21, "$ref"; P26, "Table X"; P4, please provide the full name for "SC" when first mentioned.

      Thanks! Corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings.

      We thank the reviewer for a thorough review of our manuscript and for recognizing the strength of our study.

      Reviewer #2 (Public review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals that, while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for a thorough review of our manuscript and for recognizing the strength of our study.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during exploration of novel environments. Those data sets often have 10x the neurons simultaneously recorded compared to the present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior, again not agreeing with much of the previous electrophysiological recordings.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available, it would strengthen the proposed link to novelty if they were included.

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      (4) The authors mention in the discussion that they image deep layer PCs in CA1, however this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer specific gene to support this.

      Comments on revisions:

      I have no further major requests and thank the authors for the additional data and analyses.

      We thank the reviewer for recognizing our efforts in revising the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected from the other side of the brain.

      Strengths:

      The authors use a cutting-edge technique.

      We thank the reviewer for a thoughtful review of our manuscript and for pointing out the technical strength of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples. The main problem with the work is that the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both rhythms exhibit profound differences as a function of location.

      Theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. Because the LFP was recorded using a single-contact tungsten electrode, there is no way to know whether the electrode was exactly in the CA1 pyramidal cell layer, or in the CA1 oriens, CA1 radiatum, or perhaps even CA3 - which exhibits ripples and theta which are weakly correlated and in anti-phase with the CA1 rhythms, respectively. Thus, there is no way to know whether the theta phase used in the analysis is the phase of the local CA1 theta.

      Although the occurrence of CA1 ripples is often correlated across parts of the hippocampus, ripples are inherently a locally-generated rhythm. Independent ripples occur within a fraction of a millimeter within the same hemisphere. Ripples are also very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident. Thus, even if the LFP was recorded from the center of the CA1 pyramidal layer in the contralateral hemisphere, it would not suffice for the claim made in the title.

      We thank the reviewer for pointing out the issue regarding the claim made in the title. We have revised the manuscript to clarify that the theta and ripple oscillations referenced in the title refer to specific frequency bands of intracellular and contralaterally recorded field potentials rather than field potentials recorded at the same site as the neuronal activity.

      Abstract (line19):

      “… Notably, these synchronous ensembles were not associated with contralateral ripple oscillations but were instead phase-locked to theta waves recorded in the contralateral CA1 region. Moreover, the subthreshold membrane potentials of neurons exhibited coherent intracellular theta oscillations with a depolarizing peak at the moment of synchrony.”

      Introduction (line68):

      “… Surprisingly, these synchronous ensembles occurred outside of contralateral ripples and were phase-locked to intracellular theta oscillations as well as extracellular theta oscillations recorded from the contralateral CA1 region.”

      To address concerns about electrode placement, we have now included post-hoc histological verification of electrode locations, confirming that they were positioned in the contralateral CA1 pyramidal layer (Author response image 1).

      Author response image 1.

      Post-hoc histological section showing the location of a DiI-coated electrode in the contralateral CA1 pyramidal layer. Scale bar: 300 μm.

      While we appreciate that theta and ripple oscillations exhibit regional variations in phase and amplitude, previous studies have demonstrated a strong co-occurrence and synchrony of these oscillations between both hippocampi (1-3). Given that our primary objective was to examine how neuronal ensembles relate to large-scale hippocampal oscillation states rather than local microcircuit-level fluctuations, we recorded theta and ripple oscillations from the contralateral CA1 region.

      However, we acknowledge that contralateral recordings do not capture all ipsilateral-specific dynamics. Theta phases vary with depth and precise location, and local ripple events may be independently generated across small spatial scales. To reflect this, we have now explicitly acknowledged these considerations in the discussion. 

      Discussion (line527):

      While contralateral LFP recordings reliably capture large-scale hippocampal theta and ripple oscillations, they may not fully account for ipsilateral-specific dynamics, such as variations in theta phase alignment or locally generated ripple events. Although contralateral recordings serve as a well-established proxy for large-scale hippocampal oscillatory states, incorporating simultaneous ipsilateral field potential recordings in future studies could refine our understanding of local-global network interactions. Despite these considerations, our findings provide robust evidence for the existence of synchronous neuronal ensembles and their role in coordinating newly formed place cells. These results advance our understanding of how synchronous neuronal ensembles contribute to spatial memory acquisition and hippocampal network coordination.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have provided sufficient experimental and analytical data addressing my comments, particularly regarding consistency with past electrophysiological data and the exclusion of potential imaging artifacts.

      We thank the reviewer for recognizing our efforts in revising the manuscript.

      Minor comment: In Figure 2C and Figure 5-figure supplement 1, 'paired Student's t-test' is not entirely appropriate. More precisely, either 'paired t-test' or 'Student's t-test' would better indicate the correct statistical method. Please verify whether these data comparisons are within-group or between-group.

      Thank you for the comment. We have revised the manuscript as suggested.

      Reviewer #2 (Recommendations for the authors):

      I have no further major requests and thank the authors for the additional data and analyses.

      We thank the reviewer for recognizing our efforts in revising the manuscript.

      Minor points: line 169, typo; correct "grant" to "grand".

      Thank you for pointing it out. The typo has been corrected.

      (1) Buzsaki, G. et al. Hippocampal network patterns of activity in the mouse. Neuroscience 116, 201-211 (2003). https://doi.org/10.1016/s0306-4522(02)00669-3

      (2) Szabo, G. G. et al. Ripple-selective GABAergic projection cells in the hippocampus. Neuron 110, 1959-1977 e1959 (2022). https://doi.org/10.1016/j.neuron.2022.04.002

      (3) Huang, Y. C. et al. Dynamic assemblies of parvalbumin interneurons in brain oscillations. Neuron 112, 2600-2613 e2605 (2024). https://doi.org/10.1016/j.neuron.2024.05.015

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors propose a new model of biologically realistic reinforcement learning in the direct and indirect pathway spiny projection neurons in the striatum. These pathways are widely considered to provide a neural substrate for reinforcement learning in the brain. However, we do not yet have a full understanding of mechanistic learning rules that would allow successful reinforcement-learning-like computations in these circuits. The authors outline some key limitations of current models and propose an interesting solution by leveraging learning with efferent inputs of selected actions. They show that the model simulations are able to recapitulate experimental findings about the activity profile in these populations of mice during spontaneous behavior. They also show how their model is able to implement off-policy reinforcement learning.

      Strengths:

      The manuscript has been very clearly written and the results have been presented in a readily digestible manner. The limitations of existing models, which motivate the presented work, have been clearly presented and the proposed solution seems very interesting. The novel contribution of the proposed model is the idea that different patterns of activity drive current action selection and learning. Not only does this allow the model to implement reinforcement learning computations well, but this suggestion may have interesting implications regarding why some processes selectively affect ongoing behavior and others affect learning. The model is able to recapitulate some interesting experimental findings about various activity characteristics of dSPN and iSPN pathway neuronal populations in spontaneously behaving mice. The authors also show that their proposed model can implement off-policy reinforcement learning algorithms with biologically realistic learning rules. This is interesting since off-policy learning provides some unique computational benefits and it is very likely that learning in neural circuits may, at least to some extent, implement such computations.

      We thank the reviewer for the positive comments.

      Weaknesses:

      A weakness in this work is that it isn’t clear how a key component in the model - an efferent copy of selected actions - would be accessible to these striatal populations. The authors propose several plausible candidates, but future work may clarify the feasibility of this proposal.

      We agree that the biological substrate of the efference copy remains a key open question. We discuss potential pathways in the Discussion section of our manuscript and hope that future experimental studies clarify the question.

      Reviewer #2:

      Summary:

      The basal ganglia is often understood within a reinforcement learning (RL) framework, where dopamine neurons convey a reward prediction error that modulates cortico-striatal connections onto spiny projection neurons (SPNs) in the striatum. However, current models of plasticity rules are inconsistent with learning in a reinforcement learning framework.

      This paper proposes a new model that describes how distinct learning rules in direct and indirect pathway striatal neurons allow them to implement reinforcement learning models. It proposes that two distinct components of striatal activity affect action selection and learning. They show that the proposed implementation allows learning in simple tasks and is consistent with calcium imaging data from direct and indirect SPNs in freely moving mice.

      Strengths:

      Despite the success of reward prediction errors at characterizing the responses of dopamine neurons as the temporal difference error within an RL framework, the implementation of RL algorithms in the rest of the basal ganglia has been unclear. A key missing aspect has been the lack of an RL implementation that is consistent with the distinction between direct and indirect SPNs. This paper proposes a new model that is able to learn successfully in simple RL tasks and explains recent experimental results.

      The authors show that their proposed model, unlike previous implementations, can perform well in RL tasks. The new model allows them to make experimental predictions. They test some of these predictions and show that the dynamics of dSPNs and iSPNs correspond to model predictions.

      More generally, this new model can be used to understand striatal dynamics across direct and indirect SPNs in future experiments.

      We thank the reviewer for the positive comments.

      Weaknesses:

      The authors could characterize better the reliability of their experimental predictions and the description of the parameters of some of the simulations.

      In addition to the descriptions in the Methods, we have provided code implementing the key features of our simulations, which should contribute to reproducibility of our results.

      The authors propose some ideas about the specificity of the striatal efferent inputs but should better highlight that this is a key feature of the model whose anatomical implementation has yet to be resolved.

      We have clarified in the Discussion section “Biological substrates of striatal efferent inputs” that these represent assumptions or predictions that have not yet been demonstrated experimentally.

      Reviewer #3:

      Summary:

      This paper points out an inconsistency between the roles of the striatal spiny neurons projecting to the indirect pathway (iSPN) and the synaptic plasticity rule of those neurons expressing dopamine D2 receptors, and proposes a novel, intriguing mechanism in which iSPNs are activated by the efference copy of the chosen action that they are supposed to inhibit.

      The proposed model was supported by simulations and analysis of the neural recording data during spontaneous behaviors.

      Strengths:

      Previous models suggested that the striatal neurons learn action-value functions, but how the information about the chosen action is fed back to the striatum for learning was not clear. The authors pointed out that this is a fundamental problem for iSPNs, which are supposed to inhibit specific actions and whose synaptic inputs are potentiated with dopamine dips.

      The authors propose a novel hypothesis that iSPNs are activated by efference copy of the selected action which they are supposed to inhibit during action selection. Even though intriguing and seemingly unnatural, the authors demonstrated that the model based on the hypothesis can circumvent the problem of iSPNs learning to disinhibit the actions associated with negative reward errors. They further showed by analyzing the cell-type specific neural recording data by Markowitz et al. (2018) that iSPN activities tend to be anti-correlated before and after action selection.

      We thank the reviewer for the positive comments.

      Weaknesses:

      It is not correct to call the action value learning using the externally-selected action “off-policy.” Both the off-policy algorithm Q-learning and the on-policy algorithm SARSA update the action value of the chosen action, which can be different from the greedy action implied by the present action values. In standard reinforcement learning terminology, the on-policy versus off-policy distinction concerns the action in the subsequent state: whether the update uses the next action value of the (to be) chosen action or that of the greedy choice, as in equation (7).
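
      The distinction drawn in this comment can be sketched in a few lines. The following is a minimal illustration in standard textbook notation, not code from the manuscript; the function names, the discount factor `gamma`, and the learning rate `alpha` are assumptions introduced here for illustration only. Both rules update the value of the action that was actually taken and differ only in which next-state action value enters the target.

```python
import numpy as np

def sarsa_target(reward, q_next, next_action, gamma=0.9):
    """On-policy (SARSA) target: bootstrap from the action actually chosen next."""
    return reward + gamma * q_next[next_action]

def q_learning_target(reward, q_next, gamma=0.9):
    """Off-policy (Q-learning) target: bootstrap from the greedy next action."""
    return reward + gamma * np.max(q_next)

def td_update(q, state, action, target, alpha=0.1):
    """Shared update step: only the value of the taken action is changed."""
    q = q.copy()
    q[state, action] += alpha * (target - q[state, action])
    return q

# Toy table with two states and two actions (illustrative values).
q = np.zeros((2, 2))
q[1] = [1.0, 3.0]

# Transition: state 0, action 0, reward 1.0, next state 1.
# The behavior policy then picks the non-greedy action 0 in state 1.
target_on = sarsa_target(1.0, q[1], next_action=0)
target_off = q_learning_target(1.0, q[1])
q_on = td_update(q, 0, 0, target_on)
q_off = td_update(q, 0, 0, target_off)
```

      The two targets coincide whenever the next chosen action happens to be the greedy one, which is why the difference only shows up under exploratory or externally driven behavior.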

      It is worth noting that this paper suggested that dopamine neurons encode on-policy TD errors: Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9, 1057-63. https://doi.org/10.1038/nn1743.

      We regret that we do not completely follow the reviewer’s comment. We use “off-policy” to refer to the fact that, considered in isolation, the basal ganglia reinforcement learning system that we model learns a target policy that may be distinct from the behavioral policy of the organism as a whole.

      It is also confusing to contrast TD learning and Q-learning, as the latter is considered one type of TD learning. The TD error signal based on the state value function (6) depends implicitly on the chosen action a_{t−1} through r_t and s_t, via the reward and state transition functions.

      We agree that this was confusing. We have therefore changed the places in our paper where we intended to refer to “TD learning of a value function V(s)” to specifically mention V(s), rather than just “TD learning.”

      It is not clear why interferences of the activities for action selection and learning can be avoided, especially when actions are taken with short intervals or even temporal overlaps. How can the efference copy activation for the previous action be dissociated with the sensory cued activation for the next action selection?

      The non-interference arises from the orthogonality of the difference (action selection) and sum (efference copy) modes, as described in Figure 3. However, we agree with the reviewer that the problem of temporal credit assignment, when many actions are taken before reward feedback is obtained, is present in our model, as in any standard RL model.
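
      As a rough numerical illustration of this orthogonality argument (a sketch under simplifying assumptions, not the model code accompanying the manuscript), consider a single action channel whose activity is written as a pair (dSPN, iSPN). The sketch assumes that action selection reads out the difference between dSPN and iSPN activity and that the efference copy drives both populations equally; the variable names and numbers are hypothetical.

```python
import numpy as np

# Unit vectors in (dSPN, iSPN) space for one action channel.
difference_mode = np.array([1.0, -1.0]) / np.sqrt(2)  # assumed action-selection readout
sum_mode = np.array([1.0, 1.0]) / np.sqrt(2)          # assumed efference-copy direction

def project(activity, mode):
    """Component of a (dSPN, iSPN) activity pair along a given mode."""
    return float(np.dot(activity, mode))

# Illustrative activity driven by sensory input before action selection.
baseline = np.array([0.8, 0.3])

# Efference copy adds equal drive to dSPN and iSPN (purely along the sum mode).
with_efference = baseline + 0.5 * np.array([1.0, 1.0])

selection_before = project(baseline, difference_mode)
selection_after = project(with_efference, difference_mode)
sum_before = project(baseline, sum_mode)
sum_after = project(with_efference, sum_mode)
```

      Under these assumptions the difference-mode readout is unchanged by the efference copy, while the sum-mode component increases, so the two signals can coexist in the same populations without interfering.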

      Although it may be difficult to single out the neural pathway that carries the efference copy signal to the striatum, it is desirable to consider its requirements and the different possibilities. A major issue is that the time delay from actions to reward feedback can be highly variable.

      An interesting candidate is the long-latency neurons in the CM thalamus projecting to striatal cholinergic interneurons, which are activated following low-reward actions: Minamimoto T, Hori Y, Kimura M (2005). Complementary process to response bias in the centromedian nucleus of the thalamus. Science, 308, 1798-801. https://doi.org/10.1126/science.1109154.

      We are grateful for the interesting suggestion and reference, which we have added to the manuscript. However, we note that the issue of delayed reward feedback may also be partially addressed by using a sufficiently long eligibility trace.

      In the paragraph before Eq. (3), Eq. (1) should be Eq. (2) for the iSPN.

      Corrected.

    1. Author response:

      eLife Assessment

      This manuscript offers important insights into how polyphosphate (polyP) influences protein phase separation differently from DNA. The authors present compelling evidence that polyP distinguishes between protein conformational states, leading to diverse condensate behaviors. However, differences in charge density between polyP and DNA complicate direct comparisons, and the extent to which polyP-driven phase transitions reveal initial protein states remains unclear. Addressing these concerns would strengthen the manuscript's impact for researchers interested in biomolecular condensates, protein dynamics, and stress response mechanisms.

      We thank the editorial team for the favorable assessment. We, however, contest the specific point on the difference in charge density. We have already performed experiments wherein a higher concentration of DNA is used to match the overall ‘concentration of charges’ as in the experiments with polyP (see Figure S6), and we do not observe any differences in the maturation behavior with DNA, i.e., we see only dissolution at both higher and lower concentrations of DNA. Charge density (i.e., the number of charges per unit volume of the polymer), on the other hand, is an intrinsic feature of the polymer which is naturally different between DNA and polyP. In fact, the primary result of our work is our observation that polyP can discern the starting ensembles more efficiently, likely through actively engaging and interacting with the ensemble, while DNA appears to be a passive player.

      Reviewer #1 (Public review):

      Summary:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negative polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the enzymes. Specifically, they observe the formation of polyproline II-like structures within two tested enzyme variants in the presence of polyP. Together their results provide a unique insight into the physical and structural mechanism by which two unique negatively charged polymers can induce distinct phase transitions with the same protein. This study will be a welcome addition to the condensate field and provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Strengths:

      Throughout the article, the authors used the correct techniques to probe physical changes within proteins that can be directly linked to phase transition behaviors. Their rigorous experiments create a clear picture of what occurs at the molecular level when CytR and FruR are exposed to either DNA or polyP, which are unique, highly negatively charged biopolymers found within bacteria. This work provides a new view of mechanisms by which bacteria can regulate the cytoplasmic organization upon the induction of stress. Furthermore, this is likely applicable to mammalian and plant cells and likely to numerous proteins that undergo condensation with nucleic acids and other charged biopolymers.

      Weaknesses:

      The biggest weakness of this study is that it compares the phase behavior of enzymes driven by negatively charged polymers that have intrinsic differences in net charge and charge density. Because these properties are extremely important for controlling phase separation, any differences may result in the observed phase transitions driven by DNA and polyP. The authors should perform an additional experiment to control for these differences as best they can. The results from these experiments will provide additional insight into the importance of charge-based properties for controlling phase transitions.

      We thank the reviewer for providing a positive review of our work. On the comment related to the final paragraph, we note that we have already conducted an experiment with a higher DNA concentration (11.24 µM) to explore if the concentration of charges plays any significant role. The results of this experiment are presented in Figure S6. We observe that even at a higher DNA concentration, the condensates dissolve over time. Therefore, the difference in the maturation behavior of condensates with varying initial protein ensembles is due to the nature of polyP (likely through its enhanced flexibility). 

      Reviewer #2 (Public review):

      Summary:

      In this study, Goyal et al demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate against initial protein ensembles with different conformational heterogeneity, structure, and compactness. The authors further show that the protein native ensemble is vital on whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      Strengths:

      This is a thoroughly conducted study that provides an alternative route for inducing phase separation that is more informative on the initial protein ensemble involved. This is particularly useful and a complementary means to investigate the role played by protein dynamics and plasticity in phase transitions. The authors use an appropriate set of techniques to investigate unique phase transitions within proteins induced by polyphosphates. An alternative protein system is used to corroborate their findings that the unique assemblies induced by polyphosphates when compared to DNA are not restricted to a single system. The work here is well-documented, easy to interpret, and of relevance for the condensate community.

      Weaknesses:

      The major weakness of this manuscript is that it is unclear if the information on the initial protein conformational ensemble can be determined solely from the assembly and maturation behavior and the discrimination abilities of polyphosphates. In both systems studied (CytR and FruR), polyphosphate discriminates and results in unique assemblies and maturation behaviors based on the initial protein ensemble. However, it seems the assembly and maturation behavior are not a direct result of the degree of conformational dynamics and plasticity in the initial protein. In the case of CytR, the fully-folded system forms condensates that resolubilize, while the highly disordered state immediately aggregates. Whereas, in the case of FruR, the folded state induces spontaneous aggregation, and the more dynamic, molten globular, system results in short-lived condensates. These results seem to suggest the polyphosphates' ability to discriminate between the initial protein ensemble may not be able to reveal what that initial protein ensemble is unless it is already known.

      We thank the reviewer for providing constructive comments on our work. On the final paragraph: we agree that the outcome does not provide information on the nature of the starting ensemble. As of now, our experimental results are primarily observations on questions related to maturation outcomes when protein ensembles of varying structure, compactness and stability interact with polyP. If there are differences in the native ensemble due to mutations (which at times cannot be revealed by ensemble probes), polyP appears to discern them more efficiently than DNA.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc) while the dopaminergic cells in the A13 region were largely preserved. Then, they optically stimulated the A13 region using a viral vector to deliver the channelrhodopsin (CamKII promoter). In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and the anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion - the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      We thank the reviewer for their positive comments.

      Weaknesses:

      (1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of results that doesn't pinpoint precise brain areas or subareas but instead references large brain portions like the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well-written with a smooth and straightforward style, but the connectome section presents challenges in readability and comprehension. The presentation of results, particularly the correlation matrices and correlation strength, doesn't facilitate biological understanding. It would be beneficial to explore specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions like the Pedunculopontine nucleus, Cuneiformis nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla.

      We initially considered two approaches. The first was to look at specific projections to the motor regions, focusing on the MLR. The second was to utilize a whole-brain analysis, which is presented here. Given what we know about the zona incerta, especially its integrative role, we felt that examining the full connectome was a reasonable starting point.

      The value of the whole-brain approach is that it provides a high-level overview of the afferents and efferents to the region. The changes in the brain that occur following Parkinson-like lesions, such as those in the nigrostriatal pathway, are complex and can affect neighbouring regions such as the A13. Therefore, we wished to highlight the A13, which we considered a therapeutic target, and examine changes in connectivity that could occur following acute lesions affecting the SNc. We acknowledge that this study does not provide a causal link, but it presents the fundamental background information for subsequent hypothesis-driven, focused, region-specific analysis.

      The terms provided were taken from the Allen Brain Atlas terminology and presented as abbreviations. We have added two new figures focusing on motor regions to make the information more comprehensible (new Figures 4 and 5) and rewrote the connectomics section to make it easier to understand.

      Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      This is a great point to help simplify the whole-brain results. We have presented the motor-related inputs and outputs as part of a new figure in the main paper (Figure 5) and added accompanying text in the results section. We have also updated the correlation matrices to concentrate on motor regions (Figure 4). This highlights possible therapeutic pathways. We have also enhanced our discussion of these motor-related pathways. We have retained the entire dataset and added it to our data repository for those interested.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease and a new perspective on the preservation of dopaminergic cells in A13, despite the SNc degeneration, and the plastic changes to input/output matrices. To gain inspiration for a more straightforward reanalysis and discussion of the results, I recommend the authors refer to the paper titled "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon" from the David Kleinfeld laboratory. This could guide the authors in investigating motor pathways across different brain regions.

      Thank you for the advice. As pointed out, Kleinfeld’s group presented their data in a nice, focused way. For the connectomic piece, we have added Figure 5, which provides a better representation than our previous submission.

      (2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, and total distance could reveal how A13 is influencing locomotion, particularly after 6-OHDA lesions. The laboratory of Whelan has a deep knowledge of locomotion and the neural circuits driving it, so these features may be instructive for inferring insights on the neural circuits driving movement. Along the same lines, examining features like the frequency or power of stimulation related to walking patterns may help elucidate whether A13 is engaging with the Mesencephalic Locomotor Region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      Thank you for these suggestions. We have reorganized Figure 3 to highlight the metrics by separating the 6-OHDA from the Sham experiments (3F-J, which highlights distance travelled, average speed and duration). We have also added additional text to highlight these metrics better in the text. We have relabelled Supplementary Figure S3, which presents reaction time as latency to initiate locomotion and updated the main text to address the reviewers' points.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. Using wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta, without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection. The study suggests that if the remodeling of the A13 region connectome does not promote recovery following chronic dopaminergic depletion, photostimulation of the A13 region restores locomotor functions.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients.

      Weaknesses:

      Electrical stimulation of the medial Zona Incerta, in which the A13 region is located, has been previously reported to promote locomotion (Grossman et al., 1958). Recent mouse studies have shown that if optogenetic or chemogenetic stimulation of GABAergic neurons of the Zona Incerta promotes and restores locomotor functions after 6-OHDA injection (Chen et al., 2023), stimulation of glutamatergic ZI neurons worsens motor symptoms after 6-OHDA (Lie et al., 2022).

      Thank you - we have added this reference. It is helpful as Grossman did stimulate the zona incerta in the cat and elicit locomotion, suggesting that stimulation of the area in normal mice has external validity. Grossman’s results prompted a later clinical examination of the zona incerta, but it concentrated on the zona incerta regions close to the subthalamic regions (Ossowska 2019), further caudal to the area we focused on. Chen et al. (2023) targeted the area in the lateral aspect of central/medial zona incerta, formed by dorsal and ventral zona incerta, which may account for the differing results. Our data were robust for stimulation of the medial aspect of the rostromedial zona incerta. The thigmotactic behaviour that we observed in our work that focused on CamKII neurons has not been observed with chemogenetic, optogenetic activation or with photoinhibition of GABAergic central/medial ZI (Chen et al. 2023).

      GABAergic activation of mZI to Cuneiform projections (Sharma et al. 2024) also did not produce thigmotactic behavior. We have added these points to the discussion.

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, behavioral results of this study raise questions about the neuronal population targeted in the vicinity of the A13 region. Moreover, if YFP and CHR2-YFP neurons express dopamine (TH) within the A13 region (Fig. 2), there is also a large population of transduced neurons within and outside of the A13 region that do not, thus suggesting the recruitment of other neuronal cell types that could be GABAergic or glutamatergic.

      We found that CamKII transfection of the A13 region was extremely effective in promoting locomotor activity, which was critical for our work in exploring its possible therapeutic potential. We have since quantified the cell number and found that the c-fos cell number was increased following ChR2 activation. There is evidence of TH activation, but the data suggest that other cell types contribute. C-fos alone is a blunt tool to assess specificity; it is better at showing overall photostimulus efficacy, which we have demonstrated. Moreover, there is evidence that cell types are not purely dopaminergic, with GABA co-localized (Negishi et al. 2020). We acknowledge that specific viral approaches that target the GABAergic, glutamatergic, and dopaminergic circuits would be very useful. The range of tools to target A13 dopaminergic circuits is more limited than for the SNc, for example, because the A13 region lacks DAT, and TH-IRES-Cre approaches, while helpful, are less specific than DAT-Cre mouse models. Intersectional approaches targeting multiple transmitters (glutamate & dopamine, for example) may be one solution, as we do not expect that a single transmitter-specific pathway would work as well as broad targeting of the A13 region. Our recent work suggests that GABAergic neuron activation may have more general effects on behaviour rather than control of ongoing locomotor parameters (Sharma et al. 2024). Recent work shows a positive valence effect of dopamine A13 activation on motivated food-seeking behavior, which differs from consummatory behavior observed with GABAergic modulation (Ye, Nunez, and Zhang 2023). Chemogenetic inactivation and ablation of dopaminergic A13 neurons revealed that they contribute to grip strength and prehensile movements, uncoupling food-seeking grasping behavior from motivational factors (Garau et al. 2023). Overall, this suggests differing effects of GABA compared to DA and/or glutamatergic cell types, consistent with the effects we observed when stimulating CamKII neurons. The discussion has been updated.

      Regarding the analysis of interregional connectivity of the A13 region, there is a lack of specificity (the viral approach did not specifically target the A13 region), the number of mice is low for such correlation analyses (2 sham and 3 6-OHDA mice), and there are no statistics comparing 6-OHDA versus sham (Fig. 4) or contra- versus ipsilesional sides (Fig. 5). Moreover, the data are too processed, and the color matrices (Fig. 4) are too packed in the current format to enable proper visualization of the data. The A13 afferents/efferents analysis is based on normalized relative values; absolute values should also be presented to support the claim about their upregulation or downregulation.

      Generally, papers using tissue-clearing imaging approaches have low sample sizes due to technical complexity and challenges. The technical challenges of obtaining these data were substantial in both collection and analysis. There are multiple technical complexities arising from dual injections (A13 and MFB coordinates) and targeting the area correctly. The A13 region is difficult to target as it spans only around 300 µm in the anterior-posterior axis. While clearing the brain takes weeks, and light-sheet imaging also takes time, the time necessary to analyze the tissue using whole-brain quantification is labor intensive, especially with a lack of a standardized analysis pipeline from atlas registrations, signal segmentations, and quantifications. The field is still relatively new, requiring additional time to refine pipelines.

      Correlation matrices are often used in analyzing connectivity patterns on a brain-wide scale, as they can identify observable patterns within a large amount of data. We used correlation matrices to display estimated correlation coefficients between the afferent and efferent proportions from one brain subregion to another across 251 brain regions in total in a pairwise manner (not for hypothesis testing). We provided descriptive statistics (mean and error bars) in the original Figure 5C and G. As mentioned in comments for Reviewer 1, we have now presented the data in revised Figures 4 and 5, which focus specifically on motor-related pathways to provide information on possible pathways. This has simplified the correlation matrices and highlighted the differences in the 6-OHDA efferent data especially. As suggested, raw values are shared in a supplemental file on our data repository.
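
For readers unfamiliar with this kind of descriptive display, the following is a minimal sketch of how a pairwise correlation matrix between regional proportions could be computed. It is an illustration only and not the analysis pipeline used for the figures: the region names, the values, and the function names `pearson` and `correlation_matrix` are hypothetical, and the choice of the Pearson coefficient is an assumption made for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(proportions):
    """Correlate every region with every other region across animals.

    `proportions` maps a region name to a list holding one value per animal.
    """
    regions = list(proportions)
    return {a: {b: pearson(proportions[a], proportions[b]) for b in regions}
            for a in regions}


# Hypothetical proportions of labelled input for three animals.
proportions = {
    "motor_cortex": [0.10, 0.12, 0.15],
    "striatum": [0.20, 0.24, 0.30],
    "cuneiform": [0.09, 0.07, 0.04],
}

matrix = correlation_matrix(proportions)
for region, row in matrix.items():
    print(region, {other: round(r, 2) for other, r in row.items()})
```

With only a handful of animals per group, coefficients of this kind describe co-variation in the sample and are not suited to hypothesis testing, which is why the matrices are presented descriptively.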

      In the absence of changes in the number of dopaminergic A13 neurons after 6-OHDA injection, results from this correlation analysis are difficult to interpret as they might reflect changes from various impaired brain regions independently of the A13 region.

      We acknowledge that models of Parkinson’s disease, particularly those using 6-OHDA, induce plasticity in various regions, which may subsequently affect A13 connectivity. We aim to emphasize the residual, intact A13 pathways that could serve as therapeutic targets in future investigations. This emphasis is pertinent in the context of potential clinical applications, as the overall input and output to the region fundamentally dictate the significance of the A13 region in lesioned nigrostriatal models. We agree with the reviewer that the changes certainly can be independent of A13; however, the fact that there was a significant change in the connectome post-6-OHDA injection and striatonigral degeneration is in and of itself important to document. We have added a sentence acknowledging this limitation to the discussion.

      There is no causal link between anatomical and behavioral data, which raises questions about the relevance of the anatomical data.

      This point was also addressed earlier in response to a comment from Reviewer 1. Focusing on specific motor pathways is one avenue to explore. However, given that the zona incerta acts as an integrative hub, we believed it was prudent to initially examine both afferent and efferent pathways using a brain-wide approach. For instance, without employing this methodology, the potential significance of cortical interconnectivity to the A13 region might not have been fully appreciated. As mentioned previously, we will place additional emphasis on motor-related regions in our revised paper, thereby enhancing the relevance of the anatomical data presented. With these modifications, we anticipate that our data will underscore specific motor-related targets for future exploration, employing optogenetic targeting to assess necessity and sufficiency.

      Overall, the study does not take advantage of genetic tools accessible in the mouse to address the direct or indirect behavioral and anatomical contributions of the A13 region to motor control and recovery after 6-OHDA injection.

      Our study has not specifically targeted neurons that express dopaminergic, glutamatergic, or GABAergic properties (refer to earlier comment for more detail). However, like others, we find that targeting one neuronal population often does not result in a pure transmitter phenotype. For instance, evidence suggests co-localization of dopamine neurons with a subpopulation of GABA neurons in the A13/medial zona incerta (Negishi et al. 2020). In the hypothalamus, research by Deisseroth and colleagues (Romanov et al. 2017) indicates the presence of multiple classes of dopamine cells, each containing different ratios of co-localized peptides and/or fast neurotransmitters. Consequently, we believe our work lays the foundation for the investigations suggested by the reviewer. Furthermore, if one considers this work in the context of a preclinical study to determine whether the A13 might be a target in human Parkinson's disease, the existing technology that could be utilized is deep brain stimulation (DBS) or electrical modulation, which would also affect different neuronal populations in a non-specific manner.

      While optogenetic stimulation therapy is longer term, using CamKII combined with the DJ hybrid AAV could be a translatable strategy for targeting A13 neuronal populations in non-human primates (Watakabe et al. 2015; Watanabe et al. 2020). We have added this to the discussion.

      Reviewer #3 (Public Review):

      Kim, Lognon et al. present an important finding on pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling in terms of afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They make use of an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but there are several general issues that reduce the value of the paper in its current form, and a number of specific, more minor ones. Some suggestions that may improve the paper compared to its current form also come to mind.

      Thank you for the suggestions and careful consideration of our work - it is appreciated.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      As mentioned in comments for Reviewer 1, we have performed additional analysis and present this in Figure 5. We have also revised Figure 4, focusing on motor regions. Our work will provide a roadmap for future studies to disentangle divergent or convergent A13 pathways that are involved in different or all PD-related motor symptoms. Because we could not measure behavioural change in the same animals studied with the anatomic study (essentially because the optrode would have significantly disrupted the connectome we are measuring), we cannot directly compare behaviour to structure.

      The authors provide extensive interrogation of large-scale changes in the organization of the A13 region afferent and efferent distributions. It remains unclear how many animals were included to produce Fig 4 and 5. Fig S5 suggests that only 3 animals were used, is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for which experiment. Were the same animals used for several experiments?

      The behavioral set and the anatomical set were necessarily distinct. In the anatomical experiments, we employed both anterograde and retrograde viral approaches to target the afferent and efferent A13 populations with fluorescent proteins. For the behavioral approach, a single ChR2 opsin was utilized to photostimulate the A13 region; hence combining the two populations was not feasible. We were also concerned that the optrode itself would interfere with connectomics. A lower number of animals were used for the whole-brain work due to technical limitations described earlier. We have now provided additional information regarding numbers in all figures and the text. Using Spearman’s correlation analysis, we found afferent and efferent proportions across animals to be consistent, with an average correlation of 0.91, which is reported in Figure S6.
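
As a hedged illustration of the consistency check described above, the sketch below computes Spearman's rank correlation between the regional proportions of each pair of animals and averages the result. It is not the code used for Figure S6: the animal labels, the proportions, and the function names are hypothetical and chosen only to show the calculation.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise_spearman(animals):
    """Average Spearman's rho over every pair of animals."""
    return mean(spearman(animals[a], animals[b])
                for a, b in combinations(animals, 2))


# Hypothetical proportions of labelled cells in five regions, one list per animal.
animals = {
    "mouse_1": [0.40, 0.25, 0.20, 0.10, 0.05],
    "mouse_2": [0.38, 0.27, 0.18, 0.12, 0.05],
    "mouse_3": [0.35, 0.30, 0.15, 0.08, 0.12],
}

average_rho = mean_pairwise_spearman(animals)
print(round(average_rho, 2))
```

A rank-based coefficient is a natural choice here because regional proportions are strongly skewed, with a few regions contributing most of the labelling.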

      While the authors provide evidence that photoactivation of the A13 is sufficient in driving locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find that there seems to be a difference between Sham vs 6-OHDA concerning the effects of photoactivation of the A13. Because of these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      Optogenetic activation of A13 may represent a gain of function in both healthy and 6-OHDA mice, highlighting a parallel descending motor pathway that remains intact. 6-OHDA lesions have multiple effects on motor and cognitive function. This makes a single pathway unlikely to rescue all deficits observed in 6-OHDA models. The lack of locomotion observed in 6-OHDA models can be reversed by A13 region photostimulation. Therefore, this is a reversal of a loss of function, in this case. However, the increase in turning represents a gain of function. We have highlighted this as suggested in the discussion.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review from Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell-specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al, Science 2021)? Please comment.

      Thank you for the valuable comments. They have been incorporated into the discussion.

      Our study highlights a parallel motor pathway provided by the A13 region that remains intact in 6-OHDA mice and can be sufficiently driven to rescue the hypolocomotor pathology observed in the OFT and overcome bradykinesia and akinesia. The photoactivation of ipsilesional A13 also has an overall additive effect on ipsiversive circling, representing a gain of function on the intact side that contributes to the magnitude of overall motor asymmetry against the lesioned side. The effects of DBS are rather complex, ranging from micro- and meso- to macro-scales and involving activation, inhibition, informational lesioning, and network interactions. This could contribute to the mixed clinical effects observed with ZI-DBS, in addition to differences in targeting and DBS programming among the studies (see review: Ossowska 2019). Also, DBS studies targeting the ZI have never targeted the rostromedial ZI, which extends towards the hypothalamus and contains the A13. Furthermore, DBS and electrical stimulation of neural tissue, in general, are always limited by current spread and lower thresholds of activation of axons (e.g., axons of passage), both of which can reduce the specificity of the true therapeutic target. Optogenetic studies have provided mechanistic insights that could be leveraged in overcoming some of the limitations in targeting with conventional DBS approaches. Spix et al. (2021) provided an interesting approach highlighting these advancements. They devised burst stimulation to facilitate population-specific neuromodulation within the external globus pallidus. Moreover, they found a complementary role for optogenetics in exploring the pathway-specific activation of neurons activated by DBS. To ascertain whether A13 DBS may be a viable therapy for PD gait, it will be necessary to perform many more preclinical experiments, and tuning of DBS parameters could be facilitated by optogenetic stimulation in these murine models. We have added this to the discussion.

      In a recent study, Jeon et al (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence on the topographically graded organization of STN afferents and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ ZI in total?

      The ZI can be subdivided into four subregions in the antero-posterior axis: rostral (ZIr), dorsal (ZId), ventral (ZIv), and caudal (ZIc) regions. The dorsal and ventral ZI are also referred to together as the central/medial/intermediate ZI. There are topographical gradients in different cell types and connectivity across these subregions (see reviews: Mitrofanis 2005; Monosov et al. 2022; Ossowska 2019). Recent work by Yang and colleagues (2022) demonstrated a topographical organization among the inputs and outputs of GABAergic (VGAT) populations across four ZI subregions. Given that the A13 region encompasses a smaller portion (the medial aspect) of both rostral and medial/central ZI (three of four ZI subregions) and coexpresses VGAT, the A13 region likely falls under the rostral and intermediate medial ZI dataset found in Yang et al. (2022). With our data, we would not be able to capture the breadth of topographical organization shown in Yang et al. (2022).

      In conclusion, this is an interesting study that can be improved by taking into consideration the points mentioned above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: it can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in the healthy condition and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells /TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      We have completed this analysis. The data are presented in Figure 2F, where we show increased c-fos intensity with photoactivation. We observed an increase in the number of cells activated in the A13 region. However, we did not definitively see increases in TH+ cells, suggesting that a heterogeneous set of neurons, possibly including glutamatergic neurons, is responsible for the effects.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      We have added text and a new graph.

      (2) Refer to Figure 3. In the main text (page 5) when describing the animal with 6-OHDA the wrong panels are indicated. It is indicated in Figure 2A-E but it should be replaced with 3A-E.

      Please do that.

      Done, and we have updated the figure to improve readability, by separating the 6-OHDA findings from sham in all graphs.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Page 1: Inhibitory or lesion studies will be necessary to support the claim that the global remodeling of afferent and efferent projections of the A13 region highlights the Zona Incerta's role as a crucial hub for the rapid selection of motor function.

      Overall, there is quite a bit of evidence that the zona incerta is a hub for afferent/efferents.

      Mitrofanis (2005) and, more recently, Wang et al. (2020) summarize some of the evidence. Yang (2022) illustrates that the zona incerta shows multiple inputs to GABAergic neurons and outputs to diverse regions. Recent work suggests that the zona incerta contributes to various motor functions such as hunting, exploratory locomotion, and integrating multiple modalities (Zhao et al. 2019; Wang et al. 2019; Monosov et al. 2022; Chometton et al. 2017). The introduction has been updated.

      Introduction

      Page 2, paragraph 2: "However, little attention has been placed on the medial zona incerta (mZI), particularly the A13, the only dopamine-containing region of the rostral ZI" Is the A13 region located in the rostral or medial ZI or both?

      It should have been written “rostromedial” ZI. The A13 is located in the medial aspect of rostromedial ZI. Introduction has been updated.

      Page 2, para 3: Li et al (2021) used a mini-endoscope to record the GCaMP6 signal. Masini and Kiehn, 2022 transiently blocked the dopaminergic transmission; they never used 6-OHDA.

      Please correct through the text.

      Corrected.

      Page 2, para 4: the A13 connectome encompasses the cerebral cortex,... MLR. The MLR is a functional region, correct this for the CNF and PPN.

      Corrected.

      Page 3, the last paragraph of the introduction could be clarified by presenting the behavioral data first, followed by the anatomy.

      This has been corrected.

      Figure 1 is nice and clear, and well summarizes the experimental design.

      Thank you.

      Figure 2 shows an example of the extent of the ChR2-YFP expression and the position of an optical fiber tip above the dopaminergic A13 region from a mouse. Without any quantification, these images could be included in Figure 1. Despite a very small volume (36.8nL) of AAV, the extent of ChR2-YFP expression is quite large and includes dopaminergic and unidentified neurons within the A13 region but also a large population of unidentified neurons outside of it, thus raising questions about the volume and the types of neurons recruited.

      This is an important consideration. The issue of viral spread is complex and depends on factors including tissue type, serotype, and promoter of the virus. Li et al. (2021), for example, used different virus serotypes and promoters, injecting 150nL, whereas we used AAV-DJ, injecting 36.8nL. AAV-DJ is a hybrid viral type consisting of multiple serotypes. It has a high transduction efficiency, which leads to greater gene delivery than single-serotype AAV viral constructs (Mao et al. 2016). A secondary consideration regarding translation was that AAV-DJ could effectively transduce non-human primate neurons (Watanabe et al. 2020). We have addressed the issue of neurons recruited earlier, provided c-Fos quantification, and provided a new supplementary figure showing viral spread (Figure S1).

      Anatomical reconstruction of the extent of the ChR2-YFP expression and the location of the tip of the optical fiber will be necessary to confirm that ChR2-YFP expression was restricted to the A13 region.

      We will provide additional information regarding viral spread, ferrule tip placement, and c-fos cell counts. This has been done in Figure 2 and we also present a new Figure S1 where we have quantified the viral spread.

      Page 5, 1st para: Double-check the references, as not all of them are 6-OHDA injections in the MFB.

      Corrected. Removed Kiehn reference.

      Page 5, 1st para, 4th line: Replace ferrule with optical canula or fiber.

      Done

      Page 5, 1st para, 9th line: Replace Figure 2 with Figure 3.

      Done

      Page 5, 2nd para: About the refractory decrease in traveled distance by sham-ChR2 mice: is this significant?

      It was not significant (Figure S1C, 1-way RM ANOVA: F(5,25) = 0.486, P = 0.783). This has been updated in the text.

      Figure 3 showing behavioral assessments is nice, but the stats are not always clear. In Fig 3A, are each of the off and on boxes 1 minute long? The figure legend states the test lasts 1 min, but isn't it 4 minutes? In Figure 3B-E and 3J-M, what are the differences? Do the stats identify a significant difference only during the stimulation phase? Fig. 3F-I are nice and could have been presented as primary examples prior to data analysis in Fig. 3B-E. Group labels above the graph would help.

      Yes, the off-on boxes are 1 minute long. The error is corrected in the legend. Great suggestion for F-I - they have been moved ahead of the summary figures. We have also updated the figure (new Fig. 3F-I, J, L, M) to make the differences between 6-OHDA and sham graphs easier to visualize. The stats do indicate a significant difference during the stimulation phase. We have added group labels and reorganized the figure, and it is much easier to read now.

      Fig. 3L-M, what do PreSur, Post, and Ferrule mean? I assume that Ferrule refers to mice tested with the optical fiber without stimulation, whereas Stim. refers to the stimulation. It would be helpful to standardize the format of stats in Fig. 3B-E and 3J-M. What are time points a, b, and c referring to?

      We have renamed the figure labels to be more intuitive. We have standardized the presentation of statistics in the figure and eliminated the a, b, c nomenclature. We have also updated the caption to provide descriptions of the tests in Fig. 3L-M.

      Figure S2A: the higher variability in 6-OHDA-YFP mice in comparison to 6-OHDA-ChR2 mice prior to stimulation suggests that 6-OHDA-YFP mice were less impaired. Why use boxplots only for these data? Would a pairwise comparison be more appropriate?

      We have removed these plots from Figure S2. We now present the Baseline to Pre values across the experimental timespan to illustrate the fact that distance travelled returned to baseline values for all trials conducted.

      Fig. S2B: add the statistical marker.

      We have removed this from Figure S2.

      Page 7, para 1, line 8: to add "in comparison to 6-OHDA-YFP and YFP mice" to during photostimulation... (Figure 3E).

      Done

      Page 7, para 3, line 5: about larger improvement, replace "sham ChR2" with "6-OHDA."

      Done

      Page 8, para 1, line 4: Perier et al., 2000 reported that 6-OHDA injection increased the firing frequency of the ZI over a month.

      Added the timeframe to this sentence.

      Page 8, para 2, line 1: Since the results were expected, add some references.

      Done.

      Page 8, para 3, line 4. Double-check the reference.

      Corrected.

      Page 8: About large-scale changes in the A13 region, the relevance of correlation matrices is difficult to grasp. Analysis of local connectivity would have been more informative in the context of GABAergic and glutamatergic neurons of the ZI in the vicinity of the A13 region.

      We have updated the figures for connectivity throughout the manuscript. Overall, there are new Figures 4 and 5 in the main text. We also provide a revised Supplementary Figure 8. Unfortunately, we could not do that experiment regarding local connectivity. In light of our new work (Sharma et al. 2024), it is clear that this will be critical going forward.

      Page 8, para 3, line: given Fig. 2, there is concern about the claim that only the A13 region was targeted. The time of the analysis after 6-OHDA should be mentioned. Some sections of the paragraph could be moved to methods.

      We have provided more information about the viral spread in the text and Supplementary Figure 1. The functional and anatomical experiments are separate, which we realize caused confusion. We have mentioned analysis time after 6-OHDA and inserted this into the text.

      Fig. 4: The color code helps the reader visualize distribution differences. However, statistical analyses comparing 6-OHDA versus sham should be included. Quantification per region would greatly help readers visualize the data and support the conclusion. The relationship between the type of correlation (positive or negative) and absolute change (increase or decrease) is unknown in the current format, which limits the interpretation of the data. Moreover, examples of raw images of axons and cells should be presented for several brain regions. The experimental design with a timeline, as in Fig. 1, would be helpful. The legend for Fig. 4 is a bit long. Some sections are very descriptive, whereas others are more interpretive.

      We have provided a new Figure 5 where we present quantification per region, and the correlation matrices have been updated in Figure 4. We have also focused on motor regions as mentioned earlier. We also provide examples of raw regions in Supplementary Figure 8. Raw values are shared on our data repository.

      Page 10, para 1, line 1: add "afferent" to "changes in -afferent and- projection patterns."

      Done

      Page 10, para 1, line 9: remove the 2nd "compared to sham" in the sentence.

      Done

      Page 10, para 1, line 10: remove "coordinated" in "several regions showed a coordinated reduction in afferent density." We cannot say anything about the timing of events, as there is only info at 1 month.

      Done

      Page 10, para 2: the section should be written in the past tense.

      Done

      Page 13, para 2, the last sentence is overstated. Please remove "cells" and refer to the A13 region instead.

      Done

      About differential remodelling of the A13 region connectome: Figure 5C and 5G: The proportion of total afferents ipsi- and contralateral to 6-OHDA injection argues that the A13 region primarily receives inputs from the cortical plate and the striatum. Unfortunately, there are no statistics.

      Due to the small sample size, we provided descriptive statistics (mean and error bars) in Figure 5A. As mentioned in comments for Reviewers 1 and 2, we have revised Figure 5 to present data focusing on motor-related pathways to provide clarity. In addition, absolute values are shared on our data repository.

      Figure 5 D and 5H: Changes in the proportion of total afferents/projections are relatively modest (less than 10% of the whole population for the highest changes). There is no standard deviation for these data and no statistics. Do they reflect real changes or variability from the injection site?

      The changes are relatively modest (less than 10%) since a small brain region usually provides a small proportion of total input (McElvain et al. 2021; Yang et al. 2022). The changes in the proportions reflect real differences between average proportions observed in sham and 6-OHDA mice. The variability in the total labelling of neurons and fibers was minimized by normalizing individual regional counts against total counts found in each animal. This figure has been updated as reviewers requested.
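
To make the normalization step concrete, here is a minimal sketch, under stated assumptions, of converting regional counts into per-animal proportions and then comparing group means. The counts, the region names, and the functions `to_proportions`, `mean_proportions`, and `proportion_change` are hypothetical and are not taken from our dataset or pipeline.

```python
def to_proportions(counts):
    """Divide each regional count by the animal's total count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("animal has no labelled cells or fibers")
    return {region: n / total for region, n in counts.items()}


def mean_proportions(group):
    """Average the per-animal proportions within one experimental group."""
    per_animal = [to_proportions(counts) for counts in group]
    regions = per_animal[0].keys()
    return {r: sum(p[r] for p in per_animal) / len(per_animal) for r in regions}


def proportion_change(sham, lesion):
    """Difference in mean proportion (lesion minus sham) for each region."""
    sham_mean = mean_proportions(sham)
    lesion_mean = mean_proportions(lesion)
    return {r: lesion_mean[r] - sham_mean[r] for r in sham_mean}


# Hypothetical counts: totals differ between animals (injection variability),
# but the proportions within each group are the same.
sham = [
    {"cortex": 300, "striatum": 100, "other": 600},
    {"cortex": 150, "striatum": 50, "other": 300},
]
lesion = [
    {"cortex": 200, "striatum": 40, "other": 560},
    {"cortex": 125, "striatum": 25, "other": 350},
]

change = proportion_change(sham, lesion)
print({region: round(delta, 3) for region, delta in change.items()})
```

Because each animal's proportions sum to one, a twofold difference in total labelling between animals does not by itself alter the values being compared, which is the reason for normalizing before averaging.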

      Fig 5F and H: The example in F shows a huge decrease in the striatum, but H indicates only a 2% change, which makes the example not very representative. Absolute values would be helpful.

      While a 2% change may seem small, it represents a relatively large change in the A13 efferent connectome. To provide further clarity, we have provided absolute values as suggested in our new supplemental table.

      Figure 6 is inaccurate and unnecessary.

      Figure 6 has been removed.

      Discussion

      Although interesting, the discussion is too long.

      The discussion has been reduced by about three quarters of a page.

      Methods

      Page 17, para 1: include the stereotaxic coordinates of the optical cannula above the A13 region.

      Added.

      References

      Chen, Fenghua, Junliang Qian, Zhongkai Cao, Ang Li, Juntao Cui, Limin Shi, and Junxia Xie. 2023. “Chemogenetic and Optogenetic Stimulation of Zona Incerta GABAergic Neurons Ameliorates Motor Impairment in Parkinson’s Disease.” iScience 26 (7). https://doi.org/10.1016/j.isci.2023.107149.

      Chometton, S., K. Charrière, L. Bayer, C. Houdayer, G. Franchi, F. Poncet, D. Fellmann, and P. Y. Risold. 2017. “The Rostromedial Zona Incerta Is Involved in Attentional Processes While Adjacent LHA Responds to Arousal: C-Fos and Anatomical Evidence.” Brain Structure & Function 222 (6): 2507–25.

      Garau, Celia, Jessica Hayes, Giulia Chiacchierini, James E. McCutcheon, and John Apergis-Schoute. 2023. “Involvement of A13 Dopaminergic Neurons in Prehensile Movements but Not Reward in the Rat.” Current Biology: CB, October. https://doi.org/10.1016/j.cub.2023.09.044.

      Li, Zhuoliang, Giorgio Rizzi, and Kelly R. Tan. 2021. “Zona Incerta Subpopulations Differentially Encode and Modulate Anxiety.” Science Advances 7 (37): eabf6709.

      Mao, Yingying, Xuejun Wang, Renhe Yan, Wei Hu, Andrew Li, Shengqi Wang, and Hongwei Li. 2016. “Single Point Mutation in Adeno-Associated Viral Vectors -DJ Capsid Leads to Improvement for Gene Delivery in Vivo.” BMC Biotechnology 16 (January):1.

      McElvain, Lauren E., Yuncong Chen, Jeffrey D. Moore, G. Stefano Brigidi, Brenda L. Bloodgood, Byung Kook Lim, Rui M. Costa, and David Kleinfeld. 2021. “Specific Populations of Basal Ganglia Output Neurons Target Distinct Brain Stem Areas While Collateralizing throughout the Diencephalon.” Neuron 109 (10): 1721–38.e4.

      Mitrofanis, J. 2005. “Some Certainty for the ‘Zone of Uncertainty’? Exploring the Function of the Zona Incerta.” Neuroscience 130 (1): 1–15.

      Monosov, Ilya E., Takaya Ogasawara, Suzanne N. Haber, J. Alexander Heimel, and Mehran Ahmadlou. 2022. “The Zona Incerta in Control of Novelty Seeking and Investigation across Species.” Current Opinion in Neurobiology 77 (December):102650.

      Negishi, Kenichiro, Mikayla A. Payant, Kayla S. Schumacker, Gabor Wittmann, Rebecca M. Butler, Ronald M. Lechan, Harry W. M. Steinbusch, Arshad M. Khan, and Melissa J. Chee. 2020. “Distributions of Hypothalamic Neuron Populations Coexpressing Tyrosine Hydroxylase and the Vesicular GABA Transporter in the Mouse.” The Journal of Comparative Neurology 528 (11): 1833–55.

      Ossowska, Krystyna. 2019. “Zona Incerta as a Therapeutic Target in Parkinson’s Disease.” Journal of Neurology. https://doi.org/10.1007/s00415-019-09486-8.

      Romanov, Roman A., Amit Zeisel, Joanne Bakker, Fatima Girach, Arash Hellysaz, Raju Tomer, Alán Alpár, et al. 2017. “Molecular Interrogation of Hypothalamic Organization Reveals Distinct Dopamine Neuronal Subtypes.” Nature Neuroscience 20 (2): 176–88.

      Sharma, Sandeep, Cecilia A. Badenhorst, Donovan M. Ashby, Stephanie A. Di Vito, Michelle A. Tran, Zahra Ghavasieh, Gurleen K. Grewal, Cole R. Belway, Alexander McGirr, and Patrick J. Whelan. 2024. “Inhibitory Medial Zona Incerta Pathway Drives Exploratory Behavior by Inhibiting Glutamatergic Cuneiform Neurons.” Nature Communications 15 (1): 1160.

      Spix, Teresa A., Shruti Nanivadekar, Noelle Toong, Irene M. Kaplow, Brian R. Isett, Yazel Goksen, Andreas R. Pfenning, and Aryn H. Gittis. 2021. “Population-Specific Neuromodulation Prolongs Therapeutic Benefits of Deep Brain Stimulation.” Science 374 (6564): 201–6.

      Wang, Xiyue, Xiaolin Chou, Bo Peng, Li Shen, Junxiang J. Huang, Li I. Zhang, and Huizhong W. Tao. 2019. “A Cross-Modality Enhancement of Defensive Flight via Parvalbumin Neurons in Zona Incerta.” eLife 8 (April). https://doi.org/10.7554/eLife.42728.

      Wang, Xiyue, Xiao-Lin Chou, Li I. Zhang, and Huizhong Whit Tao. 2020. “Zona Incerta: An Integrative Node for Global Behavioral Modulation.” Trends in Neurosciences 43 (2): 82–87.

      Watakabe, Akiya, Masanari Ohtsuka, Masaharu Kinoshita, Masafumi Takaji, Kaoru Isa, Hiroaki Mizukami, Keiya Ozawa, Tadashi Isa, and Tetsuo Yamamori. 2015. “Comparative Analyses of Adeno-Associated Viral Vector Serotypes 1, 2, 5, 8 and 9 in Marmoset, Mouse and Macaque Cerebral Cortex.” Neuroscience Research 93 (April):144–57.

      Watanabe, Hidenori, Hiromi Sano, Satomi Chiken, Kenta Kobayashi, Yuko Fukata, Masaki Fukata, Hajime Mushiake, and Atsushi Nambu. 2020. “Forelimb Movements Evoked by Optogenetic Stimulation of the Macaque Motor Cortex.” Nature Communications 11 (1): 3253.

      Yang, Yang, Tao Jiang, Xueyan Jia, Jing Yuan, Xiangning Li, and Hui Gong. 2022. “Whole-Brain Connectome of GABAergic Neurons in the Mouse Zona Incerta.” Neuroscience Bulletin 38 (11): 1315–29.

      Ye, Qiying, Jeremiah Nunez, and Xiaobing Zhang. 2023. “Zona Incerta Dopamine Neurons Encode Motivational Vigor in Food Seeking.” bioRxiv: The Preprint Server for Biology, June. https://doi.org/10.1101/2023.06.29.547060.

      Zhao, Zheng-Dong, Zongming Chen, Xinkuan Xiang, Mengna Hu, Hengchang Xie, Xiaoning Jia, Fang Cai, et al. 2019. “Zona Incerta GABAergic Neurons Integrate Prey-Related Sensory Signals and Induce an Appetitive Drive to Promote Hunting.” Nature Neuroscience 22 (6): 921–32.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contrast with findings of some previous studies (positive relationships between blood glucose and lifespan, or absent relationships between blood glucose and clutch mass or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life history traits in birds that is based on data collected in a single study, with blood glucose and glycation measured using unified analytical methods (except for blood glucose data for 13 species collected from a database).

      Strengths

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel and important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, except for 13 species, for which data were collected from a database of veterinary and animal care records of zoo animals (ZIMS). This novel data set itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a ZIMS database (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represents the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are substantiated. Overall, this is an important study representing a substantial contribution to the emerging field of evolutionary physiology focused on ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses

      Unfortunately, the authors did not record handling time (i.e., time elapsed between capture and blood sampling), which may be an important source of noise because handling-stress-induced increase in blood glucose has previously been reported. Moreover, the authors themselves demonstrate that handling stress increases variance in blood glucose levels. Both effects (elevated mean and variance) are evident in Figure ESM1.2. However, this likely makes their significant findings regarding glucose levels and their associations with lifespan or glycation rate more conservative, as highlighted by the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I understand that your main objective regarding glycation rate and lifespan was to analyse the species resistance to glycation with respect to lifespan, while factoring out the species-specific variation in blood glucose level. However, I still believe that the absolute glycation level (i.e., not controlled for blood glucose level) may also be important for the evolution of lifespan. Given that blood glucose is positively related to both glycation and lifespan (although with a plateau in the latter case), lifespan could possibly be positively correlated with absolute glycation levels. If significant, that would be an interesting and counterintuitive finding, which would call for an explanation, thereby potentially stimulating further research. If not significant, it would show that long-lived species do not have higher glycation levels, despite having higher blood glucose levels, thereby strengthening your argument about higher resistance of long-lived species to glycation. So, in my opinion, the inclusion of an additional model of glycation level on life-history traits, without controlling for blood glucose, is worth considering.

      We now include this model as supplementary material and refer to it in several parts of the text, where we also address some of the issues discussed here.

      Lines 230-231: Please, provide a citation for these GVIF thresholds

      We include it now.

      Figure 3: I think that showing both glucose and glycation rate on the linear scale, rather than log scale, would better illustrate your conclusion - the slowing rise of glycation rate with increasing glucose levels.

      That is a good point, although it may also be confusing for readers to see a graph that represents the data differently from the models. Maybe showing both graphs (as 3.A and 3.B) can solve it?

      Figure 4. I recommend stating in the caption that the whiskers do not represent interquartile ranges (a standard option in box plots) but credible intervals as mentioned in the current version of the public author response.

      Sorry about that, it was missed. It is now included. Note that the boxes still show the interquartile ranges of the posterior distributions, while the whiskers show the credible intervals.
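      The distinction between boxes and whiskers in this reply can be made concrete with a small calculation on posterior draws. This is a minimal sketch under stated assumptions: the 95% equal-tailed interval is an assumption (the response does not state the interval width), the helper names are hypothetical, and the samples are illustrative.

```python
def quantile(sorted_samples, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_samples) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_samples) - 1)
    fraction = position - lower
    return sorted_samples[lower] + fraction * (sorted_samples[upper] - sorted_samples[lower])


def box_and_whiskers(posterior_samples, interval=0.95):
    """Summarize posterior draws: box = interquartile range, whiskers = credible interval."""
    ordered = sorted(posterior_samples)
    tail = (1 - interval) / 2  # probability mass left outside each whisker
    return {
        "box": (quantile(ordered, 0.25), quantile(ordered, 0.75)),
        "whiskers": (quantile(ordered, tail), quantile(ordered, 1 - tail)),
    }


# Hypothetical posterior draws: the integers 0..100 stand in for real samples.
summary = box_and_whiskers(list(range(101)))
print(summary)
```

      In a conventional box plot the whiskers are derived from the interquartile range, so a caption is needed whenever they instead mark a credible interval as here.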

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthened the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well-written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      This is really a manuscript about CAPE, not caffeic acid, and the title should reflect that. Also, a few details are missing from the description of the experiments. The authors should carefully revise the manuscript to ascertain that all details that could affect the interpretation of their results are presented clearly. Just as an example, the authors state in the results section that TcdB was incubated with compounds and then added to cells. Was there a wash step in between? Could compound carryover affect how the cells reacted independently from TcdB? This is just an example of how the authors should be careful with descriptions of their experimental procedures. Lastly, authors should be careful when drawing conclusions from the analysis of microbiota composition data. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Therefore, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript, including the title and the Results and Methods sections.

      Reviewer #2 (Public review):

      Summary:

      This work is towards the development of nonantibiotic treatment for C. difficile. The authors screened a chemical library for activity against the C. difficile toxin TcdB, and found a group of compounds with antitoxin activity. Caffeic acid derivatives were highly represented within this group of antitoxin compounds, and the remaining portion of this work involves defining the mechanism of action of caffeic acid phenethyl ester (CAPE) and testing CAPE in mouse C. difficile infection model. The authors conclude CAPE attenuates C. difficile disease by limiting toxin activity and increasing microbial diversity during C. difficile infection.

      Strengths/ Weaknesses:

      The strategy employed by the authors is sound although not necessarily novel. A compound that can target multiple steps in the pathogenesis of C. difficile would be an exciting finding. However, the data presented does not convincingly demonstrate that CAPE attenuates C. difficile disease and the mechanism of action of CAPE is not convincingly defined. The following points highlight the rationale for my evaluation.

      (1) The toxin exposure in tissue culture seems brief (Figure 1). Do longer incubation times between the toxin and cells still show CAPE prevents toxin activity?

      Thanks for your comments. The cytotoxicity assay was employed to directly assess the protective capacity of CAPE against cell death induced by TcdB. Our observations at 1 and 12 h post-TcdB exposure revealed that CAPE effectively mitigated the toxic effects of TcdB at both time points, demonstrating its potent protective role. Please see Figure S1.

      (2) The conclusion that CAPE has antitoxin activity during infection would be strengthened if the mouse was pretreated with CAPE before toxin injections (Figure 1D).

      Thanks for your constructive comments. According to your suggestion, we administered TcdB 2 h after pretreatment with CAPE. The outcomes demonstrated that CAPE pretreatment significantly enhanced the survival rate of the intoxicated mice, confirming that CAPE retains its antitoxin efficacy during the infection process. Please see Figure S2.

      (3) CAPE does not bind to TcdB with high affinity as shown by SPR (Figure 4). A higher affinity may be necessary to inhibit TcdB during infection. The GTD binds with millimolar affinity and does not show saturable binding. Is the GTD the binding site for CAPE? Auto processing is also affected by CAPE indicating CAPE is binding non-GTD sites on TcdB.

      Thanks for your comments. Our findings indicate that the GTD domain is a critical binding site for CAPE. CAPE exerts its protective effects at multiple stages of TcdB-mediated cell death, including inhibiting TcdB's self-cleavage and blocking the activity of GTD, thereby preventing the glycosylation modification of Rac1 by TcdB.

      (4) In the infection model, CAPE does not statistically significantly attenuate weight loss during C. difficile infection (Figure 6). I recognize that weight loss is an indirect measure of C. difficile disease but histopathology also does not show substantial disease alleviation (see below).

      Thanks for your comments. Our comparative analysis revealed a notable difference in the body weight of mice on the third day post-infection (Figure 6B). Similarly, the dry/wet stool ratio exhibited a comparable pattern, suggesting that treatment with CAPE ameliorated C. difficile-induced diarrhea to a significant degree (Figure 6C).

      (5) In the infection model (Figure 6), the histopathology analysis shows substantial improvement in edema but limited improvement in cellular infiltration and epithelial damage. Histopathology is probably the most critical parameter in this model and a compound with disease-modifying effects should provide substantial improvements.

      Thanks for your comments. Edema, inflammatory factor infiltration, and epithelial damage served as key evaluation metrics. Statistical analysis revealed that the pathological scores of mice treated with CAPE were markedly reduced compared to those in the model group (Figure 6F).

      (6) The reduction in C. difficile colonization is interesting. It is unclear if this is due to antitoxin activity and/or due to CAPE modifying the gut microbiota and metabolites (Figure 6). To interpret these data, a control is needed that has CAPE treatment without C. difficile infection or infection with an atoxicogenic strain.

      The observed reduction in C. difficile fecal colonization following drug treatment may be attributed to CAPE's antitoxin properties or its capacity to modify the intestinal microbiota and metabolites. These two mechanisms likely work in tandem to combat CDI. CDI is primarily triggered by the toxins A (TcdA) and B (TcdB) secreted by the bacterium. Certain therapies, including monoclonal antibodies like bezlotoxumab, target CDI by neutralizing these toxins, thereby mitigating gut damage and subsequent C. difficile colonization (1,2). The establishment of C. difficile in the gut is intricately linked to the equilibrium of the intestinal microbiota. Although antibiotic treatments can inhibit C. difficile growth, they may also disrupt the microbial balance, potentially facilitating the overgrowth of other pathogens. Consequently, interventions such as fecal microbiota transplantation (FMT) are designed to reestablish gut flora balance and consequently decrease C. difficile colonization (3,4). Moreover, the administration of probiotics and prebiotics is considered to reduce C. difficile colonization by modifying the gut environment (5,6).

      (7) Similar to the CAPE data, the melatonin data does not display potent antitoxin activity and the mouse model experiment shows marginal improvement in the histopathological analysis (Figure 9). Using 100 µg/ml of melatonin (~ 400 micromolar) to inactivate TcdB in cell culture seems high. Can that level be achieved in the gut?

      The uptake and dissemination of melatonin within the body varies with the dose administered. For instance, in rats, the bioavailability of melatonin following administration was found to be 53.5%, whereas in dogs, bioavailability was nearly complete (100%) at a dose of 10 mg/kg, yet it decreased to 16.9% at a lower dose of 1 mg/kg(7). This data suggests that the absorption of melatonin differs across various animal species and is influenced by the dose administered. Moreover, it underscores the higher potential bioavailability of melatonin, implying that a dose of 200 mg/kg should be adequate to achieve the desired concentration in the body post-administration.
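      The reviewer's conversion of 100 µg/mL of melatonin to roughly 400 µM can be checked with a short calculation. This is a minimal sketch, assuming a molecular weight of 232.28 g/mol for melatonin (C13H16N2O2), a value that is not stated in the review or the response; the function name is hypothetical.

```python
MELATONIN_MW_G_PER_MOL = 232.28  # assumed molecular weight of melatonin (C13H16N2O2)


def ug_per_ml_to_micromolar(conc_ug_per_ml, mw_g_per_mol):
    """Convert a mass concentration in ug/mL to a molar concentration in uM."""
    # 1 ug/mL equals 1 mg/L, i.e. 1e-3 g/L; dividing by g/mol gives mol/L.
    mol_per_l = conc_ug_per_ml * 1e-3 / mw_g_per_mol
    return mol_per_l * 1e6  # mol/L -> umol/L


melatonin_um = ug_per_ml_to_micromolar(100, MELATONIN_MW_G_PER_MOL)
print(f"{melatonin_um:.0f} uM")
```

      Under that assumption the result is about 430 µM, consistent with the reviewer's rounded figure.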

      (8) The following parameters should be considered and would aid in the interpretation of this work. Does CAPE directly affect the growth of C. difficile? Does CAPE affect the secretion of TcdB from C. difficile? Does CAPE alter the sporulation and germination of C. difficile?

      We incorporated CAPE into the MIC assay for detecting C. difficile, as well as for assessing the sporulation capacity of C. difficile and evaluating the secretion level of TcdB. The findings revealed that CAPE markedly repressed tcdB transcription at a concentration of 16 μg/mL and effectively suppressed the growth and sporulation of C. difficile BAA-1870 at a concentration of 32 μg/mL. Please see Figure S3.

      References:

      (1) Skinner AM, et al. Efficacy of bezlotoxumab to prevent recurrent Clostridioides difficile infection (CDI) in patients with multiple prior recurrent CDI. Anaerobe. 2023 Dec; 84: 102788.

      (2) Wilcox MH, et al. Bezlotoxumab for Prevention of Recurrent Clostridium difficile Infection. N Engl J Med. 2017 Jan 26;376(4):305-317.

      (3) Khoruts A, Sadowsky MJ. Understanding the mechanisms of faecal microbiota transplantation. Nat Rev Gastroenterol Hepatol. 2016 Sep;13(9):508-16.

      (4) Khoruts A, Staley C, Sadowsky MJ. Faecal microbiota transplantation for Clostridioides difficile: mechanisms and pharmacology. Nat Rev Gastroenterol Hepatol. 2021 Jan;18(1):67-80.

      (5) Mills JP, Rao K, Young VB. Probiotics for prevention of Clostridium difficile infection. Curr Opin Gastroenterol. 2018 Jan;34(1):3-10.

      (6) Lau CS, Chamberlain RS. Probiotics are effective at preventing Clostridium difficile-associated diarrhea: a systematic review and meta-analysis. Int J Gen Med. 2016 Feb 22; 9:27-37.

      (7) Yeleswaram K, et al. Pharmacokinetics and oral bioavailability of exogenous melatonin in preclinical animal models and clinical implications. J Pineal Res. 1997 Jan;22(1):45-51.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      The results are really good, and the CAPE shows a good and promising alternative for treating CDI. The methodology and results are well presented, with tables and figures that corroborate them. It is solid work and very promising.

      Weaknesses:

      Some references are too old or missing.

      Thanks for your constructive suggestion. We have included and refreshed several references to enhance the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the manuscript convincingly demonstrates that CAPE affects the TcdB toxin and reduces its toxicity in vitro, it would be beneficial to include data on the effect of CAPE on the growth of C. difficile. This would help ensure that the observed in vivo effects are not merely due to reduced bacterial growth but rather due to the specific action of CAPE on the toxin.

      Thanks for your constructive suggestion. We have augmented our findings with the impact of CAPE on the bacteria themselves, revealing that CAPE not only hampers the growth of the bacterial cells but also suppresses their capacity to produce spores. Please see Figure S3.

      (1) Line 41, line 115 - authors should clarify what they mean when mentioning Bacteroides within parentheses.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 71 - Is C. difficile really found "in the environment"?

      Thanks for your comments. C. difficile is prevalent across various natural settings, including soil and water ecosystems. A study has identified highly diverse strains of this bacterium within environmental samples(1). Moreover, the significant presence of C. difficile in soil and lawn specimens collected near Australian hospitals indicates that the organism is indeed a common inhabitant in the environment(2).

      (3) Lines 128-130 - Was there a wash step here? What could be the impact of compound carryover in this experiment?

      Thanks for your comments. Following pre-incubation of TcdB with CAPE, compounds that had not bound to TcdB were removed by centrifugation. Persistence of the compound in the culture after washing could inflate the assessment of its efficacy, particularly if it continues to engage with TcdB or the cells beyond the initial 1-hour pre-incubation window. Carryover might also give rise to misleading positive results, where the compound seems to confer protection or inhibition against TcdB-mediated cell rounding, whereas such effects are actually due to the lingering activity of the compound. Carryover could also skew the determination of the compound's minimum effective concentration, as the effective concentration interacting with the cells might be inadvertently elevated. Furthermore, if the compounds possess cytotoxic properties or impact cell viability, carryover could generate artifacts in cell morphology that are unrelated to the direct interaction between TcdB and the compounds.

      (4) Lines 133-134 - I suggest authors mention how many caffeic acid derivatives there were in the entire library so that the suggested "enrichment" of them in the group of bioactive compounds can be better judged.

      Thanks for your comments. The natural compound library contained eight caffeic acid derivatives, of which methyl caffeic acid and ferulic acid displayed no efficacy. This information has been incorporated into the manuscript.

      (5) Line 135 - I recommend the authors add the molarity of the compound solutions used.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (6) Line 247 - I think the term "CAPE mice" is confusing. Please use a full description.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (7) Line 248 - I also think the terms "model mice" and "model group" are confusing. Maybe call them "control mice"?

      Thanks for your comments. The terms "model mice" and "model group" are indeed synonymous, and we have subsequently clarified that control mice refer to those that have not been infected with C. difficile.

      (8) Line 273 - "most abundant species at the genus level" is incorrect. I think what you mean is "most abundant TAXA".

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (9) Line 278 - Please include your p-value cut-off together with the LDA score.

      Thanks for your comments. We have revised the above description to “LDA score > 3.5, p < 0.05”.

      (10) Line 292 - Details on how metabolomics was performed should be included here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (11) Line 299 - 1.5 is a fairly low cut-off. The authors should at a minimum also include the p-value cut-off used.

      Thanks for your comments. We have revised the above description to “fold change > 1.5, p < 0.05”.
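      The revised criterion combines an effect-size threshold with a significance threshold, and a feature must clear both. A minimal sketch of such a filter follows; the metabolite names and values are hypothetical, and how decreases (fold change below 1) are handled is not stated in the response, so only increases are considered here.

```python
FOLD_CHANGE_CUTOFF = 1.5  # threshold quoted in the response
P_VALUE_CUTOFF = 0.05     # threshold quoted in the response


def passes_cutoffs(fold_change, p_value):
    """Return True when a feature clears both the fold-change and p-value cut-offs."""
    return fold_change > FOLD_CHANGE_CUTOFF and p_value < P_VALUE_CUTOFF


# Hypothetical metabolite records: (name, fold change, p-value).
records = [
    ("metabolite_A", 2.1, 0.01),   # large change, significant
    ("metabolite_B", 1.7, 0.20),   # large change, not significant
    ("metabolite_C", 1.2, 0.001),  # significant, but change too small
]
selected = [name for name, fc, p in records if passes_cutoffs(fc, p)]
print(selected)
```

      Requiring both conditions addresses the reviewer's concern that a fold change of 1.5 alone is a fairly permissive cut-off.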

      (12) Line 307 - Purine "degradation" would be better here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (13) Line 328 onward - The melatonin experiment is a weird one. Although I fully understand the rationale behind testing the effect of melatonin in the mouse model, the idea that just because melatonin levels changed in the gut it would act as a direct inhibitor of TcdB was very far-fetched, even though it ended up working. Authors should explain this in the manuscript.

      Thanks for your comments. Beyond our murine studies, we have confirmed that melatonin significantly diminishes TcdB-induced cytotoxicity at the cellular level (Figure 9A). Additionally, it has been documented that melatonin, acting as an antimicrobial adjuvant and anti-inflammatory agent, can decrease the recurrence of CDI(3). Consequently, we contend that the aforementioned statement is substantiated.

      (14) Lines 429-435 - There are seemingly contradictory pieces of information here. The authors state that adenosine is released from cells upon inflammation and that CAPE treatment caused an increase in adenosine levels. Later in this section, the authors state that adenosine prevents TcdA-mediated damage and inflammation. This should be clarified and better discussed.

      Thanks for your comments. Adenosine modulates immune responses and inflammatory cascades by interacting with its receptors, including its capacity to suppress the secretion of specific pro-inflammatory mediators. We have updated this depiction in the manuscript.

      (15) Lines 513-514 - How was this phenotype quantified?

      Thanks for your comments. Initially, we introduced TcdB at a final concentration of 0.2 ng/mL along with various concentrations of compounds into 1 mL of medium for a 1-h pre-incubation period. Subsequently, unbound compounds were removed through centrifugation, and the resulting mixture was then applied to the cells.

      (16) Figure 3 - panels are labeled incorrectly.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (17) Figure 5C - it is unclear what the different colors and labels represent.

      Thanks for your comments. In the depicted graph, blue denotes the total binding energy, red signifies the electrostatic interactions, green corresponds to the van der Waals forces, and orange indicates solvation or hydration effects. The horizontal axis represents the mutation of the amino acid residue at the respective position to alanine. As illustrated in Figure 5C, the mutations W520A and GTD exhibit the highest binding energies.

      References:

      (1) Janezic S, et al. Highly Divergent Clostridium difficile Strains Isolated from the Environment. PLoS One. 2016 Nov 23;11(11): e0167101.

      (2) Perumalsamy S, Putsathit P, Riley TV. High prevalence of Clostridium difficile in soil, mulch and lawn samples from the grounds of Western Australian hospitals. Anaerobe. 2019 Dec; 60:102065.

      (3) Sutton SS, et al. Melatonin as an Antimicrobial Adjuvant and Anti-Inflammatory for the Management of Recurrent Clostridioides difficile Infection. Antibiotics (Basel). 2022 Oct 25;11(11):1472.

      Reviewer #2 (Recommendations for the authors):

      Minor comments and questions.

      (1) Which form of TcdB is being used in these experiments?

      Thanks for your comments. The TcdB proteins used in this study are TcdB1 subtypes.

      (2) Why are THP-1 cells being used in these assays?

      Thanks for your comments. For the purposes of this study, we employed a diverse array of cell lines, including Vero, HeLa, THP-1, Caco-2, and HEK293T. Each cell line was selected to serve a specific experimental objective. The THP-1 cell line was included because a macrophage cell line was needed to make the experiments comprehensive, allowing both epithelial cells and macrophages to be tested. C. difficile is an intestinal pathogen, and immune clearance plays a vital role during infection, so THP-1 cells were used as a representative immune cell line.

      (3) Please improve the quality of the microscopy images in Figure 1.

      Thanks for your comments. We have improved the quality of the microscopy images in Figure 1.

      (4) Does the flow cytometry experiment in Figure 2B show internalization? Surface-bound toxins would provide the same histogram.

      Thanks for your comments. Figure 2B was employed to assess the internalization of TcdB, and the findings indicate that CAPE does not influence the internalization process of TcdB.

      (5) The sensorgram in Figure 4A does not look typical and should be clarified.

      Thanks for your comments. Typically, small molecules and proteins engage in a rapid binding and dissociation dynamic. However, as depicted in Figure 4A, the interaction between CAPE and TcdB demonstrates a gradual progression towards equilibrium. This behavior can be primarily explained by the swift occupation of the protein's primary binding sites by the small molecule in the initial stages. Subsequently, CAPE binds to secondary or lower affinity sites, extending the time needed to reach equilibrium. Additionally, the likelihood of CAPE binding to multiple sites on TcdB requires time for the exploration and occupation of these diverse locations before equilibrium is attained. We have incorporated an analysis of this potential scenario into the manuscript.

      Reviewer #3 (Recommendations for the authors):

      These are my suggestions for the text:

      (1) Line 29: high recurrent rates.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 32: Where is the caffeic acid identified? I think a line should be included.

      Thanks for your comments. Caffeic acid was identified from a natural compound library, and we have completed the corresponding modifications according to the suggestions.

      (3) Line 39: C. difficile is not italic.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (4) Line 41: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (5) Line 56: This number of casualties 56.000 is still happening or it was in the past?

      Thanks for your comments. The mortality rates reported in the manuscript reflect a downturn in the incidence and fatality of CDI around 2017 (1), as the infection gained broader recognition. Nonetheless, a recent study reveals that the mortality rate for CDI cases in Germany can soar to 45.7% within a year, with the overall economic burden amounting to approximately 1.6 billion euros. This underscores the ongoing significance of CDI as a global public health challenge (2).

      (6) Line 104: Where did the idea of testing caffeic acid come from? Any previous study of the authors? Any studies with the inhibition of other pathogens?

      Thanks for your comments. Initially, we conducted a screen of a compound library comprising 2,076 compounds and identified several potent inhibitors, which, upon structural analysis, were revealed to be caffeic acid derivatives. Prior to our investigation, no studies had explored the potential of CAPE in this context.

      (7) Line 115: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      Results section

      (8) Did the authors try the caffeic acid with the TcdA or binary toxin? I know this is not the purpose of the study, but TcdA toxin has a high identity structure with TcdB and generates inflammation in the gut via neutrophils. Negative strains for the major toxins and positive for the binary toxin also cause severe cases of CDI.

      Thanks for your comments. Although we acknowledge the significance of TcdA and binary toxins in CDI, we did not investigate the impact of CAPE on these toxins. Our focus was exclusively on the effect of CAPE against TcdB, as it is the primary virulence factor in C. difficile pathogenesis. Since TcdA and TcdB are highly similar in structure, we will analyze the neutralization effect of CAPE on TcdA in later studies.

      (9) Does caffeic acid have any effect on C. difficile? Or does it only gain the toxins? That would be ideal.

      Thanks for your comments. We have included additional related assays in our study. Beyond directly neutralizing TcdB, CAPE also demonstrates the capacity to inhibit the growth and spore formation of C. difficile.

      (10) Line 230: C. difficile BAA-1870 is a clinical strain? There are no details about it in the paper.

      Thanks for your comments. C. difficile BAA-1870 (RT027/ST1), a highly virulent isolate frequently employed in research (3-6), was kindly donated by Professor Aiwu Wu. We have noted the PCR ribotype in our manuscript.

      (11) Line 236: Did the mice fully recover from CDI after the administration of the CAPE? Was one dose enough?

      Thanks for your comments. CAPE was administered orally at 24 h intervals, commencing with the initial dose on Day 0. By the time a significant difference was observed on Day 3, the treatment had been administered a total of three times.

      Methodology

      (12) Most of the methods do not have a reference.

      Thanks for your comments. We have added several references to the methods.

      Discussion section

      (13) The first two paragraphs of the discussion should be summarized. Those details were already explained in the introduction.

      Thanks for your comments. The discussion section and the introduction address slightly different focal points; therefore, we aim to retain the first two paragraphs to maintain continuity and context.

      (14) Line 382: Bezlotoxumab was approved by the FDA in 2016. It is not recent.

      Thanks for your comments. We have revised the above description.

      (15) Line 410: "Despite the high cure rate and increasing popularity of FMT, its safety remains controversial." Although this is true, the FDA recently (2022) approved Rebyota, which was later cited by the authors.

      Thanks for your comments. We have revised the above description.

      (16) Lines 415-416: "the abundance of Bacteroides, a critical gut microbiota component that is required for C. difficile resistance". There is only one reference cited by the authors. I suppose that if it is true, more studies should be mentioned. Why are probiotics with Bacteroides spp. not available in the market?

      Thanks for your comments. We have added further references. The scarcity of probiotic products containing Bacteroides spp. on the market is primarily attributable to their stringent survival requirements. As most Bacteroides spp. are anaerobic, they thrive in oxygen-deprived environments. This trait makes it difficult to maintain their viability during product preservation and distribution, which in turn increases production costs and complexity. Furthermore, despite the significant role of Bacteroides in gut health, its potential probiotic benefits and safety remain comparatively underexplored.

      References:

      (1) Guh AY, et al. Emerging Infections Program Clostridioides difficile Infection Working Group. Trends in U.S. Burden of Clostridioides difficile Infection and Outcomes. N Engl J Med. 2020 Apr 2;382(14):1320-1330.

      (2) Schley K, et al. Costs and Outcomes of Clostridioides difficile Infections in Germany: A Retrospective Health Claims Data Analysis. Infect Dis Ther. 2024 Nov 20.

      (3) Saito R, et al. Hypervirulent clade 2, ribotype 019/sequence type 67 Clostridioides difficile strain from Japan. Gut Pathog. 2019 Nov 4;11:54.

      (4) Pellissery AJ, Vinayamohan PG, Venkitanarayanan K. In vitro antivirulence activity of baicalin against Clostridioides difficile. J Med Microbiol. 2020 Apr;69(4):631-639.

      (5) Shao X, et al. Chemical Space Exploration around Thieno[3,2-d]pyrimidin-4(3H)-one Scaffold Led to a Novel Class of Highly Active Clostridium difficile Inhibitors. J Med Chem. 2019 Nov 14;62(21):9772-9791.

      (6) Mooyottu S, Flock G, Venkitanarayanan K. Carvacrol reduces Clostridium difficile sporulation and spore outgrowth in vitro. J Med Microbiol. 2017 Aug;66(8):1229-1234.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Chabukswar et al analysed endogenous retrovirus (ERV) Env variation in a set of primate genomes using consensus Env sequences from ERVs known to be present in hominoids using a Blast homology search with the aim of characterising env gene changes over time. The retrieved sequences were analysed phylogenetically, and showed that some of the integrations are LTR-env recombinants.

      Strengths

      The strength of the manuscript is that such an analysis has not been performed yet for the subset of ERV Env genes selected and most of the publicly available primate genomes.

      Weaknesses

      Unfortunately, the weaknesses of the manuscript outnumber its strengths. Especially the methods section does not contain sufficient information to appreciate or interpret the results. The results section contains methodological information that should be moved, while the presentation of the data is often substandard. For instance, the long lists of genomes in which a certain Env was found could better be shown in tables. Furthermore, there is no overview of the primate genomes, or accession numbers, used. It is unclear whether the analyses, such as the phylogenetic trees, are based on nucleotide or amino acid sequences since this is not stated. tBLASTn was used in the homology searches, so one would suppose aa are retrieved. In the Discussion, both env (nt?) and Env (aa?) are used.

      For the non-hominoids, genome assembly of publicly available sequences is not always optimal, and this may require Blasting a second genome from a species. Which should for instance be done for the HML2 sequences found in the Saimiri boliviensis genome, but not in the related Callithrix jacchus genome. Finally, the authors propose to analyse recombination in Env sequences but only retrieve env-LTR recombinant Envs, which should likely not have passed the quality check.

      Since the Methods section does not contain sufficient information to understand or reproduce the results, while the Results are described in a messy way, it is unclear whether or not the aims have been achieved. I believe not, as characterisation of env gene changes over time is only shown for a few aberrant integrations containing part of the LTR in the env ORF.

      We thank the reviewer for the critiques of the manuscript and their constructive suggestions to improve the clarity, methodological rigor, and data presentation.

      (1) The concern regarding the insufficient data in the methods has been resolved in the revised manuscript by adding a supplementary file that lists the genome assemblies that were used to perform the tBLASTn analysis with the reconstructed Env sequences. The requested accession numbers are available for all sequences in the supplementary phylogenetic figures.

      (2) We have also modified the manuscript by moving a portion of the results section into the methods section, in particular the full methodological description of the Env reconstruction (Lines 197-231).

      (3) As suggested, the long list of genomes mentioned in the results section in which the Env tBLASTn hits were obtained is now provided in table form (Table 2) as an overall summary of the distribution of ERV Env in the genomes, and the genome assemblies are listed in Supplementary file 2.

      (4) As for the point regarding the use of tBLASTn in the homology searches, we used the reconstructed Env amino acid sequences as queries for a tBLASTn similarity search in the primate genomes. The tBLASTn algorithm compares the amino acid queries against the nucleotide database translated in all six reading frames, and hence the hits obtained are nucleotide sequences (Lines 381-383). These nt sequences were used for all further analyses, such as sequence alignment, phylogenetic analysis and recombination analysis. For better clarity, we have specified the use of env nt alignments in the methods section to avoid the confusion raised about the discussion.
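
      The six-frame translation that tBLASTn relies on can be illustrated with a minimal sketch. This is not the BLAST implementation and not the authors' pipeline; the function names and the toy sequence are hypothetical, and the standard genetic code (NCBI translation table 1) is assumed.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq: str) -> Dict[str, str]:
    """Translate a nucleotide sequence in frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to show the six frames.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

      A protein query is compared against each of these translated frames, but the matching region that is reported back is a stretch of the underlying nucleotide sequence, which is why the retrieved hits are nt sequences.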

      (5) For the HML supergroup characterization in the squirrel monkey genome (Saimiri boliviensis), we used the tBLASTn hits obtained in S. boliviensis from the initial analysis to perform comparative genomics in two Platyrrhini genomes available on the UCSC Genome Browser. In particular, this analysis was performed to confirm the presence of specific members of the HML supergroup in squirrel monkey genomes that have not been previously reported. We used the available genome assemblies because of the annotations available on the Genome Browser, and especially the possibility of using the RepeatMasker tracks and the comparative genomics tools with the human genome as a reference. We reported the coordinates for the members of the HML supergroup that were retrieved from the comparative genomic assemblies by applying the RepeatMasker custom track, which contains many ERVs that are not present in NCBI reference genomes.

      (6) The concern regarding only retrieving env-LTR recombinant Envs has been addressed in the revised results section (Lines 747-758). As also mentioned in the methods section, the RDP software detects the recombinant sequences and a breakpoint position for the recombinant signals and hence we confirmed only those sequences that were predicted as potential recombinant sequences by the RDP software through comparative genomics. All the sequences predicted by the software were env-LTR recombinant and hence we confirmed and reported only those recombinant sequences in the manuscript.

      Reviewer #1 (Recommendations for the authors):

      The paper could be strengthened by:

      - a rigorous rewriting and shortening of the manuscript, thereby eliminating all textbook-like paragraphs, and all biological misinterpretations and confusions. Distinguish between retroviral replication as an exogenous virus, and host genome remodeling affecting ERVs. Rewrite the sections on template switching by RT being the basis for the observed recombinations, while host genome recombinations are far more likely. ERVs with such aberrant env/LTR gene recombination are unlikely to be fit for cross-species transmission. Likely, such a recombinant was generated in a common ancestor. Also, host RNA polymerase II transcribes retroviral RNA (line 79), not RT.

      - check lines 89-90 as pro is part of the pol gene in gamma- and lentiviruses.

      We thank the reviewer for the suggestion. We have revised the manuscript by shortening the introduction section, eliminating the textbook-like paragraphs, and clarifying the recombination mechanism. The revised introduction is at Lines 102-111, and the clarification of the recombination mechanism is provided at lines 1668-1675.

      - adding much more information to the Methods section. Such as which genomes were searched, were nt or aa have been retrieved and analysed, were multiple genomes of a species searched, a list of databases used ('various databases' in line 164 does not suffice), etc.

      We thank the reviewer for the observation. As mentioned above, in the revised manuscript we have provided more detailed methods by including a supplementary file listing the genome assemblies used for the tBLASTn analysis and comparative genomics. For the sequence alignment, phylogenetic analysis and recombination analysis we used nt sequences, as is also stated in the revised version. Lastly, all the databases that were used are now mentioned in the methods section.

      - more information is needed on the alignments and phylogenetic trees. For instance, how were indels treated? How long were the alignments on average regarding informative sites?

      We thank the reviewer for the questions. To answer them, we have added a paragraph (Lines 359-362) describing the reconstruction process in more detail.

      - confirm the findings about the presence or absence of an ERV, such as for the squirrel monkey genome, using additional genomes of the species

      As mentioned above, we used only the genome assemblies available on the Genome Browser because of the annotations available there. Searching a second NCBI RefSeq genome with BLAST does not provide information and annotations as accurate as those of the Genome Browser. We therefore reported the coordinates for the members of the HML supergroup that were retrieved from the comparative genomic assemblies by applying the RepeatMasker custom track, which contains many ERVs that are not present in NCBI reference genomes.

      - present the lists of findings in primate genomes on pages 9 and 10 in tables

      We thank the reviewer for the suggestion. We have provided a new table (Table 2) in the revised version summarizing the ERV Env distribution results.

      - a significant limitation of the study is that only env ERVs found in hominoids have been searched in OWM and NWM, not ones specific for monkeys. This should be mentioned somewhere.

      As the reviewer pointed out, the study was designed to explore ERVs’ Env sequences in hominoids, which were then searched in the OWM and NWM genomes. This is now better stated in the introduction at Lines 57-60.

      - define abbreviations at first use (e.g. HML in abstract)

      We thank the reviewer for the suggestion. We have defined the abbreviations in the abstract, where HML is first mentioned (Line 65).

      - explain 'pathological domestication' (line 42). Domestication implies usefulness to the host. And over time, deleterious insertions would have been likely purged from a population.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERVs’ env (lines 52-57).

      Furthermore:

      - why begin the discussion with a lengthy description of domestication and syncytins, which is not part of the current study?

      We thank the reviewer for the critique. Accordingly, we have shortened the part of the discussion about the domestication of syncytins, and now mention them only as an example at lines 942-944.

      - how can 96 hits have been retrieved for spuma-like envs (line 506), while it was earlier reported (line 333), that the most hits were gamma-like?

      We thank the reviewer for the observation. We have clarified and explained how 96 hits were retrieved for spuma-like envs in lines 670-677 of the discussion section.

      English grammar should be improved throughout the manuscript.

      And I could not open half of the supplementary files

      As suggested, we have revised the English and checked that all files open correctly.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Chabukswar et al. describes a comprehensive attempt to identify and describe the diversity of retroviral envelope (env) gene sequences present in primate genomes in the form of ancient endogenous retrovirus (ERV) sequences.

      Strengths:

      The focus on env can be justified because of the role the Env proteins likely played in determining viral tropism and host range of the viruses that gave rise to the ERV insertions, and to a lesser extent, because of the potential for env ORFs to be coopted for cellular functions (in the rare cases where the ORF is still intact and capable of encoding a functional Env protein). In particular, these analyses can reveal the potential roles of recombination in giving rise to novel combinations of env sequences. The authors began by compiling env sequences from the human genome (from human endogenous retrovirus loci, or "HERVs") to build consensus Env protein sequences, and then they use these as queries to screen other primate genomes for group-specific envs by tBLASTn. The "groups" referred to here are previously described, as unofficial classifications of endogenous retrovirus sequences into three very broad categories - Class I, Class II and Class III. These are not yet formally recognized in retroviral taxonomy, but they each comprise representatives of multiple genera, and so would fall somewhere between the Family and Genus levels. The retrieved sequences are subject to various analyses, most notably they are screened for evidence of recombination. The recombinant forms appear to include cases that were probably viral dead-ends (i.e. inactivating the env gene) even if they were propagated in the germline.

      The availability of the consensus sequences (supplement) is also potentially useful to others working in this area.

      Weaknesses:

      The weaknesses are largely in presentation. Discussions of ERVs are always complicated by the lack of a formal and consistent nomenclature and the confusion between ERVs as loci and ERVs as indirect information about the viruses that produced them. For this reason, additional attention needs to be paid to precise wording in the text and/or the use of illustrative figures.

      We thank the reviewer for the general observation. We have paid additional attention to the wording in the text and figures, and hope to have improved the clarity of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Reviewing the manuscript was a challenge because figures were difficult to read. As provided, the fonts were sometimes too small to read in a standard layout and had to be expanded on screen.

      The tree in Figure 3 could also be made easier to read, for example if the authors collapsed related branches and gave the clusters a single, clear label (this is not necessary, just a suggestion) - especially if the supplementary trees have all the labelled branches for any readers who want specific details.

      I also recommend asking a third party (perhaps a scientific colleague) with fluency in English grammar and familiarity with English scientific idiom to provide some editorial feedback on the text.

      Figure 4 legend is confusing. From the description it sounds like the tree in 4B is a host phylogeny, but it's not clearly stated. And if so, how was the tree generated? Is it based on entire genomes? Include at least enough methodological detail or citations that someone could recreate it, if necessary. The details and how it was done should be briefly mentioned here and in detail in the Methods section.

      We thank the reviewer for the observation. We have modified the legend of Figure 4 and stated more clearly how the phylogenetic tree of the primate genomes was generated using TimeTree. We have also provided further details in the methods section (Lines 475-489).

      As suggested, we have revised the English.

      Line 42 - what is "pathological domestication"? It sounds like a contradiction in terms.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERVs’ env (lines 52-57).

      Lines 166-167 - the authors use the word "classes" but then use a list of terms that correspond to genera within the Retroviridae. The authors should be cautious here, as "class" and "genus" are both official taxonomic terms with different meanings. Do they mean genus? Or, if a more informal term is needed, perhaps "group"?

      Thank you for the observation. ERVs have been classified into three classes (Class I, II and III) based on their relatedness to the exogenous retrovirus genera Gammaretrovirus, Betaretrovirus and Spumaretrovirus, respectively. They are therefore referred to in the manuscript according to the nomenclature proposed by Gifford et al., 2018, which is cited at Lines 122-125.

      Line 221- "defferent" should be "different"

      Corrected

      Lines 233-234 - what is meant by "canonical" and "non-canonical" forms? Can the authors please define these two terms?

      Thank you for the question. Canonical refers to sequences that are well preserved and match the structural and functional features of complete env genes, whereas non-canonical refers to sequences with significant structural alterations or truncations that deviate from this typical form. This explanation has been added to the revised version at Lines 475-479.

      Line 252 - if/is

      Corrected

      Lines 274-276 needs a citation to the paper(s) that reported this.

      Corrected

      Line 283-285 - this was confusing. How could the authors have noted distinct occurrences and clusters of these if they were excluded from the BLAST analysis? It says the consensus sequences were effectively representing these, but doesn't this raise the possibility that the consensus sequences are not specific enough? Could this also then lead to false identification? Perhaps a few more words to explain should be added.

      We thank the reviewer for the observation. While performing the tBLASTn search we did obtain hits for HERV15, HERVR, ERVV1, ERVV2 and PABL, and we have provided a detailed explanation of this observation in the revised manuscript at lines 619-627.

      Line 298 - missing comma

      Corrected

      Lines 348-351- this list is not a list of recombination mechanisms. Template switching is a mechanism of recombination, but "acquisition" is simply a generic term, "degradation" is not a mechanism, and "cross-species transmission" might be a driver or a result of recombination, but it is not a mechanism of recombination.

      We thank the reviewer for the observation. We have revised the explanation of the recombination events in the discussion section, as some parts of the results have been moved to the discussion section (Lines 1058-1065).

      Lines 369-372. It's not clear why this means the event was a "very recent occurrence". Do the authors mean that there were shared integration sites between some of the species, and that these sites lacked the insertions in other species (e.g. gibbon, orangutan, monkeys)?

      For the long section on recombination events involving an env sequence with an LTR in it, can the authors explain how they know when it's a recombination event versus integration of one provirus into another one, followed by recombination between LTRs to generate a solo-LTR?

      We thank the reviewer for the observation. Regarding the very recent occurrence of the recombination event, we have explained it in revised manuscript at lines 769-824 writing “In fact, the recombinant sequences were shared only between 4 species of Catarrhini parvorder and were absent in more distantly related primates (such as gibbons, orangutans, etc.). This with the presence of shared recombination sites suggests that the insertion occurred after the divergence of these species, while its absence in others indicate that it is a recombination event.”

      For the observation regarding the env-LTR recombination events, the recombinants were first detected by the RDP software and were further validated through the BLAT search in the genomes available on genome browser. The explanation on how we obtained these env-LTR recombination events is now provided in lines 746-763 of the revised manuscript.

      Methods Lines 151-168 and Figure 1 legend Lines 689-690 - how did the authors distinguish between "translated regions" corresponding to the actual Env protein sequence from translation of the other two reading frames? That is, there must have been substantial "translatable" stretches of sequence in the two incorrect reading frames as well as the reading frame corresponding to Env, so the question is how were the correct ones identified for the reconstruction?

      We thank the reviewer for the observation. We have provided a detailed explanation in the methods section (Lines 335-359).

      Line 495 - "previously reported" should include citation(s) of the prior report(s).

      We thank the reviewer for the observation, we have provided appropriate citations.

      Line 525 - the authors propose that the mechanism "is the co-packaging of different ERVs in a virus particle". First, I assume they meant to say that RNA from different ERVs is co-packaged. Second, isn't it also possible or likely that these could arise from co-packaging of exogenous retrovirus RNAs and recombination, especially if the related exogenous forms were still circulating at the time these things arose?

      We thank the reviewer for the observation. In the revised manuscript, we have modified the proposed mechanism so that it also includes the possibility of co-packaging of exogenous retrovirus RNAs and recombination (lines 1082-1099).

      Line 686 - env should either be italicized (gene) or capitalized (protein), depending on what the authors intended here.

      We thank the reviewer for the observation. We have corrected the typographical error in the new version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Retroviruses have been endogenized into the genome of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations hence can be used to monitor viral evolution. Since they are incorporated into the host genome, they also reflect the evolution of the hosts. In this manuscript the authors have focused their analyses on the env genes of endogenous retroviruses in primates. Important observations made include the extensive recombination events between these retroviruses that were previously unknown and the discovery of HML species in genomes prior to the splitting of old and new world monkeys.

      Strengths:

      They explored a number of databases and made phylogenetic trees to look at the distribution of retroviral species in primates. The authors provide a strong rationale for their study design, they provide a clear description of the techniques and the bioinformatics tools used.

      Weaknesses:

      The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species. The analyses thus likely underestimate the amount of diversity in the retroviruses. Further experimental verification will be needed to confirm the observations.

      Not sure which databases were used, but if not already analyzed, ERVmap.com and RepeatMasker are ones that have many ERVs that are not present in the reference genomes. Also, long range sequencing of the human genome has recently become available which may also be worth studying for this purpose.

      We thank the reviewer for the observations and comments. We would like to clarify that the intent of the work was to perform a bioinformatics analysis, so wet-lab experimental verification of the observations is outside the scope of the present manuscript. For the aim of the manuscript, we used the NCBI reference genomes, while for reporting the coordinates of the HML supergroup in the squirrel monkey genome and the coordinates of the recombination events identified through BLAT searches we used the genome assemblies available on the Genome Browser with the RepeatMasker custom track, since it has well-represented ERV annotations.

      The suggestion to use long range sequencing of the human genome is an interesting perspective. In future work we will try to include it in our analysis and to perform experimental verification, since, again, the present work does not include a wet-lab experimental part.

      Reviewer #3 (Recommendations for the authors):

      In a few places the term HERV has been used when describing ERVs in non-human primates. This needs to be corrected.

      We thank the reviewer for the observation. We have checked and accordingly modified the terms in the manuscript wherever necessary.

    1. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects, would enhance the study's impact and generalizability to broader populations.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We apologize for the confusion in how we described the multiple steps performed and hierarchical methods used to ensure that the model we report in the main text was the best fit to the data while not overfitting. We are not certain about what is meant by “[a]ddressing model transparency,” but as described in our response to Reviewer 1 below, we have now more clearly explained (with references) that the use of hierarchical estimation procedures allows for information sharing across participants, which improves the reliability and stability of parameter estimates, even when the number of trials per individual is small. We have clarified for the less familiar reader how our Bayesian model selection criterion penalizes models with more parameters (more complex models). Although details about model diagnostics, recoverability, and posterior predictive checks are all provided in the Supplementary Materials, we have also clarified how each of these steps ensures not only that the parameters we estimate are identifiable and interpretable, but also that the model can reproduce key patterns in the data, supporting the validity of the model. Additionally, we have provided all scripts for estimating the models by linking to our public GitHub repository. Furthermore, we have edited language throughout to eliminate any implication of causal claims and acknowledged the limitation of the small sample size.
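
      The idea that hierarchical estimation shares information across participants can be illustrated with a deliberately simplified sketch. It is not the authors' hierarchical Bayesian DDM: it assumes a normal-normal model with known within- and between-participant variances, and the function name and all numbers are hypothetical. Each participant's estimate is pulled toward the group mean, and the pull is stronger when that participant contributes fewer trials.

```python
from statistics import mean


def partial_pooling(participant_means, n_trials, within_var, between_var):
    """Shrink per-participant means toward the group mean.

    Assumes a normal-normal model with known variances (an assumption
    made for illustration only). The weight on a participant's own data
    grows with the number of trials that participant contributed.
    """
    group_mean = mean(participant_means)
    estimates = []
    for ybar, n in zip(participant_means, n_trials):
        sampling_var = within_var / n  # uncertainty of this participant's mean
        weight = between_var / (between_var + sampling_var)
        estimates.append(weight * ybar + (1 - weight) * group_mean)
    return estimates


if __name__ == "__main__":
    raw = [0.0, 2.0]  # hypothetical per-participant parameter estimates
    print(partial_pooling(raw, [42, 42], within_var=4.0, between_var=1.0))
    print(partial_pooling(raw, [1000, 1000], within_var=4.0, between_var=1.0))
```

      With 42 trials per participant the estimates move noticeably toward the group mean, whereas with 1000 trials they stay close to the raw values. In this simplified setting, this is the sense in which pooling stabilizes estimates when trial counts are low.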

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options between individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.
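
      The two quantities listed above (when each attribute enters the decision, and how strongly it is weighted) can be illustrated with a minimal simulation sketch. It assumes a simplified time-varying DDM in which taste enters the drift first and health enters after a delay (or the reverse when the delay is negative). The function name, parameter names, and default values are hypothetical and are not taken from the manuscript; non-decision time and starting-point bias are omitted.

```python
import random


def simulate_choice(taste_diff, health_diff, w_taste, w_health, health_delay,
                    bound=1.0, noise_sd=1.0, dt=0.001, max_time=5.0, rng=None):
    """Simulate one trial of a simplified time-varying DDM (illustrative only).

    health_delay > 0: health enters the drift that many seconds after taste.
    health_delay < 0: taste enters the drift |health_delay| seconds after health.
    Returns (choice, decision_time); choice is 1 (upper bound), 0 (lower bound),
    or None if no bound is reached before max_time.
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    while abs(evidence) < bound and t < max_time:
        taste_on = t >= max(0.0, -health_delay)
        health_on = t >= max(0.0, health_delay)
        # Each attribute contributes to the drift only once it has "come online".
        drift = w_taste * taste_diff * taste_on + w_health * health_diff * health_on
        evidence += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    if abs(evidence) < bound:
        return None, t
    return (1 if evidence > 0 else 0), t


if __name__ == "__main__":
    rng = random.Random(0)
    # A tasty but unhealthy option: taste favors it, health opposes it.
    trials = [simulate_choice(1.0, -1.0, 1.0, 2.0, health_delay=0.3, rng=rng)
              for _ in range(200)]
    chosen = [c for c, _ in trials if c is not None]
    print("proportion choosing the tasty option:", sum(chosen) / len(chosen))
```

      In this sketch, a longer health delay gives taste more time to push the accumulated evidence toward a bound before health can counteract it, which is the kind of timing effect the summary describes.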

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regards to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. On the other hand, Shevlin and colleagues ask each participant to make two sets of 42 choices with two times fewer participants than in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective rating of the options) as compared to the initial version of Ratcliff and McKoon. With regards to the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold ratio in the number of trials between the Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about a potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      We thank the Reviewer for their thoughtful critique. We agree that a limited number of trials can forestall reliable estimation, which we acknowledge in the Discussion section. However, we used a hierarchical estimation approach which leverages group information to constrain individual-level estimates. This use of group-level parameters to inform individual-level estimates reduces overfitting and noise that can arise when trial counts are low, and the regularization inherent in hierarchical fitting prevents extreme parameter estimates that could arise from noisy or limited data (Rouder & Lu, 2005). As a result, hierarchical estimation has been repeatedly shown to work well in settings with low trial counts, including as few as 40 trials per condition (Ratcliff & Childers, 2015; Wiecki et al., 2013), and previous applications of the time-varying DDM to food choice task data have included experiments with as few as 60 trials per condition (Maier et al., 2020). We have added references to these more recent approaches and specifically note their advantages for the modeling of tasks with fewer trials. Additionally, our successful parameter recovery described in the Supplementary Materials supports the robustness of the estimation procedure and the reliability of our results.
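
      The regularizing effect of hierarchical estimation described above can be illustrated with a minimal sketch. It assumes a simple normal-normal model with known variances rather than the hierarchical DDM itself, and all function names and numbers are hypothetical. In that simplified setting, each participant's estimate is a precision-weighted average of their own mean and the group mean, so estimates based on fewer trials are pulled more strongly toward the group.

```python
def shrinkage_weight(n_trials, within_sd, between_sd):
    """Weight placed on a participant's own mean in a normal-normal model."""
    data_precision = n_trials / within_sd ** 2   # information from the participant's trials
    prior_precision = 1.0 / between_sd ** 2      # information from the group distribution
    return data_precision / (data_precision + prior_precision)


def partially_pooled_estimate(participant_mean, group_mean, n_trials,
                              within_sd, between_sd):
    """Posterior mean for one participant under partial pooling."""
    w = shrinkage_weight(n_trials, within_sd, between_sd)
    return w * participant_mean + (1.0 - w) * group_mean


if __name__ == "__main__":
    # Hypothetical values: an extreme participant mean of 1.5 and a group mean of 0.5.
    for n in (42, 400, 1000):
        w = shrinkage_weight(n, within_sd=2.0, between_sd=0.5)
        est = partially_pooled_estimate(1.5, 0.5, n, within_sd=2.0, between_sd=0.5)
        print(f"trials={n:4d}  weight on own data={w:.3f}  estimate={est:.3f}")
```

      With few trials the weight on the participant's own data is lower, so an extreme raw estimate is moderated by the group mean; as the trial count grows, the estimate approaches the participant's own mean.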

      The authors compare different DDMs to show that the DDM they used to report statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDM models (i.e. M0, M1, M2 in the supplementary materials) they used to make the comparison have fewer parameters to estimate than the one they used in the main text. Fits are usually expected to follow the rule that the more there are parameters to estimate in a model, the better it fits the data. Additionally, a quick plot of the data in supplementary table S12 (i.e. WAIC as a function of the number of parameters varying by food type in the model - i.e. 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may be also suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, for τS when comparing M3 with M2). If the differences are important, the results currently reported in the main text are not very reliable.

      We thank the Reviewer for highlighting that it would be helpful for the paper to explicitly note that we specifically selected WAIC as one of two methods to assess model fit because it penalizes for model complexity. We now explicitly state that, in addition to being more robust than other metrics like AIC or BIC when comparing hierarchical Bayesian models like those in the current study, model fit metrics like WAIC penalize for model complexity based on the number of parameters (Watanabe, 2010). Therefore, it is not the case that more complex models (i.e., having additional parameters) would automatically have lower WAICs. Additionally, we note that our second method to assess model fit, posterior predictive checks, demonstrates that only model M3 can reproduce key behavioral patterns present in the empirical data. As described in the Supplementary Materials, M1 and M2 miss those patterns in the data. In summary, we used best practices to assess model fit and reliability (Wilson & Collins, 2019): results from the WAIC comparison (which in fact penalizes models with more parameters) and results from posterior predictive checks align in showing that M3 best fits our data. We have added a sentence to the manuscript to state this explicitly.
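
      How WAIC trades off fit against complexity can be sketched as follows. This is a generic, minimal implementation of the commonly used form of the criterion (log pointwise predictive density minus a penalty equal to the summed posterior variance of the pointwise log-likelihood, often described as the effective number of parameters), not the authors' code. The input layout and the function name are assumptions made for illustration.

```python
import math


def waic(log_lik):
    """Compute WAIC on the deviance scale (lower is better).

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    Returns (waic, lppd, p_waic).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density (fit term)
    p_waic = 0.0  # effective number of parameters (complexity penalty)
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # log of the mean likelihood across draws, computed stably (log-sum-exp).
        m = max(column)
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        # Sample variance of the log-likelihood across draws.
        mean = sum(column) / n_draws
        p_waic += sum((v - mean) ** 2 for v in column) / (n_draws - 1)
    return -2.0 * (lppd - p_waic), lppd, p_waic


if __name__ == "__main__":
    # Hypothetical log-likelihoods: 3 posterior draws, 2 observations.
    draws = [[-1.0, -2.0], [-1.2, -2.1], [-0.9, -2.3]]
    value, fit, penalty = waic(draws)
    print(f"WAIC={value:.3f}  lppd={fit:.3f}  p_waic={penalty:.3f}")
```

      Because the penalty term grows with the posterior variability of the pointwise log-likelihood, a model with additional parameters obtains a lower WAIC only if its improvement in fit outweighs the added penalty.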

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      I would suggest in such an experiment to collect the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      We thank the Reviewer for these interesting suggestions and appreciate the opportunity to clarify that we agree that claims about causal connections between our decision parameters and symptom severity metrics would be inappropriate. Per the Reviewer’s suggestions, we have eliminated the use of the word “predict” to describe the tested association with symptom metrics.  We also agree that more time-locked associations with craving ratings and near-instantaneous behavior would be useful, and we have added this as an important direction for future research in the discussion. However, associating task-based behavior with validated self-report measures that assess symptom severity over long periods of time that precede the task visit (e.g., over the past 2 weeks in depression, over the past month in eating disorders) is common practice in computational psychiatry, psychiatric neuroimaging, and clinical cognitive neuroscience (Hauser et al., 2022; Huys et al., 2021; Wise et al., 2023), and this approach has been used several times specifically with food choice tasks (Dalton et al., 2020; Steinglass et al., 2015). We have revised the language throughout the manuscript to clarify: the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms.

      In response to this Reviewer’s important point about negative affect not always producing loss-of-control eating in individuals with BN, we also now explicitly note that while several studies employing ecological momentary assessments (EMA) have repeatedly shown that increases in negative affect significantly increase the likelihood of subsequent loss-of-control eating (Alpers & Tuschen-Caffier, 2001; Berg et al., 2013; Haedt-Matt & Keel, 2011; Hilbert & Tuschen-Caffier, 2007; Smyth et al., 2007), not all loss-of-control eating occurs in the context of negative affect, and that future studies should integrate food choice task data pre and post-affect inductions with measures that capture the specific frequency of loss of control eating episodes that occur during states of high negative affect.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

      We appreciate the Reviewer’s perspective and suggestions. We have adjusted our language linking loss-of-control eating frequency with decision parameters, and we have added additional sentences focusing on the implications for the restrictive behavior of patients with BN between binge eating episodes. In the Supplementary Materials, we have added an analysis of the restraint subscale of the EDE-Q and confirmed no relationship with parameters of interest. While we agree additional analyses with AN patients would be of interest, this is outside the scope of the paper. Our team have collected data from individuals with AN using this task, but not with any affect induction or measure of affect. Therefore, we have added this important direction for future research to the discussion.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant and methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

      We thank the Reviewer for their comments on the strengths of the paper, and for highlighting these important considerations regarding the sample demographics and the negative affect induction. As in the original paper that focused only on ultimate food choice behaviors, we now specifically acknowledge that the study was only powered to detect small to medium group differences in the effect of negative emotion on these final choice behaviors. Regarding the sample demographics, we agree that the study’s inclusion of only female participants is a limitation.  Although the original decision for this sampling strategy was informed by data suggesting that bulimia nervosa is roughly six times more prevalent among females than males (Udo & Grilo, 2018), we now note in the discussion that our female-only sample limits the generalizability of the findings.

      We also agree with the Reviewer’s noted limitations of the negative mood induction, and based on the reviewer’s suggestions, we have added to our original description of these limitations in the Discussion. Specifically, we now note that although the task was completed immediately after the affect induction, the study did not include intermittent mood assessments throughout the choice task, so it is unclear how long the negative affect persisted during the actual task.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We agree that the small sample size and specific affect induction method may have contributed to the null model-agnostic behavioral findings. Based on this Reviewer’s and Reviewer 2’s comments, we have added these factors to our original acknowledgements of limitations in the Discussion.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

      We thank the Reviewer for raising these important points and possibilities. In the supplementary materials, we have added an additional analysis of the specific POMS subscales that comprise the total negative affect calculation that was reported in the original paper (Gianini et al., 2019), and which we now report in the main text. Ultimately, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      We agree with the Reviewer that factors like negative urgency and cravings are relevant here. The study did not collect any measures of craving, and in response to Reviewer 1 and this Reviewer, we now note in the discussion that replication studies including momentary craving assessments will be important. While we don’t have any measurements of cravings, we did measure negative urgency. Despite these prior findings, the original paper (Gianini et al., 2019) did not find that negative urgency was related to restrictive food choices. We have now repeated those analyses, and we also were unable to find any meaningful patterns. Nonetheless, we have added an analysis of negative urgency scores and decision parameters to the supplementary materials.      

      References

      Alpers, G. W., & Tuschen-Caffier, B. (2001). Negative feelings and the desire to eat in bulimia nervosa. Eating Behaviors, 2(4), 339–352. https://doi.org/10.1016/S1471-0153(01)00040-X

      Berg, K. C., Crosby, R. D., Cao, L., Peterson, C. B., Engel, S. G., Mitchell, J. E., & Wonderlich, S. A. (2013). Facets of negative affect prior to and following binge-only, purge-only, and binge/purge events in women with bulimia nervosa. Journal of Abnormal Psychology, 122(1), 111–118. https://doi.org/10.1037/a0029703

      Dalton, B., Foerde, K., Bartholdy, S., McClelland, J., Kekic, M., Grycuk, L., Campbell, I. C., Schmidt, U., & Steinglass, J. E. (2020). The effect of repetitive transcranial magnetic stimulation on food choice-related self-control in patients with severe, enduring anorexia nervosa. International Journal of Eating Disorders, 53(8), 1326–1336. https://doi.org/10.1002/eat.23267

      Gianini, L., Foerde, K., Walsh, B. T., Riegel, M., Broft, A., & Steinglass, J. E. (2019). Negative affect, dietary restriction, and food choice in bulimia nervosa. Eating Behaviors, 33, 49–54. https://doi.org/10.1016/j.eatbeh.2019.03.003

      Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137(4), 660–681. https://doi.org/10.1037/a0023660

      Hauser, T. U., Skvortsova, V., Choudhury, M. D., & Koutsouleris, N. (2022). The promise of a model-based psychiatry: Building computational models of mental ill health. The Lancet Digital Health, 4(11), e816–e828. https://doi.org/10.1016/S2589-7500(22)00152-2

      Hilbert, A., & Tuschen-Caffier, B. (2007). Maintenance of binge eating through negative mood: A naturalistic comparison of binge eating disorder and bulimia nervosa. International Journal of Eating Disorders, 40(6), 521–530. https://doi.org/10.1002/eat.20401

      Huys, Q. J. M., Browning, M., Paulus, M. P., & Frank, M. J. (2021). Advances in the computational understanding of mental illness. Neuropsychopharmacology, 46(1), 3–19. https://doi.org/10.1038/s41386-020-0746-4

      Maier, S. U., Raja Beharelle, A., Polanía, R., Ruff, C. C., & Hare, T. A. (2020). Dissociable mechanisms govern when and how strongly reward attributes affect decisions. Nature Human Behaviour, 4(9), Article 9. https://doi.org/10.1038/s41562-020-0893-y

      Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237–279. https://doi.org/10.1037/dec0000030

      Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750

      Smyth, J. M., Wonderlich, S. A., Heron, K. E., Sliwinski, M. J., Crosby, R. D., Mitchell, J. E., & Engel, S. G. (2007). Daily and momentary mood and stress are associated with binge eating and vomiting in bulimia nervosa patients in the natural environment. Journal of Consulting and Clinical Psychology, 75(4), 629–638. https://doi.org/10.1037/0022-006X.75.4.629

      Steinglass, J., Foerde, K., Kostro, K., Shohamy, D., & Walsh, B. T. (2015). Restrictive food intake as a choice—A paradigm for study. International Journal of Eating Disorders, 48(1), 59–66. https://doi.org/10.1002/eat.22345

      Udo, T., & Grilo, C. M. (2018). Prevalence and Correlates of DSM-5–Defined Eating Disorders in a Nationally Representative Sample of U.S. Adults. Biological Psychiatry, 84(5), 345–354. https://doi.org/10.1016/j.biopsych.2018.03.014

      Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.

      Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7. https://doi.org/10.3389/fninf.2013.00014

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Wise, T., Robinson, O. J., & Gillan, C. M. (2023). Identifying Transdiagnostic Mechanisms in Mental Health Using Computational Factor Modeling. Biological Psychiatry, 93(8), 690–703. https://doi.org/10.1016/j.biopsych.2022.09.034

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Govindan and Conrad use a genome-wide CRISPR screen to identify genes regulating retention of intron 4 in OGT, leveraging an intron retention reporter system previously described (PMID: 35895270). Their OGT intron 4 reporter reliably responds to O-GlcNAc levels, mirroring the endogenous splicing event. Through a genome-wide CRISPR knockout library, they uncover a range of splicing-related genes, including multiple core spliceosome components, acting as negative regulators of OGT intron 4 retention. They choose to follow up on SFSWAP, a largely understudied splicing regulator shown to undergo rapid phosphorylation in response to O-GlcNAc level changes (PMID: 32329777). RNA-sequencing reveals that SFSWAP depletion not only promotes OGT intron 4 splicing but also broadly induces exon inclusion and intron splicing, affecting decoy exon usage. While this study offers interesting insights into intron retention and O-GlcNAc signaling regulation, the RNA sequencing experiments lack the essential controls needed to provide full confidence to the authors' conclusions. 

      Strengths: 

      (1) This study presents an elegant genetic screening approach to identify regulators of intron retention, uncovering core spliceosome genes as unexpected positive regulators of intron retention. 

      (2) The work proposes a novel functional role for SFSWAP in splicing regulation, suggesting that it acts as a negative regulator of splicing and cassette exon inclusion, which contrasts with expected SR-related protein functions. 

      (3) The authors suggest an intriguing model where SFSWAP, along with other spliceosome proteins, promotes intron retention by associating with decoy exons. 

      We thank the reviewer for recognizing and detailing the strengths of our manuscript. 

      Weaknesses: 

      (1) The conclusions on SFSWAP impact on alternative splicing are based on cells treated with two pooled siRNAs for five days. This extended incubation time without independent siRNA treatments raises concerns about off-target effects and indirect effects from secondary gene expression changes, potentially limiting confidence in direct SFSWAP-dependent splicing regulation. Rescue experiments and shorter siRNA-treatment incubation times could address these issues. 

      We repeated our SFSWAP knockdown analysis and analyzed both OGT e4-e5 junction splicing and SFSWAP transcript levels by RT-qPCR (now included in Sup. Fig. S4) from day 2 to day 5 post siRNA treatment. We observed that the time point at which OGT intron 4 removal increases (day 2) coincides with the time at which SFSWAP transcript levels start to decrease, consistent with a direct effect of SFSWAP knockdown on OGT intron 4 splicing. Moreover, the effect of SFSWAP knockdown on OGT intron 4 splicing peaks between days 4 and 5, supporting our use of these longer time points to cast a wide net for SFSWAP targets.

      (2) The mechanistic role of SFSWAP in splicing would benefit from further exploration. Key questions remain, such as whether SFSWAP directly binds RNA, specifically the introns and exons (including the decoy exons) it appears to regulate. Furthermore, given that SFSWAP phosphorylation is influenced by changes in O-GlcNAc signaling, it would be interesting to investigate this relationship further. While generating specific phosphomutants may not yield definitive insights due to redundancy and also beyond the scope of the study, the authors could examine whether distinct SFSWAP domains, such as the SR and SURP domains, which likely overlap with phosphorylation sites, are necessary for regulating OGT intron 4 splicing. 

      We absolutely agree with the reviewer that the current work stops short of a detailed mechanistic study, and we have made every attempt to be circumspect in our interpretations to reflect that limitation. In addition, we are very interested in delving more deeply into the mechanistic aspects of this regulation. In fact, we have initiated many of the experiments suggested by the reviewer (and more), but in each case, rigorous interpretable results will require a minimum of another year’s time. 

      For example, we have used crosslinking and biotin labeling techniques (using previously available reagents from Eclipsebio) to test whether SFSWAP binds RNA. The results were negative, but the lack of strong SFSWAP antibodies required that we use a transiently expressed myc-tagged SFSWAP. Therefore, this negative result could be an artifact of the exogenous expression and/or tagging. Given the difficulties of “proving the negative”, considerably more work will be required to substantiate this finding. As another example, we intend to develop a complementation assay as suggested. For an essential gene, the ideal complementation system employs a degron system, and we have spent months attempting to generate a homozygous AID-tagged SFSWAP. Unfortunately, we so far have only found heterozygotes. Of course, this could be because the tag interferes with function, the insert was not efficiently incorporated by homologous repair, or that we simply haven’t yet screened a sufficient number of clones. We’re confident that these technical issues can be addressed, but they will take a significant amount of time to resolve. While we would ideally define a mechanism, we think that the data reported here outlining functions for SFSWAP in splicing represent a body of work sufficient for publication. 

      (3) Data presentation could be improved (specific suggestions are included in the recommendations section). Furthermore, Excel tables with gene expression and splicing analysis results should be provided as supplementary datasheets. Finally, a more detailed explanation of statistical analyses is necessary in certain sections. 

      We have addressed all specific suggestions as detailed in the recommendations below.

      Reviewer #2 (Public review): 

      Summary: 

      The paper describes an effort to identify the factors responsible for intron retention and alternate exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly. 

      Strengths: 

      (1) Exhaustive analysis of potential splicing factors in an unbiased screen. 

      (2) Extensive genome wide bioinformatic analysis. 

      (3) Thoughtful discussion and literature survey. 

      We thank the reviewer for recognizing and detailing the strengths of our manuscript. 

      Weaknesses: 

      (1) No firm evidence linking SFSWAP to an O-GlcNAc specific mechanism. 

      We couldn’t agree more with this critique. Indeed, our intention at the outset for the screen was to find an O-GlcNAc sensor linking OGT splicing with O-GlcNAc levels. As often occurs with high-throughput screens, we didn’t find exactly what we were looking for, but the screen nonetheless pointed us to interesting biology. Prompted by our screen, we describe new insights into the function of SFSWAP, a relatively uncharacterized essential gene. Currently, we are testing other candidates from our screen, and we are performing additional studies to identify potential O-GlcNAc sensors.

      (2) Resulting model leaves many unanswered questions. 

      We agree (see Reviewer 1, point 2 response).  

      Reviewer #3 (Public review): 

      Summary: 

      The major novel finding in this study is that SFSWAP, a splicing factor containing an RS domain but no canonical RNA binding domain, functions as a negative regulator of splicing. More specifically, it promotes retention of specific introns in a wide variety of transcripts including transcripts from the OGT gene previously studied by the Conrad lab. The balance between OGT intron retention and OGT complete splicing is an important regulator of O-GlcNAc expression levels in cells. 

      Strengths: 

      An elegant CRISPR knockout screen employed a GFP reporter, in which GFP is efficiently expressed only when the OGT retained intron is removed (so that the transcript will be exported from the nucleus to allow for translation of GFP). Factors whose CRISPR knockdown causes decreased intron retention therefore increase GFP, and can be identified by sequencing RNA of GFP-sorted cells. SFSWAP was thus convincingly identified as a negative regulator of OGT retained intron splicing. More focused studies of OGT intron retention indicate that it may function by regulating a decoy exon previously identified in the intron, and that this may extend to other transcripts with decoy exons. 

      We thank the reviewer for recognizing the strengths of our manuscript. 

      Weaknesses: 

      The mechanism by which SFSWAP represses retained introns is unclear, although some data suggests it can operate (in OGT) at the level of a recently reported decoy exon within that intron.

      Interesting/appropriate speculation about possible mechanisms is provided and will likely be the subject of future studies. 

      We completely agree that this is a limitation of the current study (see above). Now that we have a better understanding of SFSWAP functions, we will continue to explore SFSWAP mechanisms as suggested. 

      Overall the study is well done and carefully described but some figures and some experiments should be described in more detail. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Clarify and add missing statistical details across the figures. For example, Figure S2 lacks statistical comparisons, and in Figures 4A and 4C the tests applied should be specified in the legend. 

      We have added appropriate statistical analysis wherever missing and edited figure legends to specify the tests used.

      (2) The authors are strongly encouraged to provide detailed tables of gene expression and alternative splicing analyses from RNA-Seq experiments (e.g., edgeR, rMATS, Whippet, and MAJIQ), as this would enhance transparency and facilitate data interpretation. 

      We have added tables for gene expression and alternate splicing analysis as suggested (Suppl. tables 3-6).

      (3) Although the legend sometimes indicates differently (e.g., Figure 3b, 5a, 5c, etc), the volcano plots showing the splicing changes do not contain a cutoff for marginally differential percent spliced in or intron retention values. 

      The legends have been edited to reflect the correct statistical and/or PSI cutoffs.

      (4) For consistency, use a consistent volcano plot format across all relevant figures (Figures 3b, 5a-c, S3, S4, S7, and S8), including cutoffs for differential splicing and the total count of up- and down-regulated events. 

      Due to different statistical frameworks and calculations employed by different alternate splicing pipelines, we could not use the same cutoffs for different pipelines.  However, we have now indicated the number of up- and down-regulated events for consistency among the volcano plots.

      (5) What is the overlap of differentially regulated events between the different analytical methodologies applied? 

      We analyzed the degree of overlap between the three pipelines used in the paper using a Venn diagram (added to Suppl. Fig. S7). However, as widely reported in literature (e.g., Olofsson et al., 2023; Biochem Biophys Res Commun. 2023; doi: 10.1016/j.bbrc.2023.02.053.), the degree of overlap between pipelines is quite low.
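
      The kind of overlap comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the tool names are taken from the reviewer's comment, while the event identifiers, the `pairwise_overlap` helper, and the use of a Jaccard index are assumptions made for the example. Real pipelines report events in different coordinate conventions, so identifiers would need to be harmonized before any such comparison.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index: shared events divided by all events seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(calls: dict) -> dict:
    """Return {(tool_a, tool_b): (n_shared, jaccard)} for every pair of tools."""
    result = {}
    for (name_a, set_a), (name_b, set_b) in combinations(sorted(calls.items()), 2):
        result[(name_a, name_b)] = (len(set_a & set_b), jaccard(set_a, set_b))
    return result


# Hypothetical event identifiers, for illustration only.
calls = {
    "rMATS": {"ev1", "ev2", "ev3", "ev4"},
    "Whippet": {"ev2", "ev3", "ev5"},
    "MAJIQ": {"ev3", "ev6"},
}

overlap = pairwise_overlap(calls)
shared_by_all = set.intersection(*calls.values())

for pair, (n_shared, score) in overlap.items():
    print(pair, n_shared, round(score, 2))
print("shared by all:", sorted(shared_by_all))
```

      The three-way intersection corresponds to the central region of a Venn diagram, and the pairwise counts to its two-set regions.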

      (6) To further substantiate your conclusions, additional validations of RNA-Seq splicing data, ideally visualized on an agarose gel, would be valuable, especially for exons and introns regulated by SFSWAP, and particularly for OGT decoy exons in Figure 4c. 

      We have not included these experiments as we focused on other critiques for this resubmission. Because the RNA-seq, RT-PCR and RT-qPCR data all align, we are confident that the products we are seeing are correctly identified and orthogonally validated (Figs 2d, 4a, 4b, and 4c).  

      (7) It would be more informative if the CRISPR screen data were presented in a format where both the adjusted p-value and LFC values of the hits are presented. Perhaps a volcano plot? 

      We have now included these graphs in revised Supplementary Figure S2. 

      (8) In Figure 2d, a cartoon showing primer binding sites for each panel could aid interpretation, particularly in explaining the unexpected simultaneous increase in OGT mRNA and intron retention upon SFSWAP knockdown. 

      We have added a cartoon showing primer binding sites similar to that shown in Fig. 4a.

      (9) Page 9, line 1, states that SFSWAP autoregulates its expression by controlling intron retention. Including a Sashimi plot would provide visual support for this claim. 

      The data suggesting that SFSWAP autoregulates its own transcript abundance were reported in Zachar et al. (1994), not from our own studies. Validation of those data with our RNA-seq data is confounded by the fact that we are using siRNAs to knock down the SFSWAP RNA at the transcript level (Fig. S15).

      (10) In the legend of Figure S2 the authors state that negative results are inconclusive because RNA knockdowns are not verified by western blotting or qRT-PCR. This is correct, but the reviewer would also argue that the positive results are also inconclusive as they are not supported by a rescue experiment to confirm that the effect is not due to off-target effects. 

      This is a fair point with respect to the siRNA experiments on their own. However, the CRISPR screen was performed with sgRNAs, and MAGeCK RRA scores are high only for those genes that have multiple sgRNAs that up-regulate the gene. Examination of the SFSWAP sgRNAs individually shows that three of four SFSWAP sgRNAs had false discovery rates ≤10⁻⁴² for GFP upregulation. Thus, the siRNAs provide an additional orthogonal approach. It seems unlikely that the siRNAs and three independent sgRNAs will have the same off-target results. Thus, these combined observations support the conclusion that SFSWAP loss leads to decreased OGT intron retention.

      (11) For clarity in Figure 3a, consider using differential % spliced in or intron retention bar plots with directionality (positive and negative axis) and labeling siSFSWAP as the primary condition. 

      (12) Consider presenting Figure 5D as a box plot with a Wilcoxon test for statistical comparison. 

      For both points 11 and 12, we have tried the graphs as the reviewer suggested. While these were good suggestions, in both cases we felt that the original plots presented the data more clearly (see Author response image 1).

      Author response image 1.

      (13) Please expand the Methods section to detail the Whippet and MAJIQ analyses. 

      We have expanded the methods section to include additional details of the alternate splicing analysis.

      (14) Include coordinates for the four possible OGT decoy exon combinations analyzed in the Methods section. 

      We have added the coordinates of all four decoy forms in the methods section.  

      (15) A section on SFSWAP mass spectrometry is listed in Methods but is missing from the manuscript. 

      This section has now been removed.

      Reviewer #2 (Recommendations for the authors): 

      This is an excellent contribution. The paper describes an effort to identify the factors responsible for intron retention and alternate exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly. 

      Some specific recommendations. 

      (1) The plots in Figure 3 describing SI and ES events are confusing to this reader. Perhaps the violin plot is not the best way to visualize these events. The same holds true for the histograms in the lower panel of Figure 3. Not sure what to make of these plots. 

      For Figure 3b, we include both scatter and violin plots to represent the same data in two distinct ways. For Figure 3d, we agree that these are not the simplest plots to understand, and we have spent significant time trying to come up with a better way of displaying these trends in GC content as they relate to SE and RI events. Unfortunately, we were unable to identify a clearer way to present these data. 

      (2) The model (Figure 6) is very useful but confusing. The legend and the Figure itself are somewhat inconsistent. The bottom line of the figure is apparent but I fear that the authors are trying to convey a more complete model than is apparent from this figure. Please revise. 

      We have simplified the figure from the previous submission. As mentioned above, we admit that mechanistic details remain unknown. However, we have tried to generate a model that reflects our data, adds some speculative elements to be tested in the future, but remains as simple as possible. We are not quite sure what the reviewer was referring to as “somewhat inconsistent”, but we have attempted to clarify the model in the revised Discussion and Figure legend.  

      (3) It is unclear how normalization of the RNA seq experiments was performed (eg. Figure S5 and 6).  

      The normalization differences in Fig. S5 and S6 (now Fig S8 and S9) were due to scaling differences during the use of rmats2sashimiplot software. We have now replaced Fig. S5 to reflect correctly scaled images.

      I am enthusiastic about the manuscript and feel that with some clarification it will be an important contribution. 

      Thank you for these positive comments about our study!

      Reviewer #3 (Recommendations for the authors): 

      (1) In Figure 1f, it is clear that siRNA-mediated knockdown of OGT greatly increases spliced RNA as the cells attempt to compensate by more efficient intron removal (three left lanes). However, there is no discussion of the various treatments with TG or OSMI. Might quantitation of these lanes not also show the desired effects of TG and OSMI on spliced transcript levels? 

      The strong effect of OGT knockdown masks the (comparatively modest) effects of subsequent inhibitor treatments on the reporter RNA. We have edited the results section to clarify this.

      (2) In Figure 2c, why is the size difference between spliced RNA and intron-retained RNA so different in the GFP-probed gel (right) compared with the OGT-probed gel (left)? Even recognizing that the GFP probe is directed against reporter transcripts, and the OGT probe (I think) is directed against endogenous OGT transcripts, shouldn't the difference between spliced and unspliced bands be the same, i.e., +/- the intron 4 sequence. Also, why does the GFP probe detect the unspliced transcript so poorly? 

      The fully spliced endogenous OGT mRNA is ~5.5 kb while the fully spliced reporter is only ~1.6 kb, so the difference in size (the apparent shift relative to the mRNA) is quite different. Moreover, the two panels in Fig 2c are not precisely scaled to one another, so direct comparisons cannot be made.

      The intron retained isoform does not accumulate to high levels in this reporter, a phenotype that we also observed with our GFP reporter designed to probe the regulation of the MAT2A retained intron (Scarborough et al., 2021). We are not certain about the reason for these observations, but suspect that the reporter RNA’s retained intron isoforms are less stable in the nucleus than their endogenous counterparts. Alternatively, the lack of splicing may affect 3´ processing of the transcripts so that they do not accumulate to the high levels observed for the wild-type genes. 

      (3) Please provide more information about the RNA-seq experiments. How many replicates were performed under each of the various conditions? The methods section says three replicates were performed for the UPF1/TG experiments; was this also true for the SFSWAP experiments?  

      All RNA-seq experiments were performed in biological triplicates. We have edited the methods section to clarify this.

      (4) Relatedly, the several IGV screenshots shown in Figure 3C presumably represent the triplicate RNA seq experiments. In part D, how many experiments does the data represent? Is it a compilation of three experiments? 

      Fig. 3d is derived from alternate splicing analysis performed on three biological replicates. We have added the number of replicates (n=3) on the figure to clarify this. We have also noted that the three IGV tracks represent biological replicates in the Figure legend for 3c.  

      (5) Please provide more details regarding the qRT-PCR experiments. 

      We have provided the positions of primer sets used for RT-qPCR analysis and cartoon depictions of target sites below the data wherever appropriate.

      (6) In the discussion of decoy exon function (in the Discussion section), several relevant observations are cited to support a model in which decoy exons promote assembly of splicing factors. One might also cite the finding that eCLIP profiling has found enriched binding of U2AF1 and U2AF2 at the 5' splice site region of decoy exons (reference 16). 

      Excellent point. This has now been added to the Discussion. 

      Minor corrections / clarifications: 

      (1) In the Figure 2A legend, CRISPR is misspelled. 

      Corrected.

      (2) In the discussion, the phrase "indirectly inhibits splicing of exons 4 and 5, but promoting stable unproductive assembly of the spliceosome", the word "but" should probably be "by". 

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study has preliminarily revealed the role of ACVR2A in trophoblast cell function, including its effects on migration, invasion, proliferation, and clonal formation, as well as its downstream signaling pathways.

      Strengths:

      The use of multiple experimental techniques, such as CRISPR/Cas9-mediated gene knockout, RNA-seq, and functional assays (e.g., Transwell, colony formation, and scratch assays), is commendable and demonstrates the authors' effort to elucidate the molecular mechanisms underlying ACVR2A's regulation of trophoblast function. The RNA-seq analysis and subsequent GSEA findings offer valuable insights into the pathways affected by ACVR2A knockout, particularly the Wnt and TCF7/c-JUN signaling pathways.

      Weaknesses:

      The molecular mechanisms underlying this study require further exploration through additional experiments. While the current findings provide valuable insights into the role of ACVR2A in trophoblast cell function and its involvement in the regulation of migration, invasion, and proliferation, further validation in both in vitro and in vivo models is needed. Additionally, more experiments are required to establish the functional relevance of the TCF7/c-JUN pathway and its clinical significance, particularly in relation to pre-eclampsia. Additional techniques, such as animal models and more advanced clinical sample analyses, would help strengthen the conclusions and provide a more comprehensive understanding of the molecular pathways involved.

      Reviewer #2 (Public review):

      Summary:

      ACVR2A is one of a handful of genes for which significant correlations between associated SNPs and the incidences of preeclampsia have been found in multiple populations. It is one of the TGFB family receptors, and multiple ligands of ACVR2A, as well as its coreceptors and related inhibitors, have been implicated in placental development, trophoblast invasion, and embryo implantation. This useful study builds on this knowledge by showing that ACVR2A knockout in trophoblast-related cell lines reduces trophoblast invasion, which could tie together many of these observations. Support for this finding is incomplete, as reduced proliferation may be influencing the invasion results. The implication of cross-talk between the WNT and ACVR2A/SMAD2 pathways is an important contribution to the understanding of the regulation of trophoblast function.

      Strengths:

      (1) ACVR2A is one of very few genes implicated in preeclampsia in multiple human populations, yet its role in pathogenesis is not very well studied and this study begins to address that hole in our knowledge.

      (2) ACVR2A is also indirectly implicated in trophoblast invasion and trophoblast development via its connections to many ligands, inhibitors, and coreceptors, suggesting its potential importance.

      (3) The authors have used multiple cell lines to verify their most important observations.

      Weaknesses:

      (1) There are a number of claims made in the introduction without attribution. For example, there are no citations for the claims that family history is a significant risk factor for PE, that inadequate trophoblast invasion of spiral arteries is a key factor, and that immune responses and renin-angiotensin activity are involved.

      Thank you for pointing out the lack of citations in some parts of the introduction. We have revised the manuscript to include appropriate references for the claims regarding family history as a risk factor for PE, the role of inadequate trophoblast invasion in spiral arteries, and the involvement of immune responses and the renin-angiotensin system. The revised text now includes citations to well-established studies in the field (Salonen Ros et al., 2000; Chappell LC et al., 2021; Brosens et al., 2002; Knofler et al., 2019; Redman CWG et al., 1999; LaMarca B et al., 2008). We believe these additions improve the scientific rigor of the manuscript.

      (2) The introduction states "As a receptor for activin A, ACVR2A..." It's important to acknowledge that ACVR2A is also the receptor for other TGFB family members, with varying affinities and coreceptors. Several TGFB family members are known to regulate trophoblast differentiation and invasion. For example, BMP2 likely stimulates trophoblast invasion at least in part via ACVR2A (PMID 29846546).

      Thank you for highlighting the broader role of ACVR2A as a receptor for multiple members of the TGF-β superfamily. We have revised the introduction to acknowledge that ACVR2A is not only the receptor for activin A but also interacts with other ligands, such as BMP2, which likely stimulates trophoblast invasion via ACVR2A (PMID: 29846546). This addition provides a more comprehensive view of ACVR2A's function in trophoblast biology. While the focus of our current study is on activin A, we agree that ACVR2A's role in mediating the effects of other TGF-β family members is an important topic for future research.

      (3) An alternative hypothesis for the potential role of ACVR2A in preeclampsia is its functions in the endometrium. In the mouse, ACVR2A knockout in the uterus (and other progesterone receptor-expressing cells) leads to embryo implantation failure.

      Thank you for bringing up the potential role of ACVR2A in the endometrium as an alternative hypothesis. We have revised the discussion to acknowledge this possibility and cited relevant studies showing that uterine-specific knockout of ACVR2A in mice leads to embryo implantation failure (Monsivais et al., 2021). This suggests that ACVR2A may play a critical role in uterine receptivity and embryo implantation, which could influence placental development and preeclampsia pathogenesis. While our current study focuses on trophoblast-related functions of ACVR2A, we agree that investigating its role in the uterine environment is an important direction for future research.

      (4) In the description of the patient population for placental sample collections, preeclampsia is defined only by hypertension, and this is described as being in accordance with ACOG guidelines. ACOG requires a finding of hypertension in combination with either proteinuria or one of the following: thrombocytopenia, elevated creatinine, elevated liver enzymes, pulmonary edema, and new onset unresponsive headache.

      We appreciate the reviewer’s detailed observation regarding the definition of preeclampsia.

      We have reviewed and clarified our description of the diagnostic criteria based on the American College of Obstetricians and Gynecologists (ACOG) guidelines. Specifically, we have revised the definition in the Materials and Methods section under "Collection of Placenta and Decidua Specimens," as follows: In accordance with the guidelines from the American College of Obstetricians and Gynecologists (ACOG, 2023), preeclampsia (PE) is diagnosed as hypertension (systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg on at least two occasions) in combination with one or more of the following: proteinuria (≥300 mg/24-hour urine collection or protein/creatinine ratio ≥0.3), thrombocytopenia, elevated serum creatinine, elevated liver enzymes, pulmonary edema, or new-onset headache unresponsive to treatment.
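
      The logical structure of the revised definition (hypertension on at least two occasions, combined with at least one additional finding) can be sketched as a simple predicate. This is only an illustration of the rule as quoted above: the function names are hypothetical, and findings other than proteinuria are passed in as already-assessed labels because the quoted text gives no numeric thresholds for them.

```python
def is_hypertensive(readings, min_occasions=2):
    """True if systolic >= 140 or diastolic >= 90 mmHg on enough occasions."""
    high = [1 for systolic, diastolic in readings
            if systolic >= 140 or diastolic >= 90]
    return len(high) >= min_occasions


def has_proteinuria(protein_mg_24h=None, pc_ratio=None):
    """True if either quoted proteinuria threshold is met."""
    by_collection = protein_mg_24h is not None and protein_mg_24h >= 300
    by_ratio = pc_ratio is not None and pc_ratio >= 0.3
    return by_collection or by_ratio


def meets_pe_definition(readings, protein_mg_24h=None, pc_ratio=None,
                        other_findings=()):
    """Hypertension AND (proteinuria OR any other listed finding).

    other_findings: labels of findings judged present by a clinician, e.g.
    {"thrombocytopenia"}; thresholds for these are not given in the text.
    """
    additional = has_proteinuria(protein_mg_24h, pc_ratio) or bool(other_findings)
    return is_hypertensive(readings) and additional


# Hypothetical readings as (systolic, diastolic) pairs in mmHg.
print(meets_pe_definition([(150, 85), (142, 88)], protein_mg_24h=350))
print(meets_pe_definition([(150, 95), (145, 92)]))
```

      Under the definition originally questioned by the reviewer (hypertension alone), the second call would have counted as preeclampsia; under the revised definition it does not.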

      (5) I believe that Figures 1a and 1b are data from a previously published RNAseq dataset, though it is not entirely clear in the text. The methods section does not include a description of the analysis of these data undertaken here. It would be helpful to include at least a brief description of the study these data are taken from - how many samples, how were the PE/control groups defined, gestational age range, where is it from, etc. For the heatmap presented in B, what is the significance of the other genes/ why are they being shown? If the purpose of these two panels is to show differential expression specifically of ACVR2A in this dataset, that could be shown more directly.

      Clarification of RNAseq dataset: The Methods section has been revised to specify the dataset source (GEO accession number: GSE114691), which includes 20 PE and 21 control placental samples with gestational ages ranging from 34 to 38 weeks. PE and control groups were defined using clinical criteria such as hypertension and proteinuria, and these details have also been added to the Results section.

      RNAseq analysis description: We have included details of the differential gene expression analysis in the Methods section. Specifically, the DESeq2 R package was used, with thresholds of FDR < 0.05 and |log2(fold change)| ≥ 1. The selection of WNT pathway-related genes in Figure 1B is based on these analyses.

      Significance of the heatmap genes: The genes displayed in Figure 1B were selected based on their significant differential expression and enrichment in pathways relevant to PE pathogenesis, such as the WNT signaling pathway. We have clarified this in the Results section and updated the figure legend to explain their biological relevance.

      Purpose of Figures 1A and 1B: Figure 1A emphasizes the downregulation of ACVR2A in PE placentas, while Figure 1B complements this by presenting differentially expressed genes associated with the WNT pathway. These figures collectively highlight the role of ACVR2A in PE and its connection to broader molecular pathways. Text descriptions have been updated to improve clarity and focus.
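
      The thresholds quoted above (FDR < 0.05 and |log2(fold change)| ≥ 1) amount to a two-condition filter on each gene. A minimal sketch of that filter is shown below; it is written in Python for illustration rather than in R, it does not reproduce DESeq2 itself, and the gene names and values are placeholders, not data from GSE114691.

```python
def classify(log2_fc: float, fdr: float,
             fdr_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> str:
    """Label one gene as 'up', 'down', or 'ns' (not significant).

    A gene is called only if BOTH conditions hold:
    adjusted p-value below the FDR cutoff and absolute log2 fold change
    at or above the effect-size cutoff.
    """
    if fdr < fdr_cutoff and abs(log2_fc) >= lfc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Placeholder rows: (gene, log2 fold change PE vs control, adjusted p-value).
results = [
    ("geneA", -1.8, 0.001),   # passes both cutoffs, lower in PE
    ("geneB", 1.2, 0.03),     # passes both cutoffs, higher in PE
    ("geneC", -0.6, 0.0004),  # significant but effect size too small
    ("geneD", 2.5, 0.20),     # large effect but not significant
    ("geneE", -1.0, 0.049),   # exactly at the fold-change boundary
]

labels = {gene: classify(lfc, fdr) for gene, lfc, fdr in results}
n_up = sum(1 for label in labels.values() if label == "up")
n_down = sum(1 for label in labels.values() if label == "down")
print(labels, n_up, n_down)
```

      Because the fold-change condition is stated as "≥ 1", a gene with a log2 fold change of exactly -1.0 is retained, whereas the FDR condition is a strict inequality.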

      (6) More information is needed in the methods section to understand how the immunohistochemistry was quantified. "Quantitation was performed" is all that is provided. Was staining quantified across the whole image or only in anchoring villous areas? How were HRP & hematoxylin signals distinguished in ImageJ? How was the overall level of HRP/DAB development kept constant between the NC and PE groups?

      Thank you for pointing out the need for more details regarding the quantification of immunohistochemistry (IHC). We have now clarified and expanded the description of the IHC quantification process in the Methods section as follows:

      Quantification Across the Entire Section: IHC staining was assessed across the entire tissue section to account for global expression patterns. For quantitative analysis, representative regions from the anchoring villous areas, where ACVR2A expression is most prominent, were selected for comparison between NC and PE groups. This ensured that the analysis focused on biologically relevant regions.

      ImageJ Analysis: Images of stained sections were captured under identical magnifications and lighting conditions. Hematoxylin (blue, nuclear staining) and DAB/HRP (brown, protein-specific signal) were distinguished using ImageJ's color deconvolution plugin. The DAB/HRP signal was isolated and quantified based on the integrated optical density (IOD) within the selected regions.

      Consistency in HRP/DAB Development: To maintain consistency between NC and PE groups, all tissue samples were processed under identical experimental conditions, including the same antibody dilution, incubation times, and DAB/HRP development durations. Negative controls (without primary antibody) were included to monitor background staining, and the DAB reaction was stopped simultaneously across all samples to avoid overdevelopment.

      Statistical Analysis: The quantified DAB signal intensity was normalized to the area of the selected regions, and comparisons between NC and PE groups were performed using statistical tests (e.g., Student’s t-test). Results are reported as mean ± SD.

      We hope this additional detail addresses your concerns.
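
      The normalization and comparison steps described above (IOD divided by region area, then a Student's t-test between groups) can be sketched as follows. This is a minimal illustration under stated assumptions: the measurements are made-up numbers, the helper names are hypothetical, and the color deconvolution step performed in ImageJ is taken as already done. Only the t statistic and degrees of freedom are computed; a p-value would additionally require the t distribution.

```python
from math import sqrt
from statistics import mean, stdev, variance


def normalized_iod(iod: float, area: float) -> float:
    """DAB integrated optical density per unit area of the selected region."""
    if area <= 0:
        raise ValueError("region area must be positive")
    return iod / area


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances assumed) and its df."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Made-up (IOD, area) measurements per section, for illustration only.
nc_raw = [(400.0, 100.0), (500.0, 100.0), (600.0, 100.0)]
pe_raw = [(100.0, 100.0), (400.0, 200.0), (150.0, 50.0)]

nc = [normalized_iod(iod, area) for iod, area in nc_raw]
pe = [normalized_iod(iod, area) for iod, area in pe_raw]

t_stat, df = student_t(nc, pe)
print(f"NC: {mean(nc):.2f} ± {stdev(nc):.2f}")
print(f"PE: {mean(pe):.2f} ± {stdev(pe):.2f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

      Normalizing by area matters here because the selected regions differ in size between sections; comparing raw IOD values would confound staining intensity with region size.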

      (7) In Figure 1E it is not immediately obvious to many readers where the EVT are. It is probably worth circling or putting an arrow to the little region of ACVR2A+ EVT that is shown in the higher magnification image in Figure 1E. These are actually easier to see in the pictures provided in the supplement Figure 1. Of note, the STB is also staining positive. This is worth pointing out in the results text.

      Thank you for your suggestion regarding Figure 1E. To make the location of the ACVR2A+ extravillous trophoblasts (EVTs) more apparent, we have updated Figure 1E by adding arrows to indicate the regions of EVTs in the higher magnification image. Additionally, we have included annotations in the supplemental Figure S1 to further aid visualization. We appreciate your observation that syncytiotrophoblasts (STBs) also show positive staining for ACVR2A. We have revised the Results section to explicitly mention this finding and its potential significance.

      (8) It is not possible to judge whether the IF images in 1F actually depict anchoring villi. The DAPI is really faint, and it's high magnification, so there isn't a lot of context. Would it be possible to include a lower magnification image that shows where these cells are located within a placental section? It is also somewhat surprising that this receptor is expressed in the cytoplasm rather than at the cell surface. How do the authors explain this?

      Thank you for your suggestion to provide more context for the immunofluorescence (IF) images in Figure 1F. To address this, we have included lower magnification images in Supplementary Figure S2, showing the overall structure of the placental section and the location of the anchoring villi. These images help to contextualize the regions analyzed in Figure 1F, which were selected to clearly illustrate ACVR2A expression in extravillous trophoblasts (EVTs). In Figure 1F, we have focused on higher magnification images for better visualization of ACVR2A staining patterns in EVTs. Regarding the subcellular localization of ACVR2A, the receptor is predominantly expressed on the cell surface, as shown in our images. However, some intracellular staining is also observed, which may reflect receptor trafficking or recycling processes, consistent with the behavior of other activin receptors under physiological or pathological conditions. We have clarified these points in the Results and Discussion sections.

      (9) The results text makes it sound like the data in Figure 2A are from NCBI & Protein atlas, but the legend says it is qPCR from this lab. The methods do not detail how these various cell lines were grown; only HTR-SVNeo cell culture is described. Similarly, JAR cells are used for several experiments and their culture is not described.

      Thank you for pointing out the need for clarification regarding Figure 2A and cell culture methods. The data in Figure 2A were generated using RT-qPCR conducted in our laboratory, not solely based on data from NCBI or the Human Protein Atlas. We have revised the Results section to reflect this more accurately. Regarding the culture conditions, we acknowledge that the methods for other cell lines were not explicitly detailed. For this study, all cell lines, including JAR and other cancer cell lines, were cultured following standard protocols provided by the suppliers. Specifically, JAR cells and other cell lines were purchased from Wuhan Punosei Life Technology and were maintained in RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin under standard conditions (37°C, 5% CO₂). This information has been added to the Methods section for clarity.

      (10) Under RT-qPCR methods, the phrase "cDNA reverse transcription cell RNA was isolated..." does not make any sense.

      Thank you for pointing out the unclear phrasing in the RT-qPCR methods section. We agree that the original description was not precise. To address this, we have revised the relevant section to improve clarity and accuracy. Specifically, the methods now explicitly describe the two key steps: RNA isolation and cDNA synthesis. The revised text reads: Total RNA was isolated from cells using a Total RNA Extraction Kit (TIANGEN, China) following the manufacturer’s instructions. The extracted RNA was reverse-transcribed into complementary DNA (cDNA) using a cDNA Synthesis Kit (Takara, Japan) according to the protocol provided by the manufacturer.

      (11) The paragraph beginning "Consequently, a potential association..." is quite confusing. It mentions analyzing ACVR2A expression in placentas, but then doesn't point to any results of this kind and repeats describing the results in Figure 2a, from various cell lines.

      Thank you for your comment regarding the paragraph beginning with "Consequently, a potential association...". We understand that the current wording may create confusion. The primary aim of this section is to compare ACVR2A expression levels across various cell lines, including trophoblast-derived and non-trophoblast cell lines, to highlight the relevance of ACVR2A in trophoblast function, particularly in invasion and migration. To address your concerns, we have revised the paragraph for clarity and logical flow. The updated text explicitly focuses on the comparison of ACVR2A expression across cell lines (Figure 2A) and how this supports the hypothesis that ACVR2A plays a key role in trophoblast invasion and migration. Additionally, the discussion of placental samples has been separated to avoid confusion with cell line results. We hope this revision resolves the issue.

      (12) The authors should acknowledge that the effect of the ACVR2A knockout on proliferation makes it difficult to draw any conclusions from the trophoblast invasion assays. That is, there might be fewer migrating or invading cells in the knockout lines because there are fewer cells, not because the cells that are there are less invasive. Since this is a central conclusion of the study, it is a major drawback.

      Thank you for highlighting this important point. We agree that the reduced proliferation observed in ACVR2A knockout cells could influence the results of the invasion assays, as fewer cells may inherently lead to reduced invasion. To minimize this effect, we conducted the invasion and migration assays under low-serum conditions (1–2% serum) to limit cell proliferation during the experimental timeframe. This approach was based on optimization trials and existing literature, as serum-free conditions were found to negatively impact cell viability and experimental reproducibility. While these efforts helped to mitigate the impact of proliferation on the results, we acknowledge this as a limitation of our study and have added this discussion to the manuscript. Future studies could incorporate approaches such as normalizing cell numbers or using additional proliferation-independent methods to confirm the findings. We hope this clarification and the steps taken address your concerns.

      (13) The legend and the methods section do not agree on how many fields were selected for counting in the transwell invasion assays in Figure 3C. The methods section and the graph do not match the number of replicate experiments in Figure 3D (the number of replicate experiments isn't described for 3C).

      Thank you for pointing out the inconsistencies regarding the number of fields counted and the number of replicates in the Transwell invasion assays (Figure 3C) and colony formation assays (Figure 3D). We apologize for the lack of clarity in the Methods section and figure legend. To address this, we have revised both the figure legends and the Methods section for consistency and added detailed descriptions. For Figure 3C, cell invasion was quantified by randomly selecting 5 fields of view per sample under 300× magnification. Images shown in the figure were taken at lower magnification to provide a better visual comparison between experimental and control groups. For Figure 3D, each experiment was independently repeated at least 10 times to ensure robust and reproducible results. These clarifications have been incorporated into the revised manuscript. We appreciate your feedback and believe this revision improves the clarity and transparency of our methods.

      (14) Discussion says "Transcriptome sequencing analysis revealed low ACVR2A expression in placental samples from PE patients, consistent with GWAS results across diverse populations." The authors should explain this briefly. Why would SNPs in ACVR2A necessarily affect levels of the transcript?

      Thank you for raising this important point. We acknowledge that our study did not directly investigate how SNPs in the ACVR2A gene affect transcript levels. However, prior studies have suggested that SNPs can influence gene expression through various mechanisms. For example, SNPs in regulatory regions (e.g., promoters, enhancers, or untranslated regions) may affect transcription factor binding, RNA stability, or splicing efficiency, ultimately altering transcript levels. While we did not directly assess the functional consequences of ACVR2A SNPs in this study, the observed downregulation of ACVR2A in PE placentas aligns with the potential regulatory impact of SNPs previously identified in GWAS studies. To address this, we have revised the Discussion section to clarify the relationship between SNPs and transcript levels and acknowledge this limitation.  

      (15) "The expression levels of ACVR2A mRNA were comparable to those of tumor cells such as A549. This discovery suggested a potential pivotal role of ACVR2A in the biological functions of trophoblast cells, especially in the nurturing layer." Alternatively, ACVR2A expression resembles that of tumors because the cell lines used here are tumor cells (JAR) or immortalized cells (HTR8). These lines are widely used to study trophoblast properties, but the discussion should at least acknowledge the possibility that the behavior of these cells does not always resemble normal trophoblasts.

      Thank you for pointing out this important limitation. We agree that the JAR and HTR8/SVneo cell lines, being tumor-derived or immortalized, may not fully replicate the behavior of normal trophoblast cells. While these cell lines are widely used as models for studying trophoblast properties due to their ease of culture and invasive behavior, their gene expression and signaling pathways could partially reflect their tumorigenic or immortalized origins. We have revised the Discussion section to acknowledge this limitation and clarify the interpretation of ACVR2A expression levels in these cells.

      (16) The authors should discuss some of what is known about the relationship between the TCF7/c-JUN pathway and the major signaling pathway activated by ACVR2A, Smad 2/3/4. The Wnt and TGFB family cross-talk is quite complex and it has been studied in other systems.

      Thank you for highlighting the relationship between the TCF7/c-JUN pathway and Smad2/3/4 signaling. In our study, we chose to focus on Smad1/5 due to its strong association with ACVR2A in placental development, as demonstrated in a recent study (DOI: 10.1038/s41467-021-23571-5). This study showed that the BMP signaling pathway, mediated through ACVR2A-Smad1/5, is essential for endometrial receptivity and embryo implantation. While Smad2/3/4 are well-established mediators of TGF-β signaling, Smad1/5 activation is more directly linked to ACVR2A in the context of reproductive biology.

      In PE placentas, we observed a significant downregulation of Smad1/5 expression, which supports the hypothesis that ACVR2A-mediated Smad signaling is disrupted in this condition. Although we did not directly assess Smad2/3/4 in this study, prior research has shown that Smad2/3 can interact with TCF/LEF transcription factors to regulate Wnt-related target genes, suggesting potential cross-talk between these pathways. We have now clarified this rationale and included a discussion of these interactions in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Several points need to be addressed to improve the clarity and robustness of the presented findings:

      (1) From a clinical perspective, several concerns arise regarding the interpretation of these findings. First, the small sample size of 20 patients may not be representative of the broader population, limiting the generalizability of the results. Additionally, although no significant differences in age and pre-pregnancy BMI were observed between the PE and normal control groups, other clinical variables, such as hypertension or gestational diabetes, may also influence ACVR2A expression and contribute to PE development. Furthermore, while the study suggests a correlation between reduced ACVR2A expression and PE, it remains unclear whether this association holds true across different subtypes of PE or whether there are other underlying clinical factors that could account for these changes in gene expression. These factors need to be considered in future studies to better understand the clinical relevance of ACVR2A in PE.

      Thank you for raising these insightful concerns about the clinical interpretation of our findings. We agree that the small sample size of 20 patients may limit the generalizability of our results. To address this, we are actively expanding our cohort by collecting additional clinical samples from PE patients and normotensive controls. This effort aims to strengthen the robustness of our findings and provide stronger evidence for the role of ACVR2A in PE. We would also like to clarify that, during the initial sample collection, we specifically included only PE patients without comorbidities such as gestational diabetes, chronic hypertension, or other pregnancy-related complications. This strict selection criterion was implemented to minimize the potential influence of confounding clinical variables and ensure that our findings specifically reflect the association between ACVR2A expression and PE. While our study provides important initial insights, we recognize the need for larger-scale studies to validate these findings. The ongoing collection of clinical samples will allow us to address this limitation and enhance the translational relevance of our research. We have revised the manuscript to reflect these points and highlight our plans to strengthen the study by increasing the sample size.

      (2) The section "Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9" in the results contains some issues with expression details. The results section should be more structured, with data presented in a more detailed and clear manner, ensuring that there is a clear connection between each experimental step and its corresponding result. For example, the sentence "Following multiple rounds of monoclonal culture, genotype identification, RT-qPCR and Western blotting (WB) analysis for screening, specific double-knockout monoclonal cell lines were distinctly chosen" contains redundant phrasing and unnecessary details, which affect the flow of the text.

      Thank you for your constructive feedback on the “Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9” section. We agree that this section can be better structured to present the data in a more detailed and coherent manner. To address this, we have reorganized the results into distinct steps, ensuring a clear connection between each experimental step and its corresponding result. Redundant phrasing has been removed to improve the flow and readability of the text. The revised section emphasizes the purpose of each step, the screening process, and the specific results obtained.

      (3) The figure legends and panel labels in Figure 3 should be revised to ensure clarity and consistency. The figure legend should specify the exact panels (e.g., Figure 3A, 3B, 3C, etc.) and clearly describe the experimental conditions and results shown in each part.

      Thank you for pointing out the need for improved clarity and consistency in the figure legends and panel labels for Figure 3. We have revised the figure legend to specify each panel (e.g., Figure 3A, 3B, 3C, etc.) and included detailed descriptions of the experimental conditions and results displayed in each part. These updates aim to ensure better understanding and alignment between the figure legend and the panels.

      (4) Lack of In Vivo Validation of ACVR2A Knockout: The study does not include in vivo experiments to validate the effects of ACVR2A knockout. It would be important to investigate whether similar regulatory effects of ACVR2A on trophoblast cell migration and invasion can be observed in animal models or in larger clinical studies. The lack of in vivo data raises questions about the translational relevance of the findings.

      Thank you for highlighting the importance of in vivo validation to assess the translational relevance of our findings. While we acknowledge that in vivo experiments could provide additional insights into the role of ACVR2A in trophoblast migration and invasion, this study was primarily designed as an in vitro investigation to explore the molecular mechanisms underlying ACVR2A function in trophoblast cells. The choice of an in vitro model allowed us to perform precise and controlled mechanistic analyses, which are critical for establishing a foundation for future research. We agree that in vivo studies using animal models or larger clinical cohorts are important next steps to validate the regulatory effects of ACVR2A on trophoblast function and its contribution to PE pathogenesis. These directions will be pursued in future research to further establish the translational potential of our findings. We have included this perspective in the revised Discussion section.

      (5) TCF7/c-JUN Pathway in Clinical Samples: In the study of the TCF7/c-JUN pathway, the authors mention assessing protein expression in clinical samples through immunohistochemistry (IHC). However, the manuscript does not provide a clear explanation of how the findings from laboratory cell models (such as HTR8/SVneo and JAR) relate to the clinical samples. Specifically, while ACVR2A knockout is shown to affect these proteins at the cellular level, it is unclear whether this effect is observed in clinical samples. Therefore, further validation of the TCF7/c-JUN pathway in the cell models and exploration of its relationship with protein expression in clinical samples is necessary. Additional experiments, such as immunofluorescence staining or mass spectrometry, could further confirm the role of the TCF7/c-JUN pathway in cells and provide a more direct comparison with clinical data.

      Thank you for highlighting the need to connect findings from cell models to clinical samples, particularly with respect to the TCF7/c-JUN pathway. In response to your comment, we conducted additional experiments using Western blot analysis to evaluate the expression of ACVR2A, SMAD1/5, SMAD4, pSMAD1/5/9, and TCF7L1/TCF7L2 in PE placental tissues compared to normotensive controls (Figure 7A). The results demonstrated significantly reduced expression of these proteins in PE placentas, providing evidence that disruptions in the ACVR2A-SMAD and TCF7/c-JUN signaling pathways observed in vitro are also present in clinical samples.

      These findings strengthen the translational relevance of our study by directly linking the molecular mechanisms identified in cell models to clinical observations. We have updated the Results and Discussion sections to incorporate these new data, and we believe this addition addresses your concern about the relationship between in vitro and clinical findings.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Lu et al. proposed here a direct role of LPS in inducing hepatic fat accumulation and that the metabolism of LPS therefore can mitigate fatty liver injury. With Acyloxyacyl hydrolase whole-body KO mice, they demonstrated that Acyloxyacyl hydrolase deletion resulted in higher hepatic fat accumulation over 8 months of high glucose/high fructose diet. Previous literature has found that hepatocyte TLR4 (which is a main receptor for binding LPS) KO reduced fatty liver in the MAFLD model, and this paper complements this by showing that degradation/metabolism of LPS can also reduce fatty liver. This result proposed a very interesting mechanism and the translational implications of utilizing Acyloxyacyl hydrolase to decrease LPS exposure are intriguing.

      The strengths of the present study include that they raised a very simplistic mechanism with LPS that is of interest in many diseases. The phenotype shown in the study is strong. The mechanism proposed by the findings is generally well supported.

      There are also several shortcomings in the findings of this study. As AOAH is a whole-body KO, the source production of AOAH in MAFLD is unclear. Although the authors used published single-cell RNA-seq data and flow-isolated liver cells, physiologically LPS degradation could occur in the blood or the liver. The authors linked LPS to hepatocyte fatty acid oxidation via SREBP1. The mechanism is not explored in great depth. Is this signaling TLR4? In this model, LPS could activate macrophages and mediate the worsening of hepatocyte fatty liver injury via the paracrine effect instead of directly signaling to hepatocytes, thus it is not clear that this is a strictly hepatocyte LPS effect. It would also be very interesting to see if administration of the AOAH enzyme orally could mitigate MAFLD injury. Overall, this work will add to the current understanding of the gut-liver axis and development of MAFLD and will be of interest to many readers.

      We thank the reviewers for their important questions and comments.

      In previous studies we found that AOAH is expressed in Kupffer cells and dendritic cells (Shao et al., 2007). Single-cell RNAseq analysis of mouse livers by others has found AOAH in Kupffer cells, monocytes, NK cells and ILC1 cells (Remmerie et al., 2020). We also analyzed human liver single-cell RNAseq data and found that AOAH is expressed in monocytes, macrophages, resident and circulating NK cells, and some T cells (Ramachandran et al., 2019) (Please see new Figure 3E). Using clodronate-liposomes to deplete Kupffer cells we found that hepatic AOAH mRNA diminished and nSREBP1 increased (Please see new Figure 5D). These results suggest that Kupffer cells are the major source of AOAH in the liver and that LPS needs to be inactivated in the liver to prevent hepatocyte lipid accumulation.

      Using primary hepatocyte culture, we found that LPS can stimulate hepatocytes directly to induce mTOR activation and SREBP1 activation (new Figure 6E). Adding purified Kupffer cells to the hepatocyte culture did not further increase SREBP1 activation. These results suggest that LPS may directly stimulate hepatocytes to accumulate fat, at least in vitro.

      Both TLR4 and caspase 11 are reported to play important roles in MASLD development (Sharifnia et al., 2015; Zhu et al., 2021). We have crossed Aoah<sup>-/-</sup> mice with TLR4<sup>-/-</sup> mice and found that Aoah<sup>-/-</sup>TLR4<sup>-/-</sup> and Aoah<sup>-/-</sup> mice had similarly severe MASLD. This is probably because TLR4 is required for gut homeostasis (Rakoff-Nahoum et al., 2004); in TLR4 whole-body KO mice, compromised gut homeostasis may result in more severe MASLD. By specifically deleting TLR4 in hepatocytes, Yu et al. found that NASH-induced fibrosis was mitigated (Yu et al., 2021). In future studies we would therefore need to specifically delete TLR4 in hepatocytes to test whether excessive gut-derived LPS in Aoah<sup>-/-</sup> mice stimulates hepatic TLR4 to induce more severe MASLD. We would also test whether caspase 11 is required for hepatic fat accumulation in Aoah<sup>-/-</sup> mice.

      It is intriguing to test whether providing exogenous AOAH may mitigate MASLD. We will use an AAV expressing AOAH to test this idea.

      Reviewer #2 (Public review):

      The authors of this article investigated the impact of the host enzyme AOAH on the progression of MASLD in mice. To achieve this, they utilized whole-body Aoah<sup>-/-</sup> mice. The authors demonstrated that AOAH reduced LPS-induced lipid accumulation in the liver, probably by decreasing the expression and activation of SREBP1. In addition, AOAH reduced hepatic inflammation and minimized tissue damage.

      However, this paper is descriptive without a clear mechanistic study. Another major limitation is the use of whole-body KO mice so the cellular source of the enzyme remains undefined. Moreover, since LPS-mediated SREBP1 regulation or LPS-mediated MASLD progression is already documented, the role of AOAH in SREBP1-dependent lipid accumulation and MASLD progression is largely expected.

      Specific comments:

      (1) The overall human relevance of the current study remains unclear.

      It is a good point. We have studied human relevance and show the results in Figure 3E. AOAH expression increased in the hepatic macrophages and monocytes of MASLD patients.

      (2) Is AOAH secreted from macrophages or other immune cells? Are there any other functions of AOAH within the cells?

      AOAH can be secreted from kidney proximal tubule cells and the released AOAH can be taken up by cells that do not express AOAH (Feulner et al., 2004). AOAH can also deacylate oxidized phospholipids, DAMP molecules (Zou et al., 2021).

      (3) Due to using whole-body KO mice, the role of AOAH in specific cell types was unclear in this study, which is one of the major limitations of this study. The authors should at least conduct in vitro experiments using a co-culture system of hepatocytes and Kupffer cells (or other immune cells) isolated from WT or Aoah<sup>-/-</sup> mice.

      Thanks for the suggestion.

      Using clodronate-liposomes, we depleted Kupffer cells and found that hepatic AOAH mRNA diminished and nSREBP1 increased in the liver (Please see new Figure 5D). These results confirm that Kupffer cells are the major source of AOAH in the liver and that LPS needs to be inactivated in the liver to prevent hepatocyte lipid accumulation. Using primary hepatocyte culture, we found that LPS can stimulate hepatocytes directly to induce mTOR activation and SREBP1 activation (new Figure 6E). These results suggest that LPS may directly stimulate hepatocytes to accumulate fat, at least in vitro.

      (4) It has been well-known that intestinal tight junction permeability is increased by LPS or inflammatory cytokines. However, in Figure 3E, intestinal permeability is comparable between the groups in both diet groups. The authors should discuss more about this result. In addition, intestinal junctional protein should be determined by Western blot and IHC (or IF) to further confirm this finding.

      We have stained ZO-1 (Please see Author response image 1; ZO-1, green fluorescence) in Aoah<sup>+/+</sup> and Aoah<sup>-/-</sup> mouse colonic sections. We did not see a large difference between the two strains of mice.

      Author response image 1.

      Feeding a high fat diet in our mouse facility for 28 weeks led to increased gut permeability, but there was no difference between Aoah<sup>+/+</sup> and Aoah<sup>-/-</sup> mice. Thus, the more severe MASLD in Aoah<sup>-/-</sup> mice is mainly caused by elevated bioactive LPS instead of increased LPS translocation from the intestine to the liver.

      (5) In Figure 6, the LPS i.g. Aoah<sup>-/-</sup> group is missing. This group should be included to better interpret the results.

      Please see new Figure 6. When we orally gavaged Aoah<sup>-/-</sup> mice with LPS, fecal LPS levels did not increase further. Their liver SREBP1 did not increase further while the SREBP1 target gene expression increased when compared with Aoah<sup>-/-</sup> mice i.g. PBS.

      (6) The term NAFLD has been suggested to be changed to MASLD as the novel nomenclature according to the guidelines of AASLD and EASL.

      Thanks for the suggestion. We have changed NAFLD to MASLD.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Consider using MAFLD rather than NAFLD.

      Thanks for the suggestion. We have changed NAFLD to MASLD.

      References

      Feulner, J.A., M. Lu, J.M. Shelton, M. Zhang, J.A. Richardson, and R.S. Munford. 2004. Identification of acyloxyacyl hydrolase, a lipopolysaccharide-detoxifying enzyme, in the murine urinary tract. Infection and immunity 72:3171-3178.

      Zou, B., M. Goodwin, D. Saleem, W. Jiang, J. Tang, Y. Chu, R.S. Munford, and M. Lu. 2021. A highly conserved host lipase deacylates oxidized phospholipids and ameliorates acute lung injury in mice. eLife 10:

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      The authors have constructively responded to previous referee comments and I believe that the manuscript is a useful addition to the literature. I particularly appreciate the quantitative approach to social behavior, but have two cautionary comments.

      (1) Conceptually it is important to further justify why this particular maximum entropy model is appropriate. Maximum entropy models have been applied across a dizzying array of biological systems, including genes, neurons, the immune system, as well as animal behavior, so it would seem quite beneficial to explain the particular benefits here, for mouse social behavior as coarse-grained through the Eco-HAB chamber occupancy. This would be an excellent chance to amplify what the models can offer for biological understanding, particularly in the realm of social behavior.

      We thank the reviewer for this comment. Maximum entropy models, along with other statistical inference methods that learn interaction patterns from simultaneously-measured degrees of freedom, help distinguish various types of interactions, e.g. direct vs. indirect interactions among animals, individual preference to food vs. social interaction with pairs. As research on social behavior expands from focusing on pairs of animals to studying groups in (semi-)naturalistic environments, maximum entropy models serve as a crucial link between high-throughput data and the need to identify and distinguish interaction rules. Specifically, among all possible maximum entropy models, the pairwise maximum entropy model is one of the simplest that can describe interactions among individuals, which serves as an excellent starting point to understand collective and social behavior in animals.

      Although the Eco-HAB setup currently records spatially coarse-grained data, it still provides more spatial information compared to the traditional three-chamber tests used to assess sociability for rodents. By showing that the maximum entropy model can effectively analyze Eco-HAB data, we hope to highlight its potential in research of social behavior in animals.

      To amplify what the models can offer for biological understanding, particularly in the realm of social behavior, we have updated the Introduction to give a more logical structure to the argument for using maximum entropy models to identify interactions among mice. Additionally, we updated the first paragraph of the Discussion to make it explicit that it is the use of maximum entropy models that identifies interaction patterns from the high-throughput data. Finally, we have also added to the Discussion (line 422-425) arguments supporting the specific use of pairwise maximum entropy models to study social behaviors.
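For readers unfamiliar with the approach, a generic pairwise maximum entropy model for chamber occupancy can be written as below. This is a sketch of one common form, assuming each mouse $i$ occupies one box $\sigma_i$ at a time and that the model is constrained by single-mouse box occupancies and pairwise co-localization frequencies; the symbols $h_{ir}$, $J_{ij}$, and $Z$ are notation chosen here and may differ from the parameterization used in the manuscript.

```latex
% sigma_i : box occupied by mouse i (one of the Eco-HAB chambers)
% h_{ir}  : preference of mouse i for box r (individual term)
% J_{ij}  : effective interaction between mice i and j when they share a box
% Z       : normalization constant (partition function)
P(\sigma_1, \dots, \sigma_N) = \frac{1}{Z}
  \exp\!\left( \sum_{i=1}^{N} \sum_{r} h_{ir}\, \delta_{\sigma_i, r}
             + \sum_{i<j} J_{ij}\, \delta_{\sigma_i, \sigma_j} \right)
```

In this form, the individual terms capture each animal's own chamber preference, while the pairwise terms capture the tendency of two animals to co-localize beyond what their individual preferences explain, which is how direct and indirect interactions can be separated.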

      (2) Maximum entropy models of even intermediate size systems involve a large number of parameters. The authors are transparent about that limitation here, but I still worry that the conclusion of the sufficiency of pairwise interactions is simply not general, and this may also relate to the differences from previous work. If, as the authors suggest in the discussion, this difference is one of a choice of variables, then that point could be emphasized. The suggestion of a follow up study with a smaller number of mice is excellent.

      We thank the reviewer for raising the issue and agree that the caveat of how general pairwise interactions can describe social behavior of animals needs to be discussed. We have added a sentence in the Discussion to point out this important caveat. “More generally, this discrepancy when looking at different choices of variables raises the issue that when studying social behavior of animals in a group, it is important to test and compare interaction models with different complexity (e.g. pairwise or with higher-order interactions).” We have also toned down our conclusion to limit our results of pairwise interactions describing mice co-localization patterns to the data collected in Eco-HAB (also see Reviewer 3 Major Point 2).

      Reviewer #3 (Public review):

      Summary:

      Chen et al. present a thorough statistical analysis of social interactions, more precisely, co-occupying the same chamber in the Eco-HAB measurement system. They also test the effect of manipulating the prelimbic cortex by using TIMP-1 that inhibits the MMP-9 matrix metalloproteinase. They conclude that altering neural plasticity in the prelimbic cortex does not eliminate social interactions, but it strongly impacts social information transmission.

      Strengths:

      The quantitative approach to analyzing social interactions is laudable and the study is interesting. It demonstrates that the Eco-HAB can be used for high throughput, standardized and automated tests of the effects of brain manipulations on social structure in large groups of mice.

      Weaknesses:

      A demonstration of TIMP-1 impairing neural plasticity specifically in the prelimbic cortex of the treated animals would greatly strengthen the biological conclusions. The Eco-HAB provides coarser spatial information compared to some other approaches, which may influence the conclusions.

      Recommendations for the authors:  

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Do the Authors have evidence that TIMP-1 was effective, as well as specific to the prelimbic cortex?

      We refer to the literature for the effectiveness and specificity of TIMP-1 to the prelimbic cortex.

      Specifically, the study by Okulski et al. (Biol. Psychiatry 2007) provides clear evidence that TIMP-1 plays a role in synaptic plasticity in the prefrontal cortex. They showed that TIMP-1 is induced in the medial prefrontal cortex (mPFC) following stimulation that triggers late long-term potentiation (LTP), a key model of synaptic plasticity. Overexpression of TIMP-1 in the mPFC blocked the activity of matrix metalloproteinases (MMPs) and prevented the induction of late LTP in vivo. Similar effects were observed with pharmacological inhibition of MMP-9 in vitro, reinforcing the idea that TIMP-1 regulates extracellular proteolysis as part of the plasticity mechanism in the prefrontal cortex. These findings confirm that TIMP-1 is both effective and active in this specific brain region.

      Further evidence comes from Puścian et al. (Mol. Psychiatry 2022), who used TIMP-1-loaded nanoparticles to influence neuronal plasticity in the amygdala. They found that TIMP-1 affected MMP expression, LTP, and dendritic morphology, showing its impact on synaptic modifications. More directly relevant, Winiarski et al. (Sci. Adv. 2025) demonstrated that injecting TIMP-1-loaded nanoparticles into the prelimbic cortex altered responses to social stimuli, further supporting the idea that TIMP-1 has region-specific effects on behavioral processes.

      We have also updated the main text (page 8, 1st paragraph of “Effect of impairing neuronal plasticity in the PL on subterritory preferences and sociability”) of the manuscript to include the above references.

      (2) The Authors seem to suggest that one main reason for the different results compared to Shemesh et al. 2013 was the coarseness of the Eco-HAB data. In this case, I think this conclusion should be toned down because of this significant caveat.

      We thank the reviewer for pointing this out, and agree that this caveat and difference should be emphasized. To tone down the conclusion, we have

      (1) added details about the Eco-HAB (it being coarse-grained, etc.) in the abstract to tone down the conclusion.

      (2) added to the results summary in the Discussion (top of page 12) that the results are “within in the setup of the semi-naturalistic Eco-HAB experiments”

      (3) added to the Discussion (page 13) that the different results compared to Shemesh et al 2013 means that general studies of social behavior need to compare models with different levels of complexity (e.g. pairwise vs. higher-order interactions). (Also see Reviewer 2 Comment 2.)

      Minor points

      (1) Please explain what is measured in Fig. 1C (what is on the y axis?).

      Figure 1C shows the activity of the mice as measured by the rate of transitions, i.e. the number of times the mice switch boxes during each hour of the day, averaged over all N = 15 mice and T = 10 days (cohort M1). The error bars represent the variability of activity across individuals or across days. For mouse-to-mouse variability (blue), we first compute for each mouse its number of transitions averaged over the same hour for all 10 days, then compute the standard deviation across all 15 mice and plot it as error bars. For day-to-day variability (orange), we first compute for each day the number of transitions for each hour averaged over all mice, then compute the standard deviation across all 10 days and plot it as error bars. We have added this detailed explanation to the caption of Figure 1C.
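The two kinds of error bars described above can be sketched in a few lines of NumPy. This is an illustration, not the authors' code: the function name, the array layout (mice × days × hours), the synthetic Poisson counts, and the use of the population standard deviation (NumPy's default, `ddof=0`) are all assumptions.

```python
import numpy as np


def hourly_activity_stats(transitions):
    """Mean hourly transition rate plus two variability estimates.

    transitions: array of shape (n_mice, n_days, 24) holding the number of
        box transitions of each mouse in each hour of each day (assumed layout).
    Returns (mean_rate, mouse_sd, day_sd), each of shape (24,).
    """
    transitions = np.asarray(transitions, dtype=float)
    # Average over all mice and all days for each hour of the day.
    mean_rate = transitions.mean(axis=(0, 1))
    # Mouse-to-mouse: average each mouse over days, then SD across mice.
    mouse_sd = transitions.mean(axis=1).std(axis=0)
    # Day-to-day: average each day over mice, then SD across days.
    day_sd = transitions.mean(axis=0).std(axis=0)
    return mean_rate, mouse_sd, day_sd


# Synthetic stand-in for cohort M1 (15 mice, 10 days); not real data.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(15, 10, 24))
mean_rate, mouse_sd, day_sd = hourly_activity_stats(counts)
```

Averaging before taking the standard deviation is what separates the two sources of variability: the first estimate removes day-to-day fluctuations within each mouse, and the second removes differences between mice within each day.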

      (2) In Fig. 3, it would be better to present the control group also in the main figure instead of the supplementary.

      We have merged Figure 3 and Figure 3 Supplementary 1 to present the control group also in the main figure.

      (3) In Fig. 3 and corresponding supplements, there seems to be a large difference between males and females. I think this would deserve some more discussion.

      While not being the main focus of this paper, we agree with the reviewer that the difference between male and female is important and deserves attention in the discussion and also future study. Thus we have added a paragraph in the Discussion (line 394-399, bottom of page 12).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this report, the authors made use of a murine cell line derived from a MYC-driven liver cancer to investigate the gene expression changes that accompany the switch from normoxic to hypoxic conditions during 2D growth and the switch from 2D monolayer to 3D organoid growth under normoxic conditions. They find a significant (ca. 40-50%) overlap among the genes that are dysregulated in response to hypoxia in 2D cultures and in response to spheroid formation. Unsurprisingly, hypoxia-related genes were among the most prominently deregulated under both sets of conditions. Many other pathways pertaining to metabolism, splicing, mitochondrial electron transport chain structure and function, DNA damage recognition/repair, and lipid biosynthesis were also identified.

      We thank this reviewer for his/her time and efforts, and the insightful comments.

      Major comments:

      (1) Lines 239-240: The authors state that genes involved in DNA repair were identified as being necessary to maintain survival of both 2D and 3D cultures (Figure S6A). Hypoxia is a strong inducer of ROS. Thus, the ROS-specific DNA damage/recognition/repair pathways might be particularly important. The authors should look more carefully at the various subgroups of the many genes that are involved in DNA repair. They should also obtain at least a qualitative assessment of ROS and ROS-mediated DNA damage by staining for total and mitochondrial-specific ROS using dyes such as CM-H2-DCFDA and MitoSox. Actual direct oxidative damage could be assessed by immunostaining for 8-oxo-dG and related to the sub-types of DNA damage-repair genes that are induced. The centrality of DNA damage genes also raises the question as to whether the previously noted prominence of the TP53 pathway (see point 5 below) might represent a response to ROS-induced DNA damage.

      We thank this reviewer for the insightful comments and agree that ROS induced by hypoxia could play a role in modulating DNA repair and, consequently, cellular essentiality. Although the pathway enrichment in Figure S6A (now Figure 2-figure supplement 4A) showed that the DNA repair pathway was essential to cell survival in hypoxia and 3D cultures, the genes associated with this pathway (Ddb1, Brf2, Gtf3c5, Guk1, Taf6) are not typical DNA repair genes; they are more likely involved in gene transcription. It would nonetheless be interesting to see whether they are specifically involved in the response to ROS-induced DNA damage, which is beyond the scope of this study.

      (2) Because most of the pathway differences that distinguish the various cell states from one another are described only in terms of their transcriptome variations, it is not always possible to understand what the functional consequences of these changes actually are. For example, the authors report that hypoxia alters the expression of genes involved in PDH regulation but this is quite vague and not backed up with any functional or empirical analyses. PDH activity is complex and regulated primarily via phosphorylation/dephosphorylation (usually mediated by PDK1 and PDP2, respectively), which in turn are regulated by prevailing levels of ATP and ADP. Functionally, one might expect that hypoxia would lead to the down-regulation of PDH activity (i.e. increased PDH-pSer392) as respiration changes from oxidative to non-oxidative. This would not be appreciated simply by looking at PDH transcript levels. This notion could be tested by looking at total and phospho-PDH by western blotting and/or by measuring actual PDH activity as it converts pyruvate to AcCoA.

      We agree with this reviewer that the regulation of PDH activity could be affected by multiple factors and that it is worth further validation by other approaches.

      (3) Line 439: Related to the above point: the authors state: "It is likely that blockade of acetyl-CoA production by PDH knockout may force cells to use alternative energy sources under hypoxic and 3D conditions, averting the Warburg effect and promoting cell survival under limited oxygen and nutrient availability in 3D spheroids." This could easily be tested by determining whether exogenous fatty acids are more readily oxidized by hypoxic 2D cultures or spheroids than occurs in normoxic 2D cultures.

      We thank the reviewer for this suggestion. We apologize for not being able to validate everything.

      (4) Line 472: "Hypoxia induces high expression of Acaca and Fasn in NEJF10 cells indicating that hypoxia promotes saturated fatty acid synthesis...The beneficial effect of Fasn and Acaca KO to NEJF10 under hypoxia is probably due to reduction of saturated fatty acid synthesis, and this hypothesis needs to be tested in the future.". As with the preceding comment, this supposition could readily be supported directly by, for example, performing western blots for these enzymes and by showing that incubation of hypoxic 2D cells or spheroids converted more AcCoA into lipid.

      We thank the reviewer for this suggestion. However, functional validation of the Fasn and Acaca KO is beyond the scope of this study.

      (5) In Supplementary Figure 2B&C, the central hub of the 2D normoxic cultures is Myc (as it should well be) whereas, in the normoxic 3D, the central hub is TP53 and Myc is not even present. The authors should comment on this. One would assume that Myc levels should still be quite high given that Myc is driven by an exogenous promoter. Does the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?

      The predicted transcription factor activity analysis was based on the differential ATAC-seq peaks among the different culture conditions through pairwise comparisons. If TP53 or MYC does not appear under a given condition, this does not mean that its activity was absent.

      “…the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?” This reviewer has raised an interesting question. We are investigating this hypothesis and hope to give a clear answer in the future.

      (6) In the Materials and Methods section (lines 711-720), the description of how spheroid formation was achieved is unclear. Why were the cells first plated into non-adherent 96 well plates and then into non-adherent T75 flasks? Did the authors actually utilize and expand the cells from 144 T75 flasks and did the cells continue to proliferate after forming spheroids? Many cancer cell types will initially form monolayers when plated onto non-adherent surfaces such as plastic Petri dishes and will form spheroid-like structures only after several days. Other cells will only aggregate on the "non-adherent" surface and form spheroid-like structures but will not actually detach from the plate's surface. Have the authors actually documented the formation of true, non-adherent spheroids at 2 days and did they retain uniform size and shape throughout the collection period? The single photo in Supplementary Figure 1 does not explain when this was taken. The authors include a schematic in Figure 2A of the various conditions that were studied. A similar cartoon should be included to better explain precisely how the spheroids were generated and clarify the rationale for 96 well plating. Overall, a clearer and more concise description of how spheroids were actually generated and their appearance at different stages of formation needs to be provided.

      The cells were initially plated in non-adherent 96-well plates to facilitate the formation of spheroids in a controlled and uniform manner. As correctly mentioned by the reviewer, during the initial stages, cells cultured on non-adherent surfaces often form aggregates or clumps, and it takes a few days for them to develop into solid spheroids.

      In our study, we aimed to achieve 3D spheroid formation immediately following the transduction process to allow for screening under both 2D and 3D conditions. Plating the cells into 96-well plates enabled us to monitor and control the formation of spheroids in smaller volumes before scaling up the culture in non-adherent T75 flasks for subsequent experimental steps. This setup allows us to maintain gene editing processes under both 2D and 3D conditions.

      Regarding the proliferation and uniformity of spheroids:

      • Yes, the spheroids continued to proliferate after their formation.

      • True, non-adherent spheroids were documented as early as the next day. This was visually confirmed under microscopy, and size uniformity was maintained throughout the collection period by following optimized culture protocols.

      We also agree with the reviewer’s suggestion to include a cartoon schematic similar to Figure 2A, illustrating the spheroid generation process and clarifying the rationale for using 96-well plates. We have included such a cartoon, together with a spheroid growth curve monitored by Incucyte, as Figure 2-figure supplement 2.

      (7) The authors maintained 2D cultures in either normoxic or hypoxic (1% O2) states during the course of their experiments. On the other hand, 3D cultures were maintained under normoxic conditions, with the assumption that the interiors of the spheroids resemble the hypoxic interiors of tumors. However, the actual documentation of intra-spheroid hypoxia is never presented. It would be a good idea for the authors to compare the degree of hypoxia achieved by 2D (1% O2) and 3D cultures by staining with a hypoxia-detecting dye such as Image-iT Green. Comparing the fluorescence intensities in 2D cultures at various O2 concentrations might even allow for the construction of a "standard curve" that could serve to approximate the actual internal O2 concentration of spheroids. This would allow the authors to correlate the relative levels of hypoxia between 2D and 3D cultures.

      This is an excellent idea that we will certainly pursue in our future experiments.

      (8) Related to the previous 2 points, the authors performed RNAseq on spheroids only 48 hours after initiating 3D growth. I am concerned that this might not have been a sufficiently long enough time for the cells to respond fully to their hypoxic state, especially given my concerns in Point 6. Might the results have been even more robust had the authors waited longer to perform RNA seq? Why was this short time used?

      We agree with this reviewer. We are unsure whether 48 hours was an ideal timepoint. A longitudinal experiment that harvests samples at different timepoints may be necessary in the future.

      (9) What happens to the gene expression pattern if spheroids are re-plated into standard tissue culture plates after having been maintained as spheroids? Do they resume 2D growth and does the gene expression pattern change back?

      This is a great question, and we had not considered what the gene expression pattern would be if spheroids were re-plated in 2D. This could be a challenging experiment because the gene expression and epigenetic changes are time-dependent. However, the cells do grow well after being re-plated in 2D.

      (10) Overall, the paper is quite descriptive in that it lists many gene sets that are altered in response to hypoxia and the formation of spheroids without really delving into the actual functional implications and/or prioritizing the sets. Some of these genes are shown by CRISPR screening to be essential for maintaining viability although in very few cases are these findings ever translated into functional studies (for example, see points 1-4 above). The list of genes and gene pathways could benefit from a better explanation and prioritization of which gene sets the authors believe to be most important for survival in response to hypoxia and for spheroid formation.

      This was a genome-wide study that integrated RNA-seq, ATAC-seq and CRISPR KO, providing a resource for understanding the oncogenic pathways in different culture conditions. We believe we have clearly articulated the important genes/pathways in our abstract.

      (11) The authors used a single MYC-driven tumor cell line for their studies. However, in their original paper (Fang, et al. Nat Commun 2023, 14: 4003.) numerous independent cell lines were described. It would help to know whether RNAseq studies performed on several other similar cell lines gave similar results in terms of up & down-regulated transcripts (i.e., whether NEJF10 cells are representative of the other cell lines).

      We have not generated RNA-seq data for these cell lines cultured in different conditions.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Fang et al. provides a tour-de-force study uncovering cancer cells' varied dependencies on several gene programs for their survival under different biological contexts. The authors addressed genomic differences in 2D vs 3D cultures and how hypoxia affects gene expression. They used a Myc-driven murine liver cancer model grown in 2D monolayer culture in normoxia and hypoxia as well as cells grown as 3D spheroids and performed CRISPR-based genome-wide KO screen to identify genes that play important roles in cell fitness. Some context-specific gene effects were further validated by in-vitro and in-vivo gene KO experiments.

      Strengths:

      The key findings in this manuscript are:

      (1) Close to 50% of differentially expressed genes were common between 2D Hypoxia and 3D spheroids conditions but they had differences in chromatin accessibility.

      (2) VHL-HIF1a pathway had differential cell fitness outcomes under 2D normoxia vs 2D hypoxia and 3D spheroids.

      (3) Individual components of the mitochondrial respiratory chain complex had contrasting effects on cell fitness under hypoxia.

      (4) Knockout of organogenesis or developmental pathway genes led to better cell growth specifically in the context of 3D spheroids and knockout of epigenetic modifiers had varied effects between 2D and 3D conditions.

      (5) Another key program that leads to cell fitness outcomes in normoxia vs hypoxia is the lipid and fatty acid metabolism.

      (6) Prmt5 is a key essential gene under all growth conditions, but in the context of 3D spheroids even partial loss of Prmt5 has a synthetic lethal effect with Mtap deletion and Mtap is epigenetically silenced specifically in the 3D spheroids.

      We appreciate this reviewer for acknowledging the strengths of our study.

      Issues to address:

      (1) The authors should clarify the link between the findings of the enrichment of TGFb-SMAD signaling REACTOME pathway to the findings that knocking out TGFb-SMAD pathway leads to better cell fitness outcomes for cells in the 3D growth conditions.

      We have clarified this link in the abstract by saying: “Notably, multicellular organogenesis signaling pathways including TGFb-SMAD, which is upregulated in 3D culture, specifically constrict the uncontrolled cell proliferation in 3D while inactivation of epigenetic modifiers (Bcor, Kmt2d, Mettl3 and Mettl14) has opposite outcomes in 2D vs. 3D.”

      (2) Supplementary Figure 4C has been cited in the text but doesn't exist in the supplementary figures section.

      We apologize for this typo. It should be Supplementary Figure 5C, which is Figure 2-figure supplement 3C in the new version of the manuscript. We have now corrected it.

      (3) A small figure explaining this ABC-Myc driven liver cancer model in Supplementary Figure 1 would be helpful to provide context.

      We appreciate this suggestion. We have added a cartoon as Figure 1-figure supplement 1A to indicate the procedure for generation of this model.

      (4) The method for spheroids formation is not found in the method section.

      We described the method in our previous publication (Nature Communications 2023 Jul 6;14(1):4003.). However, we have now added the information to the Methods, and the procedure is very simple (lines 623-624). We found that the murine liver cancer cell lines readily form spheroids when they are cultured in low-attachment dishes with standard DMEM complete media.

      (5) In Supplementary Figure 1b, the comparisons should be stated the opposite way - 3D vs 2D normoxia and 2D-Hypoxia vs 2D-Normoxia.

      We have corrected the legend of Figure S1B, which is now Figure 1B in the new version of the manuscript.

      (6) There are typos in the legend for Supplementary Figure 10.

      We have checked the legend for typos.

      (7) Consider putting Supplementary Figure 1b into the main Figure 1.

      We have moved both Supplementary Figure 1a and 1b into main Figure 1 as Figure 1A and 1B. Hopefully, this will help the readers to catch the information easily.

      (8) Please explain why only one timepoint (endpoint) was collected for 3D spheroids in the CRISPR KO screen experiment, while several timepoints were collected for 2D conditions. Was this for technical convenience?

      As this reviewer speculated, indeed this was for technical convenience. We found that it was technically challenging to split the spheroids for CRISPR screening.

      (9) In line 372, it is indicated that Bcor KO (Fig 5e) had growth advantage - this was observed in only one of the gRNA -- same with Kmt2d KO in the same figure where there was an opposite effect. Please justify the use of only one gRNA.

      We actually used 4 gRNAs for each gene. In the heatmap, although only one of the gRNAs for each gene showed some level of enrichment under the hypoxic 2D condition, they were all highly enriched in 3D.

      (10) Why was shRNA used for the PRMT5 gene rather than a CRISPR-based KO strategy? Note that one of the shRNAs for PRMT5 had almost no KO (PRMT5-shRNA2 Figure 7B) but still showed a phenotype (Figure 7D) - please explain.

      We used shRNA as a second approach for cross-validation. We agree that the knockdown efficiency of shRNA2 was not as good as that of the others, at only about 40%.

      (11) In Figure 7D, which samples (which shRNA group) were being compared to do the t-test?

      The comparisons were between shCtrl and each of the shPRMT5 groups. We have clarified this in the figure legend.

      (12) In line 240, it is stated that oxphos gene set is essential for NEJF10 cell survival in both normoxia and hypoxia conditions. But shouldn't oxphos be non-essential in hypoxia as cells move away from oxphos and become glycolytic?

      This is a great question. While hypoxia may indeed promote the switch from oxphos to glycolysis, several studies showed that the low oxygen concentrations in hypoxic regions of tumors may not be limiting for oxphos, and ATP is generated by oxphos in tumors even at very low oxygen tensions (please see the review in Clin Cancer Res (2018) 24 (11): 2482–2490.). We therefore speculated that NEJF10 cells were still dependent on oxphos for ATP production under hypoxia. However, this needs further investigation. We have added this discussion to our manuscript (lines 250-254).

      (13) In line 485 it is mentioned that Pmvk and Mvd genes which are involved in cholesterol synthesis when knocked out had a positive effect on cell growth in 3D conditions and since cholesterol synthesis is essential for cell growth how does this not matter much in the context of 3D - please explain.

      We thank this reviewer for this note. Only two gRNAs for each gene appeared to be enriched in 3D, which could be due to a technical issue or clonal selection. We have deleted this sentence in the new version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, Fang et al. systematically investigate the effects of culture conditions on gene expression, genome architecture, and gene dependency. To do this, they cultivate the murine HCC line NEJF10 under standard culture conditions (2D), then under similar conditions but under hypoxia (1% oxygen, 2D hypoxia) and under normoxia as spheroids (3D). NEJF10 was isolated from a murine HCC model that relies exclusively on MYC as a driver oncogene. In principle, (1) RNA-seq, (2) ATAC-seq and (3) genetic screens were then performed in this isogenic system and the results were systematically compared in the three cultivation methods. In particular, genome-wide screens with the CRISPR library Brie were performed very carefully. For example, in the 2D conditions, many different time points were harvested to control the selection process kinetically. The authors note differential dependencies for metabolic processes (not surprisingly, hypoxia signaling is affected) such as the regulation and activity of mitochondria, but also organogenesis signaling and epigenetic regulation.

      Strengths:

      The topic is interesting and relevant and the experimental set-up is carefully chosen and meaningful. The paper is well written. While the study does not reveal any major surprises, the results represent an important resource for the scientific community.

      We thank this reviewer for his/her positive comments.

      Weaknesses:

      However, this presupposes that the statistical analysis and processing are carried out very carefully, and this is where my main suggestions for revision begin. Firstly, I cannot find any information on the number of replicates in RNA- and ATAC-seq. This should be clearly stated in the results section and figure legends and cut-offs, statistical procedures, p-values, etc. should be mentioned as well. In principle, all NGS experiments (here ATAC- and RNA-seq) should be performed in replicates (at least duplicates, better triplicates) or the results should be validated by RT-PCR in independent biological triplicates. Secondly, the quantification of the analyses shown in the figures and especially in the legends is not sufficiently careful. Units are often not mentioned. Example Figure 4a: The legend says: 'gRNA reads' but how can the read count be -1? I would guess these are FC, log2FC, or Z-values. All figure legends need careful revision.

      Based upon the reviewer’s suggestions, we have added details about the replicates to the figure legends. For the gRNA read heatmap, the scale bar indicates the Z score. We have added this information to the figure legends.
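
      To illustrate why a heatmap labelled "gRNA reads" can show values such as -1, the following is a minimal sketch, assuming the Z score is computed per gRNA across the samples being compared. The exact normalization used for the figures is not stated in this response, and the function name and read counts below are hypothetical.

```python
from statistics import mean, pstdev


def row_zscores(counts):
    """Z-score one gRNA's read counts across samples (assumed row-wise scaling)."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation across samples
    if sigma == 0:
        # A gRNA with identical counts in every sample carries no contrast.
        return [0.0 for _ in counts]
    return [(c - mu) / sigma for c in counts]


# Hypothetical normalized counts for one gRNA in 2D normoxia, 2D hypoxia and 3D.
example_counts = [100.0, 200.0, 300.0]
z = row_zscores(example_counts)
```

      Under this assumption the scaled values are centered on zero, so a sample in which the gRNA is depleted relative to the others takes a negative value even though raw read counts are never negative.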

      Furthermore, I would find a comparison of the sgRNA abundances at the earliest harvesting time with the distribution in the library interesting, to see whether and to what extent selection has already taken place before the three culture conditions were established (minor point).

      This is a great point. Unfortunately, we did not perform such an analysis.

      Recommendations for the authors:

      Reviewing Editor:

      There are three general issues:

      First, there is a lack of detail regarding much of the analysis. In some cases, this makes it difficult to assess the value of the data, although there is a general consensus that the information is really interesting.

      Second, the findings - although provocative - lack mechanistic details and are focused more on descriptive findings. Hence, the manuscript would be improved by some effort at evaluating identified programs and providing some suggestions of mechanisms.

      Third, the authors need to put much more effort into the clarity and tightness of the presentation.

      We have made clarifications in response to the reviewers’ comments.

      Reviewer #1 (Recommendations for the authors):

      Figure S1C. the labeling of the lower x-axis is inverted.

      Due to space limitations, we changed the figure orientation in the old version of the manuscript. We have rotated the figure back in the new version, where it is now Figure 1-figure supplement 1B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors address the role of the centromere histone core in force transduction by the kinetochore.

      Strengths:

      They use a hybrid DNA sequence that combines CDEII and CDEIII as well as Widom 601 so they can make stable histones for biophysical studies (provided by the Widom sequence) and maintain features of the centromere (CDE II and III).

      Weaknesses:

      The main results are shown in one figure (Figure 2). Indeed the Centromere core of Widom and CDE II and III contribute to strengthening the binding force for the OA-beads. The data are very nicely done and convincingly demonstrate the point. The weakness is that this is the entire paper. It is certainly of interest to investigators in kinetochore biology, but beyond that, the impact is fairly limited in scope.

      This reviewer might have missed that this is a Research Advance, not an article. Research Advances are limited in scope by definition and provide a new development that builds on research reported in a prior paper. They can be of any length. Our Research Advance builds on our prior work, Hamilton et al., 2020 and provides the new result that native centromere sequences strengthen the attachment of the kinetochore to the nucleosome.

      Reviewer #2:

      Summary:

      This paper provides a valuable addendum to the findings described in Hamilton et al. 2020 (https://doi.org/10.7554/eLife.56582). In the earlier paper, the authors reconstituted the budding yeast centromeric nucleosome together with parts of the budding yeast kinetochore and tested which elements are required and sufficient for force transmission from microtubules to the nucleosome. Although budding yeast centromeres are defined by specific DNA sequences, this earlier paper did not use centromeric DNA but instead the generic Widom 601 DNA. The reason is that it has so far been impossible to stably reconstitute a budding yeast centromeric nucleosome using centromeric DNA.

      In this new study, the authors now report that they were able to replace part of the Widom 601 DNA with centromeric DNA from chromosome 3. This makes the assay more closely resemble the in vivo situation. Interestingly, the presence of the centromeric DNA fragment makes one type of minimal kinetochore assembly, but not the other, withstand stronger forces.

      We thank the reviewer for their careful and positive assessment of our work.

      Which kinetochore assembly turned out to be affected was somewhat unexpected, and can currently not be reconciled with structural knowledge of the budding yeast centromere/kinetochore. This highlights that, despite recent advances (e.g. Guan et al., 2021; Dendooven et al., 2023), aspects of budding yeast kinetochore architecture and function remain to be understood and that it will be important to dissect the contributions of the centromeric DNA sequence.

      We couldn’t agree more.

      Given the unexpected result, the study would become yet more informative if the authors were able to pinpoint which interactions contribute to the enhanced force resistance in the presence of centromeric DNA.

      Strength:

      The paper demonstrates that centromeric DNA can increase the attachment strength between budding yeast microtubules and centromeric nucleosomes.

      Weakness:

      How centromeric DNA exerts this effect remains unclear.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Additional specific mutants would be helpful in interpreting the effect observed. The authors speculate that a small segment of OA near the DNA (based on Dendooven et al., 2023) could be important. Would it be possible to introduce specific mutations and test this?

      This would be an interesting study but is far beyond the scope of a Research Advance. In fact, it would make a nice thesis project for a new student. Although perhaps not obvious, these studies require a large set of reagents including wrapped nucleosomes, which must be made fresh (they cannot be frozen) and five purified recombinant complexes, purified by specialized protocols that maintain their activity. Moreover, each datapoint is gathered one at a time. For example, the data in Figure 2 in this manuscript includes 343 datapoints acquired one at a time over the course of 1.5 years.  

      (2) Please provide the sequences of the other CEN3-W601 chimeras that were tested and did NOT stably wrap centromeric histone octamers. This may help others to design yet different constructs in the future. (Maybe the information is there and I didn't see it?)

      We fully agree and thank the reviewer for this excellent suggestion. The sequences and summaries of their wrapping stability are now provided in Table 3, page 17.

      (3) I wonder whether the authors tested the CON3 sequence used in Dendooven et al., 2023. If not, could it be tested? This would more tightly couple the functional assay shown here with the structural work.

      We did not test the CON3 sequence, which was published several years after the start of this work. We agree that a tight coupling between the functional assay and the structural work would be useful. However, we also see the advantage of being able to go beyond the structural work and include even more CEN3 sequence than has so far been possible in the structural work.  

      In addition to measuring the role of DNA sequence in Okp1/Ame1 attachment to the nucleosome, we were interested in the role of DNA sequence in the attachment of Mif2. Therefore, we included all 35 bp of the Mif2 footprint in our chimeric CCEN DNA sequence. CON3 only includes 8 bp from CDEII. We did produce stable nucleosomes using CEN3-601 from Guan et al. (see Table 3). Again, CEN3-601 only includes 8 bp of the Mif2 footprint so we opted to study nucleosomes wrapped in our CCEN DNA with the entire Mif2 footprint. Curiously we found that even the entire Mif2 footprint was not enough to find the DNA sequence specificity seen in the EMSA experiments reported by Xiao et al., 2017.

      To help readers understand the differences between all these constructs, we have included them in Table 3.

      (4) Would an AlphaFold 3 prediction of the assemblies used in this paper be feasible and useful?

      The structures of the Dam1 complex (Jenni et al., 2018), Ndc80 complex (Zahm et al., 2023 and references therein), MIND complex (Dimitrova et al., 2016), OA complex (Dendooven et al., 2023), and the nucleosome (Xiao et al., 2017; Yan et al., 2019; Guan et al., 2021; Dendooven et al., 2023) are published. The interactions between many of these complexes are understood beyond the level that AlphaFold3 could provide (Dimitrova et al., 2016; Dendooven et al., 2023). One of the main questions is how Mif2 interacts with the nucleosome and the other components of the kinetochore. Even structural analyses that included Mif2 in the assembly detect little or no Mif2 in the final structure. Unfortunately, AlphaFold3 is also not helpful as it predicts only the structure of the dimerization domain, which was already known (Cohen et al., 2008).

      AlphaFold3 predicts the rest of Mif2 is largely unstructured with several alpha helices predicted with low confidence.

      (5) Given that the centromeric DNA piece included should be able to bind the CBF3 complex, would it be possible to add this complex and test the effect on force transmission?

      This would be an interesting experiment, and we do expect CBF3 to bind. As stated above, this is far beyond the scope of this Research Advance. In our experience, with each new kinetochore subcomplex that we add into our reconstitutions, there are new challenges purifying the subcomplex in active form and in sufficient quantity. We are eager to add CBF3 but this is not something we can pull off in the context of this Research Advance. Thank you again for the time and energy spent reviewing our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to analyse the roles of the teichoic acids of Streptococcus pneumoniae in supporting the maintenance of the periplasmic region. Previous work has proposed the periplasm to be present in Gram positive bacteria and here an advanced electron microscopy approach was used. This also showed a likely role for both wall and lipo-teichoic acids in maintaining the periplasm. Next, the authors use a metabolic labelling approach to analyse the teichoic acids. This is a clear strength as this method cannot be used for most other well studied organisms. The labelling was coupled with super-resolution microscopy to be able to map the teichoic acids at the subcellular level and a series of gel separation experiments to unravel the nature of the teichoic acids and the contribution of genes previously proposed to be required for their display. The manuscript could be an important addition to the field but there are a number of technical issues which somewhat undermine the conclusions drawn at the moment. These are shown below and should be addressed. More minor points are covered in the private Recommendations for Authors.

      Weaknesses to be addressed:

      (1) l. 144 Was there really only one sample that gave this resolution? Biological repeats of all experiments are required.

      CEMOVIS is a very challenging method that is not amenable to numerous repeats. However, multiple images were recorded from at least two independent samples for each strain. Additional sample images are shown in a new Fig. S3.

      CETOVIS is even more challenging (only two publications in PubMed since 2015) and was performed on a single ultrathin section that, exceptionally, lay perfectly flat on the EM grid, allowing tomography data acquisition on ∆tacL cells. The reconstructed tomogram confirmed the absence of a granular layer in the depth of the section. Additionally, the numbering of Fig. S4A-B (previously misidentified as Fig. S2A-B) has been corrected in the text of V2.

      (2) Fig. 4A. Is the pellet recovered at "low" speeds not just some of the membrane that would sediment at this speed with or without LTA? Can a control be done using an integral membrane protein and Western Blot? Using the tacL mutant would show the behaviour of membranes alone.

      We think that the pellet is not just some of the membrane but most of it. In support of this view, the “low” speed pellets after enzymatic cell lysis contain not just some membrane lipids, but most of them (Fig. S10A). We therefore expect membrane proteins to be also present in this fraction. We performed a Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). Unfortunately, no signal was detected, most likely due to protein degradation by contaminant proteases that we could trace to the purchased mutanolysin. The same sedimentation properties were observed with the ∆tacL strain as shown in Fig. 6A. However, in the ∆tacL strain the membrane pellet still contains membrane-bound TA precursors. It is therefore impossible to test definitively whether pneumococcal membranes totally devoid of TA would sediment in the same way.

      (3) Fig. 4A. Using enzymatic digestion of the cell wall and then sedimentation will allow cell wall associated proteins (and other material) to become bound to the membranes and potentially affect sedimentation properties. This is what is in fact suggested by the authors (l. 1000, Fig. S6). In order to determine if the sedimentation properties observed are due to an artefact of the lysis conditions, a physical breakage of the cells, using a French Press, should be carried out and then membranes purified by differential centrifugation. This is a standard, and well-established method (low-speed to remove debris and high-speed to sediment membranes) that has been used for S. pneumoniae over many years but would seem counter to the results in the current manuscript (for instance Hakenbeck, R. and Kohiyama, M. (1982), Purification of Penicillin-Binding Protein 3 from Streptococcus pneumoniae. European Journal of Biochemistry, 127: 231-236).

      Thank you for this suggestion. We have tested this hypothesis by breaking cells with a Microfluidizer followed by differential centrifugation. This experiment, which requires a large minimal volume, was performed with unlabeled cells (due to the cost of reagents) and assessed by Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). In this case, the majority of the membrane material was found in the high-speed pellet, as expected.

      We also applied the spheroplast lysis procedure of Flores-Kim et al. to the labeled cells, and found that most of the labeled material sedimented at low speed (new Fig. S7B), as observed with our own procedure.

      With these new results, the section on membrane density has been removed from the Supplementary Information. Instead, the fractionation is further discussed in terms of size of membrane fragments and presence of intact spheroplasts in the notes in Supplementary Information preceding Fig. S7.

      (4) l. 303-305. The authors suggest that the observed LTA-like bands disappear in a pulse chase experiment (Fig. 6B). What is the difference between this and Fig. 5B, where the bands do not disappear? Fig. 5C is the WT and was only pulse labelled for 5 min and so would one not expect the LTA-like bands to disappear as in 6B?

      Fig. 6B shows a pulse-chase experiment with strain ∆tacL, whereas Fig. 5C shows a similar experiment with the parental WT strain. The disappearance of the LTA-like band pattern with the ∆tacL strain (Fig. 6B), and its persistence in the WT strain (Fig. 5C), indicate that these bands are the undecaprenyl-linked TA in ∆tacL and proper LTA in the WT. A sentence has been added to better explain this point in V2.

      Note that we have exchanged the previous Fig. 5C and Fig. S13B, so that the experiments of Fig. 5A and 5C are in the same medium, as suggested by Reviewer #2.

      (5) Fig. 6B, l. 243-269 and l. 398-410. If, as stated, most of the LTA-like bands are actually precursor then how can the quantification of LTA stand as stated in the text? The "Titration of Cellular TA" section should be re-evaluated or removed? If you compare Fig. 6C WT extract incubated at RT and 110 °C it seems like a large decrease in amount of material at the higher temperature. Thus, the WT has a lot of precursors in the membrane? This needs to be quantified.

      Indeed, the quantification of the ratio of LTA and WTA in the WT strain rests on the assumption that the amount of membrane-linked polymerized TA precursors is negligible in this strain. This assumption is now stated in the Titration section. We think it is the case. The true LTA and TA precursors do not have exactly the same electrophoretic mobility, being shifted relative to each other by about half a ladder “step”. This difference is visible when samples are run in adjacent lanes on the same gel, as in the new Fig. 6C. The difference of migration was well documented in the original paper about the deletion of tacL, although tacL was known as rafX at that time, and the ladders were misidentified as WTA (Wu et al. 2014. A novel protein, RafX, is important for common cell wall polysaccharide biosynthesis in Streptococcus pneumoniae: implications for bacterial virulence. J Bacteriol. 196, 3324-34. doi: 10.1128/JB.01696-14). This reference was added in V2. The experiment in the new Fig. 6C was repeated to have all samples on the same gel and treated at a lower temperature. The minor effect on the amount of LTA when WT cells are heated at pH 4.2 may be due to the removal of some labeled phosphocholine. We have NMR evidence that the phosphocholine in position D is labile to acidic treatment of LTA, and it may be lacking in some cases, as reported by Hess et al. (Nat Commun. 2017 Dec 12;8(1):2093. doi: 10.1038/s41467-017-01720-z).

      (6) L. 339-351, Fig. 6A. A single lane on a gel is not very convincing as to the role of LytR. Here, and throughout the manuscript, wherever statements concerning levels of material are made, quantification needs to be done over appropriate numbers of repeats and with densitometry data shown in SI.

      Yes indeed. Apart from the titration of TA in the WT strain, we haven’t yet carried out a thorough quantification of TA or LTA/WTA ratio in different strains and conditions, although we intend to do so in a follow-up study, using the novel opportunities offered by the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments performed in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14. The value of 51% was a calculation error, and was corrected to 41%. Likewise, the decrease in the WTA/LTA ratio was corrected to 5 to 7-fold.

      (7) l. 385-391. Contrary to the statement in the text, the zwitterionic TA will have associated counterions that result in net neutrality. It will just have both -ve and +ve counterions in equal amounts (dependent on their valency), which doesn't matter if it is doing the job of balancing osmolarity (rather than charge).

      Thank you for pointing this out. The paragraph has been corrected in V2.

      Reviewer #2 (Public review):

      The Gram-positive cell wall consists in large part of TAs, and is essential for most bacteria. However, TA biosynthesis and regulation is highly understudied because of the difficulties in working with these molecules. This study closes some of our important knowledge gaps related to this and provides new and improved methods to study TAs. It also shows an interesting role for TAs in maintaining a 'periplasmic space' in Gram positives. Overall, this is an important piece of work. It would have been more satisfying if the possible causal link between TAs and periplasmic space had been more deeply investigated with complemented mutants and CEMOVIS. For the moment, there is clearly something happening but it is not clear if this only happens in TA mutants or also in strains with capsules/without capsules and in PG mutants, or in lafB (essential for production of another glycolipid) mutants. Finally, some very strong statements are made suggesting several papers in the literature are incorrect, without actually providing any substantiation/evidence supporting these claims. Nevertheless, I support the publication of this work as it pioneers some new methods that will definitively move the field forward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) l. 55 It is stated that TA are generally not essential. This needs to be introduced in a little more detail as in several species they are collectively essential. Need some more references here to give context.

      We have expanded the paragraph and added a selection of references in V2.

      (2) l. 63 and Fig. 1A. Is the model based on the images from this paper? Is the periplasm as thick as the peptidoglycan layer? Would you not expect the density of WTA to be the same throughout the wall, rather than less inside? Do the authors think that the TA are present as rods in the cell envelope and because of this the periplasm looks a little like a bilayer, is this so? Is the relative thickness of the layers based on the data in the paper (Table 1)?

      The model proposed in Fig. 1A is not based on our data. It is a representation of the model proposed by Harold Erickson, and the appropriate reference has been added to the figure legend in V2. We do not speculate on the relative density of WTA inside the peptidoglycan layer, at the surface or in the periplasm. The only constraint from the model is that the density of WTA in the periplasm should be sufficient for self-exclusion and allow the brush polymer theory to apply. The legend has been amended in V2.

      We indeed think that the bilayer appearance of the periplasmic space in the wild type strain, and the single layer periplasmic space in the ∆tacL and ∆lytR strains, support Erickson’s model. Although the model was drawn arbitrarily, it turns out that the relative thickness of the peptidoglycan and periplasmic space is in rough agreement with the measurements reported in Table 1.

      (3) Fig. 2. It is hard to orient oneself to see the layers. The use of the term periplasmic space (l. 132) and throughout is probably not wise as it is not a space.

      We prefer to retain this nomenclature since the term periplasmic space has been used in all the cell envelope CEMOVIS publications and is at the core of Erickson’s hypothesis about these observations and teichoic acids.

      (4) L. 147. This is not referring to Fig. S2A-B as suggested but Fig. S3A-B.

      This has been corrected.

      (5) l. 148. How do you know the densities observed are due to PG or certainly PG alone? Perhaps it is better to call this the cell wall.

      Yes. Cell wall is a better nomenclature and the text and Table 1 have been corrected in V2, in accordance with Fig. 2.

      (6) l. 165. It is also worth noting that peripheral cell wall synthesis also happens at the same site so this may well not be just division.

      Yes. We have replaced “division site” by “mid-cell” in V2.

      (7) l. 214 What is the debris? If PG digestion has been successful then there will be marginal debris. Is this pellet translucent (like membranes)? If you use fluorescently labelled PG in the preparation has it all disappeared, as would be expected by fully digested and solubilised material?

      In traditional protocols of bacterial membrane preparation, a low-speed centrifugation is first performed to discard “debris” that to our knowledge have not been well characterized but are thought to consist of unbroken cells and large fragments of cell wall. After enzymatic degradation of the pneumococcal cell wall, the low-speed pellet is not translucent as in typical membrane pellets after ultracentrifugation, but is rather loose, unlike a dense pellet of unbroken cells. A description of the pellet appearance was added in V2.

      It is a good idea to check if some labeled PG is also pelleted at low-speed after digestion. In a double labeling experiment using azido-choline and a novel unpublished metabolic probe of the PG, we found that the PG was fully digested and labeled fragments migrated as a couple of fuzzy bands likely corresponding to different labeled peptides. These species were not pelleted at low speed.

      (8) l. 219. Can you give a reference to certify that the low mobility material is WTA? Why does it migrate differently than LTA? Or is the PG digestion not efficient?

      WTA released from sacculi by alkaline lysis were found to migrate as a smear at the top of native gels revealed by alcian-blue silver staining, which is incompatible with SDS (Flores-Kim, 2019, 2022). The references have been added in V2. It could be argued in this case that the smearing was due to partial degradation of the WTA by the alkaline treatment.

      Bui et al. (2012) reported the preparation of WTA by enzymatic digestion of sacculi, but the resulting WTA were without muropeptide, presumably due to a step of boiling at pH 5 used to deactivate the enzymes.

      To our knowledge, this is the first report of pneumococcal WTA prepared by digestion of sacculi and analyzed by SDS-PAGE. Since the migration of WTA in native and SDS-PAGE is similar, we hypothesize that they do not interact significantly with the dodecyl sulphate, in contrast to the LTA, which bear a lipidic moiety. The fuzziness of the WTA migration pattern may also result from the greater heterogeneity due to the attached muropeptide, such as different lengths (di-, tetra-saccharide…), different peptides despite the action of LytA (tri-, tetra-peptide…), different O-acetylation status, etc.

      (9) L. 226-227, Fig S8. Presumably several of the major bands on the Coomassie stained gel are the lysozyme, mutanolysin, recombinant LytA, DNase and RNase used to digest the cell wall etc.? Can the sizes of these proteins be marked on the gel. Do any of them come down with the material at low-speed centrifugation?

      We have provided a gel showing the different enzymes individually and mixed (new Fig. S9G). While performing several experiments of this type, we found that the mutanolysin might be contaminated with proteases. The enzymes do not appear to sediment at low speed.

      (10) Fig. S9B. It is difficult to interpret what is in the image as there appear to be 2 populations of material (grey and sometimes more raised). Does the 20,000 g material look the same?

      Fig. S10B is a 20,000 × g pellet. We agree that there appear to be two types of membrane vesicles, but we do not know their nature.

      (11) l. 277 and Fig. 5A. Why is it "remarkable" that there are apparently more longer LTA molecules as the cell reach stationary phase?

      This is the first time that a change of TA length is documented. Such a change could conceivably have consequences in the binding and activity of CBPs and the physiology of the cell envelope in general. These questions should be addressed in future studies.

      (12) l. 280. How do you know which is the 6-repeat unit?

      It is an assumption based on previous analyses by Gisch et al. (J Biol Chem 2013, 288(22):15654-67. doi: 10.1074/jbc.M112.446963). The reference was added.

      (13) Fig. 5A and C. Panel C, the cells were grown in a different medium and so are not comparable to Panel A. Why is Fig. S12B not substituted for 5B? Presumably these are exponential phase cells.

      We have swapped Fig. S13B and Fig. 5C in V2, as suggested, and changed the text and legends accordingly.

      Reviewer #2 (Recommendations for the authors):

      L30: vitreous sections?

      Corrected in V2.

      L32: as their main universal function --> as a universal function. To show it's the main universal function, you will need to look at this across various bacterial species.

      Changed to “possible universal function” in V2.

      L35: enabled the titration the actual --> titration of the actual?

      Corrected in V2.

      L34: consider breaking up this very long sentence.

      Done in V2.

      L37: may compensate the absence--> may compensate for the absence.

      Corrected in V2.

      L45: Using metabolic labeling and electrophoresis showed --> Metabolic labeling and...

      Corrected in V2.

      L46: This finding casts doubts on previous results, since most LTA were likely unknowingly discarded in these studies. This needs to be rephrased and is unnecessarily callous. While the current work casts doubts on any quantitative assessments of actual LTA levels measured in previous studies, it does not mean any qualitative assessments or conclusions drawn from these experiments are wrong. Better would be to say: These findings suggest that previously reported quantitative assessments of LTA levels are likely underestimating actual LTA levels, since much of the LTA would have been unknowingly discarded.

      If the authors do think that actual conclusions are wrong in previous work, then they need to be more explicit and explain why they were wrong.

      Yes indeed. The statement was toned down in V2.

      L55: Although generally non-essential. I would remove or rephrase this statement. I don't think any TA mutant will survive out in the wild and will be essential under a certain condition. So perhaps not essential for growth under ideal conditions, but for the rest pretty essential.

      The paragraph was amended by qualifying the essentiality to laboratory conditions and including selected references.

      L95: Note that the prevailing model until reference 20 (Gibson and Veening) was that the TA is polymerized intracellularly (see e.g. Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026). This intracellular polymerisation model seemed unlikely according to Gibson and Veening ('As TarP is classified by PFAM as a Wzy-type polymerase with predicted active site outside the cell, we speculate that TarP and TarQ polymerize the TA extracellularly in contrast to previous reports.'), but there is no experimental evidence as far as this referee knows of either model being correct.

      Despite the lack of experimental evidence, we think that Gibson and Veening are very likely correct, based on their argument, and also by analogy with the synthesis of other surface polysaccharides from undecaprenyl- or dolichol-linked precursors. It is unfortunate that Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026 was published in this way, since there was no evidence for a cytoplasmic polymerization, to our knowledge.

      L97: It is commonly believed, although I'm not sure it has ever been shown, that the capsule is covalently attached at the same position on the PG as WTA. Therefore, there must be some sort of regulation/competition between capsule biosynthesis and WTA biosynthesis (see also ref. 21). The presence of the capsule might thus also influence the characteristics of the periplasmic space. Considering that by far most pneumococcal strains are encapsulated, the authors should discuss this and why a capsule mutant was used in this study and how translatable their study using a capsule mutant is to S. pneumoniae in general.

      A paragraph was added in the Introduction of V2 to present the complication and a sentence was added at the end of the discussion to mention that this should be studied in the future.

      L102: Ref 29 should probably be cited here as well?

      Since in Ref 29 (Flores-Kim et al. 2019) there is a detectable amount of LTA (presumably TA precursors) in the ∆tacL strain, we prefer to cite only Hess et al. 2017 regarding the absence of LTA in the absence of TacL. However, we added in V2 a reference to Flores-Kim et al. 2019 in the following paragraph regarding the role of the LTA/WTA ratio.

      L106: dependent on the presence of the phosphotransferase LytR (21). --> dependent on the presence of the phosphotransferase LytR, whose expression is upregulated during competence (21).

      Corrected in V2.

      L119: I fail to see how the conclusions drawn by other groups (I assume the authors mean work from the Vollmer, Rudner, Bernhardt, Hammerschmidt, Havarstein, Veening groups?) are invalid if they compared WTA:LTA ratios between strains and conditions if they underestimated the LTA levels? Supposedly, the LTA levels were underestimated in all samples equally so the relative WTA/LTA ratio changes will qualitatively give the same outcome? I agree that these findings will allow for a reassessment of previous studies in which presumably too low LTA levels were reported, but I would not expect a difference in outcome when people compared WTA:LTA ratios between strains?

      The sentence was rephrased in V2 to be neutral regarding previous work and rather emphasize future possibilities.

      L131: Perhaps it would be good to highlight that such a conspicuous space has been noticed before by other EM methods (see e.g. Figs.4 and 5 or ref 19, or one of the most clear TEM S. pneumoniae images I have seen in Fig. 1F of Gallay et al, Nat. Micro 2021). However, always some sort of staining had previously been performed so it was never clear this was a real periplasmic space. CEMOVIS has this big advantage of being label free and imaging cells in their presumed native state.

      Thanks for pointing out these beautiful data that we had overlooked. We have added a few sentences and references in the Discussion of V2.

      L201: References are not numbered.

      Corrected in V2.

      L271/L892: Change section title. 'Evolution' can have multiple meanings. It would be more clear to write something like 'Increased TA chain length in stationary phase cells' or something like that.

      Changed in V2.

      L275: harvested

      Corrected in V2.

      L329: add, as suggested shown previously (I guess refs 24 and 29)

      Reference to Hess et al. 2017 has been added in V2. A sentence and further references to Flores-Kim, 2019, 2022 and Wu et al. 2014 were added at the end of the discussion with respect to the LTA-like signal observed in these studies of ∆tacL strains.

      L337: I think a concluding sentence is warranted here. These experiments demonstrate that membrane-bound TA precursors accumulate on the outside of the membrane, and are likely polymerized on the outside as well, in line with the model proposed in ref. 20.

      From the point of view of formal logic, the accumulation of membrane-bound TA precursors on the outer face of the membrane does not prove that they were assembled there. They could still be polymerized inside and translocated immediately. However, since this is extremely unlikely for the reasons discussed by Gibson and Veening, we have added a mild conclusion sentence and the reference in V2.

      L343: How accurate are these quantifications? Just by looking at the gel, it seems there is much less WTA in the lytR mutant than 50% of the wild type?

      Yes, the 51% value was a calculation error. This was changed to 41%. Likewise, the decrease of the WTA amount relative to LTA was corrected to 5- to 7-fold.

      Apart from the titration of TA in the WT strain, we haven’t yet carried out a careful quantification of either TA or the LTA/WTA ratio in different strains and conditions, although we intend to do so in the near future using the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments of growth in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14.

      L342: although WTA are less abundant and LTA appear to be longer (Fig. 6A). --> although WTA are less abundant and LTA appear to be longer (Fig. 6A), in line with a previous report showing that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). (or something like that). Perhaps better is to start this paragraph differently. For instance: Previous work showed that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). As shown in Fig. 6A, the proportion of WTA significantly decreased in the lytR mutant. However, there was still significant WTA present indicating that perhaps another LCP protein can also produce WTA.

      Changed in V2.

      Of note, WTA levels would be a lot lower in encapsulated strains as used in Ref. 21 (assuming WTA and capsule compete for the same linkage on PG). So perhaps it would be hard to detect any residual WTA in an encapsulated lytR mutant?

      Investigation of the relationship between TA and capsule incorporation or O-acetylation is definitely a future area of study using this method of TA monitoring.

      L371: see my comments related to L131. Some TEM images clearly show the presence of a periplasmic space.

      Comments and references have been added in V2.

      L402: It would be really interesting to perform these experiments on a wild type encapsulated strain. Would these have much more LTA? (I understand you cannot do these experiments perhaps due to biosafety, but it might be interesting to discuss).

      Yes. It would be interesting to compare the TA in D39 and D39 ∆cps strains. We have added this perspective at the end of the discussion in V2.

      L418: ref lacks number

      Corrected in V2.

      L423: refs missing.

      References added in V2.

      L487: See my comments regarding L46. I do not see one valid point in the current paper why underestimating LTA levels would change any of the conclusions drawn in Ref. 21. I do not know the other papers cited well enough, but it seems highly unlikely that their conclusions would be wrong by systematically underestimating LTA levels. As far as I understand it, this current work basically confirms the major conclusions drawn by these 'doubtful' papers (that TacL makes LTA and LytR is the main WTA producer). As such, I find this sentence highly unfair without precisely specifying what the exact doubts are. Sure, this current paper now shows that probably people have discarded unknowingly LTA and therefore underestimated LTA levels, so any quantitative assessment of LTA levels are probably wrong. That is one thing. But to say this casts doubts on these studies is very serious and unfair (unless the authors provide good arguments to support these serious claims).

      Yes indeed. The sentence was rephrased to be strictly factual in V2.

      Table 2: I assume these strains are delta cps? Would be relevant to list this genotype.

      Table 2 was completed in V2.

      The authors should comment on why the mutants have not been complemented, especially for lytR as it's the last gene in a complex operon. It would be great to see WTA levels being restored by ectopic expression of LytR.

      Yes. We think this could be part of an in-depth study of the attachment of WTA, together with the investigation of the other LCP phosphotransferases.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary:

      The behavioral switch between foraging and mating is important for resource allocation in insects. This study characterizes the role of sulfakinin and the sulfakinin receptor 1 in changes in olfactory responses associated with foraging versus mating behavior in the oriental fruit fly (Bactrocera dorsalis), a significant agricultural pest. This pathway regulates food consumption and mating receptivity in other species; here the authors use genetic disruption of sulfakinin and sulfakinin receptor 1 to provide strong evidence that changes in sulfakinin signaling modulate antennal responses to food versus pheromonal cues and alter the expression of ORs that detect relevant stimuli.

      Strengths:

      The authors utilize multiple complementary approaches including CRISPR/Cas9 mutagenesis, behavioral characterization, electroantennograms, RNA sequencing and heterologous expression to convincingly demonstrate the involvement of the sulfakinin pathway in the switch between foraging and mating behaviors. The use of both sulfakinin peptide and receptor mutants is a strength of the study and implicates specific signaling actors.

      Weaknesses:

      The authors demonstrate that SKR is expressed in olfactory neurons, however there are additional potential sites of action that may contribute to these results.

      Recommendations for the authors:

      The authors have addressed most of the issues raised by the reviewers. Below are a few outstanding issues.

      (1) Lines 68-69 describe "control of B. dorsalis include the use of the behavioral responses to semiochemicals" but does not describe what these responses are or how behavior is modulated.

      The sentence was revised as “Control of B. dorsalis include the use of the reproductive and feeding behavioral responses to semiochemicals” (line 69 in the revision).

      (2) Statistical analysis for 9 hour starved females at 5 minutes is missing in Figure 1D and S1.

      We have added statistical analysis for 9 hour starved females at 5 minutes in the revised Figures 1D and S1, respectively (line 578).

      (3) The legend in Figure S2 should be revised as it is not clear from the figure which of the odors are food associated odors.

      As suggested, we added a food odor label in the revised Figure S2 (line 666).

      (4) Line 167: "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the pheromonal components, while down regulated genes, OR49a and OR63a, were activated by food volatiles." Based on the data, this sentence is incorrect. It should read: "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the food components, whereas downregulated genes, OR49a and OR63a, were activated by pheromonal components."

      We are sorry for our mistake. We have corrected it (lines 168-169).

      (5) Line 192: "The coordinated action of sulfakinin on mutiple downstreams,..." should be revised to "downstream pathways or tissues" or simply removing "multiple downstream".

      As suggested, we removed “multiple downstream”. See line 192.

      (6) Reference formatting is inconsistent: see line 207 vs line 208.

      We have corrected it to “(Wu et al., 2019)” (line 207).

      (7) Lines 241-244 The broad discussion regarding the evolution and ancestral function of CCK here and the phylogeny in Figure S6 are peripheral to the authors' claims.

      As suggested, we removed the section and the Figure S6 in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research article by Nath et al. from the Lee Lab addresses how lipolysis under starvation is achieved by a transient receptor potential channel, TRPγ, in the neuroendocrine neurons to help animals survive prolonged starvation. Through a series of genetic analyses, the authors identify that TRPγ mutations specifically lead to a failure in lipolytic processes under starvation, thereby reducing animals' starvation resistance. The conclusion was confirmed through total triacylglycerol levels in the animals and lipid droplet staining in the fat bodies. This study highlights the importance of transient receptor potential (TRP) channels in the fly brain to modulate energy homeostasis and combat metabolic stress. While the data is compelling and the message is easy to follow, several aspects require further clarification to improve the interpretation of the research and its visibility in the field.

      Strengths:

      This study identifies the biological meaning of TRPγ in promoting lipolysis during starvation, advancing our knowledge about TRP channels and the neural mechanisms to combat metabolic stress. Furthermore, this study demonstrates the potential of the TRP channel as a target to develop new therapeutic strategies for human metabolic disorders by showing that metformin and AMPK pathways are involved in its function in lipid metabolisms during starvation in Drosophila.

      Weaknesses:

      Some key results that might strengthen their conclusions were left out for discussion or careful explanation (see below). If the authors could improve the writing to address their findings and connect their findings with conclusions, the research would be much more appreciated and have a higher impact in the field.

      Here, I listed the major issues and suggestions for the authors to improve their manuscript:

      (1) Are the increased lipid droplet size and the upregulated total TAG level measured in the starved or sated mutant in Figure 1? This information might be crucial for readers to understand the physiological function of TRP in lipid metabolism. In other words, clarifying whether the upregulated lipid storage is observed only in the starved trp mutant will advance our knowledge of TRPγ. If the increase of total TAG level is only observed in the starved animals, TRP in the Dh44 neurons might serve as a sensor for the starvation state required to promote lipolysis in starvation conditions. On the other hand, if the total TAG level increases in both starved and sated animals, activation of Dh44 through TRPγ might be involved in the lipid metabolism process after food ingestion.

      We measured total TAG levels in Figure 1 and LD sizes in Figure 2 under sated conditions. We inserted “under sated condition” to clarify this (lines 97 and 147-148).

      Thanks for your suggestions.

      (2) It is unclear how AMPK activation in Dh44 neurons reduces the total triacylglycerol (TAG) levels in the animals (Figure 3G). As AMPK is activated in response to metabolic stress, the result in Figure 3G might suggest that Dh44 neurons sense metabolic stress through AMPK activation to promote lipolysis in other tissues. Do Dh44 neurons become more active during starvation? Is activation of Dh44 neurons sufficient to activate AMPK in the Dh44 neurons without starvation? Is activation of AMPK in the Dh44 neurons required for Dh44 release and lipolysis during starvation? These answers would provide more insights into the conclusion in Lines 192-193.

      In our previous study, we demonstrated that trpγ mutants exhibited lower levels of glucose, trehalose, and glycogen (Dhakal et al., 2022), and in the current study, we observed excessive lipid storage in the trpγ mutant, indicating imbalanced energy homeostasis. Given the established role of AMPK in maintaining energy balance (Marzano et al., 2021; Lin et al., 2021), we employed the activated form of AMPK (UAS-AMPK<sup>TD</sup>) in our experiments. Our results showed that expression of activated AMPK in Dh44 neurons led to a reduction in total TAG levels, suggesting that AMPK activation in these neurons can promote lipolysis even in the absence of starvation. Regarding the activation of Dh44 neurons, Dus et al. (2015) reported that Dh44 cells in the brain are activated by nutritive sugars, especially under starvation conditions. In addition, another report showed a role of Dh44 neurons in regulating starvation-induced sleep suppression (Oh et al., 2023), which may imply that these neurons become more active under starved conditions. Although we did not directly assess whether Dh44 neuron activity increases during starvation or whether AMPK activation in these neurons is required for DH44 release and subsequent lipolysis, our findings support the notion that AMPK activation in Dh44 neurons is sufficient to reduce TAG levels, potentially through the metabolic stress response typically observed during starvation. We explained it as follows: “Dh44 neurons regulate starvation-induced sleep suppression (Oh et al., 2023), which implies that these neurons become more active under starved conditions.” lines 190-191.

      (3) It is unclear how the lipolytic gene brummer is further downregulated in the trpγ mutant during starvation while brummer is upregulated in the control group (Figure 6A). This result implies that the trpγ mutant was able to sense the starvation state but responded abnormally by inhibiting the lipolytic process rather than promoting lipolysis, which makes it more susceptible to starvation (Figure 3B).

      Thanks for your suggestions. We explained it as follows: “The data indicate that the trpγ mutant can sense the starvation state but responds abnormally by suppressing lipolysis instead of activating it. This dysregulated lipolytic response likely increases the mutant's vulnerability to starvation, as it cannot effectively mobilize lipid stores for energy during periods of nutrient deprivation.” lines 251-254.

      (4) There is an inconsistency of total TAG levels and the lipid droplet size observed in the Dh44 mutant but not in the Dh44-R2 mutant (Figures 7A and 7F). This inconsistency raises a possibility that the signaling pathway from Dh44 release to its receptor Dh44-R2 only accounts for part of the lipid metabolic process under starvation. Adding discussion to address this inconsistency may be helpful for readers to appreciate the finding.

      Thanks for your suggestion. We included the following in the Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the function of trpγ in lipid metabolism was investigated. The authors found that lipid accumulation levels were increased in trpγ mutants and remained high during starvation; the increased TAG levels in trpγ mutants were restored by the expression of active AMPK in DH44 neurons and oral administration of the anti-diabetic drug metformin. Furthermore, oral administration of lipase, TAG, and free fatty acids effectively restored the survival of trpγ mutants under starvation conditions. These results indicate that TRPγ plays an important role in the maintenance of systemic lipid levels through the proper expression of lipase. Furthermore, the authors have shown that this function is mediated by DH44R2. This study provides an interesting finding in that the neuropeptide DH44 released from the brain regulates lipid metabolism through a brain-gut axis, acting on the receptor DH44R2 presumably expressed in gut cells.

      Strengths:

      Using Drosophila genetics, this study carefully analyzes in which cells trpγ expression regulates lipid metabolism. The study supports its conclusions from various angles, including not only TAG levels, but also fat droplet staining, survival rate under starved conditions, and oral administration of substances involved in lipid metabolism.

      Weaknesses:

      Lipid metabolism in the gut of DH44R2-expressing cells should be investigated for a better understanding of the mechanism. Fat accumulation in the gut is not mechanistically linked with fat accumulation in the fat body. The function of lipase in the gut (esp. R2 region) should be addressed, e.g. by manipulating gut lipases such as magro or Lip3 in the gut in the context of the trpγ mutant. Also, it is not clarified in which cell types in the gut DH44R2 is expressed. The study also mentioned only in the text that bmm expression in the gut cannot restore lipid droplet enlargement in the fat body, but this result might be presented as a figure.

      We appreciate the reviewer’s insightful suggestions. Unfortunately, due to the unavailability of the reagent (UAS-Lip3), we were unable to manipulate gut lipase in trpγ mutants as proposed. However, we additionally performed immunostaining to examine the co-expression of trpγ and Dh44R2 in the gut, and our results indicate that both trpγ and Dh44R2 are co-expressed in the R2 region of the gut (Figure 7O and P). Furthermore, we have updated our figures to show that bmm expression in the gut does not restore lipid droplet enlargement in the fat body (Figure 5I and J).

      Reviewer #3 (Public Review):

      In this manuscript, the authors demonstrated the significance of the TRPγ channel in regulating internal TAG levels. They found high TAG levels in TRPγ mutant, which was ascribed to a deficit in the lipolysis process due to the downregulation of brummer (bmm). It was notable that the expression of TRPγ in DH44+ PI neurons, but not dILP2+ neurons, in the brain restored the internal TAG levels and that the knockdown of TRPγ in DH44+ PI neurons resulted in an increase in TAG levels. These results suggested a non-cell autonomous effect of Dh44+PI neurons. Additionally, the expression of the TRPγ channel in Dh44 R2-expressing cells restored the internal TAG levels. The authors, however, did not provide an explanation of how TRPγ might function in both presynaptic and postsynaptic cells in the non-cell autonomous manner to regulate the TAG storage. The authors further determined the effect of TRPγ mutation on the size of lipid droplets (LD) and the lifespan and found that TRPγ mutation caused an increase in the size of LD and a decrease in the lifespan, which were reverted by feeding lipase and metformin. These were creative endeavors, I thought. The finding that DH44+ PI neurons have non-cell autonomous functions in regulating bodily metabolism (mainly sugar/lipid) in addition to directing sugar nutrient sensing and consumption is likely correct, but the paper has many loose ends. I would like to see a revision that includes more experiments to tighten up the findings and appropriate interpretations of the results.

      (1) The authors need to provide interpretations or speculations as to how DH44+ PI neurons have non-cell autonomous functions in regulating the internal TAG stores, and how both presynaptic DH44 neurons and postsynaptic DH44 R2 neurons require TRPγ for lipid homeostasis.

      In the Discussion, we had mentioned our previous finding: “We previously proposed that TRPγ holds DH44 neurons in a state of afterdepolarization, thus reducing firing rates by inactivating voltage-gated Na+ channels (Dhakal et al., 2022). At the physiological level, this induces the consistent release of DH44 and depletion of DH44 stores, resulting in nutrient utilization and storage malfunctions.”

      We also included the following: “TRPγ in DH44 neurons may influence the release of metabolic signals or hormones that act on postsynaptic DH44R2 cells. These postsynaptic cells could, in turn, modulate lipid storage and metabolism in a non-cell autonomous manner. However, the mechanism by which TRPγ functions in DH44R2 cells remains unclear. One possible explanation is that TRPγ in the gut may be activated by stretch or osmolarity (Akitake et al. 2015).” lines 439-440.

      This interaction between presynaptic and postsynaptic cells may ensure a coordinated response to metabolic changes and maintain lipid homeostasis. Thus, both Dh44-expressing and Dh44-R2-expressing cells are crucial for the proper functioning of TRPγ in regulating internal TAG levels and lipid storage.

      (2) The expression of TRPγ solely in DH44 R2 neurons of TRPγ mutant flies restored the TAG phenotype, suggesting an important function mediated by TRPγ in DH44 R2 neurons. However, the authors did not document the endogenous expression of TRPγ in the DH44R2+ gut cells. This needs to be shown.

      We appreciate the reviewer’s suggestion. To address this, we performed immunostaining to examine the expression of TRPγ in the DH44R2+ gut cells. Our results, as shown in Figure 7O and P, confirm that TRPγ is co-expressed in the Dh44R2+ cells in the gut. We also found that Dh44R2 is expressed in the brain. We described this as follows: “Given that Dh44R2 is predominantly expressed in the intestine, we performed immunostaining to examine whether Dh44R2 co-localizes with trpγ in gut cells. Our results confirmed that Dh44R2 and trpγ are co-expressed in intestinal cells (Figure 7O and P). Additionally, we analyzed Dh44R2 expression in the brain and found that two Dh44R2-expressing cells are co-localized with Dh44-expressing cells in the PI region (Figure 7Q). To further delineate whether Dh44R2-mediated fat utilization is specific to the brain, gut, or fat body, we knocked down Dh44R2<sup>RNAi</sup> using Dh44-GAL4, myo1A-GAL4, and cg-GAL4, respectively (Figure 7–figure supplement 1E). Notably, knockdown of Dh44R2 with Myo1A-GAL4 resulted in elevated TAG levels, indicating that DH44R2 activity in lipid metabolism is specific to the gut.” lines 375-384.

      (3) While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels (Figure 7A). This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than Dh44. Alternatively, a Dh44 neuropeptide-independent pathway could mediate the lipolysis. In either case, an additional result is needed to substantiate either one of the hypotheses.

      The Dh44 mutant flies exhibited normal TAG levels, whereas Dh44R2 mutant flies showed elevated TAG levels. However, when we examined the lipid droplets in the fat body, both Dh44 mutant and Dh44R2 mutant flies displayed larger lipid droplets, indicating a disruption in lipid metabolism. Additionally, we assessed starvation survival time and found that both Dh44 and Dh44R2 mutant flies exhibited reduced survival under starvation conditions compared to controls. Supplementation with lipase (Figure 7–figure supplement 1A), glycerol (Figure 7–figure supplement 1B), hexanoic acid (Figure 7–figure supplement 1C), and mixed TAGs (Figure 7–figure supplement 1D) improved starvation survival time, further supporting that the lipid metabolism pathway was impaired in both mutants. These observations highlight the role of Dh44 in regulating lipolysis. We included related Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      (4) While the authors observed an increased area of fat body lipid droplets (LD) in Dh44 mutant flies (Figure 7F), they did not specify the particular region of the fat body chosen for measuring the LD area.

      We used abdominal segments 2-3 for all fat body images, as already stated in the Nile red staining part of the Methods section (lines 630-631).

      (5) The LD area only accounts for TAG levels in the fat body, whereas TAG can be found in many other body parts, including the R2 area as demonstrated in Figure 5A-D using Nile red staining. As such, measuring the total internal TAG levels would provide a more accurate representation of TAG levels than the average fat body LD area.

      We measured total internal TAG levels in the whole body throughout the experiments (Figure 1F, 2C, 2E, 3C, 3G, 4A, 4B, 7A, 7I, and many Supplementary Figures), except for bmm expression using the GAL4/UAS system. We now include these new data (Figure 5–figure supplement 1), which support the same conclusion as the LD analysis.

      (6) In Figure 5F-I, the authors should perform the similar experiment with Dh44, Dh44R1, and Dh44R2 mutant flies.

      We performed the experiments with Dh44, Dh44R1, and Dh44R2 mutant flies and found that Dh44 and Dh44R2 mutant flies showed shorter starvation survival times than the control, and that survival increased after supplementation with lipase, glycerol, hexanoic acid, and TAG (Figure 7–figure supplement 1A–D). lines 361-372.

      (7) The representative image in Figure 6B does not correspond to the GFP quantification results shown in Figure 6C. In trpγ<sup>1</sup>;bmm::GFP flies, the GFP signal appears stronger in starved conditions than in satiated conditions.

      We updated it with new images. We quantified GFP intensity using ImageJ and found that GFP intensity in trpγ<sup>1</sup>;bmm::GFP flies was significantly lower under starved conditions than under sated conditions.

      (8) In Figure 6H-I, fat body-specific expression of bmm reversed the increased LD area in TRPγ mutants. The authors also showed that Dh44+PI neuron-specific expression of bmm yielded a similar result. The authors need to provide an interpretation as to how bmm acts in the fat body or DH44 neurons to regulate this.

      We first inserted the following in the Results: “Furthermore, the expression of bmm in the fat body, as well as Dh44 neurons in the PI region, can promote lipolysis at the systemic level.” lines 276-277.

      Additionally, we discussed it in the Discussion: “Brummer lipase is essential for regulating lipid levels in the insect fat body by mediating lipid mobilization and energy homeostasis. In Nilaparvata lugens, it facilitates triglyceride breakdown (Lu et al., 2018), while studies in Drosophila show that reduced Brummer lipase expression decreases fatty acids and increases diacylglycerol levels, highlighting its role in lipid metabolism (Nazario-Yepiz et al., 2021). Here, we additionally demonstrate that bmm expression in DH44 neurons within the PI region can systemically regulate TAG levels. Cell signaling or energy status in DH44 neurons may contribute to hormonal release that targets organs such as the fat body.” lines 451-459.

      (9) The authors should explain why the DH44 R1 mutant did not represent similar results as the wild type.

      We added “In addition, bmm levels in Dh44R1<sup>Mi</sup> under starved condition did not increase as significantly as in the control. This suggests a unique role of DH44 and its receptors in regulating lipid metabolism and response to nutritional status in Drosophila.” lines 358-360.

      (10) It would be good to have a schematic that represents the working model proposed in this manuscript.

      We updated the schematic model in the revised version (Figure 8).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      This paper characterized the function of trpγ in Dh44-expressing PI neurons for lipid metabolism and lipolysis induced by prolonged starvation. The authors applied a series of lipolytic genetic manipulation and lipid/lipid metabolism supplements to rescue the trpγ deficits in lipolysis: the expression of active AMPK in the DH44-expressing PI neurons or brummer, a lipolytic gene, in the trpγ-expressing cells, and oral administration of the anti-diabetic drug metformin, lipase, TAG and free fatty acids. Despite this exhaustive characterization of the defective lipolysis in the trpγ mutants, there remain puzzles in inconsistent defects of Dh44 and DH44R2 in the total TAG levels and in the expression and functions of the receptor in the gut. Clarification of these points and other issues raised by the reviewers should improve the mechanisms of lipid metabolism through Dh44 signalling.

      Reviewer #1 (Recommendations For The Authors):

      (1) It might be worth introducing Dh44 in the introduction section as it is unclear to readers how the authors hypothesized the site-of-action of TRPγ in Dh44 neurons for lipid metabolism after reading the introduction.

      We introduced the following: “We found that TRPγ expression in Dh44 neuroendocrine cells in the brain is critical for maintaining normal carbohydrate levels in tissues (Dhakal et al. 2022). Building on this, we hypothesized that TRPγ in Dh44 cells also regulates lipid and protein homeostasis.” lines 69-71.

      (2) Providing a summary model in the end to integrate the present findings and their previous publication about TRPγ functions in Drosophila sugar selection would greatly help readers understand and appreciate the general role of TRPγ in balancing energy homeostasis.

      We made a schematic model in Figure 8.

      (3) Swapping the order of Figures 5 and 6 might be a better way to tell the story without logic gaps. The results addressing the mechanisms of metformin and TRPγ in promoting lipolysis under starvation are interrupted by the lipid storage data in the R2 cells in the current Figure 5A-5E. In addition, presenting Figure 5A-5E before or together with Figure 7 will help readers appreciate the expression of Dh44-R2 and its function in regulating lipid metabolism in Figure 7.

      We did.

      (4) It might be misleading to use the word "sated" for the condition of 5-hour mild starvation. The word "mild starvation" or the equivalents might be a better word choice.

      We appreciate the reviewer’s concern. Because the hemolymph sugar level does not drop significantly after 5 hr of starvation, previous papers (Dus et al., 2015; Dhakal et al., 2022) referred to this as the sated condition. For consistency with that usage, we prefer “sated” to “mild starvation”.

      (5) It is unclear what the white arrows are pointing at in Figures 7O and 7P. Some of those seem to be non-specific signals, so it is hard to connect the figure to the conclusion in Lines 351-353. It would be helpful to add some explanations to help readers interpret Figures 7O and 7P.

      In the previous version, the white arrows in Figures 7O and 7P indicated the expression of Dh44R2 in the SEZ region of the brain and the R2 region of the gut. In the revised version, for clarity, we performed additional immunostaining for the co-expression of trpγ and Dh44R2 in the gut. We found that trpγ and Dh44R2 are co-expressed specifically in the R2 region of the gut (Figure 7O and P). Similarly, we found that two Dh44R2-expressing cells co-localize with Dh44 cells in the PI region of the brain (now Figure 7Q). We updated this part (lines 375-380).

      (6) The figure legend for the (G) panel in Figure 2-figure Supplement 1 was mislabeled as (F).

      We corrected it.

      (7) In Line 85, the authors might want to write "… among these mutants, only trpγ mutant displayed reduced carbohydrate levels, suggesting …". Please confirm the information for the sentence. lines 87-88.

      We clarified it.

      Reviewer #2 (Recommendations For The Authors):

      (1) The trpγ[G4] would be difficult for non-Drosophila researchers to understand; it would be better to use trpγ-Gal4.

      We obtained the mutant line from Dr. Craig Montell, who named it. We explained it as follows in the main text: “controlled by GAL4 knocked into the trpγ locus (trpγ<sup>G4</sup> flies; +)” line 109.

      (2) The arrows in Figures 7O and 7P need to be explained in the figure legends.

      We did.

      Reviewer #3 (Recommendations For The Authors):

      (11) Lines 95-96 should have a reference.

      We did.

      (12) Lines 129-130: It should read "TRPγ expressed in DH44 cells is sufficient for the regulation of lipid levels."

      We changed it as suggested.

      (13) Figure 5E needs to be repeated with more trials.

      We increased the sample size. Previously (Figure 5E) we included the area of 10 LDs from 3 samples; in the revised figure (Figure 6I) we have included 28 LDs from 10 samples.

      (14) Figures 5F-I, bold lines are not too visible and therefore, dotted lines could be used.

      We changed it as suggested.

      (15) Line 356: It is not true that D-trehalose or D-fructose is commonly detected by DH44 neurons. These sugars at concentrations much higher than the physiological concentration range stimulate DH44 neurons (see Dus et al., 2015).

      We removed it.

      (16) Lines 362-363: It should read "Expression of TRPγ in DH44 neurons was necessary and sufficient to regulate the carbohydrate and lipid levels.".

      We changed it.

      (17) Lines 369-370: The authors need to consider removing the possible role of CRF in regulating lipid homeostasis. It could be considered to be far-fetched.

      We removed it.

      (18) Line 407-408: the sentence "Nevertheless, it is also known that DH44 neurons mediate the influence of dietary amino acids on promoting food intakes in flies (37)" needs to be removed. They used amino acid concentrations that were far greater than the physiological levels observed in the internal milieu of flies. Still, many laboratories cannot reproduce the result of using the high AA concentrations.

      We removed it.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (public review): 

      This manuscript presents SAVEMONEY, a computational tool designed to enhance the utilization of Oxford Nanopore Technologies (ONT) long-read sequencing for the design and analysis of plasmid sequencing experiments. In the past few years, with the improvement in both sequencing length and accuracy, ONT sequencing is being rapidly extended to almost all omics analyses which are dominated by short-read sequencing (e.g., Illumina). However, the relatively higher sequencing error rates of long-read sequencing techniques, including PacBio and ONT, are still a major obstacle for plasmid/clone-based sequencing services that aim to achieve single base/nucleotide accuracy. This work provides a guideline for sequencing multiple plasmids together using the same ONT run without molecular barcoding, followed by data deconvolution. The whole algorithm framework is well-designed, and some real data and simulation data are utilized to support the conclusions. The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services. As we know and discussed by the authors, in the real world, to ensure accuracy, the researchers will routinely pick up multiple colonies in the same plasmid construction and submit for Sanger sequencing. However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches. This is a major limitation in the significance of this work. Encouraging computational efforts in ONT data debarcoding for mixed-plasmid or even single-cell sequencing would be more valuable in the field.

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services.

      We apologize that we were not clear enough in the manuscript. Our tool is designed for users who rely on commercial services (i.e., those who cannot include a barcode by themselves). However, it can also benefit those performing library preparation, as SAVEMONEY can be applied after standard barcode-based sequencing and de-multiplexing. The combination of standard barcodes with SAVEMONEY would significantly expand the scope of sequencing applications. For example, it would enable sequencing of more plasmid types than the number of available barcodes and, in some cases, it may even eliminate the need for barcode introduction. Because we do not own ONT equipment and because the primary target audience for the SAVEMONEY algorithm is users without ONT equipment, we were not able to conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph describing these issues (3rd paragraph in the discussion section).

      However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches.

      We agree with the reviewer about this limitation of SAVEMONEY, as it does not allow mixing of plasmids from multiple colonies in the same cloning run. However, that does not necessarily mean that SAVEMONEY cannot reduce sequencing costs in cloning. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. However, with SAVEMONEY, up to three plasmids can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.
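
      The arithmetic in this example can be sketched as follows. This is only an illustration under stated assumptions: the function name `samples_needed` is hypothetical and not part of SAVEMONEY, and the limit of three plasmids per sample is taken from the example above rather than from any general rule. Colonies of the same construct are kept in separate samples, as the limitation requires.

```python
import math


def samples_needed(n_constructs: int, colonies_per_construct: int,
                   max_plasmids_per_sample: int) -> int:
    """Hypothetical helper: number of pooled samples to submit.

    Colonies from the same construct may not share a sample, so each
    "round" of colonies (one colony per construct) is pooled separately.
    """
    samples_per_round = math.ceil(n_constructs / max_plasmids_per_sample)
    return colonies_per_construct * samples_per_round


# Example from the response: 3 constructs x 2 colonies, up to 3 plasmids per sample.
n_constructs, colonies = 3, 2
total_plasmids = n_constructs * colonies            # unpooled: one sample per plasmid
pooled = samples_needed(n_constructs, colonies, 3)  # pooled submission
cost_ratio = pooled / total_plasmids                # relative cost per plasmid

print(f"{total_plasmids} samples unpooled vs {pooled} pooled; "
      f"cost per plasmid x {cost_ratio:.2f}")
```

      With more constructs than fit in one sample, the same helper simply adds samples per round, so the saving depends on how many distinguishable plasmids can share a sample.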

      (1) To provide more comprehensive information for users who care about the cost, the Introduction section should include a cost comparison between Sanger and ONT, with more details, such as different ONT platforms (MinION, PromethION, Flongle), chemistries (flow cells) and kits. This additional information will be more helpful and informative for the users who have their own sequencers and are the target audience for SAVEMONEY.

      We thank the reviewer for pointing this out. Since we do not own ONT equipment, we are unable to provide a total cost for using the ONT platform. However, we have included the price per sample (~$15 per plasmid) for the commercial service we have used, as well as the equipment that they employ (V14 chemistry on a PromethION with an R10.4.1 flow cell) and the number of reads obtained per plasmid (~100–1000) in the 4th paragraph of the introduction section. Though these costs will inevitably change over time, this information should still be helpful for those who own ONT sequencers in estimating the costs.

      (2) In "Overview of the algorithm" (Pages 3-4) under the Results section, instead of stating "However, coverage varies from ~100-1000 and is difficult to predict because each nanopore flow cell has different properties.", it will be beneficial to provide more detailed information, such as sequencing length, yield/read count per flow cell of different platforms. This information will assist users in designing their own experiments effectively.

      We thank the reviewer for the comment. As mentioned in the previous response, we are unable to provide sequencing length or yield/read count per flow cell because we do not own ONT equipment. However, we apologize if it was not clear in the "Overview of the algorithm" section that we are discussing the use of results obtained from commercial services, and we recognize the need to provide more detailed information about those results. We have now clarified in the sentence pointed out by the reviewer that the numbers are derived from the information provided by commercial sequencing services. In addition, we have added, at the end of the same paragraph, that typical examples of the result properties, i.e., read length and quality score distribution, can be found in Fig. 2.

      (3) While this study optimized and evaluated the tool using a total of 14 plasmids, it may not provide sufficient power to represent the diversity of the plasmid world. Consideration should be given to expanding the dataset to include a broader range of plasmids in future studies to enhance the robustness and generalizability of the tool.

      We are grateful to the reviewer for their valuable input. It is reasonable that we should have anticipated the use of a larger number of plasmids, even though the main target of SAVEMONEY is those who utilize commercial services. In the previous version of SAVEMONEY, processing could not be completed in a reasonable amount of time if too many plasmids were provided, although the algorithm itself places no restriction on the number of plasmids. Therefore, we have changed the underlying code to improve the algorithm, making it more than 20 times faster than the previous version (the benchmark time mentioned in the 3rd paragraph of the discussion section was improved to 3.1 minutes from the previous 65 minutes, using the same dataset and the same computer). Additionally, SAVEMONEY is now compatible with multiprocessing. The processing time is expected to decrease approximately in inverse proportion to the number of CPU cores used. We have added these updates at the end of the 3rd paragraph in the discussion section.

      (4) If applicable and feasible, including a comparison or benchmark of SAVEMONEY against other similar tools would further strengthen the manuscript. This comparison would allow users to evaluate the advantages and disadvantages of different tools for their specific needs.

      We thank the reviewer for the suggestion. We have added a benchmark against a similar tool, On-Ramp, using the exact same set of plasmids and FASTQ data used for our own benchmark (4th paragraph in the discussion section). Because the machine specifications of the On-Ramp web server are unknown, a direct comparison is not possible. However, using only laptop-level computational resources, SAVEMONEY was able to process the data 38% faster than On-Ramp. When using mini-PC-level computational resources, processing was 64% faster than On-Ramp.

      (5) The importance of pre-filtering raw sequencing reads should be emphasized as noisy reads can significantly impact the overall performance of the tool. It is essential to clarify whether any pre-filtering steps were performed in this study, such as filtering based on quality scores, read length, or other relevant factors. 

      We apologize for not being clear. Unfortunately, the commercial sequencing service we used did not provide information regarding pre-filtering. However, the impact of pre-filtering based on quality score and read length on the quality of the final results is theoretically minimal in SAVEMONEY. First, during the initial step of the post-analysis, the classification step, reads that are short relative to the full plasmid length can be excluded based on the user-defined “score_threshold”. Simultaneously, low-quality reads with poor alignment to the plasmid can also be excluded, because “score_threshold” is related to the normalized alignment score. Even if some low-quality reads are not excluded at this stage, their effect can be minimized during the final step of the post-analysis, which generates consensus sequences. This is because our Bayesian analysis considers not only the base calls but also the q-scores to determine the consensus. Therefore, we believe the overall impact of pre-filtering on the final results is negligible.
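
      To illustrate why weighting by q-scores limits the influence of noisy reads, the following is a minimal sketch of a per-position Bayesian consensus. It is not SAVEMONEY's implementation: the uniform prior, the assumption that errors are spread evenly over the three other bases, and the function names `error_prob`, `posterior`, and `consensus` are all introduced here for illustration only.

```python
from math import prod

BASES = "ACGT"

def error_prob(q):
    """Convert a Phred quality score into the probability that the call is wrong."""
    return 10 ** (-q / 10)

def posterior(calls):
    """calls: list of (base, q_score) pairs observed at one aligned position.

    Returns {base: posterior probability} under a uniform prior, assuming
    (hypothetically) that an erroneous call is equally likely to be any of
    the three other bases.
    """
    likelihood = {
        true_base: prod(
            (1 - error_prob(q)) if base == true_base else error_prob(q) / 3
            for base, q in calls
        )
        for true_base in BASES
    }
    total = sum(likelihood.values())
    return {base: value / total for base, value in likelihood.items()}

def consensus(calls):
    """Return the most probable base and its posterior probability."""
    post = posterior(calls)
    best = max(post, key=post.get)
    return best, post[best]

# One confident call (Q30) against two very noisy calls (Q3):
# the confident call wins even though it is outvoted two to one.
print(consensus([("A", 30), ("C", 3), ("C", 3)]))
```

      In this toy model a single high-quality call outweighs two low-quality disagreeing calls, which is the behaviour a simple majority vote would not give.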

      (6) The statement regarding the number of required reads per plasmid (20-30) and the maximum number of plasmids (up to six) that can be mixed in a single run may become outdated due to the rapid advancements in ONT technology. In the Discussion section, instead of assuming specific numbers, it would be more beneficial to provide information based on the current state of ONT sequencing, such as the number of reads per MinION flow cell that can be produced.

      We thank the reviewer for pointing this out. Because the number of required reads per plasmid depends on the accuracy of each read (i.e., the number of required reads can be reduced if the accuracy increases), we have added the description of these points to the last paragraph of the discussion section.

      Reviewer #2 (public review):  

      The authors developed an algorithm that allows for deconvoluting of plasmid sequences from a mixture of plasmids that have been sequenced by nanopore long read technology. As library preparations and barcoding of individual samples increase sequencing costs, the algorithm bypasses this need and thus decreases time on sample prep and sequencing costs. In the first step, the tool assesses which of the plasmid constructions can be mixed in a single library preparation by calculating a distance matrix between the reference plasmid and the constructions producing sequence clusters. The user is given groups of plasmids, from different clusters, to be pooled together for sequencing. After sequencing, the algorithm deconvolutes the reads by classifying them based on alignments to the reference sequence. A Bayesian analysis approach is used to obtain a consensus sequence and quality scores.

      Strengths 

      The authors exploit one of the main advantages of long-read sequencing which is to accurately resolve regions of high complexity, as regularly found in plasmids, and developed a tool that can validate plasmid constructions by reducing sequencing costs. Multiple plasmids (up to six) can be analyzed simultaneously in a single library without the need for sample barcoding, also reducing sample preparation time. Although inserts must be different, just 2 bases difference would be enough for a correct assignation. It maximizes cost-efficiency for projects that require large amounts of plasmid constructions and high-throughput validation.

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      Weaknesses 

      The method proposed by the authors requires prior knowledge of plasmid sequences (i.e., blueprints or plasmid reference) and is not suitable for small experiments. The plasmid inserts or backbones must be different e.g., multiple colonies from the same plasmid construction effort cannot be submitted together.

      As also discussed in the response to reviewer 1, we agree with the reviewer that SAVEMONEY does not allow the analysis of plasmids from multiple colonies in the same cloning experiment. However, that does not necessarily mean that SAVEMONEY cannot reduce the sequencing cost. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. However, with SAVEMONEY, up to three plasmids can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.
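
      The arithmetic in this example can be sketched as follows. This is only a back-of-the-envelope illustration: the function names are hypothetical, the cap of six plasmids per sample is taken from the limit mentioned in the reviews, and the sketch assumes that all constructs are mutually distinguishable and that every sample costs the same.

```python
from fractions import Fraction
from math import ceil

def pooled_samples(n_constructs, colonies_per_construct, max_per_sample=6):
    """Number of samples when each sample holds at most one colony per construct.

    Colonies of the same construct cannot be told apart, so they must go
    into different samples; different constructs are pooled together, up to
    max_per_sample plasmids per sample (assumed cap).
    """
    samples_per_round = ceil(n_constructs / max_per_sample)
    return colonies_per_construct * samples_per_round

def cost_fraction(n_constructs, colonies_per_construct, max_per_sample=6):
    """Pooled cost as a fraction of sequencing every plasmid separately."""
    unpooled = n_constructs * colonies_per_construct
    pooled = pooled_samples(n_constructs, colonies_per_construct, max_per_sample)
    return Fraction(pooled, unpooled)

# Two colonies from each of three constructs: six plasmids in two samples.
print(pooled_samples(3, 2))
print(cost_fraction(3, 2))
```

      With more constructs per round the fraction shrinks further, until the per-sample cap is reached.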

      The reviewer also expressed concern that SAVEMONEY is not suitable for experiments at a small scale. To put it more precisely, SAVEMONEY cannot be used when the experiment size is minimal, such as in a lab that consistently constructs only a single plasmid at a time. That said, the strength of SAVEMONEY lies in its scalability. Even in labs where plasmid construction is typically limited to one at a time, there may be occasional instances where two or more plasmids are created simultaneously. In such cases, SAVEMONEY can be used to reduce sequencing costs. Moreover, in a typical molecular biology lab where multiple plasmids are constructed every week, SAVEMONEY can be particularly effective. Given its adaptability, its cost-saving potential, and its widespread use since its initial publication on bioRxiv and on Google Colab, we are confident that SAVEMONEY will continue to be a valuable tool for a wide range of researchers.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors): 

      The manuscript assumes all samples are sent out for sequencing at a specific company. This could be generalized for a much broader use since many labs now own nanopore sequencers. In turn, the advantage of reducing hands-on sample prep becomes more evident.

      We thank the reviewer for pointing this out. We agree that SAVEMONEY can also benefit those performing library preparation. Combining standard barcodes with SAVEMONEY significantly expands the scope of sequencing applications. For example, it enables sequencing of more plasmid types than the number of available barcodes and, in some cases, may even eliminate the need for the sample prep step that introduces barcodes. Because we do not own ONT equipment, we could not conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph (3rd paragraph in the discussion section).

      The base calling model (high accuracy, super accuracy) used by Plasmidsaurus and tested here should be mentioned.  

      We thank the reviewer for the suggestion. A description of the base-calling model (HAC) was added to the Materials and Methods section.

      Other modifications to the revised manuscript 

      Beyond the changes made in response to the reviewer comments above, we have also, through our continued use and improvement of SAVEMONEY, made additional changes to the algorithm and therefore to the manuscript. Those changes are outlined below.

      Improvements in the pre-survey step

      (1) The pre-survey algorithm was reduced to a Zero-One Integer Linear Programming Problem to guarantee the optimal combinations, as previous versions did not ensure an optimal solution. Relatedly, the explanation of the algorithm in the main manuscript was updated.

      (2) The algorithm was modified to ensure that the number of plasmids distributed to each group is balanced. A new feature was also added to allow users to specify the number of groups, which is beneficial when balancing between cost and quality.

      (3) An error was corrected in Fig. 2, where the distance calculation method for the hierarchical clustering step for group formation was given as the Farthest Point Algorithm, which calculates the distance between two clusters based on the farthest pair of plasmids. The correct method is the Nearest Point Algorithm. This error was present only in Fig. 2, while other implementations, including the source code of SAVEMONEY and the Google Colab page, were correct from the beginning. We have corrected the error in Fig. 2.
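
      The difference between the two cluster-distance definitions can be sketched as follows. The plasmid names and pairwise distances are hypothetical, and the helper functions are written for illustration rather than taken from the SAVEMONEY source.

```python
from itertools import product

def nearest_point(cluster_a, cluster_b, dist):
    """Cluster distance = distance of the closest cross-cluster pair."""
    return min(dist[a][b] for a, b in product(cluster_a, cluster_b))

def farthest_point(cluster_a, cluster_b, dist):
    """Cluster distance = distance of the most distant cross-cluster pair."""
    return max(dist[a][b] for a, b in product(cluster_a, cluster_b))

# Hypothetical pairwise distances between plasmids of two clusters.
dist = {
    "p1": {"p3": 2, "p4": 40},
    "p2": {"p3": 15, "p4": 60},
}
cluster_a = ["p1", "p2"]
cluster_b = ["p3", "p4"]

print(nearest_point(cluster_a, cluster_b, dist))
print(farthest_point(cluster_a, cluster_b, dist))
```

      Under the nearest-point definition, two clusters that are reported as far apart cannot contain any cross-cluster pair that is closer than the reported distance. That property is presumably what matters when plasmids from different clusters are pooled, although this reading is an inference from the description above rather than a statement from the authors.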

      Modifications in figures, manuscripts, and other aspects

      (1) Fig. 3 was updated to reflect the update of SAVEMONEY, although it did not show any important differences.

      (2) Parameter names were updated as follows:

      “threshold (pre)” -> “distance_threshold”

      “threshold (post)” -> “score_threshold”

      Added “number_of_groups”

      (3) The order of elements was rearranged in Fig. 4.

      (4) Incorrect calculations were fixed in Fig. 4g, h, and i (old Fig. 4d, h, and l). Related to that, Fig. 4j, k, and l and Table 1 were added, in addition to the explanation in the main manuscript.

      (5) SAVEMONEY was packaged and was released on PyPI to facilitate easy installation and integration by other developers.

      (6) SAVEMONEY was updated and expanded to accommodate linear DNA fragments, such as PCR amplicons and long synthetic DNA. Users can select the topology of DNA by specifying that as an option. A description of this new capability was added at the end of “Overview of the algorithm” section.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1:

      (…) some concerns with interpretations and technical issues make several major conclusions in this manuscript less rigorous, as explained in detail in comments below. In particular, the two major concerns I have: 1) the contradiction between the strong reduction of global translation, with puromycin incorporation gel showing no detectable protein synthesis in cold, and an apparently large fraction of transcripts whose abundance and translation in Fig. 2A are both strongly increased. 2) The fact that no transcripts were examined for dependence on IRE-1/XBP-1 for their induction by cold, except for one transcriptional reporter, and some weaknesses (see below) in data showing activation of IRE-1/XBP-1 pathway. The conclusion for induction of UPR by cold via specific activation of IRE-1/XBP-1 pathway, in my opinion, requires additional experiments.

      Relating to the first point, the results of puromycin incorporation and ribosome profiling are not contradictory. The former shows absolute changes in translation, i.e. changes in how much protein the cell is producing, while the latter shows relative changes between the produced proteins, i.e. how the cell prioritizes its protein production. An observed up-regulation in ribosome profiling does not necessarily mean (but could) that the corresponding protein goes up in absolute terms (units produced per time). Instead, it implies that out of the population of all translating ribosomes, a larger fraction is translating (prioritizing) this particular mRNA relative to other mRNAs. The second point is addressed later in the response.
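
      The distinction between relative and absolute changes can be made concrete with a small numerical sketch. All numbers below are hypothetical and chosen only to illustrate the argument; none are measurements from the study.

```python
def absolute_output(global_rate, share):
    """Absolute synthesis of one protein.

    global_rate: total translation relative to the 20 C reference
    share: fraction of translating ribosomes engaged on this mRNA
    """
    return global_rate * share

# Hypothetical scenario: global translation falls to 40% in the cold,
# while the mRNA's share of translating ribosomes doubles from 1% to 2%.
warm = absolute_output(global_rate=1.0, share=0.01)
cold = absolute_output(global_rate=0.4, share=0.02)

relative_change = 0.02 / 0.01   # the kind of change ribosome profiling reports
absolute_change = cold / warm   # change in units produced per time

print(f"relative (share of ribosomes): {relative_change:.1f}x")
print(f"absolute (units per time):     {absolute_change:.1f}x")
```

      In this scenario the transcript appears two-fold up-regulated in relative terms while its absolute output has fallen, so a strong drop in puromycin incorporation and an increase in ribosome footprints can coexist.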

      Major concerns:

      (1) Fig. 1B shows polysomes still present on day 1 of 4ºC exposure, but the gel in Fig. 1C suggests a complete lack of protein synthesis. Why?

      We realized that the selected gel exposure may give the false impression of a complete lack of puromycin incorporation at 4ºC. To avoid confusion, we now show in Figure 1 – figure supplement 1 the original gel image next to its longer exposure. The quantification of puromycin incorporation remains in Fig. 1C (it is based on 3 biological replicates and only one replicate is shown in the corresponding supplement). We hope it is now clear that there is ongoing puromycin incorporation/translation at 4ºC, albeit much reduced compared with 20ºC.

      What is then the evidence that ribosomal footprints used in much of the paper as evidence of ongoing active translation are from actual translating rather than still bound to transcripts but stationary ribosomes, considering that cooling to 4ºC is often used to 'freeze' protein complexes and prevent separation of their subunits? The authors should explain whether ribosome profiling as a measure of active translation has been evaluated specifically at 4ºC, or test this experimentally.

      While the ribosomal profiling alone might not prove ongoing translation, the residual puromycin incorporation does (see the longer gel exposure in Figure 1 – figure supplement 1). To strengthen this argument, we selected two additional genes (cebp-1 and numr-1) whose ribosomal footprints increase in the cold, and whose GFP-fusions were available from the CGC. Monitoring their expression, we observed the expected increase in the cold (see Figure 2 – figure supplement 3 A-B). The ongoing translation in the cold is also in line with our previous study (Peke et al., 2022), where we observed de novo protein synthesis of other proteins under the same cooling conditions as in this study.

      They should also provide some evidence (like Western blots) of increases in protein levels for at least some of the strongly cold-upregulated transcripts, like lips-11.

      As explained above, we addressed it by additionally examining two strains expressing GFP-fused proteins, whose translation in the cold is predicted to increase according to our ribosomal profiling data. See the new Figure 2 – figure supplement 3 A-B.

      As puromycin incorporation seems to be the one direct measure of global protein synthesis here, it conflicts with much of the translation data, especially considering that quite a large fraction of transcripts have increased both mRNA levels and ribosome footprints, and thus presumably increased translation at 4ºC, in Fig. 2A.

      We hope the above explanations put this concern to rest.

      Also, it is not clear how quantitation in Fig. 1C relates to the gel shown, the quantitation seems to indicate about 50-60% reduction of the signal, while the gel shows no discernable signal.

      As above, see the longer western blot exposure in Figure 1 – figure supplement 1 and note that the quantification is based on three biological replicates.

      (2) It is striking that plips-11::GFP reporter is induced in day 1 of 4ºC exposure, apparently to the extent that is similar to its induction by a large dose of tunicamycin (Fig. 3 supplement),

      We did not intend to compare the extent of induction between cold and tunicamycin treatment. The tunicamycin experiment was meant to confirm that, as suggested by expression data from Shen et al. 2005, lips-11 is upregulated upon UPR activation.

      …but the three IRE-1 dependent UPR transcripts from Shen 2005 list were not induced at all on day 1 (Fig. 4 supplement). Moreover, the accumulation of the misfolded CPL-1 reporter, that was interpreted as evidence that misfolding may be triggering UPR at 4ºC, was only observed on day 1, when the induction of the three IRE-1 targets is absent, but not on day 3, when it is stronger. How does this agree with the conclusion of UPR activation by cold via IRE-1/XBP-1 pathway?

      In the originally submitted supplemental figure, we compared mRNA levels between day 1 animals at 20ºC versus 4ºC. However, as argued later by this reviewer, it may be better to use day 0 animals at 20ºC as the reference (since at 20ºC the animals will continue producing embryos). Thus, we repeated the RT-qPCR analysis with additional time points (and genes relevant to other comments). This analysis, now in Figure 4 – figure supplement 2, shows that these mRNAs (dnj-27, srp-7, and C36B7.6) increased already at day 1 in the cold compared with the reference 20ºC animals on day 0, and their levels increased further on day 3.

      It is true that the authors do note very little overlap between IRE-1/XBP-1-dependent genes induced by different stress conditions, but for most of this paper, they draw parallels between tunicamycin-induced and cold induced IRE-1/XBP-1 activation.

      We carefully re-examined the manuscript to ensure that we do not draw parallels between cold and tunicamycin treatment. The three genes (dnj-27, srp-7, and C36B7.6) were taken from Shen et al. because that study reported lips-11 as an IRE-1-responsive gene, which we realized thanks to the Wormbase annotation of lips-11. Examining the three genes in our expression data, srp-7 (like lips-11) is also upregulated more than 2-fold, while the other two genes go up but less than 2-fold. As mentioned by the reviewer, we note little overlap between the different stress conditions, suggesting that the response is context dependent. Additional differences may arise if, as we hypothesize, UPR is activated in the cold in response to both protein and lipid stress. Note that the 2-fold cutoff used in the previous Figure 7 – figure supplement 1 was (erroneously) applied on the log2 scale, so it showed genes upregulated at least 4-fold. We have now corrected it to 2-fold. While there are now a few more overlapping genes, the overall conclusion, that there is little overlap between different conditions, did not change. We now list the shared genes in the new Supplementary file 5.
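
      The effect of applying the cutoff on the wrong scale can be checked with a short sketch. The example fold changes are hypothetical, and the function name is introduced here for illustration.

```python
from math import log2

def passes_cutoff(fold_change, log2_cutoff):
    """True if a gene's fold change meets a cutoff expressed on the log2 scale."""
    return log2(fold_change) >= log2_cutoff

intended_cutoff = log2(2)   # a 2-fold cutoff corresponds to 1.0 on the log2 scale
erroneous_cutoff = 2        # the value 2 applied directly on the log2 scale

# A log2 cutoff of 2 corresponds to a 2**2 = 4-fold change.
print(2 ** erroneous_cutoff)

# A hypothetical gene upregulated 3-fold passes the intended cutoff
# but is excluded by the erroneous one.
print(passes_cutoff(3, intended_cutoff), passes_cutoff(3, erroneous_cutoff))
```

      Genes with fold changes between 2 and 4 are the ones recovered by the correction, which is consistent with the slightly larger overlap reported after the fix.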

      The conclusion that "the transcription of some cold-induced genes reflects the activation of unfolded protein response (UPR)..." is based on analysis of only one gene, lips-11. No other genes were examined for IRE-1 dependence of their induction by cold, neither the other 8 genes that are common between the cold-induced genes here and the ER stress/IRE-1- induced in Shen 2005 (Venn diagram in Figure 7 supplement), nor the hsp-4 reporter. What is the evidence that lips-11 is not the only gene whose induction by cold in this paper's dataset depends on IRE-1? This is a major weakness and needs to be addressed.

      Furthermore, whether induction by cold of lips-11 itself is due to IRE1 activation was not tested, only a partial decrease of reporter fluorescence by ire-1 RNAi is shown. A quantitative measure of the change of lips-11 transcript in ire-1 and xbp-1 mutants is needed to establish if it depends on IRE-1/XBP-1 pathway.

      We have now examined by RT-qPCR whether the induction of the three genes from Shen et al. (dnj-27, srp-7, and C36B7.6), as well as lips-11 and hsp-4, depends on IRE-1. In the new Figure 4 – figure supplement 2, we show that the upregulation of all these genes in the cold is reduced in the ire-1 mutant (although in the wild type, the increase of hsp-4 mRNA appeared to be non-significant, despite the observed upregulation of the hsp-4 GFP reporter).

      The authors could provide more information and the additional data for the transcripts upregulated by both ER stress and cold, including the endogenous lips-11 and hsp-4 transcripts: their identity, fold induction by both cold and ER stress, how their induction is ranked in the corresponding datasets (all of these are from existing data), and do they depend on IRE-1/XBP-1 for induction by cold?

      As above, the dependence of endogenous lips-11 and hsp-4 on IRE-1 is now shown in the new Figure 4 – figure supplement 2, and the shared genes from Figure 7 – figure supplement 1 are listed in the new Supplementary file 5. We did not perform additional analysis comparing various data sets, as we felt that understanding the differences between IRE-1-mediated transcription outputs across different conditions goes well beyond this study.

      Without these additional data and considering that the authors did not directly measure the splicing of xbp-1 transcript (see comment for Fig. 3 below), the conclusion that cold induces UPR by specific activation of IRE-1/XBP-1 pathway is premature.

      To address the splicing of endogenous xbp-1, we examined our ribosome profiling data for the translation of spliced xbp-1, and found that the spliced variant is more abundant in the cold. This data is now shown in Figure 3 – figure supplement 2B.

      There are also technical issues that are making it difficult to interpret some of the results, and missing controls that decrease the rigor of conclusions:

      (1) For RNAseq and ribosome occupancy, were the 20ºC day 1 adult animals collected at the same time as the other set was moved to 4ºC, or were they additionally grown at 20ºC for the same length of time as the 4ºC incubations, which would make them day 2 adults or older at the time of analysis? This information is only given for SUnSET: "animals were cultivated for 1 or 3 additional days at 4ºC or 20ºC".

      In the RNAseq experiments, the 20ºC animals were collected at the same time as the others were moved to 10ºC (and then 4ºC), so they were not additionally grown at 20ºC. We now make this clear in Methods.

      This could be a major concern in interpreting translation data: First, the inducibility of both UPR and HSR in worms is lost at exactly this transition, from day 1 to day 2 or 3 adults, depending on the reporting lab (for example Taylor and Dillin 2013, Labbadia and Morimoto, 2015, De-Souza et al 2022).

      As explained above, the 20ºC animals were collected at the same time as the others were moved to 4ºC. In addition, we reported previously that ageing appears to be suppressed in animals incubated at 4ºC (Habacher et al., 2016; Figure S1C). Thus, in terms of their biological age, cold-incubated animals appear to be closer to the 20ºC animals at the time they are moved to the cold (day 0). The ageing-associated deterioration in UPR inducibility mentioned above therefore presumably does not apply to cold-incubated animals, which is in line with the observed IRE-1-dependent upregulation of several genes in day 3 animals at 4ºC.

      How do authors account for this? Would results with reporter induction, or induction of IRE-1 target genes in Fig. 4, change if day 1 adults were used for 20ºC?

      Our analysis in Figure 4 – figure supplement 2 now includes 20ºC animals at day 0, 1, and 3.

      Second, if animals at the time of shift to 4ºC were only beginning their reproduction, they will presumably not develop further during hibernation, while an additional day at 20ºC will bring them to the full reproductive capacity. Did 4ºC and 20ºC animals used for RNAseq and ribosome occupancy have similar numbers of embryos, and were the embryos at similar stages?

      As explained above, the reference animals at 20ºC were young adults containing few embryos. Indeed, at 4ºC the animals do not accumulate embryos. Although we cannot say that for all genes, note that the genes analysed in Figure 4 – figure supplement 2 increase in abundance also when compared with the day 3 animals kept at 20ºC.

      (2) Second, no population density is given for most of the experiments, despite the known strong effects of crowding (high pheromone) on C. elegans growth. From the only two specifics that are given, it seems that very different population sizes were used: for example, 150 L1s were used in survival assay, while 12,000 L1s in SUnSET. Have the authors compared results they got at high population densities with what would happen when animals are grown in uncrowded plates? At least a baseline comparison in the beginning should have been done.

      None of the experiments involved crowded populations. In the SUnSET experiments, we just used larger and more plates to obtain sufficient material.

      (3) Fig. 3: it is unclear why the accepted and well characterized quantitative measure of IRE1 activation, the splicing of xbp-1 transcript, is not determined directly by RT-PCR. The fluorescent XBP-1spliced reporter, to my knowledge, has not been tested for its quantitative nature and thus its use here is insufficient. Furthermore, the image of this fluorescent reporter in Fig. 3b shows only one anterior-most row of cells of intestine, and quantitation was done with 2 to 5 nuclei per animal, while lips-11 is induced in entire intestine. Was there spliced XBP-1 in the rest of the intestinal nuclei? Could the authors show/quantify the entire animal (20 intestinal cells) rather than one or two rows of cells?

      As explained above, we have now included the analysis of xbp-1 splicing in Figure 3 – figure supplement 2B. As for the fluorescent reporter, it is difficult to measure all gut nuclei since part of the gut is occluded by the gonad. Nonetheless, we do see induction of the reporter in other gut nuclei and now show additional examples from the midgut in Figure 3 – figure supplement 2A.

      (4) The differences in the outcomes from this study and the previous one (Dudkevich 2022) that used 15ºC to 2ºC cooling approach are puzzling, as they would suggest two quite different IRE-1 dependent programs of cold tolerance. It would be good if authors commented on overlapping/non-overlapping genes, and provided their thoughts on the origin of these differences considering the small difference in temperatures.

      Indeed, there seem to be substantial differences between different temperatures and cooling paradigms. While our understanding of the C. elegans responses to cold is still in its infancy, one possible explanation for the observed differences is that we used different starting growth temperatures. While the initial populations in our study were grown at 20ºC, Dudkevich et al. used 15ºC. Worms display profound physiological differences between these two temperatures. For example, Xiao et al. (2013) showed that the cold-sensitive TRPA-1 channel is important at 15ºC but not 20ºC. Thus, the trajectories along which worms adapt to near-freezing temperatures may vary depending on their initial physiological state (and perhaps the target temperature, as we used 4ºC and they 2ºC). We have now expanded the argumentation on this topic in the Discussion. I should also say that we planned on testing NLP-3 function in our paradigm, but our request for strains remained unanswered.

      Second, have the authors performed a control where they reproduced the rescue by FA supplementation of poor survival of ire-1 mutants after the 15ºC to 2ºC shift? Without this or another positive control, and without measuring change in lipid composition in their own experiments, it is unclear whether the different outcomes with respect to FAs are due to a real difference in adaptive programs at these temperatures, or to failure in supplementation?

      While we did not re-examine the findings by Dudkevich et al., we have now included another positive control. As reported by Hou et al. (2014), supplementing unsaturated FAs rescues the induction of the hsp-4 reporter in fat-6 RNAi-ed animals. Although we were able to reproduce that result (Figure 6 – figure supplement 1), the same supplementation procedure did not suppress the lips-11 reporter (Figure 6 – figure supplement 2).

      (5) Have the authors tested whether and by how much ire-1(ok799) mutation shortens the lifespan at 20ºC? This needs to be done before the defect in survival of ire-1 mutants in Fig. 7a can be interpreted.

      The lifespan at standard cultivation temperature was examined by others (Henis-Korenblit et al., 2010; Hourihan et al., 2016), showing that ire-1(ok799) mutants live shorter. However, while some mechanisms that prolong lifespan may also improve cold survival, the two phenomena are not identical, and whether IRE-1 facilitates longevity and cold survival in the same or different ways remains to be seen.

      Reviewer #2:

      (1) The conclusions regarding a general transcriptional response are based on one gene, lips-11, which does not affect survival in response to cold. We would suggest altering the title, to replace "Reprograming gene expression" with "Regulation of the lipase lips-11".

      We have now examined IRE-1-dependent induction of additional genes (see Figure 4 – figure supplement 2). While we do not know what fraction of cold-induced genes depends on IRE-1, we feel that our findings justify the statement that gene expression in the cold involves the IRE-1/XBP-1 pathway (title) or that the transcription of some/a subset of cold-induced genes depends on this pathway (in abstract, model, and discussion).

      (2) There is no gene ontology with the gene expression data.

      We have now included the top 10 most enriched and suppressed gene categories between 10ºC and 4ºC (since the biggest change happens between these conditions, as shown in Figure 2 – figure supplement 1A). This is now included in Figure 2 – figure supplement 2.

      (3) Definitive conclusions regarding transcription vs translational effects would require use of blockers such as alpha-amanitin or cycloheximide.

      As explained also for reviewer 1, we confirmed now that at least some genes, whose translation is upregulated based on the ribosome profiling, are indeed upregulated in the cold at the protein level (Figure 2 – figure supplement 3A-B). Thus, the increase in ribosomal occupancy seems to accurately reflect increased translation. Since mRNA levels correlate overall with the ribosomal occupancy, it appears that the mRNA levels are the main determinants of the translation output. Because the lips-11 promoter is sufficient to upregulate the GFP reporter in the cold, it further suggests that the regulation happens at the transcription level. It is true that at this point we cannot completely rule out the effects of mRNA stability, which we clearly acknowledge in the discussion.

      (4) Conclusions regarding the role of lipids are based on supplementation with oleic acid or choline, yet there is no lipid analysis of the cold animals, or after lips-1 knockdown.

      We agree that this is an important direction for future studies but feel that lipidomic analysis goes beyond the scope of current work.

      Although choline is important for PC production, adding choline in normal PC could have many other metabolic impacts and doesn't necessarily implicate PC without lipidomic or genetic evidence.

      We agree and now acknowledge this in the Discussion: “However, choline also plays other roles, including in neurotransmitter synthesis and methylation metabolism. Thus, we cannot yet rule out the possibility that the protective effects of choline supplementation stem from functions outside PC synthesis.”

      Reviewer #3:

      The study has several weaknesses: it provides limited novel insights into pathways mediating transcriptional regulation of cold-inducible genes, as IRE-1 and XBP-1 are already well-known responders to endoplasmic reticulum stress, including that induced by cold.

      We presume the reviewer refers to the study by Dudkevich et al. (2022). As explained in our manuscript, there are important differences between that study and ours in how the IRE-1 signalling is utilized and to what ends.

      Additionally, the weak cold sensitivity phenotype observed in ire-1 mutants casts doubt on the pathway's key role in cold adaptation. The study also overlooks previous research (e.g., PMID: 27540856) that links IRE-1 to SKN-1, another major stress-responsive pathway, potentially missing important interactions and mechanisms involved in cold adaptation.

      We state in the manuscript that the IRE-1 pathway plays a modest but significant role in cold adaptation and state in the Fig. 7 model and Discussion that additional pathways work alongside IRE-1 to drive cold-specific gene expression.

      Recommendations for the authors:

      Reviewer #1:

      Minor comments:

      (1) Fig. 2B - reporter expression seems to be already present in the intestine of 20ºC animals. What is the turnover rate of GFP in the intestine and how is it affected by the temperature shift? If GFP degradation is inhibited, could it explain the increase in signal in 4ºC animals, rather than increased transcription? This seems to be true for the hsp-4 transcriptional reporter, as the GFP fluorescence appears to increase during 4ºC incubation (Fig. 4a), but the hsp-4 message levels are only increased after 1 day but not in later days at 4ºC, based on the RNAseq in provided dataset. How well do changes in lips-11 reporter fluorescence correspond to the changes in the endogenous lips-11 transcript?

      Note that increased GFP fluorescence is accompanied by increased mRNA levels. In addition to the RNAseq data, we have now also examined changes in the endogenous lips-11 transcript by RT-qPCR and observed its strong (and IRE-1-dependent) upregulation in the cold (see Figure 4 – figure supplement 2). Moreover, we have now included two other examples of GFP-tagged proteins whose fluorescence increases in the cold, concomitant with increased mRNA levels and ribosomal occupancy (Figure 2 – figure supplement 3A-B).

      (2) Descriptions of methods to measure different aspects of translation are very abbreviated and in some places make it difficult to understand the paper. One example - what is RFP in Fig. 2a?

      We have now replaced “RFP” with “RPF” (ribosome-protected fragment), and the abbreviation is explained the first time it is used.

      (3) How was the effectiveness of RNAi at 4ºC validated?

      As explained in Methods, we subjected animals to RNAi long before they were transferred to 4ºC, so the corresponding protein is depleted prior to cooling.

      (4) Several of the conclusions on translation and ribosomal occupancy are written in a somewhat confusing way. For example, the authors state that "shift from 10ºC to 4ºC had a strong effect" when describing "impact on translation (ribosomal occupancy)" (page 4), but in the next sentence, they state "a good correlation between mRNA levels and translation (Figure 2A)". Was ribosomal occupancy normalized to the transcript abundance?

      We do not perceive any discrepancy between the two statements. The former refers to the difference between time points, where we observed the largest change in both the transcriptome and ribosomal occupancy from 10ºC to 4ºC (as can be inferred from the PCA plot in Figure 2 – figure supplement 1). The latter refers to the observation that changes in mRNA levels mirrored, in most cases, similar changes in the ribosomal occupancy.

      The ribosomal occupancy was not normalized, as that would essentially normalize the y-axis (ribosomal occupancy) with the x-axis (mRNA), and so express changes in “translational efficiency” as a function of changes in mRNA abundance. While this type of analysis can also reveal interesting biological phenomena, it would explore a different question.
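
      The distinction between the two quantities can be sketched with made-up numbers. This is a minimal illustration only: the function names and read counts below are hypothetical and are not taken from our dataset or analysis pipeline. It assumes that a change in “translational efficiency” is defined as the log2 change in ribosomal occupancy minus the log2 change in mRNA abundance.

```python
import math


def log2_fold_change(cold: float, warm: float) -> float:
    """Log2 ratio of a measurement in the cold versus the warmer condition."""
    return math.log2(cold / warm)


def translational_efficiency_change(
    rpf_cold: float, rpf_warm: float, mrna_cold: float, mrna_warm: float
) -> float:
    """Change in ribosomal occupancy after normalizing to mRNA abundance.

    On a log scale, dividing occupancy by mRNA becomes a subtraction.
    """
    return log2_fold_change(rpf_cold, rpf_warm) - log2_fold_change(mrna_cold, mrna_warm)


# Hypothetical gene A: mRNA and ribosomal occupancy both rise 4-fold in the cold.
occupancy_a = log2_fold_change(400.0, 100.0)
efficiency_a = translational_efficiency_change(400.0, 100.0, 40.0, 10.0)

# Hypothetical gene B: occupancy rises 8-fold while mRNA rises only 2-fold.
occupancy_b = log2_fold_change(800.0, 100.0)
efficiency_b = translational_efficiency_change(800.0, 100.0, 20.0, 10.0)

print(occupancy_a, efficiency_a)
print(occupancy_b, efficiency_b)
```

      In this sketch, gene A shows a large change in ribosomal occupancy but no change in the normalized value, because the occupancy change is fully accounted for by the mRNA change. Only gene B, whose occupancy changes more than its mRNA, retains a non-zero normalized value.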

      (5) "For most transcripts ... increased the abundance of a particular protein appears to correlate depend primarily on the abundance of its mRNA" (page 5). This is an overstatement, the protein levels were not quantified.

      As explained above, we have now additionally monitored the expression of two GFP-tagged proteins (CEBP-1 and NUMR-1) and observed the expected increase in GFP fluorescence in the cold (see Figure 2 – figure supplement 3A-B). Although we did not also examine them by western blot, these observations are in line with our conclusions.

      (6) The statement "Since transcription is the main determinant of mRNA levels, these results suggest that cold-specific gene expression primarily depends on transcription activation" seems to assume that message degradation doesn't have much of an impact at 4ºC. What is the evidence here? The authors themselves later suggest either transcription or mRNA stability in Discussion.

      While we cannot exclude that the mRNA stability of some genes may be affected, this concern is more valid for the messages that go down in the cold. Although we have done this for only selected genes, each time we observed an increase in mRNA levels, we also observed a corresponding increase in the protein (this study and Pekec et al., 2022). In addition, the lips-11 reporter was designed to monitor the activity of its promoter, which we showed is sufficient to upregulate reporter GFP in the cold. We have now expanded the corresponding paragraph in the Discussion, which will hopefully come across as more balanced.

      Reviewer #2:

      (1) Alter title, conclusions to better reflect specific nature of the work.

      We have now provided additional data and feel that they justify our conclusions and title.

      (2) Use Gene Ontology searches to look at patterns of gene expression in RNA seq data.

      We now show it in Figure 2 – figure supplement 2.

      (3) Use genetic or lipidomic tools rather than solely adding exogenous lipids.

      We agree that lipidomic analysis is an important direction for future research but feel that it, along with further genetic experiments, goes beyond the scope of the current manuscript.

      Reviewer #3:

      To strengthen the evidence for the role of IRE-1 in cold adaptation, the authors might consider performing additional functional assays, such as testing the effects of IRE-1 and XBP-1 mutations under varying cold conditions and testing the genetic interaction of ire-1 with xbp-1, skn-1, and hsf-1 in cold sensitivities. It is also worth using alternative approaches such as independent alleles of ire-1, knockdowns or tissue-specific knockouts (without potential developmental compensation in global constitutive mutants) to better characterize the contribution of IRE-1 to cold adaptation. Additionally, studies that examine tissue-specific responses to cold exposure could provide important insights, as different tissues may utilize distinct molecular pathways to adapt to cold stress.

      We also tested ire-1 and xbp-1 functions by RNAi-mediated depletion. SKN-1 is a good candidate for future studies, but Horikawa et al. (2024) showed that HSF-1 is not required for cold dormancy (at 4ºC); we also now show that HSF-1::GFP does not increase in the cold (Figure 2 – figure supplement 3C).

      This reviewer also recommends clarifying the novelty of your findings in the context of existing literature, particularly regarding the established roles of IRE-1 and XBP-1 in responding to endoplasmic reticulum stress.

      The entry point of this study was to clarify a long-standing problem in hibernation research, i.e., the apparent discrepancy between a global translation repression and de novo gene expression observed in the cold. By connecting cold-mediated expression of some genes to the IRE-1/XBP-1 pathway, we strengthen the argument for transcription-mediated gene regulation in hibernating animals. We did go the extra mile to test the possible reason behind the activation of UPR<sup>ER</sup> in the cold but feel that a deeper analysis deserves a separate study.

      The term "hibernation" should be avoided or reworded since the study does not provide direct behavioral or physiological evidence for hibernation-like states; instead, the manuscript could refer to "cold-induced responses" or "adaptations to cold temperatures."

      The term “hibernation” has been used before even in the context of the C. elegans dauer state, which, arguably, is even less appropriate. In addition to the global suppression of translation shown here, we reported before that the same cooling regime suppresses ageing (Habacher et al., 2016; Figure S1C). Incubating at 4ºC also arrests C. elegans development (Horikawa et al., 2024). Thus, while worm and mammalian hibernation are certainly not equivalent, which we clearly spell out, we like to use “hibernation” interchangeably with “cold dormancy” to draw attention to a fascinating aspect of C. elegans biology. Still, we now use quotation marks in the title to avoid misunderstanding.

      The discussion could be strengthened by addressing the relevance of prior studies, such as those linking IRE-1 to SKN-1 (PMID: 27540856), TRPA-1 (PMID: 23415228), ZIP-10 (PMID: 29664006), HSF-1 (PMID: 38987256) in cold adaptation and elaborating on how your findings provide new

      The IRE-1/SKN-1 and ZIP-10 papers are now mentioned when describing the model in Figure 7. The TRPA-1 and HSF-1 papers are cited when discussing physiological differences between different cold temperatures. Consistent with our studies, the HSF-1 paper shows that nematodes enter a dormant state at 4ºC (but continue developing at 9ºC and higher temperatures). Importantly, HSF-1 promotes development at 9ºC but is not important for the arrest at 4ºC. We also now show in Figure 2 – figure supplement 3C that HSF-1 does not go up at 4ºC.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) The authors conclude that the committed progenitors revert to GSCs based on the coexpression of nanos2 and foxl2l and based on expression of id1 in mutants but not in WT. Without functional data demonstrating that the progenitors revert to an earlier state, alternative interpretations should be considered. For example, it is possible that the cells initiate the committed progenitor program but continue to express the GSC program and that the coexpression of both programs blocks differentiation.

      Thank you for your insightful comment. We have explored possible alternative interpretations of our data. Regarding the suggested possibility of a continued GSC program in the mutant, we examined the expression of GSC markers, including nanos2, in the mutant at different stages. We found that in the mutant, nanos2 and other GSC markers were not significantly upregulated in the GSC-to-progenitor transition (G-P) and early progenitors (Prog-E) (Fig. 4B). The expression of these GSC markers was also low in the integrated clusters I4-I6, where the G-P and Prog-E stages were prominent (Fig. 3D and Fig. 3E). The GSC marker nanos2 was high only in mutant Prog-C. These results argue against a continued GSC program in the foxl2l mutants. Another possible explanation is that some mutant Prog-C cells acquire GSC properties through the upregulation of nanos2, rather than maintaining a continuous GSC program. We have now clarified our rationale about mutant cells gaining new GSC properties and included both interpretations in the Results.

      Consistent with this possibility, some Fox family members, FoxL2 and FoxPs for example, are known to be both activators and repressors of transcription or act primarily as repressors. Potentially relevant to this work, repressive activity of FoxL2 has been previously reported in the mammalian ovary (Pisarska et al Endocrinology 2004, Pisarska Am J. Phys Endo. Metabolism 2010, Kuo Reproduction 2012, Kuo Endocrinology 2011, as well as more recent publications). In that context interfering with FoxL2 was proposed to cause upregulated expression of genes normally repressed by FoxL2, accelerated follicle recruitment, and premature ovarian failure.

      FoxL2 exerts both activating and repressive activities. We believe that Foxl2l can also activate and repress the expression of its target genes. Although its target genes have not been clearly identified, Foxl2l may activate genes involved in processes such as oogenic meiosis and may also repress genes involved in other processes, perhaps including nanos2.

      (2) The authors conclude that the committed progenitor stage is "the gate toward female determination" and that the cells "stay at S-Phase temporarily before differentiation". This conclusion seems to be based solely on single cell RNAseq expression. In several species, including zebrafish, meiotic entry occurs earlier in females and has been correlated with ovary development. The possibility that the late progenitor stage, the stage when meiotic genes are detected in this study and a stage missing in foxl2l mutants, is actually the key stage for female determination cannot be excluded by the data provided.

      We agree that Prog-L is important for the initiation of female meiosis. We have revised the text to point out the importance of Prog-L in female differentiation.

      (3) The authors discuss prior work showing that loss of germ cells leads to male development and that germ cells are required for female development and claim to extend that work by showing here that some progenitors are already sexually differentiated. First, the stages compared are completely different. The earlier work looks at the primordial germ cells and their loss in the first few days of development before a gonad forms. In contrast, this work examines stages well after the gonad has formed and during sex determination.

      Both previous studies and our study indicate the important role of germ cells in zebrafish sex differentiation during gonadal development. The earlier works show that the abundance of primordial germ cells contributes to sex differentiation. Our current finding further suggests the existence of female identity in some germ cells at the juvenile stage, and we discuss the importance of these cells in sexual differentiation. We have added the developmental age in our study to emphasize the age difference.

      The second concern is that the conclusion that the progenitors are differentiated is based solely on the expression of foxl2l, which is initially expressed in the juvenile ovary state that lab strains have been shown to develop through (Wilson et al Front Cell Dev Bio 2024). While it is fair to state that some cells express ovary markers at this stage, it is unclear that this is sufficient evidence that the cells are differentiated.

      The conclusion about the differentiation of progenitors is not based solely on foxl2l expression; rather, it is based on the whole transcriptomic profiles of both WT (Figure 1B) and foxl2l mutant cells (Figure 3A) as well as the foxl2l mutant phenotype (Figure 2C). Three types of progenitors, Prog-E, Prog-C and Prog-L, were identified by whole transcriptomic analysis in WT. In foxl2l mutants, the transcriptomic profile further shows that Prog-L and meiotic cells are completely lost, and all germ cells eventually undergo male differentiation. Together, these results indicate that the differentiation of Prog-C to Prog-L guides the progenitors toward female differentiation. Our results also showed that in the juvenile gonad, foxl2l expression is high in two types of progenitors, Prog-C and Prog-L, and becomes low after meiotic entry.

      For example, in the context of the foxl2l mutant, the authors observe that GSCs and early progenitors inappropriately express foxl2l, but the mutants develop as males. Thus, expression of foxl2l transcripts alone is insufficient evidence to claim that the cells are already differentiated as female.

      The foxl2l mutants develop into males because they lack functional Foxl2l. Although the mutated foxl2l transcript is present in mutant cells, these transcripts are not functional. These mutants develop into males eventually. This result is consistent with our claim that functional Foxl2l is important for the development of Prog-L and female differentiation.

      (4) The comparison between medaka and zebrafish foxl2l mutants seems to suggest that Foxl2l is required for meiosis in medaka but has a different role in zebrafish. However, if foxl2l represses the earlier developmental programs of GSCs and early progenitors, it is possible that continued expression of these early programs interferes with activation of meiotic genes. This could account for the absence of the late progenitor stage in foxl2l mutants since the late progenitor stage is defined by and distinguished from the earlier stages by expression of foxl2l and meiotic genes. If so, foxl2l may be similarly required in both systems.

      Medaka and zebrafish Foxl2l may share similar functions such as the stimulation of meiotic gene expression and promotion of oogenesis in the female germ cells preparing for meiotic entry. In addition, we also detected aberrant upregulation of nanos2 in some foxl2l mutant cells. The idea that “continued expression of these early programs interferes with activation of meiotic genes” is conceivable, but for now we have no evidence for it. We do not know whether the absence of meiotic genes is due to an interference caused by the activation of nanos2 or due to the complete loss of Prog-L and meiotic cells. It will also be interesting to find out whether medaka Foxl2l has a role in early progenitors.

      (5) The authors state that "Foxl2l may ensure female differentiation by preventing stemness and antagonizing male development." It is unclear why suppressing stemness would be necessary for female differentiation since female zebrafish have stem cells as do male zebrafish. It seems likely that turning off the GSC and early differentiation programs is important for allowing expression of meiosis and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes.

      It is true that we have not proved that suppression of stemness is required for female differentiation, and our earlier statement may have been misleading. We agree that it is likely that turning off the GSC and early differentiation programs is important for allowing expression of meiotic and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes. To avoid confusion, we have modified our statement in the text.

      (6) Based on its expression in mutant progenitors, p53 is proposed to assist with alternative differentiation of mutant germ cells. Although p53 transcripts are expressed, no evidence is provided that p53 is involved in differentiation of germ cells, and sex bias has not been associated with the published p53 mutants in zebrafish. Furthermore, while p53 has been shown to be important for ovary to testis transformation in mutant contexts in adults, it appears dispensable for testis development in mutants that disrupt ovary differentiation in earlier stages (Rodriguez-Mari et al PLoS Gen 2010, Shive PNAS 2010, Hartung et al Mol. Reprod. Dev 2014, Miao Development 2017, Kaufman et al PLoS Gen 2018, Bertho et al Development 2021). It is possible that p53 eliminates foxl2l mutant germ cells that are simultaneously expressing multiple developmental programs, but this possibility would need to be tested.

      The tp53<sup>-/-</sup>foxl2l<sup>-/-</sup> double mutation does not alleviate the all-male phenotype of the foxl2l<sup>-/-</sup> mutant (Dev Biol, 517, 91-99, 2024), indicating that the male development is not due to p53-mediated germ cell apoptosis. We have cited the suggested papers and compared the role of tp53 among the mutants (fancl, zar1, etc.) described in those papers. Since tp53 was enriched in certain foxl2l<sup>-/-</sup> mutant cell clusters, and tp53 mutation fails to rescue the all-male phenotype, it is possible that p53 expressed in these mutant cell clusters has roles other than inducing apoptosis. One assumption is that p53 may be involved in germ cell differentiation, especially since p53 is known to promote adipogenesis and the differentiation of airway epithelial progenitors and embryonic stem cells. We have emphasized in the Discussion that the suggested role of p53 in germ cell differentiation is our assumption.

      Reviewer #3 (Public Review):

      This is the first report to show that a transcription factor, foxl2l, is essential for the development of female germ cells. Without foxl2l, germ cells develop into sperm. The report also clearly defines the arrested stage of early germ cells in foxl2l mutants, that is, the stage at which foxl2l is critical for the further development of female germ cells.

      (1) Due to the lack of cell lineage tracing, the claim that foxl2l suppresses dedifferentiation of progenitor cells to GSCs, based on gene expression and cell number changes, is weak.

      Thank you for your comments pointing out both our contribution and its weaknesses. We acknowledge the lack of direct evidence for the reversion of mutant Prog-C to GSCs in our data. We have now removed the claim about the repression of stemness by Foxl2l.

      (2) In addition, separation of early germ cell types in foxl2l mutant using marker genes from WT may not be optimal.

      The cell types of the mutant cells were determined by two independent analyses. The first infers the developmental stage of mutant cells. This approach assumes that mutant cells can indeed be mapped to specific WT stages through their transcriptomic profiles. However, as indicated by the reviewer’s comment, mutant cells exhibited heterogeneity and can be distinct from WT cells, so defining cell types in mutants by WT markers may not be optimal. To address this, we conducted a second analysis, co-clustering. Mutant cells and WT cells at early stages (GSC, G-P, Prog-E, Prog-C(S) and Prog-C) were co-clustered. This approach does not assume a direct correspondence between mutant and WT developmental stages. Instead, it facilitates the identification of novel germ cell types in mutants while characterizing the relationship between WT and mutant cells. In some clusters, both WT and mutant cells were present, indicating high transcriptomic similarity. In other clusters, most cells were mutant cells, indicating distinct mutant cell types (Figure 3C). We can, therefore, assign developmental properties to these mutant cells with confidence.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pradhan et al investigated the potential gustatory mechanisms that allow flies to detect cholesterol. They found that flies are indifferent to low cholesterol and avoid high cholesterol. They further showed that the ionotropic receptors Ir7g, Ir51b, and Ir56d are important for the cholesterol sensitivity in bitter neurons. The figures are clear and the behavior result is interesting. However, I have several major comments, especially on the discrepancy of the expression of these Irs with other lab published results, and the confusing finding that the same receptors (Ir7g, Ir51b) have been implicated in the detection of various seemingly unrelated compounds.

      Strengths:

      The results are very well presented, the figures are clear and well-made, text is easy to follow.

      Weaknesses:

      (1) Regarding the expression of Ir56d. The reported Ir56d expression pattern contradicts multiple previous studies (Brown et al., 2021 eLife, Figure 6a-c; Sanchez-Alcaniz et al., 2017 Nature Communications, Figure 4e-h; Koh et al., 2014 Neuron, Figure 3b). These studies, using three different driver lines, consistently showed Ir56d expression in sweet-sensing neurons and taste peg neurons. Importantly, Sanchez-Alcaniz et al. demonstrated that Ir56d is not expressed in Gr66a-expressing (bitter) neurons. This discrepancy is critical since Ir56d is identified as the key subunit for cholesterol detection in bitter neurons, and misexpression of Ir7g and Ir51b together is insufficient to confer cholesterol sensitivity (Fig.4b,d). Which Ir56d-GAL4 (and Gr66a-I-GFP) line was used in this study? Is there additional evidence (scRNA sequencing, in-situ hybridization, or immunostaining) supporting Ir56d expression in bitter neurons?

      We agree that the expression pattern of Ir56d diverges from two prior reports. The studies by Brown et al. and Koh et al. employed the same Ir56d-GAL4 driver line, which exhibited expression in sweet-sensing gustatory receptor neurons (GRNs) and taste peg neurons, but not bitter GRNs (the Sanchez-Alcaniz et al. paper did not use an Ir56d-GAL4).

      In our study, we used an Ir56d-GAL4 driver line (KDRC:2307) and the Gr66a-I-GFP reporter line (Weiss et al., 2011 Neuron). This is a crucial distinction, as differences in the regulatory regions used to generate different driver lines are well known to underlie differences in expression patterns. Our double-labeling experiments revealed co-expression of Ir56d with Gr66a-positive bitter GRNs specifically within the S6 and S7 sensilla, types previously shown to exhibit strong electrophysiological responses to cholesterol (Figure 2—figure supplement 1F).

      We believe this observation is biologically significant and consistent with our functional data. Specifically, targeted expression of Ir56d in bitter neurons using the Gr33a-GAL4 was sufficient to rescue cholesterol avoidance behavior in Ir56d<sup>1</sup> mutants (Figure 3G). These results demonstrate that Ir56d plays a functional role in bitter GRNs for cholesterol detection. The convergence of genetic, behavioral, and electrophysiological data presented in our study provides compelling support for this previously unappreciated expression pattern and function of Ir56d.

      (2) Ir51b has previously been implicated in detecting nitrogenous waste (Dhakal 2021), lactic acid (Pradhan 2024), and amino acids (Aryal 2022), all by the same lab. Additionally, both Ir7g and Ir51b have been implicated in detecting cantharidin, an insect-secreted compound that flies may or may not encounter in the wild, by the same lab. Is Ir51b proposed to be a specific receptor for these chemically distinct compounds or a general multimodal receptor for aversive stimuli? Unlike other multimodal bitter receptors, the expression level of Ir51b is rather low and it's unclear which subset of GRNs express this receptor. The chemical diversity among nitrogenous waste, amino acids, lactic acid, cantharidin, and cholesterol raises questions about the specificity of these receptors and warrants further investigation and at a minimum discussion in this paper. Given the wide and seemingly unrelated sensitivity of Ir51b and Ir7g to these compounds I'm leaning towards the hypothesis that at least some of these is non-specific and ecologically irrelevant without further supporting evidence from the authors.

      While it is true that IR51b and IR7g are responsive to a range of compounds, these compounds share chemical features such as nitrogen-containing groups, hydrophobicity, or amphipathic structures, suggesting that their recognition may be mediated by the same or overlapping domains within the receptor complexes. These features could facilitate binding to a structurally diverse yet chemically related group of aversive ligands.

      In the case of cholesterol, while its sterol ring system is distinct from the other compounds, it shares hydrophobic and amphipathic properties that may enable interaction with these receptors via similar structural motifs. Importantly, our data demonstrate that Ir51b and Ir7g are necessary but not sufficient on their own to confer cholesterol sensitivity, indicating that additional co-factors or receptor subunits are required for full functionality (Figure 4B, D). Furthermore, our dose-response analysis (Figure 3F) shows that Ir7g is particularly important at higher cholesterol concentrations, supporting the idea of graded sensitivity rather than indiscriminate activation. This suggests that these receptors may have evolved to recognize cholesterol and its analogs (e.g., phytosterols such as stigmasterol, yet to be tested), which are naturally found in the fly’s diet (e.g., yeast and plant-derived matter), as ecologically relevant cues signaling microbial contamination, lipid imbalance, or dietary overconsumption.

      We acknowledge the reviewer’s concern regarding the relatively low expression levels of Ir51b and Ir7g. However, we note that low transcript abundance does not necessarily equate to diminished physiological relevance. Finally, we agree that the chemical diversity of ligands associated with Ir51b and Ir7g warrants deeper investigation, particularly through structure-function studies aimed at identifying ligand-binding domains and receptor-ligand interactions at atomic resolution.

      (3) The Benton lab Ir7g-GAL4 reporter shows no expression in adults. Additionally, two independent labellar RNA sequencing studies (Dweck, 2021 eLife; Bontonou et al., 2024 Nature Communications) failed to detect Ir7g expression in the labellum. This contradicts the authors' previous RT-PCR results (Pradhan 2024 Fig. S4, Journal of Hazardous Materials) showing Ir7g expression in the labellum. Additionally the Benton and Carlson lab Ir51b-GAL4 reporters show no expression in adults as well. Please address these inconsistencies.

      With respect to Ir7g, we acknowledge that the Ir7g-GAL4 reporter line from the Benton lab does not exhibit detectable expression in adult labella. Furthermore, two independent transcriptomic studies (Dweck et al., 2021, eLife; Bontonou et al., 2024, Nature Communications) also did not detect Ir7g transcripts in bulk RNA-seq datasets derived from adult labella. However, our previously published RT-PCR data (Pradhan et al., 2024, Journal of Hazardous Materials, Fig. S4) revealed Ir7g expression in labellar tissue, albeit at low levels. Our RT-PCR included an internal control (tubulin) amplified in the same reaction tube, with the Ir7g mutant as a negative control. Therefore, we stand behind the finding that Ir7g is expressed in the labellum.

      We would like to point out that RT-PCR is more sensitive and better suited to detecting low-abundance transcripts than bulk RNA-seq, which may fail to capture transcripts due to limitations in depth of coverage. Moreover, immunohistochemistry can have limitations in detecting very low expression levels. Costa et al. (2013, Translational Lung Cancer Research) state that “RNA-Seq technique will not likely replace current RT-PCR methods, but will be complementary depending on the needs and the resources as the results of the RNA-Seq will identify those genes that need to then be examined using RT-PCR methods”.

      Similarly, regarding Ir51b, while the GAL4 reporter lines from the Benton and Carlson labs do not show robust adult expression, our RT-PCR and functional data strongly support a role for Ir51b in labellar bitter GRNs. Specifically, Ir51b<sup>1</sup> mutants display electrophysiological deficits in response to cholesterol (Figure 2A–B), and these defects are rescued by expressing Ir51b in Gr33a-positive bitter neurons (Figure 3G), providing functional validation of the RT-PCR expression.

      (4) The premise that high cholesterol intake is harmful to flies, which makes sensory mechanisms for cholesterol avoidance necessary, is interesting but underdeveloped. Animal sensory systems typically evolve to detect ecologically relevant stimuli with dynamic ranges matching environmental conditions. Given that Drosophila primarily consume fruits and plant matter (which contain minimal cholesterol) rather than animal-derived foods (which contain higher cholesterol), the ecological relevance of cholesterol detection requires more thorough discussion. Furthermore, at high concentrations, chemicals often activate multiple receptors beyond those specifically evolved for their detection. If the cholesterol concentrations used in this study substantially exceed those encountered in the fly's natural diet, the observed responses may represent an epiphenomenon rather than an ecologically and ethologically relevant sensory mechanism. What is the cholesterol content in flies' diet and how does that compare to the concentrations used in this paper?

      Drosophila melanogaster cannot synthesize sterols de novo, and must acquire them from its diet. In natural environments, flies acquire sterols from fermenting fruit, decaying plant matter, and yeast, which contain trace amounts of phytosterols (e.g., stigmasterol, β-sitosterol) and ergosterol. While the exact sterol concentrations in these sources remain uncharacterized, our behavioral assays used concentrations (0.001–0.01% by weight) that align with the low levels expected in such nutrient-limited ecological niches.

      In our study, the cholesterol concentrations tested ranged from 0.001% to 0.1%, thereby spanning both the physiologically relevant and slightly elevated ranges. Importantly, avoidance behaviors and receptor activation were most prominent at 0.1% cholesterol. While it is true that high chemical concentrations may elicit off-target effects via broad receptor activation, our genetic and electrophysiological data indicate that the observed responses are mediated by specific ionotropic receptors (Ir51b, Ir7g, Ir56d) and not merely generalized chemical stress.

      Ecologically, elevated sterol levels may also signal conditions unsuitable for egg-laying or larval development. For example, high levels of cholesterol or other sterols may occur in substrates colonized by pathogenic microbes, decaying animal tissue, or in cases of abnormal microbial fermentation, which could represent a nutritional or microbial hazard. Cholesterol avoidance may therefore help flies avoid consuming decaying animal tissue. In this context, sensory detection of excessive cholesterol might serve a protective function.

      Reviewer #2 (Public review):

      Summary:

      In Cholesterol Taste Avoidance in Drosophila melanogaster, Pradhan et al. used behavioral and electrophysiological assays to demonstrate that flies can: (1) detect cholesterol through a subset of bitter-sensing gustatory receptor neurons (GRNs) and (2) avoid consuming food with high cholesterol levels. Mechanistically, they identified five members of the IR family as necessary for cholesterol detection in GRNs and for the corresponding avoidance behavior. Ectopic expression experiments further suggested that Ir7g + Ir56d or Ir51b + Ir56d may function as tuning receptors for cholesterol detection, together with the Ir25a and Ir76b co-receptors.

      Strengths:

      The experimental design of this study was logical and straightforward. Leveraging their expertise in the Drosophila taste system, the research team identified the molecular and cellular basis of a previously unrecognized taste category, expanding our understanding of gustation. A key strength of the study was its combination of electrophysiological recordings with behavioral genetic experiments.

      Weaknesses:

      My primary concern with this study is the lack of a systematic survey of the IRs of interest in the labellum GRNs. Consequently, there is no direct evidence linking the expression of putative cholesterol IRs to the B GRNs in the S6 and S7 sensilla.

      Specifically, the authors need to demonstrate that the IR expression pattern explains cholesterol sensitivity in the B GRNs of S6 and S7 sensilla, but not in other sensilla. Instead of providing direct IR expression data for all candidate IRs (as shown for Ir56d in Figure 2-figure supplement 1F), the authors rely on citations from several studies (Lee, Poudel et al. 2018; Dhakal, Sang et al. 2021; Pradhan, Shrestha et al. 2024) to support their claim that Ir7g, Ir25a, Ir51b, and Ir76b are expressed in B GRNs (Lines 192-194). However, none of these studies provide GAL4 expression or in situ hybridization data to substantiate this claim.

      Without a comprehensive IR expression profile for GRNs across all taste sensilla, it is difficult to interpret the ectopic expression results observed in the B GRN of the I9 sensillum or the A GRN of the L-sensillum (Figure 4). It remains equally plausible that other tuning IRs, beyond the co-receptors Ir25a and Ir76b, could interact with the ectopically expressed IRs to confer cholesterol sensitivity, rather than the proposed Ir7g + Ir56d or Ir51b + Ir56d combinations.

      We provide electrophysiological data demonstrating that the S6 and S7 sensilla respond to cholesterol (Figure 1D). This finding is consistent with the hypothesis that these sensilla harbor the complete receptor complexes necessary for cholesterol detection. In our electrophysiological recordings, only those bitter GRNs that co-express Ir56d along with either Ir7g or Ir51b generate action potentials in response to cholesterol. Other S-type sensilla lacking one or more of these subunits remain unresponsive, reinforcing the idea that these components are necessary for receptor function and sensory coding of cholesterol. Moreover, in the cholesterol-insensitive I9 sensillum (based on our mapping results using electrophysiology), co-expression of either Ir7g + Ir56d or Ir51b + Ir56d conferred de novo cholesterol sensitivity (Figure 4B). Importantly, no cholesterol response was observed when any of these IRs was expressed alone or when Ir7g + Ir51b were co-expressed without Ir56d. These findings strongly argue against the possibility that endogenous tuning IRs in I9 sensilla (e.g., Ir25a, Ir76b) are sufficient to generate cholesterol responsiveness.

      Furthermore, based on the literature, Ir25a and Ir76b are endogenously expressed in I- and L-type sensilla. Thus, their presence alone is insufficient for cholesterol responsiveness. These data support the model that cholesterol sensitivity depends on a specific, multi-subunit receptor complex (e.g., Ir7g + Ir25a + Ir56d + Ir76b or Ir51b + Ir25a + Ir56d + Ir76b).

      In conclusion, while we acknowledge that our data do not provide a full anatomical map of IR expression across all sensilla, our results strongly support the idea that cholesterol sensitivity in S6 and S7 sensilla arises from specific combinations of IRs expressed in the B GRNs.

      Reviewer #3 (Public review):

      Summary:

      Whether and how animals can taste cholesterol is not well understood. The study provides evidence that 1) cholesterol activates a subset of bitter-sensing gustatory receptor neurons (GRNs) in the fly labellum, but not other types of GRNs, 2) flies show aversion to high concentrations of cholesterol, and this is mediated by bitter GRNs, and 3) cholesterol avoidance depends on a specific set of ionotropic receptor (IR) subunits acting in bitter GRNs. The claims of the study are supported by electrophysiological recordings, genetic manipulations, and behavioral readouts.

      Strengths:

      Cholesterol taste has not been well studied, and the paper provides new insight into this question. The authors took a comprehensive and rigorous approach in several different parts of the paper, including screening the responses of all 31 labellar sensilla, screening a large panel of receptor mutants, and performing misexpression experiments with nearly every combination of the 5 IRs identified. The effects of the genetic manipulations are very clear and the results of electrophysiological and behavioral studies match nicely, for the most part. The appropriate controls are performed for all genetic manipulations.

      Weaknesses:

      The weaknesses of the study, described below, are relatively minor and do not detract from the main conclusions of the paper.

      (1) The paper does not state what concentrations of cholesterol are present in Drosophila's natural food sources. Are the authors testing concentrations that are ethologically relevant?

      Drosophila melanogaster primarily feeds on fermenting fruits and associated microbial communities, especially yeast, which serve as major sources of dietary sterols. These natural food sources are known to contain phytosterols such as stigmasterol and β-sitosterol. One study quantified phytosterols (e.g., stigmasterol, sitosterol) in fruits, reporting concentrations between 1.6–32.6 mg/100 g edible portion (~0.0016–0.0326% wet weight) (Han et al., 2008). The concentrations we tested fall within this range. Additionally, ergosterol, the principal sterol in yeast and a structural analog of cholesterol, is present at levels of about 0.005% to 0.02% in yeast-rich environments.
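
      The unit conversion behind the percentages quoted above can be checked with a minimal sketch. The function name is hypothetical and introduced here only for illustration; the sketch assumes that "% wet weight" means grams of sterol per 100 g of edible portion, so 1 mg/100 g corresponds to 0.001%.

```python
def mg_per_100g_to_percent(mg_per_100g: float) -> float:
    """Convert a content in mg per 100 g to percent by weight (g per 100 g)."""
    # 1 g = 1000 mg, and "per 100 g" is already a percentage basis.
    return mg_per_100g / 1000.0


# Reported phytosterol range in fruits: 1.6-32.6 mg/100 g edible portion.
low = mg_per_100g_to_percent(1.6)
high = mg_per_100g_to_percent(32.6)
print(f"{low:.4f}% to {high:.4f}% wet weight")
```

      Under that assumption, the reported range of 1.6–32.6 mg/100 g converts to the ~0.0016–0.0326% figure given in the response.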

      To ensure physiological relevance, we designed our behavioral assays to include a broad concentration range of cholesterol, from 10<sup>-5</sup>% to 10<sup>-1</sup>%. This spans both physiological levels (0.001–0.01%), which are comparable to those found in the natural diet, and supra-physiological levels (e.g., 0.1%), which exceed natural exposure but help define the threshold for aversive behavior.

      Our results demonstrate that flies begin to avoid cholesterol at concentrations of 10<sup>-3</sup>% or more (Figure 3A), which falls within the upper physiological range and may reflect the threshold beyond which cholesterol or related sterols become deleterious. At these higher concentrations, excess sterols may disrupt membrane fluidity, interfere with hormone signaling, or promote microbial overgrowth, all of which could compromise fly health.

      (2) The paper does not state or show whether the expression of IR7g, IR51b, and IR56d is confined to bitter GRNs. Bitter-specific expression of at least some of these receptors would be necessary to explain why bitter GRNs but not sugar GRNs (or other GRN types) normally show cholesterol responses.

      We show that Ir56d-GAL4 is co-expressed with Gr66a-GFP in S6/S7 sensilla, indicating that it is expressed in bitter GRNs (Figure 2—figure supplement 1F). In the case of Ir7g and Ir51b, there are no reporters or antibodies available to address expression. However, they have previously been shown to be expressed in bitter GRNs using RT-PCR (Dhakal et al. 2021, Communications Biology; Pradhan et al. 2024, Journal of Hazardous Materials). In addition, we provide functional evidence that bitter GRNs are required for the cholesterol response, since silencing bitter GRNs abolishes cholesterol-induced action potentials (Figure 1E–F). Moreover, we showed that we could rescue the Ir7g<sup>1</sup>, Ir51b<sup>1</sup> and Ir56d<sup>1</sup> mutant phenotypes only when we expressed the cognate transgenes in bitter GRNs using Gr33a-GAL4 (Figure 3G). Thus, while Ir7g/Ir51b are not exclusive to bitter GRNs, their functional role in cholesterol detection is bitter-GRN-specific.

      (3) The authors only investigated the responses of GRNs in the labellum, but GRN responses in the leg may also contribute to the avoidance of cholesterol feeding. Alternatively, leg GRNs might contribute to cholesterol attraction that is unmasked when bitter GRNs are silenced. In support of this possibility, Ahn et al. (2017) showed that Ir56d functions in sugar GRNs of the leg to promote appetitive responses to fatty acids.

      This is an interesting idea. Indeed, when bitter GRNs are hyperpolarized, the flies exhibit a strong attraction to cholesterol. Nevertheless, the cellular basis for cholesterol attraction, and whether it is mediated by GRNs in the legs, will require future investigation.

      (4) The authors might consider using proboscis extension as an additional readout of taste attraction or aversion, which would help them more directly link the labellar GRN responses to a behavioral readout. Using food ingestion as a readout can conflate the contribution of taste with post-ingestive effects, and the regulation of food ingestion also may involve contributions from GRNs on multiple organs, whereas organ-specific contributions can be dissociated using proboscis extension. For example, does presenting cholesterol on the proboscis lead to aversive responses in the proboscis extension assay (e.g., suppression of responses to sugar)? Does this aversion switch to attraction when bitter GRNs are silenced, as with the feeding assay?

      We thank the reviewer for the suggestion regarding the use of the proboscis extension reflex (PER) assay to strengthen the link between labellar GRN activity and behavioral responses to cholesterol.

      Author response image 1.

      Our PER assay results shown above indicate that cholesterol presentation on the labellum or forelegs leads to an aversive response, as evidenced by a significant reduction in proboscis extension when compared to control stimuli (Author response image 1A; 2% sucrose or 2% sucrose with 10<sup>-1</sup>% cholesterol was applied to the labellum or forelegs, and the percent PER was recorded. n=6. Data were compared using single-factor ANOVA coupled with Scheffe’s post-hoc test. Statistical significance was compared with the control. Means ± SEMs. **p<0.01). This finding supports the idea that cholesterol is detected by labellar and leg GRNs and elicits behavioral avoidance. In contrast, sucrose stimulation robustly induces proboscis extension, as expected for an appetitive stimulus. We confirmed the defects caused by each Ir mutation by presenting the stimuli to the labellum (Author response image 1B). Together, these PER results provide a more direct behavioral correlate of labellar and leg GRN activation and reinforce our conclusion that cholesterol is sensed as an aversive tastant through the labellar bitter GRNs.

      (5) The authors claim that the cholesterol receptor is composed of IR25a, IR76b, IR56d, and either IR7g or IR51b. While the authors have shown that IR25a and IR76b are each required for cholesterol sensing, they did not show that both are required components of the same receptor complex. If the authors are relying on previous studies to make this assumption, they should state this more clearly. Otherwise, I think further misexpression experiments may be needed where only IR25a or IR76b, but not both, are expressed in GRNs.

      In our study, we relied on prior work demonstrating that Ir25a and Ir76b function as broadly required co-receptors in most IR-dependent chemosensory pathways (Ganguly et al., 2017; Lee et al., 2018). These studies showed that Ir25a and Ir76b are co-expressed in many GRNs across multiple taste modalities. Functional IR complexes often fail to form or signal properly in the absence of these co-receptors. Thus, it is widely accepted in the field that Ir25a and Ir76b function together as a core heteromeric scaffold for diverse IR complexes, akin to co-receptors in other ionotropic glutamate receptor families. We state that while Ir25a and Ir76b are presumed co-receptors in the cholesterol receptor complex based on their conserved roles, their direct physical interaction with Ir7g, Ir51b, and Ir56d remains to be demonstrated.

      In support of this model, we note that in our ectopic expression experiments using I9 sensilla, which endogenously express Ir25a and Ir76b, introduction of either Ir7g + Ir56d or Ir51b + Ir56d was sufficient to confer cholesterol sensitivity (Figure 4B). We obtained a similar result in L6 sensilla (Figure 4D), which also endogenously express Ir25a and Ir76b. These findings imply that both co-receptors are already present in these sensilla and are likely part of the functional complex. However, we agree that we have not directly tested the requirement for both co-receptors in a minimal reconstitution context, such as expressing only Ir25a or Ir76b alongside tuning IRs in an otherwise null background. Such an experiment would indeed provide more direct evidence of their joint requirement in the receptor complex. Future studies, including heterologous expression experiments, will be necessary to define the cholesterol-receptor complexes.