  1. Oct 2024
    1. Author response:

      Reviewer #2 (Public Review):

      M. El Amri et al. investigated the functions of Marcks and Marcks-like 1 during spinal cord (SC) development and regeneration in Xenopus laevis. The authors rigorously performed loss-of-function experiments with morpholino knock-down and CRISPR knock-out, combined with rescue experiments, in the developing spinal cord of the embryo and in the regenerating spinal cord at the tadpole stage.

      For the assays in the developing spinal cord, a unilateral approach (knock-down/out on only one side of the embryo) allowed the authors to assess the gene functions by directly comparing one side (e.g. mutated SC) to the other (e.g. wild type SC on the other side). For the assays in regenerating SC, the authors microinjected CRISPR reagents into 1-cell stage embryos. When the embryos (F0 crispants) grew up to tadpoles (stage 50), the SC was transected. They then assessed neurite outgrowth and progenitor cell proliferation. The validation of the phenotypes was mostly based on the quantification of immunostaining images (neurite outgrowth: acetylated tubulin, neural progenitor: sox2, sox3, proliferation: EdU, PH3), which are simple but robust enough to support their conclusions. In both SC development and regeneration, the authors found that Marcks and Marcksl1 were necessary for neurite outgrowth and neural progenitor cell proliferation.

      The authors performed rescue experiments on morpholino knock-down and CRISPR knock-out conditions by Marcks and Marcksl1 mRNA injection for SC development and by pharmacological treatments for SC development and regeneration. The unilateral mRNA injection rescued the loss-of-function phenotype in the developing SC. To explore the signalling role of these molecules, they rescued the loss-of-function animals with pharmacological reagents. They used S1P: PLD activator, FIPI: PLD inhibitor, NMI: PIP2 synthesis activator and ISA-2011B: PIP2 synthesis inhibitor. The authors found that the activator treatment rescued neurite outgrowth and progenitor cell proliferation in loss-of-function conditions. From these results, the authors proposed that PIP2 and PLD are the mediators of Marcks and Marcksl1 for neurite outgrowth and progenitor cell proliferation during SC development and regeneration. The results of the rescue experiments are particularly important for assessing gene functions in loss-of-function assays; therefore, the conclusions are solid. In addition, they performed gain-of-function assays by unilateral Marcks or Marcksl1 mRNA injection, showing that the injected side of the SC had more neurite outgrowth and proliferative progenitors. The conclusions are consistent with the loss-of-function phenotypes and the rescue results. Importantly, the authors showed the linkage of the phenotype and functional recovery by behavioral testing, which clearly showed that the crispants with SC injury swam a shorter distance than wild types with SC injury at 10 days post surgery.

      Prior to the functional assays, the authors analyzed the expression pattern of the genes by in situ hybridization and immunostaining in the developing embryo and regenerating SC. They confirmed that the amount of protein expression was significantly reduced in the loss-of-function samples by immunostaining with the specific antibodies that they made for Marcks and Marcksl1. Although the expression patterns during embryogenesis are mostly known from previous works, the data provided appropriate information to readers about the expression and showed the efficiency of the knock-out as well.

      MARCKS family genes have been known to be expressed in the nervous system. However, few studies have focused on their function in nerves. This research introduced these genes as new players during SC development and regeneration. These findings could attract broader interest from researchers working on nervous system disease models and in the medical field. Although it is a typical requirement for loss-of-function assays in Xenopus laevis, I believe that the efficient knock-out of four genes by CRISPR/Cas9 was derived from their dedication to designing, testing and validating the gRNAs and is exemplary.

      Weaknesses:

      (1) Why did the authors choose Marcks and Marcksl1? The authors mentioned that these genes were identified with a recent proteomic analysis comparing SC regenerative tadpoles and non-regenerative froglets (Line (L) 54-57). However, although it seems the proteomic analysis was their own dataset, the authors did not mention any details of how they selected promising genes for the functional assays (this article). In the proteomic analysis, there must be other candidate genes that might be more likely factors related to SC development and regeneration based on previous studies, but it was unclear what the criteria for selecting Marcks and Marcksl1 were.

      To highlight the rationale for selecting these proteins, we reworded the sentence as follows: “A recent proteomic screen … after SCI identified a number of proteins that are highly upregulated at the tadpole stage but downregulated in froglets (Kshirsagar, 2020). These proteins included Marcks and Marcksl1, which had previously been implicated in the regeneration of other tissues (El Amri et al., 2018) suggesting a potential role for these proteins also in spinal cord regeneration.”

      (2) Gene knock-out experiments with F0 crispants

      The authors described that they designed and tested 18 sgRNAs to find the most efficient and consistent gRNA (L191-195). However, it cannot guarantee the same phenotypes practically, due to, for example, different injection timing, different strains of Xenopus laevis, etc. Although the authors mentioned the concerns of mosaicism by themselves (L180-181, L289-292) and immunostaining results nicely showed uniformly reduced Marcks and Marcksl1 expression in the crispants, they did not refer to this issue explicitly.

      To address this issue, we state explicitly in line 208-212: “We also confirmed by immunohistochemistry that co-injection of marcks.L/S and marcksl1.L/S sgRNA, which is predicted to edit all four homeologs (henceforth denoted as 4M CRISPR) drastically reduced immunostaining for Marcks and Marcksl1 protein on the injected side (Fig. S6 B-G), indicating that protein levels are reduced in gene-edited embryos.”

      (3) Limitations of pharmacological compound rescue

      In the methods part, the authors describe that they performed titration experiments for the drugs (L702-704), which is a minimal requirement for this type of assay. However, it is known that even a well-characterized drug can target different molecules when it is applied at different concentrations (Gujral TS et al., 2014 PNAS). Therefore, it is difficult to eliminate the possibility of side effects and off-target effects by testing only a few compounds.

      As explained in the responses to reviewer 1, we have completely rewritten and toned down our presentation of the pharmacological results and now explicitly mention the possibility of side effects in our discussion.

    1. Author response:

      We agree with reviewer #1 to remove the mGluR6b data. It is indeed a weakness and is too preliminary. We will gladly remove it from the revised version.

      We will address the issue of the bulk responses (depicted in Figures 5 and 6) by showing the significance data, arguing that although we cannot prove that prey-detection is increased for lower intensities, the bulk effect is significant, so prey detection is effectively stronger.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Previous work demonstrated a strong bias in the percept of an ambiguous Shepard tone as either ascending or descending in pitch, depending on the preceding contextual stimulus. The authors recorded human MEG and ferret A1 single-unit activity during presentation of stimuli identical to those used in the behavioral studies. They used multiple neural decoding methods to test if context-dependent neural responses to ambiguous stimulus replicated the behavioral results. Strikingly, a decoder trained to report stimulus pitch produced biases opposite to the perceptual reports. These biases could be explained robustly by a feed-forward adaptation model. Instead, a decoder that took into account direction selectivity of neurons in the population was able to replicate the change in perceptual bias.

      Strengths:

      This study explores an interesting and important link between neural activity and sensory percepts, and it demonstrates convincingly that traditional neural decoding models cannot explain percepts. Experimental design and data collection appear to have been executed carefully. Subsequent analysis and modeling appear rigorous. The conclusion that traditional decoding models cannot explain the contextual effects on percepts is quite strong.

      Weaknesses:

      Beyond the very convincing negative results, it is less clear exactly what the conclusion is or what readers should take away from this study. The presentation of the alternative, "direction aware" models is unclear, making it difficult to determine if they are presented as realistic possibilities or simply novel concepts. Does this study make predictions about how information from auditory cortex must be read out by downstream areas? There are several places where the thinking of the authors should be clarified, in particular, around how this idea of specialized readout of direction-selective neurons should be integrated with a broader understanding of auditory cortex.

      While we have not used the term "direction aware", we think the reviewer refers generally to the capability of our model to use a cell's direction selectivity in the decoding. In accordance with the reviewer's interpretation, we did indeed mean that the decoder assumes that a neuron does not only have a preferred frequency, but also a preferred direction of change in frequency (ascending/descending), which is what we use to demonstrate that the decoding in this way aligns with the human percept. We have adapted the text in several places to clarify this, in particular expanding the description in the Methods substantially.

      Reviewer #2 (Public Review):

      The authors aim to better understand the neural responses to Shepard tones in auditory cortex. This is an interesting question as Shepard tones can evoke an ambiguous pitch that is manipulated by a preceding adapting stimulus, therefore it nicely disentangles pitch perception from simple stimulus acoustics.

      The authors use a combination of computational modelling, ferret A1 recordings of single neurons, and human EEG measurements.

      Their results provide new insights into neural correlates of these stimuli. However, the manuscript submitted is poorly organized, to the point where it is near impossible to review. We have provided Major Concerns below. We will only be able to understand and critique the manuscript fully after these issues have been addressed to improve the readability of the manuscript. Therefore, we have not yet reviewed the Discussion section.

      Major concerns

      Organization/presentation

      The manuscript is disorganized and therefore difficult to follow. The biggest issue is that in many figures, the figure subpanels often do not correspond to the legend, the main body, or both. Subpanels described in the text are missing in several cases.

      We have gone linearly through the text and checked that all figure subpanels are referred to in the text and the legend. As far as we can tell, this was already the case for all panels, with the exception of two subpanels of Fig. 5.

      Many figure axes are unlabelled.

      We have carefully checked the axes of all panels and all but two (Fig. 5D) were labeled. As is customary, certain panels inherit the axis label from a neighboring panel, if the label is the same, e.g. subpanels in Fig. 6F or Fig. 5E, which helps to declutter the figure. We hope that with this clarification, the reviewer can understand the labels of each panel.

      There is an inconsistent style of in-text citation between figures and the main text. The manuscript contains typos and grammatical errors. My suggestions for edits below therefore should not be taken as an exhaustive list. I ask the authors to consider the following only a "first pass" review, and I will hopefully be able to think more deeply about the science in the second round of revisions after the manuscript is better organized.

      While we are puzzled by the severity of issues that R2 indicates (see above, and R3 qualifies it as "well written", and R1 does not comment on the writing negatively), we have carefully gone through all specific issues mentioned by R2 and the other reviewers. We hope that the revised version of the paper with all corrections and clarifications made will resolve any remaining issues.

      Frequency and pitch

      The terms "frequency" and "pitch" seem to be used interchangeably at times, which can lead to major misconceptions in a manuscript on Shepard tones. It is possible that the authors confuse these concepts themselves at times (e.g. Fig 5), although this would be surprising given their expertise in this field. Please check through every use of "frequency" and "pitch" in this manuscript and make sure you are using the right term in the right place. In many places, "frequency" should actually be "fundamental frequency" to avoid misunderstanding.

      Thanks for pointing this out. We have checked every occurrence and modified where necessary.

      Insufficient detail or lack of clarity in descriptions

      There seems to be insufficient information provided to evaluate parts of these analyses, most critically the final pitch-direction decoder (Fig 6), which is a major finding. Please clarify.

      Thanks for pointing this out. We have extended the description of the pitch-direction decoder and highlighted its role for interpreting the results.

      Reviewer #3 (Public Review):

      Summary:

      This is an elegant study investigating possible mechanisms underlying the hysteresis effect in the perception of perceptually ambiguous Shepard tones. The authors make a fairly convincing case that the adaptation of pitch direction sensitive cells in auditory cortex is likely responsible for this phenomenon.

      Strengths:

      The manuscript is overall well written. My only slight criticism is that, in places, particularly for non-expert readers, it might be helpful to work a little bit more methods detail into the results section, so readers don't have to work quite so hard jumping from results to methods and back.

      Following this excellent suggestion, we have added more brief method sketches to the Results section, hopefully addressing this concern.

      The methods seem sound and the conclusions warranted and carefully stated. Overall I would rate the quality of this study as very high, and I do not have any major issues to raise.

      Thanks for your encouraging evaluation of the work.

      Weaknesses:

      I think this study is about as good as it can be with the current state of the art. Generally speaking, one has to bear in mind that this is an observational, rather than an interventional study, and therefore only able to identify plausible candidate mechanisms rather than making definitive identifications. However, the study nevertheless represents a significant advance over the current state of knowledge, and about as good as it can be with the techniques that are currently widely available.

      Thanks for your encouraging evaluation of our work. The suggestion of an interventional study has also been on our minds, however, this appears rather difficult, as it would require a specific subset of cells to be inhibited. The most suitable approach would likely be 2p imaging with holographic inhibition of a subset of cells (using ArchT for example), that has a preference for one direction of pitch change, which should then bias the percept/behavior in the opposite direction.

      Reviewer #1 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) What is the timescale used to compute direction selectivity in neural tuning? How does it compare to the timing of the Shepard tones? The basic idea of up versus down pitch is clear, the intuition for the role of direction tuning and its relation to stimulus dynamics could be laid out more clearly. Are the authors proposing that there are two "special" populations of A1 neurons that are treated differently to produce the biased percept? Or is there something specific about the dynamics of the Shepard stimuli and how direction selective neurons respond to them specifically? It would help if the authors could clarify if this result links to broader concepts of dynamic pitch coding in general or if the example reported here is specific (or idiosyncratic) to Shepard tones.

      We propose that the findings here are not specific to Shepard tones. To the contrary, only basic properties of auditory cortex neurons, i.e. frequency preference, frequency-direction (i.e. ascending or descending) preference, and local adaptation in the tuning curve, suffice. Each of these properties has been demonstrated many times before and we only verified this in the lead-up to the results in Fig. 6. While the same effects should be observable with pure tones, the lack of ambiguity in the perception of the direction of a frequency step for pure tone pairs would make them less noticeable here. Regarding the time-scale of the directional selectivity, we relied on the sequencing of tones in our paradigm, i.e. 150 ms spacing. The SSTRFs were discretized at 50 ms, and include only the bins during the stimulus, not during the pause. The directional tuning, i.e. differences in the SSTRF above and below the preferred pitchclass for stimuli before the last stimulus, typically extended only one stimulus back in time. We have clarified this in more detail now, in particular in the added Methods section on the directional decoder.
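
To make the notion of directional tuning above more concrete, the following is a minimal sketch of how a directionality index could be read off an SSTRF slice one stimulus back in time. It is not the authors' implementation: the function name `directionality_index`, the 12-bin pitch-class grid, the normalization, and the sign convention (positive when the weights for preceding tones below the preferred pitch class dominate, i.e. for upward steps into the preferred pitch class) are all assumptions made for illustration.

```python
def directionality_index(sstrf_lag1, preferred_idx):
    """Toy directionality index from one SSTRF slice (hypothetical definition).

    sstrf_lag1    : weights over circular pitch-class bins for the stimulus
                    one step back in time (assumed layout).
    preferred_idx : index of the cell's preferred pitch-class bin.
    Returns a value in [-1, 1]; positive means the bins below the preferred
    pitch class carry more weight than the bins above it.
    """
    n = len(sstrf_lag1)
    half = n // 2
    # Sum the bins up to (but excluding) half an octave away on each side,
    # wrapping around the pitch-class circle.
    above = sum(sstrf_lag1[(preferred_idx + k) % n] for k in range(1, half))
    below = sum(sstrf_lag1[(preferred_idx - k) % n] for k in range(1, half))
    denom = abs(above) + abs(below)
    return 0.0 if denom == 0 else (below - above) / denom


# Toy cell: preferred bin 3, stronger weight one bin below than one bin above.
weights = [0.0] * 12
weights[2] = 2.0
weights[4] = 1.0
di = directionality_index(weights, preferred_idx=3)
```

In this toy case the index is positive, reflecting the asymmetry around the preferred bin; a flat slice yields zero.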

      (2) (p. 9) "weighted by each cell's directionality index ... (see Methods for details)" The direction-selective decoder is interesting and appears critical to the study. However, the details of its implementation are difficult to locate. Maybe Fig. 6A contains the key concepts? It would help greatly if the authors could describe it in parallel with the other decoders in the Methods.

      We have expanded the description of the decoder in the Methods as the reviewer suggests.

      LESSER CONCERNS

      p. 1. (L 24) "distances between the pitch representations...." It's not obvious what "distances" means without reading the main paper. Can some other term or extra context be provided?

      We have added a brief description here.

      p. 2. (L 26) "Shepard tones" Can the authors provide a citation when they first introduce this class of stimuli?

      Citation has been added.

      p. 3 (L 4) "direction selective cells" Please define or provide context for what has a direction. Selective to pitch changes in time?

      Yes, selective to pitch changes in time is what is meant. We have further clarified this in the text.

      p. 4 (L 9-19). This paragraph seems like it belongs in the Introduction?

      Given the concerns raised by R2 about the organization of the manuscript we prefer to keep this 'road-map' in the manuscript, as a guidance for the reader.

      p. 4 (L 32) "majority of cells" One might imagine that the overlap of the bias band and the frequency tuning curve of individual neurons might vary substantially. Was there some criterion about the degree of overlap for including single units in the analysis? Does overlap matter?

      We are not certain which analysis the reviewer is referring to. Generally, cells were not excluded based on the overlap between a particular Bias band and their (Shepard) tuning curve. There are several reasons for this: The bias was located in 4 different, overlapping Shepard tone regions, and all sounds were Shepard tones. Therefore, the (Shepard) tuning curve of every cell overlapped with one or more of the Biases. For the decoding analysis, all cells were included, as both a response and the lack of a response contribute to the decoding. If the reviewer is referring only to the analysis of whether a cell adapts, then the same argument applies as above, i.e. this was an average over all Bias sequences, and therefore every responding cell was driven to respond by the Bias, and it was possible to also assess whether it adapted its response for different positions inside the Bias. We acknowledge that the limited randomness of the Bias sequences in combination with the specific tuning of the cells could in a few cases create response patterns over time that are not indicative of the actual behavior for repeated stimulation. However, since the results are rather clear with 91% of cells adapting, we do not think this would significantly change the conclusions.

      p. 5 (L 17) "desynchronization ... behaving conditions" The logic here is not clear. Is less desynchronization expected during behavior? Typically, increased attention is associated with greater desynchronization.

      Yes, we reformulated the sentence to: While this difference could be partly explained by desynchronization which is typically associated with active behavior or attention [30], general response adaptation to repeated stimuli is also typical in behaving humans [31].

      p. 7 (L 5) "separation" is this a separation in time?

      Yes, added.

      p. 7 (L 33) "local adaptation" The idea of feedforward adaptation biasing encoding has been proposed before, and it might be worth citing previous work. This includes work from Nelken specifically related to SSA. Also, this model seems similar to the one described in Lopez Espejo et al (PLoS CB 2019).

      Thanks for pointing this out. We think, however, that neither of these publications suggested this very narrow way of biasing, which we consider biologically implausible. We have therefore not added either of these citations.

      p. 11 (L. 17) The cartoon in Fig. 6G may provide some intuition, but it is quite difficult to interpret. Is there a way to indicate which neuron "votes" for which percept?

      This is an excellent idea, and we have added now the purported perceptual relation of each cell in the diagram.

      p. 12 (L. 8). "classically assumed" This statement could benefit from a citation. Or maybe "classically" is not the right word?

      We have changed 'classically' to 'typically', and now cite classical works from Deutsch and Repp. We think this description makes sense, as the whole concept of bistable percepts has been interpreted as being equidistant (in added or subtracted semitone steps) from the first tone, see e.g. Repp 1997, Fig.2.

      p. 12 (L. 12) "...previous studies" of Shepard tone percepts? Of physiology?

      We have modified it to "Relation to previous studies of Shepard tone percepts and their underlying physiology", since this section deals with both.

      p. 12 (L. 25) "compatible with cellular mechanisms..." This paragraph seems key to the study and to Major Concern 1, above. What are the dynamics of the task stimuli? How do they compare with the dynamics of neural FM tuning and previously reported studies of bias? And can the authors be more explicit in their interpretation - should direction selective neurons respond preferentially to the Shepard tone stimuli themselves? And/or is there a conceptual framework where the same neurons inform downstream percepts of both FM sweeps and both normal (unbiased) and biased Shepard tones?

      The reviewer raises a number of different questions, which we address below:

      - Dynamics of the task stimuli in relation to previously reported cellular biasing: The timescales tested in the studies mentioned are similar to what we used in our bias, e.g. Ye et al 2010 used FM sweeps that lasted for up to 200ms, which is quite comparable to our SOA of 150ms.

      - Preferred responses to Shepard tones: no, we do not think that there should be preferred responses to Shepard tones, but rather that responses to Shepard tones can be thought of as the combined responses to the constituent tones.

      - Conceptual framework where the same neurons inform about FM sweeps and both normal (unbiased) and biased Shepard tones: Our perspective on this question is as follows: To our knowledge, the classical approach to population decoding in the auditory system, i.e. weighted based on preferred frequency, has not been directly demonstrated to be read out inside the brain, and certainly not demonstrated to be read out in only this way in all areas of the brain that receive input from the auditory cortex. Rather it has achieved its credibility by being linked directly with animal performance or match with the presented stimuli. However, these approaches were usually geared towards a representation that can be estimated based on constituent frequencies. Additional response properties of neurons, such as directional selectivity have been documented and analyzed before, however, not been used for explaining the percept. We agree that our use of this cellular response preference in the decoding implicitly assumes that the brain could utilize this as well, however, this seems just as likely or unlikely as the use of the preferred frequency of a neuron. Therefore we do not think that this decoding is any more speculative than the classical decoding. In both cases, subsequent neurons would have to implicitly 'know' the preference of the input neuron, and weigh its input correspondingly.

      We have added all the above considerations to the discussion in an abbreviated form.
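
The contrast drawn above between decoding weighted by preferred frequency and decoding that also uses directional preference can be sketched as follows. This is an illustrative toy under stated assumptions, not the decoder from the manuscript: the function names, the population-vector readout on a 12-semitone pitch-class circle, and the response-weighted average of directionality indices are all hypothetical choices.

```python
import cmath
import math


def decode_pitch_class(responses, preferred, n_semitones=12.0):
    """Population-vector readout on the pitch-class circle (assumed form).

    Each cell votes for its preferred pitch class, weighted by its response.
    Returns the decoded pitch class in [0, n_semitones), or None if the
    population vector has zero length.
    """
    vec = sum(r * cmath.exp(2j * math.pi * p / n_semitones)
              for r, p in zip(responses, preferred))
    if abs(vec) == 0:
        return None
    return (cmath.phase(vec) / (2 * math.pi) * n_semitones) % n_semitones


def decode_direction(responses, direction_index):
    """Response-weighted average of per-cell directionality indices.

    The sign of the result is read as the decoded direction of the pitch
    step; which sign means 'ascending' depends on how the index is defined.
    """
    total = sum(responses)
    if total == 0:
        return 0.0
    return sum(r * d for r, d in zip(responses, direction_index)) / total


# Two equally active cells preferring pitch classes 2 and 4.
pc = decode_pitch_class([1.0, 1.0], [2.0, 4.0])
# The more active cell has index +1, the less active one -1.
direction = decode_direction([2.0, 1.0], [1.0, -1.0])
```

In both readouts the downstream stage needs a per-cell label (preferred pitch class or directionality index), which mirrors the point that neither scheme assumes more knowledge about the input neurons than the other.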

      p. 15 (L. 15). Is there a citation for the drive system?

      There is no publication, but an old repository, where the files are available, which we cite now: https://code.google.com/archive/p/edds-array-drive/

      p. 16 (L. 24) "position in an octave" It is implied but not explicitly stated that the Shepard tones don't contain the fundamental frequency. Can the authors clarify the relationship between the neural tuning band and the bands of the stimulus. Did a single stimulus band typically fall in a neuron's frequency tuning curve? If not 1, how many?

      Yes, it is correct that the concept of fundamental frequency does not cleanly apply to Shepard tones, because they are composed of octave-spaced pure tones, with the lowest tone placed outside the hearing range of the animal and the amplitude envelope (across frequencies). Therefore one or more constituent tones of the Shepard tone can fall into the tuning curve of a neuron and contribute to driving the neuron (or inhibiting it, if they fall within an inhibitory region of the tuning curve). The number of constituent tones that fall within the tuning curve depends on the tuning width of the neuron. The distribution of tuning widths to Shepard tones is shown in Fig. S1E, which indicates that many neurons had rather narrow tuning (close to the center), but many were also tuned widely, indicating that they would be stimulated by multiple constituent tones of the Shepard tone. As the tuning bandwidth (Q30: 30dB above threshold) of most cortical neurons in the ferret auditory cortex (see e.g. Bizley et al., Cerebral Cortex, 2005, Fig. 12) is below 1, typically not more than one tone fell into the tuning curve of a neuron. However, we also observed multimodal tuning curves w.r.t. Shepard tones, which suggests that some neurons were stimulated by two or more constituent tones (again consistent with the existence of more broadly tuned neurons; see the same citation). We have added part of this information to the manuscript, in the caption of Fig. S1E.
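
The relation between octave-spaced constituent tones and a neuron's tuning band can be illustrated with a short sketch. All numbers here are assumptions chosen for illustration (base frequency, number of octaves, Gaussian envelope, and a tuning bandwidth expressed in octaves); none of them are taken from the study.

```python
import math


def shepard_components(pitch_class, base_hz=27.5, n_octaves=9,
                       center_oct=4.5, sigma_oct=1.5, semitones=12):
    """Constituent tones of a toy Shepard tone (all parameters hypothetical).

    Returns (frequency_hz, amplitude) pairs: octave-spaced pure tones whose
    amplitudes follow a fixed Gaussian envelope over log-frequency.
    """
    components = []
    for k in range(n_octaves):
        octaves_above_base = k + pitch_class / semitones
        freq = base_hz * 2 ** octaves_above_base
        amp = math.exp(-0.5 * ((octaves_above_base - center_oct) / sigma_oct) ** 2)
        components.append((freq, amp))
    return components


def count_in_band(components, center_hz, bandwidth_oct):
    """Count constituents within a tuning band of the given width in octaves."""
    half = bandwidth_oct / 2
    return sum(1 for f, _ in components
               if abs(math.log2(f / center_hz)) <= half)


tones = shepard_components(pitch_class=0)
narrow = count_in_band(tones, center_hz=440.0, bandwidth_oct=0.8)
broad = count_in_band(tones, center_hz=440.0, bandwidth_oct=2.5)
```

Because the constituents are exactly one octave apart, a band narrower than one octave can contain at most one of them, while a broader band can contain several, which is the intuition behind the narrow versus multimodal tuning described above.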

      p. 17 (L. 32). "Fig 4" Correct figure ref? This figure appears to be a schematic rather than one displaying data.

      Thanks for pointing this out, changed to Fig. 5.

      p. 18 (L. 25). "assign a pitchclass" Can the authors refer to a figure illustrating this process?

      Added.

      p. 19 (L. 17). Is mu the correct symbol?

      Thanks. We changed it to phi_i, as in the formula above.

      p. 19 (L 19). "convolution" in time? Frequency?

      Thanks for pointing this out, the term convolution was incorrect in this context. We have replaced it by "weighted average" and also adapted and simplified the formula.

      p. 19 (L 25) "SSTRF" this term is introduced before it is defined. Also it appears that "SSTRF" and "STRF" are sometimes interchanged.

      Apologies, we have added the definition, and also checked its usage in each location.

      p. 23 (Fig 2) There is a mismatch between panel labels in the figure and in the legend. Bottom right panel (B3), what does time refer to here?

      Thanks for pointing these out, both fixed.

      p. 24 (L 23) "shifts them away" away from what?

      We have expanded the sentence to: "After the bias, the decoded pitchclass is shifted from their actual pitchclass away from the biased pitchclass range ... "

      p. 25 (L 7) "individual properties" properties of individual subjects?

      Thanks for pointing this out, the corresponding sentence has been clarified and citations added.

      p. 26 (L 20) What is plotted in panel D? The average for all cells? What is n?

      Yes, this is an average over cells, the number of cells has now been added to each panel.

      p. 28 (L 3) How to apply the terms "right" "right" "middle" to the panel is not clear. Generally, this figure is quite dense and difficult to interpret.

      We have changed the caption of Panel A and replaced the location terms with the symbols, which helps to directly relate them to the figure. We have considered different approaches of adding or removing content from the figure to make it less dense, but none of them seemed to help. For lack of better options we have left it in its current form.

      MINOR/TYPOS

      p. 3 (L 1) "Stimulus Specific Adaptation" Capitalization seems unnecessary

      Changed.

      p. 4 (L 14) "Siple"

      Corrected.

      p. 9 (L 10) "an quantitatively"

      Corrected

      p. 9 (L 20) "directional ... direction ... directly ... directional" This is a bit confusing as "direct" seems to mean several different things in its different usages.

      We have gone through these sentences, and we think the terms are now more clearly used, especially since the term 'direction' occurs in several different forms, as it relates to different aspects (cells/percept/hypothesis). Unfortunately, some repetition is necessary to maintain clarity.

      Reviewer #2 (Recommendations For The Authors):

      Detailed critique

      Stimuli

      It would be very useful if the authors could provide demos of their stimuli on a website. Many readers will not be familiar with Shepard tones and the perceptual result of the acoustical descriptions are not intuitive. I ended up coding the stimuli myself to get some intuition for them.

      We have created some sample tones and sequences and uploaded them with the revision as supplementary documents.

      Abstract

      P1 L27 'pitch and...selective cells' - The authors haven't provided sufficient controls to demonstrate that these are "pitch cells" or "selective" to pitch direction. They have only shown that they are sensitive to these properties in their stimuli. Controls would need to be included to ensure that the cells aren't simply responding to one frequency component in the complex sound, for example. This is not really critical to the overall findings, but the claim about pitch "selectivity" is not accurate.

      Fair point. We have removed the word 'selective' in both occurrences.

      Introduction

      P2 L14-17: I do not follow the phonetic example provided. The authors state that the second syllable of /alga/ and /arda/ are physically identical, but how is this possible that ga = da? The acoustics are clearly different. More explanation is needed, or a correction.

      Apologies for the slightly misleading description, it has now been corrected to be in line with the original reference.

      P2,L26-27: Should the two uses of "frequency" be "F0" and "pitch" here? The tones are not separated in frequency by half an octave, but "separated in [F0]" by half an octave, correct? Their frequency ranges are largely overlapping. And the second 'frequency', which refers to the percept, should presumably be "pitch".

      Indeed. This is now corrected.

      P3 L2-6: Unclear at this point in the manuscript what is the difference between the 3 percepts mentioned: perceived pitch-change direction, Shepard tone pitches, and "their respective differences". (It becomes clear later, but clarification is needed here).

      We have tried a few reformulations, however, it tends to overload the introduction with details. We believe it is preferable to present the gist of the results here, and present the complete details later in the MS.

      P3 L6-7 What does it mean that the MEG and single unit results "align in direction and dynamics"? These are very different signals, so clarification is needed.

      We have phrased the corresponding sentence more clearly.

      Results

      Throughout: Choose one of 'pitch class', 'pitchclass', or 'pitch-class' and use it consistently.

      Done.

      P4L12 - would be helpful at this point to define 'repulsive effect'

      We have added another sentence to clarify this term.

      P4, L14 "simple"

      Done

      P4, L12 - not clear here what "repulsive influence" means

      See above.

      P4, L17 - alternative to which explanation? Please clarify. In general, this paragraph is difficult to interpret because we do not yet have the details needed to understand the terms used and the results described. In my opinion, it would be better to omit this summary of the results at the very beginning, and instead reveal the findings as they come, when they can be fully explained to the Reader.

      We agree, but we also believe that a rather general description here is useful for providing a roadmap to the results. However, we have added a half-sentence to clarify what is meant by alternative.

      P4 L30 - text says that cells adapt in their onset, sustained and offset responses, but only data for onset responses are shown (I think - clarification needed for fig 2A2). Supp figure shows only 1 example cell of sustained and offset, and in fact there is no effect of adaptation in the sustained response shown there.

      Regarding the effect of adaptation and whether it can be discerned from the supplementary figure: the shown responses are for 10 repetitions of one particular Bias sequence. Since the response of the cell will depend on its tuning and the specific sequence of the Shepard tones in this Bias, it is not possible to assess adaptation for a given cell. We assess the level of adaptation, by averaging all biases (similar to what is shown in Fig. 2A2) per cell, and then fit an exponential to it, separately by response type. The step direction of the exponential, relative to the spontaneous rate is then used to assess the kind of adaptation. The vast majority of cells show adaptation. We have added this information to the Methods of the manuscript.
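
      The per-cell procedure described above (average over Bias sequences, then fit an exponential) can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code of the study: the model form `y[n] = c + a * exp(-n / tau)`, the grid of candidate time constants, the function names and the labels "adapting"/"non-adapting" are all assumptions, and the comparison with the spontaneous rate is reduced here to the sign of the fitted amplitude `a`.

```python
import math

def mean_across_biases(responses_per_bias):
    """Average the response at each sequence position over all Bias sequences."""
    n_bias = len(responses_per_bias)
    return [sum(col) / n_bias for col in zip(*responses_per_bias)]

def fit_exponential(y, taus=None):
    """Least-squares fit of y[n] = c + a * exp(-n / tau).

    tau is found by grid search; for each tau, a and c follow from ordinary
    linear regression of y on exp(-n / tau). Needs at least two positions.
    """
    if taus is None:
        taus = [0.25 * k for k in range(1, 81)]  # in units of sequence positions
    n_pts = len(y)
    mean_y = sum(y) / n_pts
    best = None
    for tau in taus:
        x = [math.exp(-n / tau) for n in range(n_pts)]
        mean_x = sum(x) / n_pts
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
        c = mean_y - a * mean_x
        sse = sum((c + a * xi - yi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, a, c, tau)
    _, a, c, tau = best
    return a, c, tau

def adaptation_type(mean_response):
    """Label a cell by the step direction of the fitted exponential."""
    a, _, _ = fit_exponential(mean_response)
    return "adapting" if a > 0 else "non-adapting"
```

      A positive amplitude means that the response starts high and decays toward its asymptote over the sequence, which is the behavior reported for the vast majority of cells; the fit would be run separately for onset, sustained and offset responses.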

      P4, L32 - please state the statistical test and criterion (alpha) used to determine that 91% of cells decreased their responses throughout the Bias sequence. Was this specifically for onset responses?

      Thanks for pointing this out, test and p-value added. Adaptation was observed for onset, sustained and offset responses, in all cases with the vast majority showing an adapting behavior, although the onset responses were adapting the most.

      P4 L36 - "response strength is reduced locally". What does "locally" mean here? Nearby frequencies?

      We have added a sentence here to clarify this question.

      Figure 1 - this appears to be the wrong version of the figure, as it doesn't match the caption or results text. It's not possible to assess this figure until these things are fixed. Figure 1A schematic of definition of f(diff) does not correspond to legend definition.

      As far as we can tell, it is all correct, only the resolution of the figure appears to be rather low. This has been improved now.

      Fig 2 A2 - is this also onset responses only?

      Yes, added to the caption.

      Fig 2 A3 - add y-axis label. The authors are comparing a very wide octave band (5.5 octaves) to a much narrower band (0.5 octaves). Could this matter? Is there something special about the cut-off of 2.5 octaves in the 2 bands, or was this an arbitrary choice?

      Interesting question. Essentially, our stimulus design left us only with this choice, i.e. comparing the internal region of the bias with the boundary region of the bias, where the test tones are located. The internal region just corresponds to the bias, which is 5 st wide, and therefore the range is here given as 2.5 st relative to its center, while the test tones are at the boundary, as they are 3 st from the center. The axis for the bias was mislabelled, and has now been corrected. The y-axis label is matched with the panel to the left, but has now been added to avoid any confusion.

      Fig 2A4 - does not refer to ferret single unit data, as stated in the text (p5L8). Nor does supp Fig2, as stated. Also, the figure caption does not match the figure.

      Apologies, this was an error in the code that led to this mislabelling. We have corrected the labels, which also added back the recovery from the Bias sequence in the new Panel A4.

      P5 l9 - Figure 3 is not understandable at this point in the text, and should not be referred to here. There is a lot going on in Fig 3, and it isn't clear what you are referring to.

      Removed.

      P5 L12 - by Fig 2 B1, I assume you mean A4? Also, F2B1 shows only 1 subject, not 2.

      Yes, mislabeled by mistake, and corrected now.

      Fig2B2 -What is the y-axis?

      Same as in the panel to its left, added for clarity.

      Stimuli: why are tones presented at a faster rate to ferrets than to humans?

      The main reason is that the response analysis in MEG requires more spacing in time than the neuronal analysis in the ferret brain.

      P5 L6 - there is no Fig 5 D2? I don't think it is a good idea to get the reader to skip so far ahead in the figures at this stage anyway, even if such a figure existed. It is confusing to jump around the manuscript

      Changed to 'see below'

      P5 L8 - There is no Figure 2A4, so I don't know whether this time constant is accurate.

      This was in reference to a panel that had been removed before, but we have added it back now.

      P5 L16: "in humans appears to be more substantial (40%) than for the average single units under awake conditions". One cannot directly compare magnitude of effects in MEG and single unit signals in this way and assume it is due to behavioural state. You are comparing different measures of neural activity, averaged over vastly different numbers of neurons, and recorded from different species listening to different stimuli (presentation rates).

      Yes, that's why the next sentence is: "However, comparisons between the level of adaptation in MEG and single neuron firing rates may be misleading, due to the differences in the signal measured and subsequent processing.", and all statements in the preceding sentences are phrased as 'appears' and 'may'. We think we have formulated this comparison with an appropriate level of uncertainty. Further, the main message here is that adaptation is taking place in both active and passive conditions.

      P5 L25 -I do not see any evidence regarding tuning widths in Fig s2, as stated in the text.

      Corrected to Fig. S1.

      P5 l26 - Do not skip ahead to Fig 5 here. We aren't ready to process that yet.

      OK, reference removed.

      P5 l27 - Do you mean because it could be tuning to pitch chroma, not height?

      Yes, that is a possible interpretation, although it could also arise from a combination of excitatory and inhibitory contributions across multiple octaves.

      P5 l33 - remove speculation about active vs passive for reasons given above.

      Removed.

      P6L2-6 'In the present...5 semitone step' - This is an incorrect interpretation of the minimal distance hypothesis in the context of the Shepard tone ambiguity. The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low. Each constituent frequency of a single tone can therefore be perceived either as a harmonic of some lower fundamental frequency or as an independent tone. The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high). The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect. The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept.

      The reviewer here refers to a “minimal distance hypothesis”, which, without a literature reference, is hard for us to fully interpret. However, some responses are given below:

      - "The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low." This statement appears to be based on some misconception: due to the octave spacing (rather than multiple/harmonics of a lowest frequency), the Shepard tones cannot be interpreted as usual harmonic tones would be. It is correct that the lowest tone in a Shepard tone is not audible, due to the envelope and the fact that it could in principle be arbitrarily small... hence, speaking about an F0 is really not well-defined in the case of a Shepard tone. The closest one could get to it would be to refer to the Shepard tone that is both in the audible range and in the non-zero amplitude envelope. But again, since the envelope is fading out the highest and lowest constituent tones, it is not as easy to refer to the lowest one as F0 (as it might be much quieter than the next higher constituent).

      - "The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high)." This may relate to some known psychophysics, but we are unable to interpret it with certainty.

      - "The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect." We are unsure how the reviewer reaches this conclusion.

      - "The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept." Again, in the absence of a reference to the MDH, we are unsure of the implied rationale. We agree that this is a possible interpretation of distance, however, we believe that our interpretation of distance (i.e. distances between constituent tones) is also a possible interpretation.

      Fig 4: Given that it comes before Figure 3 in the results text, these should be switched in order in the paper.

      Switched.

      PCA decoder: The methods (p18) state that the PCA uses the first 3 dimensions, and that pitch classes are calculated from the closest 4 stimuli. The results (P6), however, state that the first 2 principal components are used, and classes are computed from the average of 10 adjacent points. Which is correct, or am I missing something?

      Thanks for pointing this out. We have made this more concrete in the Methods: "The data were projected to the first three dimensions, which represented the pitch class as well as the position in the sequence of stimuli (see Fig. 4A for a schematic). As the position in the Bias sequence was not relevant for the subsequent pitch class decoding, we only focussed on the two dimensions that spanned the pitch circle." Regarding the number of stimuli that were averaged, there might be a slight misunderstanding: each Shepard tone was decoded/projected without averaging. However, to then assign an estimated pitch class, we first had to establish an axis (here going around the circle), where each position along the axis was associated with a pitch class. This was done by stepping in 0.5 semitone steps, and finding the location in decoded space that corresponded to the median of the Shepard tones within +/- 0.25st. To increase the resolution, this circular 'axis' of 24 points was then linearly interpolated to a resolution of 0.05st. We have updated the text in the Methods accordingly. The mention of 10 points for averaging in the Results was correct, as there were 240 tones in all bias stimuli, and 24 bins in the pitch circle. The mention of an average over 4 tones in the Methods was a typo.
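
      The construction of the circular pitch-class axis described above can be sketched as follows. This is a minimal illustration, assuming that the two-dimensional decoded coordinates of each Shepard tone are already available (the PCA projection itself is not shown). The function names, the component-wise median and the nearest-point readout in `decode_pitch_class` are assumptions made for the example, not details confirmed by the manuscript.

```python
import math
from statistics import median

def signed_circ_diff(a, b, period=12.0):
    """Signed circular difference a - b, wrapped into [-period/2, period/2)."""
    return (a - b + period / 2) % period - period / 2

def build_pitch_axis(pitch_classes, points, coarse=0.5, fine=0.05, period=12.0):
    """Return a list of (pitch_class, (x, y)) positions around the pitch circle."""
    n_bins = round(period / coarse)  # 24 bins for 0.5 st steps
    anchors = []
    for i in range(n_bins):
        center = i * coarse
        members = [p for pc, p in zip(pitch_classes, points)
                   if -coarse / 2 <= signed_circ_diff(pc, center, period) < coarse / 2]
        # Assumes every bin contains at least one tone; median() raises otherwise.
        anchors.append((median(x for x, _ in members),
                        median(y for _, y in members)))
    steps = round(coarse / fine)  # 10 interpolated positions per bin
    axis = []
    for i in range(n_bins):
        (x0, y0), (x1, y1) = anchors[i], anchors[(i + 1) % n_bins]
        for j in range(steps):
            w = j / steps
            # Linear interpolation between neighbouring anchors, wrapping around.
            axis.append((i * coarse + j * fine,
                         (x0 + w * (x1 - x0), y0 + w * (y1 - y0))))
    return axis

def decode_pitch_class(point, axis):
    """Assign the pitch class of the nearest axis position (assumed readout)."""
    return min(axis, key=lambda entry: math.dist(point, entry[1]))[0]
```

      With 240 tones spread evenly over the octave, each of the 24 bins holds 10 tones, which matches the count given in the Results; the interpolated axis then has 240 positions at 0.05 st resolution.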

      Fig 3A: axes of pink plane should be PC not PCA

      Done.

      Fig 3B: the circularity in the distribution of these points is indeed interesting! But what do the authors make of the gap in the circle between semitones 6-7? Is this showing an inherent bias in the way the ambiguous tone is represented?

      While we cannot be certain, we think that this represents an inhomogeneous sampling from the overall set of neural tuning preferences, and that if we had recorded more/all neurons, the circle would be complete and uniformly sampled (which it already nearly is, see Fig. 4C, which used to be Fig. 3C).

      Fig 3B (lesser note): It'd be preferable to replace the tint (bright vs. dark) differentiation of the triangles to be filled vs. unfilled because such a subtle change in tint is not easily differentiable from a change in hue (indicating a different variable in this plot) with this particular colour palette

      We have experimented with this suggestion, and it didn't seem to improve the clarity. However, we have changed the outline of the test-pair triangles to white, which now visually separates them better.

      P6 l32 - Please indicate if cross-validation was used in this decoder, and if so, what sort. Ideally, the authors would test on a held-out data set, or at least take a leave-one-out approach. Otherwise, the classifier may be overfit to the data, and overfitting would explain the exceptional performance (r=.995) of the classifier.

      Cross-validation was not used, as the purpose of the decoder is here to create a standard against which to compare the biased responses in the ambiguous pair, which were not used for training of the decoder. We agree that if we instead used a cross-validated decoder (which would only apply to the local average to establish the pitch class circle) the correlation would be somewhat lower, however, this is less relevant for the main question, i.e. the influence of the Bias sequence on the neural representation of the ambiguous pair. We have added this information to the corresponding section.

      Fig 3D: I understood that these pitch classifications shown by the triangles were carried out on the final ambiguous pair of stimuli. I thought these were always presented at the edges of the range of other stimuli, so I do not follow how they have so many different pitchclass values on the x-axis here.

      There were 4 Biases, centered at 0,3,6 or 9 semitones, and covering [-2.5,2.5]st relative to this center. Therefore the edges of the bias ranges (3st away from their centers) happen to be the same as the centers, e.g. for the Bias centered at 3, the ambiguous pair would be a 0-6 or 6-0 step. Therefore there are 4 locations for the ambiguous tones on the x-axis of Fig. 4D (previously 3D).
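
      The arithmetic behind this answer can be checked with a short sketch. It takes the four Bias centers and the 3 st offset of the ambiguous tones from the text above; the function and variable names are purely illustrative.

```python
def ambiguous_pair(center_st, offset_st=3.0, period=12.0):
    """Pitch classes of the two ambiguous tones flanking a Bias centered at center_st."""
    return ((center_st - offset_st) % period, (center_st + offset_st) % period)

bias_centers = [0, 3, 6, 9]
pairs = {c: ambiguous_pair(c) for c in bias_centers}

# Collect every pitch class at which an ambiguous tone can occur.
locations = sorted({pc for pair in pairs.values() for pc in pair})
```

      Here `pairs[3]` is `(0.0, 6.0)` and `locations` is `[0.0, 3.0, 6.0, 9.0]`: the ambiguous tones fall on the same four pitch classes as the Bias centers, and the two tones of each pair are always 6 st (a tritone) apart.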

      Figure 4: This demonstration of the ambiguity of Shepard pairs may be misleading. The actual musical interval is never ambiguous, as this figure suggests. Only the ascending vs descending percept is ambiguous. Therefore the predictions of the ferret A1 decoding (Fig 3D) and the model in Fig 5 are inconsistent with perception in two ways. One (which the authors mention) is the direction of the bias shift (up vs down). Another (not mentioned here) is that one never experiences a shift in the shepard tone at a fraction of a semitone - the musical note stays the same, and changes only in pitch height, not pitch chroma.

      We are unsure of the reviewer’s direction with this question. In particular the second point is not clear to us: "...one (who?) never (in this experiment? in real life?) experiences a bias shift in the Shepard tone at a fraction of a semitone" (why is this relevant in the current experiment?). Pitch chroma would actually be a possible replacement for pitch class, but the previous Shepard tone literature has referred to it as pitch class.

      P7 l12 - omit one 'consequently'

      Changed to 'Therefore'.

      P7 l24 - I encourage the authors to not use "local" and "global" without making it clear what space they refer to. One tends to automatically think of frequency space in the auditory system, but I think here they mean f0 space? What is a "cell close to the location of the bias"? Cells reside in the brain. The bias is in f0 space. The use of "local" and "global" throughout the manuscript is too vague.

      Agreed, the reference here was actually to the cell's preferred pitch class, not its physical location (which one might arguably be able to disambiguate, given the context). We have changed the wording, and also checked the use of global/local throughout the manuscript. The main use of 'global/local' is now in reference to the range of adaptation, and is properly introduced on first mention.

      P7 L26 -there is no Fig 5D1. Do you mean the left panel of 5D?

      Thanks. Changed.

      FigS3 is referred to a lot on p7-8. Should this be moved to the main text?

      The main reason why we kept it in the supplement is that it is based on a more static model, which is intended to illustrate the consequences of different encoding schemes. In order to not confuse the reader about these two models, we prefer to keep it in the supplement, which - for an online journal - makes little difference since the reader can just jump ahead to this figure in the same way as any other figure.

      Fig 5C, D - label x-axis.

      Added.

      Fig 5E - axis labels needed. I don't know what is plotted on x and y, and cannot see red and green lines in left plot

      Thanks for noticing this, colors corrected, axes labeled.

      Page 8 L3-15 - If I follow this correctly, I think the authors are confusing pitch and frequency here in a way that is fundamental to their model. They seem to equate tonotopic frequency tuning to pitch tuning, leading to confused implications of frequency adaptation on the F0 representation of complex sounds like Shepard tones. To my knowledge, the authors do not examine pure tone frequency tuning in their neurons in this study. Please clarify how you propose that frequency tuning like that shown in Fig 5A relates to representation of the F0 of Shepard tones. Or...are the authors suggesting these neural effects have little to do with pitch processing and instead are just the result of frequency tuning for a single harmonic of the Shepard tones?

      We agree that it is not trivial to describe this well, while keeping the text uncluttered, in particular, because often tuning properties to stimulus frequency contribute to tuning properties of the same neuron for pitch class, although this can be more or less straightforward: specifically, for some narrowly tuned cells, the Shepard tuning is simply a reflection of their tuning to a single octave range of the constituent tones (see Fig. S1). For more broadly tuned cells, multiple constituent tones will contribute to the overall Shepard tuning, which can be additive, subtractive, or more complex. The assumption in our approach is that we can directly estimate the Shepard tuning to evaluate the consequence for the percept. While this may seem artificial, as Shepard tones do not typically occur in nature, the same argument could be made against pure tones, on which classical tuning curves and associated decodings are often based. Relating the Shepard tuning to the classical tuning would be an interesting study in itself, although arguably relating the tuning of one artificial stimulus to another. Regarding the terminology of pitch, pitch class and frequency: The term pitch class is commonly used in the field of Shepard tones, and - as we indicated in the beginning of the results: "the term pitch is used interchangeably with pitch class as only Shepard tones are considered in this study". We agree that the term pitch, which describes the perceptual convergence/construction of a tone-height from a range of possible physical stimuli, needs to be separated from frequency as one contributor/basis for the perception of a pitch. However, we think that the term pitch can - despite its perceptual origin - also be associated with neuron/neural responses, in order to investigate the neural origin of the pitch percept. At the same time, the present study is not targeted to study pitch encoding per se, as this would require the use of a variety of stimuli leading to consistent pitch percepts. Therefore, pitch (class) is here mainly used as a term to describe the neural responses to Shepard tones, based on the previous literature, and the fact that Shepard tones are composite stimuli that lead to a pitch percept. The last sentence has been added to the manuscript for clarity.

      P7-9: I wasn't left with a clear idea of how the model works from this text. I assume you have layers of neurons tuned to frequency or f0 (based on the real data?), which are connected in some way to produce some sort of output when you input a sound? More detail is needed here. How is the dynamic adaptation implemented?

      The detailed description of the model can be found in the Methods section. We have gone through the corresponding paragraph and have tried to clarify the description of the model by introducing a high-level description and the reference to the corresponding Figure (Fig. 5A) in the Results.

      Fig6A: Figure caption can't be correct. In any case, these equations cannot be understood unless you define the terms in them.

      We have clarified the description in the caption.

      Fig 6/directionality analysis: Assuming that the "F" in the STRFs here is Shepard tone f0, and not simple frequency?

      We have changed the formula in the caption and the axis labels now.

      Fig 6C - y-axis values

      In the submission, these values were left out on purpose, as the result has an arbitrary scale, but only whether it is larger or smaller than 0 counts for the evaluation of the decoded directionality (at the current level of granularity). An interesting refinement would be to relate the decoded values to animal performance. We have now scaled the values arbitrarily to fit within [-1,1], but we would like to emphasize that only their relative scale matters here, not their absolute scale.

      Fig 6E - can't both be abscissa (caption). I might be missing something here, but I don't see the "two stripes" in the data that are described in the caption.

      Thank you. The typo is fixed. The stripes are most clearly visible in the right panel of Fig. 6E, red and blue, diagonally from top left to bottom right.

      Fig 6G -I have no idea what this figure is illustrating.

      This panel is described in the text as follows: "The resulting distribution of activities in their relation to the Bias is, hence, symmetric around the Bias (Fig. 6G). Without prior stimulation, the population of cells is unadapted and thus exhibits balanced activity in response to a stimulus. After a sequence of stimuli, the population is partially adapted (Fig. 6G right), such that a subsequent stimulus now elicits an imbalanced activity. Translated concretely to the present paradigm, the Bias will locally adapt cells. The degree of adaptation will be stronger, if their tuning curve overlaps more with the biased region. Adaptation in this region should therefore most strongly influence a cell’s response. For example, if one considers two directional cells, an up- and a down-selective cell, cocentered in the same frequency location below the Bias, then the Bias will more strongly adapt the up-cell, which has its dominant, recent part of the SSTRF more inside the region of the Bias (Fig. 6G right). Consistent with the percept, this imbalance predicts the tone to be perceived as a descending step relative to the Bias. Conversely, for the second stimulus in the pair, located above the Bias, the down-selective cells will be more adapted, thus predicting an ascending step relative to the previous tone."

      I might be just confused or losing steam at this point, but I do not follow what has been done or the results in Fig 6 and the accompanying text very well at all. Can this be explained more clearly? Perhaps the authors could show spike rate responses of an example up-direction and down-direction neuron? Explain how the decoder works, not just the results of it.

      We agree that we are presenting something new here. However, it is conceptually not very different from decoding based on preferred frequencies. We have attempted to provide two illustrations of how the decoder works (Fig. 6A) and how it then leads to the percept using prototypical examples of cellular SSTRFs (Fig. 6G). We have added a complete, but accessible description to the Methods section. Showing firing rates of neurons would unfortunately not be very telling, given the usual variability in neural response and the fact that our paradigm did not have a lot of repetitions (but instead a lot of conditions), which would be able to average out the variability on a single neuron level.

      Discussion - I do not feel I can adequately critique the author's interpretation of the results until I understand their results and methods better. I will therefore save my critique of the discussion section for the next round of revisions after they have addressed the above issues of disorganization and clarity in the manuscript.

      We hope that the updated version of the manuscript provides the reviewer now with this possibility.

      Methods

      P15L7 - gender of human subjects? Age distribution? Age of ferrets?

      We have added this information.

      P16L21 - What is the justification for randomizing the phase of the constituent frequencies?

      The purpose of the randomization was to prevent idiosyncratic phase relationships for particular Shepard tones, which would depend in an orderly fashion on the included base-frequencies if non-randomized, and could have contributed to shaping the percept for each Shepard tone in a way that was only partly determined by the pitch class of the Shepard tone. Added to the section.

      P17L6 - what are the 2 randomizations? What is being randomized?

      Pitch classes and position in the Bias sequence. Added to the section.

      P16 Shepard Tuning section - What were the durations of the tones and the time between tones within a trial?

      Thanks, added!

      Equations - several undefined terms in the equations throughout the manuscript.

      Thanks. We have gone through the manuscript and all equations and have introduced additional definitions where they had been missing.

      Reviewer #3 (Recommendations For The Authors):

      P3L10: "passive" and "active" conditions come totally out of the blue. Need introducing first. (Or cut. If adaptation is always seen, why mention the two conditions if the difference is not relevant here?)

      We have added an additional sentence in the preceding paragraph, that should clarify this. The reason for mentioning it is that otherwise a possible counter-argument could be made that adaptation does not occur in the active condition, which was not tested in ferrets (but presents an interesting avenue for future research).

      P3L14 "siple" typo

      Corrected.

      P4L1 "behaving humans" you should elaborate just a little here on what sort of behavior the participants engaged in.

      Thanks for pointing this out. We have clarified this by adding an additional sentence directly thereafter.

      P4 adaptation: I wonder whether it would be useful to describe the Bias condition a bit more here before going into the observations. The reader cannot know what to expect unless they jump ahead to get a sense of what the Bias looks like in the sense of how many stimuli are in it, and how similar they are to each other. Observations such as "the average response strength decreases as a function of the position in the Bias sequence" are entirely expected if the Bias is made up of highly repetitive material, but less expected if it is not. I appreciate that it can be awkward to have Methods after Results, but with a format like that, the broad brushstroke Methods should really be incorporated into the Results and only the tedious details should be reserved for the Methods to avoid readers having to jump back and forth.

      Agreed, we have inserted a corresponding description before going into the details of the results.

      Related to this (perhaps): Bottom of P4, top of P5: "significantly less reduced (33%, p=0.0011, 2 group t-test) compared to within the bias (Fig. 2 A3, blue vs. red), relative to the first responses of the bias" ... I am at a loss as to what the red and blue symbols in Fig 2 A3 really show, and I wonder whether the "at the edges" to "within the Bias" comparison were to make sense if at this stage I had been told more about the composition of the Bias sequence. Do the ambiguous ('target') tones also occur within the Bias? As I am unclear about what is compared against what I am also not sure how sound that comparison is.

      We have added an extended description of the Bias to the beginning of this section of the manuscript. For your reference: the Shepard tones that made up the ambiguous tones were not part of the Bias sequence, as they are located at 3st distance from the center of the Bias (above and below), while the Bias has a range of only +/- 2.5st.

      Fig 2: A4 B1 B2 labels should be B1 B2 B3

      Corrected.

      Fig 2 A2, A3: consider adjusting y-axis range to have less empty space above the data. In A3 in particular, the "interesting bit" is quite compressed.

      Done, however, while still matching the axes of A2 and A3 for better comparability.

      I am under the strong impression that the human data only made it into Fig 2 and that the data from Fig 3 onwards are animal data only. That is of course fine (MEG may not give responses that are differentiated enough to perform the sort of analyses shown in the later figures). But I do think that somewhere this should be explicitly stated.

      Yes, the reviewer's observation is correct. The decoding analyses could not be conducted on the human MEG data and were therefore not pursued further. The MEG data are included in the paper to demonstrate that even in humans and active conditions, the local adaptation is present, which is a key contributor to the two decoding models. We now state this explicitly when starting the decoding analysis.

      P5L2 "bias" not capitalized. Be consistent.

      All changed to capitalized.

      P5L8 reference to Fig 2 A4: something is amiss here. From legend of Fig 2 it seems clear that panel A4 label is mislabeled B1. Maybe some panels are missing to show recovery rates?

      Apologies for this residual text from a previous version of the manuscript. We have gone through all references and corrected them.

      P6L7 comma after "decoding".

      Changed.

      Fig 3, I like this analysis. What would be useful / needed here though is a little bit more information about how the data were preprocessed and pooled over animals. Did you do the PCA separately for each animal, then combine, or pool all units into a big matrix that went into the PCA? What about repeat presentations? Was every trial a row in the matrix, or was there some averaging over repeats? (In fact, were there repeats??)

      Thanks for bringing up these relevant aspects, which were partly insufficiently detailed in the manuscript. Briefly, cells were pooled across animals and we only used cells that could meaningfully contribute to the decoding analysis, i.e. had auditory responses and different responses to different Shepard tones. Regarding the responses, as stated in the Methods, "Each stimulus was repeated 10 times", and we computed average responses across these repetitions. Single trials were not analyzed separately. We have added this information in the Methods, and refer to it in the Results.

      Also, there doesn't appear to be a preselection of units. We would not necessarily expect all cortical neurons to have a meaningful "best pitch" as they may be coding for things other than pitch. Intuitively I suspect that, perhaps, the PCA may take care of that by simply not assigning much weight to units that don't contribute much to explained variance? In any event I think it should be possible, and would be of some interest, to pull out of this dataset some descriptive statistics on what proportion of units actually "care about pitch" in that they have a lot (or at least significantly more than zero) of response variance explained by pitch. Would it make sense to show a distribution of %VE by pitch? Would it make sense to only perform the analysis in Fig 3 on units that meet some criterion? Doing so is unlikely to change the conclusion, but I think it may be useful for other scientists who may want to build on this work to get a sense of how much VE_pitch to expect.

      We fully agree with the reviewer, which is why this information is already presented in Supplementary Fig. 1, which details the tuning properties of the recorded neurons. Overall, we recorded from 1467 neurons across all ferrets, of which 662 were selected for the decoding analysis based on their driven firing rate (i.e. whether they responded significantly to auditory stimulation) and whether they showed a differential response to different Shepard tones. The thresholds for auditory response and tuning to Shepard tones were not very critical: setting the thresholds low led to quantitatively the same result, albeit with more noise. Setting the thresholds very high reduced the set of cells included in the analysis and eventually made the results less stable, as the cells no longer covered the entire range of preferences to Shepard tones. We agree that the PCA-based preprocessing would also automatically exclude many of the cells that were already excluded with the more concrete criteria beforehand. We have added further information on this issue in the Methods section under the heading 'Unit selection'.
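
      The selection and pooling steps described in this response can be illustrated with a minimal sketch. It is not the authors' code. The function names (`select_units`, `pca_scores`), the thresholds (`min_drive`, `min_tuning`), and the use of a variance-explained index as the tuning criterion are assumptions made for illustration only. The sketch averages over repeats, keeps units that are both driven and tone-selective, and runs a PCA on the pooled tone-by-unit matrix.

```python
import numpy as np

def select_units(responses, baseline, min_drive=1.0, min_tuning=0.2):
    """Hypothetical unit selection (thresholds are illustrative, not from the paper).

    responses: array (n_units, n_tones, n_repeats) of firing rates
    baseline:  array (n_units,) of spontaneous rates
    """
    mean_resp = responses.mean(axis=2)            # average over repeats
    driven = mean_resp.max(axis=1) - baseline     # peak driven rate per unit
    # Fraction of response variance explained by tone identity
    # (variance of tone means / total variance over tones and repeats).
    total_var = responses.reshape(responses.shape[0], -1).var(axis=1)
    tone_var = mean_resp.var(axis=1)
    ve = np.divide(tone_var, total_var,
                   out=np.zeros_like(tone_var), where=total_var > 0)
    keep = (driven >= min_drive) & (ve >= min_tuning)
    return mean_resp[keep], keep, ve

def pca_scores(mean_resp, n_components=2):
    """PCA via SVD with tones as observations and pooled units as features."""
    X = mean_resp.T                               # (n_tones, n_units)
    X = X - X.mean(axis=0)                        # center each unit
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return U[:, :n_components] * S[:n_components], explained[:n_components]
```

      The `ve` array returned by `select_units` is one way to obtain the per-unit distribution of variance explained by pitch that the reviewer asks about; with equal numbers of repeats per tone it always lies between 0 and 1.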

      P9 "tones This" missing period.

      Changed.

      P10L17 comma after "analysis"

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      Some critical comments are provided below:

      (1) The data quality still needs to be improved. There are many outliers in the experimental data shown in some figures, e.g. Figure 2D-G. The presence of these outliers makes the results unreliable. The author should thoroughly review the data analysis in the manuscript. In addition, a couple of western blot bands, such as IL-1β in Figure 3C, are not clear enough, please provide clearer western blot results again to support the conclusion.

      Following our comparative analysis, we have determined that these data do not affect our conclusions. Moreover, our experimental design included a total of six mice per group, with all mouse samples being subjected to testing.

      (2) As shown in Figure 1G-I, foot thickness and IL-1β content in foot tissues of the Aged+Abx group were significantly reduced, but there was no difference in serum uric acid level. In addition, the Abx-untreated group should be included at all ages.

      Thank you for your comment. We have included this data in Supplemental Material 4.

      (3) Since FMT (Figure 4) and butyrate supplementation (Figure 8) have different effects on uric acid synthesis enzyme and excretion, different mechanisms may lie behind these two interventions. Transplantation with significantly enriched single strains from young mice, such as Bifidobacterium and Akkermansia, is the more reliable approach to reveal the underlying mechanism between gut microbiota and gout.

      Thank you for your comment. Due to the involvement of multiple bacterial genera in gout and hyperuricemia, and the practical challenge of testing all strains, our focus shifted to the functional implications and metabolism of the microbiota. Experimental validation confirmed that butyrate exerts a dual-therapeutic effect in mitigating gout and hyperuricemia.

      (4) In Figure 2F, the results showed the IL-1β, IL-6, and TNF-α content in serum, which was inconsistent with the authors' manuscript description (Line 171).

      Thank you for your comment. The modifications to the results have been implemented.

      (5) Figures 2F-H duplicate Supplementary Figures S1B-D. The authors should prepare the article more carefully to avoid such mistakes.

      Thank you for your comment. We have corrected it in the manuscript.

      (6) In lines 202-206, the authors stated that the elevated serum uric acid levels in the Young+Old or Young+Aged groups, but there is no difference in the results shown in Figure 4A.

      Thank you for your comment. We have corrected it in the manuscript.

      (7) Please visualize the results in Table 2 in a more intuitive manner.

      The results have been presented in Table 2 with a more intuitive visual format. The detailed information is presented in Supplement 4.

      (8) The heatmap in Figure 7A cannot strongly support the conclusion "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group". The author should re-represent the visual results and provide a reasonable explanation. In addition, please provide the ordinate unit of Supplementary Figure 7A-H.

      Thank you for your comment. Figure 7A and Supplementary Figure 7A-H together illustrate "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group", and the specific units of short-chain fatty acids have been annotated in the manuscript.

      (9) Uncropped original full-length western blot should be provided.

      Thank you for your comment. We have made relevant notes in the paper.

      Reviewer #1 (Recommendations For The Authors):

      Gout, a prevalent form of arthritis among the elderly, exhibits an intricate relationship with age and gut microbiota. The authors found that gut microbiota plays a crucial role in determining susceptibility to age-related gout. They observed that age-related gut microbiota regulated the activation of the NLRP3 inflammasome pathway and modulated uric acid metabolism. "Younger" microbiota has a positive impact on the gut microbiota structure of old or aged mice, enhancing butanoate metabolism and butyric acid content. Finally, they found butyric acid exerts a dual effect, inhibiting inflammation in acute gout and reducing serum uric acid levels. This work's insights emphasize the potential of "young" gut microbiome in mitigating senile gout. The whole study was interesting, but there were some minor errors in the overall writing of the paper. The author should carefully check the spelling of the words in the text and the case consistency of the group names.

      Questions:

      (1) Line 118, line 142, and elsewhere 24 months in the same format as before.

      Thank you for your comment. We have corrected it in the manuscript.

      (2) Lines 123, Old and Aged group should be a complex number.

      Thank you for your suggestion. We have corrected it in the manuscript.

      (3) Why does line 133 mention the use of ABX? Please add a brief explanation.

      Thank you for your suggestion. The aim of using ABX is to establish the link between gut microbiota, age, and gout.

      (4) Lines 172-175, the description of TNF does not match the description of the result figure, may be the picture placement error, please correct this.

      Thank you for your careful review. The error has been corrected and the accurate result has been inserted into the original manuscript.

      (5) Lines 183-185 and lines 193-195, Pro-Caspase-1 and Pro-IL activate excess write.

      Thank you for your careful review. We have corrected the error at the original location.

      (6) Line 400, the text should not be written as increased.

      Thank you for your careful review. We have corrected the error at the original location.

      (7) "ns" needs to be added in the legend to indicate that there is no significant difference.

      Thank you for your careful review. We have corrected the error at the original location.

      (8) Lines 1080-1084 "Old or Aged control group and the old or aged group", group names should be case-sensitive.

      Thank you for your suggestion. We have made the correct modification to the group names.

      (9) Lines 1072-1073, "Representative western blot images of foot tissue NLRP3 pathways proteins" add band density.

      Thank you for your suggestion. We have corrected the error on lines 1072-1073 of the article.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      (1) In Figures 1G-H, the Aged+PBS group with antibiotic treatment shows a significant reduction in foot swelling and IL-1β compared to the Young+PBS and Old+PBS groups. The authors state that age-related changes in the gut microbiota exacerbate gout. However, why does only the Aged+PBS group improve with antibiotic treatment? It seems that butyrate alone cannot explain this phenomenon.

      We utilize antibiotics for treatment in order to establish the relationship between gut microbiota, age, and gout. Different age groups are directly given antibiotics for treatment. We found that after clearing the gut microbiota and then stimulating with MSU, the trend of inflammation factors changing with age disappears.

      (2) In Figure 2, the fecal transplantation from young mice improved the infiltration of inflammatory cells and inflammatory cytokines in the Old and Aged groups. However, in Supplementary Figure 1A, there is no improvement observed in the percentage of foot swelling. Is it appropriate to conclude that inflammation was improved even though foot swelling was not suppressed?

      Although we did not observe changes in the swelling of the mice's feet, there were changes in the inflammatory cell infiltration and inflammation factors in the slices. We rely on a comprehensive assessment of various indicators to determine whether the inflammatory condition has improved or worsened.

      (3) In line #249, the authors state that "the fecal microbiota from mice in the young group promotes uric acid elimination, inhibits reabsorption, and may contribute to the integrity of the intestinal barrier structure." However, Supplementary Figure 3F-H shows no significant alterations in Occludin and ZO-1 mRNA expression levels among all groups. Therefore, it is difficult to conclude that the fecal microbiota from the young group promotes the integrity of the intestinal barrier structure. A functional barrier assay, such as oral administration of FITC-dextran, would be necessary to verify the authors' conclusion.

      In Supplementary Figure 3F-H, we observed that the mRNA expression of Occludin and ZO-1 increased but showed no significant difference. However, after the elderly mice were transplanted with the intestinal microbiota of young mice, the mRNA expression of JAMA showed a significant upward trend. Additionally, due to the scarcity of old mice, we were unable to perform the oral administration of FITC-dextran. However, we supplemented with immunohistochemical slices of Zo-1 and Occludin to support our viewpoint.

      (4) In Figure 4, when comparing the Young+PBS group with the Old+PBS or Aged+PBS groups, there are hardly any differences in the proteins involved in uric acid synthesis (ADA, GDA, XOD) or the genes involved in uric acid transport (URAT1, GLUT9, OAT1, OAT3, ABCG2). Since no changes in uric acid synthesis or transport pathways are observed with aging, it is questionable to conclude that fecal transplantation from young mice improves these pathways and lowers blood uric acid levels.

      In the calculation process, we used different age groups of the control group as references, instead of directly using young mice. We then compared the data of mice of different ages, and the results are in Supplementary Material 4.

      (5) In line 276, the authors describe "the Young +Old and Young+Aged groups tended to be closer to the Old+PBS and Aged+PBS groups, and the Old+Young and Aged+young groups tended to be closer to the Young+PBS group (Figure 5D)". Please conduct a statistical analysis.

      (6) In line 298, the authors hypothesize that butyrate might be the key molecule responsible for controlling gout, as Bifidobacterium and Akkermansia were abundant in the Young group, and the butyrate pathway was prominent. However, neither Bifidobacterium nor Akkermansia are butyrate-producing bacteria. Thus, the conclusion appears to be biased toward butyrate, raising questions about this interpretation.

      Upon comparison, we discovered other bacterial genera that produce butyrate, such as Lachnoclostridium. Additionally, literature reports (PMID: 38126785, 26420851) have indicated that Bifidobacteria combined with other genera can enhance the production of butyrate. Meanwhile, Akkermansia, particularly the species Akkermansia muciniphila, has been found to confer several beneficial traits, as evidenced by preclinical studies. These traits include promoting the growth of butyrate-producing bacteria through the production of acetate, which leads to a decrease in the loss of the colonic bilayer and subsequent reduction in inflammation (PMID: 35468952). Based on the predicted results of microbiome functions, we observed that the Butanoate_metabolism of the microbiota in young mice and in the elderly mouse recipients of young mouse microbiota was enhanced. Considering that Lachnoclostridium can produce butyrate, and that Bifidobacteria and Akkermansia can promote the production of butyrate by the intestinal microbiota, we speculated that butyrate might play a role in gout and hyperuricemia.

      (7) In Supplementary Figure 7, acetic acid and propionic acid also show the same behavior as butyric acid. It is possible that these metabolites may also affect the development of gout.

      Thank you for your suggestion. Indeed, Figure 7 does show a similar trend for acetic and propionic acids as for butyric acid. However, considering the predictive data of microbial function and the non-targeted metabolomic data, there is an enhancement of Butanoate_metabolism in both young mice and elderly mice receiving young mouse intestinal microbiota transplants. Therefore, we prioritized butyrate as the subject of our study. Due to the scarcity of elderly mice, we are unable to conduct subsequent experiments with acetic and propionic acids, which is one of the limitations of this study. This work will be addressed in our follow-up research.

      (8) In Figure 6, the secondary bile acid biosynthesis pathway was also changed. However, there is little mention of secondary bile acid in the discussion section. Please carefully discuss other possibilities besides butyrate.

      Thank you for your suggestion. We have incorporated a discussion about secondary bile acids into the relevant section of our manuscript.

      (9) In line #330, the authors state, 'the metabolites identified as showing differential abundance between the groups were enriched in the butanoate metabolism pathway (Figure 6A-D).' However, there does not appear to be much difference in the butanoate metabolism pathway. Specifically, in Figure 6C, the butanoate metabolism pathway in the Old group does not differ from that in the Young group. Please explain in more detail whether the butanoate metabolism pathway is relevant in the Old group.

      The metabolites identified as showing differential abundance between the groups were enriched in the butanoate (butyrate) metabolism pathway; however, the non-targeted metabolomics did not reveal the extent of their enrichment.

      (10) In Figure 7, the authors measured the levels of short-chain fatty acids in the Young and Aged groups. They found butyrate in the feces of mice in the Young group was higher than that in the Aged group. However, I wonder whether the Old group also had low levels of butyrate or not.

      In the experiment, we selected three representative groups to verify the hypothesis that butyrate may play a significant role in gout and hyperuricemia. Subsequently, we found that supplementing 18-month-old and 24-month-old mice with butyrate indeed reduced blood uric acid levels and alleviated gout symptoms. Since 18-month-old mice are difficult to obtain, we only conducted microbiome sequencing and non-targeted metabolomic analysis.

      Minor issues:

      (11) In line 74, what does MSU stand for? Please describe the abbreviation.

      In line 74, MSU refers to Monosodium urate crystals.

      (12) In line 136, please insert a space between "IL-1β" and "and".

      Thank you for your suggestion. We have corrected the error in the article.

      (13) In line 570, please describe the method of butyrate administration and also correct the grammatical errors.

      Thank you for your suggestion. We have corrected the error in the article.

      (14) Change the title of x axis in Figure 2F-H, "Serum ~" to "Peritoneal fluid ~", according to the legend.

      Thank you for your suggestion. We have corrected this error in the manuscript.

      (15) In line 302, "succinates" should be "butyric acid or butyrate".

      Thank you for your suggestion. We have corrected this error in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed the results of IL-1β levels in foot tissues in Figure 1C and Figure 1H, and serum IL-1β, IL-6, and TNF-α levels in Figure 2F-H. Could the authors also provide the results of IL-6 and TNF-α in foot tissue in Figure 1?

      Thank you for your suggestion. We have added the results of IL-6 and TNF-α in foot tissue in Supplementary Material 4.

      (2) There are some errors in the reference citation format, such as missing page numbers.

      Thank you for your careful review. We have revised the references in our manuscript.

      (3) There are too many writing errors in the manuscript, which greatly affect the understanding of the text. The manuscript must be carefully revised to improve its readability. It's recommended that a professional English writer or native speaker proofread the paper before submission. Some errors, but not limited to these errors, are listed below.

      a. Line 107: The abbreviation for "short-chain fatty acid" should be SCFA, not SFCA.

      Thank you for your careful review. We have corrected this error in the manuscript.

      b. Line 136: There is a missing space between "IL-1β" and "and".

      Thank you for your careful review. We have corrected this error in the manuscript.

      c. Line 145, the phrase "on gout on gout", and line 471, "that transplantation" are repeated.

      Thank you for your careful review. We have corrected this error in the manuscript.

      d. Line 152: "Age+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected this error in the manuscript.

      e. In Figure 1e, "Aded+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected the error in Figure 1e.

      f. Line 152: The phrase "by via" is repeated.

      Thank you for your suggestion. We have deleted the phrase "by via" in line 152.

      g. "16S rDNA" in line 92 is inconsistent with the "16S rRNA" in line 652.

      Thank you for your suggestion. We have revised the error in the manuscript to maintain consistency in professional terminology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Tleiss et al. demonstrate that while commensal Lactiplantibacillus plantarum freely circulate within the intestinal lumen, pathogenic strains such as Erwinia carotovora or Bacillus thuringiensis are blocked in the anterior midgut where they are rapidly eliminated by antimicrobial peptides. This sequestration of pathogenic bacteria in the anterior midgut requires the Duox enzyme in enterocytes, and both TrpA1 and Dh31 in enteroendocrine cells. This effect induces muscle contraction, which is marked by the formation of TARM structures (thoracic alary-related muscles). This muscle contraction-related blocking happens early after infection (15 min). On the other hand, the clearance of bacteria is carried out by the IMD pathway, possibly through antimicrobial peptide production, while this pathway is dispensable for the blockage. Genetic manipulations impairing bacterial compartmentalization result in abnormal colonization of posterior midgut regions by pathogenic bacteria. Despite a functional IMD pathway, this ectopic colonization leads to bacterial proliferation and larval death, demonstrating the critical role of anterior sequestration of bacteria in larval defense.

      This important work substantially advances our understanding of the process of pathogen clearance by identifying a new mode of pathogen eradication from the insect gut. The evidence supporting the authors' claims is solid and would benefit from more rigorous experiments.

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We have linked the adult phenotype to the larval model to explore the ROS/TrpA1/Dh31 axis in both contexts. As highlighted in the discussion, however, there are key behavioral differences between larvae and adult flies. Unlike larvae, which remain in the food environment, adult flies have the ability to move away. This difference could impact the relevance of gut muscle contraction and bacterial clearance mechanisms between the two stages. Specifically, in larvae, the rapid ejection of gut contents due to muscle contraction poses a unique risk: larvae may inadvertently re-ingest the expelled material within minutes, which could influence their immune defenses. We have clarified this distinction and our hypothesis in the final section of the discussion, as it emphasizes the adaptive nature of this mechanism in larvae.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      To address this, we have provided new data (Movie 5), in which larvae were fed a lower dose of Bt-GFP at 1.3 × 10^10 CFU/mL. In this video, we observe that when larvae ingest fewer bacteria, no blockage occurs, and the bacteria are able to reach the posterior midgut. As the bacterial load is lower, the fluorescence signal is weaker, but the movie clearly shows the excretion of bacteria. Importantly, under these conditions, no larval death was observed. These findings suggest that below a certain bacterial threshold, the pathogenicity is insufficient to: (1) trigger the blockage response, and (2) kill the larvae. In such cases, bacteria are likely eliminated through normal peristaltic movements rather than through the blockage mechanism described in our study.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      As mentioned in our previous response, we hypothesize that the larvae’s ability to resist low concentrations of pathogenic bacteria is likely due to being below the threshold of virulence. At lower bacterial doses, the pathogenic load is insufficient to trigger the blockage mechanism or cause larval death. In these cases, it is probable that classical peristaltic movements of the gut efficiently eliminate the bacteria, preventing them from colonizing the posterior midgut or causing significant harm. Thus, the larvae rely on standard gut motility and immune mechanisms, rather than the blockage response, to clear lower doses of bacteria.

      Why is this model only applied to high-dose infections? 

      The reason this model primarily applies to high-dose infections is that lower concentrations of pathogenic bacteria do not trigger the blockage mechanism. As we mentioned in the manuscript, for low bacterial concentrations, where the GFP signal remains detectable, wild-type larvae are still able to resist live bacteria in the posterior part of the intestine.

      Regarding the bacterial doses used in our experiments, it's important to clarify that we calculate the bacterial load based on colony-forming units (CFU). In our setup, there are approximately 5 × 10^4 CFU per midgut. For each experiment, we prepare 500 µl of contaminated medium containing 4 × 10^10 CFU. Fifty larvae are placed into this 500 µl of medium, meaning each larva ingests around 5 × 10^4 CFU within one hour of feeding.

      This leads us to two key points:

      (1) Continuous feeding might trigger the blockage response even at lower doses, as extended exposure to bacteria could lead to higher accumulation within the gut.

      (2) Other defense mechanisms, such as the production of reactive oxygen species (ROS) or classical peristaltic movements, could be sufficient to eliminate lower bacterial doses (around 10^3 CFU or below).

      We also refer to the newly provided Movie 5, where larvae fed with Bt-GFP at 1.3 × 10^10 CFU/mL show no blockage at low ingestion levels and successfully eliminate the bacteria.
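
      The dose figures quoted in this response can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (500 µl of medium, 4 × 10^10 CFU, 50 larvae, about 5 × 10^4 CFU ingested per larva, and 1.3 × 10^10 CFU/mL for Movie 5). The function name `dose_summary` is hypothetical, and comparing the Movie 5 concentration with the standard one assumes that the 4 × 10^10 CFU figure applies to the Bt condition.

```python
def dose_summary(total_cfu=4e10, volume_ul=500.0, n_larvae=50,
                 ingested_per_larva=5e4, low_dose_cfu_per_ml=1.3e10):
    """Derive concentration and per-larva quantities from the stated figures."""
    # Concentration of the contaminated medium in CFU per mL.
    conc_cfu_per_ml = total_cfu / (volume_ul / 1000.0)
    # CFU nominally available to each larva if the medium were shared evenly.
    available_per_larva = total_cfu / n_larvae
    # Fraction of that nominal share actually ingested within the hour.
    fraction_ingested = ingested_per_larva / available_per_larva
    # Movie 5 concentration relative to the standard preparation (assumption).
    low_dose_ratio = low_dose_cfu_per_ml / conc_cfu_per_ml
    return {
        "conc_cfu_per_ml": conc_cfu_per_ml,
        "available_per_larva": available_per_larva,
        "fraction_ingested": fraction_ingested,
        "low_dose_ratio": low_dose_ratio,
    }

summary = dose_summary()
```

      Under these assumptions the standard medium works out to 8 × 10^10 CFU/mL, so each larva ingests only a small fraction of the bacteria nominally available to it, and the Movie 5 dose is roughly one sixth of the standard concentration.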

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that it’s after 4 to 6 hours that the quantity of bacteria decreases. We fixed this in the text.

      What happened during this period? 

      During the 4 to 6-hour period, several defense mechanisms are activated. ROS play a bacteriostatic and bacteriolytic role, helping to control bacterial growth. Concurrently, the IMD pathway is activated, leading to the transcription, translation, and secretion of antimicrobial peptides. These AMPs exert both bacteriostatic and bacteriolytic effects, contributing to the eventual clearance of the pathogenic bacteria.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We have provided new data (Supplementary Figure 6) that includes RT-qPCR analysis of the whole larval gut in wt, TrpA1- and Dh31- genetic background after feeding with Lp, Ecc15, Bt, or yeast only. We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences between the genotypes tested.

      Additionally, we included new imaging data (Supplementary Figure 11) from AMP reporter larvae (Dpt-Cherry) fed with fluorescent Lp or Bt. In larvae infected with Bt, which is blocked in the anterior part of the gut, the dpt gene is predominantly induced in this region, indicating strong IMD pathway activity in response to Bt infection. Conversely, in larvae fed with Lp-GFP, the Dpt-Cherry reporter shows weak expression in the anterior midgut, and is barely detectable in the posterior midgut where Lp-GFP establishes itself. This aligns with previous findings by Bosco-Drayon et al. (2012), which demonstrated low AMP expression in the posterior midgut due to the presence of negative regulators of the IMD pathway, such as amidases and Pirk.

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      Based on our new data (Supplementary Figure 11), we observe that Dpt-RFP expression is primarily localized in the anterior midgut and likely at the beginning of the acidic region in larvae infected with Bt, Ecc and Lp.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      We observe that bacteria are not evenly distributed along the gut in wild-type larvae either, as seen with Lp. This suggests that the transit time in the anterior part of the gut may be relatively short due to active peristalsis, which would make this region function as a "checkpoint" for bacteria that are not supposed to be blocked. Indeed, we confirmed that peristalsis is active during our intoxication experiments, which could explain the rapid movement of bacteria through the anterior midgut.

      In contrast, bacteria tend to remain longer in the posterior midgut, which corresponds to the absorptive functions of intestinal cells in this region. This would explain why we observe more bacteria in the posterior midgut for Lp in control larvae and for Ecc15 and Bt in the TrpA1- and Dh31- mutants. Although a few bacteria are still found in the anterior midgut, they are consistently in much lower numbers compared to the posterior, as shown in Figures 1A and 3A of our manuscript.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We investigated whether the ROS/TrpA1/Dh31 axis influences AMP expression by performing RT-qPCR on the whole gut of larvae in wild-type, TrpA1-, and Dh31- genetic backgrounds. Larvae were fed with Lp, Ecc, Bt, or yeast (new data: Supplementary Figure 6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the different genotypes.

      Additionally, we provide imaging data from AMP reporter larvae (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These results further confirm that the ROS/TrpA1/Dh31 axis does not significantly affect AMP expression in our experimental conditions.

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key-driven force for the blocking phenotype and killing phenotype? 

      We agree that the TARM structures are a fascinating aspect of this study and acknowledge the interest in their potential role in the blocking and killing phenotypes. While we are keen to explore the specific contributions of these structures during bacterial intoxication, the current genetic tools available for manipulating TARMs target both TARM T1 and T2 simultaneously, as demonstrated by Bataillé et al., 2020 (Fig. 2). Of note, these muscles are essential for proper gut positioning in larvae, and their absence leads to significant defects in food intake and transit, which would confound the results of our intoxication experiments (see Fig. 6 from Bataillé et al., 2020).

      Therefore, while TARMs are likely involved in these processes, the current limitations in selectively targeting them prevent us from definitively testing their role in bacterial blocking and killing at this stage. We hope to address this in future studies as more refined genetic tools become available.

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      To determine whether the ROS/TrpA1/Dh31 axis is required for the formation of TARM structures, we examined larval guts from control, TrpA1-, and Dh31- mutant backgrounds. Our new data (Supplementary Figure 8) show that the TARM T2 structures are still present in the mutants, indicating that the formation of these structures does not depend on the ROS/TrpA1/Dh31 axis.

      Reviewer #2 (Public Review):

      This article describes a novel mechanism of host defense in the gut of Drosophila larvae. Pathogenic bacteria trigger the activation of a valve that blocks them in the anterior midgut where they are subjected to the action of antimicrobial peptides. In contrast, beneficial symbiotic bacteria do not activate the contraction of this sphincter, and can access the posterior midgut, a compartment more favorable to bacterial growth.

      Strengths:

      The authors decipher the underlying mechanism of sphincter contraction, revealing that ROS production by Duox activates the release of DH31 by enteroendocrine cells that stimulate visceral muscle contractions. The use of mutations affecting the Imd pathway or lacking antimicrobial peptides reveals their contribution to pathogen elimination in the anterior midgut.

      Weaknesses:

      The mechanism allowing the discrimination between commensal and pathogenic bacteria remains unclear.

      Based on our findings, we hypothesize that ROS play a crucial role in this discrimination process, with uracil release by pathogenic or opportunistic bacteria potentially serving as a key signal.

      To test whether uracil could trigger this discrimination, we conducted experiments where Lp was supplemented with uracil. However, our results show that uracil supplementation alone was not sufficient to induce the blockage response (new data: Supplementary Figure 5). This suggests that while uracil may be a factor in bacterial discrimination, it is likely not the sole trigger, and additional bacterial factors or signals may be required to activate the blockage mechanism. 

      The use of only two pathogens and one symbiotic species may not be sufficient to draw a conclusion on the difference in treatment between pathogenic and symbiotic species.

      To address this concern, we performed additional intoxication experiments using Escherichia coli OP50, a bacterium considered innocuous and commonly used as a standard food source for C. elegans in laboratory settings. The results, presented in our updated data (new data: Fig 1B), show that E. coli OP50, despite being from the same genus as Ecc, does not trigger the blockage response. This further supports our conclusion that the gut’s discriminatory mechanism is specific to pathogenic bacteria, and not merely based on bacterial genus.

      We can also wonder how the process of sphincter contraction is affected by the procedure used in this study, where larvae are starved. Does the sphincter contraction occur in continuous feeding conditions? Since larvae are continuously feeding, is this process physiologically relevant?

      In our intoxication protocol, the larvae are exposed to contaminated food for 1 hour, during which the blockage ratio is quantified. Since this period involves continuous feeding with the contaminated food, we do not consider the larvae starved during the quantification process. Our observations show differences in the blockage response depending on the bacterial contaminant and the genetic background of the host. Additionally, we were able to trigger the blocking phenomenon using exogenous hCGRP.

      Regarding the experimental setup for movie observations, it is true that larvae are immobilized on tape in a humid chamber, which is not a fully physiological context. However, in the new movie we provide (Movie 3), co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green) shows that both are initially blocked, followed by the posterior release of Dextran once the bacterial clearance begins.

      Furthermore, to address the question of continuous exposure, we extended the exposure period to 20 hours instead of 1 hour. Even after prolonged exposure, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This supports the physiological relevance of the sphincter contraction and its ability to function under continuous feeding conditions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We used the adult phenotype as the starting point for a candidate approach targeting the ROS/TrpA1/Dh31 axis in larvae. As already mentioned in the discussion, larvae stay in the food, whereas adult flies can move away from it. If larvae were to eject their gut content, they might re-ingest it within minutes. We have clarified this idea in the last part of the discussion.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype. 

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      We provide a video with Bt-GFP at 1.3 × 10^10 CFU/mL (new data: Movie 5). When larvae ingest fewer bacteria, there is no blockage and the bacteria can reach the posterior midgut. Note that the fluorescence is weak because of the low amount of bacteria ingested. The movie shows excretion of the bacteria, and the larvae do not die. Together, these results suggest that below a given threshold, the virulence of the bacteria is too weak to i) trigger a blockage and ii) kill the larva. The bacteria are likely eliminated through classical peristalsis.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      We may be below the virulence threshold. See our response just above.

      Why is this model only applied to high-dose infections? 

      As mentioned in the manuscript, lower concentrations do not trigger the blockage. At lower concentrations where the GFP signal is still detectable, wild-type animals withstand the presence of live bacteria in the posterior part of the intestine.

      Regarding the doses, the CFU count should be considered. There are around 5 × 10^4 CFU per midgut. In our experimental procedure, we calculate the amount of bacteria for 500 µl of contaminated medium (i.e. 4 × 10^10 CFU per 500 µl of medium). Around 50 larvae are then deposited in the 500 µl of contaminated medium. Under these conditions, one larva ingests about 5 × 10^4 CFU. Moreover, larvae are only fed for 1 h. 

      Therefore, i) continuous feeding may also trigger the blockage even at lower doses, and ii) the other defense mechanisms (such as ROS) or peristalsis may be sufficient to eliminate lower doses (i.e. 10^3 CFU or below). See the new Movie 5 that we provide with Bt-GFP at 1.3 × 10^10 CFU/mL.
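      The dose figures quoted above can be put side by side with a few lines of arithmetic. The sketch below is only an illustration: the input values come from this response, while the function names and the derived "fraction ingested" quantity are our own hypothetical additions and are not part of the experimental protocol.

```python
# Illustrative arithmetic only. Input values are those quoted in the response;
# the helper names and the derived fraction are hypothetical.

TOTAL_CFU = 4e10               # CFU in the contaminated medium
MEDIUM_VOLUME_UL = 500         # volume of contaminated medium, in microliters
LARVAE_PER_ASSAY = 50          # approximate number of larvae per assay
INGESTED_CFU_PER_LARVA = 5e4   # approximate CFU found per midgut


def cfu_per_ml(total_cfu: float, volume_ul: float) -> float:
    """Convert a total CFU count in a given volume (µl) to CFU/mL."""
    return total_cfu / (volume_ul / 1000.0)


def cfu_available_per_larva(total_cfu: float, n_larvae: int) -> float:
    """Bacteria nominally available to each larva if shared equally."""
    return total_cfu / n_larvae


def ingested_fraction(ingested: float, available: float) -> float:
    """Fraction of the nominally available bacteria found in one midgut."""
    return ingested / available


concentration = cfu_per_ml(TOTAL_CFU, MEDIUM_VOLUME_UL)
available = cfu_available_per_larva(TOTAL_CFU, LARVAE_PER_ASSAY)
fraction = ingested_fraction(INGESTED_CFU_PER_LARVA, available)

print(f"Concentration: {concentration:.1e} CFU/mL")
print(f"Available per larva: {available:.1e} CFU")
print(f"Fraction ingested: {fraction:.2e}")
```

      Under these assumptions, each larva takes up only a very small share of the bacteria present in the medium during the 1 h feeding period, which is why the per-midgut CFU count, rather than the concentration of the medium alone, is the relevant figure when comparing doses.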

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that the quantity of bacteria decreases only after 4 to 6 hours. We have corrected this in the text.

      What happened during this period? 

      During this period, ROS are active (with bacteriostatic and bacteriolytic effects), the IMD pathway is activated, and AMPs are transcribed, translated and secreted, exerting bacteriostatic as well as bacteriolytic activity.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We provide new whole-gut RT-qPCR data from larvae in wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored three different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (Dpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11) showing that, with Bt blocked in the anterior part of the intestine, the dpt gene is mainly induced in this area. Note that in the larva infected with Lp-GFP, the Dpt-Cherry reporter is weakly expressed in the anterior midgut. In the posterior midgut, where Lp-GFP is established, Dpt-Cherry is barely detectable. This observation is in line with the previous observation made by Bosco-Drayon et al. (2012) demonstrating the low level of AMP expression in the posterior midgut due to the expression of IMD negative regulators such as amidases and pirk. In the larva infected with Bt-GFP, note the obvious expression of Dpt-Cherry in the anterior midgut colocalizing with the bacteria (new data: SUPP11).

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      In control animals fed Bt, Ecc and Lp, we see Dpt-RFP in the anterior midgut and likely at the beginning of the acidic region. See the new data (SUPP11 images) provided for the previous remark.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      The same is true of Lp in wt larvae: it is not evenly distributed. It appears that the transit time in the anterior part is very short due to peristalsis, which would be consistent with a checkpoint area through which bacteria that are not meant to be blocked pass quickly. Indeed, peristalsis is active during our intoxications. The bacteria then stay longer in the posterior part, in keeping with the absorptive function of the intestinal cells in this area. With Lp in controls, or Ecc and Bt in TrpA1- and Dh31- mutants, there are always a few bacteria in the anterior midgut, but always far fewer than in the posterior. See our Figures 1A and 3A.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We provide whole-gut RT-qPCR data from larvae in wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored three different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11).

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key-driven force for the blocking phenotype and killing phenotype? 

      Indeed, we would like to explore the roles of these structures and their putative requirement upon bacterial intoxication using some driver lines developed by the team that studied these muscles in vivo. However, the genetic tools currently available target TARMs T1 and T2 at the same time (see Fig. 2 from Bataillé et al., 2020). Moreover, these TARMs are, first of all, crucial for the correct positioning of the gut within the larva, and their absence leads to a global food intake and transit defect that would bias the outcomes of our intoxication protocol (see Fig. 6 from Bataillé et al., 2020).

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      We provide images of larval guts from control, TrpA1 and Dh31 mutants demonstrating the presence of the TARM T2 structures despite the mutations (new data: SUPP8). In addition, we provide representative movies of peristalsis in the intestines of Dh31 mutants fed or not with Ecc to illustrate that muscular activity is not abolished (new data: Movie 9 and Movie 10).

      Minor points:

      (1) Why not use the Pros-Gal4/UAS-Dh31 strain in Figure 3B in addition to hCGRP?

      We opted for exogenous hCGRP addition because it allowed us precise timing control over Dh31 activation. Overexpression of Dh31 from embryogenesis or early larval stages could have significant and unintended effects on intestinal physiology, potentially confounding the results. While temporal control using TubG80ts could be an alternative, our focus was on identifying the specific cells responsible for the phenomenon.

      To achieve this, we perturbed Dh31 production via RNAi, specifically targeting a limited number of enteroendocrine cells (EECs) using the DJ752-Gal4 driver, as described by Lajeunesse et al., 2010. Our new data (Supplementary Figure 4) demonstrate that Dh31 expression in this subset of cells is indeed necessary for the blockage phenomenon.

      (2) Section title (line 287) refers to mortality, but no mortality data is in the figure.

      We agree that the title referenced mortality, whereas no mortality data was presented in this section. We have updated the title to better reflect the data discussed in this part of the manuscript.

      (3) It may be better to combine ROS-related contents in the same figure.

      While it is technically feasible to consolidate the ROS-related content into one figure, doing so would require splitting essential data, such as the Gal4 controls for the RNAi assays and parts of the survival phenotype data. We believe that the current structure of the study, which first explores the molecular aspects of the phenomenon and then demonstrates its relevance to the animal’s survival, provides a clearer and more logical flow. For these reasons, we prefer to maintain the current figure layout.

      Reviewer #2 (Recommendations For The Authors):

      Major recommendation

      (1) Other wild-type backgrounds should be added (including the w Drosdel background of the AMP14 deficient flies) to check the robustness of the phenotype.

      To address the concern regarding the robustness of the phenotype across different wild-type backgrounds, we have tested additional genetic backgrounds, including w1, the isogenized w1118 and Oregon animals. 

      The results (new data: Figure 1C) demonstrate that Lp is able to transit freely to the posterior part of the intestine in all backgrounds, while Ecc and Bt are blocked in the anterior part. These findings confirm the robustness of the phenotype across different wild-type strains.

      (2) Although we recognize that this may be limited by the number of GFP-expressing species, other commensal and pathogenic bacteria should be tested in this assay (e.g. E. faecalis and Acetobacter).

      We performed new intoxication experiments using Escherichia coli OP50, a well-established innocuous bacterial strain. The data, presented in Figure 1B (new data), show that E. coli OP50, despite being from the same genus as Ecc, does not trigger the blockage response. This further supports our hypothesis that the blockage phenomenon is specific to pathogenic bacteria and not simply related to the bacterial genus.

      (3) It is important to test whether sphincter closure also occurs in continuous feeding conditions. This does not mean repeating all the experiments but just shows that this mechanism can take place in conditions where larvae are kept in a vial with food.

      While the movies we provide involve larvae immobilized on tape in a humid chamber, which is not a fully physiological context, we now provide new data (Movie 3) showing that, after co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green), both substances are initially blocked in the anterior midgut. Later, the dextran is released posteriorly once bacterial clearance has begun.

      Additionally, we extended the feeding period in our experiments from 1 hour to 20 hours to simulate more continuous exposure to contaminated food. Even under these prolonged conditions, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This confirms that the sphincter mechanism can function in continuous feeding conditions as well.

      (4) What are the molecular determinants discriminating innocuous from pathogenic bacteria? Addressing this point will increase the impact of the article. The fact that Relish mutants have normal valve constriction suggests that peptidoglycan recognition is not involved. Is there a sensing of pathogen virulence factors? 

      Our data suggest that uracil could be a key molecular determinant in discriminating between innocuous and pathogenic bacteria, as previously described by the W-J Lee team in several studies on adult Drosophila. However, in our experiments, exogenous uracil addition using the blue dye protocol (Keita et al., 2017) did not induce any significant changes in the larvae. Similarly, uracil supplementation in adult flies failed to trigger the Ecc expulsion and gut contraction phenotype, as reported by Benguettat et al., 2018. 

      To further investigate this, we tested the addition of uracil during Lp-GFP intoxication. In these experiments, we did not observe any blockage of Lp (new data: Supplementary Figure 5). These results suggest that uracil might not be the sole trigger for the blockage response, or we may not be providing uracil exogenously in the most effective way. Alternatively, there could be other pathogen-specific virulence factors that contribute to this discrimination mechanism.

      To address this question, the authors should infect larvae with Ecc15 evf- mutants or Ecc15 lacking uracil production. 

      Thank you for your suggestion to use Ecc15 evf- mutants or Ecc15 lacking uracil production to explore the role of uracil in bacterial discrimination. While we have provided some data using uracil supplementation (new data: Supplementary Figure 5), we agree that testing mutants like PyrE would be an important next step. Unfortunately, we currently lack access to fluorescent PyrE or Ecc15 evf- mutants.

      We are planning to address this by developing a new protocol involving fluorescent beads alongside bacteria. This approach will allow us to test several bacterial strains in parallel and better define the size threshold of the valve. However, we do not have the relevant data yet, but this will be a key focus of our future work.

      Similarly, does feeding heat-killed Ecc15 or Bt induce sequestration in the anterior midgut (larvae may be fed dextran-FITC at the same time to track bacteria)?

      Unfortunately, in our attempts to test heat-killed or ethanol-killed fluorescent Ecc15 for these experiments, we encountered an issue: while we were able to efficiently kill the bacteria, we lost the GFP signal required to track their position in the gut. This made it challenging to assess whether sequestration in the anterior midgut occurs with non-viable bacteria.

      Is uracil or Bt toxin feeding sufficient to induce valve closure? 

      As previously mentioned, uracil is a strong candidate for bacterial discrimination, and we have tested its role by adding exogenous uracil during Lp-GFP intoxication. However, in these experiments, Lp was not blocked (new data: Supplementary Figure 5). This suggests that uracil alone may not be sufficient to induce valve closure, or it may not be the only factor involved. It is also possible that our method of exogenous uracil supplementation may not be effectively mimicking the endogenous conditions.

      Regarding Bt, we used vegetative cells without Cry toxins in our experiments. Cry toxins are only produced during sporulation and are enclosed in crystals within the spore. The Bt strain we used, 4D22, has been deleted for the plasmids encoding Cry toxins. As a result, there were no Cry toxins present in the Bt-GFP vegetative cells used in our assays. This has been clarified in the Materials and Methods section of the manuscript.

      Would Bleomycin induce the same phenotype? 

      Indeed, Bleomycin, as well as paraquat, has been shown to damage the gut and trigger intestinal cell proliferation in adult Drosophila through mechanisms involving TrpA1. Testing whether Bleomycin induces a similar phenotype in larvae would indeed be interesting.

      However, one challenge we face in our intoxication protocol is that larvae tend to stop feeding when chemicals are added to their food mixture. We encountered similar difficulties in our DTT experiments, which were challenging to set up for this reason. Consequently, we aim to avoid approaches that might impair the general feeding activity of the larvae, as it can significantly affect the outcomes of our experiments.

      Could this process of sphincter closure be more related to food poisoning?

      If gut damage were the primary trigger for sphincter closure, we would indeed expect the blockage phenomenon to occur later following bacterial exposure. However, in our experiments, we observe the blockage occurring early after bacterial contact, suggesting that damage may not be the main trigger for this response.

      That said, we have not yet tested bacterial mutants lacking toxins, nor have we tested a direct damaging agent such as Bleomycin, as proposed. These would be valuable future experiments to explore the potential role of gut damage more thoroughly in this process.

      (5) Is Imd activation normal in trpA1 and DH31 mutants? The authors could use a diptericin reporter gene to check if Diptericin is affected by a lack of valve closure in trpA1.

      To address this, we performed RT-qPCR on whole larval guts from wt, TrpA1^1 and Dh31^KG09001 genetic backgrounds. Larvae were fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the genotypes.

      Additionally, we provide imaging data from AMP reporter animals (pDpt-Cherry) in a wild-type background, fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These images also support the conclusion that Diptericin expression is not significantly affected by a lack of valve closure in trpA1 and Dh31 mutants.

      (6) Are the 2-6 DH31 positive cells the same cells described by Zaidman et al., Developmental and Comparative Immunology 36 (2012) 638-647.

      The cells identified as hemocytes in the midgut junctions by Zaidman et al. are likely the same cells we describe in our study, as they are located in the same region and are Dh31 positive. We have added a reference to this paper and included lines in the manuscript acknowledging this connection.

      Although confirming whether these cells are Hml+, Dh31+, and TrpA1+ would clarify their exact identity, this falls outside the scope of our current study. However, the possibility that these cells play a role in physical barrier immunity and also possess a hemocyte identity is indeed intriguing, and we hope future research will explore this further.

      Minor points

      (1) The mutations should be appropriately labelled with the allele name.

      This has been fixed in the main text, in Fig Legends, and in figures. 

      (2) Line 230-231: the sentence is unclear to me.

      We simplified the sentence and no longer refer to expulsion in larvae.

      (3) Discussion: although the discussion is already a bit long, it would be interesting to see if this process is likely to happen/has been described in other insects (mosquito, Bactrocera, ...).

      We reviewed the available literature but were unable to find specific examples describing the blockage phenomenon in other insects. Most studies we found focused on symbiotic bacteria rather than pathogenic or opportunistic bacteria. However, as mentioned in our manuscript, the anterior localization of opportunistic or pathogenic bacteria has been observed in Drosophila by independent research groups.

      (4) Line 546: add the Caudal Won-Jae Lee paper to state the posterior midgut is less microbicidal.

      We added the reference in the appropriate place and noted that it concerns adults. 

      (5) Figure 6: indicate what the cells shown by the arrow are.

      The sentence ‘the arrows point to TARMs’ is present in the legend of Fig. 6.

      (6) Does the sphincter closure depend on hemocytes?

      As mentioned above, the cells we identify as TrpA1+ in the midgut junction may be the same cells described by Zaidman et al., 2012, and earlier by Lajeunesse et al., 2010. Inactivating hemocytes using the Hml-Gal4 driver may also affect these Dh31+ cells, as they share similarities with hemocytes, as pointed out by Zaidman et al. However, distinguishing between hemocytes and Dh31+/TrpA1+ cells would require a genetic intersectional approach, which is beyond the scope of our current study.

      Nevertheless, the possibility that these cells play a dual role in immunity (through blockage) and share characteristics with hemocytes while functioning as enteroendocrine cells (EECs) is quite intriguing and deserves further exploration in future studies.

    1. Author response:

      Reviewer #1 (Public review):

      Lu et al. use their workflow to visualize RNA expression of five enzymes that are each involved in the biosynthetic pathway of different neurotransmitters/modulators, namely chat (cholinergic), gad (GABAergic), tbh (octopaminergic), th (dopaminergic), and tph (serotonergic). In this way, they generate an anatomical atlas of neurons that produce these molecules. Collectively these markers are referred to as the "neuronpool." They overstate when they write, "The combination of these five types of neurons constitutes a neuron pool that enables the labeling of all neurons throughout the entire body." This statement does not accurately represent the state of our knowledge about the diversity of neurons in S. mediterranea. There are several lines of evidence that support the presence of glutamatergic and glycinergic neurons, including the following. The glutamate receptor agonists NMDA and AMPA both produce seizure-like behaviors in S. mediterranea that are blocked by the application of glutamate receptor antagonists MK-801 and DNQX (which antagonize NMDA and AMPA glutamate receptors, respectively; Rawls et al., 2009). scRNA-Seq data indicates that neurons in S. mediterranea express a vesicular glutamate transporter, a kainate-type glutamate receptor, a glycine receptor, and a glycine transporter (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Two AMPA glutamate receptors, GluR1 and GluR2, are known to be expressed in the CNS of another planarian species, D. japonica (Cebria et al., 2002). Likewise, there is abundant evidence for the presence of peptidergic neurons in S. mediterranea (Collins et al., 2010; Fraguas et al., 2012; Ong et al., 2016; Wyss et al., 2022; among others) and in D. japonica (Shimoyama et al., 2016). For these reasons, the authors should not assume that all neurons can be assayed using the five markers that they selected. The situation is made more complex by the fact that many neurons in S. mediterranea appear to produce more than one neurotransmitter/modulator/peptide (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022), which is common among animals (Vaaga et al., 2014; Brunet Avalos and Sprecher, 2021). However, the published literature indicates that there are substantial populations of glutamatergic, glycinergic, and peptidergic neurons in S. mediterranea that do not produce other classes of neurotransmission molecule (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Thus it seems likely that the neuronpool will miss many neurons that only produce glutamate, glycine or a neuropeptide.

      In response to your comments, we agree that our initial statement regarding the "neuron pool" overstated the extent of neuronal coverage provided by the five selected markers. We have revised the sentence as “The combination of these five types of neurons constitutes a neuron pool that enables the labeling of most of the neurons throughout the entire body, including the eyes, brain, and pharynx”. 

      Furthermore, we chose the five neurotransmitter systems (cholinergic, GABAergic, octopaminergic, dopaminergic, and serotonergic) based on their well-characterized roles in planarian neurobiology and the availability of reliable markers. However, we acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling, which have been documented in S. mediterranea. We will also add the content about other neuron types in our revised manuscript “Additionally, there is considerable diversity among glutamatergic, glycinergic, and peptidergic neurons in planarians. Many neurons in S. mediterranea express more than one neurotransmitter or neuropeptide, which adds further complexity to the system.”

      The authors use their technique to image the neural network of the CNS using antibodies raised against Arrestin, Synaptotagmin, and phospho-Ser/Thr. They document examples of both contralateral and ipsilateral projections from the eyes to the brain in the optic chiasma (Figure 1C-F). These data all seem to be drawn from a single animal in which there appears to be a greater than normal number of nerve fiber defasciculations. It isn't clear how well their technique works for fibers that remain within a nerve tract or the brain. The markers used to image neural networks are broadly expressed, and it's possible that most nerve fibers are too densely packed (even after expansion) to allow for image segmentation. The authors also show a close association between estrella-positive glial cells and nerve fibers in the optic chiasma. 

      Thank you for your detailed feedback. While we did not perform segmentation of all neuron fibers, we were able to segment more isolated fibers that were not densely packed within the neural tracts. We use 120 nm resolution to segment neurons along the three axes. Our data show the presence of both contralateral and ipsilateral projections of visual neurons. Although Figure 1C-F shows data from one planarian, we imaged three independent specimens to confirm the consistency of these observations. In the revised manuscript, we will include a discussion on the limitations of TLSM in reconstructing neural networks, particularly when it comes to resolving fibers within densely packed regions of the nerve tracts.

      The authors count all cell types, neuron pool neurons, and neurons of each class assayed. They find that the cell number to body volume ratio remains stable during homeostasis (Figure S3C), and that the brain volume steadily increases with increasing body volume (Figure S3E). They also observe that the proportion of neurons to total body cells is higher in worms 2-6 mm in length than in worms 7-9 mm in length (Figure 2D, S3F). They find that the rate at which four classes of neurons (GABAergic, octopaminergic, dopaminergic, serotonergic) increase relative to the total body cell number is constant (Figure S3G-J). They write: "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." This conclusion should not be reached without first directly counting the number of cholinergic neurons and total body cells. Given that glutamatergic, glycinergic, and peptidergic neurons were not counted, it also remains possible that the non-linear dynamics are due (in part or in whole) to one or more of these populations. 

      We have removed the statement "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." We have replaced it with: “These results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is not likely from the octopaminergic, GABAergic, dopaminergic and serotonergic neurons. Since our neuron pool may not include glutamatergic, glycinergic, and peptidergic neurons, we would like to add the possibility that the non-linear dynamics may be from cholinergic neurons or other neurons not included in our staining.”

      Reviewer #2 (Public review): 

      Weaknesses: 

      (1) The proprietary nature of the microscope, protected by a patent, limits the technical details provided, making the method hard to reproduce in other labs. 

      Thank you for your comment. We understand the importance of reproducibility and transparency in scientific research. We would like to point out that the detailed design and technical specifications of the TLSM are publicly available in our published work: Chen et al., Cell Reports, 2020. Additionally, the protocol for C-MAP, including the specific experimental steps, is comprehensively described in the methods section of this paper. We believe that these resources should provide sufficient information for other labs to replicate the method.

      (2) The resolution of the analyses is mostly limited to the cellular level, which does not fully leverage the advantages of expansion microscopy. Previous applications of expansion microscopy have revealed finer nanostructures in the planarian nervous system (see Fan et al. Methods in Cell Biology 2021; Wang et al. eLife 2021). It is unclear whether the current protocol can achieve a comparable resolution. 

      Thank you for raising this important point. The strength of our C-MAP protocol lies in its fluorescence-protective nature and user convenience. Notably, the sample can be expanded up to 4.5-fold linearly without the need for heating or proteinase digestion, which helps preserve fluorescence signals. In addition, the entire expansion process can be completed within 48 hours. While our current analysis focused on cellular-level structures, our method can achieve comparable or better resolution and we will add this information in the revised manuscript.

      (3) The data largely corroborate past observations, while the novel claims are insufficiently substantiated. 

      A few major issues with the claims: 

      (4) Line 303-304: While 6G10 is a widely used antibody to label muscle fibers in the planarian, it doesn't uniformly mark all muscle types (Scimone et al. Nature 2017). For a more complete view of muscle fibers, it is important to use a combination of antibodies targeting different fiber types or a generic marker such as phalloidin. This raises fundamental concerns about all the conclusions drawn from Figures 4 and 6 about differences between various muscle types. Additionally, the authors should cite the original paper that developed the 6G10 antibody (Ross et al. BMC Developmental Biology 2015).

      We appreciate the reviewer’s insightful comments and acknowledge that 6G10 does not uniformly label all muscle fiber types. We agree that this limitation should be recognized in the interpretation of our results. We will revise the manuscript to explicitly state the limitations of using 6G10 alone for muscle fiber labeling and highlight the need for additional markers. We will also clarify that the primary objective of our study was not to distinguish all muscle fiber types but rather to demonstrate the application of our 3D tissue reconstruction method in addressing traditional research questions. Nonetheless, we agree that expanding the labeling strategy in future studies would allow for a more thorough investigation of muscle fiber diversity. We will ensure all citations are properly revised and updated in our next version.

      (5) Lines 371-379: The claim that DV muscles regenerate into longitudinal fibers lacks evidence. Furthermore, previous studies have shown that TFs specifying different muscle types (DV, circular, longitudinal, and intestinal) both during regeneration and homeostasis are completely different (Scimone et al., Nature 2017 and Scimone et al., Current Biology 2018). Single-cell RNAseq data further establishes the existence of divergent muscle progenitors giving rise to different muscle fibers. These observations directly contradict the authors' claim, which is only based on images of fixed samples at a coarse time resolution. 

      Thank you for your valuable feedback. Our intent was not to suggest that DV muscles regenerate into longitudinal fibers. Our observations focused on the wound site, where DV muscle fibers appear to reconnect, and longitudinal fibers, along with other muscle types, gradually regenerate to restore the structure of the injured area. We will revise the relevant sections of the manuscript to clarify this dynamic process more accurately.

      (6) Line 423: The manuscript lacks evidence to claim glia guide muscle fiber branching. 

      We will remove this statement from the revised version. Instead, we will focus on describing our observations of the connections between glial cells and muscle fibers.

      (7) Lines 432/478: The conclusion about neuronal and muscle guidance on glial projections is similarly speculative, lacking functional evidence. It is possible that the morphological defects of estrella+ cells after bcat1 RNAi are caused by Wnt signaling directly acting on estrella+ cells independent of muscles or neurons. 

      We understand that this approach is insufficient and we will revise the manuscript to more clearly state the limitations of our data. We will describe our observations as preliminary and suggest that further experiments are required.

      (8) Finally, several technical issues make the results difficult to interpret. For example, in line 125, cell boundaries appear to be determined using nucleus images; in line 136, the current resolution seems insufficient to reliably trace neural connections, at least based on the images presented. 

      We use two setups for imaging cells and neuron projections. For cellular-resolution imaging, we used a 1× air objective with a numerical aperture (NA) of 0.25 and a working distance of 60 mm (OLYMPUS MV PLAPO). The voxel size was 0.8×0.8×2.5 µm³. This configuration gave a resolution of 2×2×5 µm³ and an effective spatial resolution of 0.5×0.5×1.25 µm³ after 4× isotropic expansion. For sub-cellular imaging, we used a 10×0.6 SV MP water immersion objective with 0.8 NA and a working distance of 8 mm (OLYMPUS). The voxel size in this configuration was 0.26×0.26×0.8 µm³, giving a resolution of 0.5×0.5×1.6 µm³ and an effective spatial resolution of 0.12×0.12×0.4 µm³ after 4.5× isotropic expansion. The higher resolution achieved with sub-cellular imaging allows us to observe finer structures and trace neural connections.
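
      As a rough check on these figures, the sketch below assumes that the effective resolution after expansion is simply the pre-expansion resolution divided by the linear expansion factor along each axis. The function name `effective_resolution` is hypothetical and used only for illustration. Under this assumption the sub-cellular values come out close to, but slightly below, the quoted 0.12×0.12×0.4 µm³, which suggests the quoted numbers are rounded.

```python
from typing import Tuple

Resolution = Tuple[float, float, float]  # (x, y, z) in micrometres


def effective_resolution(resolution_um: Resolution, expansion: float) -> Resolution:
    """Divide each axis of the pre-expansion resolution by the linear expansion factor.

    Assumption: expansion is isotropic, so one factor applies to all three axes.
    """
    if expansion <= 0:
        raise ValueError("expansion factor must be positive")
    x, y, z = resolution_um
    return (x / expansion, y / expansion, z / expansion)


# Values quoted in the response: resolution before expansion and expansion factor.
cellular = effective_resolution((2.0, 2.0, 5.0), 4.0)
subcellular = effective_resolution((0.5, 0.5, 1.6), 4.5)

print("cellular (um):", cellular)
print("sub-cellular (um):", tuple(round(v, 3) for v in subcellular))
```

      The same function can be applied to other objective and expansion combinations to estimate whether a given structure is likely to be resolvable.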

      Regarding your question about cell boundaries, we will revise the manuscript to specify that the boundaries we identified are those of each nucleus, rather than entire cells. This distinction will be made clear in the revised version.

      Reviewer #3 (Public review): 

      Weaknesses: 

      (1) The work would have been strengthened by a more careful consideration of previous literature. Many papers directly relevant to this work were not cited. Such omissions do the authors a disservice because in some cases, they fail to consider relevant information that impacts the choice of reagents they have used or the conclusions they are drawing. 

      For example, when describing the antibody they use to label muscles (monoclonal 6G10), they do not cite the paper that generated this reagent (Ross et al PMCID: PMC4307677), and instead cite a paper (Cebria 2016) that does not mention this antibody. Ross et al reported that 6G10 does not label all body wall muscles equivalently, but rather "predominantly labels circular and diagonal fibers" (which is apparent in Figure S5A-D of the manuscript being reviewed here). For this reason, the authors of the paper showing different body wall muscle populations play different roles in body patterning (Scimone et al 2017, PMCID: PMC6263039, also not cited in this paper) used this monoclonal in combination with a polyclonal antibody to label all body wall muscle types. Because their "pan-muscle" reagent does not label all muscle types equivalently, it calls into question their quantification of the different body wall muscle populations throughout the manuscript. It does not help matters that their initial description of the body wall muscle types fails to mention the layer of thin (inner) longitudinal muscles between the circular and diagonal muscles (Cebria 2016 and citations therein).

      Ipsilateral and contralateral projections of the visual axons were beautifully shown by dye-tracing experiments (Okamoto et al 2005, PMID: 15930826). This paper should be cited when the authors report that they are corroborating the existence of ipsilateral and contralateral projections. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript. We acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling. We will also add the content about other neuron types in our revised version.

      (2) The proportional decrease of neurons with growth in S. mediterranea was shown by counting different cell types in macerated planarians (Baguna and Romero, 1981; https://link.springer.com/article/10.1007/BF00026179) and earlier histological observations cited there. These results have also been validated by single-cell sequencing (Emili et al, bioRxiv 2023, https://www.biorxiv.org/content/10.1101/2023.11.01.565140v). Allometric growth of the planaria tail (the tail is proportionately longer in large vs small planaria) can explain this decrease in animal size. The authors never really discuss allometric growth in a way that would help readers unfamiliar with the system understand this. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript.

      (3) In some cases, the authors draw stronger conclusions than their results warrant. The authors claim that they are showing glial-muscle interactions, however, they do not provide any images of triple-stained samples labeling muscle, neurons, and glia, so it is impossible for the reader to judge whether the glial cells are interacting directly with body wall muscles or instead with the well-described submuscular nerve plexus. Their conclusion that neurons are unaffected by beta-cat or inr-1 RNAi based on anti-phospho-Ser/Thr staining (Fig. 6E) is unconvincing. They claim that during regeneration "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373). They provide no evidence for such switching of muscle cell types, so it is unclear why they say this. 

      We acknowledge that some of our conclusions were overstated given the current data, and we appreciate the opportunity to clarify and refine these claims in the revised manuscript. Regarding the statement that "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373), as addressed in our previous response, this phrasing was unclear. Our intent was not to imply that DV muscles switch into longitudinal fibers. Instead, we observed that muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure. We will revise this section to better describe the dynamic changes observed during regeneration.

      (4) The authors show how their automated workflow compares to manual counts using PI-stained specimens (Figure S1T). I may have missed it, but I do not recall seeing a similar ground truth comparison for their muscle fiber counting workflow. I mention this because the segmented image of the posterior muscles in Figure 4I seems to be missing the vast majority of circular fibers visible to the naked eye in the original image. 

      Thank you for raising this important point. We will include a ground truth comparison of our automated muscle fiber counting with manual counts in the supplementary figures. Regarding the observation of missing circular fibers in Figure 4I, we agree that the segmentation appears to have missed a significant number of circular fibers in this particular image. This may have been due to limitations in the current parameters of the segmentation algorithm, especially in distinguishing fibers in regions of varying intensity or overlap. We are revisiting the segmentation parameters to improve the accuracy of detecting circular fibers, and we will provide an updated version of Figure 4I in the revised manuscript.

      (5) It is unclear why the abstract says, "We found the rate of neuron cell proliferation tends to lag..." (line 25). The authors did not measure proliferation in this work and neurons do not proliferate in planaria. 

      Thank you for bringing this to our attention. What we intended to convey was the increase in neuron number during homeostasis. We will revise the abstract to correct this wording and instead describe it as the increase in neuron numbers due to progenitor cell differentiation during homeostasis.

      (6) It is unclear what readers are to make of the measurements of brain lobe angles. Why is this a useful measurement and what does it tell us? 

      The measurement of brain lobe angles is intended to provide a quantitative assessment of the growth and morphological changes of the planarian brain during regeneration. Additionally, the relevance of brain lobe angles has been explored in previous studies, such as Arnold et al., Nature, 2016, further supporting its use as a meaningful parameter.

      (7) The authors repeatedly say that this work lets them investigate planarians at the single-cell level, but they don't really make the case that they are seeing things that haven't already been described at the single-cell level using standard confocal microscopy. 

      Thank you for your comment. We agree that single-cell level imaging has been previously achieved in planarians using conventional confocal microscopy. However, our goal was to extend the application of expansion microscopy by combining C-MAP with tiling light sheet microscopy (TLSM), which allows for faster and high-resolution 3D imaging of whole-mount planarians. This combination offers several key advantages over traditional confocal microscopy. For example, it enables high-throughput imaging across entire organisms with a level of detail and speed that is not easily achieved using confocal methods. This approach allows us to investigate the planarian nervous system at multiple developmental and regenerative stages in a more comprehensive manner, capturing large-scale structures while preserving fine cellular details. The ability to rapidly image whole planarians in 3D with this resolution provides a more efficient workflow for studying complex biological processes. We believe this distinction is significant and represents an advance over previous methods. We will clarify this point in the manuscript to better distinguish our approach from standard techniques.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this article the authors described mouse models presenting with Becker muscular dystrophy; they created three transgenic models carrying three representative exon deletions: ex45-48 del., ex45-47 del., and ex45-49 del. This article is well written but needs improvement in some points.

      Strengths:

      This article is well written. The evidence supporting the authors' claims is robust, though further implementation is necessary. The experiments conducted align with the current state-of-the-art methodologies.

      Weaknesses:

      This article does not analyze atrophy in the various mouse models. Implementing this point would improve the impact of the work.

      We thank the reviewer for their constructive suggestions and comments on this work. Muscle hypertrophy is shown with growth in dystrophin-deficient skeletal muscle in mdx mice; thus, we did not pay attention to the factors associated with muscle atrophy in BMD mice. As the reviewer suggested, the examination of the association between type IIa fiber reduction and muscle atrophy is important, and the result is considered to be helpful in resolving the cause of type IIa fiber reduction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the cross-sectional areas (CSA) of muscles and compare them with the changes in the proportion of type IIa fibers.

      (2) Evaluate the expression levels of Murf1 and Atrogin1 as markers of muscle atrophy using RT-PCR.

      Reviewer #2 (Public review):

      Summary

      Miyazaki et al. established three distinct BMD mouse models by deleting different exon regions of the dystrophin gene, observed in human BMD. The authors demonstrated that these models exhibit pathophysiological changes, including variations in body weight, muscle force, muscle degeneration, and levels of fibrosis, alongside underlying molecular alterations such as changes in dystrophin and nNOS levels. Notably, these molecular and pathological changes progress at different rates depending on the specific exon deletions in the dystrophin gene. Additionally, the authors conducted extensive fiber typing, revealing a site-specific decline in type IIa fibers in BMD mice, which they suggest may be due to muscle degeneration and reduced capillary formation around these fibers.

      Strengths:

      The manuscript introduces three novel BMD mouse models with different dystrophin exon deletions, each demonstrating varying rates of disease progression similar to the human BMD phenotype. The authors also conducted extensive fiber typing across different muscles and regions within the muscles, effectively highlighting a site-specific decline in type IIa muscle fibers in BMD mice.

      Weaknesses:

      The authors have inadequate experiments to support their hypothesis that the decay of type IIa muscle fibers is likely due to muscle degeneration and reduced capillary formation. Further investigation into capillary density and histopathological changes across different muscle fibers is needed, which could clarify the mechanisms behind these observations.

      We thank the reviewer for these positive comments and the very important suggestion about type IIa fiber reduction and capillary change around muscle fibers in BMD mice. From the results of the cardiotoxin-induced muscle degeneration and regeneration model, type IIa and IIx fibers showed delayed recovery compared with that of type-IIb fibers. However, this delayed recovery of type IIa and IIx could not explain the cause of the selective muscle fiber reduction limited to type IIa fibers in BMD mice. Therefore, we considered vascular dysfunction as the reason for the selective type IIa fiber reduction, and we found morphological capillary changes from a “ring pattern” to a “dot pattern” around type IIa fibers in BMD mice. However, the association between selective type IIa fiber reduction and the capillary change around muscle fibers in BMD mice remains unclear due to the lack of information about capillaries around type IIx and IIb fibers. The reviewer pointed out this insufficient evaluation of capillaries around other muscle fibers (except for type IIa fibers), and this suggestion is very helpful for explaining the association between selective type IIa fiber reduction and vascular dysfunction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the changes in capillary formation around other muscle fibers, except for type IIa fibers (e.g., type IIx and IIb fibers).

      (2) Evaluate the endothelial area around other muscle fibers, except for type IIa fibers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single-lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such a batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies, as is often done in research on rare tumors.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case, since SiNET6 was not included in the subtype analysis due to too few detected Neuroendocrine cells, and was not assigned to any subtype, as noted in the text and as can be seen by its exclusion from Figure 2 where subtypes are defined. We apologize that our manuscript may have given the wrong impression about the SiNET6 classification (it is labeled in Fig. 4a in a misleading manner). In the revised manuscript, we will correct the labeling in Fig. 4a and clarify that SiNET6 is not assigned to any subtype. We will further acknowledge the limitation of the two platforms and the arguments in favor of the existence of two SiNET subtypes.

      (2) Results: Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion]).

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we will note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) are consistently observed in both platforms, as shown in Fig. 4a, and therefore appear to be reliable despite the limitations of our work. We will clarify this consistency in the revised manuscript.

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We agree with this comment and will add the need for additional validation for this finding in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The authors examined the salt-dependent phase separation of the low-complexity domain of hnRNPA1 (A1-LCD). Using all-atom molecular dynamics simulations, they identified four distinct classes of salt dependence in the phase separation of intrinsically disordered proteins (IDPs), which can be predicted based on their amino acid composition. However, the simulations and analysis, in their current form, are inadequate and incomplete.

      Strengths: 

      The authors attempt to unravel the mechanistic insights into the interplay between salt and protein phase separation, which is important given the complex behavior of salt effects on this process. Their effort to correlate the influence of salt on the low-complexity domain of hnRNPA1 (A1-LCD) with a range of other proteins known to undergo salt-dependent phase separation is an interesting and valuable topic. 

      Weaknesses: 

      (1) The simulations performed are not sufficiently long (Figure 2A) to accurately comment on phase separation behavior. The simulations do not appear to have converged well, indicating that the system has not reached a steady state, rendering the analysis of the trajectories unreliable.

      We have extended the simulations for an additional 500 ns, to 1500 ns. The last 500 ns show reasonably good convergence (see Figure 2A).

      (2) The majority of the data presented shows no significant alteration with changes in salt concentration. However, the authors have based conclusions and made significant comments regarding salt activities. The absence of error bars in the data representation raises questions about its reliability. Additionally, the manuscript lacks sufficient scientific details of the calculations.  

      We have now included error bars. With the error bars, the salt dependences of all the calculated properties (except for Rg) show a clear trend. Additionally, we have expanded the descriptions of our calculations (p. 15-16).

      (3) In Figures 2B and 2C, the changes in the radius of gyration and the number of contacts do not display significant variations with changes in salt concentration. The change in the radius of gyration with salt concentration is less than 1 Å, and the number of contacts does not change by at least 1. The authors' conclusions based on these minor changes seem unfounded. 

      The variation of ~ 1 Å for the calculated Rg is similar to the counterpart for the experimental Rg. As for the number of contacts, note that this property is presented on a per-residue basis, so a value of 1 means that each residue picks up one additional contact, or each protein chain gains a total of 131 contacts, when the salt concentration is increased from 50 to 1000 mM.

      Reviewer #2 (Public Review): 

      This is an interesting computational study addressing how salt affects the assembly of biomolecular condensates. The simulation data are valuable as they provide a degree of atomistic details regarding how small salt ions modulate interactions among intrinsically disordered proteins with charged residues, namely via Debye-like screening that weakens the effective electrostatic interactions among the polymers, or through bridging interactions that allow interactions between like charges from different polymer chains to become effectively attractive (as illustrated, e.g., by the radial distribution functions in Supplementary Information). However, this manuscript has several shortcomings: 

      (i) Connotations of the manuscript notwithstanding, many of the authors' concepts about salt effects on biomolecular condensates have been put forth by theoretical models, at least back in 2020 and even earlier. Those earlier works afford extensive information such as considerations of salt concentrations inside and outside the condensate (tie-lines). But the authors do not appear to be aware of this body of prior works and therefore missed the opportunity to build on these previous advances and put the present work with its complementary advantages in structural details in the proper context.

      (ii) There are significant experimental findings regarding salt effects on condensate formation [which have been modeled more recently] that predate the A1-LCD system (ref.19) addressed by the present manuscript. This information should be included, e.g., in Table 1, for sound scholarship and completeness. 

      (iii) The strengths and limitations of the authors' approach vis-à-vis other theoretical approaches should be discussed with some degree of thoroughness (e.g., how the smallness of the authors' simulation system may affect the nature of the "phase transition" and the information that can be gathered regarding salt concentration inside vs. outside the "condensate" etc.). Accordingly, this manuscript should be revised to address the following. In particular, the discussion in the manuscript should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      (1) The ability to use atomistic models to address the questions at hand is a strength of the present work. However, presumably because of the computational cost of such models, the "phase-separated" "condensates" in this manuscript are extremely small (only 8 chains). An inspection of Fig.1 indicates that while the high-salt configuration (snapshot, bottom right) is more compact and droplet-like than the low-salt configuration (top right), it is not clear that the 50 mM NaCl configuration can reasonably correspond to a dilute or homogeneous phase (without phase separation) or just a condensate with a lower protein concentration because the chains are still highly associated. One may argue that they become two droplets touching each other (the chains are not fully dispersed throughout the simulation box, unlike in typical coarse-grained simulations of biomolecular phase separation). While it may not be unfair to argue from this observation that the condensed phase is less stable at low salt, this raises critical questions about the adequacy of the approach as a stand-alone source of theoretical information. Accordingly, an informative discussion of the limitation of the authors' approach and comparisons with results from complementary approaches such as analytical theories and coarse-grained molecular dynamics will be instructive, even imperative, especially since such results exist in the literature (please see below).

      We now discuss the limitations of our all-atom simulations and also other approaches (p. 13; see below).

      (2) The aforementioned limitation is reflected by the authors' choice of using Dmax as a sort of phase separation order parameter. However, no evidence was shown to indicate that Dmax exhibits a two-state-like distribution expected of phase separation. It is also not clear whether a Dmax value corresponding to the linear dimension of the simulation box was ever encountered in the authors' simulated trajectories such that the chains can be reliably considered to be essentially fully dispersed as would be expected for the dilute phase. Moreover, as the authors have noted in the second paragraph of the Results, the variation of Dmax with simulation time does not show a monotonic rank order with salt concentration. The authors' explanation is equivalent to stipulating that the simulation system has not fully equilibrated, inevitably casting doubt on at least some of the conclusions drawn from the simulation data.

      First off, with the extended simulations, the Dmax values converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM). Secondly, as we now state (p. 13), our low-salt simulations mimic a homogeneous solution whereas our high-salt simulations mimic the dense phase of a phase-separated system. The intermediate-salt simulations also mimic the dense phase but at a somewhat lower concentration (hence the intermediate Dmax value).

      (3) With these limitations, is it realistic to estimate possible differences in salt concentration between the dilute and condensed phases in the present work? These features, including tie-lines, were shown to be amenable to analytical theory and coarse-grained molecular dynamics simulation (please see below).  

      The differences in salt effects that we report do not represent those between two phases. Rather, as explained in the preceding reply, they represent differences between a homogeneous solution at low salt and the dense phase at higher salt. We also acknowledge salt effects calculated by analytical theory and coarse-grained simulations (p. 13).

      (4) In the comparison in Fig.2B between experimental and simulated radius of gyration as a function of [NaCl], there is an outlier among the simulated radii of gyration at [NaCl] ~ 250 mM. An explanation should be offered.  

      After extending the simulations and analyzing the last 500 ns, the Rg data no longer show an outlier though still have some fluctuations from one salt concentration to another.

      (5) The phenomenon of no phase separation at zero and low salt and phase separation at higher salt has been observed for the IDP Caprin1 and several of its mutants [Wong et al., J Am Chem Soc 142, 2471-2489 (2020) [https://pubs.acs.org/doi/full/10.1021/jacs.9b12208], see especially Fig.9 of this reference]. This work should be included in the discussion and added to Table 1.

      We now have added Caprin1 to Table 1 (new ref 26) and discuss this paper (p. 13).

      (6) The authors stated in the Introduction that "A unifying understanding of how salt affects the phase separation of IDPs is still lacking". While it is definitely true that much remains to be learned about salt effects on IDP phase separation, the advances that have already been made regarding salt effects on IDP phase separation are more abundant than that conveyed by this narrative. For instance, an analytical theory termed rG-RPA was put forth in 2020 to provide a uniform (unified) treatment of salt, pH, and sequence-charge-pattern effects on polyampholytes and polyelectrolytes (corresponding to the authors' low net charge and high net charge cases). This theory offers a means to predict salt-IDP tie-lines and a comprehensive account of salt effect on polyelectrolytes resulting in a lack of phase separation at extremely low salt and subsequent salt-enhanced phase separation (similar to the case the authors studied here) and in some cases re-entrant phase separation or dissolution [Lin et al., J Chem Phys 152, 045102 (2020) [https://doi.org/10.1063/1.5139661]]. This work is highly relevant and it already provided a conceptual framework for the authors' atomistic results and subsequent discussion. As such, it should definitely be a part of the authors' discussion.

      We now cite this paper (new ref 34) in Introduction (p. 4). We also discuss its results for Caprin1 (new ref 18; p. 13).

      (7) Bridging interactions by small ions resulting in effective attractive interactions among polyelectrolytes leading to their phase separation have been demonstrated computationally by Orkoulas et al., Phys Rev Lett 90, 048303 (2003) [https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.90.048303]. This result should also be included in the discussion. 

      We now cite this paper (new ref 41; p. 11).

      (8) More recently, the salt-dependent phase separations of Caprin1, its RtoK variants and phosphorylated variant (see item #5 above) were modeled (and rationalized) quite comprehensively using rG-RPA, field-theoretic simulation, and coarse-grained molecular dynamics [Lin et al., arXiv:2401.04873 [https://arxiv.org/abs/2401.04873]], providing additional data supporting a conceptual perspective put forth in Lin et al. J Chem Phys 2020 (e.g., salt-IDP tie-lines, bridging interactions, reentrance behaviors etc.) as well as in the authors' current manuscript. It will be very helpful to the readers of eLife to include this preprint in the authors' discussion, perhaps as per the authors' discretion along the manner in which other preprints are referenced and discussed in the current version of the manuscript. 

      We now cite this paper (new ref 18) and discuss it along with new ref 26 in Discussion (p. 13).

      Reviewer #3 (Public Review): 

      Summary: 

      This study investigates the salt-dependent phase separation of A1-LCD, an intrinsically disordered region of hnRNPA1 implicated in neurodegenerative diseases. The authors employ all-atom molecular dynamics (MD) simulations to elucidate the molecular mechanisms by which salt influences A1-LCD phase separation. Contrary to typical intrinsically disordered protein (IDP) behavior, A1-LCD phase separation is enhanced by NaCl concentrations above 100 mM. The authors identify two direct effects of salt: neutralization of the protein's net charge and bridging between protein chains, both promoting condensation. They also uncover an indirect effect, where high salt concentrations strengthen pi-type interactions by reducing water availability. These findings provide a detailed molecular picture of the complex interplay between electrostatic interactions, ion binding, and hydration in IDP phase separation. 

      Strengths: 

      Novel Insight: The study challenges the prevailing view that salt generally suppresses IDP phase separation, highlighting A1-LCD's unique behavior. 

      Rigorous Methodology: The authors utilize all-atom MD simulations, a powerful computational tool, to investigate the molecular details of salt-protein interactions. 

      Comprehensive Analysis: The study systematically explores a wide range of salt concentrations, revealing a nuanced picture of salt effects on phase separation. 

      Clear Presentation: The manuscript is well-written and logically structured, making the findings accessible to a broad audience. 

      Weaknesses: 

      Limited Scope: The study focuses solely on the truncated A1-LCD, omitting simulations of the full-length protein. This limitation reduces the study's comparative value, as the authors note that the full-length protein exhibits typical salt-dependent behavior. A comparative analysis would strengthen the manuscript's conclusions and broaden its impact.

      Perhaps we did not impress on the reviewer how expensive the all-atom MD simulations on A1-LCD were: the systems each contained half a million atoms and the simulations took many months to complete. That said, we agree with the reviewer that, ideally, a comparative study on a protein showing the typical screening class of salt dependence would have made our work more complete. However, we are confident of the conclusions for several reasons. First, the three salt effects – charge neutralization, bridging, and strengthening of pi-types of interactions – revealed by the all-atom simulations are physically sound and well-supported by other studies. Second, these effects led us to develop a unified picture for the salt dependence of homotypic phase separation, in the form of a predictor for the classes of salt dependence based on amino-acid composition. This predictor works well for nearly 30 proteins. Third, recent studies using analytical theory and coarse-grained simulations (new ref 18) also strongly support our conclusions.

      Reviewer #1 (Recommendations For The Authors): 

      (1) In Figure 1, the color scheme should be updated and the figure remade, as the current set of color choices makes it very difficult to distinguish the magenta spheres.  

      We have increased the sizes of ions in Figure 1 to make them distinguishable.

      (2) Within the framework of atomistic simulations, the influence of salt concentration alteration on protein conformational plasticity is worth investigating. This could be correlated (with proper details) with the effect of salt-concentration-modulated protein aggregation behavior. 

      We now use RMSF to measure conformational plasticity, which shows a clear salt-dependent trend with a 27% reduction in fluctuations from 50 mM to 1000 mM NaCl (new Fig. S1).
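
      The response does not spell out how the RMSF was computed, so the following is only a minimal sketch of the textbook definition (root-mean-square deviation of each atom from its time-averaged position), assuming the frames have already been aligned to a common reference. The function names and the toy coordinates are hypothetical and are not taken from the authors' analysis.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as frames[frame][atom] = (x, y, z).

    Assumption: frames are already superimposed on a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def percent_reduction(low_salt_value, high_salt_value):
    """Relative drop (in percent) going from the low-salt to the high-salt value."""
    return 100.0 * (low_salt_value - high_salt_value) / low_salt_value


# Toy two-frame, two-atom trajectory with made-up coordinates (in Å):
# atom 0 moves by 2 Å along x, atom 1 does not move at all.
toy = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
toy_rmsf = rmsf(toy)
```

      With such a helper, a statement like "a 27% reduction in fluctuations" corresponds to `percent_reduction` applied to the RMSF averaged over residues at 50 mM and at 1000 mM NaCl.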

      (3) The authors should mention the protein concentrations employed in the simulations and whether these are consistent with experimentally used concentrations.  

      We have mentioned the initial concentration (3.5 mM). We now further state that this concentration is maintained in the low-salt simulations, indicating absence of phase separation, but is increased to 23 mM in the high-salt simulations, indicating phase separation. The latter value is consistent with the measured concentrations in the dense phase (last two paragraphs of p. 5).
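
      To give a sense of scale for these concentrations, a rough conversion between molar concentration and the volume occupied by the eight chains can be sketched as follows. The box dimensions are not given in the response; the volumes below are simply what 3.5 mM and 23 mM imply for 8 chains, and the cubic-box edge length is an illustrative assumption.

```python
AVOGADRO = 6.02214076e23   # particles per mole (exact SI value)
NM3_PER_LITER = 1e24       # 1 L = 1e-3 m^3 = 1e24 nm^3


def volume_for_concentration(n_chains, conc_millimolar):
    """Volume (nm^3) in which n_chains molecules have the given concentration."""
    moles = n_chains / AVOGADRO
    liters = moles / (conc_millimolar * 1e-3)
    return liters * NM3_PER_LITER


def concentration_for_volume(n_chains, volume_nm3):
    """Concentration (mM) of n_chains molecules confined to volume_nm3."""
    moles = n_chains / AVOGADRO
    liters = volume_nm3 / NM3_PER_LITER
    return moles / liters * 1e3


N_CHAINS = 8                                            # chain count stated in the response
box_volume = volume_for_concentration(N_CHAINS, 3.5)    # initial, low-salt value
dense_volume = volume_for_concentration(N_CHAINS, 23.0) # high-salt value
cubic_edge_nm = box_volume ** (1.0 / 3.0)               # edge if the box were a cube

print(f"3.5 mM -> {box_volume:.0f} nm^3 (cube edge ~{cubic_edge_nm:.1f} nm)")
print(f"23 mM  -> {dense_volume:.0f} nm^3")
```

      Under these assumptions, going from 3.5 mM to 23 mM means the chains occupy roughly a sixth to a seventh of the original volume.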

      (4) It would be useful to test the salt effect for at least two extreme salt concentrations at various protein concentrations, consistent with experimental protein concentration ranges.  

      In simulation studies of short peptides (ref 37), we have shown that the initial concentration does not affect the final concentration in the dense phase, as expected for phase-separation systems. We expect that the same will be true for the A1-LCD system at intermediate and high salt where phase separation occurs. Though this expectation could be tested by simulations at a different initial protein concentration, such simulations would be expensive but unlikely to yield new physical insight.

      (5) Importantly, the simulations do not appear to have converged well enough (Figure 2A). The authors should extend the simulation trajectories to ensure the system has reached a steady state.  

      We extended the simulations for an additional 500 ns, which now appear to show convergence. In Figure 2A we now see Dmax values converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM).

      (6) The authors mention "phase separation" in the title, but with only a 1 μs simulation trajectory, it is not possible to simulate a phenomenon like phase separation accurately. Since atomistic simulations cannot realistically capture phase separation on this timescale, a coarse-grained approach is more suitable. To properly explore salt effects in the context of phase separation, long timescale simulation trajectories should be considered. Otherwise, the data remain unreliable. 

      Our all-atom simulations revealed rich salt effects that might have been missed in coarse-grained simulations. It is true that coarse-grained models allow the simulations of the phase separation process, but as we have recently demonstrated (refs 36 and 37), all-atom simulations on the μs timescale are also able to capture the spontaneous phase separation of peptides and small IDPs. A1-LCD is much larger than those systems, so we had to use a relatively small chain number (8 chains here vs 64 used in ref 37 and 16 used in ref 37). Still, we observe the condensation into a dense phase at high salt. We discuss the pros and cons of all-atom vs. coarse-grained simulations on p. 13.

      (7) In Figure 5E, the plot does not show that g(r) has reached 1. If it does, the authors should show the full curve. The same issue remains with supplementary figures 1, 2, 3, etc.  

      We now show the approach to 1 in the insets of Figs. S2, S3, S4, and 5E.

      (8) None of the data is represented with error bars. The authors should include error bars in their data representations. 

      We have now included error bars in all graphs that report average values.

      (9) The authors state that "the net charge of the system reduces to only +8 at 1000 mM NaCl (Figure 3C)" but do not explain how this was calculated. 

      We now add this explanation in methods (p. 16).

      (10) The authors mention "similar to the role played by ATP molecules in driving phase separation of positively charged IDPs." However, ATP can inhibit aggregation, and its induction of phase separation is concentration-dependent. Given ATP's large aromatic moiety, its comparison to ions is not straightforward and is more complex. This comparison is best avoided.

      In this context we are comparing the bridging capability of ATP molecules in driving phase separation of positively charged IDPs in ref 36 to the bridging capability of the ions here. In ref 36 the authors show ATP bridging interactions between protein chains similar to what we show here with ions.

      (11) Many calculations are vaguely represented. The process for calculating the number of bridging ions, for example, is not well documented. The authors should provide sufficient details to allow for the reproducibility of the data. 

      We have now expanded the methods section to include more detailed information on calculations done.

      Reviewer #3 (Recommendations For The Authors): 

      Include error bars or standard deviations for all results averaged over four replicates, particularly for the number of ions and contacts per residue. This would provide a clearer picture of the data's reliability and variability. 

      We have now included error bars in all graphs that report averaged values.

      Strengthen the support for the conclusion that "each Arg sidechain often coordinates two Cl- ions, multiple backbone carbonyls often coordinate a single Na+ ion." While Fig. 3A clearly demonstrates Arg-Cl- coordination, the Na+ coordination claim for a 131-residue protein requires further clarification. Consider including the integration profile of radial distribution functions for Na+ ions to bolster this assertion.

      We now report the number of Na+ ions that coordinate with multiple backbone carbonyls (p. 7) as well as the number of Na+ ions that bridge between A1-LCD chains via coordination with multiple backbone carbonyls (p. 9). Please note that Figure 4A right panel displays an example of Na+ coordinating with multiple backbone carbonyls.

      Address the following typographical errors in the main text:
      o Page 11, line 25: "distinct classes of sat dependence" should be "distinct classes of salt dependence"
      o Page 14, line 9: "for Cl- and 3.0 and 5.4 A" should be "for Cl- and 3.0 and 5.4 Å"
      o Page 14, line 18: "As a control, PRDFs for water were also calculated" should be "As a control, RDFs for water were also calculated" (assuming PRDF was meant to be RDF)

      We have now corrected these typos.

      Consider expanding the study to include simulations of the full-length protein to provide a more comprehensive comparison between the truncated A1-LCD and the complete protein's behavior in various salt concentrations. 

      As we explained above, even with eight chains of A1-LCD, which has 131 residues, the systems already contain half a million atoms each and the all-atom simulations took many months to complete. Full-length A1 has 314 residues so a multi-chain system would be too large to be feasible for all-atom simulations.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach is the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rpb4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ- or stable-isotope-labeling-based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      We agree that a multiplexed Qlinker approach would be very useful. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.

      We agree that multiplexed Qlinkers would open the door to exciting avenues of investigation such as studying conformational state populations.  We plan to conduct the suggested experiments when multiplexed Qlinkers are available.

      Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (much of it not specific to this crosslinker but common to many crosslinkers used for CSMS).

      Taken together, the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes. However, in its current form, some aspects of the study should be expanded upon in order for the research community to assess the true power of these isobaric crosslinkers. Specifically:

      Although the authors do mention some of the current weaknesses of their isobaric crosslinkers and qCSMS in general, more detail would be extremely helpful. Throughout the article a few key numbers (or even discussions) that would allow one to better evaluate the sensitivity (and the applicability) of the method are missing. This includes:

      (1) Throughout all the performed experiments it would be helpful to provide information on how many peptides are identified per experiment and how many have actually a crosslinker attached to it.

      As the goal of the experiments is to maximize identification of crosslinked peptides which tend to have higher charge states, we targeted ions with charge states of 3+ or higher in our MS acquisition settings for CLMS, and ignored ions with 2+ charge states, which correspond to many of the normal (i.e., not crosslinked) peptides that are identified by MS. As a result, normal peptides are less likely to be identified by the MS procedure used in our CLMS experiments compared to MS settings typically used to identify normal peptides. Our settings may also fail to identify some mono-modified peptides. Like most other CLMS methods, the total number of identified crosslinked peptide spectra is usually less than 1% of the total acquired spectra and we normally expect the crosslinked species to be approximately 1% of the total peptides. 

      We added information about the number of crosslinked and monolinked peptides identified in the pol I benchmarking experiments (line 173). The number of crosslinks and monolinks identified in the pol II +/- α-amanitin experiment, the TBP/TFIIA/TFIIB experiment and the pol II experiment +/- Rpb4/7 are also provided.

      (2) Of all the potential lysines that can be modified - how many are actually modified? Do the authors have an estimate for that? It would be interesting to evaluate in a denatured sample the modification efficiency of the isobaric crosslinker (as an upper limit as here all lysines should be accessible) and then also in a native sample. For example, in the MBP experiment, the authors report the change of one mono-linked peptide in samples containing maltose relative to the one not containing maltose. The authors then give a great description of why this fits to known structural changes. What is missing here is a bit of what changes were expected overall and which ones the authors would have expected to pick up with their method and why have they not been picked up. For example, were they picked up as modified by the crosslinker but not differential? I think this is important to discuss appropriately throughout the manuscript to help the reader evaluate/estimate the potential sensitivity of the method. There are passages where the authors do an excellent job doing that - for example when they mention the missed site that they expected to see in the initial pol II experiments (lines 191 to 207). This kind of "power analysis" should be heavily discussed throughout the manuscript so that the reader is better informed of what sensitivity can be expected from applying this method.

      Regarding the Pol II complex experiment described in Figures 4 and 5, out of the 277 lysine residues in the complex, 207 were identified as monolinked residues (74.7%), and 817 crosslinked pairs out of 38,226 potential pairs (2.1%) were observed. The ability of CLMS to detect proximity/reactivity changes may be impacted by several factors including 1) the (low) abundance of crosslinked peptides in complex mixtures, 2) the presence of crosslinkable residues in close proximity with appropriate orientation, and 3) the ability to generate crosslinked peptides by enzymatic digestion that are amenable to MS analysis (i.e., the peptides have appropriate m/z’s and charge states, the peptides ionize well, the peptides produce sufficient fragment ions during MS2 analysis to allow confident identification). Future efforts to enrich crosslinked peptides prior to MS analysis may improve sensitivity.
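
      The coverage figures quoted above can be reproduced with a short calculation. The sketch below assumes that "potential pairs" means every unordered pair of distinct lysine residues, which is the reading that yields 38,226; the variable names are illustrative and do not come from the manuscript.

```python
from math import comb

# Counts quoted in the response for the Pol II complex (Figures 4 and 5).
total_lysines = 277
monolinked_lysines = 207
observed_crosslinked_pairs = 817

# Assumption: a potential crosslink is any unordered pair of distinct lysines.
potential_pairs = comb(total_lysines, 2)

monolink_coverage = 100 * monolinked_lysines / total_lysines
crosslink_coverage = 100 * observed_crosslinked_pairs / potential_pairs

print(potential_pairs)               # 38226
print(round(monolink_coverage, 1))   # 74.7
print(round(crosslink_coverage, 1))  # 2.1
```

      Most of the 38,226 pairs are presumably too far apart in the folded complex to be bridged by the crosslinker, so the 2.1% figure is a lower bound on how well the accessible pairs are sampled.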

      It is very difficult to estimate the modification efficiency of Qlinker (or many other crosslinkers) based on peptide identification results. One major reason for this is that trypsin is not able to cleave after a crosslinker-modified lysine residue.  As a result, the peptides generated after the modification reaction have different lengths, compositions, charge states, and ionization efficiencies compared to unmodified peptides. These differences make it very difficult to estimate the modification efficiencies based on the presence/absence of certain peptide ions, and/or the intensities of the modified and unmodified versions of a peptide. Also, 2+ ions which correspond to many normal (i.e., unmodified) peptides were excluded by our MS acquisition settings.

      It is also very difficult to predict which structural changes are expected and which crosslinked peptides and/or modified peptides can be observed by MS.  This is especially true when the experiment involves proteins containing unstructured regions such as the experiments involving Pol II, and TBP, TFIIA and TFIIB. Since we are at the early stages of using qCLMS to study structural changes, we are not sure which changes we can expect to observe by qCLMS. Additional applications of Qlinker-CLMS are needed to better understand the types of structural changes that can be studied using the approach.

      We hope that our discussions of some of the limitations of CLMS for detecting conformational/reactivity changes provide the reader with an understanding of the sensitivity that can be expected with the approach. At the end of the paragraph about the pol II α-amanitin experiment we say, “Unfortunately, no Q2linker-modified peptides were identified near the site where α-amanitin binds. This experiment also highlights one of the limitations of residue-specific, quantitative CLMS methods in general. Reactive residues must be available near the region of interest, and the modified peptides must be identifiable by mass spectrometry.” In the section about Rpb4/7-induced structural changes in pol II we describe the under-sampling issue. And in the last paragraph we reiterate these limitations and say, “This implies that this strategy, like all MS-based strategies, can only be used for interpretation of positively identified crosslinks or monolinks. Sensitivity and under sampling are common problems for MS analysis of complex samples.”

      (3) It would be very helpful to provide information on how much better (or not) the Qlinker approach works relative to label-free qCLMS. One is missing the reference to a potential qCLMS gold standard (data set) or if such a dataset is not readily available, maybe one of the experiments could be performed by label-free qCLMS. For example, one of the differential biosensor experiments would have been well suited.

      We agree with the reviewer that it will be very helpful to establish gold standard datasets for CLMS. As we further develop and promote this technology, we will try to establish a standardized qCLMS.

      Reviewer #1 (Recommendations for the authors):

      Only a very minor point:

      I may have missed it but it's not really clear how many independent experiments were used for the benchmarking quantitation and mixing experiments for Figure 1. What is the reproducibility across experiments on average and on a per-peptide basis?

      Otherwise, I think the approach would really benefit from at least "Q5linkers" or even "Q10linkers", if possible. And then conduct detailed quantitative studies, either using dilution series or maybe investigating the kinetics of complex formation.

      We used a sample of BSA crosslinked peptides to optimize the MS settings, establish the MS acquisition strategies and test the quantification schemes. The data in Figure 1 are based on one experiment, in which we used ~150 µg of purified pol I complexes from a 6 L culture. We added this information to the Figure 1 legend. We also provide information about the reproducibility of peptide quantification by plotting the observed and expected ratios for each monolinked and crosslinked peptide identified in all of the runs in Figure S3.

      We agree with the reviewer that the Qlinker approach would be even more attractive if multiplex Qlinker reagents were designed. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Reviewer #2 (Recommendations for the authors):

      In addition to the public review I have the following recommendations/questions:

      (1) The first part of the results section where the synthesis of the crosslinker is explained is excellent for mass spec specialists, but problematic for general readers - either more info should be provided (e.g. b1+ ions - most readers will have no idea why that is) - or potentially it could be simplified here and the details shifted to Materials and Methods for the expert reader. The same is true below for the length of spacer arms.

      However - in general this level of detail is great - but can impact the ease of understanding for the more mass spec affine but not expert reader.

      We have added the following sentence to assist the general reader: A b1+ ion is an ion with a charge state of +1 corresponding to the first N-terminal amino acid residue after breakage of the first peptide bond (lines 126-128).

      (2) The Calmodulin experiment (lines 239 to 257) - it is a very nice result that they see the change in the crosslinked peptide between residues K78-K95, but the monolinks are not just detected as described in the text but actually go 2 fold up. This would have been actually a bit expected if the residues are now too far away to be still crosslinked that the monolinks increase. In this case, this counteraction of monolinks to crosslinked sites can also be potentially used as a "selection criteria" for interesting sites that change. Is that a possible interpretation or do the authors think that upregulation of the monolinks is a coincidence and should not be interpreted?

      We agree with the reviewer that both monolinks and crosslinks can be used as potential indicators for some changes. However, it is much more difficult to interpret the abundance information from monolinks because, unlike crosslinks, there is little associated structural/proximity information with monolinks. Because it is difficult to understand the reason(s) for changes in monolink abundance, we concentrate on changes in crosslink abundances, which provide proximity/structural information about the crosslinked residues.

      (3) Lines 267 to 274: a small thing but the structural information provided is quite dense I have to say. Maybe simplify or accompany with some supplemental figures?

      We agree that the structural information is a bit dense especially for readers who are not familiar with the pol II system.  We added a reference to Figure 3c (line 177) to help the reader follow the structural information. 

      As qCLMS is still a relatively new approach for studying conformational changes, the utility of the approach for studying different types of conformational changes is still unclear. Thus, one of the goals of the experiments is to demonstrate the types of conformational changes that can be detected by Q2linkers.  We hope that the detailed descriptions will help structural biologists understand the types of conformational changes that can be detected using Qlinkers.

      (4) Line 280: explain maybe why the sample was fractionated by SCX (I guess to separate the different complexes?).

      SCX was used to reduce the complexity of the peptide mixtures. As the samples are complex and crosslinked peptides are of low abundance compared to normal peptides, SCX can separate the peptides based on their positive charges.  Larger peptides and peptides with higher charge states, such as crosslinked peptides, tend to elute at higher salt concentration during SCX chromatography.  The use of SCX to fractionate complex peptide mixtures is described in the “General crosslinking protocol and workflow optimization” section of the Methods, and we added a sentence to explain why the sample was fractionated by SCX (lines 278-279).

      (5) Lines 354 to 357: "This suggests that the inability to identify most of these crosslinked peptides in both experiments is mainly due to under-sampling during mass spectrometry analysis of the complex samples, rather than the absence of the crosslinked peptides in one of the experiments."

      This is an extremely important point for the interpretation of missing values - have the authors tried to also collect the mass spec data with DIA, which is better at recovering the same peptide signals between different samples? I realize that these are isobaric samples so DIA measurements per se are not useful as the quantification is done on the reporter channels in the MS2, but it would at least give a better idea of whether the missing signals were simply not picked up for MS2, as claimed by the authors, or the modified peptides are just not present. Another possibility is for the authors to at least try to use a "match between runs" function as can be done in MaxQuant. One of the strengths of the method is that it is quantitative and two states are analyzed together, but as can be seen in this experiment, more than two states might want to be compared. In such cases, the under-sampling issue (if that is indeed the cause) makes interpretation of many sites hard (due to missing values) and it would be interesting if, for example, an analysis approach with a "match between runs" function could recover some of the missing values.

      We agree that undersampling/missing values is an important issue that needs to be addressed more thoroughly. This also highlights the importance of qCLMS, as conclusions about structural changes based on the presence/absence of certain crosslinked species in database search results may be misleading if the absence of a species is due to under-sampling. We have not tried to collect the data with DIA since we would lose the quantitative information. It would be interesting to see if match between runs can recover some of the missing values. While this could provide evidence to support the under-sampling hypothesis, it would not recover the quantitative information.

      We recommend performing label swap experiments and focusing downstream analysis on the crosslinks/monolinks that are identified in both experiments. Future development of multiplexed Qlinker reagents should help to alleviate under-sampling issues. See response to Reviewer #1.

      (6) Lines 375 to 393 (the whole paragraph): extremely detailed and not easy to follow. Is that level of detail necessary to drive home that point or could it be visualized in enough detail to help follow the text?

      We agree that the paragraph is quite detailed, but we feel that this level of detail is necessary to describe the types of conformational changes that can be detected by the quantitative crosslinking data, and to illustrate the challenges of interpreting the structural basis for some crosslink abundance changes even when high resolution structural data exist.

      To make it easier to follow, we added a sentence to the legend of Figure 5b. “In the holo-pol II structure (right), Switch 5 bending pulls Rpb1:D1442 away from K15, breaking the salt bridge that is formed in the core pol II structure (left). The increase in the abundances of the Rpb1:15-Rpb6:76 and Rpb1:15-Rpb6:72 crosslinks in holo-pol II is likely attributed to the salt bridge between K15 and D1442 in core pol II which impedes the NHS ester-based reaction between the epsilon amino group of K15 and the crosslinker.”

      (7) Final paragraph in the results section - lines 397 and 398: "All of the intralinks involving Rpb4 are more abundant in holo-pol II as expected." If I understand that experiment correctly the intralinks with Rpb4 should not be present at all as Rpb4 has been deleted. Is that due to interference between the 126 and 127 channels in MS2? If so, then this also sets a bit of the upper limit of quantitative differences that can be seen. The authors should at least comment on that "limitation".

      Yes, we shouldn’t detect any Rpb4 peptides in the sample derived from the Rpb4 knockout strain. The signal from Rpb4 peptides in the ∆Rpb4 sample is likely due to co-eluting ions. To clarify, we changed the text to:

      All of the intralinks involving Rpb4 are more abundant in the holo-pol II sample (even though we don’t expect any reporter ion signal from Rpb4 peptides derived from the ∆Rpb4 pol II sample, we still observed reporter ion signals from the channel corresponding to the ∆Rpb4 sample, potentially due to the presence of low abundance, co-eluting ions) (lines 395-399).

      (8) Materials and Methods - line 690: I am probably missing something but why were two different mass additions to lysine added to the search (I would have expected only one for the crosslinker)?

      The 297 Da modification is for monolinked peptides, in which one end of the crosslinker is hydrolyzed and an 18 Da water molecule is added. The 279 Da modification is for crosslinks and sometimes for looplinks (crosslinks involving two lysine residues on the same tryptic peptide).
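
      The relationship between the two search modifications can be written out as a small lookup. This is a minimal sketch using only the nominal masses quoted above; the function and constant names are hypothetical, and an actual database search would use exact monoisotopic masses, which are not given here.

```python
# Nominal masses (Da) as quoted in the response; not monoisotopic values.
CROSSLINK_MOD_DA = 279   # both ends reacted (crosslink or looplink)
WATER_DA = 18            # added when the free end is hydrolyzed
MONOLINK_MOD_DA = CROSSLINK_MOD_DA + WATER_DA


def lysine_modification_mass(link_type: str) -> int:
    """Return the nominal mass added to lysine for a given link type."""
    masses = {
        "crosslink": CROSSLINK_MOD_DA,
        "looplink": CROSSLINK_MOD_DA,
        "monolink": MONOLINK_MOD_DA,
    }
    if link_type not in masses:
        raise ValueError(f"unknown link type: {link_type!r}")
    return masses[link_type]


print(lysine_modification_mass("monolink"))   # 297
print(lysine_modification_mass("crosslink"))  # 279
```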

    1. Author response:

      Review #1:

      Also, they observed no difference in the binding free energy of phosphatidylserine between wild-type TREM2-Ig and mutant TREM2-Ig, which is somewhat inconsistent with previous experimental studies (Journal of Biological Chemistry 293 (2018); Alzheimer's and Dementia 17, 475-488 (2021); Cell 160, 1061-1071 (2015)).

      We directly note this contrast with experimental findings in the body of our work, particularly given the known limitations of free energy calculations in MD simulations, as outlined in the Limitations section. Our claim is that the loss of function in the R47H variant extends beyond decreased binding affinities and also impacts binding patterns. As stated in our manuscript: ‘Our observations for both sTREM2 and TREM2 indicate that R47H-induced dysfunction may result not only from diminished ligand binding but also an impaired ability to discriminate between different ligands in the brain, proposing a novel mechanism for loss-of-function.’

      The authors made significant efforts to run a number of simulations for multiple models, totalling nearly 17 microseconds; however, none of the simulations was repeated independently even a couple of times, which makes me uncomfortable accepting these findings as technically sound. Most of the important conclusions the authors claim, including the results that contradict previous research, are based on a single run, which raises the question of whether these observations can be reproduced if the simulations are repeated independently. Although the authors state the sampling and length of the MD simulations as a limitation of this study, this must be carefully considered before drawing conclusions from a single run.

      The reviewer raises an interesting point regarding the repetition of individual simulations, a consideration we carefully evaluated during the design of this study. However, we believe our approach—running multiple independent models of the same system—offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance.

      In our study, we demonstrate that within the 150 ns timescale of our protein/ligand (PL) simulations, the relatively small ligands are able to move from their initial docking positions to a specific binding site. While ideally, replicates of these independent models would further strengthen the findings, this was not computationally feasible given the unprecedented total duration of our simulations. Importantly, our conclusions are seldom based on the results of a single protein/PL simulation.

      Moreover, the ergodic hypothesis suggests that over sufficiently long timescales, simulations will explore all accessible states. Additionally, we have performed several replicate simulations of our WT and R47H Ig-like domain models in solution, specifically to investigate CDR2 loop dynamics.

      In this case, since the system involves only the protein and lacks the independent replicates seen in the protein/PL simulations, these runs were chosen to effectively capture the stochastic nature of CDR2 loop movement.

      sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      In a similar manner, why is only one mutation, R47H, considered in the study? More severe mutations, such as T66M, are reported to disrupt tethering between these CDRs. Although T66M is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings hold for T66M.

      In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example through changes in secondary structure and residue-wise interloop interaction patterns. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or that are important for "ligand binding" or the "stalk domain".

      These are both excellent points that deserve extensive investigation. While R47H is the most common and prolific mutation in literature, an extensive catalog of other mutations is important to explore. We are currently preparing two separate publications that will delve into these gaps in more detail, as addressing them was beyond the scope of the present study.

      Comparisons between the wild-type and mutant, and between the other complex structures, must be supported by appropriate statistical calculations to establish that the observed differences are significant. Since autocorrelation is one of the major concerns when assessing statistical differences in MD simulation data, the authors can consider bootstrap calculations for estimating statistical significance.

      We are currently working to address this comment to strengthen the validity of our results and statistical conclusions in the revised manuscript.  
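
      One common way to bootstrap autocorrelated trajectory data is to resample contiguous blocks of frames rather than individual frames. The sketch below is a generic moving-block bootstrap for the difference in means between two time series; it is not the authors' analysis, and the function names, block length, and synthetic data are all assumptions made for illustration. The block length should be at least as long as the autocorrelation time of the observable.

```python
import random
from statistics import mean


def block_bootstrap_means(series, block_len, n_resamples, rng):
    """Moving-block bootstrap of the mean of one autocorrelated series."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_blocks = -(-n // block_len)  # ceiling division
    last_start = n - block_len
    means = []
    for _ in range(n_resamples):
        resampled = []
        for _ in range(n_blocks):
            start = rng.randint(0, last_start)  # inclusive on both ends
            resampled.extend(series[start:start + block_len])
        means.append(mean(resampled[:n]))  # trim to the original length
    return means


def percentile_interval(values, alpha=0.05):
    """Percentile confidence interval from bootstrap replicates."""
    ordered = sorted(values)
    lower = ordered[int((alpha / 2) * (len(ordered) - 1))]
    upper = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
    return lower, upper


def difference_interval(series_a, series_b, block_len,
                        n_resamples=1000, seed=0):
    """Interval for mean(series_a) - mean(series_b)."""
    rng = random.Random(seed)
    means_a = block_bootstrap_means(series_a, block_len, n_resamples, rng)
    means_b = block_bootstrap_means(series_b, block_len, n_resamples, rng)
    return percentile_interval([a - b for a, b in zip(means_a, means_b)])


def ar1_series(n, centre, phi, rng):
    """Synthetic autocorrelated data standing in for a per-frame observable."""
    x, values = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        values.append(centre + x)
    return values


demo_rng = random.Random(1)
wild_type = ar1_series(300, centre=-40.0, phi=0.9, rng=demo_rng)  # hypothetical
mutant = ar1_series(300, centre=-38.0, phi=0.9, rng=demo_rng)     # hypothetical
print(difference_interval(wild_type, mutant, block_len=30))
```

      If the resulting interval excludes zero, the difference is unlikely to be explained by sampling noise alone at the chosen block length; an interval that widens sharply as the block length increases suggests the trajectories are too short relative to their correlation time.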

      Review #2:

      The authors state that reported differences in ligand binding between the TREM2 and sTREM2 remain unexplained, and the authors cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Secondly, the authors cite Kober et al 2021 as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al measured the binding of sTREM2 and Ig-TREM2 to Abeta they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We will adjust and refocus how we reference this evidence from Kober et al. in our revised manuscript. 

      In line with these findings, our energy calculations reveal that sTREM2 exhibits weaker—but still not statistically significant—binding affinities for phospholipids compared to TREM2. These results suggest that while overall binding affinity might be similar, differences in binding patterns or specific lipid interactions could still contribute to functional differences observed between TREM2 and sTREM2.

      The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.

      We believe that this is a major limitation of all computational work of TREM2 to-date, and of experimental work which only presents the Ig-like domain. This is extensively discussed in the limitations section of our paper. Hence, we are currently working toward a manuscript that will be the first biologically relevant model of TREM2 in a membrane and will challenge the current paradigm of using the Ig-like domain as an experimental surrogate for TREM2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      PPARgamma is a nuclear receptor that binds to orthosteric ligands to coordinate transcriptional programs that are critical for adipocyte biogenesis and insulin sensitivity. Consequently, it is a critical therapeutic target for many diseases, but especially diabetes. The malleable nature and promiscuity of the PPARgamma orthosteric ligand binding pocket have confounded the development of improved therapeutic modulators. Covalent inhibitors have been developed but they show unanticipated mechanisms of action depending on which orthosteric ligands are present. In this work, Shang and Kojetin present a compelling and comprehensive structural, biochemical, and biophysical analysis that shows how covalent and noncovalent ligands can co-occupy the PPARgamma ligand binding pocket to elicit distinctive preferences of coactivator and corepressor proteins. Importantly, this work shows how the covalent inhibitors GW9662 and T0070907 may be unreliable tools as pan-PPARgamma inhibitors despite their widespread use.

      Strengths:

      - Highly detailed structure and functional analyses provide a comprehensive structure-based hypothesis for the relationship between PPARgamma ligand binding domain co-occupancy and allosteric mechanisms of action.

      - Multiple orthogonal approaches are used to provide high-resolution information on ligand binding poses and protein dynamics.

      - The large number of x-ray crystal structures solved for this manuscript should be applauded along with their rigorous validation and interpretation.

      Weaknesses:

      - Inclusion of statistical analysis is missing in several places in the text.

      - Functional analysis beyond coregulator binding is needed.

      We added additional statistical analyses as recommended (Source Data 1, a Microsoft Excel spreadsheet).

      Related to functional analysis, we cite studies from our previous publication (Hughes et al. Nature Communications 2014 5:3571) in which we demonstrated that the covalent inhibitor ligands (GW9662 and T0070907) do not block the activity of other ligands using a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes. Our study here expands on this finding and other published studies by showing the structural mechanism for the lack of blocking activity by the covalent inhibitors.

      Reviewer #2 (Public Review):

      Summary:

      The flexibility of the ligand binding domain (LBD) of NRs allows various modes of ligand binding leading to various cellular outcomes. In the case of PPARγ, it's known that two ligands can co-bind to the receptor. However, whether a covalent inhibitor functions by blocking the binding of a non-covalent ligand, or co-binds in a manner that weakens the binding of a non-covalent ligand remains unclear. In this study, the authors first used TR-FRET and NMR to demonstrate that covalent inhibitors (such as GW9662 and T0070907) weaken but do not prevent non-covalent synthetic ligands from binding, likely via an allosteric mechanism. The AF-2 helix can exchange between active and repressive conformations, and covalent inhibitors shift the conformation toward a transcriptionally repressive one to reduce the orthosteric binding of the non-covalent ligands. By co-crystal studies, the authors further reveal the structural details of various non-covalent ligand binding mechanisms in a ligand-specific manner (e.g., an alternate binding site, or a new orthosteric binding mode by altering the covalent ligand binding pose).

      Strengths:

      The biochemical and biophysical evidence presented is strong and convincing.

      Weaknesses:

      However, the co-crystal studies were performed by soaking non-covalent ligands to LBD pre-crystalized with a covalent inhibitor. Since the covalent inhibitors would shift the LBD toward transcriptionally repressive conformation which reduces orthosteric binding of non-covalent ligands, if the sequence was reversed (i.e., soaking a covalent inhibitor to LBD pre-crystalized with a non-covalent ligand), would a similar conclusion be drawn? Additional discussion will broaden the implications of the conclusion.

      This is an interesting point, which we now expand upon in a new (third) paragraph of the discussion in our revised manuscript:

      “In our previous study, we observed synthetic and natural/endogenous ligand co-binding via co-crystallography where preformed crystals of PPARγ LBD bound to unsaturated fatty acids (UFAs) were soaked with a synthetic ligand, which pushed the bound UFA to an alternate site within the orthosteric ligand-binding pocket 8. In the scenario of synthetic ligand cobinding with a covalent inhibitor, it is possible that soaking a covalent inhibitor into preformed crystals where the PPARγ LBD is already bound to a non-covalent ligand may prove to be difficult. The covalent inhibitor would need to flow through solvent channels within the crystal lattice, which may not be a problem. However, upon reaching the entrance surface to the orthosteric ligand-binding pocket, it may be difficult for the covalent inhibitor to gain access to the region of the orthosteric pocket required for covalent modification as the larger non-covalent ligand could block access. This potential order of addition problem may not be a problem for studies in solution or in cells, where the non-covalent ligand can more freely exchange in and out of the orthosteric pocket and over time the covalent reaction would reach full occupancy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - IC50 or EC50 values are not reported for the coregulator interaction assays; R² for the fit should also be reported where Ki and IC50s are disclosed.

      We now report fitting statistics and IC50/EC50 values when possible in Figure 2B and Source Data 1 along with R² values for the fit. We note that some data do not show complete or robust enough binding curves to faithfully fit to a dose response equation.
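
      For readers unfamiliar with these statistics, the sketch below shows a standard four-parameter logistic dose-response equation and the R² calculation used to judge a fit. It is illustrative only: the response does not state which equation or software the authors used, the parameter values are hypothetical, and the fitting step itself (normally done with a nonlinear least-squares routine) is omitted.

```python
def four_parameter_logistic(conc, bottom, top, ic50, hill):
    """Response at a ligand concentration; equals the midpoint at conc == ic50."""
    return bottom + (top - bottom) / (1 + (conc / ic50) ** hill)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical fitted parameters and measurements (arbitrary units).
params = {"bottom": 0.0, "top": 100.0, "ic50": 1.0, "hill": 1.0}
concentrations = [0.01, 0.1, 1.0, 10.0, 100.0]
observed = [98.0, 92.0, 49.0, 10.0, 2.0]

predicted = [four_parameter_logistic(c, **params) for c in concentrations]
print(r_squared(observed, predicted))
```

      When a titration does not reach both plateaus, the `top`, `bottom`, and `ic50` parameters are poorly constrained, which is one reason an incomplete curve cannot be fitted faithfully.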

      -  Reporter gene or qPCR should be performed for the combinations of covalent and noncovalent ligands to show how these molecules impact transcriptional activities rather than just coregulator binding profiles.

      We previously performed PPARγ transcriptional reporter assays and gene expression analysis in 3T3-L1 preadipocytes to demonstrate that cotreatment of a covalent inhibitor (GW9662 or T0070907) with a non-covalent ligand does not block activity of the non-covalent ligand and showed cobinding-induced activation relative to DMSO control (Hughes et al., 2014 Nature Communications). We did not specifically mention this in our original manuscript, but we now call this out in the first paragraph of the results section.

      - Inclusion of a structure figure to show the different helix 12 orientations should be included in the introduction. Likewise, how the overall structure of the LBD changes as a result of the cobinding in the discussion or a summary model would be helpful.

      Our revised manuscript includes a structure figure called out in the introduction describing the active and repressive helix 12 PPARγ LBD conformations (new Figure 1). There are no major changes to the overall structure of the LBD compared to the active conformation that crystallized, so we did not include a summary model figure but we do refer readers to our previous paper (Shang and Kojetin, Structure 2021 29(9):940-950) in the penultimate paragraph of the discussion. We also added the following sentence to the crystallography results section related to the overall LBD changes:

      “The structures show high structural similarity to the transcriptionally active LBD conformation with rmsd values ranging from 0.77–1.03Å (Supplementary Table S2)”

      A typo in paragraph 3 of the discussion says "long-live" when it should probably say "long-lived."

      We corrected this typo.

      Reviewer #2 (Recommendations For The Authors):

      It's interesting that ligand-specific binding mode of non-covalent ligands was observed. Would modifications of the chemical structure of a covalent inhibitor alter the allosteric binding behavior of non-covalent ligands in a predictive manner? If so, how can such SAR be used to guide the design of covalent inhibitors to more broadly and effectively inhibit agonists of various chemical structures? Discussion on this topic could be valuable.

      This is an interesting point, which we now discuss in the penultimate and last paragraphs of the discussion:

      “Another way to test this structural model could be through the use of covalent PPARγ inverse agonist analogs with graded activity 23, where one might posit that covalent inverse agonist analogs that shift the LBD conformational ensemble towards a fully repressive LBD conformation may better inhibit synthetic ligand cobinding.”

      “It may be possible to use the crystal structures we obtained to guide structure-informed design of covalent inhibitors that would physically block cobinding of a synthetic ligand. This could be the potential mechanism of a newer generation covalent antagonist inhibitor we developed, SR16832, that more completely inhibits alternate site ligand binding of an analog of MRL20, rosiglitazone and the UFA docosahexaenoic acid (DHA) 21 and thus may be a better choice for the field to use as a covalent ligand inhibitor of PPARγ.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      How plants perceive their environment and signal during growth and development is of fundamental importance for plant biology. Over the last few decades, nano domain organisation of proteins localised within the plasma-membrane has emerged as a way of organising proteins involved in signal pathways. Here, the authors addressed how a non-surface localised signal (viral infection) was resisted by PM localised signalling proteins and the effect of nano domain organisation during this process. This is valuable work as it describes how an intracellular process affects signalling at the PM where most previous work has focused on the other way round, PM signalling affecting downstream responses in the plant. They identify CPK3 as a specific calcium dependent protein kinase which is important for inhibiting viral spread. The authors then go on to show that CPK3 diffusion in the membrane is reduced after viral infection and study the interaction between CPK3 and the remorins, which are a group of scaffold proteins important in nano domain organisation. The authors conclude that there is an interdependence between CPK3 and remorins to control their dynamics during viral infection in plants.

      Strengths:

      The dissection of which CPK was involved in the viral propagation was masterful and very conclusive. Identifying CPK3 through knockout time course monitoring of viral movement was very convincing. The inclusion of overexpression, constitutively active and point mutation non functioning lines further added to that.

      Weaknesses:

      My main concerns with the work are twofold.

      (1) Firstly, the imaging described and shown is not sufficient to support the claims made. The PM localisation and its non-PM localised form look similar, and no PM stain or marker construct was used to support this. The sptPALM data conclusions are nice and fit the narrative. However, no raw data or movie is shown, only representative tracks. Therefore, the data quality cannot be verified and in addition, the reporting of number of single particle events visualised per experiment is absent, only number of cells imaged is reported. Therefore, it is impossible for the reader to appreciate the number of single molecule behaviours obtained and hence the quality of the data.

      (2) Secondly, remorins are involved in a lot of nanodomain controlled processes at the PM. The authors have not conclusively demonstrated that during viral infection the remorin effects seen are solely due to its interaction with CPK3. The sptPALM imaging of REM1.2 in a cpk3 knockout line goes part way to solve this but more evidence would strengthen it in my opinion. How do we know that during viral infection the entire PM protein dynamics and organisation are not altered? Or that CPK3 and REM are not at very distant ends of a signalling cascade? Negative control experiments are required here utilising other PM localised proteins which have no role during viral infection. In addition, if the interaction is specific, the transiently expressed CPK3-CA construct (shown to form nano domains) should be expressed with REM1.2-mEOS to show the alterations in single particle behaviour occur due to specific activations of CPK3 and REM1.2 in the absence of PlAMV viral infection and it is not an artefact of whole PM changes in dynamics during viral infection.

      In addition, displaying more information throughout the manuscript (such as raw particle tracking movies and numbers of tracks measured) on the already generated data would strengthen the manuscript further.

      Overall, I think this work has the potential to be a very strong manuscript but additional reporting of methods and data are required and additional lines of evidence supporting interaction claims would significantly strengthen the work and make it exceptional.

      Reviewer #2 (Public Review):

      Summary:

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent.

      Strengths:

      The paper contains novel, important information.

      Weaknesses:

      The interpretation of some experimental data is not justified, and the proposed model is not fully based on the available data.

      Reviewer #3 (Public Review):

      Summary:

      This study examined the role that the activation and plasma membrane localisation of a calcium dependent protein kinase (CPK3) plays in plant defence against viruses. The authors clearly demonstrate that the ability to hamper the cell-to-cell spread of the virus PlAMV is not common to other CPKs which have roles in defence against different types of pathogens, but appears to be specific to CPK3 in Arabidopsis. Further they show that lateral diffusion of CPK3 in the plasma membrane is reduced upon PlAMV infection, with CPK3 likely present in nano-domains. This stabilisation, however, depends on one of its phosphorylation substrates, the Remorin scaffold protein REM1.2. However, when REM1.2 lateral diffusion was tracked, it showed an increase in movement in response to PlAMV infection. These contrary responses to PlAMV infection were further demonstrated to be interdependent, which led the authors to propose a model in which activated CPK3 is stabilised in nano-domains in part by its interaction with REM1.2, which it binds and phosphorylates, allowing REM1.2 to diffuse more dynamically within the membrane.

      The likely impact of this work is that it will lead to closer examination of the formation of nano-domains in the plasma membrane and dissection of their role in immunity to viruses, as well as further investigation into the specific mechanisms by which CPK3 and REM1.2 inhibit the cell-to-cell spread of viruses.

      Strengths:

      The paper provided compelling evidence about the roles of CPK3 and REM1.2 through a combination of logical reverse genetics experiments and advanced microscopy techniques, particularly in single particle tracking.

      Weaknesses:

      There is a lack of evidence for the downstream pathways, specifically whether the role that CPK3 has in cytoskeletal organisation may play a role in the plant's defence against viral propagation. Also, there is limited discussion about the localisation of the nano-domains and whether there is any overlap with plasmodesmata, which as plant viruses utilise PD to move from cell to cell seems an obvious avenue to investigate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Viral spread work in CPK mutants with time courses is beautiful!

      Regarding my public points on my issues with the imaging:

      - Figure 2A shows 'PM' localisation of CPK3 and 'non-PM' imaging of CPK3-G2A. The images are near identical both showing cell outlines and cytoplasmic strands. Here a PM marker (such as Lti6B) tagged with a different fluorophore or PM stain should be used in conjunction with surface views (such as in Figure 2C) to show it really is at the PM and the G2A line is not.

      Impaired membrane localization of CPK3-G2A is documented in Mehlmer et al., 2010 using microsomal fractionation. Although the main purpose of Figure 2A is to show correct expression of the constructs in the lines used for PlAMV propagation (Figure 2B), we replaced the images with wider-view pictures that are more representative of the subcellular localization of CPK3 and CPK3-G2A.

      - Regarding Figure 2C, this is extremely noisy and PM heterogeneity is barely observable over the noise from the system (looking at the edges of surface imaged). You mention low resolution was an issue. I notice from the methods you have taken confocal images on a Zeiss 880 with Airyscan. These images must be confocal, but if imaged with Airyscan the PM heterogeneity would be much clearer (see work from John Runions lab).

      Indeed, these are tangential view images obtained with a Zeiss 880 with Airyscan. Based on tessellation analysis (Figure 2H-J), CPK3 is rather homogeneously distributed and forms ND of around 70 nm in diameter. Objects of such size cannot be resolved using pixel reassignment methods such as Airyscan. Note also that AtREM in our study are less heterogeneously distributed than what was described in the literature for StREM1.3.

      - Regarding all sptPALM data. At least an example real data image and video is required otherwise the data can’t be assessed. The work of Alex Martiniere (sptPALM) or Alex Jonson (TIRF) all show raw data so the reader can appreciate the quality of the data. In addition, number of events (particles tracked) has to be shown in the figure legend, not just number of cells otherwise was one track performed per cell? Or 10,000? Obviously where the N sits in this range gives the reader more or less confidence of the data.

      We agree and have added example videos of sptPALM experiments to the supplementary data. We have also indicated the number of tracked particles in the figure legends.

      - On a slight technical aside, how do you know the cells being imaged for sptPALM with PlAMV are actually infected with the virus? In Fig 2C you use a GFP tagged version but in sptPALM you use non-tagged. I think a sentence in methods on this would help clarify.

      PlAMV-GFP was used for the sptPALM experiments and cell infection was assessed during the PALM experiment. This is now specified in the corresponding figures and methods.

      - I also have a concern over some of the representative images showing the same things between different figures. Your clustering data in 3F looks very convincing. However, in Figure 2H the mock and PlAMV-GFP look very similar. How is Figure 3F so different for the same experiment? Especially considering the scale bars are the same for both figures. Same for CPK3-mRFP1.2 in Fig 2C and 3A, the same thing is being imaged, at the same scale (scale bars same size) but the images are extremely different.

      Figure 2 data were generated using CPK3 stably expressed in A. thaliana, while Figure 3 data were obtained upon transient over-expression of CPK3 in N. benthamiana. We do not have a clear explanation for such a difference in CPK3 PM behavior. It could lie in a different PM composition or actin organization between the two species; this point is now addressed in the discussion.

      - Line 193&194 - you state that the CA CPK3 is reminiscent of the CPK3 upon PlAMV expression. I don't agree, while CPK3CA is less mobile (2D), the MSD shows it is in-between CPK3 and CPK3 + PlAMV. Therefore, can’t the opposite also be true? That overall the behaviour of CPK3-CA is reminiscent of WT CPK. I think this needs rewording.

      We agree and have reworded that part.
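
      For readers less familiar with the mean squared displacement (MSD) and diffusion coefficient discussed in this comment, a minimal sketch of how both can be computed from a single-particle track is given below. It is an illustration only, not the analysis pipeline used in the study: the function names, the toy track and the frame interval are hypothetical, and the estimate of D assumes free two-dimensional diffusion (MSD = 4·D·t).

```python
def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD for one track given as a list of (x, y) positions."""
    msd = []
    for lag in range(1, max_lag + 1):
        squared_steps = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared_steps) / len(squared_steps))
    return msd


def diffusion_coefficient(msd, frame_interval):
    """Estimate D from the first MSD point, assuming free 2D diffusion (MSD = 4*D*t)."""
    return msd[0] / (4.0 * frame_interval)


# Hypothetical track (positions in µm) used only to check the arithmetic.
track = [(0.1 * i, 0.0) for i in range(6)]
msd = mean_squared_displacement(track, max_lag=3)
d_est = diffusion_coefficient(msd, frame_interval=0.02)  # assumed 20 ms per frame
```

      Because the estimate is averaged over tracks in practice, the number of tracks per condition directly determines how reliable the resulting distributions of D are.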

      - Line 651 - what numerical aperture are you using for the lens during confocal microscopy. This is fundamentally important information directly related to the reproducibility of the work. You report it for the sptPALM.

      The numerical aperture is now indicated in the methods.

      Regarding my bigger point about specific interactions between CPK3 and remorin during viral infection to strengthen your claim the following need doing. I am not suggesting you do all of these but at least two would significantly enhance the paper.

      (1) Image a non-related PM protein during viral infection using sptPALM and demonstrate that its behaviour is not altered (such as lti6b). This would show the effects on remorin behaviour are specific to CPK3 and not a whole scale PM alteration in dynamics due to viral infection.

      (2) Two colour SPT imaging of CPK3 and REM1.2. You show in absence of proteins (knockouts effect on each other) but your only interaction data is from a kinase assay (where CPK1 and 2 also interact, even though they are not localised at the same place) and colocalisation data (see below). A two colour SPT imaging experiment showing interaction and clustering of CPK3 and REM1.2 with each other and the change in their behaviours when viral infected and simultaneously imaged would address all of my concerns.

      - On another note, the co-localisation data (fig 5 sup 4) needs additional analysis. I would expect most PM proteins to show the results you show as the data is very noisy. In order to improve I would zoom in to fill the field of view and then determine correlation and also when one image is rotated 90 degrees (as described in Jarsch et al., plant cell) to enhance this work.

      (3) In the absence of viral infection, but presence of CPK3-CA, is sptPALM REM1.2 behaviour in the PM altered, if so then the interaction is specific and changes in remorin dynamics are not due to whole scale PM changes during viral infection and the manuscript substantially strengthened.

      (4) Building on from 3), if you have a CPK3 mutated with both CPK3-CA and G2A this would be constitutively active and non-PM localised and as such should not affect Remorin behaviour if your model is true, this would strengthen the case significantly but I appreciate is highly artificial and would need to be done transiently.

      Regarding the first point, since the role of PM proteins involved in potexvirus infection is barely assessed, picking a non-related PM protein might be tricky. The data obtained with mEOS3.2-REM1.2 expressed in cpk3 null-mutant point towards a specific role of CPK3 in PlAMV-induced REM1.2 diffusion and not a general alteration of PM protein behavior.

      Regarding the second point, we already reported the in vivo interaction between AtCPK3CA and AtREM1.2/AtREM1.3 by BiFC in N. benthamiana (Perraki et al., 2018) and AtCPK3 was shown to co-IP with AtREM1.2 (Abel et al., 2021). While we agree on the relevance of doing dual color sptPALM with CPK3 and REM1.2, it is so far technically challenging and we would not be able to implement this in a timely manner. For the colocalization, although the whole cell is displayed in the figure, the analysis was performed on ROI to fill the field of analysis.

      We agree with the relevance of adding the colocalization analysis of randomized images (mTagBFP2 channel rotated 90 degrees); this is now added to Figure 5 – figure supplement 5.
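
      The logic of the rotation control can be sketched as follows: the correlation between the two channels is compared with the correlation obtained after rotating one channel by 90 degrees, which preserves its intensity distribution but breaks any true spatial relationship. This is a minimal illustration on a hypothetical 3×3 ROI with made-up intensities and hypothetical function names, and it assumes a square ROI; it is not the analysis code used for the figure.

```python
from math import sqrt


def pearson(img_a, img_b):
    """Pearson correlation between two equally sized 2D images (lists of rows)."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def rotate90(img):
    """Rotate a square 2D image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


# Hypothetical ROI intensities for the two channels (illustration only).
channel_a = [[9, 1, 1], [1, 1, 1], [1, 1, 1]]
channel_b = [[8, 2, 1], [1, 1, 2], [1, 1, 1]]

r_observed = pearson(channel_a, channel_b)
r_rotated = pearson(channel_a, rotate90(channel_b))  # randomized control
```

      A correlation that stays clearly above its rotated control argues for genuine spatial co-distribution rather than a value that any pair of noisy PM signals would produce.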

      Finally, for the third and fourth points, spt-PALM analysis of REM1.2 in presence of CPK3-CA and CPK3-CA-G2A was performed (Figure 5 - figure supplement 4). The results suggest a specific role of CPK3-CA in REM1.2 diffusion.

      Minor points:

      Line 59 - form, I think you mean from.

      Line 63 - Reference needed after latter.

      Line 68 - Reference required after viral infection.

      Line 85 - Propose not proposed.

      Line 156 - Allowed us to not allows to.

      Line 204 - add we previously 'demonstrated'

      Line 622 and 623 - You say lines obtained from Thomas Ott. This is very odd phrasing considering he is an author. I appreciate citing the work producing the lines but maybe reword this

      These points were corrected, thank you.

      Reviewer #2 (Recommendations For The Authors):

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent. The paper contains novel, important information that can undoubtedly be published in eLife. However, I have some concerns that should be addressed before it can be accepted for publication.

      Major concerns

      When the authors say that CPK3 plays a role in viral propagation, it should be clarified what is meant by 'propagation', - replication of the viral genome, its cell-to-cell transport, or long-distance transport via the phloem. By default the readers will tend to assume the former meaning. In my opinion, the term 'propagation' is misleading and should be avoided.

      We purposely chose the term “propagation” because it sums replication and cell-to-cell movement. Nevertheless, we previously showed that group 1 StREM1.3 doesn’t alter PVX replication (Raffaele et al., 2009 The Plant Cell). In this paper, as we do not investigate the role of AtREM1.2 or AtCPK3 in the replication of the viral PlAMV genome, we cannot state that these proteins are strictly involved in cell-to-cell movement of the virus.

      The authors show that viral infection is associated with decreased diffusion of CPK3 and increased diffusion of REM1.2 in the PM. However, it remains unclear whether these changes are related to partial resistance to viral infection involving CPK3 and REM1.2, or whether they are simply a consequence of viral infection that may lead to altered PM properties and altered dynamics of PM-associated proteins. Therefore, the model presented in Fig. 6 appears to be entirely speculative, as it postulates that changes in CPK3 and REM1.2 dynamics are the cause of suppressed virus 'propagation'. In addition, the model implies that a decrease in CPK3 mobility leads to activation of its kinase activity. This view is not supported by experimental data (see my next comment). The model should be deleted (both as the figure and its description in the Discussion) or substantially reworked so that it finally relies on existing data.

      For the first point, the results obtained from the additional experiments proposed by reviewer #1 support the hypothesis of a direct impact of CPK3 on REM1.2 diffusion (Figure 5 - figure supplement 4).

      We agree with the second point and reworked the model to remove the link between CPK3 activation and its reduced diffusion.

      The statement that 'changes in CPK3 dynamics upon PlAMV infection are linked to its activation' (line 194) is based on a flawed logic, and the conclusion in this section of Results ('changes in CPK3 dynamics upon PlAMV infection are linked to its activation') is incorrect, as it is not supported by experimental data. In fact, the authors show that CPK3 dynamics and clustering upon viral infection is somewhat reminiscent of the behavior of a CPK3 deletion mutant, which is a constitutively active protein kinase. However, this partial similarity cannot be taken as evidence that CPK3 dynamics upon PlAMV infection are related to its activation. Furthermore, the authors emphasize the similarity of the mutant and CPK3 in infected cells without taking into account a drastic difference in their localization (Fig. 3A, middle and right panels) showing that the reduced dynamics of the compared proteins may have different causes. I suggest the removal of the section 'CPK3 activation leads to its confinement in PM ND' from the paper, as the results included in this section are not directly related to other data presented.

      The PM lateral organization of PM-bound CPKs in their native or constitutively active form, as well as the role of lipids in this phenomenon, has never been shown before. We believe that this section contains relevant information for the community. We kept the section but reworded it to temper the correlation made between CPK3 PM organization upon viral infection and its activation.

      Line 270 - 'group 1 REMs might play a role in CPK3 domain stabilization upon viral infection'. This is an overstatement. The size of the CPK3-containing NDs may have no correlation with their stability.

      We reworded the sentence.

      Minor points

      Line 204 - we previously that

      Line 234 and hereafter - "the D" sounds strange. Suggest using "the diffusion coefficient".

      This was reworded.

      Reviewer #3 (Recommendations For The Authors):

      The authors have previously demonstrated that there was an increase in REM1.2 localisation to plasmodesmata under viral challenge. It would be useful to see if there was any co-localisation of REM1.2 and CPK3 with plasmodesmata in response to PlAMV and how this is affected in the mutants. This could be carried out relatively simply using aniline blue.

      These experiments were added to the supplementary data as Figure 2 – figure supplement 2 and Figure 4 – figure supplement 4. No enrichment of CPK3 or REM1.2 at plasmodesmata could be observed upon PlAMV infection.

      Fig 3 supplementary figure 2 would be better incorporated into the main body of Figure 3 as this underpins discussion on the involvement of lipids such as sterols in the formation of nanodomains.

      We moved Figure 3 – Supplementary figure 2 to the main body of Figure 3.

      Minor corrections:

      Whilst the paper is generally well written there are a number of grammatical errors:

      Line 1 & 2: Title doesn't quite read correctly, suggest a rewording for clarity.

      L31: Insert "a"after only

      L33: Replace "are playing" with "play"

      L34: Begin sentence "Viruses are intracellular pathogens and as such the role..."

      L59: replace "form" with "from"

      L63: Insert "was demonstrated" after REM1.2)

      L85: Replace "proposed" with "propose"

      L86: replace "encouraging to explore" with "which will encourage further exploration of "

      L129: replace "we'll focus on" with "we concentrated on"

      L131: insert "an" before ATP

      L138: change "among" to "amongst"

      L156: change "allows to analyse" to "allows the analysis of"

      L204: Insert "showed" after previously.

      L232: "root seedlings" should this be the roots of seedlings?

      L235: insert "to" after "as"

      L280: insert "a" after "only"

      L281: change " to play" with "as playing": change CA to superscript

      L307: Insert "was" after "transcription"

      L320: change "display" to "displaying"

      L321: change "form" to "forms"

      L340: "hampering" should come before viral

      L365: insert "us" after "allow"

      Thank you, these were corrected.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from, a jellyfish, in order to avoid a pain stimulus, with no explicit rewards. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions. 

      We thank the reviewer for their thoughtful comments. To clarify, the grid-world setup was used as a didactic tool/testbed to understand the interaction between Pavlovian and instrumental systems (lines 80-81) [Dayan et al., 2006], specifically in the context of safe exploration and learning. It helps us delineate the Pavlovian contributions during learning, which is key to understanding the safety-efficiency dilemma we highlight. This approach generates a hypothesis about outcome uncertainty-based arbitration between these systems, which we then test in the approach-withdrawal VR experiment based on foundational studies studying Pavlovian biases [Guitart-Masip et al., 2012, Cavanagh et al., 2013].

      Although the VR task does not explicitly involve rewards, it provides a specific test of our hypothesis regarding flexible Pavlovian fear bias, similar to how others have tested flexible Pavlovian reward bias without involving punishments (e.g., Dorfman & Gershman, 2019). Both the simulation and VR experiment models are derived from the same theoretical framework and maintain algebraic mapping, differing only in task-specific adaptations (e.g., differing in action sets and temporal difference learning for multi-step decisions in the grid world vs. Rescorla-Wagner rule for single-step decisions in the VR task). This is also true for Dayan et al. [2006] who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. Therefore, we respectfully disagree that the two setups are completely unrelated and that both models include components merely labelled as Pavlovian.
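
      To make the relationship between the two learning rules mentioned above concrete, they can be sketched side by side. This is a minimal illustration with hypothetical function and parameter names (`alpha` for the learning rate, `gamma` for the discount factor), not the model-fitting code of the study. It shows the standard textbook forms, in which the Rescorla-Wagner update is the special case of the temporal difference update with no future value term.

```python
def rescorla_wagner_update(value, outcome, alpha):
    """Single-step update: move the value toward the observed outcome."""
    return value + alpha * (outcome - value)


def td0_update(value, next_value, outcome, alpha, gamma):
    """Multi-step TD(0) update: the target adds the discounted next-state value."""
    return value + alpha * (outcome + gamma * next_value - value)


# Hypothetical numbers: a punishment of -1 and a learning rate of 0.1.
v_single_step = rescorla_wagner_update(0.0, outcome=-1.0, alpha=0.1)
v_multi_step = td0_update(0.0, next_value=-0.5, outcome=-1.0, alpha=0.1, gamma=0.9)
# With gamma = 0 the TD(0) update reduces to the Rescorla-Wagner update.
v_reduced = td0_update(0.0, next_value=-0.5, outcome=-1.0, alpha=0.1, gamma=0.0)
```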

      We will rephrase parts of the manuscript, particularly the Methods and Discussion, to prevent our main message from being misconveyed and to clarify that our main focus is on Pavlovian fear bias in safe exploration and learning (as also summarised by reviewers #2 and #3), rather than on its role in complex navigational decisions. We also acknowledge the need for future work to capture more sophisticated safe behaviours, such as escapes and sophisticated planning which span different aspects of the threat-imminence continuum [Mobbs et al., 2020], and we will highlight these as avenues for future research.

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thank you for this comment. We acknowledge that our paper does not compare the Pavlovian fear system to a purely instrumental system with varying punishment sensitivity. Instead, our model assumes the coexistence of these two systems and demonstrates the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone. In light of the reviewer’s comment, we will soften our claims regarding the necessity of the Pavlovian system, despite its known existence.

      We also encourage the reviewer to consider the Pavlovian system as a biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system (e.g., the limbic loop) are well known (see Supplementary Fig. 16).

      Additionally, we point out that varying reward sensitivities while keeping punishment sensitivity constant allows our PAL agent to differentiate from an instrumental agent that combines reward and punishment into a single feedback signal. As highlighted in lines 136-140 and the T-maze experiment (Fig. 3 A, B, C), the Pavlovian system maintains fear responses even under high reward conditions, guiding withdrawal behaviour when necessary (e.g., ω = 0.9 or 1), which is not possible with a purely instrumental model if the punishment sensitivities are fixed. This is a fundamental point.

      We will revise our discussion and results sections to reflect these clarifications.
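
      The arbitration idea discussed in this response can be illustrated with a short sketch. It assumes a simple linear mixture, (1 − ω)·Q + ω·Pavlovian bias, followed by a softmax; this form, the function names, the inverse temperature `beta` and all numerical values are assumptions made for illustration, and the exact equations in the manuscript may differ.

```python
import math


def mixed_propensities(q_values, pavlovian_bias, omega):
    """Blend instrumental action values with a Pavlovian bias, omega in [0, 1]."""
    return {
        action: (1.0 - omega) * q_values[action] + omega * pavlovian_bias[action]
        for action in q_values
    }


def softmax_policy(propensities, beta=1.0):
    """Convert propensities to choice probabilities (beta: assumed inverse temperature)."""
    peak = max(propensities.values())
    exps = {a: math.exp(beta * (p - peak)) for a, p in propensities.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def p_withdraw(omega):
    """Probability of withdrawing under the hypothetical values below."""
    return softmax_policy(mixed_propensities(q_values, pavlovian_bias, omega))["withdraw"]


# Hypothetical values: approach is instrumentally rewarded, but the state
# carries a learned punishment value that biases against approaching.
q_values = {"approach": 1.0, "withdraw": 0.0}
pavlovian_bias = {"approach": -2.0, "withdraw": 0.0}

withdraw_probs = {omega: p_withdraw(omega) for omega in (0.0, 0.5, 0.9)}
```

      In this toy setting, withdrawal becomes more likely as ω increases even though the instrumental values are unchanged, which is the qualitative behaviour the response describes for high ω.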

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thank you, we respectfully disagree with the statement that our models used in the experimental setup are dissimilar to the ones used in the first setup. Due to differences in the nature of the task setup, the action set differs, but the model equations and the theory are the same and align closely, as described in our response above. The only additional difference is the use of a baseline bias in human experiments and the RLDDM model, where we also model reaction times with drift rates which is not a behaviour often simulated in grid world simulations. We will improve our Methods section to ensure that model similarity is highlighted.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      We thank reviewer #1 for acknowledging the relevance of our models in advancing the field. We would like to further highlight that, to the best of our knowledge, this is the first time reaction times in Pavlovian-Instrumental arbitration tasks have been modelled using RLDDM, which adds a novel dimension to our approach.

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      We acknowledge the dissimilarity between the task setups (grid-world vs. approach-withdrawal). However, we believe these setups are computationally similar and may be biologically related, as suggested by prior work like Dayan et al. [2006], which integrates Go-No Go and grid-world tasks. Just as that work bridged findings in the appetitive domain, we aim to integrate our findings in the aversive domain. We will provide a more integrated interpretation in the discussion section of the revised manuscript.

      Dayan, P., Niv, Y., Seymour, B., and Daw, N. D. (2006). The misbehavior of value and the discipline of the will. Neural networks, 19(8):1153–1160.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Thank you for your feedback. As mentioned above, we invite the reviewer to consider Pavlovian fear systems as one way in which the brain might implement punishment sensitivity. In addition, such a system provides a separate punishment memory that cannot be overwritten by higher rewards (see also Elfwing and Seymour, 2017, and Wang et al., 2021).

      Elfwing, S., & Seymour, B. (2017, September). Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm. In 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 140-147). IEEE. 

      Wang, J., Elfwing, S., & Uchibe, E. (2021). Modular deep reinforcement learning from reward and punishment for robot navigation. Neural Networks, 135, 115-126.

      The simulation setups such as the following grid-worlds are common test-beds for algorithms in reinforcement learning [Sutton and Barto, 2018].

      Any experimental setup faces a trade-off between a constrained experiment designed to test and model a specific effect and a less constrained exploratory experiment that is more difficult to model. Here we chose the former, building upon previous foundational experiments on Pavlovian bias in humans [Guitart-Masip et al., 2012, Cavanagh et al., 2013]. The condition where withdrawal from a jellyfish leads to a sting, though less realistic, was included to balance the four cue-outcome conditions. Overall, the task was designed to isolate, to the best of our ability, the effect we wanted to test: Pavlovian fear bias in choices and reaction times. In a free operant task, it is very likely that other components not included in our model could compete for control.

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality).

      We agree that safe behaviours, such as escapes, involve more sophisticated computations. We do not propose Pavlovian fear bias as the sole computation for safe behaviour, but rather as one of many possible contributors. Knowing that the Pavlovian withdrawal bias exists, we simply study its possible contribution. We will include in our discussion that such behaviours likely occupy different parts of the threat-imminence continuum [Mobbs et al., 2020].

      Dean Mobbs, Drew B Headley, Weilun Ding, and Peter Dayan. Space, time, and fear: survival computations along defensive circuits. Trends in cognitive sciences, 24(3):228–241, 2020.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      We thank the reviewer for their comment. We selected the action space to build on existing models [Guitart-Masip et al., 2012, Cavanagh et al., 2013] that capture Pavlovian biases and we also wanted to minimize participant movement for EEG data collection. Unfortunately, despite restricting movement to just the arm, the EEG data was still too noisy to lead to any substantial results. We will explore more free-operant paradigms in future works.

      On the issue of the difference between VR and lab-based tasks, we note the reviewer's point. Note, however, that desktop monitor-based tasks lack the sensorimotor congruency between the action and the outcome. Second, it is also arguable that the background context is important in fear conditioning, as it may help set the tone of the fear system to make aversive components easier to distinguish.

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      We thank the reviewers for their thoughtful inputs. We do not claim our model is the best fit for all naturalistic VR tasks, as they require multiple systems across the threat-imminence continuum [Mobbs et al., 2020] and are beyond the scope of the current work. However, we believe our findings on outcome-uncertainty-based arbitration of Pavlovian bias could inform future studies and may be relevant for testing differences in patients with mental disorders, as noted by reviewer #2. At a general level, it can be said that most well-controlled laboratory-based tasks need to bridge a sizeable gap to applicability in real-life naturalistic behaviour, although the principle of using carefully designed tasks to isolate individual factors is well established.

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We thank the reviewer for their comments and ideas. In our discussion lines 257-264, we discuss other works which identify similar safety-efficiency dilemmas, in different models. Here, we simply focus on the safety-efficiency trade-off arising from the interactions between Pavlovian and instrumental systems. It is important to note that the computational argument for the modular system with separate rewards and punishments explicitly protects (up to a point, of course) against large rewards leading to death because the Pavlovian fear response is not over-written by successful avoidance in recent experience. Note also that in animals, reward utility curves are typically convex. We will clarify this in the discussion section.

      We completely agree that in certain scenarios, pruning decision trees could be more effective, especially with a model-based instrumental agent. Here we utilise a model-free instrumental agent, which leads to a simpler model - which is appreciated by some readers such as reviewer #2. Future work can incorporate model-based methods.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We thank the reviewer for bringing this to our notice. We will discuss Tzovara et al., 2018 in our discussion in our revised manuscript.
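
      The arbitration discussed in the exchange above (a Pavlovian punishment memory kept separate from instrumental values, mixed either by a fixed weight or by a weight driven by a Pearce-Hall-style associability term) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-action setup, the class name, the parameter names (`lr`, `eta`, `kappa`) and the exact update rules are assumptions chosen only to make the idea concrete.

```python
import math


def softmax(prefs, beta=1.0):
    """Turn action preferences into choice probabilities."""
    top = max(prefs)
    exps = [math.exp(beta * (p - top)) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class PavlovianInstrumentalAgent:
    """Hypothetical two-action agent: action 0 = approach, action 1 = withdraw."""

    def __init__(self, lr=0.1, eta=0.3, kappa=1.0, fixed_omega=None):
        self.lr = lr                  # learning rate for both value systems (assumed shared)
        self.eta = eta                # update rate of the associability trace (assumed)
        self.kappa = kappa            # scaling from associability to omega (assumed)
        self.fixed_omega = fixed_omega
        self.q = [0.0, 0.0]           # instrumental action values
        self.v_pun = 0.0              # Pavlovian punishment expectation
        self.assoc = 0.0              # running average of |punishment prediction error|

    def omega(self):
        """Weight on the Pavlovian bias: fixed, or tied to outcome uncertainty."""
        if self.fixed_omega is not None:
            return self.fixed_omega
        return min(1.0, max(0.0, self.kappa * self.assoc))

    def preferences(self):
        w = self.omega()
        # The Pavlovian system favours withdrawal in proportion to expected punishment.
        bias = [0.0, self.v_pun]
        return [(1.0 - w) * q + w * b for q, b in zip(self.q, bias)]

    def update(self, action, reward, punishment):
        # Instrumental system learns from the net outcome of the chosen action.
        self.q[action] += self.lr * ((reward - punishment) - self.q[action])
        # Pavlovian system learns from punishment only, so rewards cannot overwrite it.
        delta = punishment - self.v_pun
        self.v_pun += self.lr * delta
        # Associability rises with surprising outcomes and decays when they are predictable.
        self.assoc = (1.0 - self.eta) * self.assoc + self.eta * abs(delta)
```

      Under these assumptions, a fully predictable punishment drives the associability term, and with it the Pavlovian weight, towards zero, whereas an unpredictable punishment keeps the weight elevated. With `fixed_omega` set, the sketch reduces to the fixed linear weighting that the reviewer describes.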

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      We thank reviewer #2 for their positive feedback and thoughtful recommendations. We will ensure that, in our revision, we clarify the explanations in the few instances where they may not be sufficiently detailed, as noted.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain the danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so. 

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      We thank reviewer #3 for their thoughtful feedback and useful recommendations, which we will take into account while revising the manuscript.

      We acknowledge the complexity of specifying Pavlovian bias in the grid world and appreciate the opportunity to elaborate on how this bias is modelled. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesized to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].

      Additionally, we explored the possibility of learning the action bias on the fly by tracking additional punishment Q-values instead of pre-training, which produced similar cumulative pain and step plots. While this approach is redundant, and likely not how the brain operates, it demonstrates an alternative algorithm.

      We thank the reviewer for pointing out these potentially unrealistic elements, and we will revise the manuscript to clarify and incorporate these explanations and improve the model descriptions.

      Eun Joo Kim, Omer Horovitz, Blake A Pellman, Lancy Mimi Tan, Qiuling Li, Gal Richter-Levin, and Jeansok J Kim. Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats. Proceedings of the National Academy of Sciences, 110(36):14795–14800, 2013

      William Menegas, Korleki Akiti, Ryunosuke Amo, Naoshige Uchida, and Mitsuko Watabe-Uchida. Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli. Nature neuroscience, 21(10): 1421–1430, 2018

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development. 

      Weaknesses: 

      The conclusions of this study are in most cases supported by data, but some aspects of data analysis need to be better clarified and elaborated. Some conclusions need to be better stated and according to the data observed. 

      We appreciate the reviewer's recognition of the impact of our study.  We will address the concerns about data analysis and statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review): 

      Summary: 

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42. 

      Strengths: 

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance. 

      Weaknesses: 

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study.  Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A.  We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review): 

      Summary: 

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
      Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism. 

      Strengths: 

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel. 

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function. 

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes. 

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia. 

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings. 

      Weaknesses: 

      (1) A better characterization of the nature of the small EV population is missing: 

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations. 

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a Coomassie gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent 4 bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor: 

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. 2-5 sec intervals should be used to validate this. It would also be important to correlate the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy. 

      We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be between 10 and 30 seconds, based on our frame rate. Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.
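
      The 10-30 second range quoted above follows from simple arithmetic on the frame interval. As a minimal sketch (the function name and the assumption that each event is first seen in the frame after it actually occurs are illustrative, not taken from the manuscript), each of the two timestamps can be off by up to one frame, so an observed gap is uncertain by one frame interval in either direction:

```python
def true_interval_bounds(observed_gap_s, frame_interval_s):
    """Bounds on the real delay between two events first seen observed_gap_s apart.

    Each event is only known to have occurred somewhere within the frame
    interval preceding the frame in which it was first visible, so the real
    delay can differ from the observed one by up to one frame interval.
    """
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    if observed_gap_s < 0:
        raise ValueError("observed gap must be non-negative")
    lower = max(0.0, float(observed_gap_s - frame_interval_s))
    upper = float(observed_gap_s + frame_interval_s)
    return lower, upper


# Values from the response: 20 s observed median, 10 s between frames.
low, high = true_interval_bounds(20, 10)
```

      Because every individual interval carries the same bound, the median of the true intervals lies within the same range as well. A shorter acquisition interval narrows the range proportionally.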

      To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy to get an accurate calculation of this number. Nonetheless, we will review our live imaging data for this experiment to determine if this calculation is possible. Again, we will be limited by the frame rate we used to capture the images, so we could be missing secretion events that take place between the 10-second intervals. Regardless, for the secretion events that we visualized, we always observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript.  A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging.  We will clarify this in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful. 

      Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence.

      With regard to the way we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 µm of dendrite distance. Since the neurons cannot be imaged as a whole cell, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave similar results as filopodia per cell area, as there were no significant differences in cell area between conditions and experiments. We plan to include a new supplementary figure showing the data in Figure 2 plotted as filopodia per cell to show that this quantification gives the same results.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats. 

      Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, as well as the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were usually unable to detect THSD7A using these same conditions for the mouse melanoma B16F1 samples, but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns. Based on our THSD7A trafficking data, we believe that in control cells, most of the THSD7A is trafficked and secreted via small EVs. As you can see in Figure 7A, the band for THSD7A in the shScr cell lysate is relatively light and also shows a double band similar to Figure 6E (both HT1080 samples).

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands.  If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant.  Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A: 

      a) Figure 6 - Is THSD7A expected to be present in the nucleus, as shown in panel D (label D is missing in the Figure)? It is not clear if this is observed in neurons. A Western of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8. 

      The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet.  In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells? 

      The images for Figure 7E were taken with high resolution on a confocal microscope.  Insets for Figure 7E were zoomed in so that readers could see the tiny structures.  Zoom 1 in Figure 7E shows areas of extracellular deposition. In these areas, we can see small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more export of THSD7A into small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.  Quantification of internal THSD7A localization is much more straightforward in this experimental regime.  Indeed, in Figure 7F we assessed internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

    1. Author response:

      We thank the reviewers for their thoughtful feedback and valuable comments. We plan to fully address their concerns by including the following experiments and analyses:

      Reviewer 1 suggested exploring data scaling trends for encoding models, as successful scaling would justify larger datasets for language ECoG studies. To estimate scaling effects, we will develop encoding models on subsets of our data.

      Reviewer 2 expressed uncertainty about the baseline for model-brain correlation and recommended adding control LLMs with randomly initialized weights. In response, we will generate embeddings using untrained LLMs to establish a more robust baseline for encoding results.

      Reviewer 2 also proposed incorporating control regressors such as word frequency and phonetic features of speech. We will re-run our modeling analysis using control regressors for word frequency, 8 syntactic features (e.g., part of speech, dependency, prefix/suffix), and 3 phonetic features (e.g., phonemes, place/manner of articulation) to assess how much these features contribute to encoding performance.

      Reviewer 3 raised concerns that the “plateau in maximal encoding performance” was actually a decline for the largest models. We will add significance tests in Figure 2B to clarify this issue.

      Reviewer 3 also noted that in Supplementary Figure 1A, the decline in encoding performance was more pronounced when using PCA to reduce embedding dimensionality, in contrast to the trend observed when using ridge regression. To address this, we will attempt to replicate the observed scaling trends in Figure 2B using PCA combined with OLS.
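
      One textbook way to see why the two regularisation schemes can behave differently is to write both fits in terms of the singular value decomposition of the embedding matrix. The notation below is an assumption introduced for illustration ($X$ is the centred embedding matrix with SVD $X = U D V^\top$, $d_j$ are its singular values, $u_j$ the columns of $U$, $\lambda$ the ridge penalty and $k$ the number of retained principal components); it is not taken from the manuscript.

```latex
% Ridge regression shrinks every principal direction smoothly:
\hat{y}_{\text{ridge}} = \sum_{j=1}^{p} u_j \, \frac{d_j^{2}}{d_j^{2} + \lambda} \, u_j^{\top} y

% PCA followed by OLS on the top k components keeps or discards each direction outright:
\hat{y}_{\text{PCA+OLS}} = \sum_{j=1}^{k} u_j \, u_j^{\top} y
```

      Ridge down-weights low-variance directions but never removes them, whereas PCA followed by OLS applies a hard cut-off at $k$. If larger models spread useful information over more low-variance dimensions, a fixed $k$ would discard more of it, which is one possible (untested here) reading of the steeper decline the reviewer points to.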

      Additionally, we will provide a point-by-point response and revise the manuscript with updated analyses and figures in the near future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Colomb et al. have further explored the mechanisms of action of a family of three immunomodulatory proteins produced by the murine gastrointestinal nematode parasite Heligmosomoides polygyrus bakeri. The family of HpARI proteins binds to the alarmin interleukin 33 and depending on family members, exhibits differential activities, either suppressive or enhancing. The present work extends previous studies by this group showing the binding of DNA by members of this family through a complement control protein (CCP1) domain. Moreover, they identify two members of the family that bind via this domain in a non-specific manner to the extracellular matrix molecule heparan sulphate through a basic charged patch in CCP1. The authors thus propose that binding to DNA or heparan sulphate extends the suppressive action of these two parasite molecules, whereas the third family member does not bind and consequently has a shorter half-life and may function via diffusion.

      Strengths: 

      A strength of the work is the multifaceted approach to examining and testing their hypotheses, using a well-established and well-defined family of immunomodulatory molecules using multiple approaches including an in vivo setting. 

      Weaknesses: 

      There are a few weaknesses of the approach. Perhaps some discussion and speculation as to how these three family members might operate in concert during Heligmosomoides polygyrus bakeri infection would help place the biology of these molecules in context for the reader, e.g. when and where they are produced. 

      We agree that the roles of these proteins during infection require further study and are not fully elucidated here. We have added further discussion to the manuscript on their potential roles during infection (track changes manuscript, lines 277 – 283).

      Reviewer #2 (Public Review): 

      Summary: 

      Colomb et al. investigated here the heparin-binding activity of the HpARI family proteins from H. polygyrus. HpARIs bind to IL-33, a pleiotropic cytokine, and modulate its activities. HpARI1/2 has suppressive functions, while HpARI3 can enhance the interaction between IL-33 and its receptor. This study builds upon their previous observation that HpARI2 binds DNA via its CCP1 domain. Here, the authors tested the CCP1 domain of HpARIs in binding heparan sulfate, an important component of the extracellular matrix, and found that 1/2 bind heparan, but 3 cannot, which is related to their half-lives in vivo. 

      Strengths: 

      The authors use a comprehensive multidisciplinary approach to assess the binding and their effects in vivo, coupled with molecular modeling. 

      Weaknesses: 

      (1) Figure 1C should include Western. 

      We apologise for this oversight, and now include an uncropped western blot image as Figure 1, Figure Supplement 1.

      (2) Figure 1E: Why does HpARI1 stop binding DNA at 50%? 

      It is currently unclear why HpARI1 does not bind to all DNA in the EMSA; however, this was our repeated finding. With our revised findings we can now state definitively that HpARI1 has a lower affinity for HS compared to HpARI2, and in each of our assays (EMSA (Fig 1D-E), size exclusion chromatography (Fig 4A), HS-bead pull-down (Fig 4B), lung cell surface binding (Fig 4C) and ITC (Fig 4D)) HpARI1 always shows a weaker response compared to HpARI2. We hypothesise that HpARI1 binds more weakly to DNA/HS to allow it to diffuse further from the site of deposition, but we have yet to demonstrate this during infection. We add further discussion of this point (track changes manuscript, lines 262 – 266).

      (3) ITC binding experiment with HpARI1? Also, the ITC results from HpARI2 do not seem to saturate, thus it is difficult to really determine the affinity. 

      We have now included HpARI1-HS ITC, and re-ran the HpARI2 experiment to saturation (Fig 4D-E).

      (4) It would be helpful to add docking results from HpARI1. 

      We have now included HpARI1-HS docking, in Figure 5B.

      (5) Some conclusions are speculative and need to remain in the Discussion. e.g.: a) That HpARI3 may be able to diffuse farther 

      We have rewritten these points to remove the speculation on localisation from the abstract (lines 18-19) and introduction (line 78).

      b) That DNA/HS may trap HpARI1/2 at the infection site. 

      Likewise, these points have been rewritten in the abstract and introduction as above, and we have made it clearer that this is a model that we are proposing in the discussion (lines 277-283).

      Reviewer #1 (Recommendations For The Authors): 

      The paper is well-written and the data well-presented. I have one small comment that the authors may like to consider. In the discussion, second paragraph, line 17, perhaps, "evolved" rather than "developed". 

      Thank you for this suggestion, we have made this change (line 248).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      To hopefully contribute to more strongly support the conclusions drawn by the authors, I am including a series of concerns regarding the manuscript, as well as some suggestions that could be useful to address these issues:

      (1) The main results of this study derive from the use of auxin-inducible degron (AID)-tagged proteins. Despite the great advantages of the AID strategy to conditionally deplete proteins, the AID tag can affect the normal function of a protein. In fact, some of the AID-labeled DDC components generated in this work are shown to be hypomorphic. Hence, the manuscript would have benefited from the additional confirmation of some of the observations using a different way to eliminate the proteins (e.g., temperature-sensitive mutants).

      Most ts mutants are also hypomorphic; hence we do not see much advantage in their use. The addition of the AID to these proteins alone does not interfere with the ability to sustain checkpoint arrest, as demonstrated in Figure S1. Instead, we found that by overexpressing Rad9-AID we could demonstrate that inactivating Rad9 after 15 h behaved the same way as the inactivation of Ddc2, significantly strengthening our finding that the DDC checkpoint becomes dispensable while the SAC takes over.

      (2) In cells depleted of Rad53-AID, the deletion of CHK1 stimulates an earlier release from a mitotic arrest induced by two DSBs (Figures 2D and 3C). Likewise, the authors claim that a faster escape from the cell cycle block can also be observed when upstream factors such as Ddc2, Rad9, or Rad24 are depleted in the absence of CHK1 (Figures 2A-C and Figures 3D-F). However, this earlier release from the cell cycle arrest, if at all, is only slightly noticeable in a Rad9-AID background (Figures 2B and 3E). In this sense, it is also worth pointing out that Rad9-AID chk1Δ (Figure 3E) and Rad24-AID chk1Δ (Figure 3F) cells were only evaluated up to 7 h, while in all other instances, cells were followed for 9 h, which hinders a fair assessment of the differences in the release from the cell cycle arrest.

      As noted above, we have now been able to examine Rad9 over the long-time frame.

      (3) Although only 25% of the cells depleted for Dun1 remained in G2/M arrest 7 h following the induction of two DSBs, it is shocking that Rad53 was nonetheless still phosphorylated after the cells had escaped the cell cycle blockage (Figure 4A).

      This persistence of Rad53 phosphorylation is also seen with the inactivation of Mad2, allowing escape in spite of continued Rad53 phosphorylation.

      (4) Generation of Rad9-AID2 and Rad24-AID2 strains did not fully restore the function of these proteins, since most cells had adapted 24 h after induction of two DSBs (Figure S1C). Nonetheless, Rad9-AID2 and Rad24-AID2 are still likely more stable than their AID counterparts, and hence the authors could have instead used the AID2 proteins for the experiments in Figure 2 to better evaluate the role of Rad9 and Rad24 in the maintenance of the DDC-dependent arrest.

      We note again that we have found a way to study Rad9 up to 24 h. 

      (5) Deletion of BFA1 has been shown to promote the escape from a cell cycle arrest triggered by telomere uncapping (Wang et al. 2000, Hu et al. 2001, Valerio-Santiago et al. 2013). Likewise, while cells carrying the cdc5-T238A allele cannot adapt to a checkpoint arrest induced by one irreparable DSB, BFA1 deletion rescues the adaptation defect of this mutant CDC5 allele (Rawal et al., 2016). The authors show how, using AID-degrons of Bfa1 and Bub2, that only Bub2, but not Bfa1, is required to maintain a prolonged cell cycle arrest after the induction of two DSBs. To reinforce this point, and as shown for mad2Δ cells (Figure S6A), the authors could perform a complete time course using both the Bfa1-AID and a bfa1Δ mutant to demonstrate that they do indeed show the same behavior in terms of the adaptation to a two DSB-induced cell cycle arrest.

      We thank the reviewer for noting these other instances where bfa1Δ promoted an escape from arrest. We tested a 2-DSB bfa1 deletion, and the data have been added to Figure S9E-F. We did not observe a difference in the percentage of cells escaping arrest between the 2-DSB bfa1 deletion and the 2-DSB BFA1-AID strains.

      (6) Bypass or adaptation of a checkpoint-induced cell cycle arrest in S. cerevisiae often leads to cells entering a new cell cycle without doing cytokinesis and, hence, to the accumulation of rebudded cells. However, the experiments shown in the manuscript only account for G1 or budded cells with either one or two nuclei. Do any of the mutants show cytokinesis problems and subsequent rebudding of the cells? If so, this should have been also noted and quantified in the corresponding assays.

      In the cases we have studied we have not seen instances where the cells re-bud without completing mitosis (at least as assessed by the formation of budded cells with two distinct DAPI staining masses). In the morphological assays we have done, we score the continuation of the cell cycle by the appearance of multiple buds, G1, and small budded cells. In our adaptation assays when cells escaped G2/M arrest they formed microcolonies indicating no short-term deficiency in cell division.

      (7) The location of the DSB relative to the centromere of a chromosome seems to be a factor that determines the capacity of the SAC to sustain a prolonged cell cycle arrest. The authors discuss the possibility that the DSB could somehow affect the structure of the kinetochore. Did they evaluate whether Mad1 or Mad2 were more actively recruited to kinetochores in those strains that more strongly trigger the SAC after induction of the DSBs?

      We have not attempted to follow Mad1/2 recruitment. ChIP-seq could be used to monitor Mad1/2 localization at the 16 centromeres in response to DSBs and the spread of γ-H2AX across the centromere. Our previous data showed that γ-H2AX could spread across the centromere region and could create a change that would be detected by Mad1/2. This change does not, however, affect the mitotic behavior of a strain in which the H2A genes have been modified to the possibly phosphomimetic H2A-S129E allele.

      (8) The authors could speculate in the discussion about the reasons that could explain why the DDC is required for the maintenance of checkpoint arrest at early stages but then becomes dispensable for the preservation of a prolonged cell DNA DSB-induced cycle arrest, which is instead sustained at later stages by the SAC.

      Our suggestion is that cells would have adapted, but modification of the centromere region engages SAC.

      Finally, some minor issues are:

      (1) The lines in the graphs that display the results from adaptation assays (e.g., Figures 1B and 1E) or cell and nuclear morphology (e.g., Figures 1D and 1G) are too thick. This makes it sometimes difficult to distinguish the actual percentages of cells in each category, particularly in the experiments monitoring nuclear division.

      Fixed

      (2) While both the adaptation assay and the analysis of nuclear division in Figures 1E and 1G, respectively, show a complete DDC-dependent arrest at 4h, the Western blot in Figure 1F suggests that Rad53 is not phosphorylated at that time point. Do these figures represent independent experiments? Ideally, the analysis of cell budding and nuclear division, which is performed in liquid cultures, and the Western blot displaying Rad53 phosphorylation should correspond to the same experiment.

      Cell budding in liquid cultures and adaptation assays were performed in triplicate with 3 biological replicates and the collective results are shown in each graph showing the percentage of large-budded cells. Western blot samples were collected in each liquid culture experiment. The western blot in 1G is a representative western blot.

      (3) It is somewhat confusing that the blots for the proteins are not displayed in the same order in Figures 2A (Rad53 at the top) and 2B or 2C (Rad53 in the middle).

      Fixed. We place Rad53, the relevant protein, at the top.

      Reviewer #2 (Recommendations For The Authors):

      (1) Yeast with the two breaks responds to DNA damage checkpoint (DDC) until sometimes 4-15 h post DNA damage. Since the auxin-induced degradation does not completely deplete all the tagged proteins in cells, the results should be more carefully considered and not to interpret if the checkpoint entry or maintenance depends on each target protein's ability to induce Rad53 phosphorylation. It should be theoretically possible if checkpoint maintenance requires only a modest amount of checkpoint factors especially because the experiments involve the induction of one or two DSBs. The low levels of DDC factors may be insufficient for Rad53 activation but could still be effective for cell cycle arrest. Indeed, the Haber group showed that the mating type switch did not induce Rad53 phosphorylation but still invoked detectable DNA damage response. To test such possibilities, the authors might consider employing yet another marker for DDC such as H2A or Chk1 phosphorylation besides Rad53 autophosphorylation. Alternatively, the authors might check if auxin-induced depletion also disrupts break-induced foci formation for checkpoint maintenance or their enrichment at DNA breaks using ChIP assays at various points post-damage.

      DAPI staining of Ddc2-AID cells shows that when IAA is added 4 h after DSB induction (Figure S3A), cells escape G2/M arrest as evidenced by the increase in large-budded cells with 2 DAPI signals, small budded cells, and G1 cells. Overexpression of Ddc2 can sustain the checkpoint past 24 h, but without SAC proteins like Mad2 they will eventually adapt (Figure S6B).

      That Rad9-AID or Rad24-AID in the absence of added auxin (but in the presence of TIR1) is unable to sustain arrest suggests to us that low levels of Rad9 or Rad24 are not sufficient to maintain arrest.  As the reviewer notes, normal MAT switching doesn’t cause Rad53 phosphorylation or arrest, though early damage-induced events such as H2A phosphorylation do occur.  But our point is that Rad9 or Ddc2 is needed to maintain arrest only up to a certain point, after which they become superfluous and a different checkpoint arrest is imposed. At that point apparently a low level of these proteins plays no obvious role.

      (2) It is interesting that DDC no longer responds to the damage signaling after 15 h of DSB-induced prolonged checkpoint arrest after two DNA double-strand breaks. Is this also applicable to other adaptation mutants? The results might improve the broad impact of the current conclusions. It is also possible that the transition from DDC to SPC depends on simply the changes in signaling or in part due to the molecular changes in the status of DNA breaks or its flanking regions. Indeed, the proposed model suggests that the spreading of H2A phosphorylation to centromeric regions induces SAC and thus mitotic arrest. The authors could measure H2A phosphorylation near the centromere using ChIP assays at various intervals post-DNA damage. It is particularly interesting if depletion of Ddc2 at 15 h post DNA damage does not alter the level of H2A phosphorylation at or near centromere.

      Our previous data have suggested that the involvement of the SAC in prolonging DSB-induced arrest involved post-translational modification of centromeric chromatin such as the Mec1- and Tel1-dependent phosphorylation of the histone H2A (Dotiwala). In budding yeast there is also a similar DSB-induced modification of histone H2B (Lee et al.). To ask if there is an intrinsic activation of the SAC if the regions around centromeres were modified by checkpoint kinase phosphorylation, we examined cell cycle progression in strains in which histone H2A or histone H2B was mutated to their putative phosphomimetic forms (H2A-S129E and H2B-T129E).  As shown in Figure S11, there was no effect on the growth rate of these strains, or of the double mutant, suggesting that cells did not experience a delay in entering mitosis because of these modifications. We note that although histone H2A-S129E is recognized by an antibody specific for the phosphorylation of histone H2A-S129, the mutation to S129E may not be fully phosphomimetic. 

      (3) It is puzzling why Rad9-AID or Rad24-AID are proficient for DDC establishment but cannot sustain permanent arrest in the two break cells. It appears Rad53 phosphorylation for DDC is weaker in cells expressing Rad9-AID or Rad24-AID according to Fig.2B and C even though their protein level before IAA treatment is still robust. This might also explain why the results of depleting Rad53 and Rad9 are very different. It also raises concern if the effect of Rad24 depletion on checkpoint maintenance is in part due to the weaker checkpoint establishment. It might be necessary to use the AID2 system to redo Rad24 depletion to exclude such a possibility.

      We believe that the AID mutants are very sensitive to the low level of IAA present in yeast.  The instability of the protein is entirely dependent on the TIR1 SCF factor, so the proteins themselves are not intrinsically defective; they are just subject to degradation.  Overexpressing Rad9 allowed us to evaluate its role at late time points. 

      (4) It is intriguing that the switch from DDC to SAC might take place at around 12 h when yeasts with a single unrepairable break ignore DDC and resume cell cycling (so-called "adaptation"). Since 4h and 15h are far apart and the transition point from DDC to SAC likely takes place between these two points, it will be very helpful to analyze and compare cell cycle exit after 24 h by treating IAA at multiple points between 4-15h.

      When we add IAA to Mad2-AID and Mad1-AID 4 h after DSB induction, cells remain arrested for up to 12 h after DSB induction. At 15 h cells begin to exit checkpoint arrest, indicating that the handoff of checkpoint arrest must occur between 12 and 15 h after DSB induction. If we degraded DNA damage checkpoint proteins at any point before Mad2, Mad1, and Bub2 begin to contribute to checkpoint arrest, then arrested cells would likely adapt in a similar manner to when IAA was added 4 h after DSB induction.

      (5) Some of the Western blot quality is poor. For instance, in Figure 6C, Mad1-AID level after IAA addition is not compelling especially because the TIR level (the loading control) is also very low.

      In Figure 6C, while the relative levels of TIR1 are similar in the IAA treated and untreated samples, there is no detectable amount of Mad1-AID in the IAA treated samples, indicating that Mad1-AID was successfully degraded with the AID system.

      (6) Fig. 8 is complex. It might be helpful to define the different types of arrows in the figure. The legend also has a spelling error, Rad23 should be Rad24.

      We’ve defined what each arrow means in the legend and corrected the spelling error in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      Much of the manuscript states that two unrepairable DSBs lead to a long and severe G2/M arrest. Two main cytological approaches are used to make this statement: bud size and number on plates after micromanipulation (microcolony assay), and cell and nuclear morphology in liquid cultures. While the latter gives a clear pattern that can be assigned to a G2/M block as expected by DDC, i.e. metaphase-like mononucleated cells with large buds, the former can only tell whether cells eventually reach a second S phase (large budded cells on the plate can be in a proper G2/M arrest, but can also be in an anaphase block or even in the ensuing G1). The authors always performed the microcolony assay, but there are several cases where the much more informative budding/DAPI assay is missing. These include Dun1-AID and others, but more importantly chk1Δ and its combinations with DDC proteins. Incidentally, for the microcolony assay, it is more accurate to label the y-axis of the corresponding graphs (and in the figure legends and main text) with something like "large budded cells"; "G2/M arrested cells" is misleading.

      Figures have been updated to more accurately reflect what we are measuring.

      The results obtained with the Bfa1/Bub2 partner are intriguing. These two proteins form a complex whose canonical function is to prevent exit from mitosis until the spindle is properly aligned, acting in a distinct subpathway within the SAC that blocks MEN rather than anaphase onset. The data presented by the authors suggest that, on the one hand, both SAC subpathways work together to block the cell cycle. However, why does canonical SAC (Mad1/Mad2) inactivation not lead to a transition from G2/M (metaphase-like) arrested cells to anaphase-like arrest maintained by Bfa1-Bub2? Since Bfa1-Bub2 is a target of DDC, is it possible that DDC knockdown also inactivates this checkpoint, allowing adaptation? On the other hand, can the authors provide more data to confirm and strengthen their claim of a Bfa1-independent Bub2 role in prolonged arrest? Perhaps long-term protein localization and PTM changes. Bub2-independent roles for Bfa1 have been reported, but not vice versa, to the best of my knowledge.

      In the mitotic exit network Bfa1/Bub2 prime activation of the pathway by bringing Tem1 to spindle pole bodies. Phosphorylation of Bfa1 causes Tem1 to be released and phosphorylate Cdc5 to trigger exit by MEN. It has been shown that DNA damage, in a cdc13-1 ts mutant, phosphorylates Bfa1 in a Rad53 and Dun1 dependent manner. This phosphorylation of Bfa1 could release Tem1 and prime cells to exit checkpoint arrest when cells pass through anaphase. Looking at Tem1 localization to spindle pole bodies and interactions with Bfa1/Bub2 in response to DNA damage might give insight into why cells don’t experience an anaphase-like arrest when they are released by either deactivation of the DNA damage checkpoint or SAC.

      We have previously shown that a deletion of bub2 in a 1-DSB background shortens DSB-induced checkpoint arrest. Deletion of bfa1 in a 2-DSB background showed ~70-80% of cells stuck in a large-budded state as measured through an adaptation assay tracking the morphology of G1 cells on a YP-Gal plate and DAPI staining. Deletion or degradation of bfa1 might not release cells from arrest because Mad2/Mad1 prevent cells from transitioning into anaphase. Our DAPI data for Bub2-AID show an increase in cells with 2 DAPI signals (transition into anaphase) and small budded cells, indicating that degradation of Bub2 is releasing cells into anaphase and allowing cells to complete mitosis.

      Further suggestions:

      It would be richer if authors could provide more than one experimental replicate in some panels (e.g., S1A,B; S4A; and S6B).

      S1C confirms that Rad9-AID and Rad24-AID will adapt by 24 h even with the point mutant TIR1(F74G) which has lower basal degradation than TIR1. S4A has been updated with additional experimental replicates. The 48 h timepoint after DSB induction was to show the importance of Mad2 even when Ddc2 is overexpressed.

      Figure 1: Rearrange figure panels when they are first mentioned in the text. For example, it makes more sense to have the plate adaptation assay as panel B for both 1-DSB and 2-DSB strains, budding plus DAPI as panel C, and Rad53 as panel D.

      These figures have been rearranged in the order that they are mentioned in the paper.

      Figure 5: Correct Ph-5-IAA in the Rad53 WBs (it should be 5-Ph-IAA).

      This has been corrected.

      Figure S2: The straight line under the "+IAA" text box is misleading. I think it should also cover the "-2" time point, right? Also, check the figure legend. Information is missing and does not correspond to the figure layout.

      This has been corrected.

      Figure S3: Perhaps "Cell cycle profile as determined by budding and DAPI staining" is a better and more accurate legend title.

      The legend title has been updated to “Cell cycle profile as determined by budding and DAPI staining in Ddc2-AID and Rad53-AID mutants ± IAA 4 h after galactose.”

      Figure S5: Detection of both Rad53 and Ddc2 in the same blot could lead to misinterpretation as hyperphosphorylated Rad53 appears to coincide with Ddc2 migration.

      Figure S5A-B are representative western blots where Rad53 was probed to show activation of the DNA damage checkpoint by Rad53 phosphorylation. When measuring the relative abundance of Ddc2 we did not probe all blots for Rad53.

      Table S1: Include the post-hoc test used for comparisons after ANOVA.

      A Sidak post-hoc test was used in PRISM for the one-way ANOVA test. PRISM listed the Sidak post-hoc test as the recommended test to correct for multiple comparisons. A column has been added to S. Table 1 to show which post-hoc test was used.
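
      For readers unfamiliar with the correction, the textbook Šidák adjustment can be sketched as follows. This illustrates the general formula only; it is not a reproduction of the software's output, and the example p-values in the usage lines are hypothetical.

```python
def sidak_adjust(p_values):
    """Sidak-adjusted p-values for a family of m comparisons: 1 - (1 - p)^m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

if __name__ == "__main__":
    raw = [0.01, 0.04, 0.5]  # hypothetical unadjusted p-values
    for p, adj in zip(raw, sidak_adjust(raw)):
        print(f"raw p = {p:.3f}  adjusted p = {adj:.4f}")
    print(f"per-comparison alpha for 3 tests: {sidak_alpha(0.05, 3):.5f}")
```

      The adjustment assumes independent comparisons and is slightly less conservative than the Bonferroni correction for the same family size.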

      Page 10, line 4: The putative additive effect of chk1 knockout with Dun1 depletion should also be compared to chk1 alone (in Figure 3A).

      We address the additive effect of chk1 knockout with Dun1-AID depletion in a later section on Page 11, line 6. Since we had not explored possible effects from downstream targets of Rad53 for prolonging checkpoint arrest when Rad53 was depleted, we did not mention the effect of the chk1 knockout on Dun1 depletion.

      Page 14, second paragraph, line 4: "Figure 6A-D", is it not?

      Figure S6A is measuring checkpoint arrest in a deletion of mad2 in a 2-DSB strain. Figure 6A-D shows how degradation of Mad2-AID and Mad1-AID after the handoff of arrest causes cells to exit the checkpoint in a Rad53 independent manner.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: The authors investigated the function of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Recognizing MORC's implication in DNA compaction and gene silencing across diverse species, the study aimed to explore the influence of PfMORC on transcriptional regulation, life cycle progression and survival of the malaria parasite. Depletion of PfMORC leads to the collapse of heterochromatin and thus to the killing of the parasite. The potential regulatory role of PfMORC in the survival of the parasite suggests that it may be central to the development of new antimalarial strategies.

      Strengths: The application of the cutting-edge CRISPR/Cas9 genome editing tool, combined with other molecular and genomic approaches, provides a robust methodology. Comprehensive ChIP-seq experiments indicate PfMORC's interaction with sub-telomeric areas and genes tied to antigenic variation, suggesting its pivotal role in stage transition. The incorporation of Hi-C studies is noteworthy, enabling the visualization of changes in chromatin conformation in response to PfMORC knockdown.

      We greatly appreciate the overall positive feedback and recognition of our efforts. Our application of CRISPR/Cas9 genome editing tools coupled with complementary cellular and functional approaches sheds light on the importance of _Pf_MORC in maintaining chromatin structural integrity in the parasite and highlights this protein as a promising target for novel therapeutic intervention.

      Weaknesses: Although disruption of PfMORC affects chromatin architecture and stage-specific gene expression, determining a direct cause-effect relationship requires further investigation.

      Our conclusions were made on the basis of multiple, unbiased molecular and functional genomic assays that point to the relevance of the _Pf_MORC protein in maintaining the parasite’s chromatin landscape. Although we do not claim to have precise evidence on the step-by-step pathway in which _Pf_MORC is involved, we bring forth first-hand evidence of its role in heterochromatin binding, gene regulation and its association with major TFs as well as chromatin remodeling and modifying enzymes. We do, however, agree with the comment regarding the lack of direct effects of _Pf_MORC KD and have since provided additional evidence by performing ChIP-seq experiments against H3K9me3 and H3K9ac during KD. Our new results are presented in Fig. 5. We showed that the level of H3K9me3 decreased significantly during _Pf_MORC KD.

      Furthermore, while numerous interacting partners have been identified, their validation is critical and understanding their role in directing MORC to its targets or in influencing the chromatin compaction activities of MORC is essential for further clarification. In addition, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      Validation of the identified interacting partners is indeed critical and essential to understanding their role in directing MORC to its targets. Our protein pull down experiments have been done using several biological replicates. Several of the interacting partners have also been identified and published by other labs and collaborators. To confirm our results, we completed a direct comparison of our work with previous published work. Results have now been incorporated into the revised manuscript to confirm the identified interacting partners and the accuracy of the data we obtained in our experiment. Molecular validation of novel proteins identified in our protein pull down requires generation of tagged lines and may take a few more years but will be submitted for publication in a follow up manuscript.

      Reviewer #2 (Public Review):

      Summary: This paper, titled "Regulation of Chromatin Accessibility and Transcriptional Repression by PfMORC Protein in Plasmodium falciparum," delves into the PfMORC protein's role during the intra-erythrocytic cycle of the malaria parasite, P. falciparum. Le Roch et al. examined PfMORC's interactions with proteins, its genomic distribution in different parasite life stages (rings, trophozoites, schizonts), and the transcriptome's response to PfMORC depletion. They conducted a chromatin conformation capture on PfMORC-depleted parasites and observed significant alterations. Furthermore, they demonstrated that PfMORC depletion is lethal to the parasite.

      Strengths: This study significantly advances our understanding of PfMORC's role in establishing heterochromatin. The direct consequences of the PfMORC depletion are addressed using chromatin conformation capture.

      We appreciate the Reviewer’s comments and reflection on the importance of our work.

      Weaknesses: The study only partially addressed the direct effects of PfMORC depletion on other heterochromatin markers.

      Here again, we agree with the reviewer’s comment and have performed additional experiments to delve deeper into the multifaceted roles of _Pf_MORC. We have performed additional ChIP-sequencing analysis on _Pf_MORC depleted conditions focusing on known heterochromatin and euchromatin markers H3K9me3 and H3K9ac respectively. We hope our new results presented in figure 5 will shed light on the more direct implications of _Pf_MORC on heterochromatin and gene silencing.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • Why does MORC, which was used in the pull-down, seem to be only minimally enriched in the volcano plot, while a series of proteins (marked in red) and AP2 (highlighted in green) are enriched with log2 fold changes exceeding 15?

      We apologize for the confusion. MORC was detected with the highest number of peptides (97 and 113) and spectra (1041 and 1177), confirming the efficiency of our pull-down. However, considering the relatively large size of the MORC protein (295 kDa) and its weak detection in the control (5 and 7 peptides; 16 and 43 spectra), the Log2 FoldChange and Z-statistic after normalization are minimal compared to smaller proteins that were not identified in the control samples.
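      For illustration only, the effect described above can be sketched with a simple pseudocount-based log2 fold change on the spectral counts quoted in the response. The pseudocount of 1, the function name, and the second "hypothetical" protein are assumptions made for this example; the normalization and Z-statistic actually used in the manuscript are not specified here and may differ.

```python
import math


def log2_fold_change(ip_counts, control_counts, pseudocount=1.0):
    """Log2 ratio of mean spectral counts (IP vs. control).

    The pseudocount is an assumption of this sketch; it avoids division
    by zero when a protein is absent from the control samples.
    """
    ip_mean = sum(ip_counts) / len(ip_counts)
    control_mean = sum(control_counts) / len(control_counts)
    return math.log2((ip_mean + pseudocount) / (control_mean + pseudocount))


# Spectral counts quoted in the response for the bait (MORC).
morc_lfc = log2_fold_change([1041, 1177], [16, 43])

# Hypothetical prey protein: far fewer spectra, but none in the control.
hypothetical_lfc = log2_fold_change([60, 75], [0, 0])

print(f"MORC log2FC: {morc_lfc:.2f}")
print(f"Hypothetical prey log2FC: {hypothetical_lfc:.2f}")
```

      Under these assumptions, the bait scores a lower fold change than the hypothetical prey despite having many more spectra, because any signal in the control enters the denominator.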

      Additionally, can you explain why these proteins appear to be enriched at the same fold? 

      We can postulate that these proteins form a complex with a ratio of 1:1. Two of these three proteins are described to interact with MORC in several publications, supporting a strong interaction between them.

      Variations in the interactome could result from the washing buffer's stringency.

      We agree that the IP conditions could affect the detection of the interactome as well as the parasite stage used. As indicated below, the overlap with previous publications and the presence of AP2 TFs and chromatin remodelers strongly support our results.

      It would be highly appropriate for the authors, similar to the co-submitted article (Maneesh Kumar Singh et al.), to present their mass spectrometry data in relation to previous purifications in Plasmodium (Bryant et al. 2020; Subudhi et al. 2023; Hillier et al. 2019) and also in Toxoplasma (Farhat et al. 2020). It would be good if authors could also put their results into perspective in light of the following pre-prints:

      We agree with the reviewer’s comment. In this revised manuscript, we compared our IP-MS data to previous published manuscripts. Key proteins including the AP2-P (PF3D7_1107800) and HDAC1 were indeed identified in several experiments validating our initial findings of the formation of large complexes with MORC. However, it’s important to highlight that the MORC protein was not used as the bait protein in previously published papers, and thus some discrepancies can be observed.

      Given the tendency of MORCs to form multiple complexes with AP2 factors, have you explored whether specific AP2s are conserved between Plasmodium and Toxoplasma, within the phylum?

      P. falciparum encodes 27 putative AP2s, while T. gondii has over 60 AP2s, making direct comparison challenging. Some Plasmodium AP2s have multiple counterparts in T. gondii, and conservation is typically limited to the AP2 binding domains. Attempts to identify sequence homology among AP2s and the regions of conservation have been performed (PMID: 30959972, PMID: 16040597). Although this information would provide interesting insight, we believe exploring this topic at this time would diverge from our primary objectives. It would be more appropriate to address this in future studies.

      Could this conservation be identified either through phylogenetic means or by using tools such as AlphaFold, especially considering not just the AP2 domains but also any existing ACDC domains?

      Although this may reveal important information regarding the association between MORC proteins and AP2 domains, we believe investigating the conservation between AP2 across apicomplexan parasites may prove too challenging and is beyond the scope of this work.

      Most of the genes are depicted without their immediate surroundings (Fig. 2d and Fig S2c, d). For instance, the promoter region of AP2g is not shown (Fig. 2d). It is therefore very challenging to determine the presence or absence of MORC upstream or downstream; considering that this factor, which can create DNA loop protrusions, might bind at a distance from the genes in question.

      All gene coverage plots, including AP2-G, show 500 bp up- and downstream of the displayed gene. We have modified our figure legends to make sure that this information is provided.

      Upon examining Figure S3, it is evident that the authors have indicated a decline in PfMORC expression, represented as percentages over two unique time frames. The methodology behind this quantification remains ambiguous. It's essential for the authors to specify whether normalization was done using a loading control. As a benchmark, Singh et al. (2021) in their Figure 4 transparently used GAPDH as a loading control and included an untreated sample in their western blot analysis.

      We thank the Reviewer for bringing this to our attention. Our initial quantification was performed using ImageJ. To address the Reviewer’s comment, we have repeated the experiment. Our quantitative analysis was performed with Bio-Rad ImageLab software using aldolase expression as a loading control (50% of the MORC loading). This information has now been incorporated into the supplementary figures (Figure S3).
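      As a generic illustration of loading-control normalization (not the ImageLab workflow itself), the remaining protein level can be expressed as the ratio of target to loading-control band intensity, relative to the untreated sample. All band intensities and function names below are hypothetical values chosen for the example.

```python
def normalized_signal(target_intensity, loading_intensity):
    """Target band intensity divided by its loading-control band intensity."""
    if loading_intensity <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target_intensity / loading_intensity


def percent_remaining(treated, untreated):
    """Percent of target protein remaining after knockdown.

    Each argument is a (target_intensity, loading_intensity) pair
    measured on the same blot.
    """
    treated_ratio = normalized_signal(*treated)
    untreated_ratio = normalized_signal(*untreated)
    return 100.0 * treated_ratio / untreated_ratio


# Hypothetical densitometry values (arbitrary units).
remaining = percent_remaining(treated=(300.0, 1000.0), untreated=(1200.0, 1000.0))
print(f"Target protein remaining: {remaining:.1f}%")
```

      Dividing by the loading control first means that unequal lane loading does not masquerade as a change in target protein abundance.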

      There's a striking observation that, despite significant degradation of PfMORC (as depicted in Figures S1 and S3), only the upper band in the western blot diminishes. This inconsistency needs addressing, as it can raise questions about the interpretation of the results.

      We agree with the reviewer's comment. We experienced some challenges when performing a Western blot on such a large protein (295 kDa). Our initial attempts required long exposure that may have highlighted non-specific signals of smaller proteins. To address the reviewer’s comment, we have performed the experiment one more time and made necessary changes to our WB protocol. Our new result better reflects the expected downregulation of _Pf_MORC. These changes have been incorporated into our manuscript and Fig. S3.

      Recommendations for improving the writing and presentation.

      MORC KD quantification and consistency with previous findings (Figure S3): When comparing their results with those from another study (Singh et al. 2021), it's critical to ensure that the experimental conditions, especially the methodology for KD and the quantification of protein levels, are similar. If not, a direct comparison might be misleading.

      We greatly appreciate the suggestions and have made efforts to redesign the MORC KD quantifications according to the reviewer’s recommendations.

      While the manuscript mentions the level of KD, it does not delve into the functional consequences of such a decrease in protein levels. It would be of interest to understand how this level of KD affects the parasite's biology, especially in the context of the paper's main findings.

      We have addressed this question by looking at the changes in chromatin structure in WT versus KD parasites upon atc removal. We have also validated this initial result by designing an additional ChIP-seq experiment against histone marks in WT versus KD parasites upon atc removal. Our findings showed a significant downregulation in H3K9me3 coverage in heterochromatin regions, specifically in genes associated with antigenic variation and invasion. These findings suggest that PfMORC at least partially regulates gene silencing and chromatin arrangement. The manuscript has been edited accordingly.

      Concluding page 5, the authors present an interpretation of their findings that suggests a multi-faceted role of PfMORC in regulating stage-specific gene families, particularly the gametocyte-related genes and merozoite surface proteins. While the narrative they present is intriguing, several concerns arise:

      Over-reliance on correlation: The authors draw a direct line between the levels of PfMORC binding and the function of these genes in the parasite's life cycle. However, a mere correlation between PfMORC binding and stage-specific gene activity does not necessarily imply causation. They would need to provide experimental evidence showing that manipulation of PfMORC levels directly impacts these genes' expression.

      We agree with the reviewer's comment. We have, however, partially addressed this issue by comparing our ChIP-seq, RNA-seq and Hi-C experiments. We concluded that several of the transcriptional changes observed were due to an indirect effect of PfMORC KD and were most likely induced by a cell cycle arrest and partial collapse of the chromatin structure. The collapse of the heterochromatin structure was validated using our Hi-C experiment. To address the reviewer’s additional concerns, we have included additional ChIP-seq experiments targeting histone marks to confirm our initial hypothesis. The results of this additional experiment have been incorporated in the revised version of the manuscript.

      Ambiguity surrounding "low levels" and "high levels": The terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. Without quantification or a clear benchmark, these descriptions remain vague.

      We agree with the reviewers that the terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. We have however quantified our change in DNA binding using normalized reads (RPKM). In trophozoite and schizont stages, most of the genes contain a mean of <0.5 RPKM normalized reads per nucleotide of _Pf_MORC binding within their promoter region, whereas antigenic gene families such as _var_ and _rifin_ contain ~1.5 and 0.5 normalized reads, respectively (Fig. 2b). Similar results are also obtained for the gametocyte-specific transcription factor AP2-G, which contains levels of _Pf_MORC binding similar to what is observed in _var_ genes (Fig. 2c and S2c, d).
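      For readers unfamiliar with the unit, the standard RPKM definition (reads per kilobase of region per million mapped reads) can be sketched as follows. The read counts, region length and library size are hypothetical, and the exact per-nucleotide normalization used for the figures may differ from this textbook form.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads.

    RPKM = reads * 1e9 / (region length in bp * total mapped reads)
    """
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


# Hypothetical promoter region: 150 reads over 1,000 bp in a library
# of 20 million mapped reads.
example_rpkm = rpkm(read_count=150, region_length_bp=1000, total_mapped_reads=20_000_000)
print(f"RPKM: {example_rpkm:.2f}")
```

      Because the value is scaled by both region length and sequencing depth, binding levels can be compared across genes of different sizes and across libraries of different depths.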

      Shift in Binding Sites: The observed minor switch in PfMORC binding sites from gene bodies to intergenic and promoter regions is mentioned, but without context on how these shifts impact gene expression or any comparative analysis with other proteins showing similar shifts. The claim that this shift implicates PfMORC as an "insulator" is a leap without direct evidence.

      We apologize for the confusion. We have compared our ChIP-seq with RNA-seq results at different time points of the cell cycle and demonstrated that the shift observed has an effect on gene expression. We have edited the manuscript to clarify these results.

      Overextension of PfMORC's Role: The authors suggest that PfMORC moves to the regulatory regions around the TSS to guide RNA Polymerase and transcription factors. This is a substantial claim and would require additional experiments to validate. Simply observing binding in a region is insufficient to assign a specific functional role, especially one as critical as guiding RNA Polymerase. Historically, the MORC family has been primarily linked with gene silencing across Apicomplexan, plants, and metazoans. On page 7, the authors noted a minimal overlap between the ChIP-seq and RNA-seq signals (Fig. 4e). They also acknowledged that the pronounced gene expression shifts at schizont stages result from a combination of direct and indirect impacts of PfMORC degradation, which could cause cell cycle arrest and potential heterochromatin disintegration, rather than just decreased PfMORC binding. Therefore, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      We agree with the reviewer's comment and have edited the manuscript accordingly.  

      DISCUSSION:

      The authors concluded that "Using a combination of ChIP-seq, protein knock down, RNA-seq and Hi-C experiments, we have demonstrated that the MORC protein is essential for the tight regulation of gene expression through chromatin compaction, preventing access to gene promoters from TFs and the general transcriptional machinery in a stage specific manner."

      Again, the assertion that MORC protein is essential for tight regulation of gene expression, based purely on correlational data (e.g., ChIP-seq showing binding doesn't prove functionality), assumes causality which might not be fully substantiated. The phrase "preventing access to gene promoters from TFs and the general transcriptional machinery in a stage-specific manner" needs also validation. Asserting that MORC is essential for this function might oversimplify the process and overlook other critical contributors.

      We agree with the reviewer’s comments and the conclusion has since been edited accordingly.

      The discussion is quite poor. It would be pertinent to put MORC in perspective within the broader picture of regulatory mechanisms of chromatin state at telomeres and var genes. For instance, how do SIR2 and HDAC1 (associated with MORC) divide the task of deacetylation? Or the contribution of HP1 and other non-coding RNAs.

      We agree with the reviewer’s suggestion. However, in order to put MORC in perspective within a broader picture, we would need to measure changes in localization of several molecular components regulating heterochromatin in WT versus KD condition. This will require access to several molecular tools and specific antibodies that we do not currently have. We have addressed these issues in our discussion.  

      Minor corrections to the text and figures.

      Figure 1d: Could you provide the ID for each AP2 directly on the volcano plot? While some IDs are referenced in the manuscript, visual representation in the plot would facilitate a clearer understanding of their enrichment levels.

      IDs for unknown AP2 proteins have been added to the volcano plot.

      I recommend presenting Figure S2b as a panel within a primary figure. This change would offer readers a more quantitative understanding of the distinct differences between developmental stages. Notably, there seems to be a limited number of genes in common when considering the total, and there is an apparent lack of enrichment in the ring stage.

      This has been done.

      The captions are very minimally detailed. An effort must be made to better describe the panels as well as which statistical tests were used. 

      We have improved the figure legends and added the number of biological replicates as well as the statistical test used in each figure legend.

      Figure 1A: The protein diagram with its domains does not take scale into account.

      The figure has been modified.

      Reviewer #2 (Recommendations For The Authors):

      (1) The study lacks a direct link between PfMORC's inferred function and the state of heterochromatin in the genome post-depletion.

      We agree with the reviewer's comment and have included additional ChIP-seq experiments to measure changes in histone marks in the PfMORC-depleted parasite line. We show a significant decrease in histone H3K9me3 marks under PfMORC KD conditions.

      Conducting ChIP-seq on well-known heterochromatin markers such as H3K9me3, HP1, or H3K36me2/3 could shed light on the consequences of PfMORC depletion on global heterochromatin and its boundaries.

      With no access to an anti-HP1 antibody with reasonable affinity, we have not been able to study the impact of MORC KD on HP1 but have successfully observed the impact on H3K9me3 marks. These results have been added to the revised manuscript (Fig. 5).

      (2) The authors should conduct a more comprehensive analysis of PfMORC's genomic localization, comparing it to ApiAP2 binding (interacting proteins) and histone modifications. This would provide valuable insights.

      We have performed a more comprehensive genome-wide analysis of MORC binding through ChIP-seq on WT and MORC-KD conditions. Our results show that _Pf_MORC localizes to heterochromatin with significant overlap with H3K9-trimethylation (H3K9me3) marks, at or near _var_ gene regions. When _Pf_MORC was downregulated, H3K9me3 was detected at a lower level, validating a possible role of _Pf_MORC in gene repression. Regarding the comparison with AP2 binding, our proteomics datasets have shown extensive MORC binding with several AP2 proteins.

      (3) RNA-seq data reveals that only a few genes are affected after 24 hours of PfMORC depletion, with an equivalent number of up-regulated and down-regulated genes. The reasons behind down-regulation resulting from a heterochromatin marker depletion are not clearly established.

      We agree with the reviewer’s comment. At this stage (24 hours), _Pf_MORC depletion is limited and the effects at the transcriptional level are quite restricted. Furthermore, the down-regulation of genes is most likely an indirect effect of a cell cycle arrest. We have edited the manuscript to address this comment.

      The relationship between this data and the partial depletion of PfMORC needs further discussion.

      We agree with the reviewers and have improved our discussion in the revised version of the manuscript.

      (4) The authors did not compare their ChIP-seq data with the genes found downregulated in the RNA-seq data. Examining the correlation between these datasets would enhance the study.

      We apologize for the confusion. We have compared ChIP-seq and RNA-seq data and identified a very limited number of overlapping genes indicating that most of the changes observed in gene expression are in fact most likely indirect due to a cell cycle arrest and a collapse of the chromatin. We have edited the manuscript to clarify this issue.
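      One generic way to examine such an overlap is to intersect the two gene lists and ask, with a hypergeometric test, whether the overlap exceeds what chance alone would produce. The sketch below uses placeholder gene identifiers and a toy genome size; it is an illustration of the general approach, not the analysis performed in the manuscript.

```python
from math import comb


def overlap_genes(chip_bound, differentially_expressed):
    """Genes present in both the ChIP-seq and RNA-seq gene lists."""
    return set(chip_bound) & set(differentially_expressed)


def hypergeom_upper_tail(total_genes, n_bound, n_de, n_overlap):
    """P(overlap >= n_overlap) when n_de genes are drawn at random
    from total_genes, of which n_bound are ChIP-bound."""
    upper = min(n_bound, n_de)
    favourable = sum(
        comb(n_bound, k) * comb(total_genes - n_bound, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / comb(total_genes, n_de)


# Placeholder identifiers and a toy "genome" of 10 genes.
chip_bound = {"gene_A", "gene_B", "gene_C"}
differentially_expressed = {"gene_B", "gene_C", "gene_D", "gene_E"}

shared = overlap_genes(chip_bound, differentially_expressed)
p_value = hypergeom_upper_tail(
    total_genes=10,
    n_bound=len(chip_bound),
    n_de=len(differentially_expressed),
    n_overlap=len(shared),
)
print(sorted(shared), round(p_value, 3))
```

      A small overlap with a non-significant p-value would be consistent with the interpretation that most expression changes are indirect, although it would not by itself prove it.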

      (5) The discussion section is relatively concise and does not fully address the complexity of the data, warranting further exploration.

      We have improved the discussion section in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors previously showed in cell culture that Su(H), the transcription factor mediating Notch pathway activity, was phosphorylated on S269 and they found that a phospho-deficient Su(H) allele behaves as a moderate gain of Notch activity in flies, notably during blood cell development. Since a downregulation of Notch signaling was proposed to be important for the production of a specialized blood cell type (lamellocytes) in response to wasp parasitism, the authors hypothesized that Su(H) phosphorylation might be involved in this cellular immune response.

      Consistent with their hypothesis, the authors show that Su(H)S269A knock-in flies display a reduced response to wasp parasitism and that Su(H) is phosphorylated upon infestation. Using in vitro kinase assays and a genetic screen, they identify the PKCa family member Pkc53E as the putative kinase involved in Su(H) phosphorylation and they show that Pkc53E can bind Su(H). They further show that Pkc53E deficit or its knock-down in larval blood cells results in similar blood cell phenotypes as Su(H)S269A, including a reduced response to wasp parasitism, and their epistatic analyses indicate that Pkc53E acts upstream of Su(H).

      Strengths

      The manuscript is well presented and the experiments are sound, with a good combination of genetic and biochemical approaches and several clear phenotypes which back the main conclusions. Notably Su(H)S269A mutation or Pkc53E deficiency strongly reduces lamellocyte production and the epistatic data are convincing.

      Weaknesses

      The phenotypic analysis of larval blood cells remains rather superficial. Looking at melanized cells is a crude surrogate to quantify crystal cell numbers as it is biased toward sessile cells (with specific location) and does not bring information concerning the percentage of blood cells differentiated along this lineage.

      In Su(H)S269A knock-in or Pkc53E zygotic mutants, the increase in crystal cells in uninfected conditions and the decreased capacity to induce lamellocytes following infection could have many origins which are not investigated. For instance, premature blood cell differentiation could promote crystal cell differentiation and reduce the pool of lamellocytes progenitors. These mutations could also affect the development and function of the posterior signaling center in the lymph gland, which plays a key role in lamellocyte induction.

      Similarly, the mild decrease on resistance to wasp infestation (Fig. 2A) could reflect a constitutive reduction in blood cell numbers in Su(H)S269A larvae rather than a defective down-regulation of Notch activity.

      We fully agree with the reviewer that sessile crystal cell counts are a coarse approach to capture hemocytes. However, they allowed the screening of numerous genotypes in the course of our kinase candidate screen. We recorded the hemocyte numbers in the various genetic backgrounds and with regard to wasp infestation. There was no significant difference between Su(H)S269A and Su(H)gwt control, independent of infection. This is in agreement with earlier observations of unchanged plasmatocyte numbers in N or Su(H) mutants compared to the wild type (Duvic et al., 2002). We noted, however, a small drop in hemocyte numbers in Su(H)S269D and a strong one in Pkc53ED28 mutants in both conditions relative to control. Presumably, Pkc53E has a more general role in blood cell development, which we have not further analysed. The results were included in new Figure 1_S1 and Figure 9_S1 supplements. Based on the link between hemocyte numbers and wasp resistance (e.g. McGonigle et al., 2017), we cannot exclude that the lowered resistance of Pkc53ED28 mutants regarding wasp attacks is partly due to reduced hemocyte numbers, although we did not see significant differences among Su(H)S269A, Pkc53ED28 and the double mutant. We have included this notion in the text.

      Lamellocytes arise in response to external challenges like parasitoid wasp infestation by trans-differentiation from larval plasmatocytes, and by maturation of lamellocyte precursors in the lymph gland, yet barely in the Su(H)S269A and Pkc53ED28 mutants.

      We find it hard to envisage, however, that a premature differentiation of plasmatocytes into crystal cells in our case could deplete the pool of lamellocyte progenitors in the hemolymph. (Is there a precedent?) Crystal cells make up about 5% of the hemocyte pool; they are increased at most 2-fold in the Su(H)S269A and Pkc53E mutants. Even if these extra crystal cells (now ~10%) had arisen by premature differentiation, there should still be enough plasmatocytes (~80%) remaining with a potential to further divide and transdifferentiate into lamellocytes.

      Indeed, we cannot exclude an effect of the Su(H)S269A mutant on the development and function of the posterior signaling center (PSC) of the lymph gland. We noted, however, a slight but significant enlargement of the PSC in the Su(H)S269A mutant, which to our understanding cannot explain the reduced lamellocyte numbers.

      Whereas the authors also present targeted-knock down/inhibition of Pkc53E suggesting that this enzyme is required in blood cells to control crystal cell fate (Fig. 6), it is somehow misleading to use lz-GAL4 as a driver in the lymph gland and hml-GAL4 in circulating hemocytes as these two drivers do not target the same blood cell populations/steps in the crystal cell development process.

      We fully agree with the reviewer that the two driver lines target different blood cell populations/steps in hematopoiesis. The hml-Gal4 driver is regarded as pan-hemocyte, common to both plasmatocytes and pre-crystal cells (e.g. Tattikota et al., 2020). It has been reported to drive specifically within differentiated hemocytes prior to or at the stage of crystal cell commitment (Mukherjee et al., 2011). Hence, hml-Gal4 appeared suitable to hit sessile and circulating hemocytes prior to final differentiation into crystal cells or lamellocytes, respectively.

      In the lymph gland, however, hml is expressed within the cortical zone, where it appears specific to the plasmatocyte lineage, and not present in the crystal cell precursors (Blanco-Obregon et al., 2020). In contrast, lz-Gal4 is specific to the differentiating crystal cells in both lineages, i.e. in circulating and sessile hemocytes and in the lymph gland. Hence, we chose lz-Gal4 instead of hml-Gal4 at the risk of driving markedly later in the course of crystal cell differentiation. We included the reasoning in the text. Overall, we feel that this choice does not limit our conclusions.

      In addition, the authors do not present evidence that Pkc53E function (and Su(H) phosphorylation) is required specifically in blood cells to promote lamellocyte production in response to infestation.

      We have tried to address this interesting question by several means. Firstly, we show that Pkc53E is indeed expressed in the various cell types of larval hemocytes, shown in a new Figure 8 and Figure 8_S1 supplement. I.e., there is the potential of Pkc53E to promote lamellocyte formation. Moreover, RNAi-mediated downregulation of Pkc53E within hemocytes affected crystal cell formation similar to the Pkc53ED28 mutant, in agreement with a specific requirement within blood cells (Figure 6). Finally, we show a major drop in Notch target gene transcription (NRE-GFP) in response to wasp infestation within isolated hemocytes from Su(H)gwt in contrast to Su(H)S269A larvae (see new Figure 1 G). These data show that Su(H)-mediated Notch activity must be downregulated in hemocytes prior to lamellocyte formation in agreement with our hypothesis.

      Finally, the conclusion that Pkc53E is (directly) responsible for Su(H) phosphorylation needs to be strengthened. Most importantly, the authors do not demonstrate that Pkc53E is required for Su(H) phosphorylation in vivo (i.e. that Su(H) is not phosphorylated in the absence of Pkc53E following infestation).

      We would very much like to show the respective results. Unfortunately, the low affinity of our pS269 antibody does not allow any in situ or in vivo experiments. We very much hope to obtain a more specific phosphoS269-Su(H) antibody that would allow further in situ studies and show, for example, co-localization with Pkc53E.

      In addition, the in vitro kinase assays with bacterially purified Pkc53E (in the presence of PMA or using an activated variant of Pkc53E) only reveal a weak activity on a Su(H) peptide encompassing S269 (Fig. 4).

      The reviewer correctly notes the poor activity of our purified Pkc53EEDDD kinase. This low activity also holds true for the standard peptide (PS), which in fact is even less well accepted than the Swt substrate. Indeed, the commercially available PKCα is an order of magnitude more active. Whether this reflects the poor quality of our isolated protein compared to the commercial PKCα, or whether it reflects a true biochemical property of Pkc53E remains to be shown in the future. We noted this observation in the manuscript.

      Moreover, while the authors show a coIP between an overexpressed Pkc53E and endogenous Su(H) (Fig. 7) (in the absence of infestation), it has recently been reported that Pkc53E is a cytoplasmic protein in the eye (Shieh et al. 2023), calling for a direct assessment of Pkc53E expression and localization in larval blood cells under normal conditions and upon infestation.

      Indeed, it is interesting that a Pkc53E-GFP fusion protein is cytoplasmic in the eye. The construct reported by Shieh et al. however, i.e. the B-isoform, is preferentially expressed in photoreceptors, where it regulates the de-polymerization of the actin cytoskeleton.

      Due to the eye-specific expression, we unfortunately cannot use the Pkc53E-B-GFP construct to test for Pkc53E’s distribution in other tissues.

      As this construct is of little use for studying hematopoiesis, we have instead used Pkc53E-GFP (BL59413) derived from a protein trap: again, GFP is primarily seen in the cytoplasm of hemocytes, including lamellocytes of infected larvae. However, in a small number of hemocytes, GFP appears to be also nuclear (Fig. 8A), leaving the possibility that activated Pkc53E may localize to the nucleus, eventually phosphorylating Su(H) and downregulating Notch activity. As Su(H) enters the nucleus piggy-back with NICD, however, phosphorylation may as well occur at the membrane or within the cytoplasm. We note, however, that these hypotheses require a much more detailed analysis.

      Furthermore, the effect of the PKCa agonist PMA on Su(H)-induced reporter gene expression in cell culture and crystal cell number in vivo is somewhat consistent with the authors' hypothesis, but some controls are missing (notably western blots to show that PMA/Staurosporine treatment does not affect Su(H)-VP16 level) and it is unclear why STAU treatment alone promotes Su(H)-VP16 activity (in their previous reports, the authors found no difference between Su(H)S269A-VP16 and Su(H)-VP16) or why PMA treatment still has a strong impact on crystal cell number in Su(H)S269A larvae.

      We have added a Western blot showing that the treatment does not affect Su(H)-VP16 expression levels (Figure 5_supplement 1). As STAU is a general kinase inhibitor, it may obviate any inhibitory phosphorylation of Su(H)-VP16 in the HeLa cells, e.g. that by Akt1, CAMK2D or S6K which pilot T271, phosphorylation of which is expected to affect the DNA-binding of Su(H) as well (Figure 3_supplement 2). Moreover, in the previous report, we used different constructs with regard to the promoter, and we used RBPJ instead of Su(H), which may explain some of the discrepancies. As PMA is not specific to just Pkc53E, the altered crystal cell numbers may result from the influence on other kinases involved in blood cell homeostasis, as predicted by our genetic screen (Figure 3_supplement 1).

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide a more elaborate examination of larval blood cell types and blood cell counts under normal conditions and following infestation in the different zygotic mutants as well as upon Pkc53 knock-down. A thorough examination of PSC integrity should be performed and the maintenance of core blood cell progenitors examined. The authors should also clarify when after infestation the LG and larval bleeds are analyzed.

      - a more elaborate examination of larval blood cell types:

      - examination of larval blood cell counts under normal conditions: hemocyte # in gwt, SA, SD, & Pkc

      - examination of larval blood cell counts after infestation: hemocyte # in gwt, SA, SD, & Pkc

      - thorough examination  of PSC integrity: in gwt, SA, SD, & Pkc

      - thorough examination of blood cell progenitors: in gwt, SA, SD, & Pkc

      - clarify timing

      Hemocyte numbers of the various genotypes and conditions were recorded and are presented in Figure 1_S1 and Figure 9_S1. Timing was elaborated in the text and the Methods section.

      (2) The authors should clarify why they use lz-GAL4 or hml-GAL4 and what we can infer from using these different drivers.

      See above. The reasoning was included in the text.

      (3) The percentage of hatching of Su(H)S269A and Su(H)gwt flies in the absence of infestation should also be scored; a small decrease in Su(H)S269A viability might explain the observed differences in survival to wasp infestation. Absolute blood cell numbers (in the absence of infestation) have also been correlated with survival to infection and should be checked.

      Percentage of the emerging flies and hemocyte numbers in the absence of infestation were recorded and included in Figure 2, Figure 1_S1, Figure 9_S1.

      (4) Whereas the impact of Su(H)S269A or Pkc53E mutation on lamellocytes production is clear, there is still a substantial reduction in crystal cell production following infestation. So I wouldn't conclude that the Su(H) larvae are "unable" to detect this immune challenge or respond to it (line 116).

      Thank you for the hint, we corrected the text.

      (5) The expression and localization of Pkc53E in larval blood cells should be investigated, for instance using the Pkc53E-GFP line recently published by Shieh et al. (or at least at the RNA level).

      Firstly, we confirmed expression of Pkc53E in hemocytes by RT-PCR (Figure 8_S1 supplement). Secondly, expression of Pkc53E-GFP was monitored in hemocytes (Figure 8). To this end, we used the protein trap (BL59413), since the one published by Shieh et al., 2023 is restricted to photoreceptors.

      (6) It would be interesting to test the anti-pS269 antibody in immunostaining (using Su(H)S269A as negative control).

      Unfortunately, the pS269 antiserum does not work in situ at all.

      (7) The authors must perform a western blot with anti-pS269 in Pkc53e mutant to show that Su(H) is not phosphorylated anymore after wasp infestation.

      The blot gives a negative result.

      (8) It is surprising that no signal is seen in the absence of infestation with anti-pS269: the fact that Su(H)S269A have more crystal cells suggest that there is a constitutive level of phosphorylation of Su(H).

      We fully agree: In the ideal world, we would expect a low level of S269 phosphorylation in the wild type as well. However, given the lousy specificity of our antibody, we were happy to see phospho-Su(H) in infected larvae. We are currently working hard to get a better antibody. 

      (9) The authors should check Su(H)-VP16 levels and phosphorylation status after PMA and/or staurosporine treatment. Some clarifications are also needed to explain the impact of PMA in Su(H)S269 larvae (this clearly suggests that PKC has other substrates implicated in crystal cell development).

      Su(H)-VP16 expression levels were monitored by Western blot and were not altered conspicuously (Figure 5_1 supplement). Presumably, Pkc53E is not the only kinase involved in Su(H) phosphorylation or the transduction of stress signals. Moreover, PMA may have a more general effect on larval development and hematopoiesis affecting both genotypes. We included this reasoning in the text.

      (10) Concerning the redaction, the authors forgot to mention and discuss the work of Cattenoz et al. (EMBO J 2020). The presentation of the screen for kinase candidates could be streamlined and better illustrated (notably supplement table 4, which would be easier to grasp as a figure/graph). The discussion could be shortened (notably the part on T cells), and I don't really understand lines 374-376 (why is it consistent?).

      We are sorry for omitting Cattenoz et al. 2020, which we have now included. We fully agree that this paper is of utmost importance to our work. We streamlined the screen and included a new figure in addition to table 4 summarizing the results graphically (Figure 3_S1 supplement). We cut down the T cell part and omitted the unclear lines.

      Reviewer #2 (Public Review):

      Summary:

      The current draft by Deischel et al., entitled "Inhibition of Notch activity by phosphorylation of CSL in response to parasitization in Drosophila", describes the role of Pkc53E in the phosphorylation of Su(H) to downregulate its transcriptional activity and mount a successful immune response upon parasitic wasp infection. Overall, I find the study interesting and relevant; the identification of Pkc53E in the phosphorylation of Su(H) is especially nice. However, I have a number of concerns with the manuscript that are central to the idea linking the phosphorylation of Su(H) via Pkc53E to the modulation of Notch activity. I list them one by one below.

      Strengths:

      I find the study interesting and relevant especially because of the following:

      (1) The identification of Pkc53E in phosphorylation of Su(H) is very interesting.

      (2) The role of this interaction in modulating Notch signaling and thereafter its requirement in mounting a strong immune response to wasp infection is also another strong highlight of this study.

      Weaknesses:

      (1) Epistatic interaction with Notch is needed: Throughout the draft, the authors claim that the role of Pkc53E in the phosphorylation of Su(H) is downstream of Notch activity. Given that the paper title also invokes Notch, I would suggest the authors show this in a direct epistatic interaction using a Notch condition. If loss of Notch function makes many more lamellocytes and GOF makes fewer, would modulating Pkc53E (and Su(H)) in this background manifest any change? In homeostasis as well, given that gain of Notch function leads to increased crystal cells, the same genetic combinations would be nice to see.

      I understand that Su(H) functions downstream of Notch, but it is now increasingly evident that Su(H) also functions independently of Notch. An epistatic relationship between Notch and Pkc will clarify whether this phosphorylation event of Su(H) via Pkc is part of the canonical interaction being proposed in the manuscript and not a non-canonical/Notch pathway independent role of Su(H).

      This is important, as I worry that in the current state, while the data are all discussed in light of Notch activity, direct data to show this affirmatively are missing. In our hands we do find Notch-independent Su(H) function in immune cells, hence this suggestion stems from our own personal experience.

      The role of Notch in Drosophila hematopoiesis, notably during crystal cell development in both hematopoietic compartments, is well established; likewise the role of Su(H) as integral signal transducer in this context (e.g. Duvic et al., 2002). Not only does Notch activity promote crystal cell fate by upregulating target genes, at the same time it prevents adoption of the alternative plasmatocyte fate (e.g. Terriente-Felix et al., 2013). We could confirm by qRT-PCR the downregulation of Notch target gene expression in response to wasp infestation, which was discovered earlier by Small et al. (2014). This is clearly in favor of a repression of Notch activity rather than a relief of inhibition by Su(H). A ligand-independent activation of Notch signaling has been uncovered in the context of crystal cell maintenance in the lymph gland involving Sima/Hif-α, including Su(H) as transcriptional mediator (Mukherjee et al., 2011). However, we are unaware of a respective Su(H) activity independent of Notch.

      Certainly, Su(H) acts independently of Notch in terms of gene repression. Here, Su(H) forms a repressor complex together with H and co-repressors Groucho and CtBP to silence Notch target genes. Accordingly, loss of Su(H) or H may induce the upregulation of respective gene expression independent of Notch activity. This has been demonstrated, for example, during wing and heart development (Klein et al., 2000; Kölzer, Klein, 2006; Panta et al., 2020). Moreover, during axis formation of the early embryo, global repression is brought about by Su(H) and relieved by activated Notch (Koromila, Stathopoulos, 2019). In all these instances, Su(H) is thought to act as a molecular switch, and the activation of Notch causes a strong expression of the respective genes. Likewise, the loss of DNA-binding resulting from the phosphorylation of Su(H) allows the upregulation of repressed Notch target genes in wing imaginal discs, e.g. dpn, as we have demonstrated before with overexpression and clonal analyses (Nagel et al. 2017; Frankenreiter et al., 2021). However, H does not contribute to crystal cell homeostasis, i.e. de-repression of Notch target genes does not appear to be a major driver in this context, calling for additional mechanisms to downregulate Notch activity. Our work provides evidence that these inhibitory mechanisms involve the phosphorylation of Su(H) by Pkc53E. Formally, we cannot exclude alternative mechanisms. Hence, we have tried to avoid the direct link between Su(H) phosphorylation and the inhibition of Notch activity throughout the text, including the title. Moreover, we have discussed the possible consequences of Su(H) lack of DNA binding, interfering either with the activation of Notch target genes or abrogating their repression.

      In addition, we have performed new experiments addressing the epistasis between Notch and Su(H) during crystal cell formation (Figure 1_supplement 1). To this end, we knocked down Notch activity in hemocytes by RNAi (hml::N-RNAi) in the Su(H)gwt and Su(H)S269A background, respectively. Indeed, Notch downregulation strongly impairs crystal cell development independent of the genetic background, as expected if Notch were epistatic to Su(H). We attribute the slightly elevated crystal cell numbers observed in the Su(H)S269A background to the increase in the embryonic precursors (see Fig. 4; Frankenreiter et al. 2021). Of note, the Notch gain of function allele Ncos479 also displayed a similar increase in embryonic crystal cell precursors as well as in crystal cells within the lymph gland (Frankenreiter et al. 2021).

      (2) Temporal regulation of Notch activity in response to wasp-infection and its overlapping dynamics of Su(H) phosphorylation via Pkc is needed:

      First, I suggest the authors to show how Notch activity post infection in a time course dependent manner is altered. A RT-PCR profile of Notch target genes in hemocytes from infected animals at 6, 12, 24, 48 HPI, to gauge an understanding of dynamics in Notch activity will set the tone for when and how it is being modulated. In parallel, this response in phospho mutant of Su(H) will be good to see and will support the requirement for phosphorylation of Su(H) to manifest a strong immune response.

      Indeed, it would be extremely nice to follow the entire processes in every detail, ideally at the cellular level. The challenge, however, is quantities. The mRNA isolated from hemocytes could be barely quantified, although the subsequent ct-values were ok. We quantified NRE-GFP expression, introduced into Su(H)gwt and Su(H)S269A, as well as atilla expression. We were able to generate data for two time slots, 0-6 h and 24-30 h post infection. The data are provided in the extended Figure 1G, and show a strong drop of NRE-GFP in the infected Su(H)gwt control compared to the uninfected animals, whereas expression in Su(H)S269A plateaus at around 60%-70% of the infected Su(H)gwt control. Atilla expression jumps up in the control, but stays low in Su(H)S269A hemocytes.

      Second, the dynamics of phosphorylation in a time course experiment is missing. The increased phosphorylation of Su(H) in response to wasp infestation shown in Fig. 2B uses whole animals, which implies a global down-regulation of Su(H)/Notch activity. The authors need to show this response specifically in immune cells; as it stands, the reader is left to assume that it also holds in immune cells. Given that the authors have a good antibody, the same characterization in circulating immune cells in response to infection is needed. A time course of the phosphorylation state at 6, 12, 24, 48 HPI, to gauge an understanding of these dynamics, is needed.

      We really would love to do these experiments. Unfortunately, our pS269 antibody is rather lousy. It does not allow detection of Su(H) protein in tissue or cells, nor does it work on protein extracts in Westerns or for IP. Hence, we have no way so far to demonstrate cell or tissue specificity of Su(H) phosphorylation. So far, we were lucky to detect mCherry-tagged Su(H) proteins pulled down in rather large amounts with the highly specific nano-bodies. We have tried very hard to repeat the experiment with hemolymph and lymph glands only, but we have failed so far. Hence, we have to state that our antibody is suitable neither for in vivo analyses nor for detection of phospho-Su(H) at lower levels.

      The authors suggest, this mechanism may be a quick way to down-regulate Notch, hence a side by side comparison of the dynamics of Notch down-regulation (such as by doing RT-PCR of Notch target genes following different time point post infection) alongside the levels of pS269 will strengthen the central point being proposed.

      We fully agree and hope to address these issues in the future by improving our tools.

      Last, in Fig. 7 the authors show co-immunoprecipitation of Pkc53EHA with Su(H)gwt-mCh protein from Hml-gal4 hemocytes. I understand this is in homeostasis, but since this interaction is proposed to be sensitive to infection, a Co-IP of the two in immune cells upon infection should be incorporated to strengthen their point.

      We do not fully agree with the reviewer. Although we also think that the interaction between Pkc53E and Su(H) might occur more frequently upon infection, we propose that this is a transient process occurring in several but not all hemocytes at a given time. Moreover, in the described experiment, Pkc53E-HA was expressed in hemocytes via the UAS/Gal4 system. We cannot exclude that this approach causes an overexpression. Hence, we would not expect considerable differences between unchallenged and infested animals.

      (3) In Fig. 5B, the authors show the change in crystal cell numbers as a readout of PMA-induced activation of Pkc53E and subsequent inhibition of Su(H) transcriptional activity. I would suggest the authors use more direct measures of this readout. RT-PCR of Su(H) target genes in circulating immune cells will strengthen this point. Formation of crystal cells is not limited to Notch, and I am not convinced that this treatment or the conditions do not have other effects on immune cells; for example, any impact on Hif expression may also lower CC numbers. Hence, the authors need to strengthen this point by showing that the effects are direct to Notch and Su(H) and not non-specific to any other pathway also shown to be important for CC development.

      We agree with the Reviewer that the rather general influence of PMA on PKCs might present a systemic stress to the animal. For example, we observed a slight drop of crystal cell numbers also in Su(H)S269A, suggesting other kinases apart from Pkc53E were affected that are involved in crystal cell homeostasis. We have included this notion in the text. To provide more conclusive evidence we also fed Staurosporine to the larvae which reversed the PMA effect. In addition, we assayed the expression of NRE-GFP in hemocytes of infected animals by qRT-PCR, and observed a strong drop in the infected versus uninfected control but less so in Su(H)S269A. The new data are provided in extended Figures 1G and 5B.

      (4) In addition to the above mentioned points, the data needs to be strengthened to further support the main conclusions of the manuscript. I would suggest the authors present the infection response with details on the timing of the immune response. Characterization of the immune responses at respective time points (as above or at least 24 and 48 HPI, as norms in the field) will be important. Also, any change in overall cell numbers, other immune cells, plasmatocytes or CC post infection is missing and is needed to present the specificity of the impact. The addition of these will present the data with more rigor in their analysis.

      Total hemocyte numbers of the various genotypes, i.e. control, Su(H)S269A, Su(H)S269D, and Pkc53ED28 were included before and after wasp infestation in supplemental Figures 1_S1 and 9_S1. 

      (5) Finally, what is the view of the authors on what leads to activation of Pkc53E, any upstream input is not presented. It will be good to see if wasp infection leads to increased Pkc53 kinase activity.

      The analysis of the full process is an ongoing project. We propose that ROS is produced upon the wasp’s sting, which triggers the subsequent cascade of events. These have to end with activation of Pkc53E in the presumptive pre-lamellocyte pool of both lineages, i.e. in plasmatocytes of the hemolymph, presumably in the sessile compartment (Tattikotta et al., 2021), and at the same time in the lymph gland cortex harboring the LM precursors (Blanco-Obregon et al., 2020). One of the known upstream kinases, Pdk1, has a similar impact on crystal cell development as Pkc53E, making its involvement likely. Moreover, we think that other PKCs influence the process as well.

      Without a good readout, e.g. a functional pSu(H) antiserum working in situ or a Pkc-activity reporter, it will be quite difficult to follow up this question. However, we already know that Pkc53E is expressed in hemocytes of all types independent of wasp infestation, in agreement with a role during lamellocyte differentiation. We hope to unravel the process in more detail in the future.

      Overall, I think the findings in the current state are interesting and fill an important gap, but the authors will need to strengthen the point with more detailed analysis that includes generating new data and also presenting the current data with more rigor in their approach. The data have to showcase the relationship with Notch pathway modulation upon phosphorylation of CSL in a much more comprehensive way, both in homeostasis and in response to infection which is entirely missing in the current draft.

      Reviewer #3 (Public Review):

      Diechsel et al. provide important and valuable insights into how Notch signalling is shut down in response to parasitic wasp infestation in order to suppress crystal cell fate and favour lamellocyte production. The study shows that the CSL transcription factor Su(H) is phosphorylated at S269 in response to parasitic wasp infestation and that this inhibitory phosphorylation is critical for shutting down Notch. The authors go on to perform a screen for kinases responsible for this phosphorylation and have identified Pkc53E as the specific kinase acting on Su(H) at S269. Using analysis of mutants, RNAi and biochemistry-based approaches, the authors convincingly show how the Pkc53E-Su(H) interaction is critical for remodelling hematopoiesis upon wasp challenge. The data presented support the overall conclusions made by the authors. There are a few points below that need to be addressed by the authors to strengthen the conclusions:

      (1) The authors should check melanized crystal cells in Su(H)gwt and Su(H)S269A in presence of PMA and Staurosporine?

      Thank you for the suggestion. We included the results of PMA + Staurosporine feeding into an extended Fig. 5B; they match those from the HeLa cells. Unfortunately, Staurosporine alone was lethal for the larvae at various concentrations, presumably owing to the overarching inhibition of kinase activity. This global effect also explains the high crystal cell numbers in the control fed with PMA + STAU compared to the untreated animals, as the downregulation of many kinases results in higher crystal cell numbers, a fact uncovered in our genetic screen.

      (2) Data for number of dead pupae, flies eclosed, wasps emerged post infestation should be monitored for the following genotypes and should be included:

      Pkc53EΔ28, Su(H)S269A, Pkc53EΔ28 Su(H)S269A, Su(H)S269D, Su(H)S269D Pkc53EΔ28

      We extended the data with and without infection. The respective data are shown in a new Fig. 9 and an extended Fig. 2, except for the Su(H)S269D allele. Su(H)S269D is larval lethal, i.e. dies too early for wasp development, and hence could not be included in the assay. Overall, Pkc53EΔ28 matched Su(H)S269A.

      (3) The exact molecular trigger for activation of Pkc53E upon wasp infestation is not clear.

      Indeed, and we would love to know! Perhaps, the generation of Ca2+ by the wasp’s breach of the larval cuticle results in Pkc53E activation. The generation of ROS could be involved as well. At this point, we can only speculate. We hope to be able in the future to obtain direct experimental evidence for the one or the other hypothesis.

      (4) The authors should check if activating ROS alone or induction of Calcium pulses/DUOX activation can mimic this condition and can trigger activation of Pkc53E and thereby cause phosphorylation of Su(H) at S269

      The reviewer’s suggestions open up a new field of investigation and are hence beyond the scope of this article. However, we want to pursue the research in this direction, although we realize that counting crystal cells is too coarse to give more than a first impression, and that lamellocytes may already form upon breaching the larval cuticle. A major challenge will be the direct measurement of Pkc53E activation. To date, we have no tools for this, but ideally we would like to have a direct, biochemical readout. Although we have been unsuccessful in the past, we want to develop a strong and specific phospho-S269 antibody that also works in situ. Alternatively, we are thinking of developing a PS-phosphorylation reporter to allow these questions to be addressed properly.

      (5) Does Pkc53E get activated during sterile inflammation?

      We are in the process of addressing this issue; however, we feel that this topic is beyond the scope of this paper. Our preliminary experiments nevertheless support the notion of a phospho-dependent regulation of Su(H) in this context as well.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide a graphical representation of major phenotypes that form the basis of their investigation and conclusions but have not supplemented the quantitation with images that represent these phenotypes. The authors need to include the following data to strengthen their conclusions:

      (1) The authors should include representative images for each of the genotypes/conditions (in presence and absence of wasp infestation) based on which corresponding plots have been made in Figure 1. Please include this for both circulating lamellocytes in the hemolymph and in the lymph glands since this is one of the main figures presenting the key findings.

      The data have been included in Figure 1-S2 supplement.

      (2) Please include representative images of LG with Hnt staining and corresponding images for melanization for each of the genotypes used in the plots in Figure 6A and B.

      The data have been included in Figure 6-S2 supplement.

      (3) Representative images for each of the genotypes in Figure 7A & B should be included (circulating crystal cells and lymph gland crystal cell numbers).

      Representative images for each of the genotypes for Fig. 7A have been included in Figure 7-S1 and for the old Fig. 7B in Figure 9-S2 supplement, respectively.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We thank the Editor and the Reviewers for their constructive review. In the light of this feedback, we have made a number of changes and additions to the manuscript that we think improve the presentation and hopefully address the majority of the reviewers' concerns.

      Main changes:

      •   We added a new SI section (B1) with a population dynamics simulation in the high clonal interference regime and without expiring fitness (see R1: (1)).

      •   We added a new SI section (A9) with the derivation of the equilibrium state of our SIR model in the case of 𝑀 immune groups and in the limit 𝜀 → 0 (see R1: (5)).

      •   The text of the section Abstraction as “expiring” fitness advantage has been modified.

      •   We added a new SI section (A4) describing the links between parameters of the “expiring fitness” and SIR models.

      All three reviewers had concerns about the relation between our SIR model and the “expiring fitness” model, that we hope will be addressed by the last two items listed above. In particular, we would like to underline the following points:

      •   The goal of our SIR model is to give a mechanistic explanation of partial sweeps using traditional epidemiological models. While ecological models (e.g. consumer resource) can give rise to the same phenomenology, we believe that in the context of host-pathogen interaction it is relevant to explicitly show that SIR models can result in partial sweeps.

      •   The expiring fitness model is mainly an effective model: it reproduces some qualitative features of the SIR but does not quantitatively match all aspects of the frequency dynamics in SIR models.

      •   It is possible to link the parameters of the SIR (𝛼,𝛾,𝑏,𝑓) and expiring fitness (𝑠,𝑥,𝜈) models at the beginning of the invasion of the variant (new SI section A4). However, the two models also differ in significant ways (the SIR model can for example oscillate, while the effective model can not). The correspondence of quantities like the initial invasion rate and the ‘expiration rate’ of fitness effects is thus only expected to hold for some time after the emergence of a novel variant.
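
      As an illustration of the last two points, the following is a minimal sketch of an effective "expiring fitness" dynamics. The equations used here (dx/dt = s x (1 - x) and ds/dt = -ν x s) are an assumed form chosen to encode a fitness advantage that expires as the variant becomes frequent; they are not quoted from this response, and the function name and parameter values are hypothetical.

```python
import math


def simulate_expiring_fitness(s0=0.05, nu=0.1, x0=0.01, dt=0.1, t_max=2000.0):
    """Euler integration of an assumed expiring-fitness model.

    Assumed equations (illustrative, not taken from the source):
        dx/dt = s * x * (1 - x)    # logistic growth of the variant frequency
        ds/dt = -nu * x * s        # fitness advantage expires as x grows
    Returns a list of (time, frequency, selection) tuples.
    """
    x, s = x0, s0
    trajectory = [(0.0, x, s)]
    for step in range(1, int(t_max / dt) + 1):
        dx = s * x * (1.0 - x) * dt
        ds = -nu * x * s * dt
        x, s = x + dx, s + ds
        trajectory.append((step * dt, x, s))
    return trajectory


trajectory = simulate_expiring_fitness()
final_time, final_x, final_s = trajectory[-1]

# For these assumed equations, ds/dx = -nu / (1 - x), so the variant stalls
# where s reaches zero: x_final = 1 - (1 - x0) * exp(-s0 / nu).
analytic_final_x = 1.0 - (1.0 - 0.01) * math.exp(-0.05 / 0.1)
print(f"simulated plateau: {final_x:.3f}, analytic plateau: {analytic_final_x:.3f}")
```

      Under these assumptions the variant plateaus at an intermediate frequency instead of fixing, which is the partial-sweep behaviour described above. Unlike an SIR model, this effective dynamics cannot oscillate, because the selection coefficient only ever decays.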

      Public reviews:

      Reviewer 1:

      Summary In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher order moments. Next, they develop a simplified effective model of time dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written. Some aspects, detailed below, are not yet fully convincing and should be treated in a substantial revision.

      We thank the reviewer for their constructive criticism. The deep split in the A/H3N2 HA segment from 2013 to 2020 is indeed one of the more striking examples of such meandering frequency dynamics in otherwise rapidly adapting populations. But the up and down of H1N1pdm clade 5a.2a.1 in recent years might be a more recent example. We argue that such meandering dynamics might be a common contributor to seasonal influenza dynamics, even if it only spans 3-6 years.

      (1) The quasi-neutral behaviour of amino acid changes above a certain frequency (reported in Fig. 3), which is the main overlap between influenza data and the authors’ model, is not a specific property of that model. Rather, it is a generic property of travelling wave models and more broadly, of evolution under clonal interference (Rice et al. Genetics 2015, Schiffels et al. Genetics 2011). The authors should discuss in more detail the relation to this broader class of models with emergent neutrality. Moreover, the authors’ simulations of the model dynamics are performed up to the onset of clonal interference 𝜌/𝑠0 = 1 (see Fig. 4). Additional simulations more deeply in the regime of clonal interference (e.g. 𝜌/𝑠0 = 5) would show more clearly the behaviour in this regime.

      We agree with the reviewer that we did not discuss in detail the effects of clonal interference on quasi-neutrality and predictability. As suggested, we conducted additional simulations of our population model in the regime of high clonal interference (𝜌/𝑠0 ≫ 1) and without expiring fitness effects. The results are shown in a new section of the supplementary information. These simulations show, as expected, that increasing clonal interference tends to decrease predictability: the fixation probability of an adaptive mutation found at frequency 𝑥 moves closer to 𝑥 as 𝜌 increases. However, even in a case of strong interference, 𝜌/𝑠0 = 32, 𝑝fix remains significantly different from the neutral expectation. We conclude from this that while it is true that dynamics tend to quasi-neutrality in the case of strong interference, this effect alone is unlikely to explain observations of H3N2 influenza dynamics. In our previous publication (Barrat-Charlaix et al, MBE, 2021) we have also investigated the effect of epistatic interactions between mutations, alongside strong clonal interference. We concluded that, while most of these processes make evolution less predictable and push 𝑝fix towards the diagonal, it is hard to reproduce the empirical observations with realistic parameters. The “expiring fitness” model, however, produces this quite readily.

      But there are qualitative differences between quasi-neutrality in traveling wave models and the expiring fitness model. In the traveling wave, a genotype carrying an adaptive mutation is always fitter than if it didn’t carry the mutation. Quasi-neutrality emerges from the accumulation of fitness variation at other loci and the fact that the coalescence time is not much bigger than the inverse selection coefficient of the mutation. In the expiring fitness model, the selective effect of the mutation itself goes away with time. We now discuss the literature on quasi-neutrality and cite Rice et al. 2015 and Schiffels et al. 2011.
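
      The neutral benchmark used in this discussion, where a mutation found at frequency 𝑥 fixes with probability 𝑥, can be checked with a small simulation. The sketch below is a plain neutral Wright-Fisher model with hypothetical parameter values; it is not the authors' population model and includes neither clonal interference nor expiring fitness.

```python
import random


def neutral_fixation_probability(pop_size=30, start_copies=9, trials=2000, seed=1):
    """Estimate the fixation probability of a neutral allele by simulation.

    Each generation, every one of the pop_size offspring independently carries
    the allele with probability equal to its current frequency (binomial
    resampling). A trial ends when the allele is lost or fixed.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        copies = start_copies
        while 0 < copies < pop_size:
            freq = copies / pop_size
            copies = sum(rng.random() < freq for _ in range(pop_size))
        if copies == pop_size:
            fixed += 1
    return fixed / trials


start_frequency = 9 / 30
estimate = neutral_fixation_probability()
print(f"start frequency {start_frequency:.2f}, estimated p_fix {estimate:.2f}")
```

      In a plot of fixation probability against observed frequency, this neutral case lies on the diagonal. A 𝑝fix curve that stays away from the diagonal, as reported for the strong-interference simulations, therefore signals residual predictability.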

      In this context, I also note that the modelling results of this paper, in particular the stalling of frequency increase and the decrease in the number of fixations, are very similar to established results obtained from similar dynamical assumptions in the broader context of consumer resource models; see, e.g., Good et al. PNAS 2018. The authors should place their model in this broader context.

      We thank the reviewer for pointing out the link between consumer resource models and our work. We further strengthened our discussion of the similarity of the phenomenology to models typically used in ecology and made an effort to highlight the link between consumer-resource models and ours in the introduction and in the part on the SIR model.

      (2) The main conceptual problem of this paper is the inference of generic non-predictability from the quasi-neutral behaviour of influenza changes. There is no question that new mutations limit the range of predictions, this problem being most important in lineages with diverse immune groups such as influenza A(H3N2). However, inferring generic non-predictability from quasi-neutrality is logically problematic because predictability refers to individual trajectories, while quasi-neutrality is a property obtained by averaging over many trajectories (Fig. 3). Given an SIR dynamical model for trajectories, as employed here and elsewhere in the literature, the up and down of individual trajectories may be predictable for a while even though allele frequencies do not increase on average. The authors should discuss this point more carefully.

      We agree with the reviewer that the deterministic SIR model is of course predictable. Similarly, a partial sweep is predictable. But we argue that expiring fitness makes evolution less predictable in two ways: (i) When a new adaptive mutation emerges and rises in frequency, we typically don’t know how rapidly its fitness effect is ‘expiring’. Thus even if we can measure its instantaneous growth rate accurately, we can’t predict its fate far into the future. (ii) Compared to the situation where fitness effects are not expiring, time to fixation is longer and there are more opportunities for novel mutations to emerge and change the course of the trajectory. We have tried to make this point clearer in the manuscript.

      (3) To analyze predictability and population dynamics (section 5), the authors use a Wright-Fisher model with expiring fitness dynamics. While here the two sources of the emerging neutrality are easily tuneable (expiring fitness and clonal interference), the connection of this model to the SIR model needs to be substantiated: what is the starting selection 𝑠0 as a function of the SIR parameters (𝑓,𝑏,𝑀,𝜀), the selection decay 𝜈 = 𝜈(𝑓,𝑏,𝑀,𝜀,𝛾)? This would enable the comparison of the partial sweep timing in both models and corroborate the mapping of the SIR onto the simplified W-F model. In addition, the authors’ point would be strengthened if the SIR partial sweeps in Fig.1 and Fig.2 were obtained for a combination of parameters that results in a realistic timescale of partial sweeps.

      We added a new section to the SI (A4) that relates the parameters of the SIR and expiring fitness models. In particular, we compute the initial growth rate 𝑠0 and a proxy for the fitness expiry rate 𝜈 as a function of the SIR parameters 𝛼,𝛾,𝑓,𝑏,𝑀, at the instant where the variant is introduced. The initial growth rate depends primarily on the degree of immune escape 𝑓, while the expiration rate 𝜈 is related to incidence 𝐼wt + 𝐼𝑚. However, as both models have fundamentally different dynamics, these relations are only valid on time scales shorter than potential oscillations of the SIR model. Beyond that, the connection between the models is mostly qualitative: both rely on the fact that growth rate of a strain diminishes when the strain becomes more frequent, and give rise to partial sweeps.

      In Figure 1, the time it takes a partial sweep to finish is roughly 100-200 generations (bottom right panel). If we consider H3N2 influenza and take one generation to be one week, this corresponds to a sweep time of 2 to 4 years, which is slightly slower but roughly in line with observations for selective sweeps. This time is harder to define if oscillatory dynamics take place (middle right panel), but the time from the introduction of the mutant to the peak frequency is again about 4 years. The other parameters of the model correspond to a waning time of 200 weeks and immune escape on the order of 20-30% change in susceptibility.
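
      To make the roles of the initial growth rate 𝑠0 and the expiry rate 𝜈 concrete, the following is a minimal illustrative sketch, not the authors' code. It assumes a deterministic two-variant setting with an exponentially decaying advantage, s(t) = s0·exp(−ν·t), and logistic frequency dynamics dx/dt = s(t)·x·(1 − x); both the exponential form and the function names are assumptions made here for illustration. Under these assumptions the log-odds of the variant gain at most s0/ν in total, so the variant settles at an intermediate frequency (a partial sweep) instead of fixing.

```python
import math


def logit(x):
    """Log-odds of a frequency x in (0, 1)."""
    return math.log(x / (1.0 - x))


def expiring_sweep_frequency(t, x0, s0, nu):
    """Variant frequency at time t, assuming s(t) = s0 * exp(-nu * t).

    Solves dx/dt = s(t) * x * (1 - x) in closed form: the log-odds
    increase by the integral of s(t') from 0 to t.
    """
    gained = (s0 / nu) * (1.0 - math.exp(-nu * t))
    return 1.0 / (1.0 + math.exp(-(logit(x0) + gained)))


def plateau_frequency(x0, s0, nu):
    """Long-time limit: the log-odds gain saturates at s0 / nu."""
    return 1.0 / (1.0 + math.exp(-(logit(x0) + s0 / nu)))


if __name__ == "__main__":
    # s0 = 0.03 is the value quoted for the simulations; nu and x0 are
    # hypothetical choices for this illustration only.
    x0, s0, nu = 0.01, 0.03, 0.01
    for t in (0, 50, 100, 200, 400):
        print(t, round(expiring_sweep_frequency(t, x0, s0, nu), 4))
    print("plateau", round(plateau_frequency(x0, s0, nu), 4))
```

      In this toy version the plateau depends only on the starting frequency and the ratio s0/ν, which is one way to see why an unknown expiry rate limits how far ahead a measured growth rate can be extrapolated.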

      Reviewer 2:

      Summary

      This work addresses a puzzling finding in the viral forecasting literature: high-frequency viral variants evince signatures of neutral dynamics, despite strong evidence for adaptive antigenic evolution. The authors explicitly model interactions between the dynamics of viral adaptations and of the environment of host immune memory, making a solid theoretical and simulation-based case for the essential role of host-pathogen eco-evolutionary dynamics. While the work does not directly address improved data-driven viral forecasting, it makes a valuable conceptual contribution to the key dynamical ingredients (and perhaps intrinsic limitations) of such efforts.

      Strengths

      This paper follows up on previous work from these authors and others concerning the problem of predicting future viral variant frequency from variant trajectory (or phylogenetic tree) data, and a model of evolving fitness. This is a problem of high impact: if such predictions are reliable, they empower vaccine design and immunization strategies. A key feature of this previous work is a “traveling fitness wave” picture, in which absolute fitnesses of genotypes degrade at a fixed rate due to an advancing external field, or “degradation of the environment”. The authors have contributed to these modeling efforts, as well as to work that critically evaluates fitness prediction (references 11 and 12). A key point of that prior work was the finding that fitness metrics performed no better than a baseline neutral model estimate (Hamming distance to a consensus nucleotide sequence). Indeed, the apparent good performance of their well-adopted “local branching index” (LBI) was found to be an artifact of its tendency to function as a proxy for the neutral predictor. A commendable strength of this line of work is the scrutiny and critique the authors apply to their own previous projects. The current manuscript follows with a theory and simulation treatment of model elaborations that may explain previous difficulties, as well as point to the intrinsic hardness of the viral forecasting inference problem.

      This work abandons the mathematical expedience of traveling fitness waves in favor of explicitly coupled eco-evolutionary dynamics. The authors develop a multi-compartment susceptible/infected model of the host population, with variant cross-immunity parameters, immune waning, and infectious contact among compartments, alongside the viral growth dynamics. Studying the invasion of adaptive variants in this setting, they discover dynamics that differ qualitatively from the fitness wave setting: instead of a succession of adaptive fixations, invading variants have a characteristic “expiring fitness”: as the immune memories of the host population reconfigure in response to an adaptive variant, the fitness advantage transitions to quasi-neutral behavior. Although their minimal model is not designed for inference, the authors have shown how an elaboration of host immunity dynamics can reproduce a transition to neutral dynamics. This is a valuable contribution that clarifies previously puzzling findings and may facilitate future elaborations for fitness inference methods.

      The authors provide open access to their modeling and simulation code, facilitating future applications of their ideas or critiques of their conclusions.

      We thank the reviewer for their summary, assessment, and constructive critique.

      (1) The current modeling work does not make direct contact with data. I was hoping to see a more direct application of the model to a data-driven prediction problem. In the end, although the results are compelling as is, this disconnect leaves me wondering if the proposed model captures the phenomena in detail, beyond the qualitative phenomenology of expiring fitness. I would imagine that some data is available about cross-immunity between strains of influenza and SARS-CoV-2, so hopefully some validation of these mechanisms would be possible.

      We agree with the reviewer that quantitatively confronting our model with data would be very interesting. Unfortunately, most available serological data for influenza and SARS-CoV-2 is obtained using post-infection sera from previously naive animal models. To test our model, we would require human serology data, ideally demographically resolved, and a way to link serology to transmission dynamics. Furthermore, our model is mostly an explanation for qualitative features of variant dynamics and their apparent lack of predictability. We therefore considered that quantitative validation using data is out of scope of this work.

      (2) After developing the SIR model, the authors introduce an effective “expiring fitness” model that avoids the oscillatory behavior of the SIR model. I hoped this could be motivated more directly, perhaps as a limit of the SIR model with many immune groups. As is, the expiring fitness model seems to lose the eco-evolutionary interpretability of the SIR model, retreating to a more phenomenological approach. In particular, it’s not clear how the fitness decay parameter 𝜈 and the initial fitness advantage 𝑠0 relate to the key ecological parameters: the strain cross-immunity and immune group interaction matrices.

      The expiring fitness model emerges as a limiting case, at least qualitatively, of the SIR model when growth rate of the new variant is small compared to the waning rate and the SIR model does not oscillate. This can be readily achieved by many immune groups, which reconciles the large effect of many escape mutations and the lack of oscillation by confining the escape to some fraction of the population. Beyond that, the expiring fitness model is mainly an effective model that allows us to study the consequences of partial sweeps on predictability on long timescales. As stated in the “Main changes” section at the start of this reply, we added an SI section which links parameters of the two models. However, we underline the fact that beyond the phenomenon of partial sweeps, the dynamics of the two are different.

      Reviewer 3:

      Summary

      In this work the authors start by presenting a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They argue that this model can be reformulated as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description to an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variant frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variant dynamics in RNA viruses such as flu and SARS-CoV-2.

      Strengths

      The idea that a vanishing fitness mechanism that produces partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for predictability of virus evolution and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strains coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively. This general framework has a potential to be more universal than human RNA viruses, in situations where invading mutants would saturate at intermediate frequencies.

      We thank the reviewer for their positive remarks and constructive criticism below.

      Weaknesses

      The authors build the narrative around a multi-strain SIR model in which viruses circulate in a heterogeneous population, but the connection of this model to the rest of the paper is not well supported by the analysis. When presenting the random walk coarse-grained description in section 3 of the Results, there is no quantitative relation between the random walk ingredients (importantly 𝑃(𝛽)) and the SIR model, just a qualitative reasoning that strains would initially grow exponentially and saturate at intermediate frequencies. So essentially any other microscopic description with these two features would give rise to the same random walk.

      As also highlighted in the response to other reviewers, we now discuss how the parameters of the SIR model are related to the initial growth rate and the ‘expiration’ rate of the effective model. While the phenomenology of the SIR model is of course richer, this correspondence describes its overdamped limit qualitatively well.

      Currently it’s unclear whether the specific choices for population heterogeneity and cross-immunity structure in the SIR model matter for the main results of the paper. In section 2, it seems that the main effects of these ingredients are reduced oscillations in variant frequencies and a rescaled initial growth rate. But ultimately a homogeneous population would also produce steady state coexistence between strains, and oscillation amplitude likely depends on parameter choices. Thus a homogeneous population may lead to a similar coarse-grained random walk.

      The reviewer is correct that the primary effect of using many immune groups is to slow down the increase of novel variants, which in turn dampens the oscillations. Having multiple immune groups widens the parameter space in which partial sweeps without dramatic oscillations are observed. For slow sweeps, similar dynamics are observed in a homogeneous population.

      Similarly, it’s unclear how the SIR model relates to the vanishing fitness framework, other than on a qualitative level given by the fact that both descriptions produce variants saturating at intermediate frequencies. Other microscopic ingredients may lead to a similar description, yet with quantitative differences.

      Both of these points were also raised by other reviewers and we agree that it is worth discussing them at greater length. We now discuss how the parameters of the ‘expiring fitness’ model relate to those of the SIR. We also discuss how other models such as ecological models give rise to similar coarse grained models.

      At the same time, from the current analysis the reader cannot appreciate the impact of such a mean field approximation where strains lose fitness independently from one another, and under what conditions such assumption may be valid.

      In the SIR model, the rate at which strains lose fitness does depend on the precise state of the host population through the quantities 𝑆𝑚 and 𝑆wt, which is apparent in equation (A27) of the new SI section. The fact that a new variant shifts the equilibrium frequencies of previous strains in a proportional way is valid if the “antigenic space” is of very high dimension, as explained in section Change in frequency when adding subsequent strains of the SI. It would indeed be interesting to explore relaxations of this assumption by considering a larger class of cross-immunity matrices 𝐾. However, in the expiring fitness model, the fact that strains lose fitness independently from each other is a necessary simplification.

      In summary, the central and most thoroughly supported results in this paper refer to a vanishing fitness model for human RNA viruses. The current narrative, built around the SIR model as a general work on host-pathogen eco-evolution in the abstract, introduction, discussion and even title, does not seem to match the key results and may mislead readers. The SIR description rather seems one of the several possible models, featuring a negative frequency dependent selection, that would produce coarse-grained dynamics qualitatively similar to the vanishing fitness description analyzed here.

      We have revised the text throughout to make the connections between the different parts of the manuscript, in particular the SIR model and the expiring fitness model, clearer. We agree that the phenomenology of the expiring fitness model is more general than the case of human RNA viruses described by the SIR model, but we think this generality is an attractive feature of the coarse-graining, not a shortcoming. Indeed, other settings with negative frequency dependent selection or ecosystems that adapt on appropriate time scales generate similar dynamics.

      Recommendations for the authors:

      Reviewer 1:

      (4) Line 74: what does fitness mean?

      Many population dynamics models, including ones used for viral forecasting, attach a scalar fitness to each strain. The growth rate of each strain is then computed by subtracting the average population fitness from the strain’s fitness. In this sentence, fitness is intended in this way.
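
      As a small illustration of this scalar-fitness convention (a sketch with hypothetical names and numbers, not taken from the manuscript), the growth rate of each strain can be computed as its fitness minus the frequency-weighted mean fitness of the population:

```python
def growth_rates(frequencies, fitnesses):
    """Per-strain growth rate: strain fitness minus the population mean fitness.

    `frequencies` are strain frequencies summing to 1; `fitnesses` are the
    scalar fitness values attached to each strain (same order).
    """
    mean_fitness = sum(x * f for x, f in zip(frequencies, fitnesses))
    return [f - mean_fitness for f in fitnesses]


if __name__ == "__main__":
    # Hypothetical three-strain population.
    freqs = [0.7, 0.2, 0.1]
    fits = [0.0, 0.05, 0.1]
    print(growth_rates(freqs, fits))
```

      With this definition the frequency-weighted growth rates always sum to zero, so strains fitter than average increase in frequency at the expense of the others.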

      (5) Fig. 1: The equilibrium frequency in the middle and bottom rows is hardly smaller than the equilibrium frequency in the top row for one immune group. This is surprising since for M=10, the variant escapes in only 1/10th of the population, which naively should impact the equilibrium frequency more strongly. Could the authors comment on this?

      This is indeed non-trivial, and a hand-waving argument can be made by considering the extreme case 𝜀 = 0. The variant is then completely neutral for the immune groups 𝑖 > 1, and would be at equilibrium at any frequency in these immune groups. Its equilibrium frequency is then only determined by group 1, which is the only one breaking degeneracy. For 𝜀 > 0 but small, we naturally expect a small deviation from the 𝜀 = 0 case and thus 𝛽 should only change slightly.

      A more rigorous argument with a mathematical proof in the case 𝜀 = 0 is now given in section A4 of the supplementary information.

      (6) Fig. 1: In the caption, it is stated that the simulations are performed with 𝜀 = 0.99. Is this a typo? It seems that it should be 𝜀 = 0.01, as in and just below equation (7).

      This was indeed a typo. It is now fixed.

      (7) Fig. 3: The data analysis should be improved. In order to link the average frequency trajectories to standard population genetics of conditional fixation probabilities, the focal time should always be the time where the trajectory crosses the threshold frequency for the first time. Plotting some trajectories from a later time onwards, on their downward path destined to loss, introduces a systematic bias towards negative clonal interference (for these trajectories, the time between the first and the second crossing of the threshold frequency is simply omitted). The focal time of first crossing of the threshold frequency can easily be obtained, e.g., by linear interpolation of the trajectory between subsequent time points of frequency evaluation. In light of the modified procedure, the statements on the inertia of the trajectories after crossing 𝑥⋆ (line 356) should be re-examined.

      The way we process the data is already in line with the suggestions of the reviewer. In particular, we use as focal time the first time at which a trajectory is found in the threshold frequency bin. Trajectories that are never seen in the bin because of limited time-resolution are simply ignored.

      In Fig. 3, there are no trajectories that are on their downward path at the focal time and when crossing the threshold frequency. Our other work on predictability of flu, Barrat-Charlaix et al. (2021), has a similar figure, which may have created confusion.
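
      The procedure described in this response can be sketched as follows. This is a hedged illustration only: the function names, the half-open bin convention, and the example numbers are assumptions, not the authors' implementation. Each trajectory is aligned at the first time point at which its frequency falls inside the threshold bin, and trajectories that are never observed in the bin are dropped.

```python
def focal_index(trajectory, bin_low, bin_high):
    """Index of the first time point whose frequency lies in [bin_low, bin_high).

    Returns None if the trajectory is never observed inside the bin,
    e.g. because it jumps over the bin between two sampled time points.
    """
    for i, x in enumerate(trajectory):
        if bin_low <= x < bin_high:
            return i
    return None


def align_at_focal_time(trajectories, bin_low, bin_high):
    """Keep only trajectories seen in the bin, each cut to start at its focal time."""
    aligned = []
    for traj in trajectories:
        i = focal_index(traj, bin_low, bin_high)
        if i is not None:
            aligned.append(traj[i:])
    return aligned


if __name__ == "__main__":
    # Hypothetical frequency trajectories sampled at regular time points.
    trajectories = [
        [0.10, 0.35, 0.60, 0.32, 0.10],  # enters the bin at index 1, returns later
        [0.05, 0.50, 0.90],              # jumps over the bin: ignored
        [0.30, 0.20],                    # first observed inside the bin
    ]
    print(align_at_focal_time(trajectories, 0.3, 0.4))
```

      Because only the first visit to the bin is used, a trajectory that later passes through the bin again on its way down is not counted a second time.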

      (8) Fig. 4: authors write 𝛼/ 𝑠0 in the figure, but should be 𝜈/ 𝑠0.

      Fixed.

      (9) Line 420: authors refer to the blue curve in panel B as the case with strong interference. However, strong interference is for higher 𝜌/ 𝑠0, that is panel D (see point 1).

      Fixed.

      (10) Line 477: typo “there will a variety of mutations”.

      Fixed.

      Reviewer 2:

      Should 𝛼 be 𝜈 in Figure 4 legends?

      Thank you very much for spotting this error. We fixed it.

      Equations 4-5 could be further simplified.

      We factorised the 𝐼 term in equation 4. In equation 5, we preferred to keep the 1 − 𝛿/𝛼 term as this quantity appears in different calculations concerning the model. For instance, 𝑆 = 𝛿/𝛼 at equilibrium.

      The sentence before equation 8 references 𝑃𝛽(𝛽), but this wasn’t previously introduced.

      We now introduce 𝑃𝛽 at the beginning of the section Ultimate fate of the variant.

      In the last paragraph of page 12, “monotonously” maybe should be “monotonically”.

      Fixed.

      For the supplement section B, you might want a more descriptive title than “other”.

      We renamed this section to Expiring fitness model and random walk.

      Reviewer 3:

      To expand on my previous comments, my main concerns regard the connection of section 2 and the SIR model with the rest of the paper.

      In the first paragraph of page 9 the authors argue that a stochastic version of the SIR model would lead to different fixation dynamics in homogeneous vs heterogeneous populations due to the oscillations. This paragraph is quite speculative, some numerical simulations would be necessary to quantitatively address to what extent these two scenarios actually differ in a stochastic setting, and how that depends on parameters.

      Likewise, the connection between the SIR model, the random walk coarse-grained description and the vanishing fitness model can be investigated through numerical simulations of a stochastic SIR given the chosen population and cross-immunity structures with, e.g., 10-20 strains. This would allow for a direct comparison of individual strain dynamics rather than the frequency averages, as well as other scalar properties such as higher moments, coalescent, and fixation probability once reaching a given frequency. It would also be possible to characterize numerically the SIR P(beta), bridging the gap with the random walk description. It’s not obvious to me that the SIR P(beta) would not depend on the population size in the presence of birth-death stochasticity, potentially changing the scaling of the moments. I appreciate that such simulations may be computationally expensive, but similar numerical studies have been performed in previous phylodynamics works so it shouldn’t be out of reach.

      As an alternative, the authors should consider re-centering the narrative directly on the random walk of the vanishing fitness model, mentioning the SIR more briefly as a possible qualitative way to get there. Either way the authors should comment on other ways in which this coarse-grained dynamics could arise.

      In the vanishing fitness model, where variants fitnesses are independent, is an infinite dimensional antigenic space implicitly assumed? If that’s the case, it should be explained in the main text.

      A long simulation of the SIR model would indeed be interesting, but is numerically demanding and our current simulation framework doesn’t scale well for many strains and susceptibilities. We thus refrained from adding extensive simulations.

      In Figure 2B of the main text, the simulation with 7 strains illustrates the qualitative match between the expiring fitness and the SIR model. However, it is clearly not long enough to discuss statistical properties of the corresponding random walk. Furthermore, we do not expect the individual strain dynamics of the SIR and expiring fitness models to match. The latter depends on few parameters (𝛼, 𝑠0), while the former depends on the full state of the host population and of the previous variants.

      In the section linking the parameters of the two models, we now discuss the distribution 𝑃(𝛽) of the SIR model for two strains and a specific choice of distribution for the cross-immunity 𝑏 and 𝑓.

      Minor comments:

      There is some back and forth in the writing. For instance, when introducing the model, 𝐶𝑖𝑗 is first defined as 1/ 𝑀, then a few paragraphs later the authors introduce that in another limit 𝐶𝑖𝑖 is just much higher than any 𝐶𝑖𝑗, and finally they specify that the former is the fast mixing scenario.

      Another example is in section 2: in the first paragraph they put forward that heterogeneity and cross-immunity have different impacts on the dynamics, but the meaning attributed to these different ingredients becomes clear only a while later after the homogeneous population analysis. Making the writing more uniform would make it easier for the reader to follow the authors’ train of thought.

      We removed the paragraph below Equation (1) mentioning the 𝐶𝑖𝑗 = 1/𝑀 case, which we hope will linearize the writing.

      When mentioning geographical structure, why would geography affect how immunity sees pairs of viral strains (differences in 𝐾)?

      Geographic structure could influence cross-immunity because of exposure histories of hosts. For instance in the case of influenza, different geographical regions do not have the same dominating strains in each season, and hosts from different regions may thus build up different immunity.

      In the current narrative there are some speculations about non-scalar fitness, especially in section 2. The heterogeneity in this section does not seem so strong to produce a disordered landscape that defies the notion of scalar fitness in the same way some complex ecological systems do. A more parsimonious explanation for the coexistence dynamics observed here may be a negative frequency dependent selection.

      Our language here was not very precise and we agree that the phenomenology we describe is related to that of frequency dependent selection (mediated via immunity of the host population that integrates past frequencies). Traveling wave models typically use fitness functions that are independent of the population distribution and only account for the evolution via an increasing average fitness. We have made the discussion more accurate by stating that we consider a case where fitness depends explicitly on present and past population composition, which includes the case of negative frequency dependent selection.

      I don’t understand the comparison with genetic drift (typo here, draft) in the last paragraph of section 3 given that there is no stochasticity in growth death dynamics.

      We compare the random walk to genetic drift because of the expression of the second moment of the step size. The genetic draft has the same functional form. If one defines the effective population size as in the text, the drift due to random sampling of alleles (neutral drift) and the changes in strain frequency in our model have the same first and second moments. The stochasticity here does not come from the dynamics, which are indeed deterministic, but from the appearance of new mutations (variants) on backgrounds that are randomly sampled in the population. This latter property is shared with genetic draft.

      In the vanishing fitness model, I think the reader would benefit from having 𝑃(𝑠) in the main text, and it should be made more clear what simulations assume what different choice of 𝑃(𝑠).

      We added the expression of 𝑃(𝑠) in the main text. Simulations use the value 𝑠0 = 0.03, which we added in the caption of Figure 4.

      When comparing the model and data, is the point that COVID is not reproduced due to clonal interference? It seems from the plot that flu has clonal interference as well though. Why is that negligible?

      A similar point has been raised by the first reviewer (see R1-(1)). Clonal interference is not negligible, but we find it to be insufficient to explain the observations made for H3N2 influenza, namely the lack of inertia of frequency trajectories or the probability of fixation. This is shown in the new section (B1) of the SI. Both SARS-CoV-2 and H3N2 influenza experience clonal interference, but the former is more predictable than the latter. Our point is that expiring fitness effects should be stronger in influenza because of the higher immune heterogeneity of the host population, making it less predictable than SARS-CoV-2.

      Does the fixation probability as a function of frequency threshold match the flu data for some parameters sets?

      For H3N2 influenza, the fixation probability is found to be equal to the threshold frequency (see Barrat-Charlaix MBE 2021, also indirectly visible from Fig. 3). In Figure 4, we obtain that either a high expiry rate or intermediate expiry rates and clonal interference regimes match this observation.

      It would be instructive to see examples of the individual variant dynamics of the vanishing fitness model compared to the presented data.

      We added an extra SI figure (S7) showing 10 randomly selected trajectories of individual variants in the case of H3N2/HA influenza and for the expiring fitness model with different parameter choices.

      Figure 4E has no colorbar label. The reader shouldn’t have to look for what that means in the bottom of the SIs. In panels A and B the label should be 𝜈, not 𝛼. Same thing in most equations of page 42.

      We added the colorbar label to the figure and also updated the caption: a darker color corresponds to a higher probability of sweeps to overlap. We fixed the 𝜈 – 𝛼 confusion in the SI and in the caption of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents an important finding on the involvement of a Caspase 3-dependent pathway in the elimination of synapses for retinogeniculate circuit refinement and eye-specific territory segregation. This work fits well with the concept of "synaptosis" which has been proposed in the past but lacked in vivo support. Despite its elegant design and many strengths, the evidence supporting the claims of the authors is incomplete, particularly regarding whether Caspase-3 expression can really be isolated to synapses vs locally dying cells, whether microglia direct or instruct synapse elimination, and whether astrocytes are also involved. The work will be of interest to investigators studying cell death pathways, neurodevelopment, and neurodegenerative disease.

      Regarding significance:

      This study provides in vivo evidence that caspase-3 is important for synapse elimination in the visual pathway (Figure 3 and 4) and corroborates the previously proposed but not yet validated “synaptosis” hypothesis. But more significantly, we show that caspase-3 is activated in dLGN relay neurons in response to synapse inactivation (Figure 1) when synaptic competition is present (Figure 2), and that caspase-3 is important for efficient elimination of weakened synapses by microglia (Figure 5 and 6). We consider the causal link between synapse weakening/inactivation and caspase-3 activation to be the most important finding of this study and believe it is an error to not include this aspect of the study in the assessment. The mechanism by which neuronal activity influences synapse elimination is a fundamental question in neuroscience, and our study presents a significant advancement in understanding this problem.

      Regarding strength of evidence:

      We do not agree with the assessment that our evidence should be broadly labeled as “incomplete”. In fact, we argue that many concerns raised by the reviewers are not focused on the main claims made in this study.

      (1) Regarding whether caspase-3 activation (not “expression”, which is the term used in the assessment) is isolated to synapses or occurs in entire cells, we show in Figure 1 that both types of signals can be present. The main concern of the reviewers seems to be that activated caspase-3 signals in apoptotic dLGN relay neurons are irrelevant to our analysis and confound interpretation. We argue that this is not the case.

      In Figure 1, we have two sets of controls demonstrating that the observed apoptosis of dLGN relay neurons occurs specifically in response to synapse inactivation. For each animal that received TeTxLC injection in the right eye, activated caspase-3 signal is compared between the left dLGN, where most of the inactivated synapses are located, and the right dLGN, where the minority of the inactivated synapses are located (between Figure 1B and 1C, also between the first and second group of Figure 1E). We observed apoptotic neurons in the left dLGN with more inactivated synapses but not in the right dLGN with fewer inactivated synapses. The second control is between TeTxLC-injected animals (Figure 1B) and mock-injected animals (Figure 1D). We observed apoptotic relay neurons in the dLGN of TeTxLC-injected animals (Figure 1B) but not mock-injected animals (Figure 1D). Both these controls show that the observed apoptosis of dLGN relay neurons is caused by synapse inactivation.

      In addition, in our synapse inactivation experiment (Figure 1), AAV-hSyn-TeTxLC is injected into the right eye and expressed only in RGCs, not in dLGN relay neurons. Since dLGN relay neurons in this experiment do not receive a perturbation that is independent of synaptic transmission, we conclude that their apoptosis occurs through synapse-dependent mechanisms.

      Furthermore, if the apoptotic neurons are confounding the analysis (as implied by reviewers and editors) and do not occur through synapse-dependent mechanisms, then inhibiting both eyes with TeTxLC (Figure 2C, rightmost group) should cause high levels of caspase-3 activation, like that in the single-inhibition condition. Instead, we observe the opposite (Figure 2C, middle group) – overall caspase-3 activity goes down significantly in the dual-inhibition condition and is closer to the unperturbed condition, which can be explained by a loss of interaction between “strong” and “weak” synapses. Taken together, our data demonstrate that apoptosis of relay neurons in Figure 1 occurs specifically in response to synapse inactivation through synapse-dependent mechanisms, and the activated caspase-3 signal in the neurons should be included in our analysis.

      Why does synaptic caspase-3 activation manifest in different forms: puncta, “blobs”, and cells? This is not surprising when considering the mechanisms that neurons must utilize to spatially confine caspase-3 activation and the nature of the apoptotic signaling cascade. On one hand, it has been proposed that caspase-3 activity in dendrites can be locally confined by proteasomal degradation of cleaved caspase-3 (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014 ). On the other hand, caspase-3 activation is known to trigger explosive feedback amplification of apoptotic signaling events (McComb et al., DOI: 10.1126/sciadv.aau9433 ). For caspase-3 activation to remain localized to dendrites, the negative regulation must outweigh the positive feedback amplification. By expressing TeTxLC in RGCs of one eye, we create a strong perturbation that silences a large fraction of the synapses in the retinogeniculate pathway, which likely shifts the balance between positive and negative regulation of caspase-3 activity in some relay neurons. To be more specific, if a given dLGN relay neuron receives too many inactivated synapses, which is likely the case in our perturbation, caspase-3 activity that is initially localized can overwhelm the physiological negative regulation mechanisms that act to spatially confine it, resulting in whole-cell apoptosis. In fact, previous in vitro evidence (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014 ) demonstrated that, while caspase-3 activation in a single distal dendrite can be locally contained, activating apoptosis signaling in dendrites proximal to the cell body can result in whole-cell apoptosis. Similarly, a few inactivated retinogeniculate synapses can elicit locally contained caspase-3 activity in dLGN relay neurons, but a large number of inactivated synapses on a single relay neuron may trigger sufficient caspase-3 activity to cause whole-cell apoptosis.
      We discussed how to interpret synapse inactivation-induced apoptosis in dLGN relay neurons both in the main text and in the discussion (lines 123-132 and lines 411-421).

      (2) Regarding microglia, we did not claim that “microglia direct or instruct synapse elimination”. Our main claim is that caspase-3 activation is important for efficient elimination of weakened synapses by microglia. This claim emphasizes a regulatory role for caspase-3 activation in microglia-mediated synapse elimination, but not a regulatory role of microglia in synapse elimination. To be more specific, our data suggest that lack of synaptic activity induces caspase-3 activity, and caspase-3 activity in turn influences which synapses are preferentially eliminated by microglia. Therefore, the elimination specificity is fundamentally determined (i.e. instructed) by neuronal activity, not by microglia. We also did not presume the manner in which microglia engage in synapse elimination. We specifically address this point in the discussion at line 458 through 465 where we acknowledge that microglia may indirectly mediate synapse elimination by engulfing shed neuronal material. In our title and text, we use the phrase “microglia-mediated synapse elimination”, which is not the same as microglia-instructed synapse elimination and does not presume any instructive/directive role of microglia.

      (3) Regarding whether astrocytes are involved, we did not challenge the notion that astrocytes play important roles in synapse elimination. Rather, our claim is that, unlike what we observed with microglia, the amount of synaptic material engulfed by astrocytes does not robustly depend on whether caspase-3 is present. We acknowledge that there might be a caspase-3 dependent phenotype that we were unable to detect (line 309-310), and that it is plausible that astrocytes mediate activity-dependent synapse elimination through other caspase-3-independent mechanisms. This claim is not central to our study, and we would like to qualify the statements in the manuscript. We will remove the phrase “but not astrocytes” in line 18 of the abstract.

      In summary, using a state-of-the-art method to inactivate retinogeniculate synapses, we discovered a causal link between synapse weakening/inactivation and caspase-3 activation. Coupled with well-established in vivo assays (e.g., segregation analysis, electrophysiology, and engulfment analysis) that are used in many landmark studies we cite, we provide solid evidence supporting our claim that “caspase-3 is essential for synapse elimination driven by both spontaneous and experience-dependent neural activity”, and that “synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia”.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4). 

      The reviewer states: “this silencing leads to increased caspase 3 in the contralateral eye”. We observed increased caspase-3 activity, not protein levels, in the contralateral dLGN, not eye.

      The reviewer states: “and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2”. This is not accurate. We show that the punctate active caspase-3 signals overlap with the dendritic marker MAP2 (Figure S4A).

      The reviewer states: “, the quantifications showing increased caspase 3 [activity] in the silenced [dLGN] (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus”. This is not accurate. The apoptotic neurons we observed are relay neurons located in the dLGN (confirmed by their morphology and positive staining of NeuN – Figure S4B-C), not “cells” of unknown lineage (as suggested by the reviewer) in the general “thalamus” area (as suggested by the reviewer). If the dying cells were non-neuronal cells, that would indeed confound our quantification and conclusions, but that is not the case.

      We argue that the active caspase-3 signals in apoptotic dLGN relay neurons are not a confounding factor but a bona fide response to synaptic silencing and therefore should be included in the quantification. We have two sets of controls (please also see the general response above): one is between the strongly inactivated dLGN and the weakly inactivated dLGN in each TeTxLC-injected animal, and the other is between the dLGN of TeTxLC-injected animals and that of mock-injected animals. In both controls, only the dLGN receiving strong synapse inactivation has these apoptotic dLGN relay neurons, demonstrating that their apoptosis occurs as a consequence of synapse inactivation. It is also unlikely that our perturbation is causing cell death through a non-synaptic mechanism. As mock injections do not cause apoptosis in dLGN neurons, this phenomenon is not related to surgical damage. TeTxLC is injected into the eyes and only expressed in presynaptic RGCs, not in postsynaptic relay neurons, so this phenomenon is also unlikely to be caused by TeTxLC-related toxicity. Furthermore, if apoptosis of dLGN relay neurons is not related to synapse inactivation, then when TeTxLC is injected into both eyes, one would expect to see either the same number of apoptotic relay neurons or more, but we instead observed a reduction in dLGN neuron apoptosis, suggesting a synapse-related mechanism must be responsible. Considering the above, apoptosis of relay neurons in TeTxLC-inactivated dLGN is causally linked to synapse inactivation, and active caspase-3 signals in these neurons are true signals that should be included in the quantification.

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination. 

      The reviewer states: “microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts”. We are not sure what the reviewer means by “this preferentially occurs in silenced terminals”. Our results show that microglia preferentially engulf silenced terminals, and such preference is lost in caspase-3 deficient mice (Figure 6).

      We do not understand the experiment in which the reviewer suggested that we “eliminate microglia and show that the caspase 3 dependent effects go away”. To quantify caspase-3 dependent engulfment of synaptic material by microglia or preferential engulfment of silenced terminals by microglia, microglia must be present in the tissue sample. If we eliminate microglia, neither of these measurements can be made. What could be measured if microglia are eliminated is the refinement of the retinogeniculate pathway. This experiment would test whether microglia are required for caspase-3 dependent phenotypes. This is not a claim made in the manuscript. Instead, we claimed caspase-3 is required for microglia to preferentially eliminate weak synapses.

      We did not claim that “microglia are directing synaptic elimination”. Our claim is that synapse inactivation induces caspase-3 activity, and this caspase-3 activity in turn determines the substrate preference of microglia-mediated synapse elimination. Based on this model, it is the neuronal activity that fundamentally directs synapse elimination. Throughout the manuscript, we used the term “microglia-mediated synapse elimination”. This terminology does not assume a directive/instructive role of microglia in synapse elimination and only describes the observed engulfment of synaptic material by microglia. We also did not assume how microglia engage in synapse elimination. We acknowledge in the discussion (line 458 through 465) that microglia may mediate synapse elimination in an indirect, passive way by engulfing shed neuronal material. This topic is a matter of debate in the field (Eyo et al., DOI: 10.1126/science.adh7906 ).

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper. 

      We respectfully disagree that Figure 7 is not related to the rest of the paper. Many genes involved in postnatal synapse elimination, such as C1q and C3, have been implicated in neurodegeneration. It is therefore natural and important to ask whether the function of caspase-3 in regulating synaptic homeostasis extends to neurodegenerative diseases in adult animals. The answer to this question may have broad therapeutic impacts.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase 3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and eye-specific territories segregation in dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD, loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases. 

      Strengths: 

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 would be an underlying mechanism of microglial selection of synapses for removal. The current study provides direct in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration.

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes. 

      Weaknesses: 

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      The experiments presented in Figure S11 aim to determine whether astrocyte-mediated synapse elimination depends on caspase-3 signaling. We do not claim that astrocytes are unimportant for synapse elimination or circuit refinement. We did observe a small decrease in synaptic material engulfed by astrocytes when caspase-3 is deficient, and we acknowledged that there could be defects that we were not able to detect (line 309-310). The claim that caspase-3 does not regulate astrocyte-mediated synapse elimination is not a central claim of the manuscript and we will qualify our statements in the text. We will remove the phrase “but not astrocytes” in the abstract (line 18).

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN? 

      We assume that by “blocking single or dual inactivation of synapse activity”, the reviewer refers to inactivating retinogeniculate synapses from one or both eyes.

      We showed that inactivating retinogeniculate synapses from one eye (single inactivation) increases microglia-mediated engulfment of presynaptic terminals of inactivated synapses (Figure 6). We did not measure microglia-mediated engulfment of synaptic material while inactivating retinogeniculate synapses from both eyes (dual inactivation). However, based on the total active caspase-3 signal (Figure 2) in the dual inactivation scenario, we do not expect to see an increase in engulfment of synaptic material.

      We did not measure astrocyte-mediated engulfment with single or dual inactivation, as we did not see a robust caspase-3 dependent phenotype in astrocyte-mediated engulfment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      Bursicon is a key hormone regulating cuticle tanning in insects. While the molecular mechanisms of its function are rather well studied, especially in the model insect Drosophila melanogaster, its effects and functions in different tissues are less well understood. Here, the authors show that bursicon and its receptor play a role in regulating aspects of the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment activated the bursicon signaling pathway during the transition from summer form to winter form and affected cuticle pigment and chitin content, and cuticle thickness. In addition, the authors show that miR-6012 targets the bursicon receptor, CcBurs-R, thereby modulating the function of the bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of the roles of neuropeptide bursicon action in arthropod biology.

      However, the study falls short of its claim that it reveals the molecular mechanisms of a seasonal polyphenism. While cuticle tanning is an important part of the pear psyllid polyphenism, it is not the equivalent of it. First, there are other traits that distinguish between the two morphs, such as ovarian diapause (Oldfield, 1970), and the role of bursicon signaling in regulating these aspects of polyphenism was not measured. Thus, the phenotype in pear psyllids, whereby bursicon knockdown reduces cuticle tanning, seems to simply demonstrate the phenotypes of Drosophila mutants for bursicon receptor (Loveall and Deitcher, 2010, BMC Dev Biol) in another species (Fig. 2I, 4H). Second, the study fails to address the threshold nature of cuticular tanning in this species, although it is the threshold response (specifically, to temperature and photoperiod) that distinguishes this trait as a part of a polyphenism. Whereas miR-6012 was found to regulate bursicon expression, no evidence is provided that this microRNA either responds to or initiates a threshold response to temperature. In principle, miR-6012 could regulate bursicon whether or not it is part of a polyphenism. Thus, the impact of this work would be significantly increased if it could distinguish between seasonal changes of the cuticle and a bona fide reflection of polyphenism.

      Thanks for your valuable suggestion. We concur with the reviewer’s comment that cuticle tanning does not equate to the C. chinensis polyphenism. To better reflect the core focus of our research, we have revised the title to "Neuropeptide Bursicon and its receptor mediated the transition from summer-form to winter-form of Cacopsylla chinensis".

      In response to the reviewer's inquiry regarding the threshold nature of cuticular tanning in C. chinensis, we have included a detailed analysis of the phenotypic changes (including nymph phenotypes, cuticle pigment absorbance, and cuticle thickness) during the transition from summer-form to winter-form in C. chinensis at distinct time intervals (3, 6, 9, 12, 15 days) under different temperature conditions (10°C and 25°C). As shown in Figure S1, under 25°C conditions nymphs exhibit a light yellow and transparent coloration at 3, 6, and 9 days, while nymphs at 12 and 15 days display shades of yellow-green or blue-yellow. Under 10°C conditions, the abdomen end turns black at 3, 6, and 9 days. By 12 days, numerous light black stripes appear on the chest and abdomen of nymphs at 10°C. At 15 days, nymphs exhibit an overall black-brown appearance, featuring dark brown stripes on the left and right sides of each chest and abdominal section. Furthermore, the end of the abdomen and the back display extensive black-brown coloration at 10°C (Figure S1A). The UV absorbance of the total pigment extraction at a 300 nm wavelength markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Cuticle thickness also increased following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). The detailed results (L122-143), materials and methods (L647-652), and discussion (L319-322) have been added to our revised manuscript.

      Regarding the response of miR-6012 to temperature, we had already determined its expression at 3, 6, 10 days under different temperatures in the previous Figure 5E. We have now included additional time intervals (9, 12, 15 days) in the updated Figure 5E. Our results indicate a significant decrease in the expression levels of miR-6012 after 10°C treatment for 3, 6, 9, 12, 15 days compared to the 25°C treatment group. Detailed information regarding this has been integrated into the Materials and Methods (Lines 608-610) of our revised manuscript.

      Strengths:

      This study convincingly identifies homologs of the genes encoding the bursicon subunits and its receptor, showing an alignment with those of another psyllid as well as more distant species. It also demonstrates that the stage- and tissue-specific levels of bursicon follow the expected patterns, as informed by other insect models, thus validating the identity of these genes in this species. They provide strong evidence that the expression of bursicon and its receptor depend on temperature, thereby showing that this trait is regulated through both parts of the signaling mechanism.

      Several parallel measurements of the phenotype were performed to show the effects of this hormone, its receptor, and an upstream regulator (miR-6012), on cuticle deposition and pigmentation (if not polyphenism per se, as claimed). Specifically, chitin staining and TEM of the cuticle qualitatively show differences between controls and knockdowns, and this is supported by some statistical tests of quantitative measurements (although see comments below). Thus, this study provides strong evidence that bursicon and its receptor play an important role in cuticle deposition and pigmentation in this psyllid.

      The study identified four miRNAs which might affect bursicon due to sequence motifs. By manipulating levels of synthetic miRNA agonists, the study successfully identified one of them (miR-6012) to cause a cuticle phenotype. Moreover, this miRNA was localized (by FISH) to the cuticle, body-wide. To our knowledge, this is the first demonstrated function for this miRNA, and this study provides a good example of using a gene of known function as an entry point to discovering others influencing a trait. Thus, this finding reveals another level of regulation of cuticle formation in insects.

      Weaknesses:

      (1) The introduction to this manuscript does not accurately reflect progress in the field of mechanisms underlying polyphenism (e.g., line 60). There are several models for polyphenism that have been used to uncover molecular mechanisms in at least some detail, and this includes seasonal polyphenisms in Hemiptera. Therefore, the justification for this study cannot be predicated on a lack of knowledge, nor is the present study original or unique in this line of research (e.g., as reviewed by Zhang et al. 2019; DOI: 10.1146/annurev-ento-011118-112448). The authors are apparently aware of this, because they even provide other examples (lines 104-108); thus the introduction seems misleading as framed.

      Thanks for your excellent suggestion. We have added the paper by Zhang et al. 2019 recommended by the reviewer (DOI: 10.1146/annurev-ento-011118-112448) in Line 57 of our revised manuscript. The statement has been revised to “However, the specific molecular mechanism underling temperature-dependent polyphenism still require further clarification” in Lines 60-61 of our revised manuscript.

      (2) The data in Figure 2H show "percent of transition." However, the images in 2I show insects with tanned cuticle (control) vs. those without (knockdown). Yet, based on the description of the Methods provided, there appears to be no distinction between "percent of transition" and "percent with tanning defects". This is an important distinction to make if the authors are going to interpret cuticle defects as a defect in the polyphenism. Furthermore, there is no mention of intermediate phenotypes. The data in 2H are binned as either present or absent, and these are the phenotypes shown in 2I. Was the phenotype really an all-or-nothing response? Instead of binning, which masks any quantitative differences in the tanning phenotypes, the authors should objectively quantify the degree of tanning and plot that. This would show if and to what degree intermediate tanning phenotypes occurred, which would test how bursicon affects the threshold response. This comment also applies to the data in Figures 4G and 6G. Since cuticle tanning is present in more insects than just those with seasonal polyphenism, showing how this responds as a threshold is needed to make claims about polyphenism.

      We appreciate your insightful comments. As shown in Figure 1 of our published paper (Zhang et al., 2023; doi.org/10.7554/eLife.88744.3) and Figure 2C-2I of the current manuscript, the transition from summer-form to winter-form entails not only external cuticular tanning but also alterations in internal cuticular chitin levels and cuticle thickness. While external cuticular tanning serves as a prominent and easily observable indicator of this transition, it is crucial to acknowledge that internal changes also play a significant role and should be taken into consideration. Therefore, we propose that the term "percent of transition" may be more suitable than "percent with tanning defects" to describe this process accurately.

      In order to provide a more visually comprehensive understanding of the phenotypic changes during the transition from summer-form to winter-form, we have included images at different time points (3, 6, 9, 12, 15 days) under different temperature conditions in Figure S1A of our revised manuscript. Specifically, under the 10°C condition, nymphs exhibit abdomen tanning after 6 and 9 days of treatment, while the thorax remains untanned. By days 12 to 15, both the abdomen and thorax of the nymphs show tanning, resulting in the majority of summer-form nymphs transitioning into winter-form, as depicted in Figure 2I for comparison. This observation indicates the presence of a critical threshold for cuticle tanning of C. chinensis following exposure to 10°C. Nymphs that did not undergo the transition to winter-form succumbed to the cold, highlighting the absence of intermediate phenotypes at 12-15 days under the 10°C condition. The UV absorbance of the total pigment extraction at a 300 nm wavelength markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Additionally, cuticle thickness shows an increase following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). These results highlight the relationship between the threshold of cuticular tanning and the transition process. The detailed description and information have been added in Results (L122-143), Materials and Methods (L647-652), and Discussion (L319-322) of our manuscript.

      (3) This study also does not test the threshold response of cuticle phenotypes to levels of bursicon, its receptor, or miR-6012. Hormone thresholds are the most widespread and, in most systems where polyphenism has been studied, the defining characteristic of a polyphenism (e.g., Nijhout, 2003, Evol Dev). Quantitative (not binned) measurements of a polyphenism marker (e.g., chitin) should be demonstrated to result as a threshold titer (or in the case of the receptor, expression level) to distinguish defects in polyphenism from those of its component trait.

      Thanks for your valuable feedback. We have supplemented additional data on the phenotypes (Figure S1A), cuticle pigment absorbance (Figure S1B), cuticle thickness (Figure S1C), expression levels of bursicon (Figure 1E and 1F), its receptors (Figure 3G), and miR-6012 (Figure 5E) corresponding to nymphs treated over different time periods (3, 6, 9, 12, 15 days) under both 10°C and 25°C conditions in our revised manuscript.

      While all these identified markers exhibit a strong correlation with the transition from summer-form to winter-form, it is important to note that they are not suitable as definitive thresholds due to the nature of relative gene expression quantification and chitin content assessment, rather than absolute quantitation. Further, given that tanning hormones are neuropeptides present in trace amounts in insects, unlike steroid hormones, determining their titers poses a considerable challenge.

      (4) Cuticle issue:

      (a) Unlike Fig. 6D and F, Figs. 2D and F do not correspond to each other. Especially the lack and reduction of chitin in ds-a+b! By fluorescence microscopy there is hardly any signal, whereas by TEM there is a decent cuticle. Additionally, the dsGFP control cuticle in 2D is cut obliquely with a thick and a thin chitin layer. This is misleading.

      Thanks for your insightful feedback. We have replaced the previous WGA chitin staining images in the dsCcbursα+β treatment of Figure 2D with new representative images aligning with Figure 2F. Furthermore, the presence of both thin and thick chitin layers observed in the dsEGFP treatment of Figure 2D could potentially be ascribed to the chitin content in the insect midgut or fat body as previously discussed (Zhu et al., 2016). It is notable that during the process of cuticle staining, the chitin located in the midgut and fat body of C. chinensis may exhibit green fluorescence, leading to the appearance of a thin chitin layer. A detailed analysis and elucidation of these observations have been added in the discussion section (Lines 347-352) of our revised manuscript.

      Zhu KY, Merzendorfer H, Zhang W, Zhang J, Muthukrishnan S. Biosynthesis, Turnover, and Functions of Chitin in Insects. Annu Rev Entomol. 2016;61:177-196. doi:10.1146/annurev-ento-010715-023933.

      (b) In Figs. 2F and 4F, the endocuticle appears to be missing, a portion of the procuticle that is produced post-molting. As tanning is also occurring post-molting, there seems to be a general problem with cuticle differentiation at this time point. This may be a timing issue. Please clarify.

      Thank you for your suggestion. The insect cuticle typically comprises three distinct layers (endocuticle, exocuticle, and epicuticle), with the thickness of each layer varying among different insect species. Cuticle differentiation is closely linked to the molting cycle of insects (Mrak et al., 2017). In our study, nymphal cuticles exhibited normal differentiation patterns, characterized by a thin epicuticle and comparable widths of the endocuticle and exocuticle following dsEGFP treatment, as illustrated in Figure 2F and 4F. Conversely, nymphs treated with dsCcBurs-α, dsCcBurs-β, and dsCcburs-R displayed impaired development, manifesting only the exocuticle without a discernible endocuticle layer. These findings suggest that bursicon genes and their receptor play a pivotal role in regulating insect cuticle development (Costa et al., 2016). We have added some discussion about these results in Lines 356-367 of our revised manuscript.

      Mrak, P., Bogataj, U., Štrus, J., & Žnidaršič, N. (2017). Cuticle morphogenesis in crustacean embryonic and postembryonic stages. Arthropod structure & development, 46(1), 77–95. https://doi.org/10.1016/j.asd.2016.11.001

      Costa, C. P., Elias-Neto, M., Falcon, T., Dallacqua, R. P., Martins, J. R., & Bitondi, M. (2016). RNAi-mediated functional analysis of Bursicon genes related to adult cuticle formation and tanning in the Honeybee, Apis mellifera. PloS one, 11(12), e0167421. https://doi.org/10.1371/journal.pone.0167421

      (c) To provide background information, it would be useful to analyze cuticle formation in the summer and winter morphs of controls separately by light and electron microscopy. More baseline data on these two morphs is needed.

      Thanks for your valuable feedback. To provide more background information about cuticle formation, we have supplied the results for nymph phenotypes, cuticle pigment absorbance, and cuticle thickness at distinct time intervals (3, 6, 9, 12, 15 days) under 10°C and 25°C in Figure S1 of our revised manuscript. We hope these results provide the requested baseline data on the two morphs.

      (d) For the TEM study, it is not clear whether the same part of the insect's thorax is being sectioned each time, or if that matters. There is not an obvious difference in the number of cuticular layers, but only the relative widths of those layers, so it is difficult to know how comparable those images are. This raises two questions that the authors should clarify. First, is it possible that certain parts of the thoracic cuticle, such as those closer to the intersegmental membrane, are naturally thinner than other parts of the body? Second, is the tanning phenotype based on the thickness or on the number of chitin layers, or both? The data shown later in Figure 4I, J convincingly shows that the biosynthesis pathway for chitin is repressed, but any clarification of what this might mean for deposition of chitin would help to understand the phenotypes reported. Also, more details on how the data in Fig. 2G were collected would be helpful. This also goes for the data in Fig. 4 (bursicon receptor knockdowns).

      Thanks for your great comment. The TEM investigation followed a standardized protocol, as previously described (Zhang et al., 2023). Initially, insect heads were uniformly excised and then fixed in 4% paraformaldehyde. Subsequently, a consistent cutting and staining procedure was executed at a uniform distance above the insect's thorax. The dorsal region of the thorax was specifically chosen for subsequent fluorescence imaging or transmission electron microscopy assessments with the specific objective of quantifying cuticle thickness. To measure cuticle thickness, we used the built-in measuring ruler of the software to select the top and bottom of the cuticle along the same horizontal line. The cuticle of each nymph was measured at two close locations, and six nymphs were used for each sample. Nine values were then randomly selected and plotted. The related description has been added in the Materials and Methods (Line 660-668) of our revised manuscript.

      Zhang, S.D., Li, J.Y., Zhang, D.Y., Zhang, Z.X., Meng, S.L., Li, Z., & Liu, X.X. (2023). MiR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis. eLife, 12. https://doi.org/10.7554/eLife.88744
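      The sampling scheme described in the response to point (d) (two readings per nymph, six nymphs per sample, nine values drawn at random for plotting) can be illustrated with a minimal sketch. The function name, the nymph identifiers, and the thickness readings below are hypothetical placeholders, not data from the study, and the sketch assumes the nine values are drawn without replacement from the pooled readings.

```python
import random


def sample_thickness(measurements, n_values=9, seed=None):
    """Pool all thickness readings and draw n_values of them at random.

    measurements: dict mapping a nymph id to its list of readings
                  (two readings per nymph in the scheme described above).
    """
    pooled = [value for readings in measurements.values() for value in readings]
    if len(pooled) < n_values:
        raise ValueError("not enough readings to draw the requested number of values")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(pooled, n_values)  # sampling without replacement


# Hypothetical readings (arbitrary units): six nymphs, two close locations each.
example = {
    "nymph_1": [2.1, 2.3],
    "nymph_2": [1.9, 2.0],
    "nymph_3": [2.4, 2.2],
    "nymph_4": [2.0, 2.1],
    "nymph_5": [2.5, 2.6],
    "nymph_6": [1.8, 2.7],
}
selected = sample_thickness(example, n_values=9, seed=1)
```

      Fixing the seed is a design choice that lets the same nine values be recovered when a figure is regenerated.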

      (5) Tissue issue:

      The timed experiments shown in all figures were done in whole animals. However, we know from Drosophila that Bursicon activity is complex in different tissues. There is, thus, the possibility, that the effects detected on different days in whole animals are misleading because different tissues--especially the brain and the epidermis, may respond differentially to the challenge and mask each other's responses. The animal is small, so the extraction from single tissue may be difficult. However, this important issue needs to be addressed.

      Thanks for your excellent suggestion. We express our heartfelt appreciation to the reviewer for their valuable input regarding the challenges involved in dissecting various tissue sections from the diminutive early instar nymphs of C. chinensis. In light of the metamorphic transition of C. chinensis across developmental stages, this study concentrated on examining the extensive phenotypic alterations. Consequently, intact samples of C. chinensis were specifically chosen for qPCR analysis. The related descriptions have been added in the Materials and Methods (Line 513, 517, 553, 555, and 613) and Discussion (Line 327-329) of our revised manuscript.

      (6) No specific information is provided regarding the procedure followed for the rescue experiments with burs-α and burs-β (How were they done? Which concentrations were applied? What were the effects?). These important details should appear in the Materials and Methods and the Results sections.

      Thanks for your excellent suggestion. For the rescue experiments, the dsRNA of CcBurs-R and proteins of burs α-α, burs β-β homodimers, or burs α-β heterodimer (200 ng/μL) were fed together. The concentration of heterodimer protein of CcBurs-α+β was 200 ng/μL. The heterodimer protein of CcBurs-α+β fully rescued the effect of RNAi-mediated knockdown on CcBurs-R expression, while α+α or β+β homodimers did not (Figure 3F). Feeding the α+β heterodimer protein fully rescued the defect in the transition percent and morphological phenotype after CcBurs-R knockdown (Figure 4G-4H). We have added the detailed methods of rescued experiments and specific concentrations in the Materials and Methods (Line 561-563), and Results (Line 263) of our revised manuscript.

      (7) Pigmentation

      (a) The protocol used to assess pigmentation needs to be validated. In particular, the following details are needed: Were all pigments extracted? Were pigments modified during extraction? Were the values measured consistent with values obtained, for instance, by light microscopy (which should be done)?

      Thanks for your excellent comment. Our pigment extraction protocol followed the one detailed for Bombyx mori: the cuticles were pulverized in liquid nitrogen and then dissolved in 30 milliliters of acidified methanol (Futahashi et al., 2012; Osanai-Futahashi et al., 2012). Thus, all cuticle pigments were dissected and treated with acidified methanol. Pigments were not modified during extraction. The detailed description has been integrated into the Materials and Methods (Line 630-633) of our revised manuscript.

      Futahashi, R., Kurita, R., Mano, H., & Fukatsu, T. (2012). Redox alters yellow dragonflies into red. Proceedings of the National Academy of Sciences of the United States of America, 109(31), 12626–12631. https://doi.org/10.1073/pnas.1207114109

      Osanai-Futahashi, M., Tatematsu, K. I., Yamamoto, K., Narukawa, J., Uchino, K., Kayukawa, T., Shinoda, T., Banno, Y., Tamura, T., & Sezutsu, H. (2012). Identification of the Bombyx red egg gene reveals involvement of a novel transporter family gene in late steps of the insect ommochrome biosynthesis pathway. The Journal of biological chemistry, 287(21), 17706–17714. https://doi.org/10.1074/jbc.M111.321331

      (b) In addition, pigmentation occurs post-molting; thus, the results could reflect indirect actions of bursicon signaling on pigmentation. The levels of expression of downstream pigmentation genes (ebony, lactase, etc) should be measured and compared in molting summer vs. winter morphs.

      Thanks for your valuable suggestion. Actually, we already studied the function of some downstream pigmentation genes, including ebony, Lactase, Tyrosine hydroxylase, Dopa decarboxylase, and Acetyltransferase. The variations in the expression patterns of these genes are closely tied to the molting dynamics of nymphs undergoing transitions between summer-form and winter-form. These findings will be presented in another manuscript currently being prepared for submission; thus, detailed outcomes are not suitable for inclusion in the current manuscript.

      (8) L236: "while the heterodimer protein of CcBurs α+β could fully rescue the effect of CcBurs-R knockdown on the transition percent (Figure 4G 4H)". This result seems contradictory. If CcBurs-R is the receptor of bursicon, the heterodimer protein of CcBurs α+β should not be able to rescue the effect of CcBurs-R knockdown insects. How can a neuropeptide protein rescue the effect when its receptor is not there! If these results are valid, then the CcBurs-R would not be the (sole) receptor for CcBurs α+β heterodimer. This is a critical issue for this manuscript and needs to be addressed (also in L337 in Discussion).

      Thanks for your insightful suggestion. Following the administration of dsCcBurs-R to C. chinensis, the expression of CcBurs-R exhibited a reduction of approximately 66-82% as depicted in Figure 4A, rather than complete suppression. Activation of endogenous CcBurs-R through feeding of the α+β heterodimer protein results in an increase in CcBurs-R expression, with the effectiveness of the rescue contingent upon the dosage of the α+β heterodimer protein. Consequently, the capacity of the α+β heterodimer protein to effectively mitigate the impacts of CcBurs-R knockdown on the conversion rate is clearly demonstrated. We have added additional discussion in Line 396-403 of our revised manuscript.

      (9) Fig. 5D needs improvement (the magnification is poor) and further explanation and discussion. mi6012 and CcBurs-R seem to be expressed in complementary tissues--do we see internal tissues also (see problem under point 2)? Again, the magnification is not high enough to understand and appreciate the relationships discussed.

      Thanks for your valuable suggestion. In order to enhance the resolution of the magnified images, we conducted FISH co-localization of miR-6012 and CcBurs-R in 3rd instar nymphs and obtained detailed zoomed-in images. As shown in the magnified view of Figure 5D, miR-6012 and CcBurs-R appear to exhibit complementary expression patterns in tissues. During the FISH assays, epidermis transparency of C. chinensis was achieved via decolorization treatment. Noteworthy observations from Figure 3G and Figure 5E reveal an inverse correlation in the expression profiles of CcBurs-R and miR-6012. Consequently, the FISH results distinctly highlight a significant disparity in the expression levels of CcBurs-R and miR-6012 within the same tissue. We have added related explanation and discussion in Line 291-293 of our revised manuscript.

      (10) The schematic in Fig. 7 is a useful summary, but there is a part of the logic that is unsupported by the data, specifically in terms of environmental influence on cuticle formation (i.e., plasticity). What is the evidence that lower temperatures influence expression of miR-6012? The study measures its expression over life stages, whether with an agonist or not, over a single temperature. Measuring levels of expression under summer form-inducing temperature is necessary to test the dependence of miR-6012 expression on temperature. Otherwise, this result cannot be interpreted as polyphenism control, but rather the control of a specific trait.

      Thanks for your great suggestion. We actually conducted the assessment of miR-6012 expression at specific time intervals (3, 6, 9, 12, 15 days) under different temperatures of 10°C and 25°C. As depicted in Figure 5E, the expression levels of miR-6012 were notably reduced at 10°C compared to 25°C. Additionally, the evaluation of agomir-6012 expression level of C. chinensis under 25°C conditions at various time points (3, 6, 9, 12, 15 days) revealed no significant changes. Hence, we suggest that the impact of miR-6012 on the seasonal morphological transition depends on temperature.

      Recommendations for the authors:

      The authors report a novel role of Bursicon and its receptor in regulating the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment (10°C) activated the Bursicon signaling pathway during the transition from summer-form to winter-form, which influences cuticle pigment content, cuticle chitin content, and cuticle thickness. Moreover, the authors identified miR-6012 and show that it targets CcBurs-R, thereby modulating the function of Bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of multiple roles of neuropeptide bursicon action in arthropod biology. However, the manuscript does have several major weaknesses, described under "Public review", which the authors need to address.

      Major issues:

      (1) L152-154 Fig S2E and S2F: Bursicon has been shown to be expressed in the CNS in a specific set of neurons. For example, In the larval CNS of Manduca sexta, bursicon expression is restricted to the subesophageal ganglion (SG), thoracic ganglia, and first abdominal ganglion. Pharate pupae and pharate adults show expression of this heterodimer in all ganglia. In Drosophila larvae, expression of a bursicon heterodimer is confined to abdominal ganglia. The additional neurons in the ventral nerve cord express only burs. In pharate adults, bursicon is produced by neurons in the SG and abdominal ganglia. I am wondering where bursicon subunits are expressed in the C. chinensis CNS? Since the authors have the antibodies, it would be useful to include immunocytochemical staining of bursicon alpha and beta in the CNS. The qPCR results from head or other tissues (Fig S2E and S2F) is not the most informative way to document localization of gene expression. Regarding the qPCR results, they show that the cuticle and the fat body express CcBurs-α and CcBurs-β. Can the authors confirm this unexpected results independently?

      Thanks for your insightful comment. In this study, we did not directly use antibodies targeting bursicon subunits; instead, the bursicon subunits along with a histidine tag were integrated into the expression vector pcDNA3.1 using homologous recombination. The experimental procedures were executed as follows: initially, the histidine tag was fused to the pcDNA3.1-mCherry vector through homologous recombination to generate the recombinant plasmid pcDNA3.1-his-mCherry. Subsequently, the amino acid sequences of the two bursicon subunits were introduced into the pcDNA3.1-his-mCherry vector via homologous recombination to produce the recombinant plasmids pcDNA3.1-CcBurs-α-his-mCherry and pcDNA3.1-CcBurs-β-his-mCherry. Finally, the P2A sequence was incorporated into the vector using reverse PCR to yield the recombinant plasmids pcDNA3.1-CcBurs-α-his-P2A-mCherry and pcDNA3.1-CcBurs-β-his-P2A-mCherry. Consequently, the bursicon subunits were expressed as fusion proteins carrying the histidine tag. Western blot analysis was conducted using antibodies targeting the histidine tag, so that detection of the tag corresponds to the expression of the bursicon subunits. However, these antibodies are not suitable for in vivo immunocytochemical staining of bursicon alpha and beta in the CNS.

      Due to the diminutive size of the C. chinensis nymphs, dissection of the central nervous system (CNS) was unfeasible, precluding specific assessment of bursicon expression in the CNS. Prior literature has documented the expression of bursicon subunits in the epidermis and fat body of C. chinensis. Studies suggest that bursicon subunits not only play a role in the melanization and sclerotization processes of insect epidermis but also have significant roles in insect immunity (An et al., 2012). The presence of bursicon subunits in the epidermis, gut, and fat body of C. chinensis may indicate their crucial roles in the immune functions of these tissues. Further investigation is required to elucidate the specific immune functions they perform, hinting at the potential expression of these bursicon subunits in these two tissues.

      An, S., Dong, S., Wang, Q., Li, S., Gilbert, L. I., Stanley, D., & Song, Q. (2012). Insect neuropeptide bursicon homodimers induce innate immune and stress genes during molting by activating the NF-κB transcription factor Relish. PloS one, 7(3), e34510. https://doi.org/10.1371/journal.pone.0034510

      (2) L222: "CcBurs-R is the Bursicon receptor of C. chinensis". Is this statement supported by affinity binding assay results?

      Thanks for your excellent suggestion. We employed a fluorescence-based assay to quantify calcium ion concentrations and investigate the binding affinities of bursicon heterodimers and homodimers to the bursicon receptor across varying concentrations. Our findings suggest that activation of the receptor by the burs α-β heterodimer leads to significant alterations in intracellular calcium ion levels, whereas stimulation with burs α-α and burs β-β homodimers, in conjunction with Adipokinetic hormone (AKH), maintains consistent intracellular calcium ion levels. Consequently, this research definitively identifies CcBurs-R as the bursicon receptor. For further details, please refer to the Materials and Methods (Lines 493-504), Results (Lines 231-239), and Discussion (Lines 377-384) of our revised manuscript.

      (3) L245 Figure 4I-4J: Since knockdown of bursicon and its receptor cause a decrease pigment accumulation in the cuticle, it would be useful to examine 1-2 rate limiting enzyme-encoding genes in the bursicon regulated cuticle darkening process if possible (as was done for genes involved in cuticle thickening).

      Thanks for your excellent comment. Following the further study, a thorough analysis was conducted to evaluate the impact of bursicon and its receptor on the expression levels of Lactase, Tyrosine hydroxylase, Dopa decarboxylase, Acetyltransferase, and the effects of RNA interference targeting these genes on the seasonal morphological transition. The findings underscored their role in the bursicon-mediated cuticle darkening process. However, as this section is slated for inclusion in an upcoming manuscript intended for submission, it is deemed unsuitable for incorporation into the current manuscript.

      Minor issues:

      (1) L75 "stronger resistance (Ge et al., 2019; Tougeron et al., 2021)". Stronger resistance to what? Stronger resistance to environmental stress or weather condition? Please clarify.

      Thanks for your excellent suggestion. We have changed the statement to “stronger resistance to weather condition” in Line 75 of our revised manuscript.

      (2) L132 Figure 1A and 1B: Bursicon sequence was first identified and functionally characterized in Drosophila melanogaster: is there any reason why Drosophila bursicon sequences were not included in the comparison?

      Thanks for your excellent comment. We have added the sequence of Burs-α and Burs-β of D. melanogaster in the sequence alignment results of Figure 1A and 1B of our revised manuscript.

      (3) Although the authors clearly identify and validate the function for the bursicon genes and its receptor's, there is no mention of whether duplicates of this gene are also present in the pear psyllid. This has been known to happen in otherwise conserved hormone pathways (e.g., insulin receptor in some insects), so a formal check of this should be done.

      Thanks for your excellent comment. As shown in Figure S2A-S2B and 3B, there are two bursicon subunit genes and only one bursicon receptor gene in our selected insect species, for example Drosophila melanogaster, Diaphorina citri, Bemisia tabaci, Nilaparvata lugens, and Sogatella furcifera. In our transcriptome database of C. chinensis, we likewise identified only two bursicon subunit genes and one bursicon receptor gene.

      (4) Line 41: Here, as in the title, "fascinating" is a subjective judgement that does not improve a study's presentation.

      Thanks for your great comment. We have changed "fascinating" to "transformation" in Line 41 and also revised the title of our revised manuscript.

      (5) Line 44: What makes some fields "cutting-edge" and others not?

      Thanks for your excellent suggestion. The expression of "in cutting-edge fields" has been deleted in Line 44 of our revised manuscript.

      (6) Line 97: This is a peculiar choice of reference for the concept of slower development in cold temperatures. The concept of degree-days and growth rates is old and widespread in entomology.

      Thanks for your insightful comment. The reference of Nyamaukondiwa et al., 2011 in Line 95 has been deleted in our revised manuscript.

      (7) Lines 149-150: What justifies the assumption that higher levels of expression mean a more important role? This gene might be just as necessary for development of the summer form, even if expressed at lower levels.

      Thanks for your excellent suggestion. This sentence has been revised to “Increased gene expression levels may potentially contribute to the transition from summer-form to winter-form in C. chinensis.” in Line 168-169 of our revised manuscript.

      (8) The blue arrow in Fig. 7 is confusing.

      Thanks for your excellent suggestion. In Figure 7, the blue arrow represents the down-regulated expression of miR-6012. We have added a description about the blue arrow in Figure 7 of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Over the last decade, numerous studies have identified adaptation signals in modern humans driven by genomic variants introgressed from archaic hominins such as Neanderthals and Denisovans. One of the most classic signals comes from a beneficial haplotype in the EPAS1 gene in Tibetans that is evidently of Denisovan origin and facilitated high altitude adaptation (HAA). Given that HAA is a complex trait with numerous underlying genetic contributions, in this paper Ferraretti et al. asked whether additional HAA-related genes may also exhibit a signature of adaptive introgression. Specifically, the authors considered that if such a signature exists, they most likely are only mild signals from polygenic selection, or soft sweeps on standing archaic variation, in contrast to a strong and nearly complete selection signal like in the EPAS1. Therefore, they leveraged two methods, including a composite likelihood method for detecting adaptive introgression and a biological network-based method for detecting polygenic selection, and identified two additional genes that harbor plausible signatures of adaptive introgression for HAA.

      Strengths: 

      The study is well motivated by an important question, which is, whether archaic introgression can drive polygenic adaptation via multiple small effect contributions in genes underlying different biological pathways regulating a complex trait (such as HAA). This is a valid question and the influence of archaic introgression on polygenic adaptation has not been thoroughly explored by previous studies.

      The authors reexamined previously published high-altitude Tibetan whole genome data and applied a couple of the recently developed methods for detecting adaptive introgression and polygenic selection. 

      Weaknesses: 

      My main concern with this paper is that I am not too convinced that the reported genomic regions putatively under polygenic selection are indeed of archaic origin. Other than some straightforward population structure characterizations, the authors mainly did two analyses with regard to the identification of adaptive introgression: First, they used one composite likelihood-based method, the VolcanoFinder, to detect the plausible archaic adaptive introgression and found two candidate genes (EP300 and NOS2). Next, they attempted to validate the identified signal using another method that detects polygenic selection based on biological network enrichments for archaic variants.

      In general, I don't see in the manuscript that the choice of methods here are well justified. VolcanoFinder is one among the several commonly used methods for detecting adaptive introgression (eg. the D, RD, U, and Q statistics, genomatnn, maldapt etc.). Even if the selection was mild and incomplete, some of these other methods should be able to recapitulate and validate the results, which are currently missing in this paper. Besides, some of the recent papers that studied the distribution of archaic ancestry in Tibetans don't seem to report archaic segments in the two gene regions. These all together made me not sure about the presence of archaic introgression, in contrast to just selection on ancestral variation.

      Furthermore, the authors tried to validate the results by using signet, a method that detects enrichments of alleles under selection in a set of biological networks related to the trait. However, the authors did not provide sufficient description on how they defined archaic alleles when scoring the genes in the network. In fact, reading from the method description, they seemed to only have considered alleles shared between Tibetans and Denisovans, but not necessarily exclusively shared between them. If the alleles used for scoring the networks in Signet are also found in other populations such as Han Chinese or Africans, then that would make a substantial difference in the result, leading to potential false positives.

      Overall, given the evidence provided by this article, I am not sure they are adequate to suggest archaic adaptive introgression. I recommend additional analyses for the authors to consider for rigorously testing their hypothesis. Please see the details in my review to the authors. 

      Reviewer #2 (Public Review):

      In Ferrareti et al. they identify adaptively introgressed genes using VolcanoFinder and then identify pathways enriched for adaptively introgressed genes. They also use a signet to identify pathways that are enriched for Denisovan alleles. The authors find that angiogenesis and nitric oxide induction are enriched for archaic introgression.

      Strengths: 

      Most papers that have studied the genetic basis of high altitude (HA) adaptation in Tibet have highly emphasized the role of a few genes (e.g. EPAS1, EGLN1), and in this paper, the authors look for more subtle signals in other genes (e.g EP300, NOS2) to investigate how archaic introgression may be enriched at the pathway level.

      Looking into the biological functions enriched for Denisovan introgression in Tibetans is important for characterizing the impact of Denisovan introgression.

      Weaknesses: 

      The manuscript lacks details or justification about how/why some of the analyses were performed. Below are some examples where the authors could provide additional details.

      The authors made specific choices in their window analysis. These choices are not justified or there is no comment as to how results might change if these choices were perturbed. For example, in the methods, the authors write "Then, the genome was divided into 200 kb windows with an overlap of 50 kb and for each of them we calculated the ratio between the number of significant SNVs and the total number of variants." 

      Additional information is needed for clarity. For example, "we considered only protein-protein interactions showing confidence scores {greater than or equal to} 0.7 and the obtained protein frameworks were integrated using information available in the literature regarding the functional role of the related genes and their possible involvement in high-altitude adaptation." What do the confidence scores mean? Why 0.7?

      In the method section (Identifying gene networks enriched for Denisovan-like derived alleles), the authors write "To validate VolcanoFinder results by using an independent approach". Does this mean that for signet the authors do not use the regions identified as adaptively introgressed using volcanofinder? I thought in the original signet paper, the authors used a summary describing the amount of introgression of a given region.

      Later, the authors write "To do so, we first compared the Tibetan and Denisovan genomes to assess which SNVs were present in both modern and archaic sequences. These loci were further compared with the ancestral reconstructed reference human genome sequence (1000 Genomes Project Consortium et al., 2015) to discard those presenting an ancestral state (i.e., that we have in common with several primate species)." It is not clear why the authors are citing the 1000 genomes project. Are they comparing with the reference human genome reference or with all populations in the 1000 genomes project? Also, are the authors allowing derived alleles that are shared with Africans? Typically, populations from Africa are used as controls since the Denisovan introgression occurred in Eurasia.

      The methods section for Figures 4B, 4C, and 4D is a little hard to understand. What is the x-axis on these plots? Is it the number of pairwise differences to Denisovan? The caption is not clear here. The authors mention that "Conversely, for non-introgressed loci (e.g., EGLN1), we might expect a remarkably different pattern of haplotypes distribution, with almost all haplotype classes presenting a larger proportion of non-Tibetan haplotypes rather than Tibetan ones." There is clearly structure in EGLN1. There is a group of non-Tibetan haplotypes that are closer to Denisovan and a group of Tibetan haplotypes that are distant from Denisovan...How do the authors interpret this? 

      In the original signet paper (Gouy and Excoffier 2017), they apply signet to data from Tibetans. Zhang et al. PNAS (2021) also applied it to Tibetans. It would be helpful to highlight how the approach here is different. 

      We thank the Reviewers for having appreciated the rationale of our study and for having identified potential issues that deserve to be addressed in order to better focus on robust results specifically supported by multiple approaches.

      First, we agree with the Reviewers that clarification and justification for the methodologies adopted in the present study should be deepened with respect to what was done in the original version of the manuscript, with the purpose of making it more intelligible for a broad range of scientists. As reported thoroughly in the revised version of the text, the VolcanoFinder algorithm, which we used as the primary method to discover new candidate genomic regions affected by events of adaptive introgression, was chosen among several approaches developed to detect signatures ascribable to such an evolutionary process for the following reasons: i) VolcanoFinder is one of the few methods that can test jointly events of both archaic introgression and adaptive evolution (e.g., the D statistic cannot formally test for the action of natural selection, having been also developed to provide genome wide estimates of allele sharing between archaic and modern groups rather than to identify specific genomic regions enriched for introgressed alleles); ii) the model tested by the VolcanoFinder algorithm remarkably differs from those considered by other methods typically used to test for adaptive introgression, such as the RD, U and Q statistics, which are aimed at identifying chromosomal segments showing low divergence with respect to a specific archaic sequence and/or enriched in alleles uniquely shared between the admixed group and the source population, as well as characterized by a frequency above a certain threshold in the population under study, thus being useful especially to test an evolutionary scenario conformed to that expected in the case that adaptation was mediated by strong selective sweeps rather than weak polygenic mechanisms (see answer to comment #1 of Reviewer #1 for further details); iii) VolcanoFinder relies on less demanding computational efforts than other algorithms, such as genomatnn and Maladapt, which also need to be trained on large genomic simulations built specifically to reflect the evolutionary history of the population under study, thus increasing the possibility to introduce bias in the obtained results if the information that guides simulation approaches is not accurate.

      Despite that, we agree with Reviewer #2 that some criteria formerly implemented during the filtering of VolcanoFinder results (e.g., normalization of LR scores, use of a sliding windows approach, and implementation of enrichment analysis based on specific confidence scores) might introduce erratic changes, which depend on the thresholds adopted, in the list of the genomic regions considered as the most likely candidates to have experienced adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline of analyses developed by Setter et al. 2020, in the revised version of the manuscript we have opted to use raw LR scores and to shortlist the most significant results by focusing on loci showing values falling in the top 5% of the genomic distribution obtained for such a statistic (see Materials and methods for details). 
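      As an illustration of the shortlisting criterion described above (keeping loci whose raw LR score falls in the top 5% of the genomic distribution), the following minimal sketch ranks scores and applies a rank-based cutoff. The function name, the locus labels, and the scores are hypothetical, and the way the cutoff is computed here is an assumption made for illustration rather than the procedure documented in the revised Materials and methods.

```python
def top_fraction(scores, fraction=0.05):
    """Return (threshold, selected) for loci in the top `fraction` of scores.

    scores: dict mapping a locus label to its raw LR score.
    The threshold is the score of the k-th ranked locus, where
    k = floor(n * fraction), with a minimum of one locus.
    """
    if not scores:
        raise ValueError("no scores supplied")
    ranked = sorted(scores.values(), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    threshold = ranked[k - 1]
    # Loci tied with the threshold are kept, so ties can enlarge the shortlist.
    selected = {locus: s for locus, s in scores.items() if s >= threshold}
    return threshold, selected


# Hypothetical genome scan: 40 loci with scores 1.0 .. 40.0.
lr_scores = {f"locus_{i}": float(i) for i in range(1, 41)}
threshold, shortlisted = top_fraction(lr_scores, fraction=0.05)
```

      With 40 loci and a 5% fraction, this keeps the two highest-scoring loci; unlike a fixed normalized cutoff, the threshold adapts to the empirical distribution of the scan.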

      Moreover, to further reduce the use of potentially arbitrary filtering thresholds, we decided not to implement functional enrichment analysis to prioritize results from the VolcanoFinder method. To this end, although a STRING confidence score (i.e., the approximate probability that a predicted interaction exists between two proteins belonging to the same functional pathway according to information stored in the KEGG database) above 0.7 is generally considered a high confidence score (string-db.org, Szklarczyk et al. 2014), we replaced such a prioritization criterion by considering as the most robust candidates for adaptive introgression only those genomic regions that turned out to be supported by all the approaches used (i.e., VolcanoFinder, Signet, LASSI and Haplostrips analyses).

      According to the Reviewers’ comments on the use of the Signet algorithm, we realized that the rationale behind such a validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for the action of natural selection (as it was formerly used by Gouy et al. 2017), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier 2020 by searching for significant networks of genes presenting archaic-derived variants observable in the considered Tibetan populations but not in an outgroup population of African ancestry. Accordingly, we used the Signet method as an independent approach to obtain a first validation of the introgressed (but not necessarily adaptive) loci pointed out by VolcanoFinder results.

      In detail, in response to the question by Reviewer #2 about which genomic regions have been considered in the Signet analysis, it is necessary to clarify that, to obtain the input score associated with each gene along the genome, as required by the algorithm, we calculated average frequency values per gene by considering all the archaic-derived alleles included in the Tibetan dataset but not in the outgroup one. Therefore, we did not take into account only those loci identified as significant by VolcanoFinder analysis, but we performed an independent genome scan. Then, we crosschecked significant results from the VolcanoFinder and Signet approaches and we shortlisted the genomic regions supported by both. This approach thus differs from that of Zhang et al. 2021, in which the input scores per gene were obtained by considering only those loci previously pointed out by another method as putatively introgressed. Moreover, as mentioned in the previous paragraph, our approach also differs from that implemented by Gouy et al. 2017, in which the input score assigned to each gene was represented by the variant showing the smallest P-value associated with a selection statistic, being thus informative about putative adaptive events but not introgression ones.
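
      A minimal sketch of the per-gene input score described above (the average Tibetan frequency of the retained archaic-derived alleles in each gene) is given below. It is not part of Signet or of the authors' pipeline; the function name and the `(gene, frequency)` input format are hypothetical, and the assignment of variants to genes is assumed to have been done beforehand.

```python
from collections import defaultdict


def gene_scores(variants):
    """Average allele frequency per gene.

    `variants` is an iterable of (gene, tibetan_frequency) pairs, one pair per
    archaic-derived allele that passed filtering (hypothetical input format).
    """
    totals = defaultdict(lambda: [0.0, 0])  # gene -> [sum of frequencies, count]
    for gene, freq in variants:
        totals[gene][0] += freq
        totals[gene][1] += 1
    return {gene: total / count for gene, (total, count) in totals.items()}
```

      Since every gene with at least one retained allele receives a score, the scan stays independent of which loci VolcanoFinder flagged, in line with the genome-wide design described above.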

      However, as correctly pointed out by both the Reviewers, we formerly performed Signet analysis by considering derived alleles shared between Tibetans and the Denisovan species, without filtering out those alleles that are also observed in other modern human populations. We agree with the Reviewers that this approach cannot rule out the possibility of retaining false positive results ascribable to ancestral polymorphisms rather than introgressed alleles. According to the Reviewers’ suggestion, we thus repeated the Signet analysis by removing derived alleles also observed in an outgroup population of African ancestry (i.e., Yoruba), by assuming that only Eurasian H. sapiens populations experienced Denisovan admixture. In detail, we considered only those alleles that: i) were shared between Tibetans and Denisovan (i.e., Denisovan-like alleles); ii) were assumed to be derived according to the comparison with the ancestral reconstructed reference human genome sequence; iii) were completely absent (i.e., present at a frequency equal to zero) in the Yoruba population sequenced by the 1000 Genomes Project. Although the comment of Reviewer #1 seems to propose the possible use of Han Chinese as a further control population, we decided not to filter out Denisovan-like derived alleles also present in this human group, because the evidence collected so far suggests that Denisovan introgression in the gene pool of East Asian ancestors predated the split between low-altitude and high-altitude populations (Lu et al. 2016; Hu et al. 2017) and, as mentioned before, we aimed at using the Signet algorithm to validate introgression events rather than adaptive ones (see the answer to comment #6 of Reviewer #1 for further details). Moreover, we would like to remark that we decided to maintain the Signet analysis as a validation method in the revised version of the manuscript because: i) comments from both the Reviewers converge in suggesting how to effectively improve this approach, and ii) it represents a method that goes beyond the simple identification of single putative introgressed alleles, by instead enabling us to point out those biological functions that might have been collectively shaped by gene flow from Denisovans.
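
      The three retention criteria listed above can be expressed as a simple per-site filter. The sketch below only illustrates the logic under stated assumptions: the `Site` record and its field names are hypothetical, and real data would first have to be parsed from genotype files into this form.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-site record used only for this illustration."""
    ancestral: str      # allele in the reconstructed ancestral sequence
    denisovan: set      # alleles observed in the Denisovan genotype
    tibetan_freq: dict  # allele -> frequency in Tibetans
    yoruba_freq: dict   # allele -> frequency in Yoruba (outgroup)


def denisovan_like_derived(site):
    """Return the alleles at `site` that satisfy all three criteria."""
    kept = []
    for allele in site.denisovan:
        if site.tibetan_freq.get(allele, 0.0) <= 0.0:
            continue  # i) must be shared between Tibetans and Denisovan
        if allele == site.ancestral:
            continue  # ii) must be in the derived state
        if site.yoruba_freq.get(allele, 0.0) != 0.0:
            continue  # iii) must be completely absent in Yoruba
        kept.append(allele)
    return sorted(kept)
```

      Han Chinese frequencies deliberately play no role in this filter, consistent with the choice explained above not to use that population as a further control.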

      In addition to validating genomic regions putatively affected by archaic introgression by crosschecking results from the VolcanoFinder and Signet analyses, according to the suggestion by Reviewer #1 we implemented a further validation procedure aimed at formally testing for the adaptive evolution of the identified candidate introgressed loci. For this purpose, we applied the LASSI likelihood-based haplotype method (Harris & DeGiorgio 2020) to Tibetan whole genome data. Notably, we chose this approach mainly for the following reasons: i) it is able to detect and distinguish genomic regions that have experienced different types of selective events (i.e., strong and weak ones); ii) it has been demonstrated to have increased power in identifying them with respect to other selection statistics (e.g., H12 and nSL) (Harris & DeGiorgio 2020). Again, we performed an independent genome scan using the LASSI algorithm and then we crosschecked the obtained significant results with those previously supported by the VolcanoFinder and Signet approaches in order to shortlist genomic regions that have plausibly experienced both archaic introgression and adaptive evolution.

      Moreover, we maintained a final validation step represented by Haplostrips analysis, which was instead specifically performed on chromosomal segments supported by results from all of the VolcanoFinder, Signet, and LASSI approaches. This enabled us to assess the similarity between Denisovan haplotypes and those observed in Tibetans (i.e., the population under study in which archaic alleles might have played an adaptive role in response to high-altitude selective pressures), Han Chinese (i.e., a sister group whose common ancestors with Tibetans have experienced Denisovan admixture, but have then evolved at low altitude), and Yoruba (i.e., an outgroup that is assumed to have not received gene flow from Denisovans).
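
      The crosschecking logic described above amounts to intersecting the candidate lists produced by the independent scans and passing only the shared candidates to the final haplotype-based step. A minimal sketch follows, assuming each method's significant results have already been reduced to a set of gene names; the function name and the placeholder gene labels in the usage note are hypothetical.

```python
def supported_by_all(candidates_by_method):
    """Return the candidates reported by every method.

    `candidates_by_method` maps a method name to the set of genes that the
    method flagged as significant (hypothetical input format).
    """
    sets = [set(genes) for genes in candidates_by_method.values()]
    return set.intersection(*sets) if sets else set()
```

      For example, calling `supported_by_all` on `{"VolcanoFinder": {"GENE_A", "GENE_B"}, "Signet": {"GENE_A"}, "LASSI": {"GENE_A", "GENE_C"}}` leaves only `GENE_A` as a candidate for the Haplostrips comparison.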

      In conclusion, we believe that the substantial changes incorporated in the manuscript according to the Reviewers’ suggestions strongly improved the study by enabling us to focus on more solid results with respect to those formerly presented. Interestingly, although the single candidate loci supported by all the approaches now implemented for validating the obtained results have attained higher prioritization with respect to previous ones (which are supported by some but not all the adopted methods), angiogenesis still stands out as one of the main biological functions that have been shaped by events of adaptive introgression in human groups of Tibetan ancestry. This provides new evidence for the contribution of introgressed Denisovan alleles other than the EPAS1 ones in modulating the complex adaptive responses evolved by Himalayan populations to cope with selective pressures imposed by high altitudes.

      Responses to Recommendations For The Authors:

      Reviewer #1:

      The authors mainly relied on one method, VolcanoFinder (VF), to detect adaptive introgression signals. As one of the recently developed methods, VF indeed demonstrated statistical power at detecting mild selection on archaic variants, as well as detecting soft sweeps on standing variations. However, compared to other commonly used methods for detecting adaptive introgression, such as the U and Q stats (Racimo et al. 2017), genomatnn (Gower et al. 2021), or MaLAdapt (Zhang et al. 2023), VF doesn't seem to have better power at capturing mild and incomplete sweeps. And it makes me wonder about the justification for choosing VF over other methods here, which is not clearly explained in the manuscript. If these adaptive introgression candidates are legitimate, even if the signals are mild, at least some of the other methods should be able to recapitulate the signature (even if they don't necessarily make it through the genome-wide significance thresholds). I would be more convinced about the archaic origin of these regions if the authors could validate their reported findings using some of the aforementioned other methods.

      According to the Reviewer’s suggestion, in the revised version of the manuscript we have expanded the considerations concerning the rationale that guided the choice of the adopted methods. In particular, in the Materials and methods section (see page 12) we have specified the reasons for having used the VolcanoFinder algorithm.

      First, it represents one of the few approaches that rely on a model able to test jointly the occurrence of archaic introgression and the adaptive evolution of the genomic regions affected by archaic gene flow, without the need for considering the putative source of introgression. This was a relevant aspect for us, because we planned to adopt at least two main independent (and possibly quite different in terms of the underlying approaches) methods to validate the identified candidate introgressed loci, and the other algorithm we used (i.e., Signet) was explicitly based on the comparison of modern data with the archaic sequence. Accordingly, the model tested by VolcanoFinder differs from those considered by the RD, U and Q statistics. In fact, the RD statistic is aimed at identifying regions of the genome with low divergence with respect to a given archaic reference, while the U/Q statistics can detect those chromosomal segments enriched in alleles that i) are uniquely shared between the admixed group (e.g., Tibetans) and the source population (e.g., Denisovans), and ii) present a frequency above a specific threshold in the admixed population (Racimo et al. 2016). For instance, all the loci considered as likely involved in adaptive introgression events by Racimo et al. 2016 presented remarkable frequencies, with most of them showing values above 50%. That being so, we decided not to implement these methods because we believe that they are more suitable for the detection of adaptive introgression events involving few variants with a strong effect on the phenotype, which entail a substantial increase in frequency in the population subjected to the selective pressure (i.e., cases such as that of EPAS1), while it appears challenging to choose an arbitrary frequency threshold appropriate for the detection of weak and/or polygenic selective events.

      As regards the possible use of the MaLAdapt or genomatnn approaches as validation methods, we believe that they rely on more demanding computational efforts with respect to the Signet algorithm and, above all, they have the disadvantage of needing to be trained on simulated genomic data. This makes them more prone to the potential bias introduced in the obtained results by simulations that do not carefully reflect the evolutionary history of the population under study.

      Overall, we do not agree with the Reviewer’s statement that we mainly relied on a single method to detect adaptive introgression signals because, as mentioned above, the Signet algorithm was specifically used to identify genomic regions putatively affected by introgression. This method relies on assumptions very similar to those described above for the U/Q statistics (e.g., it considers alleles uniquely shared between Tibetans and Denisovans), but avoids the need to select a frequency threshold to shortlist the most likely adaptive introgressed loci. In addition, according to another suggestion by the Reviewer, we have now implemented a further approach to provide evidence for the adaptive evolution of the candidate introgressed loci (see response to comment #3).

      As regards the use of Signet, based on comments from both the Reviewers we realized that the rationale behind such a validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for the action of natural selection (as it was formerly used by Gouy et al. 2017), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier (2020) by searching for significant networks of genes presenting archaic-derived variants observable in the considered Tibetan populations. That being so, we used the Signet method as an independent approach to obtain a first validation of VolcanoFinder results. However, by following suggestions from both the Reviewers, we modified the criteria adopted to filter for archaic-derived variants, by excluding those alleles in common between Denisovan and the Yoruba outgroup population (see response to comment #6 for further information regarding this aspect).

      To sum up, we think that the combination of the VolcanoFinder and Signet+LASSI approaches offered a good compromise between the computational efforts required to shortlist the most robust candidate adaptive introgressed loci and the types of model tested (i.e., models that do not discard a priori genomic signatures ascribable to weak and/or polygenic selective events). Moreover, we would like to remark that we decided to maintain the Signet method as a validation approach in the revised version of the manuscript because: i) comments from both the Reviewers converge in suggesting how to effectively improve this approach, and ii) it represents a method that can be used both to perform single-locus validation analysis and to search for those biological functions that have been collectively most impacted by archaic introgression, allowing us to test a more realistic approximation of the polygenic model of adaptation involving introgressed alleles. In fact, although the single candidate loci supported by all the approaches now implemented for validating the obtained results (see responses to comments #3 and #7 for further details) have attained higher prioritization with respect to previous ones (i.e., EP300 and NOS2, which are now supported by some but not all the adopted methods), angiogenesis still stands out as one of the main biological functions that have been shaped by events of adaptive introgression in the ancestors of Tibetan populations.

      Besides, I am a little surprised to see that in Supplementary Figure 2, VF didn't seem to capture more significant LR values in the EPAS1 region (positive control of adaptive introgression) than in the negative control EGLN1 region. The author explained this as the selection on EPAS1 region is "not soft enough", which I find a bit confusing. If there is no major difference in significant values between the positive and negative controls, how would the authors be convinced the significant values they detected in their two genes are true positives? I would like to see more discussion and justification of the VF results and interpretations.

      In light of this observation by the Reviewer and of the overall comment by Reviewer #2 on the procedures implemented for filtering VolcanoFinder results, we realized that both normalization of LR scores and the use of a sliding windows approach might introduce erratic changes, which depend on the thresholds adopted, in the list of the genomic regions considered as the most likely candidates to have experienced adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline of analyses developed by Setter et al. 2020, in the revised version of the manuscript we have opted to use raw LR scores and to shortlist the most significant results by focusing on loci showing values falling in the top 5% of the genomic distribution obtained for such a statistic (see Materials and methods, page 13 lines 4-16 for further details).

      By following this approach, we indeed observed a clearer pattern than that previously described, in which the distribution of LR scores in the EPAS1 genomic region is remarkably different from that obtained for the EGLN1 gene (Figure 2 – figure supplement 1). In more detail, we identified a total of 19 EPAS1 variants showing scores within the top 5% of LR values, in contrast to only three EGLN1 SNVs. Moreover, LR values were collectively more aggregated in the EPAS1 genomic region and showed a higher average value than that observed for EGLN1. We reported LR values, as well as -log(α) scores calculated for these control genes, in Supplement tables 3 and 4.

      Nevertheless, we agree with the Reviewer that the results pointed out by VolcanoFinder need to be confirmed by additional methods, which is what we have done to define both new candidate adaptive introgressed loci and the considered positive/negative controls. In fact, validation analyses performed to confirm signatures of both archaic introgression and adaptive evolution (i.e., Signet, LASSI and Haplostrips) converged in indicating that Tibetan variability at the EGLN1 gene does not seem to have been shaped by archaic introgression events but only by the action of natural selection (see Results, page 5 lines 3-9, page 6 lines 23-25, page 7 lines 29-36; Discussion page 14 lines 33-36; Figure 2 – figure supplement 1B and Figure 4 – figure supplement 1B, 3B and 3D), also according to what was previously proposed (Hu et al., 2017). On the other hand, results from all validation analyses confirmed adaptive introgression signatures at the EPAS1 genomic region (see Results page 4 lines 32-37, page 5 lines 1-2 and 30-34, page 6 lines 23-29; Figure 3A, 3B and Figure 4 – figure supplement 1A, 3A and 3C).

      Finally, as already reported in the former version of the manuscript, our choice of considering EPAS1 and EGLN1 respectively as positive and negative controls for adaptive introgression was guided by previous evidence suggesting these loci as targets of natural selection in high-altitude Himalayan populations (Yang et al., 2017; Liu et al., 2022), although only EPAS1 was proved to have been involved also in an adaptive introgression event (Huerta-Sanchez et al., 2014; Hu et al., 2017). 

      With that being said, I suggest the authors try to first validate the signal of positive selection in the two gene regions using methods such as H2/H1 (Garud et al. 2015), iHS (Voight et al. 2006) etc. that have demonstrated power and success at detecting mild sweeps and soft sweeps, regardless of if these are adaptive introgression.

      According to the Reviewer’s suggestion, we validated the new candidate adaptive introgressed loci by also using a method to formally test for the action of natural selection. In particular, we decided to use the LASSI (Likelihood-based Approach for Selective Sweep Inference) algorithm developed by Harris & DeGiorgio (2020) mainly for the following reasons: i) it is able to identify both strong and weak genomic signatures of positive selection similarly to other approaches, but additionally it can distinguish these signals by explicitly classifying genomic windows affected by hard or soft selective sweeps; ii) when applied to simulated data generated under different demographic models and by setting a range of different values for the parameters that describe a selective event (e.g., the time at which the beneficial mutation arose, the selection coefficient s), it has been proved to have increased power with respect to traditional selection scans, such as nSL, H2/H1 and H12 (see Harris & DeGiorgio 2020 for further details).

      According to such an approach, we were able to recapitulate signatures of natural selection previously observed in Tibetans for both EPAS1 and EGLN1 (Figure 4 – figure supplement 1 and 3C – 3D).  We also obtained comparable patterns for our previous candidate adaptive introgressed loci (i.e., EP300 and NOS2), as well as for the new ones that have been instead prioritized in the revised version of the manuscript according to consistent results also from VolcanoFinder, Signet and Haplostrips analyses (see Results, page 6 lines 30-35; Figure 4C, 4D, Figure 4 – figure supplement 2C and 2D).    

      With regard to the plausible archaic origin of the haplotypes under selection in these gene regions, my concern comes from the fact that other recent studies characterizing the archaic ancestry landscape in Tibetans and East Asians (eg. SPrime reports from Browning et al. 2018, as well as ArchaicSeeker reports from Yuan et al. 2021) didn't report archaic segments in regions overlapping with EP300 and NOS2. So how would the authors explain the discrepancy here, that adaptive introgression is detected yet there is little evidence of archaic segments in the regions? 

      We thank the Reviewer for the comment and the references provided. However, we read the suggested articles and in neither of them do genomes from individuals of Tibetan ancestry seem to have been analysed. Moreover, in the study by Yuan et al. 2021 we were not able to find any table or supplementary table reporting the genomic segments showing signatures of Denisovan-like introgression in East Asian groups, with only findings from enrichment analyses performed on significant results being described for the Papuan population. Anyway, as reported below in the response to comment #5, in line with what was observed by the Reviewer concerning the original version of the manuscript, according to the additional validation analyses implemented during this revision EP300 and NOS2 received lower prioritization with respect to other loci showing more robust signatures supporting introgression of Denisovan alleles in the gene pool of Tibetan ancestors (i.e., TBC1D1, PRKAG2, KRAS and RASGRF2). Three out of four of these genes are also in accordance with previously published results supporting introgression of Denisovan alleles in the ancestors of present-day Han Chinese (Browning et al. 2018) or directly in the Tibetan genomes (Hu et al. 2017) (see Results, page 5 lines 10-21 and Supplement table 5). Despite that, the reason why not all the candidate adaptive introgression regions detected by our analyses are found among the results from Browning et al. 2018 may be that in Han Chinese this archaic variation could have evolved neutrally after the introgression events, thus preventing the identification of chromosomal segments enriched in putative archaic introgressed variants according to the VolcanoFinder and LASSI approaches (which also consider the impact of natural selection). In fact, the Sprime method implemented by Browning et al. 2018 focuses only on introgression events rather than adaptive introgression ones. For instance, the Denisovan-like regions identified with Sprime in Han Chinese by such a study do not comprise the EPAS1 region at all.

      Additionally, looking at Figure 4 and Supplementary Figure 4, the authors showed haplotype comparisons between Tibetans, Denisovan, and Han Chinese for EP300 and NOS2 regions. However, in both figures, there are about equal number of Tibetans and Han Chinese that harbor the haplotype with somewhat close distance to the Denisovan genotype. And this closest haplotype is not even that similar to the Denisovan. So how would the authors rule out the possibility that instead of adaptive introgression, the selection was acting on just an ancestral modern human haplotype?

      We agree with the Reviewer that, according to the analyses presented in the original version of the manuscript, the haplotype patterns observed at the EP300 and NOS2 loci by means of the Haplostrips approach cannot rule out the possibility that their adaptive evolution involved ancestral modern human haplotypes. In fact, after the modifications implemented in the adopted pipeline of analyses based on the Reviewers’ suggestions, their role in modulating complex adaptations to high altitudes was also confirmed by results obtained with the LASSI algorithm (in addition to results from previous studies: Bigham et al., 2010; Zheng et al., 2017; Deng et al., 2019; X. Zhang et al., 2020), but their putative archaic origin received lower prioritization with respect to other loci, being not confirmed by all the analyses performed.

      Furthermore, I have a question about how exactly the authors scored the genes in their network analysis using Signet. The manuscript mentioned they were looking for enrichment of archaic-like derived alleles, and in the methods section, they mentioned they used SNPs that are present in both Denisovan and Tibetan genomes but are not in the chimp ancestral allele state. But are these "derived" alleles also present in Han Chinese or Africans? If so, what are the frequencies? And if the authors didn't use derived alleles exclusively shared between Tibetans and Denisovans, that may lead to false positives of the enrichment analysis, as the result would not be able to rule out the selection on ancestral modern human variation.

      As mentioned in the response to comment #1, by following the suggestions of both the Reviewers we have modified the criteria adopted for filtering archaic derived variants exclusively shared between Denisovans and Tibetans. In particular, we retained as input for Signet analysis only those alleles that i) were shared between Tibetans and Denisovan (i.e., Denisovan-like alleles), ii) were in their derived state, and iii) were completely absent (i.e., show a frequency equal to zero) in the Yoruba population sequenced by the 1000 Genomes Project and used here as an outgroup by assuming that only Eurasian H. sapiens populations experienced Denisovan admixture. We instead decided not to filter out potential Denisovan-like derived alleles also present in the Han Chinese population, because multiple lines of evidence agree in indicating that gene flow from Denisovans occurred in the ancestral East Asian gene pool no sooner than 48–46 thousand years ago (Teixeira et al. 2019; Zhang et al. 2021; Yuan et al. 2021), thus predating the split between low-altitude and high-altitude groups, which occurred approximately 15 thousand years ago (Lu et al. 2016; Hu et al. 2017). In fact, traces of such an archaic gene flow are still detectable in the genomes of several low-altitude populations of East Asian ancestry (Yuan et al. 2021).

      Concerning the above, I would also suggest the authors replot their Figure 4 and Figure S4 by adding the African population (eg. YRI) in the plot, and examine the genetic distance among the modern human haplotypes, in contrast to their distance to Denisovan.

      According to the Reviewer’s suggestion, after having identified new candidate adaptive introgressed loci according to the revised pipeline of analyses, we ran the Haplostrips algorithm by including in the dataset 27 individuals (i.e., 54 haplotypes) from the Yoruba population sequenced by the 1000 Genomes Project (Figure 4A, 4B, Figure 4 - figure supplement 2A, 2B, 3A).

      Reviewer #2:

      In the methods the authors write "Since composite likelihood statistics are not associated with p-values, we implemented multiple procedures to filter SNVs according to the significance of their LR values." What does significance mean here?

      After modifications applied to the adopted pipeline of analyses according to the Reviewers’ suggestions (see responses to public reviews and to comments #1, #3, #6, #7 of Reviewer #1), new candidate adaptive introgressed loci have been identified specifically by focusing on variants showing LR values falling in the top 5% of the genomic distribution obtained for such a statistic in order to adhere more strictly to the VolcanoFinder approach developed by Setter et al. 2020. Therefore, the related sentence in the materials and methods section was modified accordingly.

      Signet should be cited the first time it appears in the manuscript. The citation in the references is wrong. It lists R. Nielsen as the last author, but R. Nielsen is not an author of this paper.

      We thank the Reviewer for the comment. We have now mentioned the article by Gouy and Excoffier (2020) in the Results section where the Signet algorithm was first described and we have corrected the related reference.

      I could not find Figure 5 which is cited in the methods in the main text. I assume the authors mean Supplementary Figure 5, but the supplementary files have Figure 4.

      We thank the Reviewer for the comment. We have checked and modified figures included in the article and in the supplementary files to fix this issue.

      I didn't see a table with the genes identified as adaptatively introgressed with VolcanoFinder. This would be useful as I believe this is the first time VolcanoFinder is being used on Tibetan data?

      According to the Reviewer’s suggestion, we have reported in Supplement table 2 all the variants showing LR scores falling in the top 5% of the genomic distribution obtained for such a statistic, along with the associated α parameters computed by the VolcanoFinder algorithm.

      It is easier for the reviewer if lines have numbers.

      According to the Reviewer’s suggestion, we have included line numbers in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to elucidate the cytological mechanisms by which conjugated linoleic acids (CLAs) influence intramuscular fat deposition and muscle fiber transformation in pig models. Utilizing single-nucleus RNA sequencing (snRNA-seq), the study explores how CLA supplementation alters cell populations, muscle fiber types, and adipocyte differentiation pathways in pig skeletal muscles.

      Thanks!

      Strengths:

      Innovative approach: The use of snRNA-seq provides a high-resolution insight into the cellular heterogeneity of pig skeletal muscle, enhancing our understanding of the intricate cellular dynamics influenced by nutritional regulation strategy.

      Robust validation: The study utilizes multiple pig models, including Heigai and Laiwu pigs, to validate the differentiation trajectories of adipocytes and the effects of CLA on muscle fiber type transformation. The reproducibility of these findings across different (nutritional vs genetic) models enhances the reliability of the results.

      Advanced data analysis: The integration of pseudotemporal trajectory analysis and cell-cell communication analysis allows for a comprehensive understanding of the functional implications of the cellular changes observed.

      Practical relevance: The findings have significant implications for improving meat quality, which is valuable for both the agricultural and food industry.

      Thanks!

      Weaknesses:

      Model generalizability: While pigs are excellent models for human physiology, the translation of these findings to human health, especially in diverse populations, needs careful consideration.

      Thanks!

      Reviewer #2 (Public Review):

      Summary:

      This study comprehensively presents data from single nuclei sequencing of Heigai pig skeletal muscle in response to conjugated linoleic acid supplementation. The authors identify changes in myofiber type and adipocyte subpopulations induced by linoleic acid at depth previously unobserved. The authors show that linoleic acid supplementation decreased the total myofiber count, specifically reducing type II muscle fiber types (IIB), myotendinous junctions, and neuromuscular junctions, whereas type I muscle fibers are increased. Moreover, the authors identify changes in adipocyte pools, specifically in a population marked by SCD1/DGAT2. To validate the skeletal muscle remodeling in response to linoleic acid supplementation, the authors compare transcriptomics data from Laiwu pigs, a model of high intramuscular fat, to Heigai pigs. The results verify changes in adipocyte subpopulations when pigs have higher intramuscular fat, either genetically or diet-induced. Targeted examination using cell-cell communication network analysis revealed associations with high intramuscular fat with fibro-adipogenic progenitors (FAPs).  The authors then conclude that conjugated linoleic acid induces FAPs towards adipogenic commitment. Specifically, they show that linoleic acid stimulates FAPs to become SCD1/DGAT2+ adipocytes via JNK signaling. The authors conclude that their findings demonstrate the effects of conjugated linoleic acid on skeletal muscle fat formation in pigs, which could serve as a model for studying human skeletal muscle diseases.

      Thanks!

      Strengths:

      The comprehensive data analysis provides information on conjugated linoleic acid effects on pig skeletal muscle and organ function. The notion that linoleic acid induces skeletal muscle composition and fat accumulation is considered a strength and demonstrates the effect of dietary interactions on organ remodeling. This could have implications for the pig farming industry to promote muscle marbling. Additionally, these data may inform the remodeling of human skeletal muscle under dietary behaviors, such as elimination and supplementation diets and chronic overnutrition of nutrient-poor diets. However, the biggest strength resides in thorough data collection at the single nuclei level, which was extrapolated to other types of Chinese pigs.

      Thanks!

      Weaknesses:

      While the authors generated a sizeable comprehensive dataset, cellular and molecular validation needed to be improved. For example, the single nuclei data suggest changes in myofiber type after linoleic acid supplementation, yet these data are not validated by other methodologies. Similarly, the authors suggest that linoleic acid alters adipocyte populations, FAPs, and preadipocytes; however, no cellular and molecular analysis was performed to reveal if these trajectories indeed apply. Attempts to identify JNK signaling pathways appear superficial and do not delve deeper into mechanistic action or transcriptional regulation. Notably, a variety of single cell studies have been performed on mouse/human skeletal muscle and adipose tissues. Yet, the authors need to discuss how the populations they have identified support the existing literature on cell-type populations in skeletal muscle. Moreover, the authors nicely incorporate the two pig models into their results, but the authors only examine one muscle group. It would be interesting if other muscle groups respond similarly or differently in response to linoleic acid supplementation. Further, it was unclear whether Heigai and Laiwu pigs were both fed conjugated linoleic acid or whether the comparison was between linoleic acid-fed Heigai pigs and Laiwu pigs (as a model of high intramuscular fat). With this in mind, the authors do not discuss how their results could be implicated in human and pig nutrition, such as desirability and cost-effectiveness for pig farmers and human diets high in linoleic acid. Notably, while single nuclei data is comprehensive, there needs to be a statement on data deposition and code availability, allowing others access to these datasets. Moreover, the experimental designs do not denote the conjugated linoleic acid supplementation duration. Several immunostainings performed could be quantified to validate statements. This reviewer also found the Nile Red staining hard to interpret visually, and it did not appear to support the conclusions convincingly. Within Figure 7, several letters (assuming they represent statistical significance) are present on the graphs but are not denoted within the figure legend.

      Thanks for your suggestions! We have revised our manuscript accordingly.

      For changes in myofiber type, we performed qPCR to verify the changes in the expression of muscle fiber type-related genes after CLA treatment (Figure 2E). For changes in adipocyte and preadipocyte populations, we also performed immunofluorescence staining, qPCR, and western blotting in LDM tissues and FAPs to verify the alterations in cell types after CLA feeding (Figure 3D, 3E, 6G, 7C, and 7D). Hence, we think these cellular and molecular results support our conclusions.

      For the JNK signaling pathway, we selected this pathway based on the snRNA-seq dataset and verified it with an activator in an in vitro experiment. However, we did not explore the mechanistic action, and the downstream transcriptional regulators need to be further explored. We have added these points to the discussion (line 443-448).

      We have added a comparison between the different cell-type populations in skeletal muscle (line 362-368 and 385-390).

      For changes in myofiber type in Laiwu pigs, we refer to the discussion in our previous study (Wang et al., 2023). Interestingly, in this study we also found that, in Laiwu pigs with high IMF content, the percentage of type IIa myofibers tended to increase (29.37% vs. 23.95%), whereas the percentage of type IIb myofibers tended to decrease (38.56% vs. 43.75%). We have added this point to the discussion (line 392-395).

      We have provided information on the treatment in the materials and methods (line 469-478). We have also added a discussion of the significance of our study for human and pig nutrition (line 375-376 and 446-447).

      Our data will be made available on reasonable request (line 574-576).

      We have provided the duration of CLA supplementation in the materials and methods (line 465).

      Porcine FAPs contain few lipid droplets, and we have improved the image quality (Figure 7A). In Figure 7, the Nile Red staining could be quantified and we provide the quantification of Oil Red O staining (Figure 7B and 7J). We have also added the statistical significance notation to the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for Improved or Additional Experiments, Data, or Analyses

      Cross-species analysis: To strengthen the generalizability of the results, it would be beneficial to include a comparative analysis with other species, such as human, bovine, or rodent models, using publicly available snRNA-seq datasets.

      Thanks! Our previous study compared the conserved and unique signatures in fatty skeletal muscles across species (Wang, Zhou, Wang, & Shan, 2024). Here, we mainly focused on the mechanism by which CLAs regulate intramuscular fat deposition. However, snRNA-seq or scRNA-seq datasets on the effects of CLAs on fat deposition in muscle are still lacking for other species, including human, bovine, or rodent models. Hence, we analyzed the regulatory mechanisms by which CLAs influence intramuscular fat deposition only in pigs.

      Functional link: the authors should discuss in the manuscript how the muscles differ in terms of texture, flavor, aroma, etc. before and after CLA administration or between Heigai and Laiwu to provide context and help readers better understand how the observed high-resolution cellular changes relate to these functional properties of meat.

      Thanks! We have added this context to the introduction (line 90-98).

      Improve figures: some figures, particularly those involving Oil Red O and Nile Red, could be improved by including higher magnification images to assess the organization of lipid droplets of individual adipocytes (Figure 7A, I, and K).

      Thanks! Porcine FAPs contain few lipid droplets, and we have improved the image quality (Figure 7A).

      Reviewer #2 (Recommendations For The Authors):

      All of my comments are above. However, I would recommend improving the writing as several areas throughout the results needed clarity.

      Thanks! We have carefully revised our manuscript in light of your suggestions.

      Wang, L., Zhao, X., Liu, S., You, W., Huang, Y., Zhou, Y., . . . Shan, T. (2023) Single-nucleus and bulk RNA sequencing reveal cellular and transcriptional mechanisms underlying lipid dynamics in high marbled pork. NPJ Sci Food 7: 23. https://doi.org/10.1038/s41538-023-00203-4

      Wang, L., Zhou, Y., Wang, Y., & Shan, T. (2024) Integrative cross-species analysis reveals conserved and unique signatures in fatty skeletal muscles. Sci Data 11: 290. https://doi.org/10.1038/s41597-024-03114-5

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Moir, Merheb et al. present an intriguing investigation into the pathogenesis of Pol III variants associated with neurodegeneration. They established an inducible mouse model to overcome developmental lethality, administering 5 doses of tamoxifen to initiate the knock-in of the mutant allele. Subsequent behavioral assessments and histological analyses revealed potential neurological deficits. Robust analyses of the tRNA transcriptome, conducted via northern blotting and RNA sequencing, suggested a selective deleterious effect of the variant on the cerebrum, in contrast to the cerebellum and non-cerebral tissues. Through this work, the authors identified molecular changes caused by Pol III mutations, particularly in the tRNA transcriptome, and demonstrated its relative progression and selectivity in brain tissue. Overall, this study provides valuable insights into the neurological manifestations of certain genetic disorders and sheds light on transcripts/products that are constitutively expressed in various tissues.

      Strengths:

      The authors utilize an innovative mouse model to constitutively knock in the gene, enhancing the study's robustness. Behavioral data collection using a spectrometer reduces experimenter bias and effectively complements the neurological disorder manifestations. Transcriptome analyses are extensive and informative, covering various tissue types and identifying stress response elements and mitochondrial transcriptome patterns. Additionally, metabolic studies involving pancreatic activity and glucose consumption were conducted to eliminate potential glucose dysfunction, strengthening the histological analyses.

      Weaknesses:

      The study could have explored identifying the extent of changes in the tRNA transcriptome among different cell types in the cerebrum. Although the authors attempted to show the temporal progression of tRNA transcriptome changes between P42 and P75 mice, the causal link was not established. A subsequent rescue experiment in the future could address this gap.

      Nonetheless, the claims and conclusions are supported by the presented data.

      We thank Reviewer 1 for their thoughtful review and commentary.  We appreciate the reviewer’s finding that our “claims and conclusions are supported by the presented data.”   

      We note that our findings on the temporal progression of transcriptional changes between P42 and P75 apply to both the Pol II and Pol III transcriptomes. Importantly, in the case of Pol III, only precursor and mature tRNAs are affected at P42 whereas at P75, numerous other Pol III transcripts are also changed.  We therefore attribute the changes in tRNA as being causal in disease initiation since this is the earliest  direct consequence of the Polr3a mutation.

      To expand on the evidence demonstrating the progressive nature of Polr3-related disease in our mouse model, the revised manuscript includes new immunofluorescence data showing no change in microglial cell density in the cerebral cortex or the striatum at an early stage in the disease (Supplementary Fig. S6F, G). This is in striking contrast to the findings at later times (P75), where the number of microglia increased significantly in the Polr3a mutant and the cells exhibited an activated morphology (Fig. 4G,H).

      We agree with the reviewer that it will be interesting in the future to assess the impact of the Polr3a mutation in different neural cell types and to explore opportunities for suppressing disease phenotypes. 

      Reviewer #2 (Public Review):

      Summary:

      The study "Molecular basis of neurodegeneration in a mouse model of Polr3 related disease" by Moir et.al. showed that how RNA Pol III mutation affects production, maturation and transport of tRNAs. Furthermore, their study suggested that RNA pol III mutation leads to behavioural deficits that are commonly observed in neurodegeneration. Although this study used a mouse model to establish these aspects, the study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour. They should have used conditional mouse to delete the gene in specific brain area to test their hypothesis. Otherwise, this study shows a more generalized developmental effect rather than specific function of altered tRNA level. This is very evident from their bulk RNA sequencing study. This study provides some discrete information rather than a coherent story. My enthusiasm for publication of this article in eLife is dampened considering following reasons mentioned in the weakness.

      Reviewer 2’s summary contains two misstatements: 

      Moir et.al. showed that how RNA Pol III mutation affects production, maturation and transport of tRNAs.

      Our experiments document the effect of a neurodegenerative disease-causing mutation in RNA polymerase III on the Pol III transcriptome with a particular focus on the tRNAome (i.e. the mature tRNA population). Experiments on the maturation and transport of tRNA were not performed as there was no indication that these processes might be negatively impacted at the earliest time point (P42). Additional comments about tRNA maturation and export are provided under points 8 and 9 (see below). 

      The study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour.

      This comment misstates the purpose of our study while overlooking the important results. As stated in the abstract, our goal was to develop “a postnatal whole-body mouse model expressing pathogenic Polr3a mutations to examine the molecular mechanisms by which reduced Pol III transcription results primarily in central nervous system phenotypes.”

      Accordingly, our work provides the first molecular analysis of RNA polymerase III transcription in an animal model of Polr3-related disease. The novelty and importance of the findings, as stated in the abstract, include the discovery that a global reduction in tRNA levels (and not other Pol III transcripts) at an early stage in the disease precedes the frank induction of integrated stress and innate immune responses, activation of microglia and neuronal loss at later times. These later events readily account for the observed neurobehavioral deficits that collectively include risk assessment, locomotor, exploratory and grooming behaviors. 

      Strengths:

      The study created a mouse model to investigate role of RNA PolIII transcription. Furthermore, the study provided RNA seq analysis of the mutant mice and highlighted expression specific transcripts affected by the RNA PolIII mutation.

      Weaknesses:

      (1) The abstract is not clearly written. It is hard to interpret what is the objective of the study and why they are important to investigate. For example: "The molecular basis of disease pathogenesis is unknown." Which disease? 4H leukodystrophy? All neurodegenerative disease?

      We have modified the abstract to more clearly frame the objective of the study and its importance as reflected in the title “Molecular basis of neurodegeneration in a mouse model of Polr3-related disease”. We hope the reviewer will agree that the fourth sentence of the abstract, unchanged from the initial submission, clearly outlines the objective of the study.  

      (2) How cerebral pathology and exocrine pancreatic atrophy are related? How altered tRNA level connects these two axes?

      It is not known how cerebral pathology and exocrine pancreatic atrophy are related beyond their shared Pol III dysfunction in our mouse model of Polr3-related disease. We anticipate that altered tRNA levels connect these two axes. Indeed, the pancreas and the brain are both known to be highly sensitive to perturbations affecting translation (Costa-Mattioli and Walter, 2020 Science doi: 10.1126/science.aat5314). Changes to the tRNA population in the cerebrum and cerebellum of Polr3a mutant mice were extensively documented in the manuscript (e.g. Figs. 3, 5 and 6).  We also found reduced tRNA levels in the pancreas of the mutant mice but did not report these findings due to the absence of a stable reference transcript in total RNA from the atrophied pancreatic tissue, even at the earliest time point examined (P42). 

      (3) Authors mentioned that previously observed reduction mature tRNA level also recapitulated in their study. Why this study is novel then?

      Our study reports the novel finding that a pathogenic Polr3a mutation causes a global reduction in the steady state levels of mature tRNAs, i.e. the levels of all tRNA decoders were reduced, with the vast majority of these reaching statistical significance (Fig. 6D and 6F). In the introduction we refer to several studies that examined the effect of pathogenic Polr3 mutations on the levels of Pol III-derived transcripts. We noted that these studies examined only a small number of Pol III transcripts in CRISPR-Cas9 engineered cell lines, patient-derived fibroblasts and patient blood. Thus, no study until now has tested for or reported a global defect in the abundance of mature tRNAs in any model of Polr3-related disease. Moreover, no previous study of Polr3-related disease has analyzed Pol III transcript levels in the brain or in any other tissue.

      (4) It is very intuitive that deficit in Pol III transcription would severely affect protein synthesis in all brain areas as well as other organs. Hence, growth defect observed in Polr3a mutant mice is not very specific rather a general phenomenon.

      While we agree with the simple assumption that a “deficit in Pol III transcription likely would affect protein synthesis in all brain areas as well as other organs”, this turned out not to be the case. In fact, a novel finding of our study is that not all Polr3a mutant tissues show a translation stress response despite reduced Pol III transcription and reduced mature tRNA levels. This implies that in some tissues the reduction in tRNA levels caused by the Polr3a mutation is not sufficient to affect protein synthesis, at least to a point where the Integrated Stress Response is induced. The underlying basis for the growth deficit has not been defined in this work. However, we noted in the discussion that a growth defect was previously seen in mice where expression of the Polr3a mutation was restricted to the Olig2 lineage.  In the present postnatal whole-body inducible model, we anticipate that the diminished growth of the mice results from a combination of hormonal and nutritional deficits caused by cerebral and pancreatic dysfunction.

      (5) Authors observed specific myelination defect in cortex and hippocampus but not in cerebellum. This is an interesting observation. It is important to find the link between tRNA removal and myelin depletion in hippocampus or cortex? Why is myelination not affected in cerebellum?

      We agree that the specific myelin defect observed in the cortex and hippocampus, but not the cerebellum, is an interesting observation. Pol III dysfunction in this model and reduced tRNA levels are common to both cerebra and cerebella, yet the pathological consequences differ between these regions.  While we do not know why this is the case, the cells that oligodendrocytes support in these regions are functionally different. We suggest in the discussion that subtle defects in oligodendrocyte function in the cerebellum may be uncovered using more sensitive or specific assays than the ones we have employed to date.  In addition, consistent with our findings in other tissues where Pol III transcription and tRNA levels are reduced but phenotypes are lacking, we suggest that oligodendrocytes in the cerebellum may have a different minimum threshold for Pol III activity than in other regions of the brain. 

      (6) How was the locomotor activity measured? The detailed description is missing. Also, locomotion is primarily cerebellum dependent. There is no change in term of growth rate and myelination in cerebellar neurons. I do not understand why locomotor activity was measured.

      We used a behavioral spectrometer with video tracking and pattern-recognition software to quantify ~20 home cage-like behaviors, including locomotor activity, as part of our phenotypic characterization of the mice. This experimenter-unbiased approach reported several metrics of locomotion, specifically, total Track length (the total distance traveled in the instrument), Center Track length and the time spent running (Run Sum) and standing still (Still Sum) in a longitudinal study (Figs. 2A-C and Supplemental Fig. S3A-C). The Materials and Methods section on mouse behavior has been amended to provide a detailed description of these experiments. 

      locomotion is primarily cerebellum dependent

      While we agree that the cerebellum plays a critical role in balance and locomotion, regions of the cerebrum that are affected in our mice, including the primary motor cortex and the basal ganglia (Fig. 4), also have important roles in locomotor activity and control. 

      (7) The correlation with behavioural changes and RNA seq data is missing. There a number of transcripts are affected and mostly very general factors for cellular metabolism. Most of them are RNA Pol II transcribed. How a Pol III mutation influences RNA Pol II driven transcription? I did not find differential expression of any specific transcripts associated with behavioural changes. What is the motivation for transcriptomics analysis? None of these transcripts are very specific for myelination. It is rather a general cellular metabolism effect that indirectly influences myelination.

      The differentially expressed mRNAs identified in our RNAseq analysis at P75 reflect both direct and secondary consequences of dysfunctional Pol III transcription on Pol II transcription. These effects can be achieved by multiple mechanisms. Induction of the Integrated Stress Response (ISR) due to insufficient tRNA can be considered a direct consequence of diminished Pol III transcription on Pol II transcription. An example of a secondary response is the activation of microglia and the innate immune response (which is known to accompany prolonged activation of the ISR), and the loss of neurons and oligodendrocytes. These changes are documented in Figs. 3 and 4. Importantly, loss of neurons, activated microglia and reduced oligodendrocyte numbers are each readily reconciled with changes in behavior.  

      None of these transcripts are very specific for myelination 

      The RNAseq data at P75 indicate only a modest reduction in oligodendrocyte-specific gene expression (as defined by single-cell RNAseq studies of purified cell populations, Mackenzie et al., 2018 Sci. Rep. doi: 10.1038/s41598-018-27293-5). Despite this, some oligodendrocyte-specific transcripts with well-known roles in myelination were down-regulated in the Polr3a mutant (e.g. Plp1, Mog and Mobp). In addition, steroid synthesis pathway transcripts involved in the production of cholesterol, an abundant and essential component of myelin, were also downregulated (Supplementary Fig. S4E).

      (8) What genes identified by transcriptomics analysis regulates maturation of tRNA? Authors should at least perform RNAi study to identify possible factor and analyze their importance in maturation of tRNA.

      Of the many proteins involved in the maturation of tRNA (Phizicky and Hopper, 2023 RNA doi: 10.1261/rna.079620.123), RNAseq analysis at P75 identified only amino-acyl tRNA synthetases as being differentially-expressed (fold change >1.5, p adj. < 0.05, Table S1). These genes are canonical indicators of the ATF4-dependent Integrated Stress Response and their upregulation is widely interpreted as an attempt to restore efficient translation. In addition, our analysis of Pol III transcripts at P75 identified a reduction in the level of RppH1 (Fig. 3C), the RNA component of RNase P, which removes the 5’ leader of precursor tRNAs.  However, at P42, there was no effect on RppH1 abundance, or the expression of amino-acyl tRNA synthetase genes (Fig. 5C and Table S3).  Thus, an RNAi study to identify and analyze a possible factor involved in the maturation of tRNA is neither warranted nor relevant to the current body of work.
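
      The differential-expression criterion quoted above (fold change >1.5, p adj. < 0.05) can be sketched as a simple filter. This is an illustrative sketch only: it assumes the cutoff is applied to the absolute fold change (both up- and down-regulated genes), that fold changes are stored on a log2 scale, and the gene labels and values are made-up placeholders rather than entries from Table S1.

```python
import math

FOLD_CHANGE_CUTOFF = 1.5   # from the response text
PADJ_CUTOFF = 0.05         # from the response text

def is_differentially_expressed(log2_fc, padj):
    """Apply the two thresholds to one gene.

    Assumption: the fold-change cutoff is symmetric, so a gene
    passes if |log2 FC| > log2(1.5) (about 0.585) and padj < 0.05.
    """
    return abs(log2_fc) > math.log2(FOLD_CHANGE_CUTOFF) and padj < PADJ_CUTOFF

# Placeholder rows: (label, log2 fold change, adjusted p-value)
toy_results = [
    ("gene_a", 1.2, 0.001),    # passes both thresholds
    ("gene_b", 0.3, 0.0001),   # significant, but fold change too small
    ("gene_c", -0.9, 0.2),     # large change, but not significant
    ("gene_d", -0.8, 0.01),    # passes (down-regulated)
]

hits = [name for name, lfc, padj in toy_results
        if is_differentially_expressed(lfc, padj)]
```

      Requiring both conditions is what excludes genes such as gene_b and gene_c above, which satisfy only one of the two thresholds.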

      (9) What factors are influencing tRNA transport to cytoplasm? It may be possible that Polr3a mutation affect cytoplasmic transport of tRNA. Authors should study this aspect using an imaging experiment.

      Our analysis of tRNA populations in this study employed total cellular RNA and thus reflect the abundance of mature tRNA from all cellular compartments. We have not assessed whether the reduction in tRNA abundance caused by the Polr3a mutation alters the dynamics of tRNA transport from the nucleus to the cytoplasm. However, we consider it highly unlikely that the Polr3a mutation would have a significant effect on cytoplasmic transport of tRNA. Imaging experiments along these lines are beyond the scope of the current study.

      (10) Does alteration of cytoplasmic level of tRNA affects translation? Author should perform translation assay using bio-orthogonal amino acid (AHA) labelling.

      It is not known whether the reduced tRNA levels affect translation globally in the Polr3a mutant, but we predict that this may not be the case. Since tissues (heart and kidney) and brain regions (cerebrum and cerebellum) that share a decrease in tRNA abundance do not share activation of the Integrated Stress Response (a reporter of aberrant translation), we anticipate that effects on translation may be limited to specific regions or cell populations and to specific mRNAs within these cells. The current study provides the foundation for further work to address these questions.

      Reviewer #1 (Recommendations For The Authors):

      Below are a few comments, mostly regarding typographical errors, presentation, and clarity, that we believe would enhance this manuscript:

      On the heatmaps generated, it would be ideal to place "WT" before "KI," with "WT" on the left. This will maintain consistency with the rest of the manuscript, where "WT" conditions precede "KI" conditions, as observed in the bar graphs and dot plots.

      All heatmaps have been remade with WT on the left and KI on the right to maintain consistency throughout the manuscript. 

      Authors mentioned in several instances (Discussion Pg 19 Line 2, for instance) the analysis of changes in the "Pol II transcriptome." Is this a typographical error?

      The reference to the Pol II transcriptome is not a typographical error (Discussion Pg 19 Line 2). Here and elsewhere in the manuscript, we are distinguishing between changes to the Pol III transcriptome and the timing of subsequent changes to the Pol II transcriptome. The text has been edited to clarify this relationship in several places.

      (1) Introduction, Page 4, last paragraph.

      Analysis of the Pol III transcriptome reveals a common decrease in pre-tRNA and mature tRNA populations and few if any changes among other Pol III transcripts across multiple tissues. Analysis of the Pol II transcriptome reveals activation of the integrated stress response in cerebra but not in other surveyed tissues.

      (2) Results, page 8, 2nd paragraph

      To investigate the molecular changes to Pol III transcript levels caused by the Polr3a mutation and any secondary effects on the Pol II transcriptome, we initially focused on the cerebra of adult mice at P75.

      (3) Discussion, Page 19, second paragraph

      Pol III dysfunction and the reduction in the cerebral tRNA population at P42 coincides with behavioral deficits and precedes substantial downstream alterations in the Pol II transcriptome, which include induction of an innate immune response (IR) and an ISR, and indicators of neurodegeneration (i.e., activation of cell death pathways and loss of mitochondrial DNA). These findings suggest a causal role for the lower tRNA abundance and/or altered tRNA profile in disease progression.

      In supplementary figure 1, authors validated the expression of their systems using flow cytometry and observed a high level of recombination frequency in different tissue types. Can the flow cytometry data distinguish between cell types within the cerebrum (neurons/microglia/astrocytes)?

      The flow cytometry experiments reported in Supplementary Fig. S1 used a dual tdTomato-EGFP reporter to assess recombination. The cerebral and cerebellar samples were gated on fluorescence from endogenous expression of tdTomato (red), EGFP (green) and DAPI (blue) staining. In principle, flow cytometry could be used to distinguish between cell types within the cerebrum (neurons/microglia/astrocytes). However,  this would require (i) an antibody to a cell surface marker on the cell type of interest and (ii) a fluorescent probe conjugated to the primary antibody or a fluorescent secondary antibody that is spectrally well resolved from the emission spectra of tdTomato, eGFP and DAPI.

      Results section 1: Is there any particular reason why P28 was chosen as the commencement of tamoxifen injection?

      P28 was chosen so that any effect of the Polr3a mutation on development and differentiation would be limited in the tissues we examined. 

      Fig 1C: The number of asterisks does not match between the graph and the figure legend.

      Fig. 1C has been corrected to match the number of asterisks in the graph and figure legend.

      Results section 3:

      This section seemed a little brief, especially when compared to the depth of the succeeding sections. Authors can state in greater detail which behaviors were quantified. In S3A-C, my understanding is that the animals were placed in an open-field test. This procedure can be briefly mentioned in the methods, as well as in the main manuscript text.

      In the legends of S3, a bracket is missing for "(D-F)" on line 5. Additionally, the alignment of legends for each bar graph could be consistent for all graphs except under the condition of spatial constraint.

      Detailed methods pertaining to the measurement and calculation of home cage-like behaviors reported by the behavioral spectrometer have been added to the Methods section on Mouse Behavior. 

      In the Results, Figs. S3A-C show anxiety-like behaviors, which measure the number and duration of visits and the distance traveled in a 15 cm² central area of the arena. Figs. 2A-C show locomotor behaviors including Tracklength, Run sum and Still sum. The open field-like behavior is reported as total Tracklength in the behavioral spectrometer, i.e. the total distance travelled in the arena. This is now more clearly described in the main manuscript and the Methods section: “overall locomotor activity was decreased in Polr3a-tamKI mice as indicated by the reduced track length at P42, P49, P56 and P63 (Fig. 2A).”

      The legend of S3, now has the missing bracket "(D-F)" on line 5. 

      The legends within each bar graph are now consistent and aligned as much as spatial constraints allow.

      Results section 4:

      Similar to our earlier questions for S1, is it possible to distinguish samples derived from different cell types (neurons/glia)? In figure 4, this is mainly done post-hoc, based on the known gene expression. Maybe the authors could discuss this small limitation? In Fig S4C, the color contrast for the heatmap legend needs to be corrected.

      It is not possible to accurately distinguish different neural cell sub-types, such as different types of neurons or different types of oligodendrocytes, in bulk RNAseq. Hence, we have reported only high-confidence correlations based on known gene expression signatures (Fig. 4). We discuss only the data for which we can draw confident conclusions. The heatmap and legend in Fig. S4C have been amended.

      Results section 5:

      In figure S5A, the alignment of asterisk significance markers could be adjusted.

      Asterisks have been realigned in Fig. S5A

      Reviewer #2 (Recommendations For The Authors):

      Methods Section should include detailed procedure.

      A detailed description of the methods pertaining to the measurement and calculation of behaviors using the behavioral spectrometer has been added to the Methods section.

      Statistical tests should have detailed information

      Statistical tests are detailed in the Methods section “Statistical Analysis”. Additional details pertaining to calculations of behavioral data have been added to the “Mouse behavior” section of the Methods.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Weaknesses:

      The authors have clarified that the first features available for each patient have been used. However, they have not shown that these features did not occur before the time of post-stroke epilepsy. Explicit clarification of this should be performed.

      The data utilized in our analysis were collected during the first examination or test conducted after the patients' admission. We specifically excluded any patients with a history of epilepsy, ensuring that all cases of epilepsy identified in our study occurred after admission. Therefore, the features we analyzed were collected after the patients' admission but prior to the onset of post-stroke epilepsy.

      Reviewer #3 (Public review):

      Weaknesses:

      The writing of the article may be significantly improved.

      Although the external validation is appreciated, cross-validation to check robustness of the models would also be welcome.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step for ensuring the reliability and robustness of reported results, especially for datasets of limited size. We revised our code and ran a 5-fold cross-validation version; it gave little improvement because our model had already reached an AUC of 0.99. Given that we have more than 20000 records, we consider a 7:3 split of the dataset sufficient for training the model. We have uploaded the 5-fold cross-validation code, together with the ROC curves for the five test folds, to GitHub at https://github.com/conanan/lasso-ml/lasso_ml_cross_validation.ipynb as an external resource. We also trained the 5-fold average model; the results show some improvement, but it is not substantial because the best models are still the tree-based models.

      External validation results may be biased/overoptimistic, since the authors informed that "The external validation cohort focused more on collecting positive cases to examine the model's ability to identify positive samples", which may result in overoptimistic PPV and Sensitivity estimations. The specificity for the external validation set has not been disclosed.

      Thank you for your valuable feedback regarding the external validation results. We appreciate your concerns about potential bias and overoptimism in our estimations of positive predictive value (PPV) and sensitivity.

      To clarify, we have uploaded the code for external validation on GitHub at https://github.com/conanan/lasso-ml. The results indicate that the PPV is 0.95 and the specificity is 0.98.

      While we focused on collecting more positive cases due to their lower occurrence rate, this approach allows us to better evaluate the model's ability to predict positive samples, which is crucial in clinical settings. We believe that emphasizing positive cases enhances the model's utility for practical applications, so we consider a small degree of overoptimism acceptable.
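The reviewer's concern about enrichment can be made concrete with Bayes' rule: for fixed sensitivity and specificity, PPV rises with the share of positive cases in the cohort. The sketch below is illustrative only. The sensitivity value is a hypothetical placeholder (it is not reported in the response), the specificity is the 0.98 quoted above, and 0.043 is the PSE incidence cited elsewhere in the reviews.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value computed from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SPECIFICITY = 0.98          # value quoted for the external validation cohort
ASSUMED_SENSITIVITY = 0.90  # hypothetical placeholder, not a reported result

# Same classifier, different proportions of positive cases in the cohort.
for prevalence in (0.043, 0.20, 0.50):
    value = ppv(ASSUMED_SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={value:.3f}")
```

Under these assumptions, a PPV measured on a cohort enriched for positive cases would overstate the PPV expected at the population incidence, which is why reporting specificity and sensitivity alongside PPV is useful.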


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses 1:

      The methodology needs further consideration. The Discussion needs extensive rewriting.

      Thank you for your advice; we have revised the Discussion.

      Reviewer #2 (Public Review):

      Weaknesses 2:

      There are many typos and unclear statements throughout the paper.

      There are some issues with SHAP interpretation. SHAP in its default form, does not provide robust statistical guarantees of effect size. There is a claim that "SHAP analysis showed that white blood cell count had the greatest impact among the routine blood test parameters". This is a difficult claim to make.

      Thank you for your suggestion. The SHAP analysis is indeed only a means of interpreting the model. In our research, we compared the SHAP analysis with traditional statistical methods, such as regression analysis. We found the SHAP results to be consistent with the statistical results from the regression for variables such as white blood cell count (see Table 1). This agreement leads us to believe that the SHAP analysis provides reliable insights in this context.

      The Data Collection section is very poorly written, and the methodology is not clear.

      Thank you for your advice; we have revised the Data Collection section.

      There is no information about hyperparameter selection for models or whether a hyperparameter search was performed. Given this, it is difficult to conclude whether one machine learning model performs better than others on this task.

      Thank you for the advice on hyperparameter selection. We built the models with the sklearn, xgboost and lightgbm packages in Python 3.10 and previously left the default settings unchanged. This was not appropriate and may have led to less certain conclusions. We now carry out a grid search to select and optimize the hyperparameters, which improves the models. The best model is still the random forest (RF).
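A grid search of the kind mentioned in this reply could look like the following minimal sketch with scikit-learn. It is not the authors' code: the synthetic data, the parameter grid and the fold count are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the clinical dataset (hypothetical sizes).
X, y = make_classification(n_samples=400, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Hypothetical grid; a real search would cover more values and parameters.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)  # tuning only ever sees the training split

print("best parameters:", search.best_params_)
test_auc = search.score(X_test, y_test)  # ROC AUC of the refit best model
print("held-out AUC:", round(test_auc, 3))
```

Running the same search for every candidate model family puts the comparison between them on an equal footing, which is the point the reviewer raises.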

      The inclusion and exclusion criteria are unclear - how many patients were excluded and for what reasons?

      The selection procedure is shown in Figure 1. A total of 42079 records were retrieved from the stroke database, of which 24733 patients were diagnosed with new-onset ischemic stroke or lacunar stroke after we excluded hemorrhagic stroke (4565), history of stroke (2154), TIA (3570), stroke of unclear cause (561) and records missing important data (6496). We then excluded patients whose seizures might be attributed to other potential causes (brain tumor, intracranial vascular malformation, traumatic brain injury, etc.) (865), patients who had a history of seizures (152) or who died in hospital (1444), and patients who were lost to follow-up (no outpatient records and not reachable by phone) or who died within 3 months of the stroke (813). Finally, 21459 cases were included in this research.
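The exclusion counts listed above can be tallied to check that they reconcile with the reported totals. The short check below reads 24733 as the number remaining after the first five exclusions, which is an interpretation, although the arithmetic supports it. The category labels are paraphrased from the response.

```python
total_records = 42079

# Exclusions applied to the full stroke database.
first_stage = {
    "hemorrhagic stroke": 4565,
    "history of stroke": 2154,
    "TIA": 3570,
    "stroke of unclear cause": 561,
    "missing important data": 6496,
}

# Exclusions applied to the new-onset ischemic/lacunar stroke group.
second_stage = {
    "seizure attributable to other causes": 865,
    "seizure history": 152,
    "died in hospital": 1444,
    "lost to follow-up or died within 3 months": 813,
}

new_onset = total_records - sum(first_stage.values())
final_cohort = new_onset - sum(second_stage.values())

print("new-onset ischemic or lacunar stroke:", new_onset)
print("final cohort:", final_cohort)
```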

      There is no sensitivity analysis of the SMOTE methodology: How many synthetic data points were created, and how does the number of synthetic data points affect classification accuracy?

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning. The code is

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      Did the authors achieve their aims? Do the results support their conclusions?

      Yes, we have achieved some of our aims in predicting PSE, although some problems remain.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage.

      The data used in our analysis is from the first examination or test conducted after the patients' admission, retrieved from a PostgreSQL database. First, we extracted the initial admission date for patients admitted due to stroke. Then, we identified the nearest subsequent examination data for each of those patients.

      The SQL code is as follows:

      -- Pattern glosses: 梗死 / 梗塞 = infarction, 脑 = brain, 腔隙 = lacunar.
      SELECT TO_DATE(condition_start_date, 'DD-MM-YYYY') AS DATE
      FROM diagnosis
      WHERE person_id = {}
        AND (condition_name LIKE '%梗死%' OR condition_name LIKE '%梗塞%')
        AND (condition_name LIKE '%脑%' OR condition_name LIKE '%腔隙%')
      ORDER BY DATE
      LIMIT 1
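The extraction logic described in this reply (first stroke admission, then the nearest subsequent examination) can be sketched end to end with Python's standard-library sqlite3 module. This is an illustration under assumptions, not the authors' code: the `lab_result` table, its columns, the English condition names and the sample rows are hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diagnosis (person_id INTEGER, condition_name TEXT,
                        condition_start_date TEXT);
CREATE TABLE lab_result (person_id INTEGER, test_name TEXT,
                         test_date TEXT, value REAL);
INSERT INTO diagnosis VALUES
  (1, 'cerebral infarction', '2020-03-01'),
  (1, 'cerebral infarction', '2021-06-10');
INSERT INTO lab_result VALUES
  (1, 'wbc', '2020-02-20', 6.1),  -- before admission: must be ignored
  (1, 'wbc', '2020-03-02', 9.4),  -- first test after admission: keep
  (1, 'wbc', '2020-09-15', 7.0);  -- later test: must be ignored
""")

# For each patient and test, keep only the earliest result dated on or
# after the first stroke admission (ISO dates sort correctly as text).
FIRST_TEST_SQL = """
WITH first_admission AS (
    SELECT person_id, MIN(condition_start_date) AS admit_date
    FROM diagnosis
    WHERE condition_name LIKE '%infarction%'
    GROUP BY person_id
)
SELECT l.person_id, l.test_name, l.test_date, l.value
FROM lab_result AS l
JOIN first_admission AS a ON a.person_id = l.person_id
WHERE l.test_date = (
    SELECT MIN(l2.test_date)
    FROM lab_result AS l2
    WHERE l2.person_id = l.person_id
      AND l2.test_name = l.test_name
      AND l2.test_date >= a.admit_date
)
"""
rows = conn.execute(FIRST_TEST_SQL).fetchall()
print(rows)
```

To rule out temporal leakage explicitly, each selected `test_date` could additionally be compared with the patient's recorded seizure onset date, if one is available in the database.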

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. The external validation is certainly very important, but there have been some difficulties in reaching a perfect solution.  We have tried using open-source databases like the MIMIC database, but the data there does not fit our needs as closely as the records from our own hospital.  The MIMIC database lacks some of the key features we require, and also lacks the detailed patient follow-up information that is crucial for our analysis.   Given these limitations, we have decided to collect newer records from the same hospitals here in Chongqing.  We believe this will allow us to build a more comprehensive dataset to support robust external validation.  While it may not be a perfect solution, gathering this additional data from our local healthcare system is a pragmatic step forward.   Looking ahead, we plan to continue expanding this Chongqing-based dataset and report on the results of the greater external validation in the future.  We are committed to overcoming the challenges around data availability to strengthen the validity and generalizability of our research findings.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step for ensuring the reliability and robustness of reported results, especially for datasets of limited size. Because we have more than 20000 records, we consider a 7:3 split of the dataset sufficient for training the model. We revised our code and ran a 5-fold cross-validation version; it gave little improvement because our model had already reached an AUC of 0.99. We may use this technique in our next study if there are not enough cases.
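The reporting the reviewer asks for (mean score and confidence interval across folds) could be produced along the following lines. This is a minimal sketch on synthetic data using scikit-learn, not the authors' notebook. The t-based interval is one common approximation, and it treats the fold scores as independent, which they are not strictly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the clinical dataset (hypothetical sizes).
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

# Stratified folds keep the rare positive class represented in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
fold_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

mean_auc = fold_auc.mean()
sem = fold_auc.std(ddof=1) / np.sqrt(len(fold_auc))
T_CRIT_DF4 = 2.776  # two-sided 95% t critical value for 4 degrees of freedom
ci_low = mean_auc - T_CRIT_DF4 * sem
ci_high = mean_auc + T_CRIT_DF4 * sem

print("fold AUCs:", np.round(fold_auc, 3))
print(f"mean AUC {mean_auc:.3f}, approx. 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```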

      Additional context that might help readers

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      Thank you for your helpful advice; it greatly improves our draft. We have added an explanation of both plots. The force plot for the first patient shows the influence of each feature on that patient's prediction: a long APTT contributes most towards PSE, followed by the AST level and other features, whereas a low NIHSS score contributes in the opposite direction. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.

      Reviewer #3 (Public Review):

      Weaknesses 3:

      There are issues with the readability of the paper. Many abbreviations are not introduced properly and sometimes are written inconsistently. A lot of relevant references are omitted. The methodological descriptions are extremely brief and, sometimes, incomplete.

      Thank you for your advice; we have corrected these flaws.

      The dataset is not disclosed, and neither is the code (although the code is made available upon request). For the sake of reproducibility, unless any bioethical concerns impede it, it would be good to have these data disclosed.

      Thank you for your recommendations. We have made the code available on GitHub at https://github.com/conanan/lasso-ml. The data, however, are private and belong to the hospital. Access can be requested from the hospitals by contacting the corresponding author and specifying the purpose of the inquiry.

      Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.

      Thank you for your valuable advice. Performing n-fold cross-validation is crucial for ensuring the reliability and robustness of results, especially with limited datasets. However, since we have over 20,000 records, we believe that a 70:30 split for training and testing is sufficient.

      We revised our code and implemented 5-fold cross-validation, which provided minimal improvement, as our model has already achieved an AUC of 0.99. We plan to use this technique in future studies if we encounter fewer cases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My comments include two parts:

      (1) Methodology

      a-This study was based on multiple clinical indicators to construct a model for predicting the occurrence of PSE. It involved various multi-class indicators such as the affected cortical regions, locations of vascular occlusion, NIHSS scores, etc. Only using the SHAP index to explain the impact of multi-class variables on the dependent variable seems slightly insufficient. It might be worth considering the use of dummy variables to improve the model's accuracy.

      Thank you for the detailed feedback on the study methodology. The SHAP analysis is only a means of interpreting the model; we compared it with traditional statistics and found the two to be consistent, so we consider the SHAP analysis reliable in this research. We did use dummy variables, especially for the affected cortical regions and the locations of vascular occlusion; for example, if the frontal region is involved, the corresponding variable is 1. However, these variables have less impact in the machine learning model.

      b-The study used Lasso regression to select 20 features to build the model. How was the optimal number of 20 features determined?

      Lasso regression is a commonly used feature-screening method. Because we extracted information from the database and tried to include as many features as possible, the cross-validation curve of the lasso regression was optimal at 78 features, but this would make the model too complex. We therefore built models with 10, 15, 20, 25 and 30 features. With 20 features, model performance was good and the model remained relatively concise. Increasing the number of features contributed little to performance, whereas decreasing it degraded performance; for example, the AUC of the model with 15 features dropped below 0.95. We therefore selected 20 features.
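The selection procedure described in this reply (rank features by the absolute lasso coefficient, then compare models built on the top 10 to 30 features) can be sketched as follows. This is an assumption-laden illustration with scikit-learn on synthetic data. The feature count, the downstream classifier and the fold numbers are hypothetical, and it is not confirmed that the authors fitted the lasso in exactly this way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical feature matrix (hypothetical sizes).
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=42)
X_std = StandardScaler().fit_transform(X)  # put features on a common scale

# Fit lasso with a cross-validated penalty and rank features by |coefficient|.
lasso = LassoCV(cv=5, random_state=42).fit(X_std, y)
ranked = np.argsort(-np.abs(lasso.coef_))

# Compare candidate feature counts by cross-validated AUC of a downstream model.
auc_by_k = {}
for k in (10, 15, 20, 25, 30):
    clf = RandomForestClassifier(n_estimators=50, random_state=42)
    scores = cross_val_score(clf, X_std[:, ranked[:k]], y,
                             cv=3, scoring="roc_auc")
    auc_by_k[k] = scores.mean()

top_20 = ranked[:20]  # indices of the 20 features with the largest |coefficient|
print(auc_by_k)
```

In a full pipeline the ranking step would normally be repeated inside each training fold, so that the held-out data never influence which features are kept.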

      c-The study indicated that the incidence rate of PSE in the enrolled patients is 4.3%, showing a highly imbalanced dataset. If singly using the SMOTE method for oversampling, could this lead to overfitting?

      Thank you for the reminder; using SMOTE alone for oversampling is not appropriate. We have now replaced SMOTE with the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning: SMOTE first oversamples the minority class, and ENN then undersamples to remove possible noise and duplicate samples. The code is

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      (2) Clinical aspects:

      Line 8, history of ischemic stroke, this is misexpression, could be: diagnosis of ischemic stroke.

      Line 8, several hospitals, should be more exact; how many?

      Line 74 indicates that the data are from a single centre, this should be clarified.

      Line 4 data collection: The criteria read unclear; please clarify further.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Line 110, lab parameters: Why is there no blood glucose?

      Because blood glucose fluctuates greatly in many patients and is easily affected by drugs or diet, we consulted experts and instead used HbA1c, which is more stable, as the reference index.

      Line 295, The author indicated that data lost; this should be clarified in the results part, and further, the treatment of missing data should be clarified in the method part.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      I hope to see a table of the cohort's baseline characters. The discussion needs extensive rewriting; the author seems to be swinging from the stroke outcome and the seizure, sometimes losing the target.

      Figure 1 shows the patient selection procedure. Table 1 contains the cohort's baseline characteristics.

      The discussion moves between stroke outcome and seizure because few articles directly address predicting epilepsy from the relevant indicators, whereas many more address prognosis. We therefore treat epilepsy as an important factor in prognosis and discuss it in that broader context; otherwise we could not find enough articles to discuss.

      Reviewer #2 (Recommendations For The Authors):

      There are typos and examples of text that are not clear, including:

      "About the nihss score, the higher the nihss score, the more likely to be PSE, nihss score has a third effect just below white blood cell count and D-dimer."

      "and only 8 people made incorrect predictions, demonstratijmng a good predictive ability of the model."

      "female were prone to PSE"

      " Waafi's research"

      "One-heat' (should be one-hot)

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The Data Collection section is poorly written, and the methodology is not clear. It would be much more appropriate to include a table of all features used and an explanation of what these features involve. It would also be useful to see the mean values of these features to assess whether the feature values are reasonable for the dataset.

      Thank you for the reminder. All data come from the first examination or test after admission and were retrieved from a PostgreSQL database. We first extracted the date of each patient's first admission for stroke and then extracted information from the examination nearest to that admission. Because we extracted the data programmatically with SQL rather than manually, we could collect as many features as possible instead of only those reported previously. All features used, with their mean ± SD, are listed in Table 1.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage. I would need this clarified before believing the authors achieved their claims of building a predictive model.

      All relevant indicator results come from the first examination after admission; their means and standard deviations are listed in Table 1 in the statistical analysis section.

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. External validation is very important, but a perfect one is difficult to achieve. We tried open-source databases such as MIMIC, but these data do not meet our needs because they contain fewer features than our hospital records and lack follow-up of the relevant patients. In the end, we collected newer records from the same hospitals in Chongqing; we will collect more and report a larger external validation in the future.

      For greater certainty on all reported results, It would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step for ensuring the reliability and robustness of reported results, especially for datasets of limited size. Because we have more than 20000 records, we consider a 7:3 split of the dataset sufficient for training the model. We revised our code and ran a 5-fold cross-validation version; it gave little improvement, but we will use this technique in our next study.

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      This greatly improves our draft. We have added an explanation of both plots. The force plot for the first patient shows the influence of each feature on that patient's prediction: a long APTT contributes most towards PSE, followed by the AST level and other features, whereas a low NIHSS score lowers the final result. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.

      Reviewer #3 (Recommendations For The Authors):

      Abbreviations should not be defined in the abstract (or only in the abstract).

      Please explicit what are the purposes of the study you are referring to in "Currently, most studies utilize clinical data to establish statistical models, survival analysis and cox regression."

      Authors affirm: "there is still a relative scarcity of research on PSE prediction, with most studies focusing on the analysis of specific or certain risk factors." This statement is especially curious since the current study uses risk factors as predictors.

      It is not clear to me what the authors mean by "No study has proposed or established a more comprehensive and scientifically accurate prediction model." The authors do not summarize the statistical parameters of previously reported models, or other relevant data to assess coverage or validity (maybe including a Table summarizing such information would be appropriate). In any case, I would try to omit statements that imply, to some extent, discrediting previous studies without sufficient foundation.

      "antiepileptic drugs" is an outdated name. Please use "antiseizure medications"

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The authors say regarding missing data that they "filled the data of the remaining indicators with missing values of more than 1000 cases by random forest algorithm". Please clarify what you mean by "of more than 1000 cases." Also, provide details on the RF model used to fill in missing data.

      Thank you for the reminder. The phrase "of more than 1000 cases" was an error and has been corrected. The procedure was as follows. We first collected the values of all laboratory indicators from the first tests after stroke admission (every patient admitted for stroke undergoes routine blood tests, liver and kidney function tests, and so on), excluded indicators with more than 10% missing values, and filled the missing values of the remaining indicators with a random forest algorithm using default parameters. We iterate over all features, starting with the one with the fewest missing values (since filling it requires the least accurate information). When filling a feature, missing values in the other features are replaced with 0. Each time a regression prediction is completed, the predicted values are written back into the original feature matrix before the next feature is filled. Once all features have been processed, the imputation is complete.
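The imputation loop described in this reply can be sketched as follows. This is a reconstruction from the prose rather than the authors' code. It assumes NumPy and scikit-learn, uses a hypothetical function name `rf_impute`, and expects every column to have at least some observed values (consistent with the 10% missingness cut-off).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_impute(X: np.ndarray, random_state: int = 0) -> np.ndarray:
    """Fill NaNs column by column with random-forest regression."""
    X = X.astype(float).copy()
    n_missing = np.isnan(X).sum(axis=0)
    for j in np.argsort(n_missing):  # fewest missing values first
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        # Temporarily zero-fill missing values in the other features.
        others = np.delete(X, j, axis=1)
        others = np.where(np.isnan(others), 0.0, others)
        model = RandomForestRegressor(random_state=random_state)  # defaults
        model.fit(others[~miss], X[~miss, j])
        # Write predictions back so later features can use them.
        X[miss, j] = model.predict(others[miss])
    return X


# Small synthetic demonstration with about 5% of entries removed at random.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
data[rng.random(data.shape) < 0.05] = np.nan
filled = rf_impute(data)
print("remaining NaNs:", int(np.isnan(filled).sum()))
```

Observed values are left unchanged; only the originally missing entries are replaced by predictions.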

      Please specify what do you mean by negative group and positive group, Avoid tacit assumptions.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Please provide more details (and references) on the smote oversampling method. Indicate any relevant parameters/hyperparameters.

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning. The code is

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      The methodology is presented in an extremely succinct and non-organic manner (e.g., (Model building) Select the 20 features with the largest absolute value of LASSO." Please try to improve the narrative.

      Lasso regression is a commonly used feature-screening method. Because we extracted information from the database and tried to include as many features as possible, the cross-validation curve of the lasso regression was optimal at 78 features, but this would make the model too complex. We therefore built models with 10, 15, 20, 25 and 30 features. With 20 features, model performance was good and the model remained relatively concise. Increasing the number of features contributed little to performance, whereas decreasing it degraded performance; for example, the AUC of the model with 15 features dropped below 0.95. We therefore selected 20 features.

      Many passages of the text need references. For example, those that refer to Levene test, Welch's t-test, Brier score, Youden index, and many others (e.g., NIHSS score). Please revise carefully.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      "Statistical details of the clinical characteristics of the patients are provided in the table." Which table? Number?

      Thank you for the reminder; we have revised the draft and corrected these errors. The statistics are in Table 1.

      Many abbreviations are not properly presented and defined in the text, e.g., wbc count, hba1c, crp, tg, ast, alt, bilirubin, bua, aptt, tt, d_dimer, ck. Whereas I can guess the meaning, do not assume everyone will. Avoid assumptions.

      ROC is sometimes written "ROC" and others, "roc." The same happens for PPV/ppv, and many other words (SMOTE; NIHSS score, etc.).

      Please rephrase "ppv value of random forest is the highest, reaching 0.977, which is more accurate for the identification of positive patients(the most important function of our models).". PPV always refer to positive predictions that are corroborated, so the sentences seem redundant.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      What do you mean by "Complex algorithms". Please try to be as explicit as possible. The text looks rather cryptic or vague in many passages.

      Thank you for the reminder; "Complex algorithms" has been replaced with "machine learning".

      The text needs a thorough English language-focused revision, since the sense of some sentences is really misleading. For instance "only 8 people made incorrect predictions,". I guess the authors try to say that the best algorithm only mispredicted 8 cases since no people are making predictions here. Also, regarding that quote... Are the authors still speaking of the results of the random forest model, which was said to be one of the best performances?

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The authors say that they used, as predictors "comprehensive clinical data, imaging data, laboratory test data, and other data from stroke patients". However, the total pool of predictors is not clear to me at this point. Please make it explicit and avoid abbreviations.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Although the authors say that their code is available upon request, I think it would be better to have it published in an appropriate repository.

      Thank you for the reminder; our code is available at https://github.com/conanan/lasso-ml.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigated how the presence of interspecific introgressions in the genome affects the recombination landscape. This research was intended to inform about genetic phenomena influencing the evolution of introgressed regions, although it should be noted that the research itself is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. In this work, yeast hybrids with large (from several to several dozen percent of the chromosome length) introgressions from another yeast species were crossed. Then, the products of meiosis were isolated and sequenced, and on this basis, the genome-wide distribution of both crossovers (COs) and noncrossovers (NCOs) was examined. Carrying out the analysis at different levels of resolution, it was found that in the regions of introgression, there is a very significant reduction in the frequency of COs and a simultaneous increase in the frequency of NCOs. Moreover, it was confirmed that introgressions significantly limit the local shuffling of genetic information, and NCOs are only able to slightly contribute to the shuffling, thus they do not compensate for the loss of CO recombination.

      Strengths:

      - Previously, experiments examining the impact of SNP polymorphism on meiotic recombination were conducted either on the scale of single hotspots or the entire hybrid genome, but the impact of large introgressed regions from another species was not examined. Therefore, the strength of this work is its interesting research setup, which allows for providing data from a different perspective.

      - Good quality genome-wide data on the distribution of CO and NCO were obtained, which could be related to local changes in the level of polymorphism.

      Weaknesses:

      (1)  The research is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. Moreover, meiosis is stimulated in hybrids in which introgressions occur in a heterozygous state, which is a very unlikely situation in nature. Therefore, I see the main value of the work in providing information on the CO/NCO decision in regions with high sequence diversification, but not in the context of evolution.

      While we are indeed only examining recombination in a single generation, we respectfully disagree that our results aren't relevant to evolutionary processes. The broad goals of our study are to compare recombination landscapes between closely related strains, and we highlight dramatic differences between recombination landscapes. These results add to a body of literature that seeks to understand the existence of variation in traits like recombination rate, and how recombination rate can evolve between populations and species. We show here that the presence of introgression can contribute to changes in recombination rate measured in different individuals or populations, which has not been previously appreciated. We furthermore show that introgression can reduce shuffling between alleles on a chromosome, which is recognized as one of the most important determinants for the existence and persistence of sexual reproduction across all organisms. As we describe in our introduction and conclusion, we see our experimental exploration of the impacts of introgression on the recombination landscape as complementary to studies inferring recombination and introgression from population sequencing data and simulations. There are benefits and challenges to each approach, but both can help us better understand these processes. In regards to the utility of exploring heterozygous introgression, we point out that introgression is often found in a heterozygous state (including in modern humans with Neanderthal and/or Denisovan ancestry). Introgression will always be heterozygous immediately after hybridization, and depending on the frequency of gene flow into the population, the level of inbreeding, selection against introgression, etc., introgression will typically be found as heterozygous.

      - The work requires greater care in preparing informative figures and, more importantly, re-analysis of some of the data (see comments below).

      More specific comments:

      (1) The authors themselves admit that the detection of NCO, due to the short size of conversion tracts, depends on the density of SNPs in a given region. Consequently, more NCOs will be detected in introgressed regions with a high density of polymorphisms compared to the rest of the genome. To investigate what impact this has on the analysis, the authors should demonstrate that the efficiency of detecting NCOs in introgressed regions is not significantly higher than the efficiency of detecting NCOs in the rest of the genome. If it turns out that this impact is significant, analyses should be presented proving that it does not entirely explain the increase in the frequency of NCOs in introgressed regions.

      We conducted a deeper exploration of the effect of marker resolution on NCO detection by randomly removing different proportions of markers from introgressed regions of the fermentation cross in order to simulate different marker resolutions from non-introgressed regions. We chose proportions of markers that would simulate different quantiles of the resolution of non-introgressed regions and repeated our standard pipeline in order to compare our NCO detection at the chosen marker densities. More details of this analysis have been added to the manuscript (lines 188-199, 525-538). We confirmed the effect of marker resolution on NCO detection (as reported in the updated manuscript and new supplementary figures S2-S10, new Table S10) and decided to repeat our analyses on the original data with a more stringent correction. For this we chose our observed average tract size for NCOs in introgressed regions (550bp), which leads to a far more conservative estimate of NCO counts (As seen in the updated Figure 2 and Table 2). This better accounts for the increased resolution in introgressed regions, and while it's possible to be more stringent with our corrections, we believe that further stringency would be unreasonable. We also see promising signs that the correction is sufficient when counting our CO and NCO events in both crosses, as described in our response to comment 39 (response to reviewer #3).
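
      As an illustration only, the marker-thinning idea described above can be sketched in a few lines. This is not the pipeline used in the manuscript: the function names, the evenly spaced toy markers, and the fixed random seed are assumptions made for the example. The three-marker threshold mirrors the filter mentioned in the technical points of this response.

```python
import random


def thin_markers(positions, keep_fraction, seed=0):
    """Randomly keep a fraction of marker positions (hypothetical helper).

    Returns the retained positions in ascending order.
    """
    rng = random.Random(seed)
    n_keep = round(len(positions) * keep_fraction)
    return sorted(rng.sample(positions, n_keep))


def is_detectable(tract_start, tract_end, markers, min_markers=3):
    """A conversion tract is called only if it covers at least min_markers markers."""
    covered = sum(tract_start <= m <= tract_end for m in markers)
    return covered >= min_markers


# Toy data: one marker every 20 bp across a 1 kb introgressed segment.
dense = list(range(0, 1001, 20))
sparse = thin_markers(dense, keep_fraction=0.25)

print(len(dense), len(sparse))
print(is_detectable(100, 200, dense))
```

      Repeating the detection step on the thinned marker set, at several values of `keep_fraction`, gives a rough picture of how many tracts would be missed at the lower marker densities typical of non-introgressed regions.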

      (2) CO and NCO analyses performed separately for individual regions rarely show statistical significance (Figures 3 and 4). I think that the authors, after dividing the introgressed regions into non-overlapping windows of 100 bp (I suggest also trying 200 bp, 500 bp, and 1kb windows), should combine the data for all regions and perform correlations to SNP density in each window for the whole set of data. Such an analysis has a greater chance of demonstrating statistically significant relationships. This could replace the analysis presented in Figure 3 (which can be moved to Supplement). Moreover, the analysis should also take into account indels.

      We're uncertain of what is being requested here. If the comment refers to the effect of marker density on NCO detection, we hope the response to comment 2 will help resolve this comment as well. Otherwise, we ask for some clarification so that we may correct or revise as appropriate.

      (3) In Arabidopsis, it has been shown that crossover is stimulated in heterozygous regions that are adjacent to homozygous regions on the same chromosome (http://dx.doi.org/10.7554/eLife.03708.001, https://doi.org/10.1038/s41467-022-35722-3).

      This effect applies only to class I crossovers, and is reversed for class II crossovers (https://doi.org/10.15252/embj.2020104858, https://doi.org/10.1038/s41467-023-42511-z). This research system is very similar to the system used by the authors, although it likely differs in the level of DNA sequence divergence. The authors could discuss their work in this context.

      We thank the reviewer for sharing these references. We have added a discussion of our work in the context of these findings in the Discussion, lines 367-376.

      Reviewer #2 (Public Review):

      Summary:

      Schwartzkopf et al characterized the meiotic recombination impact of highly heterozygous introgressed regions within the budding yeast Saccharomyces uvarum, a close relative of the canonical model Saccharomyces cerevisiae. To do so, they took advantage of the naturally occurring Saccharomyces bayanus introgressions specifically within fermentation isolates of S. uvarum and compared their behavior to the syntenic regions of a cross between natural isolates that do not contain such introgressions. Analysis of crossover (CO) and noncrossover (NCO) recombination events shows both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency. These results strongly support the hypothesis that DNA sequence polymorphism inhibits CO formation, and has no or much weaker effects on NCO formation. Eventually, the authors show that the presence of introgressions negatively impacts "r", the parameter that reflects the probability that a randomly chosen pair of loci shuffles their alleles in a gamete.

      The authors chose a sound experimental setup that allowed them to directly compare recombination properties of orthologous syntenic regions in an otherwise intra-specific genetic background. The way the analyses have been performed looks right, although this reviewer is unable to judge the relevance of the statistical tests used. Eventually, most of their results which are elegant and of interest to the community are present in Figure 2.

      Strengths:

      Analysis of crossover (CO) and noncrossover (NCO) recombination events is compelling in showing both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency.

      Weaknesses:

      The main weaknesses refer to a few text issues and a lack of discussion about the mechanistic implications of the present findings.

      - Introduction

      (1) The introduction is rather long. I suggest specifically referring to "meiotic" recombination (line 71) and to "meiotic" DSBs (line 73) since recombination can occur outside of meiosis (ie somatic cells).

      We agree and have condensed the introduction to be more focused. We also made the suggested edits to include “meiotic” when referring to recombination and DSBs.

      (2) From lines 79 to 87: the description of recombination is unnecessarily complex and confusing. I suggest the authors simply remind that DSB repair through homologous recombination is inherently associated with a gene conversion tract (primarily as a result of the repair of heteroduplex DNA by the mismatch repair (MMR) machinery) that can be associated or not to a crossover. The former recombination product is a crossover (CO), the latter product is a noncrossover (NCO) or gene conversion. Limited markers may prevent the detection of gene conversions, which erase NCO but do not affect CO detection.

      We changed the language in this section to reflect the reviewer’s suggestions.

      (3) In addition, "resolution" in the recombination field refers to the processing of a double Holliday junction containing intermediates by structure-specific nucleases. To avoid any confusion, I suggest avoiding using "resolution" and simply sticking with "DSB repair" all along the text.

      We made the suggested correction throughout the paper.

      (4) Note that there are several studies about S. cerevisiae meiotic recombination landscapes using different hybrids that show different CO counts. In the introduction, the authors refer to Mancera et al 2008, a reference paper in the field. In this paper, the hybrid used showed ca. 90 CO per meiosis, while their reference to Liu et al 2018 in Figure 2 shows less than 80 COs per meiosis for S. cerevisiae. This shows that it is not easy to come up with a definitive CO count per meiosis in a given species. This needs to be taken into account for the result section line 315-321.

      This is an excellent point. We added this context in the results (lines 180-187).

      (5) In line 104, the authors refer to S. paradoxus and mention that its recombination rate is significantly different from that of S. cerevisiae. This is inaccurate since this paper claims that the CO landscape is even more conserved than the DSB landscape between these two species, and they even identify a strong role played by the subtelomeric regions. So, the discussion about this paper cannot stand as it is.

      We agree with the reviewer's point. We also found that the entire paragraph was unnecessary, so it and the sentence in question have been removed.

      (6) Line 150, when the authors refer to the anti-recombinogenic activity of the MMR, I suggest referring to the published work from Martini et al 2011 rather than the not-yet-published work from Copper et al 2021, or both, if needed.

      Added the suggested citation.

      Results

      (7) The clear depletion in CO and the concomitant increase in NCO within the introgressed regions strongly suggest that DNA sequence polymorphism triggers CO inhibition but does not affect NCO or to a much lower extent. Because most CO likely arises from the ZMM pathway (CO interference pathway mainly relying on Zip1, 2, 3, 4, Spo16, Msh4, 5, and Mer3) in S. uvarum as in S. cerevisiae, and because the effect of sequence polymorphism is likely mediated by the MMR machinery, this would imply that MMR specifically inhibits the ZMM pathway at some point in S. uvarum. The weak effect or potential absence of the effect of sequence polymorphism on NCO formation suggests that heteroduplex DNA tracts, at least the way they form during NCO formation, escape the anti-recombinogenic effect of MMR in S. uvarum. A few comments about this could be added.

      We have added discussion and citations regarding the biased repair of DSB to NCO in introgression, lines 380-386.

      (8) The same applies to the fact that the CO number is lower in the natural cross compared to the fermentation cross, while the NCO number is the same. This suggests that under similar initiating Spo11-DSB numbers in both crosses, the decrease in CO is likely compensated by a similar increase in inter-sister recombination.

      Thank you to the reviewer for this observation. We agree that this could explain some differences between the crosses.

      (9) Introgressions represent only 10% of the genome, while the decrease in CO is at least 20%. This is a bit surprising especially in light of CO regulation mechanisms such as CO homeostasis that tends to keep CO constant. Could the authors comment on that?

      We interpret these results to reflect two underlying mechanisms. First, the presence of heterozygous introgression does reduce the number of COs. Second, we believe the difference in COs reflects variation in recombination rate between strains. We note that CO homeostasis need not apply across different genetic backgrounds. Indeed, recombination rate is appreciated to significantly differ between strains of S. cerevisiae (Raffoux et al. 2018), and recombination rate variation has been observed between strains/lines/populations in many different species including Drosophila, mice, humans, Arabidopsis, maize, etc. We reference S. cerevisiae strain variability in the Introduction lines 128-130, and have added context in the Results lines 180-187, and Discussion lines 343-350.

      (10) Finally, the frequency of NCOs in introgressed regions is about twice the frequency of CO in non-introgressed regions. Both CO and NCO result from Spo11-initiating DSBs.

      This suggests that more Spo11-DSBs are formed within introgressed regions and that such DSBs specifically give rise to NCO. Could this be related to the lack of homolog engagement which in turn shuts down Spo11-DSB formation as observed in ZMM mutants by the Keeney lab? Could this simply result from better detection of NCO in introgressed regions related to the increased marker density, although the authors claim that NCO counts are corrected for marker resolution?

      The effect noted by the reviewer remains despite the more conservative correction for marker density applied to NCO counts (as described in the response to Reviewer 1, comment #2). Given that CO+NCO counts in introgressed regions are not statistically different between crosses, it is likely that these regions are simply predisposed to a higher rate of DSBs than the rest of the genome. This is an interesting observation, however, and one that we would like to further explore in future work.

      (11) What could be the explanation for chromosome 12 to have more shuffling in the natural cross compared to the fermentation cross which is deprived of the introgressed region?

      We added this text to the Results, lines 323-327, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      Technical points:

      (12) In line 248, the authors removed NCO with fewer than three associated markers.

      What is the rationale for this? Is the genotyping strategy not reliable enough to consider events with only one or two markers? NCO events can be rather small and even escape detection due to low local marker density.

      We trust the genotyping strategy we used, but chose to be conservative in our detection of NCOs to account for potential sequencing biases.

      (13) Line 270: The way homology is calculated looks odd to this reviewer, especially the meaning of 0.5 homology. A site is either identical (1 homology) or not (0 homology).

      We've changed the language to better reflect what we are calculating (diploid sequence similarity; see comment #28). Essentially, the metric is a probability that two randomly selected chromatids--one from each parent--will share the same nucleotide at a given locus (akin to calculating the probability of homozygous offspring at a single locus). We average it along a segment of the genome to establish an expected sequence similarity if/when recombination occurs in that segment.
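
      A minimal sketch of this metric, under stated assumptions, is given below. Each parent's genotype at a site is represented as a short string of alleles (for example `"AG"`), which is a hypothetical encoding chosen for the example and not the data format used in the manuscript. The per-site value is the probability that one allele drawn at random from each parent matches, and the segment value is the mean over sites.

```python
from collections import Counter


def site_similarity(geno_a, geno_b):
    """Probability that one random allele from each parent is identical at a site."""
    freq_a = Counter(geno_a)
    freq_b = Counter(geno_b)
    return sum(
        (freq_a[allele] / len(geno_a)) * (freq_b[allele] / len(geno_b))
        for allele in freq_a
    )


def segment_similarity(genos_a, genos_b):
    """Mean per-site similarity along a segment (lists of equal length)."""
    pairs = list(zip(genos_a, genos_b))
    return sum(site_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy segment of four sites.
parent_a = ["AA", "AA", "AG", "AA"]
parent_b = ["AA", "GG", "AG", "AG"]
print([site_similarity(a, b) for a, b in zip(parent_a, parent_b)])
print(segment_similarity(parent_a, parent_b))
```

      Under this reading, a per-site value of 0.5 arises whenever one parent is heterozygous at the site, which is why the metric is not restricted to 0 or 1.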

      (14) Line 365: beware that the estimates are for mitotic mismatch repair (MMR). Meiotic MMR may work differently.

      We removed the citation that refers exclusively to mitotic recombination. The statement regarding meiotic recombination is otherwise still reflective of results from Chen & Jinks-Robertson.

      (15) Figure 1: there is no mention of potential 4:0 segregations. Did the authors find no such pattern? If not, how did they consider them?

      The program we used to call COs and NCOs (ReCombine's CrossOver program) can detect such patterns, but none were detected in our data.

      Reviewer #3 (Public Review):

      When members of two related but diverged species mate, the resulting hybrids can produce offspring where parts of one species' genome replace those of the other. These "introgressions" often create regions with a much greater density of sequence differences than are normally found between members of the same species. Previous studies have shown that increased sequence differences, when heterozygous, can reduce recombination during meiosis specifically in the region of increased difference. However, most of these studies have focused on crossover recombination, and have not measured noncrossovers. The current study uses a pair of Saccharomyces uvarum crosses: one between two natural isolates that, while exhibiting some divergence, do not contain introgressions; the other is between two fermentation strains that, when combined, are heterozygous for 9 large regions of introgression that have much greater divergence than the rest of the genome. The authors wished to determine if introgressions differently affected crossovers and noncrossovers, and, if so, what impact that would have on the gene shuffling that occurs during meiosis.

      (1) While both crossovers and noncrossovers were measured, assessing the true impact of increased heterology (inherent in heterozygous introgressions) is complicated by the fact that the increased marker density in heterozygous introgressions also increases the ability to detect noncrossovers. The authors used a relatively simple correction aimed at compensating for this difference, and based on that correction, conclude that, while as expected crossovers are decreased by increased sequence heterology, counter to expectations noncrossovers are substantially increased. They then show that, despite this, genetic shuffling overall is substantially reduced in regions of heterozygous introgression. However, it is likely that the correction used to compensate for the effect of increased sequence density is defective, and has not fully compensated for the ascertainment bias due to greater marker density. The simplest indication of this potential artifact is that, when crossover frequencies and "corrected" noncrossover frequencies are taken together, regions of introgression often appear to have greater levels of total recombination than flanking regions with much lower levels of heterology. This concern seriously undercuts virtually all of the novel conclusions of the study. Until this methodological concern is addressed, the work will not be a useful contribution to the field.

      We appreciate this concern. Please see response to comments #2 and #38. We further note that our results depicted in Figures 3 and 4 are not reliant on any correction or comparison with non-introgressed regions, and thus we consider our results regarding sequence similarity and its effect on the repair of DSBs, and regarding the amount of genetic shuffling with and without introgression, to be novel and important observations for the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 149 - this sentence refers to a mixture of papers reporting somatic or meiotic recombination and as these processes are based on different crossover pathways, this should not be mixed. For example, it is known that in Arabidopsis MSH2 has a pro-crossover function during meiotic recombination.

      Corrected

      (2) What is unclear to me is how the crosses are planned. Line 308 shows that there were only two crosses (one "natural" and one "fermentation"), but I understand that this is a shorthand and in fact several (four?) different strains were used for the "fermentation cross". At least that's what I concluded from Fig. 1B and its figure caption. This needs to be further explained. Were different strains used for each fermentation cross, or was one strain repeated in several crosses? In Figure 1, it would be worth showing, next to the panel showing "fermentation cross", a diagram of how "natural cross" was performed, because as I understand it, panel A illustrates the procedure common to both types of crosses, and not for "natural cross".

      We thank the reviewer for drawing our attention to confusion about how our crosses were created. We performed two crosses, as depicted in Figure 1A. The fermentation cross is a single cross from two strains isolated from fermentation environments. The natural cross is a single cross from two strains isolated from a tree and insect. Table S1 and the methods section "Strain and library construction" describe the strains used in more detail. We modified Figure 1 and the figure legend to help clarify this. See also response to comment #37.

      (3) The authors should provide a more detailed characterization of the genetic differences between chromosomes in their hybrids. What is the level of polymorphism along the S. uvarum chromosomes used in the experiments? Is this polymorphism evenly distributed? What are the differences in the level of polymorphism for individual introgressions? Theoretically, this data should be visible in Figure 2D, but this figure is practically illegible in the present form (see next comment).

      As suggested, we remade Figure 2D to only include chromosomes with an introgression present, and moved the remaining chromosomes to the supplements (Figure S11). The patterns of markers (which are fixed differences between the strains in the focal cross) should be more clear now. As we detail in the Methods line 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression).

      (4) Figure 2D should be prepared more clearly, I would suggest stretching the chromosomes, otherwise, it is difficult to see what is happening in the introgression regions for CO and NCO (data for SNPs are more readable). Maybe leave only the chromosomes with introgressions and transfer the rest to the supplement?

      See previous comment.

      (5) How are the Y scales defined for Figure 2D?

      Figure 2D now includes units for the y-axis.

      (6) Are increases in CO levels in the fermentation cross observed at the border with introgressions? This would indicate local compensation for recombination loss in the introgressed regions, similar to that often observed for chromosomal inversions.

      We see no evidence of an increase in CO levels at the borders of introgressions, either through visual inspection or by comparing the average CO rate in all fermentation windows to that of windows at the edges of introgressions. This is included in the Discussion lines 360-366, "While we are limited in our interpretations by only comparing two crosses (one cross with heterozygous introgression and one without introgression), these results are in line with findings in inversions, where heterozygotes show sharp decreases in COs, but the presence of NCOs in the inverted region (Crown et al., 2018; Korunes & Noor, 2019). However, unlike heterozygous inversions where an increase in COs is observed on freely recombining chromosomes (the inter-chromosomal effect), we do not see an increase in COs on the borders flanking introgression or on chromosomes without introgression."

      (7) Line 336 - "We find positive correlations between CO counts..." - you should indicate here that between fermentation and natural crosses, it was quite hard for me to understand what you calculated.

      We corrected the language as suggested.

      (8) The term "homology" usually means "having a common evolutionary origin" and does not specify the level of similarity between sequences, thus it cannot be measured. It is used incorrectly throughout the manuscript (also in the intro). I would use the term "similarity" to indicate the degree of similarity between two sequences.

      We corrected the language as suggested throughout the document.

      (9) Paragraph 360 and Figure 3 - was the "sliding window" overlapping or non-overlapping?

      We added clarifying language to the text in both places. We use a 101bp sliding window with 50bp overlaps.
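
      One reading of this windowing scheme is sketched below, assuming that consecutive 101 bp windows share 50 bp, so that each window starts 51 bp after the previous one. The function name and the half-open coordinate convention are assumptions for the example. If the intended scheme is instead a window centred on each base with 50 bp on either side, the step would be 1 bp rather than 51 bp.

```python
def sliding_windows(segment_length, size=101, overlap=50):
    """Return half-open (start, end) windows of `size` bp sharing `overlap` bp."""
    step = size - overlap
    return [
        (start, start + size)
        for start in range(0, segment_length - size + 1, step)
    ]


# A 203 bp toy segment is tiled by three windows.
for window in sliding_windows(203):
    print(window)
```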

      (10) Line 369 - what is "...the proportion of bases that are expected to match between the two parent strains..."?

      We clarified the language in this location, and hopefully changes associated with the comment about sequence similarity will make the comment even clearer in context.

      (11) Line 378 - should it refer to Figure S1 and not Figure 4?

      Corrected.

      (12) Line 399 - should refer to Figure 4, not Figure 5.

      Corrected

      (13) Line 444-449 - the analysis of loss of shuffling in the context of the location of introgression on the chromosome should be presented in the result section.

      We shifted the core of the analysis to the results, while leaving a brief summary in the discussion.

      (14) The authors should also take into account the presence of indels in their analyses, and they should be marked in the figures, if possible.

      We filtered out indels in our variant calling. However, we did analyze our crosses for the presence of large insertions and deletions (Table S2), which can obscure true recombination rates, and found that they were not an issue in our dataset.

      Reviewer #2 (Recommendations For The Authors):

      This reviewer suggests that the authors address the different points raised in the public review.

      (1) This reviewer would like to challenge the relevance of the r-parameter in light of chromosome 12 which has no introgression and still a strong depletion in r in the fermentation cross.

      We added this text to the Results, lines 377-381, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      (2) This reviewer insists on making sure that NCO detection is unaffected by the marker density, notably in the highly polymorphic regions, to unambiguously support Figure 1C.

      We've changed our correction for resolution to be more aggressive (see response to comment #2), and believe we have now adequately adjusted for marker density (see response to comment #38).

      Reviewer #3 (Recommendations For The Authors):

      I regret using such harsh language in the public review, but in my opinion, there has been a serious error in how marker densities are corrected for, and, since the manuscript is now public, it seems important to make it clear in public that I think that the conclusions of the paper are likely to be incorrect. I regret the distress that the public airing of this may cause. Below are my major concerns:

      (1) The paper is written in a way that makes it difficult to figure out just what the sequence differences are within the crosses. Part of this is, to be frank, the unusual way that the crosses were done, between more than one segregant each from two diploids in both natural and fermentation cases. I gather, from the homology calculations description, that each of these four diploids, while largely homozygous, contained a substantial number of heterozygosities, so individual diploids had different patterns of heterology. Is this correct? And if so, why was this strategy chosen? Why not start with a single diploid where all of the heterologies are known? Why choose to insert this additional complication into the mix? It seems to me that this strategy might have the perverse effect of having the heterology due to the polymorphisms present in one diploid affect (by correction) the impact of a noncrossover that occurs in a diploid that lacks the additional heterology. If polymorphic markers are a small fraction of total markers, then this isn't such a great concern, but I could not find the information anywhere in the manuscript. As a courtesy to the reader, please consider providing at the beginning some basic details about the starting strains-what is the average level of heterology between natural A and natural B, and what fraction of markers are polymorphic; what is the average level of heterology between fermentation A and fermentation B in non-introgressed regions, in introgressed regions, and what fraction of markers are polymorphic? How do these levels of heterology compare to what has been examined before in whole-genome hybrid strains? It also might be worth looking at some of the old literature describing S. cerevisiae/S. carlsbergensis hybrids.

      We thank the reviewer for drawing our attention to confusion about the cross construction. These crosses were conducted as is typical for yeast genetic crosses: we crossed 2 genetically distinct haploid parents to create a heterozygous diploid, then collected the haploid products of meiosis from the same F1 diploid. Because the crosses were made with haploid parents, it is not possible for other genetic differences to be segregating in the crosses. We have revised Figure 1 and its caption to clarify this. Further details regarding the crosses are in the Methods section "Strain and library construction" and in Supplemental Table S1. We only utilized genetic markers that are fixed differences between our parental strains to call CO and NCO. As we detail in the Methods line 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression). We additionally revised Figure 2D (and Figure S11) to help readers better visualize differences between the crosses.

      (2) There are serious concerns about the methods used to identify noncrossovers and to normalize their levels, which are probably resulting in an artifactually high level of calculated crossovers in Figure 2. As a primary indication of this, it appears in Figure 2 that the total frequency of events (crossovers + noncrossovers) in heterozygous introgressed regions is substantially greater than in the same region in non-introgressed strains, while a simple shift of crossovers to noncrossovers would result in no net increase. The simplest explanation for this is that noncrossovers are being undercounted in non-introgressed relative to introgressed heterozygous regions. There are two possible reasons for this: i. The exclusion of all noncrossover events spanning fewer than three markers means that many more noncrossovers will be counted in introgressed heterozygous regions than in non-introgressed regions. Assuming that average non-homology is 5% in the former and 1% in the latter, the average 3-marker event will be 60 nt in introgressed regions and 300 nt in non-introgressed regions - so many more noncrossovers will be counted in introgressed regions. A way to check on this - look at the number of crossover-associated markers that undergo gene conversion; use the fraction that involves < 3 markers to adjust noncrossover levels (this is the strategy used by Mancera et al.). ii. The distance used for noncrossover level adjustment (2kb) is considerably greater than the measured average noncrossover lengths in other studies. The effect of using a too-long distance is to differentially under-correct for noncrossovers in non-introgressed regions, while virtually all noncrossovers in heterozygous introgressed regions will be detected. This can be illustrated by simulations that reduce the density of scored markers in heterozygous introgressed regions to the density seen in non-introgressed regions.

      Because these concerns go to the heart of the conclusions of the paper, they must be addressed quantitatively - if not, the main conclusions of the paper are invalid.
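
      The reviewer's marker-density arithmetic can be reproduced with a short sketch. It assumes, as the comment does, that markers are evenly spaced at one per 1/divergence nucleotides; the function name `expected_span_nt` is hypothetical and is not taken from the manuscript or its analysis code.

```python
# Illustrative arithmetic only. The divergence values (5% and 1%) are the
# reviewer's assumed figures, and even marker spacing is a simplification.
def expected_span_nt(divergence: float, n_markers: int = 3) -> float:
    """Approximate tract length (nt) that covers n_markers evenly spaced markers."""
    if not 0 < divergence <= 1:
        raise ValueError("divergence must be a fraction in (0, 1]")
    # One marker is expected every 1/divergence nucleotides.
    return n_markers / divergence


introgressed = expected_span_nt(0.05)      # heterozygous introgressed region
non_introgressed = expected_span_nt(0.01)  # non-introgressed region
print(round(introgressed), round(non_introgressed))
```

      Under these assumptions a three-marker event is about five times shorter in introgressed regions, which is why a fixed three-marker cutoff would detect more noncrossovers there.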

      We adjusted the correction factor (See also response to comment #2) and compared the average number of CO and NCO events in introgressed and non-introgressed regions between crosses (two comparisons: introgression CO+NCO in natural cross vs introgression CO+NCO in fermentation cross; non-introgression CO+NCO in natural cross vs non-introgression CO+NCO in fermentation cross). We found no significant differences between the crosses in either of the comparisons. This indicates that the distribution of total events is replicated in both crosses once we correct for resolution.

      (3) It is important to distinguish the landscape of double-strand breaks from the landscape of recombination frequencies. Double-strand breaks, as measured by uncalibrated levels of Spo11-linked oligos, is a relative number - not an absolute frequency. So it is possible that two species could have a similar break landscape in terms of topography but have absolute levels higher in one species than in the other.

      We agree with this statement; however, we have removed the relevant text to streamline our introduction.

      (4) Lines 123-125. Just meiosis will produce mosaic genomes in the progeny of the F1; further backcrossing will reduce mosaicism to the level of isolated regions of introgression.

      Adjusted the language to be more specific.

      (5) Please provide actual units for the Y axes in Figure 2D.

      We have corrected the units on the axes.

      (6) Tables (general). Are the significance measures corrected for multiple comparisons?

      In Table 3, the cutoff was chosen to be more conservative than a Bonferroni-corrected alpha = 0.01 with 9 comparisons (0.0011). In the text, any result referred to as significant has an associated hypothesis test with a p-value less than its corresponding Bonferroni-corrected alpha of 0.05. This has been clarified in the caption for Table 3 and in the text where relevant.
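
      As a minimal sketch of the correction described above, the Bonferroni threshold is the family-wise alpha divided by the number of comparisons. The helper names `bonferroni_alpha` and `is_significant` are hypothetical, and the p-values passed to them would come from the relevant tables.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def is_significant(p_value: float, alpha: float, n_comparisons: int) -> bool:
    """True when p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(alpha, n_comparisons)


# The figures quoted in the response: alpha = 0.01 across 9 comparisons.
threshold = bonferroni_alpha(0.01, 9)
print(f"{threshold:.4f}")  # 0.0011
```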

    1. Author response:

      The following is the authors’ response to the previous reviews.

      I have added a paragraph that addresses the issue of how landmarks might be used and why they are not. The suggestions made in the "Weaknesses" paragraph were concise and excellent, and I have directly incorporated them into my revised manuscript. This text appears on Page 21 and is shown below. I hope that this is what the editors and reviewers were looking for.

      The requested revision is the second paragraph.

      The first paragraph was not written in response to reviews but inspired by a recent paper by Mahdev et al (2024) - https://doi.org/10.1038/s41593-024-01681-9. I had already requested to add this reference and was encouraged to do so by the Editors. The Mahdev et al paper was very surprising in that it showed that path integration is not constant but that its "gain" can be recalibrated by self-motion signals. I wondered whether this unexpected capacity extended to path integration also recalibrating the cognitive map and thereby generating the shortcutting behavior we observe. I suggested that, at an abstract level, this would correspond to "coordinate transformation" of the cognitive map. I realize that this is entirely speculative. If the Editors feel that it does not add much to the manuscript and that the speculation goes too far, I will remove the first paragraph and re-submit.

      Added text: page 21, just before the heading "Implications for theories of hippocampal representations of spatial maps". There were no other changes made in the paper.

      "Path integration uses self-motion signals to update the animal's estimated location on its internal cognitive map. Path integration gain has been shown to be plastic and regulated by landmarks (52). Remarkably, a recent study has revealed that path integration gain can also be directly recalibrated by self-motion signals alone (53), albeit not as effectively as by landmarks (52, 53). An interesting question for future research is whether self-motion signals can also recalibrate the coordinates of a cognitive map. From this perspective, the Target B to Target A shortcut requires a transformation of the cognitive map coordinates so that the start point is now Target B.

      Extensive research has shown that external cues can control hippocampal neuron place fields (11, 12, 54) and the gain of the path integrator (52), making the failure of mice in our study to use such cues puzzling. The failure to use landmarks may be related to our task being low stakes and our pretraining procedure teaching the mouse that such cues are not necessary. Our results may not generalize to more natural conditions where many reliable prominent cues are available, and where there is urgency to find food or water while avoiding predation (55). Under these more naturalistic conditions the use of distal cues to rapidly find a food reward is more likely to be observed."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors continue their investigations on the key role of glycosylation to modulate the function of a therapeutic antibody. As a follow-up to their previous demonstration on how ADCC was heavily affected by the glycans at the Fc gamma receptor (FcγR)IIIa, they now dissect the contributions of the different glycans that decorate the diverse glycosylation sites. Using a well-designed mutation strategy, accompanied by exhaustive biophysical measurements, with extensive use of NMR, using both standard and newly developed methodologies, they demonstrate that there is one specific locus, N162, which is heavily involved in the stabilization of (FcγR)IIIa and that the concomitant NK function is regulated by the glycan at this site.

      Strengths:

      The methodological aspects are carried out at the maximum level.

      Weaknesses:

      The exact (or the best possible assessment) of the glycan composition at the N162 site is not defined.

      We revised the Introduction to include previous findings from our laboratory regarding processing on YTS cells:

      “YTS cells, a key cytotoxic human NK cell line used for these studies, express FcγRIIIa with extensive glycan processing, including the N162 site with predominantly hybrid and complex-type glycoforms {Patel 2021}.” 

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to demonstrate a mechanistic link between Fcgamma receptor (IIIA) glycosylation and IgG binding affinity and signaling - resulting in antibody-dependent cellular cytotoxicity - ADCC. The work builds off prior findings from this group about the general impact of glycosylation on FcR (Fc receptor)-IgG binding.

      Strengths:

      The structural data (NMR) is highly compelling and very significant to the field. A demonstration of how IgG interacts with FcgRIIIA in a manner sensitive to glycosylation of both the IgG and the FcR fills a critical knowledge gap. The approach to demonstrate the selective impact of glycosylation at N162 is also excellent and convincing. The manuscript/study is, overall, very strong.

      Weaknesses:

      There are a number of minor weaknesses that should be addressed.

      (1) Since S164A is the only mutant in Figure 1 that seems to improve affinity, even if minimally, it would be a nice reference to highlight that residue in the structural model in panel B.

      We revised Figure 1B to include the S164 site.

      (2) It is confusing why some of the mutants in the study are not represented in Figure 1 panel A. Those affinities and mutants should be incorporated into panel A so the reader can easily see where they all fall on the scale.

      We thank the reviewer for this comment. We restructured the Results section to highlight that a primary outcome of the experiment referenced was to map the contribution of interface residues to antibody binding affinity. These data were not previously available, highlighting hotspots at the interface. Figure 1A and B report these results.

      We then used a subset of mutations from this experiment, as well as a subset of mutations from an additional library containing mutations proximal to the interface, to build a small library for evaluation using ADCC. The complete binding data for all variants, binding to two different IgG1 Fc glycoforms, is presented in Supplemental Table 1. 

      T167Y in particular needs to be shown, as it is one of few mutants that fall between what seems to be ADCC+ and ADCC- lines. Also, that mutant seems to have a stronger affinity compared to wt (judged by panel D), yet less ADCC than wt. This would imply that the relationship between affinity and activity is not as clean as stated, though it is clearly important. Comments about this would strengthen the overall manuscript.

      We thank the reviewer for this particular insight. We agree that the lack of a clean correlation between ADCC potency and affinity implies additional factors that could have affected these experimental results. We added the following sentence to the discussion. 

      “Notably, the ADCC potency for those high-affinity variants does not fall cleanly on a line, indicating that other factors affect our observations, which may include organization at the cell surface, changes to glycan composition, or receptor trafficking.”

      (3) This statement feels out of place: "In summary, this result demonstrates that the sensitivity to antibody fucosylation may be eliminated through FcγRIIIa engineering while preserving antibody-binding affinity." In Figure 2, the authors do indeed show that mutations in FcgRIIIa can alter the impact of IgG core fucosylation, but implying that receptor engineering is somehow translatable or as impactful therapeutically as engineering the antibody itself deflates the real basic science/biochemical impact of understanding these interactions in molecular detail. Not everything has to be immediately translatable to be important. 

      We agree and removed the highlighted sentence.   

      (4) The findings reported in Figure 2, panel C are exciting. Controls for the quality of digestion at each step should be shown (perhaps in supplementary data).

      We agree. We added an example of the digestions as Figure S2.  

      (5) Figure 3 is confusing (mislabeled?) and does not show what is described in the Results. First, there is a F158V variant in the graph but a V158F variant in the text. Please correct this.

      Thank you for identifying this typo. We corrected Figure 3.

      Second, this variant (V158F/F158V) does not show the 2-fold increase in ADCC with kifunensine as stated.

      Thank you for drawing our attention to this rounding error. We revised the text to report a statistically significant 1.4-fold increase.

      Finally, there are no statistical evaluations between the groups (+/- kif; +/- fucose). 

      We provide the p values for +/- fucose and +/- kifunensine for each YTS cell line in the figure. We did not provide a global comparison of p values that included all cell lines because some cell lines showed a significant change and others did not. However, we added the raw data as Supplemental Table 2 should readers wish to perform these analyses.

      The differences stated are not clearly statistically significant given the wide spread of the data. This is true even for the wt variant.

      We agree that there are points that overlap in this figure between the different treatments. However, our use of Student's t-test (two-tailed) on three experiments collected on three different days (each with three technical replicates) provides enough resolution to determine the significance of the difference between the means for the different treatments. This is, by our estimation, a highly rigorous manner to collect and analyze the data.
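
      For readers who want to see the shape of such a comparison, here is a minimal sketch of a pooled-variance (Student's) t statistic for two treatments with three independent experiments each. The values are hypothetical and are not the authors' measurements. Averaging the technical replicates into one value per day is an assumption about the analysis, and the critical value 2.776 is the standard two-tailed 5% cutoff for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical per-day means (one value per experiment day), not real data.
untreated = [20.0, 22.0, 24.0]
treated = [28.0, 30.0, 32.0]

t_stat, dof = student_t(treated, untreated)
T_CRIT_DF4 = 2.776  # two-tailed critical value, alpha = 0.05, df = 4
print(dof, round(t_stat, 3), abs(t_stat) > T_CRIT_DF4)
```

      With only three values per group the test has 4 degrees of freedom, so overlapping individual points can still yield a significant difference in means when the day-to-day variance is small.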

      (6) The kifunensine impact is somewhat confusing. They report a major change in ADCC, yet similar large changes with trimming only occur once most of the glycan is nearly gone (Figure 2). Kifunensine will tend to generate high mannose and possibly a few hybrid glycans. It is difficult to understand what glycoforms are truly important outside of stating that multi-branched complex-type N-glycans decrease affinity.

      Note that Figure 2 does not evaluate the kifunensine-treated glycan, which is mostly Man8 and Man9 structures. In our previous work, these structures likewise provided increased binding affinity (see PubMed ID 30016589). We believe the most important message is that composition of the N162 glycan (removed with the S164A mutation) regulates NK cell ADCC. On cells, we are not able to modulate N162 glycan composition without affecting potentially every other N-glycan on the surface, so we do not have an ADCC experiment that is directly comparable to Figure 2. Thus, the increased ADCC resulting from kifunensine treatment is consistent with previously observed increases in binding affinity measurements.

      (7) This is outside of the immediate scope, but I feel that the impact would be increased if differences in NK cell (and thus FcgRIIIA) glycosylation are known to occur during disease, inflammation, age, or some other factor - and then to demonstrate those specific changes impact ADCC activity via this mechanism.

      We agree completely. As mentioned in the Introduction, we know that N162 glycan composition varies substantially from donor to donor based on previous work from our lab. Curiously, little variability appeared between donors at the other four N-glycosylation sites. Thus, there is the potential that different NK cell N162 glycan compositions are coincident with different indications. This is an area we are quite interested in pursuing.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Many thanks to the editors for reviewing the revised manuscript.

      We are very grateful to the Reviewers for their time and for their appreciation of the revision.

      We thank Reviewer 3 for acknowledging the use of sulforhodamine B (SRB) fluorescence as a real-time readout of astrocyte volume dynamics. Experimental data in brain slices were provided to validate this approach.

      The incomplete matching of our observations with earlier data reported in cultured astrocytes (e.g., Solenov et al., AJP-Cell, 2004) might reflect certain of their properties differing from their slice/in vivo counterparts, as discussed in the manuscript.

      The study by T.R. Murphy et al. (Front Cell Neurosci., 2017) showed that AQP4 knockout increased the extent of astrocyte swelling in response to hypoosmotic solution in brain slices (Fig 9), and discussed that ‘... AQP4 can provide an efficient efflux pathway for water to leave astrocytes.’ Correspondingly, our data suggest that AQP4 mediates astrocyte water efflux in basal conditions.

      We have discussed the study by Igarashi et al. (NeuroReport, 2013); our current data would help to understand the cellular mechanisms underlying their finding.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB-loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increases the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in the efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      We thank the reviewer for the insightful comments.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (ie. processes vs. soma vs. vascular end feet which all have different AQP4 expressions).

      Following the suggestion, we provide new data on the effect of AQP4 inhibition on spontaneous calcium signals in perivascular astrocyte end-feet. As now shown in Fig. S2, acute application of TGN-020 induced Ca2+ oscillations in astrocyte end-feet regions where the GCaMP6 labeling lines the profile of the blood vessel. On average, the strength of basal Ca2+ signals in the end-feet is higher than that observed across global astrocyte territories (4.65 ± 0.55 vs. 1.45 ± 0.79, p < 0.01), as is the effect of TGN-020 (8.4 ± 0.62 vs. 6.35 ± 0.97, p < 0.05; Fig. S2 vs. Fig. 2B). This likely reflects the enrichment of AQP4 in astrocyte end-feet. We describe the data in Fig. S2 and on page 8, lines 20-23.

      We now use the transgenic line GLAST-GCaMP6 for cytosolic GCaMP6 expression in astrocytes. Spontaneous calcium signals, reflected by transient fluorescence rises, occur in discrete micro-domains whereas the basal GCaMP6 fluorescence in the soma is weak. In the present condition, it is difficult to unambiguously discriminate astrocyte soma from the highly intermingled processes. 

      The authors show the inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for many of the other features of CSD swelling, (ie. the duration of swelling, speed of swelling, recovery from swelling).

      Regarding the features of the CSD swelling, we have performed a new analysis to quantify the duration of swelling, the speed of swelling and the recovery time from swelling in the control condition and in the presence of TGN-020. The new analysis is now summarized in Fig. S5. Blocking AQP4 with TGN-020 increases the swelling speed, prolongs the duration of swelling and slows down the recovery from swelling, confirming our observation that acute inhibition of AQP4 water efflux facilitates astrocyte swelling while restraining shrinking. We describe the result on page 11, lines 19-21.

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water-selective. The authors here present important data showing that the application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with the glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging[2], genetic models[3], and physiological circadian variation[4] have revealed it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still, a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as the basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow an influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly, AQP4-dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      We thank the reviewer for acknowledging the significance of our study and its functional implications for the brain glymphatic system. We have now highlighted the mentioned studies as well as the potential implications for glymphatic fluid circulation (page 4, lines 9-10; page 5, lines 1-3; and page 19, lines 3-10).

      Reviewer #2 (Public Review):

      Summary:

      The paper investigates the role of the astrocyte-specific aquaporin-4 (AQP4) water channel in mediating water transport within the mouse brain and the impact of the channel on astrocyte and neuron signaling. Through various experiments, including epifluorescence and light sheet microscopy in mouse brain slices and fiber photometry or diffusion-weighted MRI in vivo, the researchers observe that acute inhibition of AQP4 leads to intracellular water accumulation and swelling in astrocytes. This swelling alters astrocyte calcium signaling and affects neighboring neuron populations. Furthermore, the study demonstrates that AQP4 regulates astrocyte volume, influencing mainly the dynamics of water efflux in response to osmotic challenges or associated with cortical spreading depolarization. The findings suggest that AQP4-mediated water efflux plays a crucial role in maintaining brain homeostasis, and indicate the main role of AQP4 in this mechanism. However, although the authors highlight that the report sheds light on the mechanisms by which astrocyte aquaporin contributes to the water environment in the brain parenchyma, the mechanism underlying these effects remains unclear and was not investigated. The manuscript requires revision.

      Strengths:

      The paper elucidates the role of the astrocytic aquaporin-4 (AQP4) channel in brain water transport, its impact on water homeostasis, and signaling in the brain parenchyma. In its design, the paper follows a set of complementary experiments combining various ex vivo and in vivo techniques from microscopy to magnetic resonance imaging. The research is valuable, confirms previous findings, and provides novel insights into the effect of acute blockage of the AQP4 channel using TGN-020.

      We thank the reviewer for the constructive comments.

      Weaknesses:

      Despite the interdisciplinary approach employed, the quality of the manuscript raises doubts about the significance of the findings and undermines the novelty claimed by the authors. The paper lacks a comprehensive exploration or mention of the underlying molecular mechanisms driving the observed effects of astrocytic aquaporin-4 (AQP4) channel inhibition on brain water transport and brain signaling dynamics. The scientific background is not well prepared in the introduction and discussion sections. Important or recent reports from the field are missing, incompletely cited, or misinterpreted. Several citations to original works are missing that would clarify certain conclusions. This refers especially to the basis of the glymphatic system concept and to recently published reports of similar content. The use of TGN-020 instead of, e.g., the available AQP4 blocker AER-270(271) is not explained. While employing various experimental techniques adds depth to the findings, some of the reasoning behind the techniques employed, especially regarding MRI, is unclear or seemingly inaccurate. In most cases the number of subjects examined is missing or mentioned only roughly in the figure captions, and statistical tests are missing or wrongly applied, which limits assessment and reproducibility of the results. In some cases, it seems that two different statistical tests were used for the same or linked types of data, so the results are contradictory even though, based on the figures, this appears unlikely. Addressing these limitations could strengthen the paper's impact and utility within the field of neuroscience; however, it also seems that supplementary experiments are required to improve the report.

      The current data hint at a tonic water efflux through astrocyte AQP4 under physiological conditions, which helps to understand brain water homeostasis and the functional implications for the glymphatic system. The underlying molecular and cellular mechanisms appear multifaceted and functionally interconnected, as discussed (page 14, line 8 – page 15, line 3). We agree that a comprehensive exploration will further advance our understanding.

      The introduction and discussion are now strengthened by incorporating the important advances in glymphatic system while highlighting the relevant studies. 

      The use of TGN-020 was based on its validation by a wide range of ex vivo and in vivo studies, including the use of heterologous expression systems and AQP4 KO mice. The validation of AER-270 (and AER-271, its water-soluble prodrug) using AQP4 KO mice was reported recently (Giannetto et al., 2024). AER-271 was noted to impact brain water ADC (apparent diffusion coefficient evaluated by diffusion-weighted MRI) in AQP4 KO mice ~75 min after the drug application (Giannetto et al., 2024). This likely reflects that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB), whose inhibition could reduce CNS water content independent of AQP4 targeting (Salman et al., 2022). In addition, the inhibition efficiency of AER-270(271) seems lower than that of TGN-020 (Farr et al., 2019; Giannetto et al., 2024; Huber et al., 2009; Salman et al., 2022). We have now supplemented this information in the manuscript (page 7, lines 1-6 and page 15, lines 7-17).

      The description on the DW-MRI is now updated (page 4, line 10-14). 

      We also performed new experiments and data analysis as described in a point-by-point manner below in the section ‘Recommendations For The Authors’.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine fluorescence as the proxy for cell volume dynamics. Using this approach, they perform a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume in response to AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key finding is that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume regulation after spreading depolarizations. Additionally, systemic application of TGN-020 produced changes in diffusion-weighted MRI signal, which the authors interpret as cellular swelling. This study is perceived as potentially significant. However, several technical caveats should be strongly considered and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically elegant study, in which the authors employed a number of complementary ex vivo and in vivo techniques to explore functional outcomes of aquaporin inhibition. The presented data are potentially highly significant (but see below for caveats and questions related to data interpretation).

      (2) The authors go beyond measuring cell volume homeostasis and probe for the functional significance of AQP4 inhibition by monitoring Ca2+ signaling in neurons and astrocytes (GCaMP6 assay).

      (3) Spreading depolarizations represent a physiologically relevant model of cellular swelling. The authors use ChR2 optogenetics to trigger spreading depolarizations. This is a highly appropriate and much-appreciated approach.

      We thank the reviewer for the effort in evaluating our work.

      Weaknesses:

      (1) The main weakness of this study is that all major conclusions are based on the use of one pharmacological compound. In the opinion of this reviewer, the effects of TGN-020 are not consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically: Genetic deletion of AQP4 in astrocytes reduces plasmalemmal water permeability by ~two- to three-fold (when measured at 37 °C, Solenov et al., AJP-Cell, 2004). This is a significant difference, but it is thought to have limited/no impact on water distribution. Astrocytic volume and the degree of anisosmotic swelling/shrinkage are unchanged because the water permeability of the AQP4-null astrocytes remains high. This has been discussed at length in many publications (e.g., MacAulay et al., Neuroscience, 2004; MacAulay, Nat Rev Neurosci, 2021) and is acknowledged by Solenov and Verkman (2004).

      Keeping this limitation in mind, it is important to validate astrocytic cell volume changes using an independent method of cell volume reconstruction (diameter of sulforhodamine-labeled cell bodies? 3D reconstruction of EGFP-tagged cells? Else?)

      Solenov and colleagues used the calcein quenching assay and KO mice to demonstrate that AQP4 is a functional water channel in cultured astrocytes (Solenov et al., 2004). AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over a comparable time, and also slowed down cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov's study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, differing from our data from acute pharmacological blocking. This discrepancy may be due to compensatory mechanisms in chronic AQP4 KO, or may reflect volume responses in cultured astrocytes that differ from those in brain slices or in vivo, as suggested previously (Risher et al., 2009).

      Soma diameter might be an indicator of cell volume change, yet measuring it is challenging with our current fluorescence imaging method, which is diffraction-limited and insufficient to clearly resolve the border of the soma in situ. In addition, the lateral diameter of cell bodies may not faithfully reflect the volume changes that can occur in all three dimensions. Rapid 3D imaging of astrocyte volume dynamics with sufficiently high Z-axis resolution appears difficult with our present tools.

      We have now updated the discussion accordingly and cited the relevant literature (page 17, line 14 – page 18, line 3).

      (2) TGN-020 produces many effects on the brain, with some but not all of the observed phenomena sensitive to the genetic deletion of AQP4. In the context of this work, it is important to note that TGN-020 does not completely inhibit AQP4 (70% maximal inhibition in the original oocyte study by Huber et al., Bioorg Med Chem, 2009). Thus, besides not knowing TGN-020 levels inside the brain, even "maximal" AQP4 inhibition would not be expected to dramatically affect water permeability in astrocytes.

      This caveat may be addressed through experiments using local delivery of structurally unrelated AQP4 blockers, or, preferably, AQP4 KO mice.

      It is an important point that TGN-020 only partially blocks AQP4, implying that the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitation needs to be acknowledged. We mention this now on page 15, line 7-9 and 14-17.

      We agree that local delivery of an alternative blocker would provide additional information. However, local delivery requires the stereotaxic implantation of a cannula, which would cause inflammation in the surrounding astrocytes (and neurons). The recently introduced AQP4 blocker AER-270(271) has drawn attention because it influences brain water dynamics (ADC in DW-MRI) in AQP4 KO mice (Giannetto et al., 2024), recalling that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB). This pathway can potentially perturb CNS water content and influence brain fluid circulation in an AQP4-independent manner (Salman et al., 2022). The inhibition efficiency of AER-270 on mouse AQP4 (~20%, Farr et al., 2019; Salman et al., 2022) appears lower than that of TGN-020 (~70%, Huber et al., 2009).

      We chose to use the pharmacological compound to achieve acute blocking of AQP4, thereby avoiding the alterations in brain structure, function and water homeostasis caused by chronic genetic deletion. Multiple lines of evidence, including a recent study (Gomolka et al., 2023), have shown that AQP4 KO alters brain water content, extracellular space and cellular structures in mice, which raises concerns about using the transgenic mouse to pinpoint the physiological functions of the AQP4 water channel.

      We now mention the concerns about AQP4 pharmacology and cite additional literature in the field (page 15, line 8-18).

      (3) This reviewer thinks that the ADC signal changes in Figure 5 may be unrelated to cellular swelling. Instead, they may be a result of the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across pia mater, which is highly enriched in AQP4. To amplify this concern, AQP4 KO brains have increased water mobility due to enlarged interstitial spaces, rather than swollen astrocytes (RS Gomolka, eLife, 2023). Overall, the caveats of interpreting DW-MRI signal deserve strong consideration.

      A previous study showed that TGN-020 increases regional cerebral blood flow in wild-type mice but not in AQP4 KO mice (Igarashi et al., 2013). Our current data provide a possible mechanistic explanation: blocking astrocyte AQP4 with TGN-020 causes calcium rises that may lead to vasodilation, as suggested previously (Cauli and Hamel, 2018). We have added this to the discussion on page 15, line 3-7.

      We agree with the reviewer regarding the structural deviations observed in AQP4 KO mice (Gomolka et al., 2023), now mentioned on page 19, line 3-5. Following the Reviewer’s suggestion, we have also updated the interpretation of the DW-MRI signal and point out that, in addition to being related to astrocyte swelling, the ADC signal changes may also be caused by indirect mechanisms, such as the transient upregulation of other water-permeable pathways compensating for AQP4 blocking. We now describe this alternative interpretation and the caveats of the DW-MRI signals (page 20, line 1-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Private recommendations

      My more broad experimental suggestions are in the "weaknesses" section. Some minor points that would improve the manuscript are included below:

      (1) A more detailed explanation for why SRB fluorescence reflects the astrocyte volume changes, whereas typical intracellular GFP does not.

      As an engineered fluorescent protein, GFP has been used to tag specific cell types. However, as a relatively large protein (MW, 26.9 kDa), EGFP is expected to diffuse much more slowly than SRB, a small chemical dye (MW, 558.7 Da). In addition, IP injection of SRB enables labeling of brain astrocytes without genetic manipulation, thereby avoiding the influence of protein overexpression on astrocyte volume and water transport responses. We have now stated this point in the manuscript (page 13, line 21 – page 14, line 4).

      (2) Figure 1 panel B should have clear labels on the figure and a description in the legend to delineate which part of the panel refers to hyper- or hypo-osmotic treatment.

      We have now updated the figure and the legend.  

      (3) For Figure 2, what is the rationale for analyzing the calcium signaling data between the cell types differently?

      We analyzed calcium micro-domains for astrocytes because their spontaneous signals occur mainly in discrete micro-domains (Shigetomi et al., 2013). For neurons, we performed a global analysis by calculating the mean fluorescence of the imaging field of view, because calcium signal changes were only observed at the global level rather than in micro-domains. This information is now included (page 24, line 18-20).

      (4) For Figure 3, the authors mention that TGN-020 likely caused swelling prior to the hypotonic solution administration. Do they have any measurements from these experiments prior to the TGN-020 application to use as a "true baseline" volume?

      The current method detects the relative changes in astrocyte volume (i.e., transmembrane water transport), which nevertheless is blind to the absolute volume value. We have no readout on baseline volumes.  

      (5) For Figures 3 and 4, did the authors see any evidence for regulatory volume decrease? And is this impaired by TGN-020? It is a well-characterized phenomenon that astrocytes will open mechanosensitive channels to extrude ions during hypo-osmotic induced swelling. This process is dependent on AQP4 and calcium signaling [5]

      Mola and colleagues provided important results demonstrating the role of AQP4 in astrocyte volume regulation (Mola et al., 2016). In the present study in acute brain slices, when we applied hypotonic solution to induce astrocyte swelling, our protocol did not reveal rapid regulatory volume decrease (e.g., Fig. 3D). When we followed the volume changes of SRB-labeled astrocytes during optogenetically induced CSD, we observed a phase of volume decrease following the transient swelling (Fig. 4F), where the peak amplitude and the degree of recovery were both reduced by inhibiting AQP4 with TGN-020. These data imply that regulatory astrocyte volume decrease may occur in specific conditions, which, intriguingly, has been suggested to be absent in brain slices and in vivo (e.g., Risher et al., 2009). We have not specifically investigated this phenomenon, and now briefly discuss this point on page 18, line 6-14.

      (6) Figure 5 box plots do not show all data points, could the authors modify to make these plots show all the animals, or edit the legend to clarify what is plotted?

      We have now updated the plot and the legend. This plot is from all animals (n = 7 per condition).

      (7) pg. 9 line 6, there is a sentence that seems incomplete or otherwise unfinished. "We first followed the evoked water efflux and shrinking induced by hypertonic solution while."

      Fixed (now, page 9 line 17-18). 

      (8)  During the discussion on pg 13 line 11, it may be more clear to describe this as the cotransport of water into the cells with ions/metabolites as reviewed by Macaulay 2021 [6].

      We agree; the text is modified following this suggestion (now page 14, line 12-13).

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      (5) Mola, M., et al., The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia, 2016. 64(1).

      (6) MacAulay, N., Molecular mechanisms of brain water transport. Nat Rev Neurosci, 2021. 22(6): p. 326-344.

      We thank the reviewer. These important references have now been added to the manuscript together with the corresponding revisions.

      Reviewer #2 (Recommendations For The Authors):

      In its concept, the paper is interesting and provides additional value - however, it requires revision.

      Below, I provide the following remarks for the following sections/ pages/lines:

      ABSTRACT/page 2 (remarks here refer to the rest of the manuscript, where these sentences are repeated):

      - It seems that the 'homeostasis' provides not only physical protection, but also determines the diffusion of chemical molecules...' Please correct the sentence as it is grammatically incorrect.

      It is now corrected (page 2, line 1).

      - The term 'tonic water' is not clear. I understand, after reading the paper, that it is about tonicity of the solutes injected into the mouse.

      We use the term ‘tonic’ to indicate that in basal conditions, a constant water efflux occurs through the AQP4 channel.

      - 'tonic aquaporin water efflux maintains volume equilibrium' - I believe it is about maintaining volume and osmotic equilibrium?

      This description is now refined (now page 2, line 10).

      - It is not clear whether the tonic water outflow refers to the cellular level or outflow from the brain parenchyma (i.e., glymphatic efflux)

      It refers to the cellular level. 

      INTRODUCTION/page 3:

      - 'clearance of waste molecules from the brain as described in the glymphatic system' - The original papers describing the phenomena are not cited: Iliff et al. 2012, 2013, Mestre et al. 2018, as well as reviews by Nedergaard et al.

      Indeed. We have now cited these key references (now page 4, line 10).

      - 'brain water diffusion is the basis for diffusion-weighted magnetic resonance imaging (DW-MRI)' - The statement is wrong. it is the mobility of the water protons that DWI is based on, but not the diffusion of molecules in the brain. This should be clarified and based on the DW-MRI principle and the original works by Le Bihan from 1986, 1988, or 2015.

      This sentence is now updated (page 4, line 10-14).

      - Similarly, I suggest correcting or removing the citations and the sentence part regarding the clinical use of DWI, as it has no value here. Instead, it would be worth mentioning what actually ADC reflects as a computational score, and what were the results from previous studies assessing glymphatic systems using DWI. This is especially important when considering the mislocalization of the AQP4 channel.

      We now describe recent studies using DW-MRI to evaluate the glymphatic system (page 4, line 16-17).

      - 'In the brain, AQP4 is predominantly expressed in astrocytes'-please review the citations. I suggest reading the work by Nielsen 1997, Nagelhus 2013, Wolburg 2011, and Li and Wang from 2017. To my best knowledge, in the brain AQP4 is exclusively expressed in astrocytes.

      We thank the reviewer. It has been described that, while enriched in astrocytes, AQP4 is also expressed in ependymal cells lining the ventricles (e.g., Mayo et al., 2023; Verkman et al., 2006). ‘predominantly’ is now removed (page 4, line 21).

      - The conclusion: ' Our finding suggests that aquaporin acts as a water export route in astrocytes in physiological conditions, so as to counterbalance the constitutive intracellular water accumulation caused by constant transmitter and ion uptake, as well as the cytoplasmic metabolism processes. This mechanism hence plays a necessary role in maintaining water equilibrium in astrocytes, thereby brain water homeostasis' seems to be slightly beyond the actual findings in the paper. I suggest clarifying according to the described phenomena.

      We have now refined the conclusion so that it adheres to the experimental observations (page 5, line 16-18).

      - The introduction lacks important information on existing AQP4 blockers and their effects, pros and cons on why to use TGN-020. Among others, I would refer to recent work by Giannetto et al 2024, as well as previous work of Mestre et al. 2018 and Gomolka et al. 2023.

      We initiated the study using TGN-020 as an AQP4 blocker because it has been validated by a wide range of ex vivo and in vivo studies, as documented in the text (page 7, line 1-6). We have also updated the discussion of recent advances in validating the AQP4 blocker AER-270(271), citing the relevant studies (page 15, line 7-17).

      RESULTS:

      - Page 5, lines 19-20: '...transport, we performed fluorescence intensity translated (FIT) imaging.' - this term was never introduced in the methods so it is difficult for the reader to understand it at first sight. -'To this end,' - it is not clear which action refers to 'this'. (is it about previous works or the moment that the brain samples were ready for imaging? Please clarify, as it is only starting to be clear after fully reading the methods.

      We have refined the description to first give the principle of our imaging method and then explain the technical steps. To avoid ambiguity, the term ‘To this end’ is removed. The updated text is now on page 6, line 1-3.

      - From page 6 onwards - all references to Figures lack information to which part of the figure subpanel the information refers (top/middle bottom or left/middle/right).

      We apologize. The complementary indication is now added for figure citations when applicable.  

      - 'whereas water export and astrocyte shrinking upon hyperosmotic manipulation increased astrocyte fluorescence (Figure 1B). Hence, FIT imaging enables real-time recording of astrocyte transmembrane water transport and volume dynamics.' - this part seems to be undescribed or not clear in the methods.

      We have now refined this description (page 6, line 19-20).

      - Page 6, lines 17-22: TGN-020. In addition to the above, I suggest familiarizing also with the following works by Igarashi 2011. doi: 10.1007/s10072-010-0431-1, and by Sun 2022. doi: 10.3389/fimmu.2022.870029.

      These studies are now cited (page 7, line 3-4).

      - Page 7: ' AQP4 is a bidirectional channel facilitating... ' - AQP4 water channel is known as the path of least resistance for water transfer, please see Manley, Nature Medicine, 2000 and Papadopoulos, Faseb J, 2004.

      This sentence is now updated (page 7, line 12-13).

      - ' astrocyte AQP4 by TGN-020 caused a gradual decrease in SRB fluorescence intensity, indicating an intracellular water accumulation' - tissue slice experiment is a very valuable method. However it seems right, the experiment does not comment on the cell swelling that may occur just due to or as a superposition of tissue deterioration and the effect of TGN-020. The AQP4 channel is blocked, and the influx of water into astrocytes should be also blocked. Thus, can swelling be also a part of another mechanism, as it was also observed in the control group? I suggest this should be addressed thoroughly.

      We performed this experiment in acute brain slices to tightly control the pharmacological environment and to gain spatio-temporal information. After slicing, the brain slices recovered for > 1 h prior to recording, so that the slices were in a stable state before TGN-020 application, as evidenced by the stable baseline. The constant decrease in the control trace is due to photobleaching, and its trend did not change in response to vehicle. TGN-020, in contrast, caused a downward change, suggesting intracellular water accumulation and swelling.

      The experiment was performed in basal conditions without active water influx; a decrease in SRB fluorescence indicates intracellular water buildup in astrocytes. This result shows that in basal conditions, astrocyte aquaporin mediates a constant (i.e., tonic) water efflux; blocking it causes intracellular water accumulation and swelling.

      We have accordingly updated the description of this part (page 7, line 15-20).

      - From the Figure 1 legend: Only 4 mice were subjected to the experiment, and only 1 mouse as a control. I suggest expanding the experiment and performing statistics including two-way ANOVA for data in panels B, C, and D, as no results of statistical tests confirm the significance of the findings provided.

      Panel B confirms that cytosolic SRB fluorescence tends to increase upon water efflux and volume shrinking, and vice versa. For panel C, the number of mice is now indicated. In addition, the downward change in SRB fluorescence was calculated separately for the phases before and after TGN-020 (or vehicle) application, and the panel is updated accordingly. TGN-020 induced a decline in astrocyte SRB fluorescence, which is validated by a t-test performed in MATLAB. To clarify, we have added connecting lines to indicate statistical significance between the corresponding groups (Fig. 1C, middle). For panel D, we calculated the SRB fluorescence change (decrease) relative to the photobleaching tendency illustrated by the dotted line. The significance was also validated by a t-test performed in MATLAB.
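
      As an illustration only, the idea of expressing a fluorescence change relative to the photobleaching tendency can be sketched as below. This is not the MATLAB code used for the analysis: the linear trend model, the function names and the `n_pre` parameter (the number of pre-application samples used to fit the trend) are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def deviation_from_bleaching(times, trace, n_pre):
    """Fit a linear trend to the first n_pre samples (assumed to reflect
    photobleaching only), extrapolate it over the whole recording, and
    return the difference between the trace and that trend."""
    slope, intercept = fit_line(times[:n_pre], trace[:n_pre])
    return [f - (slope * t + intercept) for t, f in zip(times, trace)]
```

      With this sketch, a trace that simply follows the pre-application trend yields values near zero, while an additional drug-induced decrease appears as negative values.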

      - Figure 1: Please correct the figure - pictures in panel A are low quality and do not support the specificity of SRB for astrocytes. Panels B-D are easier to understand if plotted as normal X/Y charts with associated statistical findings. Some drawings are cut or not aligned.

      In GFAP-EGFP transgenic mice, astrocytes are labeled by EGFP. SRB labeling (red fluorescence) colocalizes with EGFP-positive astrocytes, although not all EGFP-positive astrocytes are labeled by SRB. The PDF conversion during submission may also have compromised the image quality. We have updated and aligned the figure panels.

      - Page 12: ' TGN-020 increased basal water diffusion within multiple regions including the cortex, hippocampus and the striatum in a heterogeneous manner (Figure 5C).'

      This sentence is now updated (page 12, line 12 – page 13, line 2). It reads ‘The representative images reveal the enough image quality to calculate the ADC, which allow us to examine the effect of TGN-020 on water diffusion rate in multiple regions (Fig. 5C).’

      - The expression of AQP4 within the brain parenchyma is known to be heterogenous. Please familiarize yourself with works by Hubbard 2015, Mestre 2018, and Gomolka 2023. A correlation between ADC score and AQP4 expression ROI-wise would be useful, but it is not substantial to conduct this experiment.

      We thank the reviewer. This point is stressed on page 19, line 12-14.

      DISCUSSION:

      - Most of the issues are commented on above, so I suggest following the changes applied earlier. - Page 16: 'We show by DW-MRI that water transport by astrocyte aquaporin is critical for brain water homeostasis.' This statement is not clear and does not refer to the actual impact of the findings. DWI is allowed only to verify the changes of ADC after the application of TGN-020. I suggest commenting on the recent report by Giannetto 2024 here.

      This sentence is now refined (page 19, line 1-2), followed by the updates commenting on the recent studies employing DW-MRI to evaluate brain fluid transport, including the work of (Giannetto et al., 2024) (page 19, line 3-10). 

      METHODS:

      - Page 18: no total number of mice included in all experiments is provided, as well as no clearly stated number of mice used in each experiment. Please correct.

      We have now double checked the number of the mice for the data presented and updated the figure legends accordingly (e.g., updates in legends fig1, fig5, etc).

      -  Page 18, line 7: 'Axscience' is not a producer of Isoflurane, but a company offering help with scientific manuscript writing. If this company's help was used, it should be stated in the acknowledgments section. Reference to ISOVET should be moved from line 15 to line 7.

      We apologize. We did not use external writing help, and have now removed ‘Axcience’. The isoflurane was of the brand ‘ISOVET’ from ‘Piramal’. This information is now moved up (page 21, line 11).

      - Page 18, line 9: ' modified artificial cerebrospinal fluid (aCSF)'. Additional information on the reason for the modified aCSF would be useful for the reader.

      In this modified solution, the concentration of depolarizing ions (Na+, Ca2+) was reduced to lower the potential excitotoxicity during the tissue dissection (i.e., injury to the brain) for preparing the brain slices. Extra sucrose was added to balance the osmolarity of the solution. This solution has been used previously for the dissection and slicing steps in adult mice (Jiang et al., 2016). We have added this justification to the text and cite the relevant reference (page 21, line 14-16).

      - Page 19, line 6: a reasoning for using Tamoxifen would be helpful for the reader.

      The Glast-CreERT2 is an inducible conditional mouse line that expresses Cre recombinase selectively in astrocytes upon tamoxifen injection. We now add this information in the text (page 22, line 10-11). 

      - Line 8 - 'Sigma'

      Fixed.

      - Line 7/8: It is not clear if ethanol is of 10% solution or if proportions of ethanol+tamoxifen to oil were of 1:9. The reasoning for each performed step is missing.

      We have now clarified the procedure (page 22, line 11-15).

      - Line 10: '/' means 'or'?

      Here, we mean the bigenic mice resulting from the crossing of the heterozygous Cre-dependent GCaMP6f and Glast-CreERT2 mouse lines. We have changed it to ‘Glast-CreERT2::Ai95GCaMP6f//WT’, consistent with the presentation of other mouse lines in our manuscript (page 22, line 16).

      - Lines 22-23: being in-line with legislation was already stated at the beginning of the Methods so I suggest combining for clearance.

      Done. 

      - Page 21, line 4: it is good to mention which printer was used, but it would be worth mentioning the material the chamber was printed from - was it ABS?

      Yes. We add this info in the text now (page 24, line 5).

      - Line 9 -'PI' requires spelling out.

      It is ‘Physik Instrumente’, now added (page 24, line 10).

      - Line 11-12: What is the reason for background subtraction - clearer delineation of astrocytes/ increasing SNR in post-processing, or because SRB signal was also visible and changing in the background over time? Was the background removed in each frame independently (how many frames)? How long was the time-lapse and was the F0 frame considered as the first frame acquired? The background signal should be also measured and plotted alongside the astrocytic signal, as a reference (Figure 1). This should be clarified so that steps are to be followed easily.

      We sought to follow the temporal changes in the SRB fluorescence signal. The acquired fluorescence images contain not only the SRB signal but also background signals arising from, for instance, tissue autofluorescence, digital camera background noise and stray light from the environment. The background signal was estimated as the mean fluorescence of peripheral cell-free subregions (15 × 15 µm²) and subtracted from all frames of the time-lapse image stack. The traces shown in the figures reflect the full lengths of the time-lapse recordings. F0 was defined as the mean value of the 10 data points immediately preceding the detected fluorescence changes. The text is now updated (page 24, line 21 – page 25, line 5).
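
      For readers who wish to reproduce these steps, a minimal sketch is given below. It is illustrative rather than the analysis code itself: the function names are hypothetical, the background is assumed to be a single value applied to every frame, and expressing the result as (F - F0)/F0 is an assumed normalisation.

```python
from statistics import mean


def subtract_background(trace, background):
    """Remove one background estimate (e.g., the mean fluorescence of
    cell-free subregions) from every frame of a fluorescence trace."""
    return [f - background for f in trace]


def normalise_to_f0(trace, onset_index, n_baseline=10):
    """Compute (F - F0) / F0, where F0 is the mean of the n_baseline
    data points immediately preceding onset_index."""
    if onset_index < n_baseline:
        raise ValueError("not enough points before the onset to define F0")
    f0 = mean(trace[onset_index - n_baseline:onset_index])
    return [(f - f0) / f0 for f in trace]
```

      In this sketch, `onset_index` stands for the frame at which the fluorescence change is detected; how that frame is detected is outside the scope of the example.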

      - Line 15: Was astrocyte image delineation performed manually or automatically? Where was the center of the region considered in the reference to the astrocyte image? It would be good to see the regions delineated for reference.

      Astrocytes labeled by SRB were delineated manually with the soma taken as the center of the region of interest. We now exemplify the delineated region in Fig 1A, bottom.

      - Page 22, line 2: 'x4 objective'.

      Added (now, page 25, line 16). 

      - Line 3: 'barrels' - reference to publication or the explanation missing.

      The relevant reference is now added on barrel cortex (Erzurumlu and Gaspar, 2020) (page 25, line 19-20). 

      - Line 19: were the coordinates referred to = bregma?

      Yes. This info is now added (page 26, line 12). 

      - Line 20: was the habituation performed directly at the acquisition date? It is rather difficult to say that it was a habituation, but rather acute imaging. I suggest correcting, that mice were allowed to familiarize themselves with the setup for 30 minutes prior to the imaging start.

      In this context, although it is a very nice idea and experiment, the influence of acute stress in animals familiar with the setup only from the day of acquisition is difficult to avoid. It is a major concern, especially when considering norepinephrine as a master driver of neuronal and vascular activity through the brain, and strong activation of the hypothalamic-adrenal axis in response to acute stress. It is well known, that the response of monoamines is reduced in animals subjected to chronic v.s acute stress, but still larger than that if the stressor is absent.

      Major remark: The animals should, preferably, be imaged at least after 3 days of habituation based on existing knowledge. I suggest exploring the topic of the importance of habituation. It is difficult though, to objectively review these findings without considering stress and associated changes in vascular dynamics.

      We thank the reviewer for helping us make this information more precise. The text is updated accordingly to describe the experiment (now page 26, line 14).

      - Page 23, line 17: number of animals included in experiments missing.

      The number of animals is added in Methods (page 27, line 12) and indicated in the legend of Figure 5. 

      - Line 18/19: were the respiratory effects observed after injection of saline or TGN-020? Since DWI was performed, the exclusion of perfusive flow on ADC is impossible.

      I suggest an additional experiment in n=3 animals per group, verifying the HR (and if possible BP) response after injection of TGN-020 and saline in mice.

      The respiratory rate has been recorded. We added the averaged respiratory rate before and after injection of TGN-020 or saline (now, Fig. S6; page 13, line 5-6).

      - Line 22: Please, provide the model of the scanner, the model of the cryoprobe, as well as the model of the gradient coil used, otherwise it is difficult to assess or repeat these experiments.

      We have now added information on the MRI system to the Methods section (page 27, line 17-21).

      - Page 24: line 3/4: although the achieved spatial resolution of DWI was good and slightly lower than desired and achievable due to limitations of the method itself as well as cryoprobe, it is acceptable for EPI in mice.

      Still, there is no direct explanation provided on the reasoning for using surface instead of volumetric coil, as well as on assuming an anisotropic environment (6 diffusion directions) for DWI measurements. This is especially doubtful if such a long echo-time was used alongside lower-than-possible spatial resolution. Longer echo time would lower the SNR of the depicted signal but also would favor the depiction of signal from slow-moving protons and larger water pools. On the other hand, only 3 b-values were used, which is the minimum for ADC measurements, while a good research protocol could encompass at least 5 to increase the accuracy of ADC estimation and avoid undersampling between 250 and 1800 b-values. What was the reason for choosing this particular set of b-values and not 50, 600, and 2000? Besides, gradient duration time was optimally chosen, however, I have concerns about the decision for such a long gradient separation times.

      If the protocol could have been better optimized, the assessment could have been also performed in respiratory-gated mode, allowing minimization of the effects of one of the glymphatic system driving forces.

      Thus, I suggest commenting on these issues.

      We chose the cryoprobe to increase the signal-to-noise ratio (SNR) in DW-MRI with a long echo time and high b-value. A volume coil provides a more homogeneous SNR across the whole brain than the cryoprobe, but its SNR is expected to be lower. We confirmed that, even in the ventral part of the brain, the quality of the DW-MRI images acquired with the cryoprobe was sufficient to investigate the ADC (Fig. 5B-C). This is now mentioned in Methods (page 27, line 17-21).

      We performed DW-MRI scanning for 5 min at each time point, using anisotropic resolution and three b-values, to follow the time course of the ADC change after the injection of TGN-020. Because the effect of TGN-020 appears about a dozen minutes after the injection (Igarashi et al., 2011), fast DW-MRI scanning is required. If isotropic DW-MRI with a shorter echo time and more directions were used, a longer scan time would be required at each time point, possibly more than 1 h. We agree that three b-values are the minimum needed to calculate the ADC and that more b-values would increase the accuracy. However, to achieve the temporal resolution needed to capture the change in water diffusion, we decided to use the minimum number of b-values. A previous study also validated the accuracy of DW-MRI with three b-values (Ashoor et al., 2019). Furthermore, a previous study that used a long diffusion time (> 20 ms) and a long echo time (40 ms) obtained good mean diffusivity measurements (Aggarwal et al., 2020), supporting the adequacy of our protocol for investigating the ADC. We have now updated the description (page 28, line 5-9). We chose b = 250 and 1800 s/mm² because 2000 s/mm² appears too high to obtain good image quality. In a previous study, we established that the ADC is measurable with b = 0, 250, and 1800 s/mm² (Debacker et al., 2020).

      - Page 24, line 7: What was the post-processing applied for images acquired over 70 minutes? Did it consider motion-correction, co-registration, or drift-correction crucial to avoid pitfalls and mismatches in concluding data?

      The motion correction and co-registration were explained in Methods (page 28, line 12-14).

      Also, were these trace-weighted images or magnitude images acquired since DTI software was used for processing - while ADC fitting could be reliably done in Matlab, Python, or other software. Thus, was DSI software considering all 3 b-values or just used 0 and 1800 for the calculation of mean diffusivity for tractography (as ADC). The details should be explained.

      DSI Studio was used with all three b-values (b = 0, 250, and 1800 s/mm²) to calculate the ADC. We have added this description to Methods (page 28, line 16-18).
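
      To make the use of all three b-values concrete, the sketch below fits the standard mono-exponential model S(b) = S0 · exp(-b · ADC) by linear least squares on ln S. It is a generic illustration, not a description of the fitting routine inside DSI Studio, whose internals are not detailed here; the function name and the example signal values are hypothetical.

```python
import math


def fit_adc(b_values, signals):
    """Estimate the ADC (mm²/s when b is in s/mm²) from signals acquired
    at several b-values, assuming S(b) = S0 * exp(-b * ADC).
    The ADC is minus the least-squares slope of ln(S) against b."""
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matched (b, signal) pairs")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    b_mean = sum(b_values) / n
    log_mean = sum(logs) / n
    sxx = sum((b - b_mean) ** 2 for b in b_values)
    sxy = sum((b - b_mean) * (l - log_mean) for b, l in zip(b_values, logs))
    return -sxy / sxx


# Example with synthetic, noise-free signals at b = 0, 250 and 1800 s/mm²
b_vals = [0.0, 250.0, 1800.0]
synthetic = [1000.0 * math.exp(-b * 7e-4) for b in b_vals]
adc_estimate = fit_adc(b_vals, synthetic)
```

      With three b-values the fit is over-determined by one point, which is why additional b-values would improve robustness to noise, as the reviewer notes.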

      To make sure that the results are not affected by the MR hardware, I suggest performing 3 control measurements in a standard water phantom, and presenting the results alongside the main findings.

      Thank you for this suggestion. We have performed new experiments and added control measurements with three phantoms: water, undecane, and dodecane. These new data are summarized in Fig. S7 and show that the ADC remained stable throughout the 70 min of scanning. We have updated the description in the Methods (page 28, line 9-11) and in the Results (page 13, line 6-8).

      - Line 13: were the ROI defined manually or just depicted from previously co-registered Allen Brain atlas?

      The ROIs of the cortex, the hippocampus, and the striatum were depicted with reference to Allen mouse brain atlas (https://scalablebrainatlas.incf.org/mouse/ABA12). This is explained in Methods (page 28, line 14-16).

      - Line 10: why the average from 1st and 2nd ADC was not considered, since it would reduce the influence of noise on the estimation of baseline ADC?

      We apologize; this was a typo. The baseline was the average of the 1st and 2nd ADC measurements. We corrected the description (page 28, line 20).
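
      As a minimal sketch of this baseline definition, the snippet below averages the first two ADC measurements and expresses every time point as a percent change from that baseline. The function name `percent_change_from_baseline` and the ADC values are hypothetical and are not data from the study.

```python
def percent_change_from_baseline(adc_series, n_baseline=2):
    """Return percent change of each ADC value relative to the mean
    of the first n_baseline measurements."""
    baseline = sum(adc_series[:n_baseline]) / n_baseline
    return [100.0 * (value - baseline) / baseline for value in adc_series]

# Hypothetical ADC time series in mm^2/s (illustrative values only).
adc_series = [0.70e-3, 0.72e-3, 0.75e-3, 0.78e-3]
changes = percent_change_from_baseline(adc_series)
print([round(c, 2) for c in changes])
```

      Averaging two pre-injection scans halves the noise variance of the baseline relative to using a single scan, which is the point raised by the reviewer.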

      STATISTIC:

      Which type of t-test - paired/unpaired/two samples was used and why? Mann-Whitney U-tests are used as a substitute for parametric t-tests when the data are non-parametric or when a normal distribution cannot be assumed. In which case was the Bonferroni-Holm correction used? - I couldn't find any mention of any multiple-group analysis followed by multiple comparisons. Each section of the manuscript should have a description of how the quantitative data were treated and for which aim. I suggest carefully correcting all figures accordingly, and following the remarks given for Figure 1.

      We used the unpaired t-test for data obtained from samples under different conditions. The Mann-Whitney U-test was used when the data deviated from a normal distribution. The Bonferroni-Holm correction was used for multiple comparisons (e.g., Fig. 4D-E).
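
      For readers unfamiliar with the Bonferroni-Holm procedure, the following sketch shows the step-down rule under stated assumptions. It is an illustration only, not the analysis code used in the study; the function name `holm_bonferroni` and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure: compare the k-th smallest p-value
    (k = 0, 1, ...) with alpha / (m - k) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also not rejected
    return reject

# Hypothetical p-values from four pairwise comparisons.
p_values = [0.010, 0.040, 0.030, 0.005]
decisions = holm_bonferroni(p_values)
print(decisions)
```

      The procedure controls the family-wise error rate like the plain Bonferroni correction but is uniformly less conservative, because only the smallest p-value is tested against alpha / m.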

      Reviewer #3 (Recommendations For The Authors):

      I think that the following statement is insufficient: "The authors commit to share data, documentation, and code used in analysis". My understanding is eLife expects that all key data to be provided in a supplement.

      We thank the reviewer; we follow the publication guidelines of eLife. 

      References

      Aggarwal, M., Smith, M.D., and Calabresi, P.A. (2020). Diffusion-time dependence of diffusional kurtosis in the mouse brain. Magn Reson Med 84, 1564-1578.

      Ashoor, M., Khorshidi, A., and Sarkhosh, L. (2019). Estimation of microvascular capillary physical parameters using MRI assuming a pseudo liquid drop as model of fluid exchange on the cellular level. Rep Pract Oncol Radiother 24, 3-11.

      Cauli, B., and Hamel, E. (2018). Brain Perfusion and Astrocytes. Trends Neurosci 41, 409-413.

      Debacker, C., Djemai, B., Ciobanu, L., Tsurugizawa, T., and Le Bihan, D. (2020). Diffusion MRI reveals in vivo and non-invasively changes in astrocyte function induced by an aquaporin-4 inhibitor. PLoS One 15, e0229702.

      Erzurumlu, R.S., and Gaspar, P. (2020). How the Barrel Cortex Became a Working Model for Developmental Plasticity: A Historical Perspective. J Neurosci 40, 6460-6473.

      Farr, G.W., Hall, C.H., Farr, S.M., Wade, R., Detzel, J.M., Adams, A.G., Buch, J.M., Beahm, D.L., Flask, C.A., Xu, K., et al. (2019). Functionalized Phenylbenzamides Inhibit Aquaporin-4 Reducing Cerebral Edema and Improving Outcome in Two Models of CNS Injury. Neuroscience 404, 484-498.

      Giannetto, M.J., Gomolka, R.S., Gahn-Martinez, D., Newbold, E.J., Bork, P.A.R., Chang, E., Gresser, M., Thompson, T., Mori, Y., and Nedergaard, M. (2024). Glymphatic fluid transport is suppressed by the aquaporin-4 inhibitor AER-271. Glia.

      Gomolka, R.S., Hablitz, L.M., Mestre, H., Giannetto, M., Du, T., Hauglund, N.L., Xie, L., Peng, W., Martinez, P.M., Nedergaard, M., et al. (2023). Loss of aquaporin-4 results in glymphatic system dysfunction via brain-wide interstitial fluid stagnation. eLife 12.

      Huber, V.J., Tsujita, M., and Nakada, T. (2009). Identification of aquaporin 4 inhibitors using in vitro and in silico methods. Bioorg Med Chem 17, 411-417.

      Igarashi, H., Huber, V.J., Tsujita, M., and Nakada, T. (2011). Pretreatment with a novel aquaporin 4 inhibitor, TGN-020, significantly reduces ischemic cerebral edema. Neurol Sci 32, 113-116.

      Igarashi, H., Tsujita, M., Suzuki, Y., Kwee, I.L., and Nakada, T. (2013). Inhibition of aquaporin-4 significantly increases regional cerebral blood flow. Neuroreport 24, 324-328.

      Jiang, R., Diaz-Castro, B., Looger, L.L., and Khakh, B.S. (2016). Dysfunctional Calcium and Glutamate Signaling in Striatal Astrocytes from Huntington's Disease Model Mice. J Neurosci 36, 3453-3470.

      Mayo, F., Gonzalez-Vinceiro, L., Hiraldo-Gonzalez, L., Calle-Castillejo, C., Morales-Alvarez, S., Ramirez-Lorca, R., and Echevarria, M. (2023). Aquaporin-4 Expression Switches from White to Gray Matter Regions during Postnatal Development of the Central Nervous System. Int J Mol Sci 24.

      Mola, M.G., Sparaneo, A., Gargano, C.D., Spray, D.C., Svelto, M., Frigeri, A., Scemes, E., and Nicchia, G.P. (2016). The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia 64, 139-154.

      Risher, W.C., Andrew, R.D., and Kirov, S.A. (2009). Real-time passive volume responses of astrocytes to acute osmotic and ischemic stress in cortical slices and in vivo revealed by two-photon microscopy. Glia 57, 207-221.

      Salman, M.M., Kitchen, P., Yool, A.J., and Bill, R.M. (2022). Recent breakthroughs and future directions in drugging aquaporins. Trends Pharmacol Sci 43, 30-42.

      Shigetomi, E., Bushong, E.A., Haustein, M.D., Tong, X., Jackson-Weaver, O., Kracun, S., Xu, J., Sofroniew, M.V., Ellisman, M.H., and Khakh, B.S. (2013). Imaging calcium microdomains within entire astrocyte territories and endfeet with GCaMPs expressed using adeno-associated viruses. J Gen Physiol 141, 633-647.

      Solenov, E., Watanabe, H., Manley, G.T., and Verkman, A.S. (2004). Sevenfold-reduced osmotic water permeability in primary astrocyte cultures from AQP-4-deficient mice, measured by a fluorescence quenching method. Am J Physiol Cell Physiol 286, C426-432.

      Verkman, A.S., Binder, D.K., Bloch, O., Auguste, K., and Papadopoulos, M.C. (2006). Three distinct roles of aquaporin-4 in brain function revealed by knockout mice. Biochim Biophys Acta 1758, 1085-1093.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension vs static and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.  

      Strengths:  

      This is a very comprehensive study that uses a number of different reactor designs and scales in addition to a number of unbiased outcomes to determine how suspension impacts the behaviour of the cells and in turn how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.

      Weaknesses:  

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design, and multiple other factors. It remains unclear to me how much of the results the authors observed (e.g. increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance - was the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study, and if not, how were the values used in the study determined?  

      Thank you for your thoughtful comments. Following your comments, we performed several experiments to optimize the bioreactor conditions for the revised manuscript. We tested several cell seeding densities and stirring speeds with or without WNT/PKCβ inhibitors (Figure 6 - figure supplement 1). We found that seeding densities of 1-2 × 10⁵ cells/mL and stirring speeds of 50-150 rpm supported the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613); Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253)), respectively. We cited these previous studies in the Results and Materials and Methods sections. We believe that these additional data and explanations address your concerns about the optimization of the bioreactor experiments.

      Reviewer #2 (Public Review):  

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways would repress spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining cell pluripotency in suspension culture. This is a solid study with substantial data to demonstrate the quality of the hiPSC maintained in the suspension culture system, including long-term maintenance in >10 passages, robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical grade hiPSC using a bioreactor was also demonstrated, which highlighted the translational value of the findings here. In addition, the author demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications in simplifying the current culture method of iPSC and upscaling iPSC manufacturing.  

      Another potential advantage that perhaps wasn't well discussed in the manuscript is the reported suspension culture system does not require additional ECM to provide biophysical support for iPSC, which differentiates from previous studies using hydrogel and this should further simplify the hiPSC culture protocol.  

      Interestingly, although several hiPSC suspension media are currently available commercially, the content of these suspension media remained proprietary, as such the signaling that represses differentiation/maintains pluripotency in hiPSC suspension culture remained unclear. This study provided clear evidence that inhibition of the Wnt/PKC pathways is critical to repress spontaneous differentiation in hiPSC suspension culture.  

      I have several concerns that the authors should address, in particular, it is important to benchmark the reported suspension system with the current conventional culture system (eg adherent feeder-free culture), which will be important to evaluate the usefulness of the reported suspension system.  

      Thank you for this insightful suggestion. In this revised manuscript, we have performed additional experiments using conventional media, mTeSR1 (Stem Cell Technologies, Vancouver, Canada), comparing with the adherent feeder-free culture system in four different hiPSC lines simultaneously. Compared to the adherent conditions, the suspension conditions without chemical treatment decreased the expression of self-renewal marker genes/proteins and increased the expression levels of SOX17, T, and PAX6 (Figure 4 - figure supplement 2). Importantly, the treatment of LY333531 and IWR-1-endo in mTeSR1 medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions, reaching the comparable levels of the adherent culture conditions. These results indicated that these chemical treatments in suspension culture are beneficial even when using a conventional culture medium.

      Also, the manuscript lacks a clear description of a consistent robust effect in hiPSC maintenance across multiple cell lines.  

      Thank you for this insightful suggestion. We have performed additional experiments on hiPSC maintenance across 5 hiPSC lines in suspension culture using StemFit AK02N medium simultaneously (Figure 3C - E). Overall, the treatment of LY333531 and IWR-1-endo in the StemFit AK02N medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. Also as above, we have added results using conventional media, mTeSR1, in comparison to the adherent feeder-free culture system in four different hiPSC lines simultaneously. These results show that this chemical treatment consistently produced robust effects in hiPSC maintenance across multiple cell lines using multiple conventional media.

      There are also several minor comments that should be addressed to improve readability, including some modifications to the wording to better reflect the results and conclusions.  

      In the revised manuscript, we have added and corrected the descriptions to improve readability, including some modifications to the wording to better reflect the results and conclusions. 

      Reviewer #3 (Public Review):  

      In the current manuscript, Matsuo-Takasaki et al. have demonstrated that the addition of PKCβ and WNT signaling pathway inhibitors to the suspension cultures of iPSCs suppresses spontaneous differentiation. These conditions are suitable for large-scale expansion of iPSCs. The authors have shown that they can perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs in these conditions. Moreover, the authors have performed a thorough characterization of iPSCs cultured in these conditions, including an assessment of undifferentiated stem cell markers and genetic stability. The authors have elegantly shown that iPSCs cultured in these conditions can be differentiated into derivatives of three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes they have shown that differentiation is comparable to adherent cultures.

      This new method of expanding iPSCs will benefit the clinical applications of iPSCs.  

      Recently, multiple protocols have been optimized for culturing human pluripotent stem cells in suspension conditions and their expansion. Additionally, a variety of commercially available media for suspension cultures are also accessible. However, the authors have not adequately justified why their conditions are superior to previously published protocols (indicated in Table 1) and commercially available media. They have not conducted direct comparisons.  

      Thank you for this careful suggestion. In this revised manuscript, we have added results using a conventional medium, mTeSR1 (Stem Cell Technologies), which has been used for suspension culture in several studies. Compared to the adherent conditions using mTeSR1 medium, the suspension conditions with the same medium decreased the ratio of TRA1-60/SSEA4-positive cells and OCT4-positive cells, decreased the expression levels of OCT4 and NANOG, and increased the expression levels of SOX17, T, and PAX6 in 4 different hiPSC lines tested simultaneously (Figure 4 - figure supplement 2). Importantly, the treatment with LY333531 and IWR-1-endo in the mTeSR1 medium reversed the decreased expression of these undifferentiated markers. With these direct comparisons, we were able to justify why our conditions are superior to previously published protocols using commercially available media.

      Additionally, the authors have not adequately addressed the observed variability among iPSC lines. While they claim in the Materials and Methods section to have tested multiple pluripotent stem cell lines, they do not clarify in the Results section which line they used for specific experiments and the rationale behind their choices. There is a lack of comparison among the different cell lines. It would also be beneficial to include testing with human embryonic stem cell lines.  

      Thank you for this insightful suggestion. In this revised manuscript, we have added results on 5 different hiPSC lines tested at the same time (Figure 3 C-E). We apologize, but it is difficult to use human embryonic stem cell lines in this study because of ethical restrictions under Japanese governmental regulations. In general, the treatment with LY333531 and IWR-1-endo increased the expression of self-renewal marker genes/proteins and decreased the expression levels of SOX17, T, and PAX6 in these hiPSC lines. These results indicate that these chemical treatments in suspension culture are robust in general, while also addressing the observed variability among iPSC lines.

      Additionally, there is a lack of information regarding the specific role of the two small molecules in these conditions.  

      In this revised manuscript, we have added data and discussion regarding the specific roles of the two small molecules in these conditions in the Results and Discussion sections. Regarding the WNT signaling inhibitor, we hypothesized that adding Wnt signaling inhibitors may suppress the spontaneous differentiation of hiPSCs into mesendoderm, because exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of Wnt signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021). Regarding the PKC inhibitors, we now state: "To identify molecules with inhibitory activity on neuroectodermal differentiation, hiPSCs were treated with candidate molecules in suspension conditions. We selected these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017))."

      We also found that the expression of naïve pluripotency markers, KLF2, KLF4, KLF5, and DPPA3, were up-regulated in the suspension conditions treated with LY333531 and IWR-1-endo while the expression of OCT4 and NANOG was at the same levels (Figure 5—figure supplement 2). Combined with RT-qPCR analysis data on 5 different hiPSC lines (Figure 3E), these results suggest that IWRLY conditions may drive hiPSCs in suspension conditions to shift toward naïve pluripotent states.

      The authors have not attempted to elucidate the underlying mechanism other than RNA expression analysis.  

      Regarding the underlying mechanisms, we have added results and discussion in the revised manuscript. For Wnt activation in human pluripotent stem cells, several studies reported that some WNT agonists are expressed in undifferentiated human pluripotent stem cells (Dziedzicka et al., 2021; Jiang et al, 2013; Konze et al, 2014). In suspension culture, cell aggregation causes tight cell-cell interaction. The paracrine effect of WNT agonists within the cell aggregates may strongly affect neighboring cells and induce spontaneous differentiation into mesendodermal cells. Thus, we think that the inhibition of WNT signaling is effective in suppressing spontaneous differentiation into mesendodermal lineages in suspension culture.

      For PKCβ activation in human pluripotent stem cells, we have shown by western blotting that phosphorylated PKCβ protein expression is higher in suspension culture than in adherent culture (Figure 3 - figure supplement 1). Treatment with a PKCβ inhibitor is effective in suppressing spontaneous differentiation into neuroectodermal lineages. As future perspectives, it will be interesting to examine (1) how and why PKCβ is activated (or phosphorylated), especially in suspension culture conditions, and (2) how and why PKCβ inhibition can suppress neuroectodermal differentiation. Conversely, it will also be interesting to examine how and why PKCβ activation is related to neuroectodermal differentiation.

      For these reasons some aspects of the manuscript need to be extended:  

      (1) It is crucial for authors to specify the culture media used for suspension cultures. In the Materials and Methods section, the authors mentioned that cells in suspension were cultured in either StemFit AK02N medium, StemFit AK03N (Cat# AK03N, Ajinomoto, Co., Ltd., Tokyo, Japan), or StemScale PSC suspension medium (A4965001, Thermo Fisher Scientific, MA, USA). The authors should clarify in the text which medium was used for suspension cultures and whether they observed any differences among these media.

      We apologize for the confusion. In this study, we basically used StemFit AK02N medium (Figure 1-5, 7-9). For the bioreactor experiments (Figure 6), we used StemFit AK03N medium, which is free of human- and animal-derived components and is GMP grade. To confirm the effect of the IWRLY chemical treatment, we used StemScale suspension medium (Figure 4 - figure supplement 1) and mTeSR1 medium (Figure 4 - figure supplement 2 and Figure 8 - figure supplement 1). In the revised manuscript, we clarified which medium was used for suspension cultures in the Results and Materials and Methods sections.

      Although we have not directly compared these media in suspension culture (which is outside the main focus of this study), we have observed some differences among them in maintaining self-renewal characteristics, in preventing spontaneous differentiation (including tendencies to differentiate into specific lineages), and in stability or variation between experiments in suspension culture conditions. Despite this heterogeneity across media, the IWRLY chemical treatment stably maintained hiPSC self-renewal in general. We have added this issue to the Discussion section.

      (2) In the Materials and Methods section, the authors mentioned that they used multiple cell lines for this study. However, it is not clear in the text which cell lines were used for various experiments. Since there is considerable variation among iPSC lines, I suggest that the authors simultaneously compare 2 to 3 pluripotent stem cell lines for expansion, differentiation, etc.  

      Thank you for this careful suggestion. We have added more results on the simultaneous comparison using StemFit AK02N medium in 5 different hiPSC lines (Figure 3 C-E) and using mTeSR1 medium in 4 different hiPSC lines (Figure 4 - figure supplement 2). From both results, we have shown that the treatment of LY333531 and IWR-1-endo was beneficial in maintaining the self-renewal of hiPSCs while suppressing spontaneous differentiation.

      (3) Single-cell sorting can be confusing. Can iPSCs grown in suspensions be single-cell sorted?

      Additionally, what was the cloning efficiency? The cloning efficiency should be compared with adherent cultures.  

      We apologize for the confusion. With our method, iPSCs grown in IWRLY suspension conditions can be single-cell sorted. We have improved the clarity of the schematics (Figure 7A). We also added data on the cloning efficiency, compared with adherent cultures (Figure 7B). The cloning efficiency of adherent cultures was around 30%. While the cloning efficiency of suspension cultures without any chemical treatment was less than 10%, the IWR-1-endo treatment in suspension cultures increased the efficiency to more than 20%. However, the treatment with LY333531 decreased the efficiency. These results indicate that the IWR-1-endo treatment is beneficial for single-cell cloning in suspension culture.

      (4) The authors have not addressed the naïve pluripotent state in their suspension cultures, even though PKC inhibition has been shown to drive cells toward this state. I suggest the authors measure the expression of a few naïve pluripotent state markers and compare them with adherent cultures  

      Thank you for this insightful comment. In the revised manuscript, we have added RT-qPCR data from 5 different hiPSC lines and specific gene expression data from RNA-seq on naïve pluripotent state markers (Figure 3E and Figure 5 - figure supplement 2, respectively). Interestingly, the expression of KLF2, KLF4, KLF5, and DPPA3 is significantly up-regulated in IWRLY conditions. These results suggest that IWRLY suspension conditions drive hiPSCs toward a naïve pluripotent state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Overall, I feel that this study is very interesting and comprehensive, but has significant weaknesses in the bioprocessing aspects. More optimization data is required for the suspension culture to truly show that the differentiation they are observing is not an artifact of a non-optimized protocol.  

      Thank you for your thoughtful comments. Following your comments, we performed several experiments to optimize the bioreactor conditions for the revised manuscript. We tested several cell seeding densities and stirring speeds with or without WNT/PKCβ inhibitors (Figure 6 - figure supplement 1). From these optimization experiments, we found that seeding densities of 1-2 × 10⁵ cells/mL and stirring speeds of 50-150 rpm supported the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed within the acceptable range. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613); Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253)), respectively. We cited these previous studies in the Results section. We believe that these additional data and explanations address your concerns about the optimization of the bioreactor experiments.

      Reviewer #2 (Recommendations For The Authors):  

      The following comments should be addressed by the authors to improve the manuscript:  

      (1) Abstract: '...a scalable culture system that can precisely control the cell status for hiPSCs is not developed yet.' There were previous reports for a scalable iPSC culture system so I would suggest toning down/rephrasing this point: eg that improvement in a scalable iPSC culture system is needed.  

      Thank you for this careful suggestion. Following this suggestion, we have changed the sentence to "the improvement in a scalable culture system that can precisely control the cell status for hiPSCs is needed."

      (2) Line 71: please specify what media was used as a 'conventional medium' for suspension culture, was it Stemscale?  

      As suggested, we specified the media as StemFit AK02N used for this experiment. 

      (3) Fig 1E: It's not easy to see gating in the FACS plots as the threshold line is very faint, please fix this issue.  

      As suggested, we used thicker lines for the gating in the FACS plots (Figure 1E).

      (4) Fig 1G-J, Fig 2D-H: The RNAseq figures appeared pixelated and the resolution of these figures should be improved. The x-axis label for Fig 1H is missing.  

      We have improved these figures in their resolution and clarity. Also, we have added the x-axis label as "enrichment distribution" for gene set enrichment analysis (GSEA) in Figures 1H, 5F, and 5- figure supplement 1B.

      (5) Line 103-107: 'Since Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages, and is endogenously involved in the regulation of mesendoderm differentiation of pluripotent stem cells.....'. The two points seem the same and should be clarified.  

      We apologize for this unclear description. We have changed it to "Exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of WNT signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021; Jiang et al, 2013)." We hope that this description makes the difference between the two points clear.

      (6) Line 113: 'In samples treated with inhibitors' should be 'In samples treated with Wnt inhibitors'.  

      Thank you for this careful suggestion. We have corrected this. 

      (7) Line 115: '....there was no reduction in PAX6 expression.' That's not entirely correct, there was a reduction in PAX6 in IWR-1 endo treatment compared to control suspension culture (is this significant?), but not consistently for IWP-2 treatment. Please rephrase to more accurately describe the results.  

      Sorry for this inaccurate description. We have corrected this phrase as "there was only a small reduction in PAX6 expression in the IWR-1-endo-treated condition and no reduction in the IWP2-treated condition" as recommended.

      (8) It's critical to show that the effect of the suspension culture system developed here can maintain an undifferentiated state for multiple hiPSC lines. I think the author did test this in multiple cell lines, but the results are scattered and not easy to extract. I would recommend adding info for the hiPSC line used for the results in the legend, eg WTC11 line was used for Figure 3, 201B7 line was used for Figure 2. I would suggest compiling a figure that confirms the developed suspension system (IWR-1 +LY) can support the maintenance of multiple hiPSC lines.  

      Thank you for this insightful suggestion. We have added data on hiPSC maintenance across 5 hiPSC lines cultured simultaneously in suspension using StemFit AK02N medium (Figure 3C - E) and across 4 hiPSC lines cultured simultaneously in suspension using mTeSR1 medium (Figure 4 - figure supplement 2). In both media, the treatment with LY333531 and IWR-1-endo reversed the decreased expression of the undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. These results show that this chemical treatment produced a consistent, robust effect on hiPSC maintenance across multiple cell lines.

      (9) Line 166: Please use the correct gene nomenclature format for a human gene (italicised uppercase) throughout the manuscript. Also, list the full gene name rather than PAX2,3,5.  

      Sorry for the incorrectness of the gene names. We have corrected them.

      (10) Please improve the resolution for Figure 4D.  

      We have provided clearer images of Figure 4D.

      (11) In the first part of the study, the control condition was referred to as 'suspension culture' with spontaneous differentiation, but in the later parts sometimes the term 'suspension culture' was used to describe the IWR1+LY condition (ie lines 271-272). I would suggest the authors carefully go through the manuscript to avoid misinterpretation on this issue.  

      Thank you for this careful suggestion. To avoid misinterpretation, we now use 'suspension culture' only for culture in the conventional medium and 'LYIWR suspension culture' for culture in medium supplemented with LY333531 and IWR-1-endo throughout the manuscript.

      (12) Figure 5: It is impressive to demonstrate that the IWR1+LY suspension culture enables large-scale expansion of a clinical-grade hiPSC line using a bioreactor, yielding 300 vials/passage. Can the author add some information regarding cell yield using a conventional adherent culture system in this cell line? This will provide a comparison of the performance of the IWR1+LY suspension culture system to the conventional method.  

      Thank you for this valuable suggestion. We have provided information regarding cell yield using a conventional adherent culture system in this cell line in the Results as "Since the population doubling time (PDT) of this hiPSC line in adherent culture conditions is 21.8 - 32.9 hours at its production (https://www.cira-foundation.or.jp/e/assets/file/provision-of-ips-cells/QHJI14s04_en.pdf), this proliferation rate in this large scale suspension culture is comparable to adherent culture conditions."
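
      As a minimal sketch of how a population doubling time can be computed from cell counts at two time points, assuming simple exponential growth between them: the function name and the example numbers below are illustrative assumptions, not values or code from this study.

```python
import math


def population_doubling_time(hours, n_start, n_end):
    """Return the population doubling time in hours.

    Assumes exponential growth between the two counts:
    PDT = t * ln(2) / ln(N_end / N_start).
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("n_end must exceed n_start, and both must be positive")
    return hours * math.log(2) / math.log(n_end / n_start)


# Hypothetical example: a 16-fold expansion over 96 hours is 4 doublings.
pdt = population_doubling_time(96, 1e5, 1.6e6)
```

      A value computed this way for a suspension culture could then be set against the 21.8 - 32.9 hour range quoted above for adherent culture.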

      (13) Line 273: For testing the feasibility of using IWR1+LY media to support the freeze and thaw process, the author described the cell number and TRA160+/OCT4+ cell %. How is this compared to conventional media (eg E8)? It would be nice to see a head-to-head comparison with conventional media, quantification of cell count or survival would be helpful to determine this.  

      To address this, we attempted a direct freeze and thaw process using conventional media, StemFit AK02N in the 201B7 line (Figure 8) or mTeSR1 in 4 different hiPSC lines (Figure 8 - figure supplement 1), with or without IWR1+LY. However, because hiPSCs cultured in suspension without IWR1+LY quickly lost their self-renewal ability, the frozen cells could be neither recovered nor counted under these conditions. Our results indicate that the addition of IWR1+LY during the thawing process supports successful recovery in suspension conditions.

      (14) More details of the passaging method should be added in the method section. Do you do cell count following accutase dissociation and replate a defined density (eg 1x10^5/ml)?  

      Yes. We counted the cells in every passage in suspension culture conditions. We have added more explanation in the Materials and Methods as below.

      "The dissociated cells were counted with an automatic cell counter (Model R1, Olympus) with Trypan Blue staining to detect live/dead cells. The cell-containing medium was spun down at 200 rpm for 3 minutes, and the supernatant was aspirated. The cell pellet was re-suspended with a new culture medium at an appropriate cell concentration and used for the next suspension culture."

      (15) The IWR1+LY suspension culture system requires passage every 3-5 days. Is there still spontaneous differentiation if the hiPSC aggregate grows too big?  

      Thank you for this insightful question.

      Yes. As previous studies have shown, the size of hiPSC aggregates is critical for maintaining self-renewal in our method. Stirring speed is key to producing hiPSC aggregates of the proper size in suspension culture, and the culture period between passages is key to keeping aggregates from exceeding that size. Thus, we generally keep the stirring speed at 90 rpm (135 rpm for bioreactor conditions) and passage every 3 - 5 days in suspension culture conditions.

      (16) Several previous studies have described the development of hiPSC suspension culture system using hydrogel encapsulation to provide biophysical modulation (reviewed in PMID: 32117992). In comparison, it seems that the IWR1+LY suspension system described here does not require ECM addition which further simplifies the culture system for iPSC. It would be good to add more discussion on this topic in the manuscript, such as the potential role of the E-cadherin in mediating this effect - as RNAseq results indicated that CDH1 was upregulated in the IWR1+LY condition).  

      Thank you for this valuable suggestion. We have added more discussion on this topic in the Discussion section as below.

      "Thus, our findings show that suspension culture conditions with Wnt and PKCβ inhibitors (IWRLY suspension conditions) can precisely control cell conditions and are comparable to conventional adhesion cultures regarding cellular function and proliferation. Many previous 3D culture methods intended for mass expansion used hydrogel-based encapsulation or microcarrier-based methods to provide scaffolds and biophysical modulation (Chan et al, 2020). These methods are useful in that they enable mass culture while maintaining scaffold dependence. However, the need for special materials and equipment and the labor and cost involved are concerns toward industrial mass culture. On the other hand, our IWRLY suspension conditions do not require special materials such as hydrogels, microcarriers, or dialysis bags, and have the advantage that common bioreactors can be used. "

      "On the other hand, it is interesting to see whether and how the properties of hiPSCs cultured in IWRLY suspension culture conditions are altered from the adherent conditions. Our transcriptome results in comparison to adherent conditions show that gene expression associated with cell-to-cell attachment, including E-cadherin (CDH1), is more activated. This may be due to the status that these hiPSCs are more dependent on cell-to-cell adhesion where there is no exogenous cell-to-substrate attachment in the three-dimensional culture. Previous studies have shown that cell-to-cell adhesion by E-cadherin positively regulates the survival, proliferation, and self-renewal of human pluripotent stem cells (Aban et al, 2021; Li et al, 2012; Ohgushi et al, 2010). Furthermore, studies have shown that human pluripotent stem cells can be cultured using an artificial substrate consisting of recombinant E-cadherin protein alone without any ECM proteins (Nagaoka et al, 2010). Also, cell-to-cell adhesion through gap junctions regulates the survival and proliferation of human pluripotent stem cells (Wong et al, 2006; Wong et al, 2004). These findings raise the possibility that the cell-to-cell adhesion, such as E-cadherin and gap junctions, are compensatory activated and support hiPSC self-renewal in situations where there are no exogenous ECM components and its downstream integrin and focal adhesion signals are not forcedly activated in suspension culture conditions. It will be interesting to elucidate these molecular mechanisms related to E-cadherin in the hiPSC survival and self-renewal in IWRLY suspension conditions in the future."

      Reviewer #3 (Recommendations For The Authors):  

      (1) I am a bit confused about the passage of adherent cultures. The authors claim that they used EDTA for passaging and plated cells at a density of 2500 cells/cm2. My understanding is that EDTA is typically used for clump passaging rather than single-cell passaging.  

      Sorry about this confusion. We routinely use an automatic cell counter (model R1, Olympus) which can even count small clumpy cells accurately. Thus, we show the cell numbers in the passaging of adherent hiPSCs.  

      (2) Figure 2D- The authors have not directly compared IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17, a simultaneous comparison would be an interesting data.  

      As recommended, we have added data directly comparing IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17 in Figure 2D. The addition of IWR-1-endo alone decreased the expression of T and SOX17, but not PAX6, which was similar to the data in Figure 2C.

      (3) Oxygen levels play a crucial role in pluripotency maintenance. Could the authors please specify the oxygen levels used for culturing cells in suspension?  

      We apologize for not mentioning oxygen levels in this study. We use normal oxygen levels (i.e., 21% O2) in suspension culture conditions. We have explained this in the Materials and Methods section.

      (4) Figure supplement 1 (G and H): In the images, it is difficult to determine whether the green (PAX6 and SOX17) overlaps with tdT tomato. For better visualization, I suggest that the authors provide separate images for the green and red colors, as well as an overlay.  

      Sorry for these unclear images. We have provided separate images for the green and red colors, as well as an overlay in Figure 1- figure supplement 1 G and H.

      (5) The authors have only compared quantitatively the expression of TRA-1-60 for most of the figures. I suggest that the authors quantitatively measure the expression of other markers of undifferentiated stem cells, such as NANOG, OCT4, SSEA4, TRA-1-81, etc.  

      We have added the quantitative data of the expression of markers of undifferentiated hiPSCs including NANOG, OCT4, SSEA4, and TRA-1-60 on 5 different hiPSC lines in Figure 3 C-E.

      (6) In Figure 2D, the authors have tested various small molecules but the rationale behind testing those molecules is missing in the text.  

      These molecules were chosen because they putatively affect neuroectodermal induction from the pluripotent state.

      We have added the rationale with appropriate references in the Results section as below.

      "We have chosen these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017)) (Figure 2A; listed in Supplementary Table 1). "

      (7) In the beginning authors used Go6983 but later they switched to LY333531, the reasoning behind the switch is not explained well.  

      To explain clearly the reasons for switching from Go6983 to LY333531, we reorganized the order of results and figures. In short, we found that the suppression of PAX6 expression in hiPSCs cultured in suspension conditions was observed with many PKC inhibitors, all of which possessed PKCβ inhibition activity (Figure 2 - figure supplement 2B-D). Also, elevated expression of PKCβ in suspension-cultured hiPSCs could affect spontaneous differentiation (Figure 3 - figure supplement 1A-C). To further explore the possibility that the inhibition of PKCβ is critical for the maintenance of self-renewal of hiPSCs in suspension culture, we evaluated the effect of LY333531, a PKCβ-specific inhibitor. The maintenance of suspension-cultured hiPSCs is specifically facilitated by the combination of PKCβ and Wnt signaling inhibition (Figure 3A and B; Figure 2 - figure supplement 1). Finally, we performed long-term culture for 10 passages in suspension conditions and compared hiPSC growth in the presence of LY333531 or Go6983. LY333531 was superior in proliferation rate and in maintaining OCT4 protein expression in the long-term culture (Figure 4). Thus, we used IWR-1-endo and LY333531 for the rest of this study.

      (8) I suggest the authors measure cell death after the treatment with LY+IWR-1-endo.  

      Thank you for this valuable suggestion. We have measured cell death after treatment with LY+IWR-1-endo and found that the chemical combination had little or no effect on cell death. We have added data in Figure 3 - figure supplement 2 and the description in the Results section as below. "We also examined whether the combination of PKCβ and Wnt signaling inhibition affects cell survival in suspension conditions. In this experiment, we used another PKC inhibitor, Staurosporine (Omura et al, 1977), which has a strong cytotoxic effect, as a positive control of cell death in suspension conditions. The addition of IWR-1-endo and LY333531 for 10 days had no effect on apoptosis, while the addition of Staurosporine for 2 hours induced Annexin-V-positive apoptotic cells (Figure 3 - figure supplement 2). These results indicate that the combination of PKCβ and Wnt signaling inhibition has little or no effect on cell survival in suspension conditions."

      (9) The authors have performed reprogramming using episomal vectors and using Sendai viruses. In both the protocols authors have added small molecules at different time points, for episomal vector protocol at day 3 and Sendai virus protocol at day 23. Why is this different?  

      Thank you for this insightful question. These differences in timing reflect the expected degree and duration of expression from the two types of reprogramming vectors. The expression of reprogramming factors from these vectors should suppress spontaneous differentiation in reprogramming cells, and expression from Sendai viral vectors should last longer than that from episomal plasmid vectors. Thus, we added the chemical inhibitors from the early phase of reprogramming in episomal plasmid vector conditions and from the late phase of reprogramming in Sendai viral vector conditions. In the future, the timing of adding these molecules may need further optimization.

      (10) The protocol for three germ layer differentiation using a specific differentiation medium requires further elaboration. For instance, the authors mentioned that suspension cultures were transferred to differentiation media but did not emphasize the cell number and culture conditions before moving the cultures to the differentiation media.  

      Sorry for this unclear description. We have added the explanation on the cell number and culture conditions before moving the cultures to the differentiation media in the Materials and Methods section as below.

      "As in the maintenance conditions, 4 × 10^5 hiPSCs were seeded in one well of a low-attachment 6-well plate with 4 mL of StemFit AK02N medium supplemented with 10 µM Y-27632. This plate was placed onto the plate shaker in the CO2 incubator. The next day, the medium was changed to the germ layer specific differentiation medium."

    1. Author response:

      Joint Public Reviews:

      Here, the authors compare how different operationalizations of adverse childhood experience exposure related to patterns of skin conductance response during a fear conditioning task. They use a large dataset to definitively understand a phenomenon that, to date, has been addressed using a range of different definitions and methods, typically with insufficient statistical power. Specifically, the authors compared the following operationalizations: dichotomization of the sample into "exposed" and "non-exposed" categories, cumulative adversity exposure, specificity of adversity exposure, and dimensional (threat versus deprivation) adversity exposure. The paper is thoughtfully framed and provides clear descriptions and rationale for procedures, as well as package version information and code. The authors' overall aim of translating theoretical models of adversity into statistical models, and comparing the explanatory power of each model, respectively, is an important and helpful addition to the literature. However, the analysis would be strengthened by employing more sophisticated modelling techniques that account for between-subjects covariates and the presentation of the data needs to be streamlined to make it clearer for the broad audience for which it is intended.

      Strengths

      Several outstanding strengths of this paper are the large sample size and its primary aim of statistically comparing leading theoretical models of adversity exposure in the context of skin conductance response. This paper also helpfully reports Cohen's d effect sizes, which aid in interpreting the magnitude of the findings. The methods and results are generally thorough.

      Weaknesses

      Weakness 1: The largest concern is that the paper primarily relies on ANOVAs and pairwise testing for its analyses and does not include between-subjects covariates. Employing mixed-effects models instead of ANOVAs would allow more sophisticated control over sources of random variance in the sample (especially important for samples from multi-site studies such as the present study), and further allow the inclusion of potentially relevant between-subjects covariates such as age (e.g. Eisenstein et al., 1990) and gender identity or sex assigned at birth (e.g. Kopacz II & Smith, 1971) (perhaps especially relevant due to possible gender- or sex-related differences in ACE exposure; e.g. Kendler et al., 2001). Also, proxies for socioeconomic status (e.g. income, education) can be linked with ACE exposure (e.g. Maholmes & King, 2012) and warrant consideration as covariates, especially if they differ across adversity-exposed and unexposed groups.

      We appreciate the reviewer's suggestion and recognize the value of using (more) sophisticated statistical methods. However, we think that the choice of methods should not be guided by perceived complexity alone, and that the chosen ANOVA-based approach provides reliable and valid results. In our revision, we address the reviewer's suggestion by demonstrating that employing mixed models leaves the reported results unchanged (a). We would also like to refer the reviewer to the robustness analyses provided in the initial supplementary material (b).

      a) Re-running analyses using mixed models

      Based on the reviewers' suggestion, we repeated our main analyses (association between exposure to childhood adversity and SCRs, arousal, valence, and contingency ratings during fear acquisition and generalization) using linear mixed models, including age, sex, educational attainment, and childhood adversity as fixed effects, and site as a random effect. These analyses produced results similar to those in our manuscript, demonstrating a significant effect of childhood adversity on SCRs, as assessed by CS discrimination during both acquisition training and the generalization phase, and on general reactivity, but not on linear deviation scores (LDS). For the different rating types, we did not observe any significant effects of childhood adversity.
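
      For readers who prefer notation, the model structure described above can be sketched as follows. This is one plausible specification with a random intercept per site; the symbols and the random-intercept-only structure are assumptions made for illustration and are not taken from the manuscript.

```latex
y_{ij} = \beta_0
       + \beta_1\,\mathrm{adversity}_{ij}
       + \beta_2\,\mathrm{age}_{ij}
       + \beta_3\,\mathrm{sex}_{ij}
       + \beta_4\,\mathrm{education}_{ij}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      Here y_{ij} stands for the outcome (e.g., CS discrimination in SCRs) of participant i at site j, the β terms are the fixed effects, and u_j is the site-level random intercept.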

      We would prefer to retain our main analyses as they are and report the linear mixed model results as additional results in the supplement. However, if the reviewer and editor have strong preferences otherwise, we are open to presenting the mixed models in the main manuscript and moving our previous analyses to the supplement.

      We added the following paragraph to the main manuscript (page 25-26):

      “At the request of a reviewer, we repeated our main analyses by using linear mixed models including age, sex, school degree (i.e., to approximate socioeconomic status), and exposure to childhood adversity as fixed effects as well as site as a random effect. These analyses yielded comparable results demonstrating a significant effect of childhood adversity on CS discrimination during acquisition training and the generalization phase as well as on general reactivity, but not on the generalization gradients in SCRs (see Supplementary Table 2 A). Consistent with the results of the main analyses reported in our manuscript, we did not observe any significant effects of childhood adversity on the different types of ratings when using mixed models (see Supplementary Table 2 B-D). Some of the mixed model analyses showed significantly lower CS discrimination during acquisition training and generalization, and lower general reactivity in males compared to females (see Supplementary Table 2 for details).”

      b) Additional robustness tests for the main analyses (already provided in the initial submission as supplementary material)

      We would also like to refer the reviewer to the robustness analyses in the initial supplement to account for possible site effects. Adding site to the analyses affected the p-value in only one instance: entering site as a covariate in analyses of CS discrimination during acquisition training attenuated the p-value of the ACQ exposure effect from p = 0.020 to p = 0.089.

      Further robustness checks involved repeating our main analyses while excluding (a) physiological non-responders (participants with only SCRs = 0) and (b) extreme outliers (data points ± 3 SDs from the mean) to ensure generalizable results. These repetitions of the analyses did not lead to any changes in the results.
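
      The two exclusion rules named above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the use of the sample standard deviation are assumptions, not the code used in the study.

```python
import statistics


def is_non_responder(scrs):
    """True if a participant has only zero skin conductance responses."""
    return len(scrs) > 0 and all(value == 0 for value in scrs)


def exclude_extreme_outliers(values, n_sd=3.0):
    """Keep data points within n_sd sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


# Hypothetical data: twenty typical values and one extreme value.
values = [1.0] * 20 + [100.0]
kept = exclude_extreme_outliers(values)
```

      Note that with very small samples a single extreme value cannot exceed 3 SDs from the mean, so this rule is only meaningful for reasonably large groups.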

      We did not include age in our primary analyses due to the homogeneity of our sample and the lack of related hypotheses. Additionally, socio-economic status was assessed only crudely via the highest education level attained, rendering it of limited use.

      Weakness 2: On a related methodological note, the authors mention that scores representing threat and deprivation were not problematically collinear due to VIFs being <10; however, some sources indicate that VIFs should be <5 (e.g. Akinwande et al., 2015).

      We thank the reviewer for bringing different cut-offs to our attention. We have revised this section to highlight the arbitrary nature of their interpretation (page 33):

      “Within the dimensional model framework, the issue of multicollinearity among predictors (i.e., different childhood adversity types) is frequently discussed (McLaughlin et al., 2021; Smith & Pollak, 2021). If we apply the rule of thumb of a variance inflation factor (VIF) > 10, which is often used in the literature to indicate concerning multicollinearity (e.g., Hair, Anderson, Tatham, & Black, 1995; Mason, Gunst, & Hess, 1989; Neter, Wasserman, & Kutner, 1989), we can assume that multicollinearity was not a concern in our study (abuse: VIF = 8.64; neglect: VIF = 7.93). However, some authors state that VIFs should not exceed a value of 5 (e.g., Akinwande, Dikko, and Samson (2015)), while others suggest that these rules of thumb are rather arbitrary (O’Brien, 2007).”
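
      To make the two cut-offs concrete, the sketch below uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the remaining predictors. The function names are illustrative assumptions.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor, given the R^2 obtained
    when that predictor is regressed on all other predictors."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Invert the definition: the shared variance implied by a given VIF."""
    if vif < 1:
        raise ValueError("vif must be at least 1")
    return 1.0 - 1.0 / vif


# The two rules of thumb expressed as shared variance.
r2_at_vif_5 = r_squared_from_vif(5)    # 0.8
r2_at_vif_10 = r_squared_from_vif(10)  # 0.9
```

      Read this way, a VIF of 8.64 corresponds to an R² of roughly 0.88, which lies between the thresholds implied by the two rules of thumb.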

      Weakness 3: Additionally, the paper reports that higher trait anxiety and depression symptoms were observed in individuals exposed to ACEs, but it would be helpful to report whether patterns of SCR were in turn associated with these symptom measures and whether the different operationalizations of ACE exposure displayed differential associations with symptoms.

      We thank the reviewer for highlighting these relevant points. We have included additional analyses in the supplementary material in response to this comment. Figures and the corresponding text are also copied below for your convenience.

      We added the following paragraphs to the main manuscript: Methods (page 21):

      “Analyses of trait anxiety and depression symptoms

      To further characterize our sample, we compared individuals unexposed and exposed to childhood adversity on trait anxiety and depression scores by using Welch tests due to unequal variances.

      At the request of a reviewer, we additionally investigated the association of childhood adversity as operationalized by the different models used in our exploratory analyses (i.e., cumulative risk, specificity, and dimensional model) and trait anxiety as well as depression scores (see Supplementary Figure 7). By using STAI-T and ADS-K scores as independent variable, we calculated a) a comparison of conditioned responding of the four severity groups (i.e., no, low, moderate, severe exposure to childhood adversity) using one-way ANOVAs and the association with the number of sub-scales exceeding an at least moderate cut-off in simple linear regression models for the implementation of the cumulative risk model, and b) the association with the CTQ abuse and neglect composite scores in separate linear regression models for the implementation of the specificity/dimensional models. At the request of the reviewer, we also calculated the Pearson correlation between trait anxiety (i.e., STAI-T scores), depression scores (i.e., ADS-K scores) and conditioned responding in SCRs (see Supplementary Table 8).”
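
      For reference, the Welch test mentioned in the Methods paragraph above can be sketched as below: the t statistic uses separate variance estimates per group, and the degrees of freedom follow the Welch-Satterthwaite approximation. The function name and the example data are illustrative assumptions, not the study's code or data.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variances t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    se_a = statistics.variance(group_a) / n_a
    se_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical scores for an unexposed and an exposed group.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

      A p-value would then be read from the t distribution with the (generally non-integer) degrees of freedom returned here.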

      Results (page 38):

      “Analyses of trait anxiety and depression symptoms

      As expected, participants exposed to childhood adversity reported significantly higher trait anxiety and depression levels than unexposed participants (all p’s < 0.001; see Table 1 and Supplementary Figure 6). This pattern remained unchanged when childhood adversity was operationalized differently - following the cumulative risk approach, the specificity, and dimensional model (see methods). These additional analyses all indicated a significant positive relationship between exposure to childhood adversity and trait anxiety as well as depression scores irrespective of the specific operationalization of “exposure” (see Supplementary Figure 7).

      CS discrimination during acquisition training and the generalization phase, generalization gradients, and general reactivity in SCRs were unrelated to trait anxiety and depression scores in this sample, with the exception of a significant association between depression scores and CS discrimination during fear acquisition training (see Supplementary Table 8). More precisely, a very small but significant negative correlation was observed, indicating that high levels of depression were associated with reduced levels of CS discrimination (r = -0.057, p = 0.033). The correlation between trait anxiety levels and CS discrimination during fear acquisition training was not statistically significant, but on a descriptive level, high anxiety scores were also linked to lower CS discrimination scores (r = -0.05, p = 0.06), although we highlight that this should not be overinterpreted in light of the large sample. However, the two correlations (i.e., CS discrimination during fear acquisition training with trait anxiety and with depression, respectively) did not differ statistically from each other (z = 0.303, p = 0.762, Dunn & Clark, 1969). Interestingly, and consistent with our results showing that the relationship between childhood adversity and CS discrimination was mainly driven by significantly lower CS+ responses in exposed individuals, trait anxiety and depression scores were significantly associated with SCRs to the CS+, but not to the CS- during acquisition training (see Supplementary Table 8).”

      Weakness 4: Given the paper's framing of SCR as a potential mechanistic link between adversity and mental health problems, reporting these associations would be a helpful addition. These results could also have implications for the resilience interpretation in the discussion (lines 481-485), which is a particularly important and interesting interpretation.

      We have added a paragraph on this to the discussion (page 41):

      “Interestingly, in our study, trait anxiety and depression scores were mostly unrelated to SCRs, defined by CS discrimination and generalization gradients based on SCRs as well as general SCR reactivity, with the exception of a significant - albeit minute - relationship between CS discrimination during acquisition training and depression scores (see above). Although reported associations in the literature are heterogeneous (Lonsdorf et al., 2017), we may speculate that they may be mediated by childhood adversity. We conducted additional mediation analyses (data not shown) which, however, did not support this hypothesis. As the potential links between reduced CS discrimination in individuals exposed to childhood adversity and the developmental trajectories of psychopathological symptoms are still not fully understood, future work should investigate these further in - ideally - prospective studies.”

      Weakness 5: Given that the manuscript criticizes the different operationalizations of childhood adversity, there should be greater justification of the rationale for choosing the model for the main analyses. Why not the 'cumulative risk' or 'specificity' model? Related to this, there should also be a stronger justification for selecting the 'moderate' approach for the main analysis. Why choose to cut off at moderate? Why not severe, or low? Related to this, why did they choose to cut off at all? Surely one could address this with the continuous variable, as they criticize cut-offs in Table 2.

      We thank the reviewers and editors for bringing to our attention that our reasoning for choosing the main model was not clear. As outlined in the manuscript, we chose the approach for the main analyses from the literature as a recent review on this topic (Ruge et al., 2023) has shown the moderate CTQ cut-off to be the most abundantly employed in the field of research on associations between childhood adversity and threat learning. We have made this rationale more explicit in our revised manuscript (page 15/21):

      “Operationalization of "exposure"

      We implemented different approaches to operationalize exposure to childhood adversity in the main analyses and exploratory analyses (see Table 2). In the main analyses, we followed the approach most commonly employed in the field of research on childhood adversity and threat learning - using the moderate exposure cut-off of the CTQ (for a recent review see Ruge et al. (2024)). In addition, the heterogeneous operationalizations of classifying individuals into exposed and unexposed to childhood adversity in the literature (Koppold, Kastrinogiannis, Kuhn, & Lonsdorf, 2023; Ruge et al., 2024) hamper comparison across studies and hence cumulative knowledge generation. Therefore, we also provide exploratory analyses (see below) in which we employ different operationalizations of childhood adversity exposure.”

      “Exploratory analyses

      Additionally, the different ways of classifying individuals as exposed or unexposed to childhood adversity in the literature (Koppold et al., 2023; for discussion see Ruge et al., 2024) hinder comparison across studies and hence cumulative knowledge generation. Therefore, we also conducted exploratory analyses using different approaches to operationalize exposure to childhood adversity (see Table 2 for details).”

      Furthermore, as correctly noted, we fully agree that employing the moderate cut-off (or any cut-off in fact) is in principle an arbitrary decision - despite being guided by and derived from the literature in the field. However, we would like to draw the reviewers’ attention to Figure 5 in the initial submission (please see also below): Although the differences in SCR between severity groups were not significant, the overall pattern suggests at a descriptive level that the decline in CS discrimination, LDS and general reactivity in SCR occurs mainly when childhood adversity exceeds a moderate level. Thus, while we used the moderate cut-off as it was recently shown to be the most widely used approach in the literature (see Ruge et al., 2023), our exploratory analyses also seem to suggest on a descriptive level, that this cut-off may indeed “make sense”. We also refer to this in the results section (page 31-32) and discussion (page 43-44):

      Results:

      “However, on a descriptive level (see Figure 5), it seems that indeed exposure to at least a moderate cut-off level may induce behavioral and physiological changes (see main analysis, Bernstein & Fink, 1998). This might suggest that the cut-off for exposure commonly applied in the literature (see Ruge et al., 2024) may indeed represent a reasonable approach.”

      Discussion:

      “It is noteworthy, however, that this cut-off appears to map rather well onto psychophysiological response patterns observed here (see Figure 5). More precisely, our exploratory results of applying different exposure cut-offs (low, moderate, severe, no exposure) seem to indicate that indeed a moderate exposure level is “required” for the manifestation of physiological differences, suggesting that childhood adversity exposure may not have a linear or cumulative effect.”

      Weakness 6: In the Introduction, the authors predict less discrimination between signals of danger (CS+) and safety (CS-) in trauma-exposed individuals driven by reduced responses to the CS+. Given the potential impact of their findings for a larger audience, it is important to give greater theoretical context as to why CS discrimination is relevant here, and especially what a reduction in response specifically to danger cues would mean (e.g. in comparison to anxiety, where safety learning is impacted).

      We thank the reviewer for highlighting that this was not sufficiently clear. We revised the paragraph in the introduction as follows (page 7-8):

      “Fear acquisition as well as extinction are considered as experimental models of the development and exposure-based treatment of anxiety- and stress-related disorders. Fear generalization is in principle adaptive in ensuring survival (“better safe than sorry”), but broad overgeneralization can become burdensome for patients. Accordingly, maintaining the ability to distinguish between signals of danger (i.e., CS+) and safety (i.e., CS-) under aversive circumstances is crucial, as it is assumed to be beneficial for healthy functioning (Hölzel et al., 2016) and predicts resilience to life stress (Craske et al., 2012), while reduced discrimination between the CS+ and CS- has been linked to pathological anxiety (Duits et al., 2015; Lissek et al., 2005): Meta-analyses suggest that patients suffering from anxiety- and stress-related disorders show enhanced responding to the safe CS- during fear acquisition (Duits et al., 2015). During extinction, patients exhibit stronger defensive responses to the CS+ and a trend toward increased discrimination between the CS+ and CS- compared to controls, which may indicate delayed and/or reduced extinction (Duits et al., 2015). Furthermore, meta-analytic evidence also suggests stronger generalization to cues similar to the CS+ in patients and more linear generalization gradients (Cooper, van Dis, et al., 2022; Dymond, Dunsmoor, Vervliet, Roche, & Hermans, 2015; Fraunfelter, Gerdes, & Alpers, 2022). Hence, aberrant fear acquisition, extinction, and generalization processes may provide clear and potentially modifiable targets for intervention and prevention programs for stress-related psychopathology (McLaughlin & Sheridan, 2016).”

      Recommendations for the authors:

      Abstract:

      Comment 1:

      (a) It does not succinctly describe the background rationale well (i.e. it tries to say too much). It should be streamlined. There is a lot of 'jargon', which muddies the results, and too many concepts are introduced at each part and assume knowledge from the reader. 

      We thank the reviewer for providing constructive guidance for revisions. We have revised our abstract according to these suggestions.

      (b) Multiple terms for childhood trauma are used: ACEs, early adversity, childhood trauma, and childhood maltreatment. Choose one term and stick to it to enhance clarity. Why not just use childhood adversity, as in the title? Related to this, the use of ACEs sets up an expectation that ACE questionnaire was used, so readers are then surprised to find they used the childhood trauma questionnaire.

      We thank the reviewer for bringing this to our attention. As suggested by the reviewer, we use the term “childhood adversity” in our revised manuscript.

      Introduction:

      Comment 2:

      The phrasing seems to 'exaggerate' the trauma problem and is too broad in the first paragraph - e.g., "two-thirds of people experience one or more traumatic events..." It is important to clarify that not all of these people will go on to develop behavioral, somatic, and psychopathological conditions. Could break this down more into how many people have low, moderate, or severe for clarity, as 1 childhood adversity is different to 5+, and the type.

      We thank the reviewer for bringing this to our attention and have revised the first paragraph accordingly (page 6). Please note, however, that in the literature typically a specific cut-off (e.g. moderate) is used and the number of individuals that would meet different cut-offs (e.g., low and high) are not specifically reported.

      “Exposure to childhood adversity is rather common, with nearly two thirds of individuals experiencing one or more traumatic events prior to their 18th birthday (McLaughlin et al., 2013). While not all trauma-exposed individuals develop psychopathological conditions, there is some evidence of a dose-response relationship (Danese et al., 2009; Smith & Pollak, 2021; Young et al., 2019). As this potential relationship is not yet fully clear, understanding the mechanisms by which childhood adversity becomes biologically embedded and contributes to the pathogenesis of stress-related somatic and mental disorders is central to the development of targeted intervention and prevention programmes.”

      Comment 3:

      The published cut-offs for exposed/unexposed should be indicated here.

      We have included the published cut-offs as suggested (page 10):

      We operationalize childhood adversity exposure through different approaches: Our main analyses employ the approach adopted by most publications in the field (see Ruge et al., 2024 for a review) - dichotomization of the sample into exposed vs. unexposed based on published cut-offs for the Childhood Trauma Questionnaire [CTQ; Bernstein et al. (2003); Wingenfeld et al. (2010)]. Individuals were classified as exposed to childhood adversity if at least one CTQ subscale met the published cut-off (Bernstein & Fink, 1998; Häuser, Schmutzer, & Glaesmer, 2011) for at least moderate exposure (i.e., emotional abuse ≥ 13, physical abuse ≥ 10, sexual abuse ≥ 8, emotional neglect ≥ 15, physical neglect ≥ 10).
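      The dichotomization rule described above can be illustrated with a short sketch. It is not the authors' analysis code: the function and key names are hypothetical, and it assumes that "at least moderate" means a subscale score greater than or equal to the listed value.

```python
# Moderate CTQ cut-offs as quoted in the response (assumed to be inclusive, i.e. >=).
MODERATE_CUTOFFS = {
    "emotional_abuse": 13,
    "physical_abuse": 10,
    "sexual_abuse": 8,
    "emotional_neglect": 15,
    "physical_neglect": 10,
}


def is_exposed(subscale_scores, cutoffs=MODERATE_CUTOFFS):
    """Return True if at least one CTQ subscale score reaches its cut-off.

    `subscale_scores` is a hypothetical dict mapping subscale name to sum score.
    """
    return any(subscale_scores[name] >= cutoff for name, cutoff in cutoffs.items())


# Example: only the emotional neglect score reaches its cut-off, which is
# enough for the participant to count as exposed under this rule.
participant = {
    "emotional_abuse": 7,
    "physical_abuse": 5,
    "sexual_abuse": 5,
    "emotional_neglect": 15,
    "physical_neglect": 6,
}
exposed = is_exposed(participant)
```

      Because a single subscale is sufficient, a participant with one elevated score and a participant with five elevated scores fall into the same group under this operationalization.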

      Comment 4:

      Please check for overly complex sentences, and reduce the complexity. For example: "In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to ACEs into statistical tests while acknowledging that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions."

      We have revised this section and carefully proofread our manuscript by paying attention to this (page 10):

      “In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to childhood adversity into statistical tests. At the same time, we acknowledge that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions”

      Here is another example of reducing the complexity of our sentences (page 6):

      “Learning is a core mechanism through which environmental inputs shape emotional and cognitive processes and ultimately behavior. Thus, learning mechanisms are key candidates potentially underlying the biological embedding of exposure to childhood adversity and their impact on development and risk for psychopathology (McLaughlin & Sheridan, 2016).”

      Methods:

      Comment 5:

      Is this study part of a larger project? These outcomes were probably not the primary outcomes of this multicenter project. The readers need to understand how this (cross-sectional?) analysis was nested in this larger trial.

      We thank the reviewers and editor for bringing to our attention that this was not sufficiently clear. Thus far, we included the information that we used the participants recruited for a large multicentric study in the main manuscript, and we now point to the inclusion of more information in the supplement (page 11):

      “In total, 1678 healthy participants (age_M_ = 25.26 years, age_SD_ = 5.58 years, female = 60.10%, male = 39.30%) were recruited in a multi-centric study at the Universities of Münster, Würzburg, and Hamburg, Germany (SFB TRR58). Data from parts of the Würzburg sample have been reported previously (Herzog et al., 2021; Imholze et al., 2023; Schiele, Reinhard, et al., 2016; Schiele, Ziegler, et al., 2016; Stegmann et al., 2019). These previous reports, also those focusing on experimental fear conditioning (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019), addressed, however, research questions different from the ones investigated here (see also Supplementary Material for details).”

      Moreover, we have included additional information on the larger trial in our revised supplement (page 2):

      “Participants of this study were recruited in a multi-centric collaborative research center “Fear, anxiety, anxiety disorders” joining forces between the Universities of Hamburg, Würzburg, and Münster, Germany (SFB TRR58). During the second funding period (2013–2016), all three sites recruited a large sample (N ~500) in the context of the Z project. All participants underwent the cross-sectional experimental paradigm reported here and were additionally extensively characterized to allow specific subprojects to recruit target subpopulations serving different aims with a focus on molecular genetic, epigenetic, or other research questions (see Herzog et al. (2021); Imholze et al. (2023); Schiele, Reinhard, et al. (2016); Schiele, Ziegler, et al. (2016); Stegmann et al. (2019)). The question on the association of exposure to childhood adversity and recent adversity was part of the primary research question of one subproject led by the senior author of this work (B07, TBL) and was hence a research question of primary interest also for this multicentric project.”

      Comment 6:

      Table 1 does not include percentages (a reader must calculate them: for example, 15% exposed?). These numbers belong in the results (i.e., it is confusing to read about the exposed/non-exposed before we know how it has been calculated).

      We have added the percentages as suggested and have included information on how exposed and unexposed was calculated as a table caption. We have considered moving the table to the results section but find it more suitable here. 

      Comment 7:

      A procedure figure could be useful.

      We thank the reviewer for this advice and have included a procedure figure in the supplementary material.

      Comment 8:

      Physiological data recordings and processing paragraph: The reasoning as to why the authors chose log transformation over square root transformation, or an approach that does not require transformation is not clear.

      We thank the reviewer for notifying us that we did not make this point clear enough. We opted for a log-transformation and range-correction of the SCR data because we use these transformations consistently in our laboratory (e.g., Ehlers et al., 2020; Kuhn et al., 2016; Scharfenort & Lonsdorf, 2016; Sjouwerman et al., 2015; Sjouwerman et al., 2020). In addition, log-transformed and range-corrected data are assumed to be closer to a normal distribution, to have a lower error variance resulting in larger effect sizes (Lykken & Venables, 1971; Lykken, 1972; Sjouwerman et al., 2022), and appear to have - at least descriptively - higher reliability compared to raw data (Klingelhöfer-Jens et al., 2022). We added a sentence on this to the methods section (page 14):

      Note that previous work using this sample (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019) had used square-root transformations but we decided to employ a log-transformation and range-correction (i.e., dividing each SCR by the maximum SCR per participant). We used log-transformation and range-correction for SCR data because these transformations are standard practice in our laboratory and we strive for methodological consistency across different projects (e.g., Ehlers, Nold, Kuhn, Klingelhöfer-Jens, & Lonsdorf, 2020; Kuhn, Mertens, & Lonsdorf, 2016; Scharfenort, Menz, & Lonsdorf, 2016; Sjouwerman & Lonsdorf, 2020; Sjouwerman, Niehaus, & Lonsdorf, 2015). Additionally, log-transformed and range-corrected data are generally assumed to approximate a normal distribution more closely and exhibit lower error variance, which leads to larger effect sizes (Lykken, 1972; Lykken & Venables, 1971; Sjouwerman, Illius, Kuhn, & Lonsdorf, 2022). Moreover, on a descriptive level, this combination of transformations appears to offer greater reliability compared to using raw data alone (Klingelhöfer-Jens, Ehlers, Kuhn, Keyaniyan, & Lonsdorf, 2022).
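      For readers unfamiliar with these transformations, the following is a minimal sketch of a log-transformation followed by range-correction for one participant. Several details are assumptions rather than the authors' documented procedure: the use of the natural logarithm of (1 + SCR) so that zero responses stay defined, the order of the two steps, and the hypothetical function name.

```python
import math


def log_range_correct(scr_amplitudes):
    """Log-transform one participant's SCR amplitudes, then divide by their maximum.

    Assumes non-negative raw amplitudes. log(1 + x) is an assumed variant of the
    log-transformation; the response does not state which variant was used.
    """
    if not scr_amplitudes:
        return []
    logged = [math.log(1.0 + amplitude) for amplitude in scr_amplitudes]
    max_logged = max(logged)
    if max_logged == 0:
        # A participant without any response: return zeros, avoid division by zero.
        return [0.0 for _ in logged]
    return [value / max_logged for value in logged]


# Example with three hypothetical trial amplitudes (in microsiemens).
corrected = log_range_correct([0.0, 0.5, 1.0])
```

      After this step every participant's values lie between 0 and 1, with the largest response mapped to 1, which removes between-participant differences in overall response magnitude.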

      Ehlers, M. R., Nold, J., Kuhn, M., Klingelhöfer-Jens, M., & Lonsdorf, T. B. (2020). Revisiting potential associations between brain morphology, fear acquisition and extinction through new data and a literature review. Scientific Reports, 10(1), 19894. https://doi.org/10.1038/s41598-020-76683-1

      Kuhn, M., Mertens, G., & Lonsdorf, T. B. (2016). State anxiety modulates the return of fear. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology, 110, 194–199. https://doi.org/10.1016/j.ijpsycho.2016.08.001

      Scharfenort, R., & Lonsdorf, T. B. (2016). Neural correlates of and processes underlying generalized and differential return of fear. Social Cognitive and Affective Neuroscience, 11(4), 612–620. https://doi.org/10.1093/scan/nsv142

      Sjouwerman, R., Niehaus, J., & Lonsdorf, T. B. (2015). Contextual Change After Fear Acquisition Affects Conditioned Responding and the Time Course of Extinction Learning—Implications for Renewal Research. Frontiers in Behavioral Neuroscience, 9. https://doi.org/10.3389/fnbeh.2015.00337

      Sjouwerman, R., Scharfenort, R., & Lonsdorf, T. B. (2020). Individual differences in fear acquisition: Multivariate analyses of different emotional negativity scales, physiological responding, subjective measures, and neural activation. Scientific Reports, 10(1), 15283. https://doi.org/10.1038/s41598-020-72007-5

      Comment 9:

      There are 24 lines of text of R packages. I do not think this is necessary for the manuscript document and could be moved to the Supplement.

      We thank the reviewer for this comment and understand that it may take a considerable amount of space to list all the references of the R packages. However, we think it is important to prominently credit the respective authors of the R packages. Yet, if this is an important concern of the reviewer and editor, we will reconsider this point.

      Comment 10:

      It is not clear why the authors chose to analyze summary scores across trials rather than including a time factor for the acquisition phase.

      We would like to thank the reviewer for highlighting that the factor time may be interesting as well. However, we think that in our case the time factor is less interesting, as the acquisition effect itself is rather strong. Nevertheless, we have included a figure in the supplement that shows the time course of the SCR by displaying trial-by-trial data across the acquisition and generalization phase for transparency. This figure (Supplementary figure 4) shows that the trajectories appear to barely differ between individuals who were unexposed vs. exposed to moderate childhood adversity. Hence, we think that the analysis approach we have chosen is unlikely to overshadow central time-dependent effects. However, if the reviewer and editor have strong feelings about this point, we will consider integrating additional analyses including the time factor in the supplement.

      Results:

      Comment 11:

      The caption of Figure 3 does not match the figure. Please check this.

      We thank the reviewers and editor for attentive reading and have revised this part.

      References:

      Comment 12:

      The Ruge et al paper that is cited many times throughout does not have a valid DOI in the References section. Additionally, the author list on the preprint server is substantially different from that listed in the manuscript. Please correct this reference.

      We thank the reviewers and editor for attentive reading and have corrected this reference. The provided doi was functioning at our end and we hope that this now also applies to the reviewers.

    1. Author response:

      Reviewer #1:

      Response to Public Review

      We thank the reviewer for taking the time to carefully read our paper and to provide helpful comments and suggestions, most of which we have incorporated in our revised manuscript. One of this reviewer’s (and reviewer #2’s) main concerns was that the confocal images provided in some cases did not appear to reflect the quantitative data in the bar graphs. These images were provided only for illustrative purposes, to give the reader a sense of what the primary data look like. The reviewer may not have appreciated that the quantitative data reflect counts of RNA smFISH signals (dots) in hundreds of cells collected through z-stacks comprising multiple optical sections in multiple flies for each condition. For example, in the P1a control condition (in Figure 2A), we analyzed 135 neurons from 8 individuals. There, the number of z-planes ranged from 3 to 8 per hemisphere. It is generally not possible to find a single confocal section that encompasses quantitatively the statistics that are presented in the graphs. Presenting the data as an MIP (Maximum Intensity Projection, i.e., collapsed z-stack) in a single panel would generate an image that is too cluttered to see any detail. We have now included, for the reader’s benefit, additional example confocal sections in both a z-stack and from the opposite hemisphere, in Supplemental Figure S4D. We have also inserted clarifying statements in the text on p. 7 (lines 154-156).

      Another suggestion from Reviewer #1 is that "it would be more informative to separate in the quantification between the GAL4-expressing neurons and the non-expressing ones" based on the presented pictures where more non-P1a neurons (that the reviewer speculates may be pC1-type neurons) are activated by a male-male encounter than by a male-female encounter, while the P1a-positive neurons seem to be more responsive during courtship behavior. In this paper, we were not looking at pC1 neurons and did not try to answer which neuronal population(s) outside of the P1a population is/are responsible for aggression and/or courtship. Rather, we focused on P1a neurons and addressed whether P1a neurons that induce both aggression and courtship behavior when they are artificially activated (Hoopfer et al. 2015) are also naturally activated during spontaneous performance of these two social behaviors. That earlier result did not exclude the possibility that P1a neurons were inactive during naturalistic courtship or aggression. Our data in the current manuscript provide further experimental evidence in support of the idea that P1a neurons as a population play a role in both of these behaviors. Moreover, we provided data identifying P1a neurons activated only during aggression or during courtship (or both). However, this does not exclude that pC1 or other neighboring populations are activated during aggression as well (see also the response to 'Recommendations For The Authors' and text lines 151-154).

      In Figure 3, we used opto-HI-FISH to identify candidate downstream targets (direct or indirect) of P1a neurons. We used 50 Hz Chrimson stimulation to activate P1a neurons to induce expression of Hr38 and identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells. In Figure 3 – supplement we performed calcium imaging of KCs and PAM neurons in response to P1a optogenetic stimulation to confirm independently our results from the Hr38 labeling experiments. That control was the purpose of that supplemental experiment.

      Based on those imaging data, the reviewer asked the further question of which [natural] behavioral context induces Hr38 expression in these populations (i.e., mating or aggression). This question is reasonable because our calcium imaging data (Figure 3-supplement) showed that both Kenyon cells and PAM neurons are active only during photo-stimulation of P1a neurons. Our previous behavioral studies (Inagaki et al., 2014; Hoopfer et al., 2015) showed that 50 Hz photo-stimulation of P1a neurons in freely moving flies induced unilateral wing extension during stimulation, while aggression was observed only after the offset of the stimulation (Hoopfer et al., 2015). Based on the comparison of those behavioral data to the imaging results in this paper, the reviewer suggested that Kenyon cells and PAM neurons are activated during courtship rather than during aggression. This is certainly a possible interpretation. However, it is difficult to extrapolate from behavioral experiments in freely moving animals to calcium imaging results in head-fixed flies, particularly with respect to neural dynamics. Furthermore, Hr38 expression, like that of other IEGs (e.g., c-fos), may reflect persistently activated 2nd messenger pathways (e.g., cAMP, IP3) in Kenyon cells and PAM neurons that are not detected by calcium imaging, but that nevertheless play a role in mediating its behavioral effects. We still do not understand the mechanisms of how optogenetic stimulation of P1a neurons in freely behaving flies induces aggression vs. courtship behavior. Although 50 Hz stimulation of P1a neurons does not induce aggressive behavior during photo-stimulation, it is possible that this manipulation activates both aggression and courtship circuits, but that the courtship circuit might inhibit aggressive behavior at a site downstream of the MB (e.g., in the VNC). Once stimulation is terminated and courtship stops, the fly would show aggressive behavior, due to release of that downstream inhibition (see Models in Anderson (2016) Fig 2d, e). In that case, there would be no apparent inconsistency between the imaging data and behavioral data. We agree that the reviewer's question is interesting and important but we feel that answering this question with decisive experiments is beyond the scope of this manuscript.

      Finally, Reviewer #1 suggested a method to evaluate the Hr38 signals in the catFISH experiment of Figure 4. We appreciate their suggestions, but the way that we evaluated the Hr38 signals was basically the same as the way the reviewer suggested. We apologize for the confusion caused by the lack of detailed descriptions in the original manuscript. We have now revised the methods section to explain more clearly how we define the cells as positive based on Hr38EXN and Hr38INT signals.

      Response to Recommendations for the authors:

      “To strengthen the author's argumentation, I would distinguish in their quantification between gal4+ from the other [classes of neighboring neurons] (Fig. 2 and 4).”

      Our focus in this paper was to ask simply whether P1a neurons are active or not active during natural occurrences of the social behaviors they can evoke when artificially activated. We did not claim that they are the only cells in the region that control the behaviors.  It is not possible to compare their activation to that of 'other' cells neighboring P1a neurons without a separate marker to identify those cells driven by a different reporter system (e.g., LexA). This in turn would require repeating all of the experiments in Figs 2 and 4 from scratch with new genotypes permitting dual-labeling of the two populations by different XFPs, and quantifying the data using 4-color labeling. We respectfully submit that such curiosity-driven experiments, while in principle interesting, are beyond the scope of the present manuscript.  However, we have inserted text to acknowledge the possibility that the aggression-activated Hr38 signals in P1a- cells neighboring P1a+ cells may correspond to other classes of P1 neurons (of which there are 70 in total) or to pC1 cells. Changes:  Text lines 151-154.

      “if the magenta dot is outside of the nuclei I would not count this as positive also the size of the dot seems to be a good marker of the reality of the signal). I would measure the intensity of the hr38EXN. A high Hr38EXN level associated with the presence of hr38INT would indicate that the cell has been activated during both encounters, while a lower hr38EXN with no hr38INT would suggest only an activation during the 1st behavioural context. Finally, a lower hr38EXN associated with the presence of hr38INT would suggest the opposite, an activation only during the 2nd behaviour.”

      We agree that there are some tiny dot signals with the hr38 INT probe that are more likely background signals. We only counted the INT probe signals as positive when the cells had a clearly visible dot that also co-localized with the exonic probe's signal, as primary (un-spliced) Hr38 transcripts in the nucleus should be positive for both EXN and INT probes. Regarding the reviewer’s latter comments, we agree with their interpretation of the catFISH results and that is how we interpreted them originally. We measured the intensity of hr38EXN expression and defined hr38EXN-labeled cells as “positive” when the relative intensity was more than 3σ above the average, a stringent criterion. In the revised manuscript, we added more detailed information in the methods section regarding our criteria for defining cell types as positive.
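      A threshold of this kind can be sketched as follows. This is only an illustration under stated assumptions: the response does not say which population of values the average and σ are computed from, so the sketch takes a hypothetical reference set of relative intensities (for example, from control cells), uses the sample standard deviation, and uses invented function names.

```python
import statistics


def positive_threshold(reference_intensities, n_sigma=3.0):
    """Return mean + n_sigma * SD of a reference set of relative intensities.

    Which cells form the reference set is an assumption of this sketch.
    """
    mean = statistics.mean(reference_intensities)
    sd = statistics.stdev(reference_intensities)  # sample SD, needs >= 2 values
    return mean + n_sigma * sd


def call_positive(cell_intensities, reference_intensities, n_sigma=3.0):
    """Flag each cell whose relative intensity exceeds the threshold."""
    threshold = positive_threshold(reference_intensities, n_sigma)
    return [intensity > threshold for intensity in cell_intensities]


# Hypothetical relative intensities (signal relative to background).
reference = [1.0, 1.2, 0.8, 1.1, 0.9]
calls = call_positive([1.3, 2.0], reference)
```

      With a cut-off three standard deviations above the reference mean, a cell that is only slightly brighter than the reference cells is not counted, which is what makes the criterion stringent.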

      “Knowing that the P1a neurons (using the split-gal4) can trigger only wing extension when activated by optogenetic 50Hz, I would test to which behavioral context the MB neurons and the PAM neurons positively respond to.”

      As we answered in 'Response to Public Review,' our opto-HI-FISH experiments identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells, using Hr38 labeling. The purpose of the calcium imaging experiment in Figure 3 – supplement was to confirm the P1a-dependent activation of KCs and PAM neurons using an independent method. In that respect, this control experiment succeeded as a methodological confirmation. The reviewer raised an interesting question about how our calcium imaging experiments relate to our behavioral experiments, in terms of the dynamics of KC and PAM activation. A recent publication (Shen et al., 2023) revealed that courtship behavior has a positive valence and that activation of P1 neurons mimics a courtship-reward state via activation of PAM dopaminergic neurons. Therefore, it is reasonable to think that PAM neurons (and Kenyon cells as downstream of PAM neurons) are activated during female exposure. However, those data do not exclude the possibility that inter-male aggression is also rewarding in Drosophila males, as it has been shown to be in mice. This is an interesting curiosity-driven question that has yet to be resolved. Therefore, as mentioned in the 'Response to Public Review,' we feel that the additional experiment the reviewer suggests is beyond the scope of our manuscript.

      Changes: None.

      Minor comments:

      “Please provide different pictures from main fig2 and sup2 for the three common conditions (control, aggression, and courtship).” 

      The data set for Figure 2 and Figure 2 supplement are from the same experiment. Because of the limited space, we just presented the selected key conditions ('Control', 'Aggression', and 'Courtship') in the main figure and put the complete data set (including these three key conditions) in the supplemental figure.

      Changes: None

      “Please, provide scale bars for the images.”

      Also, Reviewer #2 commented, 'Scale bars are missing on all the images throughout the main and supplementary figures.'

      We have now added scale bars for each figure. 

      “Fig. 1: Is the chrimsonTdtom images from endogenous fluorescence? It is not said in the legend and anti-dsred is not provided in the material and method while anti-GFP is.”

      We are sorry for the confusion and thank the reviewer for raising that question. The signals were native fluorescence, and we have now added that information to the figure legend.

      P7: "As an initial proof-of-concept application of HI-FISH, we asked whether neuronal subsets initially identified in functional screens for aggression-promoting neurons (Asahina et al., 2014; Hoopfer et al., 2015; Watanabe et al., 2017) were actually active during natural aggressive behavior. These included P1a, Tachykinin-FruM+ (TkFruM), and aSP2 neurons". Please put the references to the corresponding group of neurons listed. For example: "These included P1a neurons [Hoopfer et al., 2015]". 

      We have now added these references.

      P9: "Optogenetic and thermogenetic stimulation experiments have shown that that P1a interneurons can promote both male-directed aggression and male- or female-directed courtship" typo

      We appreciate the reviewer for catching this error and have corrected the text.

      P10: "To validate this approach, we first asked whether we could detect Hr38 induction in pCd neurons, which were previously shown by calcium imaging to be (indirect) targets of P1a neurons". Reference [Jung et al., 2020] 

      We have now added this reference.

      Fig. 4A: Put the time scale on the diagram (3h adaptation-20min-30min rest-20min-10min rest-collect) 

      We have now added the time scale in Figure 4A.

      Reviewer #2: 

      Response to Public Review: 

      We thank the reviewer for their helpful comments and suggestions. We have addressed most of them in our revised manuscript. The main concern of Reviewer #2 was the temporal resolution of the HI-catFISH experiment shown in Figure 4 and Figure 4-Supplement. Our original manuscript illustrated temporal patterns of Hr38EXN and Hr38INT signals concomitant with different behavioral paradigms (Figure 4B). The reviewer pointed out that the illustrated experimental design does not reflect the actual data shown in Figure 4-Supplement A-C. We believe this issue was raised because we drew the temporal pattern of Hr38EXN signals in Figure 4B based on the intensity of Hr38EXN signals (Figure 4-Supplement B) rather than based on the percentage of positive cells (Figure 4-Supplement C). We have now revised the schematic time course of Hr38EXN signals in Figure 4B using the percentage of positive cells. We believe this change will help readers to better understand the experimental design, since we used the percentage of positive cells to identify patterns of P1a neuron activation during male-male vs. male-female social interactions in Figure 4D. Another suggestion from Reviewer #2 was to add additional controls, such as the quantification of the intronic and exonic Hr38 probes after either only the first or second social context exposure. In response, we have now added the data from only the first social context (Figure 4C, and 4D, right column). These new data provide evidence that there are essentially no detectable Hr38INT signals 60 minutes later without a second behavioral context, while Hr38EXN signals are still present at the time of the analysis. Unfortunately, we are not able to provide the converse dataset with the second behavioral context only to show that Hr38INT signals are detected. On this point, we call the reviewer’s attention to Figure 4-supplement-S4A-C, which show that the INT probe signals are detectable at 15 and 30 minutes following stimulation, but not at 60 minutes. In the experiment of Fig. 4B, flies are fixed and labeled for Hr38 30 minutes after the beginning of the second behavior, conditions under which we should obtain robust INT signals (as observed). EXN signals are also expected at 30 minutes because the primary (non-spliced) RNA transcript detected by the INT probe also contains exonic sequences.

      Response to Recommendations for the authors:

      Given that the development of in situ HCR for the adult fly brain is so central to the present manuscript, I think that the methods section describing the HCR protocol can be significantly improved. In particular, the authors should fully describe the in situ HCR protocol including the 'minor modifications' they refer to, and define how they calculate the 'relative intensity to the background'.

      We appreciate the reviewer’s suggestion. We have now revised the methods section to describe the procedure in more detail. Also, we will submit a separate document describing the HI-FISH protocol.

      Note: The authors refer to a recently published paper by Takayanagi-Kiya et al (2023) describing activity-based neuronal labeling using a different immediate early gene, stripe/egr-1. The authors state the following: 'That study used a GAL4 driver for the stripe/egr-1 gene to label and functionally manipulate activated neurons. In contrast, our approach is based purely on detecting expression of the IEG mRNA using..'. Takayanagi-Kiya et al. (2023) also use in situ mRNA detection of the IEG stripe/egr-1 and not only a GAL4 driver system. This claim should be modified and the paper should be cited in the introduction of the present paper.

      We have now cited the paper in the Introduction and have modified and moved the description originally in the 'Note' section to the Discussion (text lines: 392-404), as the reviewer requested. We have emphasized the difference between the two approaches for comparing neuronal activities during two different behaviors within the same animal. Takayanagi-Kiya et al. used GAL4/UAS and stripe protein expression with immunohistochemistry to analyze neuronal activities during two different behaviors, while we exclusively analyzed Hr38 mRNA expression for this purpose, using intronic and exonic Hr38 probes. This approach made it possible to perform catFISH with higher temporal resolution and also allows extension of our approach to other IEGs for which antibodies are not available.

      Please specify the nature of the iron filings in the methods section.

      We added a detailed description in the methods section, including the catalog number.

      In Figure 1B, the authors may add a dashed outline to the regions magnified in 1C so that readers can more easily follow the figures. Moreover, it would be informative to see a more detailed quantification of the number of Hr38-positive cells in different brain regions marked by Fru-GAL4.

      We have now added the whole brain images for each condition in Figure 1C and also quantitative data in Figure 1-Supplement C, as the reviewer suggested.

      In the middle right aggression panel of Figure 2A, it looks as if one P1a neuron is not outlined.

      We have carefully examined other z-planes through this region and based on those data have concluded that the signals mentioned by the reviewer are neurites from neurons labeled in other z-planes.

      Changes: None.

      The images in Figure 2A can be again found in Figure Supplement 2A, yet the number of neurons analyzed suggests the quantification was performed from different samples. The images in Figure Supplement 2A should be either changed or it should be explained as to why the images are the same yet the numbers in the legend are different.

      We apologize for the confusion. Figure 2 and Figure 2-Supplement are from the same experiment. To avoid clutter we illustrated three key conditions ('Control,' 'Aggression,' and 'Courtship') in the main figure. The reason why the numbers in the legend are different is that the purpose of presenting Figure 2-Supplement B-D was to determine whether there were differences in the intensity of Hr38 FISH signals in the neurons considered as 'positive' in different conditions. Therefore, the numbers described in Figure 2-Supplement legend are derived only from those neurons that were considered Hr38-positive, while the numbers in Figure 2 include all neurons analyzed. We have now added notes to explain this in the Figure 2 – supplement legend.

      The panels of the quantification of the Hr38 relative intensity in Figure 2B/C/D are very difficult to read, ideally, they should be plotted as in Figure Supplement 2B/C/D.

      The graphs in Figure 2B-D (upper) show data from all GFP-labeled cells scored, including cells defined as 'negative' or 'borderline.' In contrast, the graphs in Figure 2-supplement show the relative Hr38 signal intensity in those GFP neurons defined as positive based on the analysis in Fig. 2B. If we were to plot the data in Fig. 2B (upper) as box plots (like those in Figure 2-supplement), we would see either a skewed distribution (only negative cells) or a bimodal distribution (one mode around the negative population and the other around the positive population); the shapes of these distributions would likely be hidden in the box-and-whisker plot format. Therefore, we prefer to plot all of the data points as we did in the original manuscript. However, we agree that the data points in the original manuscript were hard to read. We therefore changed the format of the data points from blurry dots to open circles with clear solid lines.
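
      The point about box plots hiding a bimodal distribution can be illustrated with a minimal Python sketch. The intensity values below are synthetic and purely illustrative (they are not data from the manuscript), and the helper names `box_summary` and `count_near` are hypothetical. The sketch computes the five numbers a box-and-whisker plot would display and then counts how many cells actually lie near the reported median.

```python
import statistics

# Synthetic, purely illustrative relative intensities (NOT data from the manuscript):
# one cluster of "negative" cells near 1.0 and one cluster of "positive" cells near 3.0.
negative = [0.8, 0.9, 0.9, 1.0, 1.0, 1.0, 1.1, 1.1, 1.2, 1.2]
positive = [2.8, 2.9, 2.9, 3.0, 3.0, 3.0, 3.1, 3.1, 3.2, 3.2]
intensities = negative + positive


def box_summary(values):
    """Return the five numbers a box-and-whisker plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def count_near(values, center, half_width):
    """Count values within +/- half_width of center."""
    return sum(1 for v in values if abs(v - center) <= half_width)


summary = box_summary(intensities)
cells_near_median = count_near(intensities, summary["median"], 0.5)

print(summary)
print("cells within 0.5 of the median:", cells_near_median)
```

      In this toy example the box spans both clusters and the median falls in the empty gap between them, so the summary alone gives no hint that there are two populations. Plotting every data point, as in Figure 2B-D, keeps that structure visible.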

      In Figure 2B/C/D, please specify in the figure legend what 'grouped in categories according to character' means. 

      We used letters to mark statistically significant differences (or lack thereof) between conditions. Bars sharing at least one common letter are not significantly different. If they do not share any letter, they are significantly different. For example, Aggression: bc vs. Dead: bc, means no difference. Aggression: bc vs. No Food: b, or Aggression: bc vs. Courtship: c also means no difference between Aggression and each of the two other conditions. However, 'No Food: b' and 'Courtship: c' have no common letter, meaning they are different. This is a standard method for showing statistical comparisons among multiple bars without lots of asterisks and horizontal bars cluttering the figure, and we have revised the legend to clarify what each letter means. We have also removed the color shading in Figure 2 B-D as it may have been confusing.
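
      The letter rule described above can be written as a short Python sketch. The four letter assignments are the ones quoted in this response; the function name `significantly_different` is hypothetical and introduced only for illustration.

```python
from itertools import combinations


def significantly_different(letters_a: str, letters_b: str) -> bool:
    """Two conditions differ only if their letter labels share no letter."""
    return not (set(letters_a) & set(letters_b))


# Letter labels quoted in the response above.
labels = {"Aggression": "bc", "Dead": "bc", "No Food": "b", "Courtship": "c"}

# Print the verdict for every pair of conditions.
for first, second in combinations(labels, 2):
    verdict = "different" if significantly_different(labels[first], labels[second]) else "not different"
    print(f"{first} ({labels[first]}) vs. {second} ({labels[second]}): {verdict}")
```

      With these labels, the only pair reported as different is 'No Food' vs. 'Courtship', which matches the example given in the text.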

      A quantification of the number of Hr38-positive neurons and Hr38 relative intensity during the entire time course would be informative in Figure 3D. 

      Although the data set for this figure is different from that for Figure 4-Supplement A-C, the main claim is the same. Therefore, Figure 4 - Supplement essentially provides the information that the reviewer suggested. However, we also reanalyzed the data set used for the original Figure 3D and evaluated % positive cells at the 30-minute time point and have now added that number in the figure legend.

      In the legend of Figure 3D, it says '..The expression level reaches its peak at 30-60min', yet I don't see timepoints beyond 60min. Please rephrase or add additional timepoints. 

      We apologize for the error. We have rephrased the text.

      Figure Supplement 3A/D: please add an outline or a schematic figure to better understand where the imaging is performed.

      We added illustrated schemas next to the title of each experiment (P1->PAM neurons (bundle) and P1 -> Kenyon cells (bundle)).

      Figure Supplement 3C/F: please add information about the statistical test to the corresponding figure legend.

      We have added a phrase to describe the test used.

      Figure Supplement 3G/H/I/J: motion artifacts can potentially strongly affect the performed analysis given that cell bodies are very small and highly subjected to motion. Can the authors comment on how they corrected for motion?

      We have now described how we corrected for motion artifacts in the Methods section.

      Figure 4C/D: It seems as if the representative images don't reflect the quantification, e.g., in the male -> female panel, close to 100% of the neurons are positive for the exonic probe as opposed to approx. 40% in the bar graph.

      Please see our response to this issue in the 'Response to Public Review (Reviewer #1)'.

      Additional controls should be included in Figure 4C in order to assess the temporal resolution of HI-CatFISH more in detail (see 'Weaknesses').

      We have also answered this in the 'Response to Public Review'.

      The authors should adjust the scheme in the main Figure 4B to reflect the data presented in Figure S4A and C. For instance, the peak for the intronic version is observed at 15 minutes, while at 30 minutes, both the exonic and intronic signals show an equal level of signal.

      We have addressed this issue in the 'Response to Public Review'.

      We thank the reviewers again for their helpful comments and hope that with these changes, the manuscript will now be acceptable for official publication in eLife.

    1. Author response:

      Reviewer #1 (Public Review): 

      The manuscript entitled "A septo-hypothalamic-medullary circuit directs stress-induced analgesia" by Shah et al., showed that the dLS-to-LHA circuit is sufficient and necessary for stress-induced analgesia (SIA), which is mediated by the rostral ventromedial medulla (RVM) in an opioid-dependent manner. This study is interesting and important and the conclusions are largely supported by the data. I have a few concerns as follows:

      We thank the reviewer for finding our study “interesting” and “important”, and for noting that the conclusions are largely supported by the data.

      (1)  The present data show that activation of dLS neurons produces SIA, however, this manipulation is non-specific. It may be better to see the effect of specific manipulation of stress-activated c-Fos positive neurons in the dLS using a combination of the Tet-Off system and chemogenetic/optogenetic tools. 

      We agree with the reviewer that activating the stress-“trapped” neurons would be a more specific way to induce SIA through septal activation than the strategy we pursued of activating the entire dLS. In all likelihood, we expect to see robust SIA if the stress-responsive dLS neurons are specifically activated. We are in the process of acquiring the genetic tools required for “trapping” stress neurons and expect to be able to perform the experiments suggested by the reviewer in the coming months.

      (2)  Depending on its duration, and intensity, stress can exert potent and bidirectional modulatory effects on pain, either reducing pain (SIA) or exacerbating it (stress-induced hyperalgesia, SIH). Is the circuit in the manuscript involved in SIH?

      As mentioned by the reviewer, it would be reasonable to suspect that the dLS neurons are involved in SIH. However, we believe that the experiments to test this hypothesis are outside the scope of this paper, since here we have focused on the circuit mechanisms for SIA. Nevertheless, in the revised discussion section, we have included the possibility of dLS neurons driving SIH.

      (3)  It is well-accepted that opioid and cannabinoid receptors participate in the SIA, and the evidence is especially strong for the RVM endocannabinoid system. Given this, why did the authors focus their study on the opioid system?

      We agree with the reviewer that dLS-mediated SIA may work through neural circuits centered on RVM neurons expressing receptors for opioids, endocannabinoids, or both. We primarily focused on the opioidergic system in the RVM, as decades of mechanistic work have revealed how the ON, OFF, and neutral neurons modulate pain through the endogenous opioids and even mediate SIA. In the revised discussion, we have included the possibility of involvement of both pain modulatory systems.

      (4)  Does silencing of the dLS neurons affect stress-induced anxiety-like behaviors? Alternatively, what is the relationship between SIA and the level of stress-induced anxiety?

      We did not test whether silencing the dLS would affect stress-induced anxiety, as our focus was on the pain modulatory effects of dLS activation. The relationship between the levels of SIA and stress-induced anxiety will be interesting to explore in the future. We believe we would need better behavioral assays than the existing ones to quantitatively measure the levels of stress-induced anxiety and SIA.

      (5)  Direct electrophysiological evidence should be provided to confirm the efficacy of the MP-CNO.

      We agree with the reviewer that ex-vivo electrophysiology experiments would substantiate the effectiveness of the MP-CNO. However, we do not have the expertise or the instrumentation required to perform these experiments in our laboratory.

      (6)  Is the LHA a specific downstream target for SIA, and is the LHA involved in stress-induced anxiety-like behaviors?

      Several lines of evidence point to the involvement of LHA neurons in stress-induced anxiety. We have also shown by fiber photometry recordings that the dLS downstream neurons in the LHA are activated by acute restraint. Thus, we expect that activation of the LHA neurons will cause stress-induced anxiety. However, we wanted to focus on the pain modulation aspect of the dLS-LHA-RVM circuitry.

      (7)  Do LHA neurons have direct projections to the RVM? If yes, what is its role in the SIA?

      Our anatomical studies using transsynaptic anterograde and retrograde viral strategies in Figure 6 show that the LHA neurons have direct projections to the RVM, and that these neurons are sufficient to drive hyperalgesia, as well as necessary for SIA.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Shah et al. explore the function of an understudied neural circuitry from the dLS -> LHA -> RVM in mediating stress-induced analgesia. They initially establish this neural circuitry through a series of intersectional tracings. Subsequently, they conduct behavioral tests, coupled with optogenetic or chemogenetic manipulations, to confirm the involvement of this pathway in promoting analgesia. Additionally, fiber photometry experiments are employed to investigate the activity of each brain region in response to stress and pain. 

      Strengths: 

      Overall, the study is comprehensive, and the findings are compelling. 

      We thank the reviewer for finding our manuscript “comprehensive” and “compelling”.

      Weaknesses: 

      One noteworthy concern arises regarding the overarching hypothesis that restraint-induced stress promotes analgesia. A more direct interpretation suggests that intense struggling, rather than stress per se, activates the dLS -> LHA -> RVM pathway that may drive analgesic responses.

      We agree with the reviewer that our data can be interpreted to mean that “intense struggling”, rather than “acute stress”, altered the pain thresholds in mice. However, we would like to point out that the restraint-induced stress model that we used has long been regarded as a standard for inducing stress. Moreover, we have demonstrated that dLS activation results in acute stress by measuring blood corticosterone levels, and have shown through light-dark box tests that dLS activation caused stress-induced anxiety.

      Reviewer #2 (Recommendations For The Authors): 

      Please find below my other comments for improvements. 

      Introduction: The authors claimed that "dLS neurons receive nociceptive inputs from the thalamus and somatosensory cortices." However, citations are missing.

      We have added the citations.

      Figure 1 B&C: Although this paper focuses on the dLS, it would be informative to also include vLS c-Fos images (maybe in a supplementary figure), given that these data appear to be already acquired. The inclusion of vLS data will provide critical information regarding potential specificity (or lack of) across LS subregions in stress responses.

      In the revised manuscript we have added the vLS c-Fos images as suggested by the reviewer. 

      Figure 1D: Quantification of Vgat vs. Vglut neurons is missing. It is unclear if the Vgat neurons are restricted to small clusters.

      We did not add the Vglut vs. Vgat quantification, since both our experiments and publicly available data from the Allen Brain Atlas show that almost all of the neurons in the LS are GABAergic. We found Vglut2-expressing neurons only very rarely (0-2 per section) in the LS of the mouse brain.

      Figure 1G: The Y-axis label is missing. 

      We have added the axis label in the revised manuscript.

      Figure 2: The authors claimed that dLS neurons are preferentially tuned to stress caused by physical restraint. However, it appears that these neurons are specifically tuned to intense struggle behavior (transient) rather than stress (prolonged).

      We agree with the reviewer that the SIA observed in mice with dLS activation can be interpreted as the effect of transient struggle behavior rather than of prolonged stress. However, we would like to point out that acute restraint for one hour is known to produce prolonged stress, and this is backed up by increased blood corticosterone levels and stress-induced anxiety (Figure 1-Figure Supplement 1).

      Figure 4: The authors provided compelling evidence that dLS neurons synapse on LHA Vglut2 neurons. However, it is unclear if they exclusively target the Vglut2 neurons or also synapse on LHA Vgat neurons.

      We agree with the reviewer. Even though the majority of the dLS downstream neurons in the LHA are glutamatergic, as now shown in Fig. 4D, a few neurons do not express Vglut and thus must be GABAergic.

      Figure 5D: It is unclear if the trace represents dLS or LHA calcium signal (in the main text, the authors claimed both).

      We have now indicated at the top of Figure 5C and D which LHA neurons we recorded from.

      Figure 6 G&H: Presumably, ΔG-Rabies does not transmit across neurons due to the deletion of the glycoprotein (G) gene. Thus, it is unclear why dLS and LHA neurons express mCherry after injecting rabies into RVM.

      The aim of the rabies experiment was to test whether the cells in the LHA that receive inputs from the dLS are the same ones that send projections downstream to the RVM. To this end, we used a monosynaptic rabies virus that has retrograde properties. Hence, when injected into the RVM, it was taken up by the terminals of the LHA neurons in the RVM and traveled to the cell bodies in the LHA. We injected the AAV1-Transsyn-Cre in the dLS, so only the cells downstream of the dLS in the LHA can express the Cre-dependent glycoprotein (G) gene. Thus, the rabies-mCherry virus specifically infected the LHA neurons downstream of the dLS and jumped a synapse to label the upstream dLS neurons.

      The authors claim that "RVMpost-LHA neurons may modulate nociceptive thresholds through their local synaptic connections within the RVM, recurrent connections with the PAG, or direct interactions with spinal cord neurons." It is unclear what the "local synaptic connections within the RVM" means. It is also unclear whether there is evidence of recurrent connections between the RVM and PAG.

      By local connections we meant intrinsic connections within the RVM: some of the RVMpost-LHA neurons might be interneurons that mediate SIA by modulating the ON or OFF cells. There is some anatomical evidence for ascending inputs from the RVM to the PAG, and we have now included the citation in the relevant section of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies.

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful and the data generally support the conclusions.

      Strengths

      Hi-ExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions.

      Weaknesses

      (1) It is still unclear to me whether or not cells that do not expand remain in the well given the response to point 1. The authors say the cells are digested and washed away but then say that there is a remaining signal from the unexpanded DNA in some cases. I believe this is still a concern that potential users of the protocol should be aware of.

      Although Proteinase K digestion removes most of the unexpanded cells, DNA can sometimes persist. As such, we occasionally observe Hoechst signal underneath cells. The residual DNA is easily differentiated from nuclear Hoechst signal and does not confound interpretation of results. We have added a new supplementary figure that further clarifies this point.

      (2) Regarding the response to point 9, I think this information should be included in the manuscript, possibly in the methods. It is important for others to have a sense of how long imaging may take if they were to adopt this method.

      We have added detailed information to the methods section to address this point as shown below.  In general, we image HiExM samples on the Opera Phenix at 63x with the following parameters: 100% laser power for all channels; 200 ms exposure for Hoechst, 500-1000+ ms exposure for immunostained channels depending on the strength of the stain and the laser; 60 optical sections with 1 micron spacing; and 4-20 fields of view per well depending on the cell density and sample size requirements. Therefore, imaging one full 96-well plate (60 wells total as we avoid the outer wells) takes anywhere from 3 hr to 64 hr depending on the combination of parameters used.
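
      The quoted range of plate imaging times can be approximated with a back-of-the-envelope Python sketch. The well count, number of optical sections, fields of view, and exposure times are taken from the parameters above. The number of immunostained channels in each scenario is our assumption for illustration only, the function name `plate_exposure_hours` is hypothetical, and the estimate counts camera exposure only, ignoring stage movement, autofocus, and data transfer.

```python
def plate_exposure_hours(wells, fields_per_well, z_sections, exposures_ms):
    """Total camera exposure time for one plate, in hours.

    exposures_ms holds one exposure time (in ms) per imaged channel.
    Overheads such as stage movement and autofocus are ignored.
    """
    total_ms = wells * fields_per_well * z_sections * sum(exposures_ms)
    return total_ms / 1000 / 3600


# Lower bound: 4 fields per well, Hoechst (200 ms) plus one stained channel
# at 500 ms (the single stained channel is an assumption).
fast = plate_exposure_hours(wells=60, fields_per_well=4, z_sections=60,
                            exposures_ms=[200, 500])

# Upper bound: 20 fields per well, Hoechst (200 ms) plus three stained channels
# at 1000 ms each (the three stained channels are an assumption).
slow = plate_exposure_hours(wells=60, fields_per_well=20, z_sections=60,
                            exposures_ms=[200, 1000, 1000, 1000])

print(f"fast scenario: {fast:.1f} h")
print(f"slow scenario: {slow:.1f} h")
```

      Under these assumptions the two scenarios come to roughly 3 hr and 64 hr of exposure time, in line with the range stated above; actual acquisition times will also depend on instrument overheads.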

      Reviewer #2 (Public review):

      Summary:

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit expansion of the gel. They thus engineered a device that can spot a small droplet of hydrogel solution and keep it in place as it polymerises. Because the gel occupies only a small portion of the space at the center of each well, it can expand in all directions, and imaging and staining can proceed by liquid handling robots and an automated microscope.

      Strengths:

      In contrast to Reference 8, the authors' system is compatible with standard 96-well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal.

      Addition upon revision:

      The authors addressed this reviewer's suggestions.

      Reviewer #3 (Public review):

      Summary:

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include: 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand toroidal gel within each well.

      Addition upon revision:

      Overall, the authors have adequately addressed most of the concerns raised. There are a few minor issues that require attention.

      Minor comments:

      Figure S10: There appears to be a discrepancy in the panel labeling. The current labels are E-H, but it is unclear whether panels A-D exist. Also, this reviewer thought that panels G and H would benefit from statistical testing to strengthen the conclusions. As a general rule for scientific graph presentation, the y-axis of all graphs should start at zero unless there is a compelling reason not to do so.

      We have revised Figure S10 to address your comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

      Strengths:

      (1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

      (2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

      (3) The paper is clearly written.

      We are grateful for the kind comments of the reviewer on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, analysis of "ancient versus recent folds" was not really reported in our results. Our analysis concerned "coenzyme age" rather than the "protein folds age" and was focused mainly on interaction with early vs. late amino acids in protein sequence. While structural propensities of the coenzyme binding sites were also analyzed, no distinction on the level of ancient vs. recent folds was assumed and this was only commented on in the discussion, based on previous work of others. 

      Weaknesses:

      (1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

      We would like to point out that there was no distinction between proteins that evolved early or late in our dataset of coenzyme-binding proteins. The aim of our analysis was purely to observe trends in the age of amino acids vs. age of coenzymes. While no direct inference can be made from this about early life as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids in binding to the different coenzyme entities. Indeed, very early interactions would be smeared by the eons of evolutionary history (perhaps also towards more favourable binding free energy, as pointed out also by the reviewer). Nevertheless, significant trends have been recorded across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences copied from the abstract and the discussion of our manuscript fully comply with our data:

      “These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

      “While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

      “This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      We would also like to add that proteins that evolved later might not always have a more favorable free energy of binding. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed in their study, using the example of the haloalkane dehalogenase DhaA, that ancestral sequence reconstruction is a powerful tool for designing more stable, but also more active proteins. Ancestral sequence reconstruction relies on finding ancient states of protein families to suggest mutations that will lead to proteins more stable than currently existing ones. Their study did not explore the ligand-protein interactions specifically but showed that ancient states often show more favorable properties than modern proteins.

      (2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

      We appreciate the reviewer's comment regarding other small molecules, which we assume points mainly towards metal ions (i.e. inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. They were intentionally not part of our study, as they have already been studied previously by others (e.g. Bromberg et al., 2022; and reviewed in Frenkel-Pinter et al., 2020) and by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g. of Mg2+) exhibit enrichment in early amino acids such as Asp and Glu, while more recent metal (e.g. Cu and Zn) sites are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid - coenzyme trends were not available.

      Nevertheless, involvement of metal ions in the coenzyme binding sites was also studied here and pointed to their bigger involvement with the Ancient coenzymes. In the revised version of the manuscript, we will be happy to enlarge the discussion of the studies concerning inorganic cofactors.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022).  For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

      We partly agree with the reviewer on this point, but we do not see why it is listed as a weakness of our study, nor do we agree with the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides preceding ribosomal synthesis (and the full alphabet evolution), along with prebiotically plausible coenzymes, addresses exactly this gap, which is currently not understood.

      Reviewer #2 (Public Review):

      I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

      As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

      Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine.

      There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

      Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

      We truly appreciate the effort the reviewer made in re-examining the data, and the thoughtful, deeper analysis. We agree that this deserves further discussion of our data.

      As invited by the reviewer, we indeed repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to the Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we failed to confirm that the trend disappears if those two amino acids are removed from the analysis (additional FDcofactors of 3.2 and -3.2 are observed for the early and late amino acids, resp.), as seen in Table I below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) from the early amino acids and Arg (FD of -2.6) and Cys (FD of -1.7) of the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute to this effect the most, we disagree that the trend reduces to these two amino acids.  
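
      The fractional-difference calculation discussed above can be sketched as follows. This is a minimal illustration only: the percentages are invented placeholders (not values from the supplementary tables), the early and late groups are restricted to a few amino acids named in this exchange, and the function names are hypothetical.

```python
# Illustrative only: all numbers below are invented placeholders,
# NOT values taken from the supplementary tables.
EARLY = {"Gly", "Asp", "Ser", "Glu"}  # subset of early amino acids named in the text
LATE = {"Trp", "Arg", "Cys", "His"}   # subset of late amino acids named in the text

# Hypothetical binding-site composition (percent) per coenzyme class.
composition = {
    "Ancient": {"Gly": 20.0, "Asp": 16.0, "Ser": 15.0, "Glu": 12.0,
                "Trp": 4.0, "Arg": 13.0, "Cys": 8.0, "His": 12.0},
    "LUCA": {"Gly": 16.0, "Asp": 14.0, "Ser": 13.0, "Glu": 12.0,
             "Trp": 8.0, "Arg": 15.0, "Cys": 10.0, "His": 12.0},
}


def fd_per_amino_acid(comp, class_a, class_b):
    """Per-amino-acid fractional difference: F(class_a) - F(class_b)."""
    return {aa: comp[class_a][aa] - comp[class_b][aa] for aa in comp[class_a]}


def summed_fd(fd, group, exclude=()):
    """Sum the per-amino-acid FDs over one group, optionally leaving residues out."""
    return sum(v for aa, v in fd.items() if aa in group and aa not in exclude)


fd = fd_per_amino_acid(composition, "Ancient", "LUCA")

# Trend over all early / late amino acids in the toy table.
early_all = summed_fd(fd, EARLY)
late_all = summed_fd(fd, LATE)

# Same trend after removing the two strongest contributors (Gly and Trp).
early_no_gly = summed_fd(fd, EARLY, exclude={"Gly"})
late_no_trp = summed_fd(fd, LATE, exclude={"Trp"})

print(early_all, late_all, early_no_gly, late_no_trp)
```

      In this toy table the signal is halved but not abolished when Gly and Trp are excluded; whether the same holds for the real data depends entirely on the values in the supplementary tables.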

      In addition, the most recent coenzyme temporality (the Post-LUCA) was neglected in the reviewer’s analysis. The difference between F(old) and F(new) is even more pronounced in Post-LUCA than in LUCA, vs. Ancient (Supplementary Table 5A), and depends much less on Trp. Meanwhile, Asp, Ser, Leu, Phe, and Arg dominate the observed phenomenon (Supplementary Table 5B). This further supports our lack of agreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion and we will happily include this additional analysis in the Supplementary Material of our revised manuscript.

      The following text (and the additional data) was included in the revised manuscript version:

      “To explore the contribution of individual amino acids to this effect, fractional difference (FD) for early vs. late amino acids among the Ancient, LUCA, and Post-LUCA coenzyme binding was calculated (Supplementary Table 5). The mean FD revealed a similar trend to the amino acid composition analysis (Fig. 3). The amino acids most enriched in LUCA vs. Post-LUCA are Gly, Ser, and Leu (FD of 4.4, 4.3, and 4.1 respectively), while the most depleted include Phe, Arg, and His (FD of -11, -4.2, and -3.2) (Supplementary Table 5B).”

      Point 2 - The correlation is dominated by phosphate.

      In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating alphaleft-alpharight conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

      Likewise, Trp is an important functional residue for binding quinones and tuning its redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

      Once again, we are thankful to the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the role of Trp in quinone binding, are important points that we would be happy to highlight more in the discussion of the revised manuscript.

      Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and, importantly, that “the trend in Figure 3 vanishes” upon their removal. Supplementary Tables 6A and 6B show the data for coenzymes excluding those with a phosphate moiety, and the trend in Fig. 3 remains, albeit less pronounced.

      The following text was included in the revised manuscript version:

      “Moreover, we investigated whether the observed trend in amino acid occurrence at the binding sites was dominated by the presence of phosphate groups, which are common in many ancient cofactors except for SAM, Tetrahydrofolic acid, Biopterin, and Heme. An additional analysis therefore excluded all phosphate-containing coenzymes indicating that while the trend is less pronounced, it remains even in the absence of phosphate groups (Supplementary Table 6).”

      In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

      I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something and I encourage the authors to check my work. If it holds up it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

      We are grateful to the reviewer for encouraging a further look at our data. While we hope that the analysis on the whole dataset (listed in Tables I - IV) will change the reviewer’s standpoint on our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary works by Tawfik/Longo and Milner-White/Russell (which were cited in our manuscript multiple times) presented one of the motivations for this study. We take the opportunity to copy the part of our discussion that specifically highlights the relevance of their studies and points out the contribution of our work with respect to theirs.

      “While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russel 2011). Longo et al., recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionary younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone.

      Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      Unlike any other previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as the age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as the role of backbone interactions in early peptide-coenzyme evolution) and observations (such as the occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial novel knowledge. For example, (i) enrichment of early amino acids in the binding of ancient coenzymes, vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzymes of different temporalities, (iii) increased involvement of metal ions in the ancient coenzyme binding events, and (iv) the capacity of only early amino acids to bind ancient coenzymes. In our humble opinion, all of these points bring important contributions to the peptide-coenzyme knowledge gap, which has been discussed in a number of previous studies.

      Recommendations for the authors:

      (1) By only focusing on coenzymes, the authors may have overestimated their importance. What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than some possible role in very ancient proteins. Or it might diminish the conjectured importance of coenzymes.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022).  For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (2) The authors should analyze whether the interactions are with similar types of amino acids in ancient versus early proteins.

      While we appreciate the interesting suggestion, we would like to clarify that we did not aim to elucidate the differences between early and late protein folds - we agree that this might add an interesting perspective to our work, but we feel that it is well beyond the scope of our current study.

      (3) The authors might also wish to do sequence alignments to the structures in early versus late evolving proteins to see how general this pattern of residue usage is beyond the limited set of proteins found in the PDB.

      This is an interesting suggestion but similar to the previous recommendation, it is not within the scope of this study where no distinction between early and late evolving proteins has been made.  

      There have been a number of attempts to classify folds as shared among Bacteria, Archaea and Eukaryota or specific to one or two of these groups of organisms (https://link.springer.com/article/10.1007/s00239-023-10136-x; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9541633/). However, this does not compare easily with our time scales, where ancient ligands occur well before the last common ancestor.

      We also agree that the set of sequences present in the PDB is biased, but perhaps it is less biased than we had thought. In their recent fantastic work, Nicola Bordin and colleagues from the Orengo group attempted to classify over 200 million structures in the AlphaFold database into the so-called Encyclopedia of Domains, and found that nearly 80% of the detected domains can be assigned to already known superfamilies in CATH (https://www.biorxiv.org/content/10.1101/2024.03.18.585509v2).

      (4) The authors might wish to consider the results in Skolnick, H. Zhou, and M. Gao. On the possible origin of protein homochirality, structure, and biochemical function. PNAS 2019: 116(52): 26571-26579.

      Based on the editorial recommendation, the following sentence was added in the discussion:

      “It has been implied by computer simulations that coenzymes could bind to proteins with similar propensity even before the onset of protein homochirality, despite lower structural stability and secondary structure content in heterochiral polypeptides (Skolnick et al., 2019).”

    1. Author Response:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      We agree that we focus on light-sheet microscopy data; to narrow the scope, we have therefore changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to. The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      You have selectively dropped the last part of that sentence, which is key: “... 3D volumes, often in cleared neural tissue”, which is what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we literally make it clear that our claims are on mesoSPIM and cleared data.

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool, and glad it works for you. The deep learning methods we use cannot “solve” this dataset, and we also have an F1-Score (Dice) of ~0.8 with our self-supervised method. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:

      * comparison to thresholding (with the same post-processing as the proposed method)
      * comparison to a normalized cut segmentation (with the same post-processing as the proposed method)
      * comparison to references 8 and 9.
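
      The first suggested baseline (Otsu thresholding followed by connected-component labelling) can be sketched as below. This is a minimal pure-Python illustration on an invented toy volume with integer intensities; it is not the pipeline used in the manuscript or the plugin, and the function names are hypothetical. In practice one would use an image-analysis library rather than nested lists.

```python
from collections import deque

# Offsets of the 6 face-connected neighbours in 3D.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def otsu_threshold(values, levels=256):
    """Otsu's method for integer intensities in [0, levels).

    Returns t such that voxels with intensity > t count as foreground.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_var, best_t = -1.0, 0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def label_components(mask):
    """Label 6-connected foreground components of a 3D mask (nested lists)."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or labels[z][y][x]:
                    continue
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:  # breadth-first flood fill of one object
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in NEIGHBORS:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and mask[pz][py][px] and not labels[pz][py][px]):
                            labels[pz][py][px] = count
                            queue.append((pz, py, px))
    return labels, count


# Invented toy volume (z, y, x): dim background with two bright, separated blobs.
volume = [
    [[10, 10, 10, 10, 10, 10, 10],
     [10, 200, 200, 10, 10, 220, 10],
     [10, 200, 20, 10, 10, 220, 10]],
    [[10, 10, 10, 10, 10, 10, 10],
     [10, 200, 10, 10, 20, 220, 10],
     [10, 10, 10, 10, 10, 10, 10]],
]

flat = [v for plane in volume for row in plane for v in row]
threshold = otsu_threshold(flat)
mask = [[[v > threshold for v in row] for row in plane] for plane in volume]
labels, n_objects = label_components(mask)
print(threshold, n_objects)
```

      Such a baseline separates objects only when they do not touch after thresholding, which is the same well-separated regime the reviewer describes for the mesoSPIM data.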

      Refs. 8 and 9 don’t have readily usable (https://github.com/LiangHann/USAR) or even shared code (https://github.com/Kaiseem/AD-GAN), so re-implementing this work is well beyond the bounds of this paper. We benchmarked Cellpose, StarDist, SegResNets, and a transformer (SwinUNetR). Moreover, models in the MONAI package can be used. Note, to our knowledge the transformer results also are a new contribution that the Reviewer does not acknowledge.

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      In the already submitted version, we have a 5-page DataSet card that fully answers your questions. They are ALL labeled by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above; above you said it was working well, such that you doubt the GT is real and the data is too easy as it was perfectly easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment and presentation of the results. Further, it is unclear if results of similar quality as reported can be achieved within the GUI by non-expert users.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small and well separated nuclei. It is unclear if the good performance of the novel self-supervised learning method compared to CellPose and StarDist would hold for datasets with other characteristics, such as larger nuclei with a more complex morphology or crowded nuclei.

      StarDist is not pretrained, we trained it from scratch as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them, it is that we can match them with a self-supervised method.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I am uncertain the claims hold for larger and/or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regards to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, to check our work; he is acknowledged in the paper) and it does very well. We also retrained and optimized Cellpose, and we show it works as well as leading transformer and CNN-based approaches. Again, we only claimed we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be stronger if a comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a): this is not a valid experimental setup and amounts to training on your test set. If b): this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3. Note that the paper provides notebooks to reproduce the experimental results. This is very laudable, but I believe that a more extended description of the experiments in the text would still be very helpful to understand the set-up for the reader. Further, from inspection of these notebooks it becomes clear that hyper-parameters were indeed found on the test set (a), so the results are not valid in the current form.

      We apologize for this confusion; we have now expanded the methods to clarify that the setup is now (b). You can also see exactly what we did in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions. For clarity, we now additionally link each individual notebook in the Methods.
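
      The difference between setups (a) and (b) can be illustrated with a small sketch in which a post-processing threshold is selected on a validation split only and the held-out test split is scored once. The probabilities, labels, candidate thresholds, and function names below are all invented for illustration and do not come from the notebooks.

```python
def dice(pred, truth):
    """Dice coefficient between two binary sequences (1 = foreground)."""
    inter = sum(p and t for p, t in zip(pred, truth))
    denom = sum(pred) + sum(truth)
    return 2 * inter / denom if denom else 1.0


def binarize(probs, threshold):
    """Turn per-voxel foreground scores into a binary mask."""
    return [1 if p >= threshold else 0 for p in probs]


def tune_threshold(val_probs, val_truth, candidates):
    """Setup (b): pick the threshold with the best Dice on the validation split only."""
    return max(candidates, key=lambda t: dice(binarize(val_probs, t), val_truth))


# Invented toy data: flattened network outputs and ground-truth masks.
val_probs = [0.40, 0.42, 0.65, 0.70, 0.45, 0.62]
val_truth = [0, 0, 1, 1, 0, 1]
test_probs = [0.41, 0.68, 0.52, 0.48, 0.66, 0.44]
test_truth = [0, 1, 0, 1, 1, 0]

best_threshold = tune_threshold(val_probs, val_truth, candidates=[0.3, 0.5, 0.7])

# The test split is touched exactly once, after the threshold is frozen.
test_dice = dice(binarize(test_probs, best_threshold), test_truth)
print(best_threshold, round(test_dice, 3))
```

      Tuning on the test split itself, as in setup (a), would amount to passing `test_probs` and `test_truth` to `tune_threshold`, which makes the reported score optimistic.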

      (3) I cannot obtain similar results to the ones reported in the manuscript using the plugin. I tried to obtain some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite narrow (Average BG intensity ~0.4, FG intensity 0.6-0.7). I try to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency that is required to reproduce the images (it sounds like you had this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now explicitly added the fix to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to obtain the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      - We are surprised to hear this; did you follow the notebook below, which directly reproduces the steps to create this figure? (This was linked in the preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      -  We have made a video demo for you such that any step that might be unclear is also more clear to a user: (https://youtu.be/U2a9IbiO7nE).

      -  We also expanded the methods to include the exact values from the notebook into the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly, that is why we retrained Cellpose and found good performance results (also reported in Figure 1). Resizing GB to TB volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics. We added a link to the paper you mention as well.
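
      To make the distinction between the two metrics concrete, the sketch below contrasts a voxel-wise Dice score with an instance-level F1 score at an IoU threshold. It is a simplified illustration with invented instances (sets of voxel indices) and hypothetical function names, not the StarDist implementation referred to above. The simple first-match loop relies on instances not overlapping within a segmentation and on a threshold of at least 0.5, so that each ground-truth object can match at most one prediction.

```python
def iou(a, b):
    """Intersection over union of two instances given as sets of voxel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(pred_instances, true_instances, threshold=0.5):
    """Instance-level F1: a prediction counts as a true positive when its IoU
    with a not-yet-matched ground-truth instance exceeds the threshold."""
    matched = set()
    tp = 0
    for pred in pred_instances:
        for i, truth in enumerate(true_instances):
            if i not in matched and iou(pred, truth) > threshold:
                matched.add(i)
                tp += 1
                break
    fp = len(pred_instances) - tp
    fn = len(true_instances) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def voxel_dice(pred_instances, true_instances):
    """Semantic (foreground vs. background) Dice, ignoring instance identity."""
    pred_fg = set().union(*pred_instances)
    true_fg = set().union(*true_instances)
    denom = len(pred_fg) + len(true_fg)
    return 2 * len(pred_fg & true_fg) / denom if denom else 1.0


# Invented example: one good match, one over-segmented blob, one spurious object.
true_instances = [{0, 1, 2, 3}, {10, 11, 12}, {20, 21}]
pred_instances = [{0, 1, 2}, {10, 11, 12, 13, 14, 15, 16}, {30, 31}]

f1 = f1_at_iou(pred_instances, true_instances, threshold=0.5)
semantic_dice = voxel_dice(pred_instances, true_instances)
print(round(f1, 3), round(semantic_dice, 3))
```

      In this toy case the semantic Dice is noticeably higher than the instance-level F1, because overlapping foreground voxels are rewarded even when the object boundaries, and hence the instances, are wrong.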

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth.

      Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we added a new section specifically on limitations. Thanks for raising this good point. While self-supervision saves hundreds of hours of manual labor, it comes at the cost of a more limited range of regimes in which it can work. This is why we do not claim that it should replace excellent methods like Cellpose or StarDist, but rather that it complements them and can be used on mesoSPIM samples, as we show here.

    1. Author Response:

      We thank the reviewers for their thoughtful comments on our manuscript. In this provisional response, we aim to address the major concerns raised and outline a plan for a revised version of the manuscript. A more detailed point-by-point response will follow with the revision.

      The reviewers appreciated our efforts to combine computational modelling with experimental work. However, they also expressed the need for more clarity in explaining how the model was set up, what was simulated, and what the insights and limitations are. In the revision, we plan to improve the discussion section to clarify all of these points. 

      The reviewers also highlighted the need for more transparency regarding the code and the mathematical formulas used in this study. We agree that this is an important issue. While we have already made the software and code for our computational model, along with instructions on how to run it, available in Zenodo (see Ref. 1), and have extensively described the original computational model and formulas in a 13-page supplementary file in our previous study (see Ref. 2), we recognize from the reviewers’ comments that additional transparency is needed. To address this, we will provide an appendix in the revision that includes a full model description, covering the incorporation of cell differentiation and death, a list of parameters, and details on how parameter values were chosen.

      Additionally, in the revised manuscript, we will add a paragraph to more thoroughly discuss the limitations of our approach, as well as avenues for future studies. We hope this will clarify both capabilities and limitations of our model in a way that is more accessible to readers of eLife.

      References:

      1. Virtual Thymus Model (version 2.0). Published: Jun 14, 2024.  doi:10.5281/zenodo.11656320

      2. Aghaallaei, Narges, et al. "αβ/γδ T cell lineage outcome is regulated by intrathymic cell localization and environmental signals." Science Advances 7.29 (2021): eabg3613.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      Given the importance of the Loss of Function (LOF) experiments, we will provide additional evidence for the validity of the dominant-negative strategy and constructs used.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      To clarify redundancies in Hox activity, we will test whether simultaneous expression of dominant-negative forms of more than one Hox gene induces a stronger effect than expression of a single dominant-negative Hox gene.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We agree that this is an excellent additional experiment to corroborate our conclusion and will perform this experiment in our revision.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      To date, Tbx5 is the best marker for the forelimb. While it is true that the Tbx5 expression is broader than the limb field, this occurs only at early stages before forelimb bud formation. We will work towards a further definition of this extra bulge.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We have analysed the cartilage structure of operated embryos with GOF experiments and found no skeletal elements within the ectopic wing bud in the neck. Additionally, in our revision, we can further analyse the wing skeleton of operated embryos with LOF experiments, which would provide more detailed assessments of the impact of dominant-negative Hox genes on wing bud formation.

      Reviewer #2:

      Weaknesses

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We will revise our manuscript to clarify the specificity of the dominant-negative strategy used.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here).

      This is an excellent idea and we will implement the experiment in our revision.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We will incorporate this suggestion and include additional data from our RNA-seq analysis.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      In our revision, we will appropriately expand the discussion on the discrepancies observed between knockout mouse models and our chick embryo experiments.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fernandez et al. investigate the influence of maternal behavior on bat pup vocal development in Saccopteryx bilineata, a species known to exhibit vocal production learning. The authors performed detailed longitudinal observations of wild mother-pup interactions to ask whether non-vocal maternal displays during juvenile vocal practice or 'babbling', affect vocal production. Specifically, the study examines the durations of pup babbling events and the developmental babbling phase, in relation to the amount of female display behavior, as well as pup age and the number of nearby singing adult males. Furthermore, the authors examine pup vocal repertoire size and maturation in relation to the number of maternal displays encountered during babbling. Statistical models identify female display behavior as a predictor of i) babbling bout duration, ii) the length of the babbling phase, iii) song composition, and iv) syllable maturation. Notably, these outcomes were not influenced by the number of nearby adult males (the pups' source of song models) and were largely independent of general maturation (pup age). These findings highlight the impact of non-vocal aspects of social interactions in guiding mammalian vocal development.

      We thank Reviewer 1 for the time and effort dedicated to the revision of our study. The suggestions for the revision of our manuscript were very helpful and will improve our manuscript. 

      Strengths:

      Historically, work on developmental vocal learning has focused on how juvenile vocalizations are influenced by the sounds produced by nearby adults (often males). In contrast, this study takes the novel approach of examining juvenile vocal ontogeny in relation to non-vocal maternal behavior, in one of the few mammals known to exhibit vocal production learning. The authors collected an impressive dataset from multiple wild bat colonies in two Central American countries. This includes longitudinal acoustic recordings and behavioral monitoring of individual mother-pup pairs, across development.

      The identified relationships between maternal behavior and bat pup vocalizations have intriguing implications for understanding the mechanisms that enable vocal production learning in mammals, including human speech acquisition. As such, these findings are likely to be relevant to a broad audience interested in the evolution and development of social behavior as well as sensory-motor learning.

      We thank reviewer 1 for this assessment. 

      Weaknesses:

      The authors qualitatively describe specific patterns of female displays during pup babbling, however, subsequent quantitative analyses are based on two aggregate measures of female behavior that pool across display types. Consequently, it remains unclear how certain maternal behaviors might differentially influence pup vocalizations (e.g. through specific feedback contingencies or more general modulation of pup behavioral states).

      In analyzing the effects of maternal behavior on song maturation, the authors focus on the most common syllable type produced across pups. This approach is justified based on the syllable variability within and across individuals, however, additional quantification and visual presentation of categorized syllable data would improve clarity and potentially strengthen resulting claims.

      We agree that our analysis of maternal behaviour does not investigate potential contingencies between particular maternal behavioural displays and pup vocalizations (e.g. particular syllable types). The data on maternal behaviour collected for this study include direct observations, field notes and/or video recordings. In the future, it will be necessary to work with high-speed cameras, which allow the fine-grained analysis needed to detect potential contingencies between particular maternal behavioural displays and specific pup vocalizations. We have planned future studies investigating whether pup vocalizations elicit contingent maternal responses or vice versa. In the revision of our manuscript, we will include a comment pointing out that this behaviour will be investigated in greater detail in the future.

      As suggested by reviewer 1, in our revised manuscript we will include more information on methods to improve understandability. In particular, we will:

      - present more information on different steps of our acoustic analyses

      - provide additional and clearer spectrogram figures representing the different syllable types and categorizations 

      - change the figures accompanying our GLMM analyses following the suggestion of Reviewer 1

      Reviewer #2 (Public review):

      Summary:

      This study explores how maternal behaviors influence vocal learning in the greater sac-winged bat (Saccopteryx bilineata). Over two field seasons, researchers tracked 19 bat pups from six wild colonies, examining vocal development aspects such as vocal practice duration, syllable repertoire size, and song syllable acquisition. The findings show that maternal behaviors significantly impact the length of daily babbling sessions and the overall babbling phase, while the presence of adult male tutors does not.

      The researchers conducted detailed acoustic analyses, categorizing syllables and evaluating the variety and presence of learned song syllables. They discovered that maternal interactions enhance both the number and diversity of learned syllables and the production of mature syllables in the pups' vocalizations. A notable correlation was found between the extent of acoustic changes in the most common learned syllable type and maternal activity, highlighting the key role of maternal feedback in shaping pups' vocal development.

      In summary, this study emphasizes the crucial role of maternal social feedback in the vocal development of S. bilineata. Maternal behaviors not only increase vocal practice but also aid in acquiring and refining a complex vocal repertoire. These insights enhance our understanding of social interactions in mammalian vocal learning and draw interesting parallels between bat and human vocal development.

      We thank reviewer 2 for his/her time and effort dedicated to the revision of our study. The suggestions were very helpful in improving our manuscript. 

      Strengths:

      This paper makes significant contributions to the field of vocal learning by looking at the role of maternal behaviors in shaping the vocal learning phenotype of Saccopteryx bilineata. The paper uses a longitudinal approach, tracking the vocal ontogeny of bat pups from birth to weaning across six colonies and two field seasons, allowing the authors to assess how maternal interactions influence various aspects of vocal practice and learning, providing strong empirical evidence for the critical role of social feedback in non-human mammalian vocal learners. This kind of evidence highlights the complexity of the vocal learning phenotype and shows that it goes beyond the right auditory experience and having the right circuitry.

      The paper offers a nuanced understanding of how specific maternal behaviors impact the acquisition and refinement of the vocal repertoire, while showing the number of male tutors - the source of adult song - did not have much of an effect. The correlation between maternal activity and acoustic changes in learned syllable types is a novel finding that underscores the importance of non-vocal social interactions in vocal learning. In vocal learning research, with some notable exceptions, experience is often understood as auditory experience. This paper highlights how, even though that is one important piece of the puzzle, other kinds of experience directly affect the development of vocal behavior. This is of particular importance in the case of a mammalian species such as Saccopteryx bilineata, as this kind of result is perhaps more often associated with avian species.

      Moreover, the study's findings have broader implications for our understanding of vocal learning across species. By drawing parallels between bat and human vocal development (and in some ways to bird vocal development), the paper highlights common mechanisms that may underlie vocal practice and learning in both humans and other mammals. This interdisciplinary perspective enriches the field and encourages further comparative studies, ultimately advancing our knowledge of the evolutionary and developmental processes that shape vocal productive learning in all its dimensions.

      Weaknesses:

      Some weaknesses can be pointed out, but in fairness, the authors acknowledge them in one way or another. As such, these are not flaws per se, but gaps that can be filled with further research.

      Experimental manipulations, such as controlled playback experiments or controlled environments, could strengthen the causal claims by directly testing the effects of specific maternal behaviors on vocal development. Certainly, the strengths of the paper will be consolidated after such work is performed.

      The study relies on the number of singing males as a proxy for social acoustic input. This measure does not account for the variability in the quality, frequency, or duration of the male songs to which the pups are exposed. A more detailed analysis of the acoustic environment, including direct measurements of song exposure and its impact on vocal learning, would provide a clearer understanding of the role of male tutors.

      Finally, and although it would be unlikely that these results are unique to Saccopteryx bilineata, the study's focus on a single species limits at present the generalizability of some of its findings to other vocal learning mammals. While the parallels drawn between bat and human vocal development are intriguing, the conclusions will be more robust when supported by comparative studies involving multiple species of vocal learners. This will help to identify whether the observed maternal influences on vocal development reported here are unique to Saccopteryx bilineata or represent a broader phenomenon in chiropteran, mammalian, or general vocal learning. Expanding the scope of research to include a wider range of species and incorporating cross-species comparisons will significantly enhance the contribution of this study to the field of vocal learning.

      Thank you for your suggestions and comments. 

      Regarding your main comment 1: In the future, we plan to implement temporary captivity experiments to investigate how maternal behaviours affect pup vocal development. This study provides the necessary basis for conducting future playback studies investigating specific behaviours in a controlled environment.

      Regarding your main comment 2: We completely agree that the number of singing males only represents a proxy for the acoustic input that pups receive during ontogeny. In the future, we plan to investigate in detail how the acoustic landscape influences pup vocal development and learning. This will include quantifying how long pups are exposed to song during ontogeny and assessing the influence of different tutors, including a detailed analysis of the song syllables of adult tutors for comparison with the vocal trajectories of song syllables in pups.

      Regarding your main comment 3: We also fully agree that it is unlikely that these results are unique to Saccopteryx bilineata. We are certain that other mammalian vocal learners show parallels to the vocal development and learning processes of S. bilineata. Bats in particular are a promising taxon for comparative studies because their vocal production and perception systems are highly sophisticated (due to their ability to echolocate). The high sociability of this taxon also includes a variety of social systems and vocal capacities (e.g. regarding vocal repertoire size, vocal learning capacities, information content, etc.) which support social learning and social feedback, as shown in our study.

      As suggested, in our revised manuscript we will include information on the validation of the ethogram. Furthermore, we will correct all the spelling mistakes – thank you very much for pointing them out!

    1. Author response:

      We appreciate all the reviewers for their encouraging comments and thoughtful feedback. We are confident that we can incorporate many of the suggestions to provide a clearer overall picture in the revised manuscript. In particular, we agree with the reviewers' concern that some of our methodological decisions, including our choice of metrics, require further clarification. We will focus on revising the methods section to make these decisions more transparent and to address any misunderstandings related to the analysis.

      We also value the request to include more data, such as intermediate results and additional control analyses. We will carefully assess which results to include in the main manuscript and which to provide in an extended supplementary section.

      To offer a more detailed understanding of our quantification of "prediction tendency," we refer to our previous work (Schubert et al., 2023, 2024), where we elaborate on our analytical choices in great detail and provide additional control analyses (e.g., ensuring that the relationship with speech tracking is not driven by participants' signal-to-noise ratio; Schubert et al., 2023).

      Additionally, we would like to clarify that the aim of this manuscript is not to analyze viewing behavior in depth but to replicate the general finding of ocular speech tracking, as presented in Gehmacher et al. (2024). A thorough investigation of specific ocular contributions (e.g., microsaccades or blinks) would require a separate research question and distinct analysis approaches, given the binary nature of such events.

      Nevertheless, we share the reviewers' interest in independent results from the current study, and we plan to carefully select and present the most relevant findings in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      We thank the reviewer for emphasizing the strength of our paper and the importance of validation on multiple unseen cohorts.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      We agree with the reviewer that we might have underestimated the prognostic accuracy of questionnaire-based tools, especially the strong predictive accuracy shown by Tanguay-Sabourin 2023. In this revised version, we have changed both the introduction and the discussion to reflect the questionnaire-based prognostic accuracy reported in the seminal work by Tanguay-Sabourin.

      In the introduction (page 4, lines 3-18), we now write:

      “Some studies have addressed this question with prognostic models incorporating demographic, pain-related, and psychosocial predictors.1-4 While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain, their prognostic accuracy is limited,5 with parameters often explaining no more than 30% of the variance.6-8. A recent notable study in this regard developed a model based on easy-to-use brief questionnaires to predict the development and spread of chronic pain in a variety of pain conditions capitalizing on a large dataset obtained from the UK-BioBank. 9 This work demonstrated that only few features related to assessment of sleep, neuroticism, mood, stress, and body mass index were enough to predict persistence and spread of pain with an area under the curve of 0.53-0.73. Yet, this study is unique in showing such a predictive value of questionnaire-based tools. Neurobiological measures could therefore complement existing prognostic models based on psychosocial variables to improve overall accuracy and discriminative power. More importantly, neurobiological factors such as brain parameters can provide a mechanistic understanding of chronicity and its central processing.”

      And in the conclusion (page 22, lines 5-9), we write:

      “Integrating findings from studies that used questionnaire-based tools and showed remarkable predictive power9 with neurobiological measures that can offer mechanistic insights into chronic pain development, could enhance predictive power in CBP prognostic modeling.”

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such a small sample size brain imaging study, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training making it difficult to assess the real performance, especially for small sample size studies.

      The reviewer raises a very important point about the limited sample size and about the methodology intrinsic to model development and testing. We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning in which we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. We write in the limitation section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development.  We believe that the data presented here are nevertheless robust since multisite validated but need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model”. 
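      The separation described here, fixing a cut-off on the training cohort only and then scoring an untouched cohort, can be illustrated with a minimal sketch. This is not the authors' pipeline: the FA values and labels are invented toy numbers, the function names are hypothetical, and using balanced accuracy to choose the cut-off is an assumption made for illustration.

```python
def balanced_accuracy(values, labels, cutoff):
    """Mean of sensitivity and specificity when predicting 1 for value >= cutoff."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
    true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def best_cutoff(values, labels):
    """Choose the cut-off using the training cohort only (first best wins on ties)."""
    best, best_score = None, -1.0
    for candidate in sorted(set(values)):
        score = balanced_accuracy(values, labels, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented toy data: label 1 = recovered, label 0 = persistent pain.
train_fa = [0.40, 0.42, 0.45, 0.47, 0.50, 0.52]
train_y = [0, 0, 0, 1, 1, 1]
test_fa = [0.41, 0.48, 0.46, 0.51]
test_y = [0, 0, 1, 1]

cutoff = best_cutoff(train_fa, train_y)  # the test cohort is never consulted here
test_bal_acc = balanced_accuracy(test_fa, test_y, cutoff)
test_auc = auc(
    [v for v, y in zip(test_fa, test_y) if y == 1],
    [v for v, y in zip(test_fa, test_y) if y == 0],
)
```

      In this toy case the cut-off separates the training cohort perfectly but performs at chance on the held-out cohort, which is the kind of gap that only an untouched test set can reveal.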

      Finally, as discussed by Spisak et al., 10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. As such the effect size in the New Haven and Mannheim data is Cohen’s d > 1.
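      For readers unfamiliar with the effect-size measure cited here, a minimal sketch of Cohen's d follows. The pooled-standard-deviation variant is assumed, since the response does not state which variant was used; the function name and the input numbers are purely illustrative and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Illustrative numbers only: means differ by 1, pooled SD is 2.
toy_d = cohens_d([2, 4, 6], [1, 3, 5])
```

      A value of d above 1 means the group means differ by more than one pooled standard deviation.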

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      The reviewer is correct: the model performance is fair, which limits its usefulness for clinical translation. We wanted to emphasize that diffusion images can be obtained in a short period of time and, hence, that clinical translation comes closer to reality as the predictive accuracy of such models improves. In addition, our findings are based on older diffusion data and limited sample sizes coming from different sites and different acquisition sequences. This by itself would limit the accuracy, especially since the evidence shows that sample size also affects model performance (i.e. testing AUC)10. In the revision, we re-worded the sentence mentioned by the reviewer to reflect the points discussed here. This also motivates us to collect a more homogeneous and larger sample. In the limitations section of the discussion, we now write (page 21, lines 6-9):

      “Even though our model performance is fair, which currently limits its usefulness for clinical translation, we believe that future models would further improve accuracy by using larger homogenous sample sizes and uniform acquisition sequences.”

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      We thank the reviewer for acknowledging that our effort and approach were useful.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms). 

      We apologize for the lack of clarity; we did run tractography and we did not use a pre-determined streamlined form of the connectome.

      Finding which connections are important for the classification of SBPr and SBPp is difficult because of our choices during data preprocessing and SVC model development: (1) preprocessing steps which included TNPCA for dimensionality reduction, and regressing out the confounders (i.e., age, sex, and head motion); (2) the harmonization for effects of sites; and (3) the Support Vector Classifier which is a hard classification model11.

      In the methods section (page 30, lines 21-23) we added: “Of note, such models cannot tell us the features that are important in classifying the groups.  Hence, our model is considered a black-box predictive model like neural networks.”

      Minor:

      What results are shown in Figure 7? It looks more descriptive than the actual results.

      The reviewer is correct; Figure 7 and Supplementary Figure 4 were both qualitatively illustrating the shape of the SLF. We have now changed both figures in response to this point and a point raised by reviewer 3.  We now show a 3D depiction of different sub-components of the right SLF (Figure 7) and left SLF (Now Supplementary Figure 11 instead of Supplementary Figure 4) with a quantitative estimation of the FA content of the tracts, and the number of tracts per component.  The results reinforce the TBSS analysis in showing asymmetry in the differences between left and right SLF between the groups (i.e. SBPp and SBPr) in both FA values and number of tracts per bundle.

      Reviewer #2 (Public Review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

      We thank reviewer 2 for pointing to the strength of our study.

      The following revisions might help to improve the manuscript further.

      - Definition of recovery. In the New Haven and Chicago datasets, SBPr and SBPp patients are distinguished by reductions of >30% in pain intensity. In contrast, in the Mannheim dataset, both groups are distinguished by reductions of >20%. This should be harmonized. Moreover, as there is no established definition of recovery (reference 79 does not provide a clear criterion), it would be interesting to know whether the results hold for different definitions of recovery. Control analyses for different thresholds could strengthen the robustness of the findings.

      The reviewer raises an important point regarding the definition of recovery.  To address the reviewer’s concern we have added a supplementary figure (Fig. S6) showing the results in the Mannheim data set if a 30% reduction is used as a recovery criterion, and in the manuscript (page 11, lines 1,2) we write: “Supplementary Figure S6 shows the results in the Mannheim data set if a 30% reduction is used as a recovery criterion in this dataset (AUC= 0.53)”.

      We would like to emphasize here several points that support the use of different recovery thresholds between New Haven and Mannheim.  The New Haven primary pain ratings relied on visual analogue scale (VAS) while the Mannheim data relied on the German version of the West-Haven-Yale Multidimensional Pain Inventory. In addition, the Mannheim data were pre-registered with a definition of recovery at 20% and are part of a larger sub-acute to chronic pain study with prior publications from this cohort using the 20% cut-off12. Finally, a more recent consensus publication13 from IMMPACT indicates that a change of at least 30% is needed for a moderate improvement in pain on the 0-10 Numerical Rating Scale but that this percentage depends on baseline pain levels.
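
      To make the two recovery criteria concrete, the following minimal Python sketch labels a patient as recovered or persistent from the percentage reduction in pain between baseline and follow-up. The function name, argument names, and example ratings are hypothetical; only the 30% and 20% thresholds come from the text above.

```python
def classify_recovery(baseline: float, follow_up: float, threshold_pct: float) -> str:
    """Label a patient 'SBPr' (recovered) or 'SBPp' (persistent).

    A patient counts as recovered when pain dropped by more than
    `threshold_pct` percent of the baseline rating.
    """
    if baseline <= 0:
        raise ValueError("baseline pain must be positive to compute a percentage change")
    reduction_pct = 100.0 * (baseline - follow_up) / baseline
    return "SBPr" if reduction_pct > threshold_pct else "SBPp"


# Hypothetical ratings: a drop from 6.0 to 4.5 is a 25% reduction.
label_at_20 = classify_recovery(6.0, 4.5, threshold_pct=20.0)  # Mannheim criterion
label_at_30 = classify_recovery(6.0, 4.5, threshold_pct=30.0)  # New Haven / Chicago criterion
```

      In this made-up case the same patient is labeled recovered under the 20% criterion and persistent under the 30% criterion, which is why the choice of threshold can change group membership.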

      - Analysis of the Chicago dataset. The manuscript includes results on FA values and their association with pain severity for the New Haven and Mannheim datasets but not for the Chicago dataset. It would be straightforward to show figures like Figures 1 - 4 for the Chicago dataset, as well.

      We welcome the reviewer’s suggestion; we added these analyses to the results section of the resubmitted manuscript (page 11, lines 13-16): “The correlation between FA values in the right SLF and pain severity in the Chicago data set showed marginal significance (p = 0.055) at visit 1 (Fig. S8A) and higher FA values were significantly associated with a greater reduction in pain at visit 2 (p = 0.035) (Fig. S8B).”

      - Data sharing. The discovery-replication approach of the present study distinguishes the present from previous approaches. This approach enhances the belief in the robustness of the findings. This belief would be further enhanced by making the data openly available. It would be extremely valuable for the community if other researchers could reproduce and replicate the findings without restrictions. It is not clear why the fact that the studies are ongoing prevents the unrestricted sharing of the data used in the present study.

      We greatly appreciate the reviewer's suggestion to share our data sets, as we strongly support the Open Science initiative. The Chicago data set is already publicly available. The New Haven data set will be shared on the Open Pain repository, and the Mannheim data set will be uploaded to heiDATA or heiARCHIVE at Heidelberg University in the near future. We cannot share the data immediately because this project is part of the Heidelberg pain consortium, “SFB 1158: From nociception to chronic pain: Structure-function properties of neural pathways and their reorganization.” Within this consortium, all data must be shared following a harmonized structure across projects, and no study will be published openly until all projects have completed initial analysis and quality control.

      Reviewer #3 (Public Review):

      Summary:

      Authors suggest a new biomarker of chronic back pain with the option to predict the result of treatment. The authors found a significant difference in a fractional anisotropy measure in superior longitudinal fasciculus for recovered patients with chronic back pain.

      Strengths:

      The results were reproduced in three different groups at different studies/sites.

      Weaknesses:

      - The number of participants is still low.

      The reviewer raises a very important point of limited sample size. As discussed in our replies to reviewer number 1:

      We acknowledge the small sample size in the “Limitations” section of the discussion.   In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site.  Regarding our SVC approach we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. We write in the limitation section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development.  We believe that the data presented here are nevertheless robust since multisite validated but need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model”. 
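
      The separation of training and test data described above can be sketched as follows. This is a minimal illustration in plain Python, not the SVC pipeline used in the study: a one-feature midpoint classifier stands in for the Support Vector Classifier, and the `offset` hyperparameter, the fold count, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def fit_midpoint(xs, ys):
    """Fit step: midpoint between the two class means (labels are 0/1)."""
    mean0 = mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2.0


def accuracy(cut, xs, ys):
    """Fraction of samples where (x > cut) matches the label."""
    return mean([1.0 if int(x > cut) == y else 0.0 for x, y in zip(xs, ys)])


def select_offset(train_x, train_y, offsets, n_folds=3):
    """Pick the offset with the best mean validation accuracy.

    Only training data enter this function; the test set is never seen.
    """
    best_offset, best_score = None, -1.0
    for offset in offsets:
        fold_scores = []
        for fold in range(n_folds):
            val_idx = set(range(fold, len(train_x), n_folds))
            fit_x = [x for i, x in enumerate(train_x) if i not in val_idx]
            fit_y = [y for i, y in enumerate(train_y) if i not in val_idx]
            val_x = [train_x[i] for i in sorted(val_idx)]
            val_y = [train_y[i] for i in sorted(val_idx)]
            cut = fit_midpoint(fit_x, fit_y) + offset
            fold_scores.append(accuracy(cut, val_x, val_y))
        score = mean(fold_scores)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


# Toy, made-up feature values (one FA-like number per patient).
train_x = [0.40, 0.56, 0.42, 0.58, 0.44, 0.60]
train_y = [0, 1, 0, 1, 0, 1]
test_x = [0.41, 0.59, 0.43, 0.57]
test_y = [0, 1, 0, 1]

offset = select_offset(train_x, train_y, offsets=[-0.05, 0.0, 0.05])
final_cut = fit_midpoint(train_x, train_y) + offset  # refit on all training data
test_accuracy = accuracy(final_cut, test_x, test_y)  # test set is used exactly once
```

      The point of the design is that `select_offset` receives only training data, so the reported test accuracy cannot be influenced by hyperparameter selection.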

      Finally, as discussed by Spisak et al.,10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. Indeed, the effect size in the New Haven and Mannheim data is Cohen’s d > 1.
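
      For reference, Cohen's d for two independent groups is the difference in group means divided by the pooled standard deviation. The sketch below uses made-up values; the function name and the data are illustrative assumptions, not values from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (sample variances, n - 1)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up values for two groups of four.
recovered = [2.0, 4.0, 6.0, 8.0]
persistent = [1.0, 3.0, 5.0, 7.0]
d = cohens_d(recovered, persistent)
```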

      - An explanation of microstructure changes was not given.

      The reviewer points to an important gap in our discussion.  While we cannot do a direct study of actual tissue microstructure, we explored further the changes observed in the SLF by calculating diffusivity measures. We have now performed the analysis of mean, axial, and radial diffusivity. 

      In the results section we added (page 7, lines 12-19): “We also examined mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) extracted from the right SLF shown in Fig.1 to further understand which diffusion component is different between the groups. The right SLF MD is significantly increased (p < 0.05) in the SBPr compared to SBPp patients (Fig. S3), while the right SLF RD is significantly decreased (p < 0.05) in the SBPr compared to SBPp patients in the New Haven data (Fig. S4). Axial diffusivity extracted from the RSLF mask did not show significant difference between SBPr and SBPp (p = 0.28) (Fig. S5).”

      In the discussion, we write (page 15, lines 10-20):

      “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus14.  Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts,15 our results show increased MD and RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”
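
      The diffusion measures compared above are all functions of the three eigenvalues of the diffusion tensor. As a reminder of the standard definitions, the sketch below computes MD, AD, RD, and FA from one set of eigenvalues; the function name and the example eigenvalues are hypothetical.

```python
from math import sqrt


def dti_scalars(eigenvalues):
    """Return MD, AD, RD, and FA from the three tensor eigenvalues."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)  # l1 is the principal eigenvalue
    md = (l1 + l2 + l3) / 3.0  # mean diffusivity
    ad = l1                    # axial diffusivity
    rd = (l2 + l3) / 2.0       # radial diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = sqrt(1.5 * num / den) if den > 0 else 0.0  # fractional anisotropy
    return {"MD": md, "AD": ad, "RD": rd, "FA": fa}


# Hypothetical eigenvalues in units of 10^-3 mm^2/s.
scalars = dti_scalars([1.7, 0.3, 0.3])
```

      Under these definitions, lowering RD while AD stays fixed raises FA, so RD and FA would be expected to move in opposite directions in that situation.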

      - Some technical drawbacks are presented.

      We are uncertain if the reviewer is suggesting that we have acknowledged certain technical drawbacks and expects further elaboration on our part. We kindly request that the reviewer specify what particular issues need to be addressed so that we can respond appropriately.

      Recommendations For The Authors:

      We thank the reviewers for their constructive feedback, which has significantly improved our manuscript. We have done our best to answer the criticisms that they raised point-by-point.

      Reviewer #2 (Recommendations For The Authors):

      The discovery-replication approach of the current study justifies the use of the terminus 'robust.' In contrast, previous studies on predictive biomarkers using functional and structural brain imaging did not pursue similar approaches and have not been replicated. Still, the respective biomarkers are repeatedly referred to as 'robust.' Throughout the manuscript, it would, therefore, be more appropriate to remove the label 'robust' from those studies.

      We thank the reviewer for this valuable suggestion. We removed the label 'robust' throughout the manuscript when referring to the previous studies which didn’t follow the same approach and have not yet been replicated.

      Reviewer #3 (Recommendations For The Authors):

      This is, indeed, quite a well-written manuscript with very interesting findings and patient group. There are a few comments that enfeeble the findings.

      (1) It is a bit frustrating to read at the beginning how important chronic back pain is and the number of patients in the used studies. At least the number of healthy subjects could be higher.

      The reviewer raises an important point regarding the number of pain-free healthy controls (HC) in our samples. We first note that our primary statistical analysis focused on comparing recovered and persistent patients at baseline and validating these findings across sites without directly comparing them to HCs. Nevertheless, the data from New Haven included 28 HCs at baseline, and the data from Mannheim included 24 HCs. Although these sample sizes are not large, they have enabled us to clearly establish that the recovered SBPr patients generally have larger FA values in the right superior longitudinal fasciculus compared to the HCs, a finding consistent across sites (see Figs. 1 and 3). This suggests that the general pain-free population includes individuals with both low and high-risk potential for chronic pain. It also offers one explanation for the reported lack of differences or inconsistent differences between chronic low-back pain patients and HCs in the literature, as these differences likely depend on the (unknown) proportion of high- and low-risk individuals in the control groups. Therefore, if the high-risk group is more represented by chance in the HC group, comparisons between HCs and chronic pain patients are unlikely to yield statistically significant results. Thus, while we agree with the reviewer that the sample sizes of our HCs are limited, this limitation does not undermine the validity of our findings.

      (2) Pain reaction in the brain is in general a quite popular topic and could be connected to the findings or mentioned in the introduction.

      We thank the reviewer for this suggestion.  We have now added a summary of brain responses to pain in general. In the introduction, we now write (page 4, lines 19-22 and page 5, lines 1-5):

      “Neuroimaging research on chronic pain has uncovered a shift in brain responses to pain when acute and chronic pain are compared. The thalamus, primary somatosensory, motor areas, insula, and mid-cingulate cortex most often respond to acute pain and can predict the perception of acute pain16-19. Conversely, limbic brain areas are more frequently engaged when patients report the intensity of their clinical pain20, 21. Consistent findings have demonstrated that increased prefrontal-limbic functional connectivity during episodes of heightened subacute ongoing back pain or during a reward learning task is a significant predictor of CBP.12, 22. Furthermore, low somatosensory cortex excitability in the acute stage of low back pain was identified as a predictor of CBP chronicity.23”

      (3) It is clearly observed structural asymmetry in the brain, why not elaborate this finding further? Would SLF be a hub in connectivity analysis? Would FA changes have along tract features? etc etc etc

      The reviewer raises an important point. Our data give grounds to suggest that there is an asymmetry in the role of the SLF in resilience to chronic pain. We discuss this at length in the Discussion section. In addition, we have extended our data analysis using our Population Based Structural Connectome pipeline on the New Haven dataset. Following that approach, we studied the number of fiber tracts making up different parts of the SLF on the right and left sides. We also extracted FA values along fiber tracts and compared the averages across groups. Our new analyses are presented in our modified Figure 7 and Fig. S11.  These results indeed support the asymmetry hypothesis. The SLF could be a hub of structural connectivity. Please note, however, that given the discovery-and-validation design of our study, the structural connectivity of the SLF is beyond the scope of this paper because tract-based connectivity is very sensitive to data collection parameters and is less accurate with single-shell DWI acquisition. Therefore, we will pursue the study of connectivity of the SLF in the future with well-powered and more harmonized data.

      (4) Only FA is mentioned; did the authors work with MD, RD, and AD metrics?

      We thank the reviewer for this suggestion that helps in providing a clearer picture of the differences in the right SLF between SBPr and SBPp. We have now extracted MD, AD, and RD for the predictive mask we discovered in Figure 1 and plotted the values comparing SBPr to SBPp patients in Fig. S3, Fig. S4, and Fig. S5 across all sites using one comprehensive harmonized analysis. We have added in the discussion: “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus14.  Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts15, our results show increased MD and RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”

      (5) There are many speculations in the Discussion, however, some of them are not supported by the results.

      We agree with the reviewer and thank them for pointing this out. We have now made several changes across the discussion related to the wording where speculations were not supported by the data. For example, instead of writing (page 16, lines 7-9): “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain is a top-down phenomenon related to visuospatial and body awareness.”, we now write: “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain might be related to a top-down phenomenon involving visuospatial and body awareness.”

      (6) A method section was written quite roughly. In order to obtain all the details for a potential replication one needs to jump over the text.

      The reviewer is correct; our methods section lacked detail.  We have therefore described our methodology more extensively.  Under “Estimation of structural connectivity”, we now write (page 28, lines 20,21 and page 29, lines 1-19):

      “Structural connectivity was estimated from the diffusion tensor data using a population-based structural connectome (PSC) detailed in a previous publication.24 PSC can utilize the geometric information of streamlines, including shape, size, and location for a better parcellation-based connectome analysis. It, therefore, preserves the geometric information, which is crucial for quantifying brain connectivity and understanding variation across subjects. We have previously shown that the PSC pipeline is robust and reproducible across large data sets.24 PSC output uses the Desikan-Killiany atlas (DKA) 25 of cortical and sub-cortical regions of interest (ROI). The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in the supplementary materials’ Table S6.  PSC leverages a reproducible probabilistic tractography algorithm 26 to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA 25 to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies27, 28, we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”
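
      As an illustration of the final aggregation step, the sketch below turns per-streamline records into the two connectome features named in the quoted passage: streamline count and mean FA for each ROI pair. It is a simplified stand-in, not the PSC pipeline; the record format, the function name, and the toy values are assumptions.

```python
from collections import defaultdict


def build_connectome(streamlines):
    """Aggregate streamlines into per-ROI-pair count and mean FA.

    `streamlines` is an iterable of (roi_a, roi_b, mean_fa_along_streamline).
    ROI pairs are treated as undirected, so (3, 1) and (1, 3) are the same edge.
    """
    counts = defaultdict(int)
    fa_sums = defaultdict(float)
    for roi_a, roi_b, fa in streamlines:
        edge = (min(roi_a, roi_b), max(roi_a, roi_b))
        counts[edge] += 1
        fa_sums[edge] += fa
    return {edge: {"count": n, "mean_fa": fa_sums[edge] / n} for edge, n in counts.items()}


# Toy records: ROI indices are arbitrary placeholders, not atlas labels.
toy_streamlines = [(1, 3, 0.40), (3, 1, 0.50), (2, 5, 0.30)]
connectome = build_connectome(toy_streamlines)
```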

      (7) Why not join all the data with harmonisation in order to reproduce the results (TBSS)

      We have followed the reviewer’s suggestion; we used neuroCombat harmonization after pooling all the diffusion weighted data into one TBSS analysis. Our results remain the same after harmonization. 

      In the Supplementary Information we added a paragraph explaining the method for harmonization; we write (SI, page 3, lines 25-34):

      “Harmonization of DTI data using neuroCombat. Because the 3 data sets originated from different sites using different MR data acquisition parameters and slightly different recruitment criteria, we applied neuroCombat 29  to correct for site effects and then repeated the TBSS analysis shown in Figure 1 and the validation analyses shown in Figures 5 and 6. First, the FA maps derived using the FDT toolbox were pooled into one TBSS analysis where registration to a standard template FA template (FMRIB58_FA_1mm.nii.gz part of FSL) was performed.  Next, neuroCombat was applied to the FA maps as implemented in Python with batch (i.e., site) effect modeled with a vector containing 1 for New Haven, 2 for Chicago, and 3 for Mannheim originating maps, respectively. The harmonized maps were then skeletonized to allow for TBSS.”
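
      To illustrate the idea of site correction, the sketch below removes per-site differences in mean and spread from a single feature, using the 1/2/3 site coding described above. This is a deliberately simplified location-scale adjustment chosen for illustration. It is not the ComBat model, which additionally relies on empirical Bayes estimation and can preserve covariates of interest. All names and values are hypothetical.

```python
from statistics import mean, pstdev


def adjust_site_effects(values, sites):
    """Shift each site to the grand mean and rescale to the average within-site spread."""
    by_site = {}
    for value, site in zip(values, sites):
        by_site.setdefault(site, []).append(value)
    grand_mean = mean(values)
    site_mean = {s: mean(v) for s, v in by_site.items()}
    site_sd = {s: pstdev(v) for s, v in by_site.items()}
    target_sd = mean(list(site_sd.values()))  # average within-site spread
    adjusted = []
    for value, site in zip(values, sites):
        # Leave the scale untouched when a site has zero spread.
        scale = target_sd / site_sd[site] if site_sd[site] > 0 else 1.0
        adjusted.append(grand_mean + scale * (value - site_mean[site]))
    return adjusted


# Made-up FA-like values; sites coded 1 = New Haven, 2 = Chicago, 3 = Mannheim.
fa_values = [0.40, 0.44, 0.50, 0.54, 0.60, 0.64]
site_codes = [1, 1, 2, 2, 3, 3]
harmonized = adjust_site_effects(fa_values, site_codes)
```

      After the adjustment, the three toy sites share the same mean, while the ordering of values within each site is preserved.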

      And in the results section, we write (page 12, lines 2-21):

      “Validation after harmonization

      Because the DTI data sets originated from 3 sites with different MR acquisition parameters, we repeated our TBSS and validation analyses after correcting for variability arising from site differences using DTI data harmonization as implemented in neuroCombat. 29 The method of harmonization is described in detail in the Supplementary Methods. The whole brain unpaired t-test depicted in Figure 1 was repeated after neuroCombat and yielded very similar results (Fig. S9A) showing significantly increased FA in the SBPr compared to SBPp patients in the right superior longitudinal fasciculus (MNI-coordinates of peak voxel: x = 40; y = - 42; z = 18 mm; t(max) = 2.52; p < 0.05, corrected against 10,000 permutations).  We again tested the accuracy of local diffusion properties (FA) of the right SLF extracted from the mask of voxels passing threshold in the New Haven data (Fig.S9A) in classifying the Mannheim and the Chicago patients, respectively, into persistent and recovered. FA values corrected for age, gender, and head displacement accurately classified SBPr  and SBPp patients from the Mannheim data set with an AUC = 0.67 (p = 0.023, tested against 10,000 random permutations, Fig. S9B and S7D), and patients from the Chicago data set with an AUC = 0.69 (p = 0.0068) (Fig. S9C and S7E) at baseline, and an AUC = 0.67 (p = 0.0098)  (Fig. S9D and S7F) patients at follow-up,  confirming the predictive cluster from the right SLF across sites. The application of neuroCombat significantly changes the FA values as shown in Fig.S10 but does not change the results between groups.”
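
      The classification statistics quoted above combine an area under the ROC curve with a permutation p-value. The sketch below shows one common way to compute both in plain Python. It is not the script used in the study, and the function names, toy scores, seed, and add-one p-value convention are assumptions.

```python
import random


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_permutations=10_000, seed=0):
    """One-sided p-value: how often shuffled labels reach the observed AUC."""
    rng = random.Random(seed)
    observed = roc_auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if roc_auc(scores, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)  # add-one correction avoids p = 0


# Toy data: 1 = recovered, 0 = persistent; scores stand in for covariate-adjusted FA.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
auc = roc_auc(toy_scores, toy_labels)
p_value = permutation_p_value(toy_scores, toy_labels, n_permutations=1000)
```

      Shuffling the labels while keeping the scores fixed builds the null distribution of the AUC, so the p-value reflects how unusual the observed separation is under no association.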

      Minor comments

      (1) In the case of New Haven data, one used MB 4 and GRAPPA 2, these two factors accelerate the imaging 8 times and often lead to quite a poor quality. Any kind of QA?

      We thank the reviewer for identifying this error. GRAPPA 2 was in fact used for our T1-MPRAGE image acquisition but not during the diffusion data acquisition. The diffusion data were acquired with a multi-band acceleration factor of 4.  We have now corrected this mistake.

      (2) Why not include MPRAGE data into the analysis, in particular, for predictions?

      We thank the reviewer for the suggestion. The collaboration on this paper was set around diffusion data. In addition, MPRAGE data from New Haven related to prediction is already published (10.1073/pnas.1918682117) and MPRAGE data of the Mannheim data set is a part of the larger project and will be published elsewhere.

      (3) In preprocessing, the authors wrote: "Eddy current corrects for image distortions due to susceptibility-induced distortions and eddy currents in the gradient coil" However, they did not mention that they acquired phase-opposite b0 data. It means eddy_openmp works likely only as an alignment tool, but not susceptibility corrector.

      We thank the reviewer for bringing this to our attention. We indeed did not collect b0 data in the phase-opposite direction. eddy_openmp can still be used to correct for eddy current distortions and to perform motion correction, but the absence of phase-opposite b0 data may limit its ability to fully address susceptibility artifacts. This is now noted in the Supplementary Methods under the Preprocessing section (SI, page 3, lines 16-18): “We do note, however, that as we did not acquire data in the phase-opposite direction, the susceptibility-induced distortions may not be fully corrected.”

      (4) Version of FSL?

      We thank the reviewer for addressing this point that we have now added under the Supplementary Methods (SI, page 3, lines 10-11): “Preprocessing of all data sets was performed employing the same procedures and the FMRIB diffusion toolbox (FDT) running on FSL version 6.0.”

      (5) Some short sketches about the connectivity analysis could be useful, at least in SI.

      We are grateful for this suggestion that improves our work. We added the sketches about the connectivity analysis, please see Figure 7 and Supplementary Figure 11.

      (6) Machine learning: functions, language, version?

      We thank the reviewer for pointing out these minor points that we now hope to have addressed in our resubmission in the Methods section by adding a detailed description of the structural connectivity analysis. We added: “The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in the supplementary materials’ Table S7.  PSC leverages a reproducible probabilistic tractography algorithm 26 to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA 25 to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies27, 28, we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”

      The script is described and provided at: https://github.com/MISICMINA/DTI-Study-Resilience-to-CBP.git.

      (7) Ethical approval?

      The New Haven data is part of a study that was approved by the Yale University Institutional Review Board. This is mentioned under the description of the data, “New Haven (Discovery) data set” (page 23, lines 1,2).  Likewise, the Mannheim data is part of a study approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and was conducted in accordance with the declaration of Helsinki in its most recent form. This is also mentioned under “Mannheim data set” (page 26, lines 2-5): “The study was approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and was conducted in accordance with the declaration of Helsinki in its most recent form.”

      (1) Traeger AC, Henschke N, Hubscher M, et al. Estimating the Risk of Chronic Pain: Development and Validation of a Prognostic Model (PICKUP) for Patients with Acute Low Back Pain. PLoS Med 2016;13:e1002019.

      (2) Hill JC, Dunn KM, Lewis M, et al. A primary care back pain screening tool: identifying patient subgroups for initial treatment. Arthritis Rheum 2008;59:632-641.

      (3) Hockings RL, McAuley JH, Maher CG. A systematic review of the predictive ability of the Orebro Musculoskeletal Pain Questionnaire. Spine (Phila Pa 1976) 2008;33:E494-500.

      (4) Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303:1295-1302.

      (5) Silva FG, Costa LO, Hancock MJ, Palomo GA, Costa LC, da Silva T. No prognostic model for people with recent-onset low back pain has yet been demonstrated to be suitable for use in clinical practice: a systematic review. J Physiother 2022;68:99-109.

      (6) Kent PM, Keating JL. Can we predict poor recovery from recent-onset nonspecific low back pain? A systematic review. Man Ther 2008;13:12-28.

      (7) Hruschak V, Cochran G. Psychosocial predictors in the transition from acute to chronic pain: a systematic review. Psychol Health Med 2018;23:1151-1167.

      (8) Hartvigsen J, Hancock MJ, Kongsted A, et al. What low back pain is and why we need to pay attention. Lancet 2018;391:2356-2367.

      (9) Tanguay-Sabourin C, Fillingim M, Guglietti GV, et al. A prognostic risk score for development and spread of chronic pain. Nat Med 2023;29:1821-1831.

      (10) Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4-E7.

      (11) Liu Y, Zhang HH, Wu Y. Hard or Soft Classification? Large-margin Unified Machines. J Am Stat Assoc 2011;106:166-177.

      (12) Loffler M, Levine SM, Usai K, et al. Corticostriatal circuits in the transition to chronic back pain: The predictive role of reward learning. Cell Rep Med 2022;3:100677.

      (13) Smith SM, Dworkin RH, Turk DC, et al. Interpretation of chronic pain clinical trial outcomes: IMMPACT recommended considerations. Pain 2020;161:2446-2461.

      (14) Lieberman G, Shpaner M, Watts R, et al. White Matter Involvement in Chronic Musculoskeletal Pain. The Journal of Pain 2014;15:1110-1119.

      (15) Mansour AR, Baliki MN, Huang L, et al. Brain white matter structural properties predict transition to chronic pain. Pain 2013;154:2160-2168.

      (16) Wager TD, Atlas LY, Lindquist MA, Roy M, Woo CW, Kross E. An fMRI-based neurologic signature of physical pain. N Engl J Med 2013;368:1388-1397.

      (17) Lee JJ, Kim HJ, Ceko M, et al. A neuroimaging biomarker for sustained experimental and clinical pain. Nat Med 2021;27:174-182.

      (18) Becker S, Navratilova E, Nees F, Van Damme S. Emotional and Motivational Pain Processing: Current State of Knowledge and Perspectives in Translational Research. Pain Res Manag 2018;2018:5457870.

      (19) Spisak T, Kincses B, Schlitt F, et al. Pain-free resting-state functional brain connectivity predicts individual pain sensitivity. Nat Commun 2020;11:187.

      (20) Baliki MN, Apkarian AV. Nociception, Pain, Negative Moods, and Behavior Selection. Neuron 2015;87:474-491.

      (21) Elman I, Borsook D. Common Brain Mechanisms of Chronic Pain and Addiction. Neuron 2016;89:11-36.

      (22) Baliki MN, Petre B, Torbey S, et al. Corticostriatal functional connectivity predicts transition to chronic back pain. Nat Neurosci 2012;15:1117-1119.

      (23) Jenkins LC, Chang WJ, Buscemi V, et al. Do sensorimotor cortex activity, an individual's capacity for neuroplasticity, and psychological features during an episode of acute low back pain predict outcome at 6 months: a protocol for an Australian, multisite prospective, longitudinal cohort study. BMJ Open 2019;9:e029027.

      (24) Zhang Z, Descoteaux M, Zhang J, et al. Mapping population-based structural connectomes. Neuroimage 2018;172:130-145.

      (25) Desikan RS, Segonne F, Fischl B, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 2006;31:968-980.

      (26) Maier-Hein KH, Neher PF, Houde J-C, et al. The challenge of mapping the human connectome based on diffusion tractography. Nature Communications 2017;8:1349.

      (27) Chiang MC, McMahon KL, de Zubicaray GI, et al. Genetics of white matter development: a DTI study of 705 twins and their siblings aged 12 to 29. Neuroimage 2011;54:2308-2317.

      (28) Zhao B, Li T, Yang Y, et al. Common genetic variation influencing human white matter microstructure. Science 2021;372.

      (29) Fortin JP, Parker D, Tunc B, et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 2017;161:149-170.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in the attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      There are two main concerns:

      (1) The authors had a too strong pre-hypothesis that self-prioritization was associated with attention. They used the prior entry to consciousness (awareness) as an index of attention, which is not appropriate. There may be other processing that makes the stimulus prior to entry to consciousness (e.g. high arousal, high sensitivity), but not attention. The self-related/associated stimulus may be involved in such processing but not attention to make the stimulus easily caught. Perhaps the authors could include other methods such as EEG or MEG to answer this question.

      We found the possibility of other mechanisms to be responsible for “prior entry” interesting too, but believe there are solid grounds for the hypothesis that it is indicative of attention:

      First, prior entry has a long-standing history as an index of attention (e.g., Titchener, 1903; Shore et al., 2001; Yates and Nicholls, 2009; Olivers et al., 2011; see Spence & Parise, 2010, for a review). Of course, other factors (like the ones mentioned) can contribute to encoding speed. However, for the perceptual condition, we systematically varied a stimulus feature that is associated with selective attention (salience, see e.g. Wolfe, 2021) and kept other features that are known to be associated with other factors such as arousal and sensitivity constant across the two variants (e.g. clearly above-threshold visibility) or varied them between participants (e.g. the colours / shapes used).

      Second, in the social salience condition we used a manipulation that has repeatedly been used to establish social salience effects in other paradigms (e.g., Li et al., 2022; Liu & Sui, 2016; Scheller et al., 2024; Sui et al., 2015; see Humphreys & Sui, 2016, for a review). We assume that the reviewer’s comment suggests that changes in arousal or sensitivity may be responsible for social salience effects, specifically. We have several reasons to interpret the social salience effects as an alteration in attentional selection, rather than a result of arousal or sensitivity:

      Arousal and attention are closely linked. However, within the present model, arousal is more likely linked to the availability of processing resources (capacity parameter C). That is, enhanced arousal is typically not stimulus-specific, and is therefore unlikely to affect the *relative* advantage in processing weights/rates of the self-associated (vs other-associated) stimuli. Indeed, a recent study showed that arousal does not modulate the relative division of attentional resources (as modelled by the Theory of Visual Attention; Asgeirsson & Nieuwenhuis, 2017). As such, it is unlikely that arousal can explain the observed results in relative processing changes for the self and other identities.

      Further, there is little reason to assume that presenting a different shape enhances perceptual sensitivity. Firstly, all stimuli were presented well above threshold, which would shrink any effects that were resulting from increases in sensitivity alone. Secondly, shape-associations were counterbalanced across participants, reducing the possibility that specific features, present in the stimulus display, lead to the measurable change in processing rates as a result of enhanced shape-sensitivity.

      Taken together, both the wealth of literature suggesting that prior entry indexes attention and the specific design choices within our study strongly support the notion that the observed changes in processing rates are indicative of changes in attentional selection, rather than other mechanisms (e.g. arousal, sensitivity).

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is there a probability that the social and perceptual (physical properties of the stimulus) salience fired the same attention processing through different processing?

      We appreciate this thought-provoking comment. We conceptualize attention as a process that can facilitate different levels of representation, rather than as separate systems tuned to specific types of information. Different forms of representation, such as the perceptual shape, or the associated social identity, may be impacted by the same attentional process at different levels of representation. Indeed, our findings suggest that both social and perceptual salience effects may result from the same attentional system, albeit at different levels of representation. This is further supported by the additivity of perceptual and social salience effects and the negative correlation of processing facilitations between perceptually and socially salient cues. These results may reflect a trade-off in how attentional resources are distributed between either perceptually or socially salient stimuli.

      Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Visual Attention Theory (VAT) by estimating VAT parameters using a hierarchical Bayesian model from the field of attention and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      Thank you for outlining the specific points that raise your concern. We were happy to address these points as follows:

      a. Replications and Consistency: In our study, we consistently observed trends (relative reduction in processing speed of the self-associated stimulus) in the social salience conditions across experiments. While Experiment 2 demonstrated a significant reduction in processing rate towards self-stimuli, there was a notable trend in Experiment 1 as well.

      b. Condition-specific parameters: The condition-specific C parameters, though presenting a small effect size, significantly improved model fit. Inspecting the HDI ranges of our estimated C parameters indicates a high probability (85-89%) that processing capacity increased due to social associations, suggesting that even small changes (~2 Hz) can hold meaningful implications within the context of attentional selection.

      Please also note that the main conclusions about relative salience (self/other, salient/non-salient) are based on the relative processing rates. Processing rates are the product of the processing capacity (condition- but not stimulus dependent) and the attentional weight (condition and stimulus dependent). The latter is crucial to judge the *relative* advantage of the salient stimulus. Hence, the self-/salient stimulus advantage that is reflected in the ‘processing rate difference’ is automatically also reflected in the relative attentional weights attributed to the self/other and salient/non-salient stimuli. As such, the overall results of an automatic relative advantage of self-associated stimuli hold, independently of the change in overall processing capacity.
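
      As an illustration only, the relation described above can be sketched in a few lines. The sketch assumes a two-stimulus display in which the relative attentional weights sum to 1, so that each processing rate is the capacity C multiplied by that stimulus's weight. The function name and all numerical values are hypothetical and are not estimates from the study.

```python
def processing_rates(capacity_hz, weight_probe):
    """Split an overall processing capacity C (Hz) between two stimuli.

    Assumes a two-stimulus display with relative attentional weights that
    sum to 1, so v_probe = C * w_probe and v_ref = C * (1 - w_probe).
    """
    if not 0.0 <= weight_probe <= 1.0:
        raise ValueError("weight_probe must lie in [0, 1]")
    v_probe = capacity_hz * weight_probe
    v_ref = capacity_hz * (1.0 - weight_probe)
    return v_probe, v_ref


# Hypothetical values, chosen only for illustration (not study estimates).
baseline = processing_rates(capacity_hz=60.0, weight_probe=0.55)
higher_c = processing_rates(capacity_hz=62.0, weight_probe=0.55)

for label, (v_p, v_r) in [("baseline", baseline), ("higher C", higher_c)]:
    # The probe's share of the total rate equals its relative weight.
    share = v_p / (v_p + v_r)
    print(f"{label}: v_probe={v_p:.2f} Hz, v_ref={v_r:.2f} Hz, share={share:.2f}")
```

      Under these assumptions, changing C scales both rates together, while the probe's share of the total rate is set by the weight alone. This is why a relative advantage can hold independently of a change in overall capacity.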

      c. Correlations: Regarding the correlations the reviewer noted, we wish to clarify that these were exploratory, and not the primary focus of our research. The aim of these exploratory analyses was to gauge the contribution of attentional selection to matching-based SPEs. As SPEs measured via the matching task are typically based on multiple different levels of processing, the contribution of early attentional selection to their overall magnitude was unclear. Without being able to gauge the possible effect sizes, corrected analyses may prevent detecting small but meaningful effects. As such, the effect sizes reported serve future studies to estimate power a priori and conduct well-powered replications of such exploratory effects. Additionally, Bayes factors were provided to give an appreciation of the strength of the evidence, all suggesting at least moderate evidence in favour of a correlation. Lastly, please note that effects that were measured within individuals and task (processing rate increase in social and perceptual decision dimensions in the TOJ task) showed consistent patterns, suggesting that the modulations within tasks were highly predictive of each other, while the modulations between tasks were not as clearly linked. We will add this clarification to the revised manuscript.

      d. Unexpected results: The unexpected results concerning the processing rates of other-associated versus self-associated stimuli certainly warrant further discussion. We believe that the additional processing steps required for social judgments, reflected in enhanced reaction times, may explain the slower processing of self-associated stimuli in that dimension. We agree that not all findings will align with initial hypotheses, and this variability presents avenues for further research. We have added this to the discussion of social salience effects.

      e. Whether association strength can account for the findings: We appreciate the scepticism regarding the strength of self-association with shapes. However, our within-participant design and control matching task indicate that the relative processing advantage for self-associated stimuli holds across conditions. This makes the scenario that “the self-association with shapes was not strong enough to demonstrate attention bias” very unlikely. Firstly, the relative processing advantage of self-associated stimuli in the perceptual decision condition, and the absence of such advantage in the social decision condition, were evidenced in the same participants. Hence, the strength of association between shapes and social identities was the same for both conditions. However, we only find an advantage for the self-associated shape when participants make perceptual (shape) judgements. It is therefore highly unlikely that the “association strength” can account for the difference in the outcomes between the conditions in experiment 1. Also, note that the order in which these conditions were presented was counter-balanced across participants, reducing the possibility that the automatic self-advantage was merely a result of learning or fatigue. Secondly, all participants completed the standard matching task to ascertain that the association between shapes and identities did indeed lead to processing advantages (across different levels).

      In summary, we believe that the evidence we provide supports the final conclusions. We do, of course, welcome any further empirical evidence that could enhance our understanding of the contribution of different processing levels to the SPE and are committed to exploring these areas in future work.

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      The reviewer indicates that the current findings heavily rely on the perceptual matching task, and it would be more convincing to include other paradigm(s) and different types of stimuli. We are happy to address these points here: first, we specifically used a temporal order paradigm to tap into specific processes, rather than merely relying on the matching task. Attentional selection is, along with other processes, involved in matching, but the TOJ-TVA approach allows tapping into attentional selection specifically.  Second, self-prioritization effects have been replicated across a wide range of stimuli (e.g. faces: Wozniak et al., 2018; names or owned objects: Scheller & Sui, 2022a, or even fully unfamiliar stimuli: Wozniak & Knoblich, 2019) and paradigms (e.g. matching task: Sui et al., 2012; cross-modal cue integration: e.g. Scheller & Sui, 2022b; Scheller et al., 2023; continuous flash suppression: Macrae et al., 2017; temporal order judgment: Constable et al., 2019; Truong et al., 2017). Using neutral geometric shapes, rather than faces and names, addresses a key challenge in self research: mitigating the influence of stimulus familiarity on results. In addition, these newly learned, simple stimuli can be combined with other paradigms, such as the TOJ paradigm in the current study, to investigate the broader impact of self-processing on perception and cognition.

      To the best of our knowledge, this is the first study showing evidence about the mechanisms that are involved in early attentional selection of socially salient stimuli. Future replications and extensions would certainly be useful, as with any experimental paradigm.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

      Equating the levels of social and perceptual salience is indeed challenging, but it was not an aim of the present study. Instead, the present study directly compares the mechanisms and effects of social and perceptual salience, specifically in Experiment 2. By manipulating perceptual salience (relative colour) and social salience (relative shape association) independently and jointly, and quantifying the effects on processing rates, our study allows us to directly delineate the contributions of each of these types of salience. The results suggest additive effects (see also Figure 7). The possibility remains that these effects are additive because of the use of different perceptual features, so it would be helpful for future studies to explore whether similar perceptual features lead to (supra-/sub-) additive effects. In either case, the study design allows a direct comparison of the effects and mechanisms of social and perceptual salience.

      Regarding the social and perceptual decision dimensions, they were not expected to be equated. Indeed, the social decision dimension requires additional retrieval of the associated identity, making it likely more challenging. This additional retrieval is also likely responsible for the slower responses towards the social association compared to the shape itself. However, the motivation to compare the effects of these two decisional dimensions lies in the assumption that the self needs to be task-relevant. Some evidence suggests that the self needs to be task-relevant to induce self-prioritization effects (e.g., Woźniak & Knoblich, 2022). However, these studies typically used matching tasks and were powered to detect large effects only (e.g. f = 0.4, n = 18). As the lack of a contribution from decisional processing levels (which interact with task-relevance) is likely to reduce the SPE, smaller self-prioritization effects that result from earlier processing levels may not be detected with sufficient statistical power. Targeting specific processing levels, especially those with relatively early contributions or small effect sizes, requires larger samples (here: n = 70) to provide sufficient power. Indeed, by contrasting the relative attentional selection effects in the present study we find that the self does not need to be task-relevant to produce self-prioritization effects. This is in line with recent findings of prior entry of self-faces (Jubile & Kumar, 2021).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al., present an immersion objective adapter design called RIM-Deep, which can be utilized for enhancing axial resolution and reducing spherical aberrations during inverted confocal microscopy of thick cleared tissue.

      Strengths:

      RI mismatches present a significant challenge to deep tissue imaging, and developing a robust immersion method is valuable in preventing losses in resolution. Liu et al., present data showing that RIM-Deep is suitable for tissue cleared with two different clearing techniques, demonstrating the adaptability and versatility of the approach.

      Greetings, we greatly appreciate your feedback. In fact, we used three distinct clearing techniques (iDISCO, CUBIC, and MACS) to substantiate the adaptability and versatility of the RIM-Deep adapter.

      Weaknesses:

      Liu et al., claim to have developed a useful technique for deep tissue imaging, but in its current form, the paper does not provide sufficient evidence that their technique performs better than existing ones.

      We are in complete agreement with your recommendation. In additional experiments, we will thoroughly compare the RIM-Deep adapter with the official adapter in fluorescence bead experiments, as well as their performance with the CUBIC and MACS tissue clearing techniques.

      Reviewer #2 (Public review):

      The authors used different clearing methods to demonstrate the suitability of RIM-Deep for various sample preparation protocols with clearing solutions of different refractive indices. They clearly demonstrate that the RIM-Deep chamber is compatible with all three methods. Brain samples are characterized by complex networks of cells and are often hard to visualize. Despite the dense, complex structure of brain tissue, the RIM-Deep method generated high-quality images of all three samples. As the authors stated, increasing imaging depth often goes hand in hand with purchasing expensive new equipment, exchanging several microscopy parts, or purchasing a new microscopy setup. Innovations like the RIM-Deep chamber might pave the way for cost-effective imaging and expand the applicability of inverted confocal microscopy.

      Weaknesses:

      (1) However, since this study introduces a novel imaging technique aiming to revolutionize imaging of large samples, additional control experiments would strengthen the data. From the three clearing protocols used (CUBIC, MACS, and iDISCO), only the brain section from Macaca fascicularis cleared with iDISCO was imaged with the standard chamber and the RIM-Deep method. This comparison indeed shows a more than 2-fold increase in imaging depth, a significant enhancement in microscopy. However, it would have been important to evaluate and show the imaging depth differences in the other two samples, as they were cleared with different protocols and treated with clearing solutions of different refractive indices compared to iDISCO.

      Thank you for your suggestion. We will investigate the imaging performance of brain tissue using the other two clearing protocols with both the official adapter and the RIM-Deep method.

      (2) The description of the figures and figure panels should be improved for a better understanding of the experiments performed and the resulting images/data.

      Thank you for your suggestion. We will revise the figure legends in detail.

      (3) While the authors used a Nikon AX inverted laser scanning confocal microscope, the study would benefit from evaluating the performance of the RIM-Deep method using other inverted confocal microscopes or even wide-field microscopes.

      Thank you for your suggestion. We also recognize that evaluating the performance of the RIM-Deep method on other inverted confocal microscopes will help further validate its applicability and robustness. We will supplement these experiments to expand the scope and reliability of RIM-Deep.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In their manuscript, the authors propose a learning scheme to enable spiking neurons to learn the appearance probability of inputs to the network. To this end, the neurons rely on error-based plasticity rules for feedforward and recurrent connections. The authors show that this enables the networks to spontaneously sample assembly activations according to the occurrence probability of the input patterns they respond to. They also show that the learning scheme could explain biases in decision-making, as observed in monkey experiments. While the task of neural sampling has been solved before in other models, the novelty here is the proposal that the main drivers of sampling are within-assembly connections, and not between-assembly (Markov chains) connections as in previous models. This could provide a new understanding of how spontaneous activity in the cortex is shaped by synaptic plasticity. 

      The manuscript is well written and the results are presented in a clear and understandable way. The main results are convincing, concerning the spontaneous firing rate dependence of assemblies on input probability, as well as the replication of biases in the decision-making experiment. Nevertheless, the manuscript and model leave open several important questions. The main problem is the unclarity, both in theory and intuitively, of how the sampling exactly works. This also makes it difficult to assess the claims of novelty the authors make, as it is not clear how their work relates to previous models of neural sampling. 

      We agree with the reviewer that our previous manuscript was not clear regarding the mechanism of the model. We have performed additional simulations and included a derivation of the learning rule to address this, which we explain below.

      Regarding the unclarity of the sampling mechanism, the authors state that within-assembly excitatory connections are responsible for activating the neurons according to stimulus probability. However, the intuition for this process is not made clear anywhere in the manuscript. How do the recurrent connections lead to the observed effect of sampling? How exactly do assemblies form from feedforward plasticity? This intuitive unclarity is accompanied by a lack of formal justification for the plasticity rules. The authors refer to a previous publication from the same lab, but it is difficult to connect these previous results and derivations to the current manuscript. The manuscript should include a clear derivation of the learning rules, as well as an (ideally formal) intuition of how this leads to the sampling dynamics in the simulation.

      We have included a derivation of our plasticity rules in lines 871-919 in the revised manuscript. Consistent with our claim that predictive plasticity updates the feedforward and the recurrent synapses to predict output firing rates, we have shown that the corresponding cost function measures the discrepancy among the recurrent prediction, feedforward prediction, and the output firing rate. The resultant feedforward plasticity is the same as our previous rule (Asabuki and Fukai, 2020), which segments the salient patterns embedded in the input sequence. The recurrent plasticity rule suggests that the recurrent prediction learns the statistical model of the evoked activity, enabling the network to replay the learned internal model.

      Similarly, for the inhibitory plasticity, we defined a cost function that evaluates the difference between the firing rate and inhibitory potential within each neuron. This rule is crucial for maintaining balanced network dynamics. See our response below for more details on the role of inhibitory plasticity.

      Some of the model details should furthermore be cleared up. First, recurrent connections transmit signals instantaneously, which is implausible. Is this required, would the network dynamics change significantly if, e.g., excitation arrives slightly delayed? Second, why is the homeostasis on h required for replay? The authors show that without it the probabilities of sampling are not matched, but it is not clear why, nor how homeostasis prevents this. Third, G and M have the same plasticity rule except for G being confined to positive values, but there is no formal justification given for this quite unusual rule. The authors should clearly justify (ideally formally) the introduction of these inhibitory weights G, which is also where the manuscript deviates from their previous 2020 work. My feeling is that inhibitory weights have to be constrained in the current model because they have a different goal (decorrelation, not prediction) and thus should operate with a completely different plasticity mechanism. The current manuscript doesn't address this, as there is no overall formal justification for the learning algorithm. 

      First, while the reviewer's suggestion to test with delayed excitation is intriguing and crucial for a more biologically detailed spiking neuron model, we have chosen to maintain the current model configuration. Our use of Poisson spiking neurons, which generate spikes based on instantaneous firing rates, does not heavily depend on precise spike timing information. Therefore, to preserve the simplicity of our results, we kept the model unchanged.
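
      As a generic illustration of spike generation from an instantaneous firing rate, the following minimal sketch draws a spike in each time bin with probability rate × dt. This is a standard discrete-time approximation of an inhomogeneous Poisson process and is not taken from the manuscript's code; the function name, bin width, and rate profile are assumptions chosen for the example.

```python
import random


def poisson_spikes(rates_hz, dt=0.001, seed=0):
    """Return a 0/1 spike train drawn from a time-varying firing rate.

    In each time bin of width dt (seconds), a spike is emitted with
    probability rate * dt (capped at 1), which approximates an
    inhomogeneous Poisson process when rate * dt is much smaller than 1.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < min(r * dt, 1.0) else 0 for r in rates_hz]


# Hypothetical rate profile: 5 Hz for 1 s, then 50 Hz for 1 s.
rates = [5.0] * 1000 + [50.0] * 1000
spikes = poisson_spikes(rates)
print("spikes in low-rate second:", sum(spikes[:1000]))
print("spikes in high-rate second:", sum(spikes[1000:]))
```

      In this sketch each bin is sampled independently given the current rate, so the spike train carries no timing structure beyond the rate itself.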

      Second, we agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b in the revised manuscript, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we have revised our claim in the manuscript to clarify that the memory trace is primarily critical for firing rate homeostasis, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      Third, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in decorrelation and prediction, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll. 560-593 in the revised manuscript.

      Finally, the authors should make the relation to previous models of sampling and error-based plasticity more clear. Since there is no formal derivation of the sampling dynamics, it is difficult to assess how they differ exactly from previous (Markov-based) approaches, which should be made more precise. Especially, it would be important to have concrete (ideally experimentally testable) predictions on how these two ideas differ. As a side note, especially in the introduction (line 90), this unclarity about the sampling made it difficult to understand the contrast to Markovian transition models. 

      As the reviewer pointed out, previous computational models have demonstrated that recurrent networks with Hebbian-like plasticity can learn appropriate Markovian statistics (Kappel et al., 2014; Asabuki and Clopath, 2024). However, our model differs conceptually from these previous models. While Kappel et al. showed that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs), a key difference from our model is that their neural representations acquire sequences using Markovian sampling dynamics, whereas our model does not depend on such ordered sampling. Specifically, in their model, sequential sampling arises from learned structures in the off-diagonal elements of the recurrent connections (i.e., between-assembly connections). In contrast, our network learns to stochastically generate recurrent cell assemblies by relying solely on within-assembly connections. A similar argument can be made for the Asabuki and Clopath paper as well. Further, while our model introduces plasticity rules for all types of connections, the Asabuki and Clopath paper introduced plasticity only for recurrent synapses projecting onto the excitatory neurons, and, unlike in our model, the cell assembly memberships were preconfigured. We have added clarifying sentences in ll. 757-772 of the revised manuscript to elaborate on this point.

      There are also several related models that have not been mentioned and should be discussed. In 663 ff. the authors discuss the contributions of their model which they claim are novel, but in Kappel et al (STDP Installs in Winner-Take-All Circuits an Online Approximation to Hidden Markov Model Learning) similar elements seem to exist as well, and the difference should be clarified. There is also a range of other models with lateral inhibition that make use of error-based plasticity (most recently reviewed in Mikulasch et al, Where is the error? Hierarchical predictive coding through dendritic error computation), and it should be discussed how the proposed model differs from these. 

      We have clarified how our model differs from previously proposed recurrent network models that perform Markovian sampling. Please see our reply above.

      We have also included an additional sentence in ll. 704-709 of the revised manuscript to discuss how our model differs from similar predictive learning models: “It should be noted that while several network models that perform error-based computations like ours exploit only inhibitory recurrent plasticity (Mikulasch et al., 2021; Mackwood et al., 2021; Hertäg and Clopath., 2022; Mikulasch et al., 2023), our model learns the structured spontaneous activity to reproduce the evoked statistics by modifying both excitatory and inhibitory recurrent connections.”

      Reviewer #2 (Public Review):

      Summary: 

      The paper considers a recurrent network with neurons driven by external input. During the external stimulation predictive synaptic plasticity adapts the forward and recurrent weights. It is shown that after the presentation of constant stimuli, the network spontaneously samples the states imposed by these stimuli. The probability of sampling stimulus x^(i) is proportional to the relative frequency of presenting stimulus x^(i) among all stimuli i=1,..., 5. 

      Methods: 

      Neuronal dynamics: 

      For the main simulation (Figure 3), the network had 500 neurons, and 5 non-overlapping stimuli, each activating 100 different neurons, were presented. The voltage u of the neurons is driven by the forward weights W via input rates x, the inhibitory recurrent weights G are restricted to have non-negative weights (Dale's law), and the other recurrent weights M had no sign restrictions. Neurons were spiking with an instantaneous Poisson firing rate, and each spike triggered an exponentially decaying postsynaptic voltage deflection. Neglecting time constants of the postsynaptic responses, the expected postsynaptic voltage reads (in vectorial form) as 

      u = W x + (M - G) f (Eq. 5) 

      where f ≈ phi(u) represents the instantaneous Poisson rate, and phi a sigmoidal nonlinearity. The rate f is only an approximation (symbolized by ≈) of phi(u) since an additional regularization variable h enters (taken up in Point 4 below). The initialisation of W and M is Gaussian with mean 0 and variance 1/sqrt(N), N the number of neurons in the network. The initial entries of G are all set to 1/sqrt(N). 

      Predictive synaptic plasticity: 

      The 3 types of synapses were each adapted so that they individually predict the postsynaptic firing rate f, in matrix form 

      ΔW ≈ (f - phi( W x ) ) x^T 

      ΔM ≈ (f - phi( M f ) ) f^T 

      ΔG ≈ (f - phi( M f ) ) f^T but confined to non-negative values of G (Dale's law). 

      The ^T tells us to take the transpose, and the ≈ again refers to the fact that the ϕ entering in the learning rule is not exactly the ϕ determining the rate, only up to the regularization (see Point 4). 
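
      The error-correcting form of these rules can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic form of phi, the learning rate `eta`, the fixed target rates and the function name `predictive_update` are all assumptions made for illustration, the regularization variable h is ignored, and each weight matrix is assumed to form its prediction from its own weights. The toy case also shows how clipping to non-negative values limits what a matrix can predict.

```python
import math

def phi(u):
    """Sigmoidal rate function (logistic; the exact form is an assumption)."""
    return 1.0 / (1.0 + math.exp(-u))

def predictive_update(weights, pre, target_rate, eta=0.1, nonneg=False):
    """Apply one step of  delta = eta * (f - phi(weights . pre)) * pre^T."""
    updated = []
    for row, f_i in zip(weights, target_rate):
        prediction = phi(sum(w * p for w, p in zip(row, pre)))
        error = f_i - prediction
        new_row = [w + eta * error * p for w, p in zip(row, pre)]
        if nonneg:
            # Sign constraint: clip negative entries to zero after the step.
            new_row = [max(0.0, w) for w in new_row]
        updated.append(new_row)
    return updated

# Toy case: one active input, two postsynaptic neurons with
# hypothetical target rates 0.9 and 0.1 (held fixed for simplicity).
pre = [1.0, 0.0]
f = [0.9, 0.1]
free = [[0.0, 0.0], [0.0, 0.0]]      # unconstrained weights
clipped = [[0.0, 0.0], [0.0, 0.0]]   # weights confined to non-negative values
for _ in range(5000):
    free = predictive_update(free, pre, f)
    clipped = predictive_update(clipped, pre, f, nonneg=True)

# Predicted rate of the second neuron, without and with the sign constraint.
print(phi(free[1][0]), phi(clipped[1][0]))
```

      In this toy setting the unconstrained weights can match both target rates, whereas the clipped weights cannot push the prediction for the low-rate neuron below phi(0).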

      Main formal result: 

      As the authors explain, the forward weight W and the unconstrained weight M develop such that, in expectations, 

      f ≈ phi( W x ) ≈ phi( M f ) ≈ phi( G f ) , 

      consistent with the above plasticity rules. Some elements of M remain negative. In this final state, the network displays the behaviour as explained in the summary. 

      Major issues: 

      Point 1: Conceptual inconsistency 

      The main results seem to arise from unilaterally applying Dale's law only to the inhibitory recurrent synapses G, but not to the excitatory recurrent synapses M. 

      In fact, if the same non-negativity restriction were also imposed on M (as it is on G), then their learning rules would become identical, likely leading to M=G. But in this case, the network becomes purely forward, u = W x, and no spontaneous recall would arise. Of course, this should be checked in simulations. 

      Because Dale's law was only applied to G, however, M and G cannot become equal, and the remaining differences seem to cause the effect. 

      Predictive learning rules are certainly powerful, and it is reasonable to consider the same type of error-correcting predictive learning rule, for instance for different dendritic branches that both should predict the somatic activity. Or one may postulate the same type of error-correcting predictive plasticity for inhibitory and excitatory synapses, but then the presynaptic neurons should not be identical, as it is assumed here. Both these types of error-correcting and error-forming learning rules for same-branches and inhibitory/excitatory inputs have been considered already (but with inhibitory input being itself restricted to local input, for instance). 

      The model presented above lacked biological plausibility in several key aspects. Specifically, we assumed that the recurrent connection M could change sign through plasticity and be either excitatory or inhibitory, while the inhibitory connection G was restricted to being inhibitory only. This initial setting does not reflect the biological constraint that synapses typically maintain a consistent excitatory or inhibitory type. Furthermore, due to this unconstrained recurrent connectivity M, the original model had two types of inhibitory connections (i.e., the negative part of M and the inhibitory connection G) without providing a clear computational role for each type of inhibition.

      To address these limitations and to understand the role of the two types of inhibition, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in prediction and decorrelation, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll. 561-593 in the revised manuscript.

      Point 2: Main result as an artefact of an inconsistently applied Dale's law? 

      The main result shows that the probability of a spontaneous recall for the 5 non-overlapping stimuli is proportional to the relative time the stimulus was presented. This is roughly explained as follows: each stimulus pushes the activity from 0 up towards f ≈ phi( W x ) by the learning rule (roughly). Because the mean weights W are initialized to 0, a stimulus that is presented longer will have more time to push W up so that positive firing rates are reached (assuming x is non-negative). The recurrent weights M learn to reproduce these firing rates too, while the plasticity in G tries to prevent that (by its negative sign, but with the restriction to non-negative values). Stimuli that are presented more often, on average, will have more time to reach the positive target and hence will form a stronger and wider attractor. In spontaneous recall, the size of the attractor reflects the time of the stimulus presentation. This mechanism so far is fine, but the only problem is that it is based on restricting G, but not M, to non-negative values. 

      As mentioned above, we have included an additional simulation where all weights are non-negative. We have demonstrated the new results in Figure 6 before presenting the two-population model in the revised manuscript (Figure 7), so that readers can follow the importance of two pathways of inhibitory connections.

      Point 3: Comparison of rates between stimulation and recall. 

      The firing rates with external stimulations will be considerably larger than during replay (unless the rates are saturated). 

      This is a prediction that should be tested in simulations. In fact, since the voltage roughly reads as u = W x + (M - G) f, and the learning rules are such that eventually M ≈ G, the recurrences roughly cancel and the voltage is mainly driven by the external input x. In the state of spontaneous activity without external drive, one has u = (M - G) f, and this should generate considerably smaller instantaneous rates f ≈ phi(u) than in the case of the feedforward drive (unless f is in both cases at the upper or lower ceiling of phi). This is a prediction that can also be tested. 

      Because the figures mostly show activity ratios or normalized activities, it was not possible for me to check this hypothesis with the current figures. So please show non-normalized activities for comparing stimulation and recall for the same patterns. 
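
      The argument of Point 3 can be illustrated with a scalar toy calculation. This is only a sketch under stated assumptions, not a simulation of the authors' network: a single rate variable stands in for an assembly, phi is taken to be logistic with a hypothetical threshold `theta`, and the values of the feedforward drive and of the net recurrence (M - G) are invented for illustration.

```python
import math

def phi(u, theta=1.0):
    """Logistic rate function with a hypothetical threshold theta."""
    return 1.0 / (1.0 + math.exp(-(u - theta)))

def steady_rate(drive, net_recurrence, iters=200):
    """Iterate f <- phi(drive + net_recurrence * f) until it settles.

    drive          : feedforward term W x (zero during spontaneous activity)
    net_recurrence : scalar stand-in for (M - G), assumed small because M ≈ G
    """
    f = 0.5
    for _ in range(iters):
        f = phi(drive + net_recurrence * f)
    return f

evoked = steady_rate(drive=2.0, net_recurrence=0.1)       # u = W x + (M - G) f
spontaneous = steady_rate(drive=0.0, net_recurrence=0.1)  # u = (M - G) f
print(evoked, spontaneous)
```

      With a small net recurrence the steady rate is set almost entirely by the feedforward drive, so the evoked rate exceeds the spontaneous one unless phi saturates in both cases.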

      We agree with the reviewer that the activity levels of spontaneous and evoked activity should be compared. We have shown the distributions of these activity levels in our new Figure 2d. As expected, the evoked activity was stronger than the spontaneous activity.

      Point 4: Unclear definition of the variable h. 

      The formal definition of h = h_i is given by (suppressing here the neuron index i and the h-index of tau) 

      tau dh/dt = -h   if h > u,
      h = u            otherwise.   (Eq. 10)

      But if it is only Equation 10 (nothing else is said), h will always become equal to u, or will vanish, i.e. either h=u or h=0 after some initial transient. In fact, as soon as h>u, h is decaying to 0 according to the first line. If u is >0, then it stops at u=h according to the second line. No reason to change h=u further. If u<=0 while h>u, then h is converging to 0 according to the first line and will stay there. I guess the authors had issues with the recurrent spiking simulations and tried to fix this with some regularization. However as presented, it does not become clear how their regulation works. 
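
      The behaviour described here can be checked with a short Euler integration of Equation 10 taken literally, for a constant voltage u. This is a sketch under stated assumptions: the step size `dt`, the number of steps and the function name `simulate_h` are hypothetical, u is held fixed rather than produced by network dynamics, and tau is set to the 10 s reported for tau_h.

```python
def simulate_h(u, h0, tau=10.0, dt=0.01, steps=20000):
    """Euler integration of Eq. 10 as literally stated, for a constant voltage u."""
    h = h0
    for _ in range(steps):
        if h > u:
            h += dt * (-h / tau)   # tau dh/dt = -h while h exceeds u
        if h <= u:
            h = u                  # otherwise h is set equal to u
    return h

print(simulate_h(u=0.5, h0=2.0))    # positive u: h decays until it meets u
print(simulate_h(u=-1.0, h0=2.0))   # negative u: h keeps decaying towards 0
```

      For a positive constant u the trace stops at h = u, and for a non-positive u it decays towards 0, which is the dichotomy the point above describes.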

      We apologize that our definition of h was unclear. As the reviewer pointed out, since the memory trace is always positive and larger than (or equal to) the membrane potential, it is possible that the membrane potential stays negative and the memory trace decays to 0. However, because the network is always balanced between excitatory and inhibitory inputs, the membrane potential does not diverge negatively. In fact, we trained the network without any manipulations other than the memory trace described in the manuscript, and it learned the assembly structure stably. 

      BTW: In Eq. 11 the authors set the gain beta to beta = beta0/h which could become infinite and, putatively more problematic, negative, depending on the value of h. Maybe some remark would convince a reader that no issues emerge from this. 

      We have mentioned in ll. 864-866 in the revised manuscript that no issues emerge from the slope parameter.

      Added from discussions with the editor and the other reviewers: 

      Thanks for alerting me to this Supplementary Figure 8. Yes, it looks like the authors did apply there Dale's law for both the excitatory and inhibitory synapses. Yet, they also introduced two types of inhibitory pathways converging both to the excitatory and inhibitory neurons. For me, this is a confirmation that applying Dale's law to both excitatory and inhibitory synapses, with identical learning rules as explained in the main part of the paper, does not work. 

      Adding two such pathways is a strong change from the original model, on which all the Figures in the main text are based. Supplementary Figure 8 should come with an analysis of why a single inhibitory pathway does not work. I guess I gave the reason in my Points 1-3. Some form of symmetry breaking between the recurrent excitation and recurrent inhibition is required so that, eventually, the recurrent excitatory connection will dominate. 

      Making the inhibitory plasticity less expressive by applying Dale's law to only those inhibitory synapses seems to be the answer chosen in the Figures of the main text (but then the criticism of unilaterally applying Dale's law). 

      Applying Dale's law to both types of synapses, but dividing the labor of inhibition into two strictly separate and asymmetric pathways, and hence asymmetric development of excitatory and inhibitory weights, seems to be another option. However, introducing such two separate inhibitory pathways, just to rescue the fact that Dale's law is applied to both types of synapses, is a bold assumption. Is there some biological evidence of such two pathways in the inhibitory, but not the excitatory connections? And what is the computational reasoning to have such a separation, apart from some form of symmetry breaking between excitation and inhibition? I guess, simpler solutions could be found, for instance by breaking the symmetry between the plasticity rules for the excitatory and inhibitory neurons. All these questions, in my view, need to be addressed to give some insights into why the simulations do work. 

      The reviewer’s intuition is correct. To effectively learn cell assembly structures and replay their activities, our model indeed requires two types of inhibitory connections. Please refer to our response above for further details. 

      Overall, Supplementary Figure 8 seems to me too important to be deferred to the Supplement. The reasoning behind the two inhibitory pathways should appear more prominently in the main text. Without this, important questions remain. For instance, when thinking in a rate-based framework, the two inhibitory pathways twice try to explain the somatic firing rate away. Doesn't this lead to a too strong inhibition? Can some steady state with a positive firing rate caused by the recurrence, in the absence of an external drive, be proven? The argument must include the separation into Path 1 and Path 2. So far, this reasoning has not been entered. 

      In fact, it might be that, in a spiking implementation, some sparse spikes will survive. I wonder whether at least some of these spikes survive because of the other rescuing construction with the dynamic variable h (Equation 10, which is not transparent, and that is not taken up in the reasoning either, see my Point 4)

      Perhaps it is helpful for the authors to add this text in the reply to them. 

      We have moved the former Supplemental Figure 8 to the main Figure 7. Please see our response above about the role of dual inhibitory connection types.

      Reviewer #3 (Public Review): 

      Summary: 

      The work shows how learned assembly structure and its influence on replay during spontaneous activity can reflect the statistics of stimulus input. In particular, stimuli that are more frequent during training elicit stronger wiring and more frequent activation during replay. Past works (Litwin-Kumar and Doiron, 2014; Zenke et al., 2015) have not addressed this specific question, as classic homeostatic mechanisms forced activity to be similar across all assemblies. Here, the authors use a dynamic gain and threshold mechanism to circumnavigate this issue and link this mechanism to cellular monitoring of membrane potential history. 

      Strengths: 

      (1) This is an interesting advance, and the authors link this to experimental work in sensory learning in environments with non-uniform stimulus probabilities. 

      (2) The authors consider their mechanism in a variety of models of increasing complexity (simple stimuli, complex stimuli; ignoring Dale's law, incorporating Dale's law). 

      (3) Links a cellular mechanism of internal gain control (their variable h) to assembly formation and the non-uniformity of spontaneous replay activity. Offers a promise of relating cellular and synaptic plasticity mechanisms under a common goal of assembly formation. 

      Weaknesses: 

      (1) However, while the manuscript does show that assembly wiring does follow stimulus likelihood, it is not clear how the assembly-specific statistics of h reflect these likelihoods. I find this to be a key issue. 

      We agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we revised our claim in the manuscript to clarify that the memory trace is primarily critical for learning to avoid trivial solutions, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      (2) The authors' model does take advantage of the sigmoidal transfer function, and after learning an assembly is either fully active or nearly fully silent (Figure 2a). This somewhat artificial saturation may be the reason that classic homeostasis is not required since runaway activity is not as damaging to network activity. 

      The reviewer's intuition is correct. The saturating nonlinearity is important for the network to form stable assembly structures. We have added an additional sentence in ll. 866-868 to mention this.

      (3) Classic mechanisms of homeostatic regulation (synaptic scaling, inhibitory plasticity) try to ensure that firing rates match a target rate (on average). If the target rate is the same for all neurons then having elevated firing rates for one assembly compared to others during spontaneous activity would be difficult. If these homeostatic mechanisms were incorporated, how would they permit the elevated firing rates for assemblies that represent more likely stimuli? 

      LIF neurons) may solve this problem by utilizing spike-timing statistics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Minor issues: 

      Figure 1: It would be helpful to display the equation for output rate here as well. 

      We have included the equation in the revised Figure 1a.

      Figure 3c: Typo "indivisual neurons". 

      We have corrected the typo. We thank the reviewer for their careful review.

      Line 325: Do you mean Figure 3f,g? 

      We repeated the task with different numbers of stimuli in Supplementary Figure 1c,d.

      Line 398: Winner-take-all can be misunderstood, as it typically stands for competition in inference, not in learning. 

      We have rephrased it as “unstable dynamics” in l. 400.

      Line 429: Are intra-assembly and within-assembly the same? If so please use consistent terminology. 

      We have made the terminology consistent.

      Line 792 ff.: Please mention that (t) was left away. 

      We have included a sentence to mention it in ll. 847-848 in the revised manuscript.

      Line 817: Should u_i be v_i? 

      We have modified the term.

      Methods: What is the value of tau_h? 

      We have used τ_h = 10 s, which is mentioned in l. 853.

    1. Author response:

      We are appreciative of the editors’ and reviewers’ positive comments and constructive suggestions, which will help us to improve our manuscript. We will make changes as required by the reviewers. Our primary focus will be on revising and clarifying certain aspects:

      First, recent research has revealed a strong correlation between brain synchronization and group decision-making, a key neural marker. We aim to bolster our hypothesis by reviewing additional literature, ensuring accuracy in terminology and appropriateness in phrasing.

      Second, it is crucial to note that we will include additional methodological details, such as the details of the experiment, the significance of individual difference variables, and the details of the data analyses.

      Third, despite introducing a novel perspective in our study, we acknowledge the utilization of the conventional fNIRS hyperscanning analyses, which are widely accepted within the research community. Our methodology entails the identification of significant channels via one-sample t-tests, subsequently complemented by either ANOVAs or independent sample t-tests, without the need for double dipping.

      We will address all the issues raised by the reviewers. We believe that the manuscript will significantly benefit from the insightful suggestions and invaluable contributions made by the editors and reviewers.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      In the second round of reviews, Reviewer 2 made three specific comments. The first comment criticises us for not including a set of equations they had requested in their first review. We did, in fact, include the requested equations in our revised submission: they were in the Supplementary Information, were cited in the main text of our revised manuscript, and the changes were made clear in our response to the reviewer. In the second comment, the reviewer suggested adding one word to a sentence in the abstract. We have made this change (line 23). In the third comment, the reviewer highlights a sentence where we agree we could have been clearer. The sentence can be rectified by adding one word, which we have done (line 232). We believe the changes required to our manuscript are very minor, and we have implemented these two suggested changes, which are highlighted in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Overall, this work is quite comprehensive and is logically and rigorously designed. The phenotypic and functional data on 2C are strong.

      Thank you for your positive feedback on our findings!

      (1) Comment from Reviewer 1 suggesting the mechanistic insights of 2C are primarily derived from transcriptomic and genomic datasets without experimental verification. 

      Thank you for emphasizing the importance of experimental validation to support our transcriptomic and genomic findings. We acknowledge the gap in direct experimental evidence for the mechanistic insights of section 2C and recognize the value of such validation in strengthening our conclusions. However, our current dataset lacks the comprehensive preliminary results necessary for inclusion in the supplemental material. We believe that the mechanistic insights presented offer a substantial foundation for future research, where we aim to explore these aspects in depth with targeted experimental approaches.

      Reviewer 2

      Together their data may suggest a regenerative effect of 2C both in vitro and in vivo settings. If confirmed, this study might unlock therapeutic strategy for cardiac regeneration.

      Thank you for your positive comment on the significance of our findings and the valuable therapeutic potential of 2C in cardiac regeneration!

      (1) Comment from Reviewer 2 pointing out that the main hypothesis (line 50), that Isl1 cells have regenerative properties, is not extremely novel. 

      We agree with the reviewer that Isl1-positive cells possess regenerative properties. Following the reviewer’s suggestion, we have revised the original wording (line 46 in the revised manuscript).

      (2) Comment from Reviewer 2 asking for a rationale for this 20x reduction of A-485 concentration. It would be useful to get a titration of this compound for the effects tested. 

      As suggested by the reviewer, we have added the titration results of A-485 in Figure 1—figure supplement 1F-G.

      (3) Comment from Reviewer 2 noting that it is difficult to understand what proportion of CMs dedifferentiate to become RCCs. The lineage tracing data suggests only 0.6%-1.5% of cells undergo this transition. It is difficult to understand how such a small fraction can have wide effects in their different experimental settings. This is specifically true when the author quantified nuclear and cytosolic area on brightfield pictures: would the same effect on nuclear/cytosolic area be observed in Isl1 KO cells? 

      We appreciate the reviewer's insightful observation on the proportion of CMs undergoing dedifferentiation into RCCs and the potential impact of this subset on our experimental outcomes. The lineage tracing data indicating that only 0.6%-1.5% of CMs transition to RCCs indeed reflects a modest proportion. This observation raises valid questions regarding the broader implications of such a limited fraction in the context of cardiac regeneration and the experimental effects reported. It is important to note that while the proportion of CMs dedifferentiating into RCCs is small, the biological significance and potential impact of these RCCs could be disproportionately large. Emerging evidence suggests that even a minimal number of stem or progenitor cells can exert significant effects on tissue repair and regeneration, possibly through paracrine mechanisms or by acting as key signaling centers within the tissue microenvironment (Fernandes et al., 2015). Regarding the specific question about 2C’s effects on nuclear/cytosolic area in Isl1 knockout (KO) cells, we appreciate the suggestion and agree that such comparative studies would provide valuable insight into the impact of 2C-induced RCCs in future research. In addition, ISL1 KO cells are described in detail in the article published in eLife in 2018 by Quaranta et al.

      (4) Comment from Reviewer 2 asking for the effect of CHIR + I-BET-762 alone. 

      As suggested by the reviewer, we have added the results of CHIR + I-BET-762 in Figure 1—figure supplement 1H.

      (5) Comment from Reviewer 2 suggesting a transparent explanation of the effects of A-485 on acetylation status.

      We thank the reviewer for highlighting the confusion regarding the effects of A-485 on the acetylation status of H3K27Ac and H3K9Ac. Upon re-examination of our data and statements, we recognize the need for clarity in our explanation and the inconsistency it may have caused (lines 223-231 on page 8).

      Initially, our observations suggested a selective effect of A-485 on H3K27Ac based on early experimental results (Figure 7—figure supplement 1). This conclusion was drawn from preliminary analyses that focused predominantly on this specific histone mark. However, upon further comprehensive examination of our data, including additional replicates and more sensitive detection methods, we observed that A-485 also impacts H3K9Ac levels (Figure 7—figure supplement 1F). This latter finding emerged from expanded datasets that were not initially considered in our preliminary conclusions.

      The "further analyses" mentioned referred to these subsequent experimental investigations, which included chromatin immunoprecipitation (ChIP) assays and extended sample sizes, providing a more robust dataset for evaluating the effects of A-485. We understand the importance of transparency and rigor in scientific communication. To address this, we have revised the manuscript to clearly delineate the progression of our analyses and the evidential basis for our revised understanding of A-485's effects. This includes a detailed description of the methodologies employed in our follow-up experiments (line 537 on page 27), the statistical approaches for data analysis (lines 226-227 in supporting information), and how these led to the updated interpretation regarding A-485's impact on histone acetylation (lines 232-269).

      (6) Comment from Reviewer 2 asking for the difference in the ChIP peaks representation of the y-axis on the ChIP traces.

      Thank you for raising this question. We did not normalise for sequencing depth, and the y-axis represents the number of counts (line 537 on page 27 and lines 226-227 in supporting information).

      (7) Comment from Reviewer 2 suggesting testing this 2C protocol on mESCs to see whether similar changes occur, and asking how these mouse RCCs differ transcriptionally from Isl1+ progenitor cells isolated from neonatal mice (P1-P5).

      Thank you for your insightful questions. Testing the 2C protocol on mouse embryonic stem cells (mESCs) to observe if similar changes occur presents an excellent opportunity to further validate the versatility and applicability of our findings across different stem cell models. We agree that such experiments would not only strengthen the current study but also provide valuable insights into the conservation of mechanisms across species. We are currently in the process of setting up experiments to address this very question and anticipate that the results will significantly contribute to our understanding of cardiomyocyte differentiation processes. Regarding the transcriptional comparison between mouse regenerative cardiac cells (RCCs) induced by our 2C protocol and Isl1+ progenitors isolated from neonatal mice (P1-P5), this comparison is indeed crucial for delineating the specific identity and developmental potential of the RCCs generated. However, a comprehensive side-by-side transcriptomic analysis is required to systematically identify these differences and understand their biological implications. We plan to undertake this analysis as part of our future studies, which will include detailed RNA sequencing and comparative gene expression profiling to elucidate the transcriptional similarities and differences between these cell populations. These future directions will enhance our current findings, provide a deeper mechanistic understanding, and confirm the potential of the 2C protocol in regenerative medicine applications. We appreciate the reviewer's suggestions and acknowledge the importance of these experiments in advancing the field.

      (8) Comment from Reviewer 2 with a suggestion to have a precise clarification of statistics & data acquisition.

      As suggested by the reviewer, we have revised these descriptions to make them clearer (lines 228-233 in supporting information and a precise description of each paragraph involving statistical analyses).

      Reviewer 3

      The findings may have a translation potential. The idea of promoting the regenerative capacity of the heart by reprogramming CMs into RCCs is interesting.

      Thank you for your appreciation of the significance and translational potential of our findings!

      (1) Comment from Reviewer 3 suggesting the mechanism involved in the 2C-mediated generation of RCCs is unclear and the leads found in the RNA-seq and ChIP-seq are not experimentally validated.

      We acknowledge the reviewer's concern regarding the lack of experimental validation for the mechanisms identified through RNA-seq and ChIP-seq analyses in the generation of RCCs from the 2C state. We understand the importance of substantiating these molecular leads with empirical data to strengthen our conclusions. Currently, our findings are based on in-depth bioinformatic analyses, which have provided us with valuable insights and a strong basis for hypothesis generation. Moving forward, we plan to prioritize experimental validation of key pathways and targets identified in our study. This will include designing targeted experiments to elucidate the functional roles of these mechanisms in the 2C-mediated generation of RCCs. We appreciate the opportunity to clarify our approach and future directions, and we are committed to addressing this gap in subsequent work.

      (2) Comment from Reviewer 3 questioning whether the very low number of RCCs generated (0.6%-1.5% of cells) can protect the heart from MI, whether 2C affects the survival or metabolism of existing CMs under hypoxic conditions, and what percentage of cells are regenerated by 2C treatment post-MI.

      We appreciate the reviewer's insightful queries regarding the protective effects of 2C treatment against myocardial infarction (MI) given the low percentage of RCCs generated. It is our hypothesis that the benefits of 2C treatment extend beyond mere cell numbers. We propose that 2C may enhance the survival and metabolic resilience of existing CMs under hypoxic conditions, thereby contributing to cardiac protection post-MI. Our future investigations will aim to quantify the precise percentage of cells regenerated by 2C treatment post-MI and explore its broader impacts on cardiac tissue survival and repair mechanisms.

      (3) Comment from Reviewer 3 suggesting the administration of 2C in mice, as well as whether 2C affects cardiac function under basal conditions and any physiology in mice, and the need to examine cardiac structural and functional parameters after administration of 2C.

      We appreciate the reviewer's interest in the potential effects of 2C administration on cardiac function and overall physiology in mice. While we observed a decrease in body weight at P5 compared to controls, our immunofluorescence staining did not indicate any changes in cardiac structure (Figure 4— figure supplement 1E). This suggests that while 2C administration impacts neonatal rat physiology, it does not adversely affect cardiac structure under basal conditions. Further investigations are planned to assess the functional parameters of the heart post-2C administration to comprehensively understand its effects.

      (4) Comment from Reviewer 3 suggesting the potential effects of 2C on other cell types of the heart, including fibroblasts and endothelial cells, in vitro and in vivo.

      We value the reviewer's suggestion to explore the effects of 2C on various cardiac cell types, including fibroblasts and endothelial cells, both in vitro and in vivo. We acknowledge the importance of understanding the broader impact of 2C treatment across different cell populations within the heart, given its potential protective effects. To address this, we are designing a series of experiments to assess 2C's influence on these cell types, aiming to elucidate any changes in their behavior, proliferation, and function following treatment. This comprehensive approach will allow us to better understand the mechanistic basis of 2C's cardioprotective effects.

      (5) Comment from Reviewer 3 suggesting validation of the effect of 2C in a dose-dependent manner.

      As suggested by the reviewer, we have added data on the dose-dependent effect of 2C (Figure 1— figure supplement 1F-G).

      (6) Comment from Reviewer 3 suggesting an explanation of how A-485 affects H3K27Ac and H3K9Ac.

      We appreciate the reviewer pointing out the discrepancy regarding the effects of A-485 on H3K27Ac and H3K9Ac. Upon re-examination of our data, we realize that our initial interpretation may have overlooked the broader impact of A-485 on histone acetylation patterns. It appears that A-485 does indeed influence both H3K27Ac and H3K9Ac, contrary to our initial statement. This oversight will be corrected in our revised manuscript, where we will provide a more detailed analysis and discussion of A-485's impact on these histone marks, alongside an explanation for the observed effects (lines 223-269 across page 8-9).

      (7) Comment from Reviewer 3 with a correction to use "regeneration" at the screening stage.

      As suggested by the reviewer, we have amended the wording in the text (line 66 on page 3).

      Reviewer 4

      Comment from Reviewer 4 suggesting more information that clarifies and justifies the hypothesis.

      As suggested by the reviewer, we added more information to clarify and justify the hypothesis (lines 39-47 on page 3).

      (1) Comment from Reviewer 4 pointing out the story line is not well developed.

      To address the reviewer’s question, we revised the manuscript to ensure a smooth and coherent logical flow.

      (2) Comment from Reviewer 4 pointing out the purpose in choosing to study ISL1-CMs.

      As raised by the reviewer, we have clarified the rationale for using ISL1 as a marker to define RCCs in the revised manuscript (lines 39-47 on page 3).

      (3) Comment from Reviewer 4 pointing out the missing references in row 57-58.

      Thank you for pointing this out, we fixed it.

      (4) Comment from Reviewer 4 suggesting more explanation and showing the results of the compound screening.

      As suggested by the reviewer, we added additional explanations in lines 65-73 and showed the screening results in Figure 1—figure supplement 1F-H.

      (5) Comment from Reviewer 4 suggesting an in-depth discussion of the findings.

      Thank you for the suggestion, we included additional discussion at the end of the article.

      (6) Comment from Reviewer 4 suggesting a conclusion should be included in the main text.

      Thank you for the suggestion, we made a revision.

      (7) Comment from Reviewer 4 pointing out the cell viability under different concentrations of 2C.

      As mentioned by the reviewer, we have added the cell numbers at different doses of 2C treatment (Figure 2F).

      (8) Comment from Reviewer 4 pointing out the missing information in the methods.

      Thank you for the suggestion, we made additions.

      (9) Comment from Reviewer 4 suggesting more explanations in Figure S3A.

      As mentioned by the reviewer, we revised the original Fig. S3A (now Figure 2—figure supplement 1).

      (10) Comment from Reviewer 4 pointing out the high variability of mCherry cells (%) in Figure 3J.

      Thank you. We made a revision.

      (11) Comment from Reviewer 4 suggesting more explanations on the DNA-binding motif of ISL1 in the cells treated with A-485 or 2C.

      Thank you for the suggestion, we added additional explanations (lines 270-274 on page 9).

      (12) Comment from Reviewer 4 pointing out the unclear labeling in Figure S1B and D.

      Thank you for the suggestion; we made a revision (lines 240-245 in supporting information).

      (13) Comment from Reviewer 4 suggesting a relative quantification of the proteins in Figure 1H.

      Thank you for the suggestion. We have quantified the relative expression levels of the proteins in the original Fig. 1H, as shown in Figure 1F.

      (14) Comment from Reviewer 4 suggesting to provide detailed information in the methodology part about the compounds.

      Thank you for the suggestion, we made a revision.

      (15) Comment from Reviewer 4 pointing out the insufficient explanations on figure legends.

      Thank you for the suggestion, we made a revision.

      (16) Comment from Reviewer 4 suggesting more independent experiments to reduce the high variations in “ns” between NC and 2C at 60h+3d shown in Figure 2E and F.

      Thank you for the suggestion, we made a revision in Figure 2F.

      (17) Comment from Reviewer 4 suggesting a limitations statement should be provided in the text.

      Thank you for the suggestion; we have provided a limitations statement in the revised manuscript (lines 300-311 on page 10).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study investigates the impact of Clonal Hematopoiesis of Indeterminate Potential (CHIP) on Immune Checkpoint Inhibitor (ICI) therapy outcomes in NSCLC patients, analyzing blood samples from 100 patients pre- and post-ICI therapy for CHIP, and conducting single-cell RNA sequencing (scRNA-seq) of PBMCs in 63 samples, with validation in 180 more patients through whole exome sequencing. Findings show no significant CHIP influence on ICI response, but a higher CHIP prevalence in NSCLC compared to controls, and a notable CHIP burden in squamous cell carcinoma. Severely affected CHIP groups showed NF-kB pathway gene enrichment in myeloid clusters.

      Strengths:

      The study is commendable for analyzing a significant cohort of 100 patients for CHIP and utilizing scRNA-seq on 63 samples, showcasing the use of cutting-edge technology. The study tackles the vital clinical question of predicting ICI therapy outcomes in NSCLC.

      Weaknesses:

      The manuscript's comparison of CHIP prevalence between NSCLC patients and healthy controls could be strengthened by providing more detailed information on the control group. Specifically, details such as sex, smoking status, and comorbidities are needed to ensure the differences in CHIP are attributable to lung cancer rather than other factors. Including these details, along with a comparative analysis of demographics and comorbidities between both groups and clarifying how the control group was selected, would enhance the study's credibility and conclusions.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a large cohort of patients with metastatic lung cancer pre- and 1-3 weeks post-immunotherapy. The goal was to investigate whether immunotherapy results in changes in CHIP clones (using targeted sequencing and whole exome sequencing) as well as to investigate whether patients with CHIP changed their response to immunotherapy (single-cell RNA sequencing).

      Strengths:

      This represents a large cohort of patients, and comprehensive assays - including targeted sequencing, whole exome sequencing, and single-cell RNA sequencing.

      Weaknesses:

      Findings are not necessarily unexpected. With regards to clonal dynamics, it would be very unlikely to see any changes within a few weeks' time frame. Longer follow-up to assess clonal dynamics would realistically be necessary.

      We truly appreciate constructive comments by the reviewers and editors. We agree with these comments and did our best to address them to improve the paper. Please see the following pages.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1-1. In Figure 3B, the changes in frequency are challenging to discern. Consider employing connected line plots or another visual representation to enhance clarity and interpretation.

      Thank you for the suggestion. We modified Figure 3B to efficiently visualize the changes in cell proportion. Please note that the proportional changes in cell populations were not statistically significant by treatment, pathology, or clonal hematopoiesis (CH) severity.

      Comment 1-2. On page 13, Figure 3D is mentioned before Figure 3C. Please re-order to follow the correct sequence.

      We corrected the sequence of the figure and revised the text accordingly.

      Comment 1-3. Supplementary Figure 9 reveals an intriguing observation: the hypoxia and TNF signaling pathways appear to be regulated in opposite directions between CHIP-negative subjects and those with a Variant Allele Frequency (VAF) greater than 0.1. It would be insightful if the authors could delve into the potential implications or interpretations of this finding.

      We appreciate the reviewer's insightful comment. In the GSEA results presented in Supplementary Figure 9 and Figure 3C, we specifically focused on TNF signaling in monocytes and cDCs. Our subsequent analysis revealed that the adaptation of inflammatory signals is enriched in the myeloid cells in the CHIP-severe patients (Supplementary Fig. S12). Following the reviewer’s comment, we found that the leading-edge genes were shared between the TNF signaling and hypoxia pathways in most clusters (Supplementary Fig. S15). Suggested core genes, such as FOS, DUSP1, JUN, and PPP1R15A, play critical roles in the inflammatory phenotypes of myeloid lineages. Based on this finding, we added a paragraph in the Discussion section to address the implications of these shared signatures as follows (lines 340-348):

      “Our GSEA results specifically indicated the enrichment of TNF signaling and hypoxia pathways in most clusters of patients with severe CH (Supplementary Fig. S9). The leading-edge genes from GSEA results showed that core genes such as FOS, DUSP1, JUN, and PPP1R15A, which are known to play critical roles in the inflammatory phenotypes of immune cells, were shared between the TNF signaling and hypoxia pathways in all significant clusters (Supplementary Fig. S15). Furthermore, gene regulatory network analysis using SCENIC indicated a higher enrichment of inflammatory signatures in myeloid lineages (Supplementary Fig. S9), highlighting the pronounced inflammatory phenotype of CH clones in these cell lineages.”

      Comment 1-4. The plots in Supplementary Figure 12 would benefit from enlargement to improve legibility and facilitate a better understanding of the data presented.

      We improved resolutions and enlarged Supplementary Figure S12.

      Reviewer #2 (Recommendations For The Authors):

      Comment 2-1. The authors state that CHIP is seen at a higher prevalence in the metastatic lung (44/100) vs controls (5/42), however, no in-depth info other than age is given about the normal cohort (Table S2). It would be important to make sure the cohorts are matched with regards to smoking hx, age range, etc before making the claim that CHIP is more frequent in the metastatic lung cancer group.

      Thank you for the comment. To provide additional information on the control cohort, including current smoking habits and sex, we added columns to Table S2. While we tried to match the age distributions between the control group without a history of solid cancer and the lung cancer cohort, we observed that the lung cancer cohort had slightly older ages (mean ages: 58.9 vs. 64.1 years), a higher prevalence of smoking (current smokers: 11/42 vs. 37/100), and a higher proportion of males (male/female: 18/24 vs. 91/9). Age and smoking are well-known epidemiological contributors to lung cancer and could influence the prevalence of clonal hematopoiesis (CH).

      However, previous studies have reported similar prevalence rates of CH in NSCLC patients, which aligned with our findings (Bolten et al., 2020 Nat Genet; Hong et al., 2022 Cancer Res). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in healthy populations (Levin et al., 2022 Sci Rep). We have acknowledged these factors as major limitations of our study in the Discussion section as follows (lines 379-390):

      “Also, the distinct characteristics of our cohort can be confounders for our results. Compared to control patients, our cohort was biased toward slightly older ages, a higher prevalence of smoking habits, and a higher proportion of males (mean age: 64.1 vs. 58.9; current smokers: 37/100 vs. 11/42; male/female: 91/9 vs. 18/24; Supplementary Figures S1 and S3). However, previous studies have reported similar prevalence rates of clonal hematopoiesis in NSCLC patients, aligned with our findings (9,51). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in both healthy populations and NSCLC patients (10,51,52).”
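
      To make the size of the unadjusted difference in Comment 2-1 concrete, the following minimal sketch compares the quoted counts (CHIP in 44/100 patients vs. 5/42 controls) using an odds ratio and a two-sided Fisher's exact test. The choice of test and the helper name `fisher_exact_two_sided` are assumptions made for illustration, not the authors' stated method, and the calculation does not adjust for the age, sex, or smoking differences discussed above.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k carriers in the first row.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts quoted in Comment 2-1: CHIP in 44/100 patients vs. 5/42 controls.
chip_cases, n_cases = 44, 100
chip_controls, n_controls = 5, 42

a, b = chip_cases, n_cases - chip_cases
c, d = chip_controls, n_controls - chip_controls

odds_ratio = (a * d) / (b * c)
p_value = fisher_exact_two_sided(a, b, c, d)

print(f"prevalence: {a / n_cases:.1%} vs. {c / n_controls:.1%}")
print(f"odds ratio: {odds_ratio:.2f}, two-sided p = {p_value:.2g}")
```

      Because the cohorts differ in age, sex, and smoking, an adjusted model would be needed before attributing the difference to lung cancer itself; this sketch only quantifies the crude comparison.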

      Comment 2-2. Figure 1 - 1A states there were 100 CHIP and CHIP-PD mutations identified, but in 1B, C, and D there are < 100 bars and/or dots shown. How were the mutations in 1A then triaged to be shown in 1B-D?

      It appears that our poor annotation caused this misunderstanding. In Figure 1A, we showed the number of samples in each study group but did not provide detailed information in the legend. We found 67 mutations among the 100 patients and presented the mutational statistics in Figures 1B–D. Accordingly, we have revised the Figure 1 legend to clarify this sentence “The numbers indicate sample counts in each group.” (lines 426-427).

      Comment 2-3. Table S4 - would be helpful to have # of variant reads and # of total reads as columns (and also calculate VAF for an additional column).

      Thank you for the comment. We added columns revealing the total number of reads and the number of variant reads in Table S4. Also, we calculated the VAF and included it as a new column as suggested by the reviewer.
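
      The VAF column requested in Comment 2-3 is the number of variant reads divided by the total number of reads at that position. A minimal sketch of that calculation follows; the read counts are hypothetical placeholders rather than values from Table S4, and the function and field names are assumptions chosen for illustration.

```python
def variant_allele_frequency(variant_reads: int, total_reads: int) -> float:
    """Return VAF = variant reads / total reads at a single position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    return variant_reads / total_reads


# Hypothetical rows; the gene names appear in the text, the counts do not.
rows = [
    {"gene": "DNMT3A", "variant_reads": 30, "total_reads": 1000},
    {"gene": "TET2", "variant_reads": 150, "total_reads": 1200},
    {"gene": "PPM1D", "variant_reads": 8, "total_reads": 400},
]

# Add the VAF as a new column on each row.
for row in rows:
    row["vaf"] = variant_allele_frequency(row["variant_reads"], row["total_reads"])
    print(f'{row["gene"]}: {row["variant_reads"]}/{row["total_reads"]} = {row["vaf"]:.3f}')
```

      Reporting both read counts alongside the VAF lets readers judge how much sequencing depth supports each call.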

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The present work from Velloso and collaborators investigated the transcription profiles of resident and recruited hypothalamic microglia. They found sex-dependent differences between males and females and identified the protective role of chemokine receptor CXCR3 against diet-induced obesity.

      Strengths:

      (1) Novelty;

      (2) Relevance, since this work provides evidence about a subset of recruited microglia that has a protective effect against DIO. This provides a new concept in hypothalamic inflammation and obesity.

      Weaknesses:

      (1) Lack of mechanistic insight into the sex-dependent effects;

      (2) Analysis of indirect calorimetry data requires more depth;

      (3) A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Mendes et al provides novel key insights into the role of chemotaxis and immune cell recruitment into the hypothalamus in the development of diet-induced obesity. Specifically, the authors reveal that although transcriptional changes in hypothalamic resident microglia following exposure to high-fat feeding are minor, there are compelling transcriptomic differences between resident microglia and microglia recruited to the hypothalamus, and these are sexually dimorphic. Using independent loss-of-function studies, the authors also demonstrate an important role of CXCR3 and hypothalamic CXCL10 in the hypothalamic recruitment of CCR2+ positive cells on metabolism following exposure to high-fat diet-feeding in mice. This manuscript puts forth conceptually novel evidence that inhibition of chemotaxis-mediated immune cell recruitment accelerates body weight gain in high-fat diet-feeding, suggesting that a subset of microglia that express CXCR3 may confer protective, anti-obesogenic effects.

      Strengths:

      The work is exciting and relevant given the prevalence of obesity and the consequences of inflammation in the brain on perturbations of energy metabolism and ensuant metabolic diseases. Hypothalamic inflammation is associated with disrupted energy balance, and activated microglia within the hypothalamus resulting from excessive caloric intake and saturated fatty acids are often thought to be mediators of impairment of hypothalamic regulation of metabolism. The present work reports a novel notion in which immune cells recruited into the hypothalamus that express chemokine receptor CXCR3 may have a protective role against diet-induced obesity. In vivo studies reported herein demonstrate that inhibition of CXCR3 exacerbates high-fat diet-induced body weight gain, increases circulating triglycerides and fasting glucose levels, worsens glucose tolerance, and increases the expression of orexigenic neuropeptides, at least in female mice.

      This work provides a highly interesting and needed overview of preclinical and clinical brain inflammation, which is relevant to readers with an interest in metabolism and immunometabolism in the context of obesity.

      Using flow cytometry, cell sorting, and transcriptomics including RNA-sequencing, the manuscript provides novel insights into transcriptional landscapes of resident and recruited microglia in the hypothalamus. Importantly, sex differences are investigated.

      Overall, the manuscript is perceived to be highly interesting, relevant, and timely. The discussion is thoughtful, well-articulated, and a pleasure to read and felt to be of interest to a broad audience.

      Weaknesses:

      There were no major weaknesses perceived. Some comments for potential textual additions to the results/discussion are listed in recommendations to authors.

      Comments from the authors regarding the evaluation of the article: We publicly express our gratitude for the work of both Reviewers. The comments were timely and constructive and guided us toward preparing a new version of the article which contains novel data that strengthened the overall quality of the study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Experiments with ovariectomized female mice with (and without) estrogen replacement would help to address the physiological basis of the observed sex-dependent effects.

      We performed an experiment with female C57BL/6J Unib mice, subdivided into Sham, OVX, and OVX+EST groups, which were exposed to HFD for 4 weeks. We monitored the weekly evolution of body weight and food intake. At the end of the protocol, the animals were fasted for 4 hours. Then, we measured fasting blood glucose and estradiol and extracted tissues (hypothalamus and WAT). In the hypothalamus samples, we evaluated, by RT-qPCR, the expression of chemokines, chemokine receptors, and some pro-inflammatory cytokines and neuropeptides. We also evaluated WAT weight relative to body mass. The new results are presented in Supplementary Figure 1.

      Indirect calorimetric analysis of energy expenditure will benefit from ANCOVA analysis using body weight as a covariate. Moreover, locomotor activity should be also controlled.

      All statistical analyses of energy expenditure are corrected for body mass; thus, there is no need for ANCOVA. We clarified this in the text. The determination of locomotor activity is now included in Supplementary Figures 2 and 3.

      A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      We performed new experiments to determine the expression of markers of hypothalamic inflammation and ER stress pathways. This is shown in Suppl. Fig. 2 and 3.

      Mechanistic inhibition of CXCR3 was performed by CXCL10 immunoneutralization and CXCR3 antagonism. Those approaches are correct and well-performed, however considering the experience of the group in hypothalamic studies, I miss a virogenetic-based knockdown. Do the authors have any data on that?

      This is indeed a great point. Unfortunately, we did not succeed in obtaining the Cre mouse lines that would be needed for the proposed experiments. We included this as a weakness of the study.

      Reviewer #2 (Recommendations For The Authors):

      There are a few typographical errors for correction:

      -  Page 4, line 157: CCL10 to CXCL10.

      -  Page 6, line 226: makers to markers.

      -  Page 7, lines 283 and 287, Figure 6C: INF to IFN.

      All errors were corrected, as recommended. 

      Parts of the manuscript may be difficult for readers without knowledge of transcriptomics to interpret; thus, further description of several of the figures (e.g. Figure 3 and 4) may be helpful.

      We expanded the text in Results to clarify this issue.

      Could the authors comment on the choice of peripheral administration of CXCR3 antagonist as opposed to central (e.g. icv) administration? Indeed, systemic inhibition of CXCR3 produced significant alterations in body weight gain and glucose tolerance in female mice given high-fat diets and reduced CCR2 and CXCR3 immunostaining in the hypothalamus. Could changes to peripheral (e.g. WAT, liver) immune responses to the diet underlie the metabolic changes observed?

      CXCR3+ cells are present in very small numbers in the hypothalamus under basal conditions. In HFD, these are recruited from the periphery to the CNS, so, we believe ICV treatment with AMG487 would not reduce recruitment to the hypothalamic parenchyma. With the same animals in which we performed the locomotor activity, we performed RT-qPCR of WAT and liver and analyzed the expression of genes involved in lipid and glucose metabolism. This is now in Supplementary Figures 2 and 3. We included a comment in the text to explain our rationale for this approach.

      Besides hypothalamic mRNA levels of chemokines and chemokine receptors, does systemic CXCR3 antagonism affect other aspects linked to diet-induced impairments of hypothalamic regulation of energy homeostasis, like inflammation, ER stress and/or mitochondrial dynamics/function? It would be interesting to reveal the consequence of reduced CCR2+ microglial migration to the hypothalamus with chronic high-fat diet exposure.

      We performed new experiments shown in Supplementary Figures 2 and 3 to deal with these important questions. In the hypothalamus of females there were no changes in the expression of transcripts encoding proteins involved in endoplasmic reticulum homeostasis and mitochondrial turnover, whereas in males there was a reduction of Ddit3 and Mfn1. Moreover, in females the inhibition of CXCR3 promoted no changes in the liver expression of lipidogenic and gluconeogenic genes, and no changes in the white adipose tissue expression of lipidogenic genes. In the liver of males, there was a reduction in the expression of Fasn and an increase in the expression of G6pc3. As for the females, in males, there were no changes in the white adipose tissue expression of lipidogenic genes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      There are some minor weaknesses.

      Comment 1: Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues.

      We agree that the structures of the human MCC and PCC holoenzymes are similar to their bacterial homologs. That is due to the conserved sequences and functions of MCC and PCC across different species.

      Comment 2: There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors state that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. This is not a particularly deep analysis and doesn't really require a cryo-EM structure to invoke. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. This suggests, perhaps, that these structures do not yet fully capture the proper conformational states.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well-resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in the human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 3: The authors also need to be careful with their over-interpretation of structure to invoke mechanisms of conformational change. A snapshot of the starting state (apo) and final state (ligand-bound) is insufficient to conclude *how* the enzyme transitioned between conformational states. I am constantly frustrated by structural reports in the biotin-dependent enzymes that invoke "induced conformational changes" with absolutely no experimental evidence to support such statements. Conformational changes that accompany ligand binding may occur through an induced conformational change or through conformational selection and structural snapshots of the starting point and the end point cannot offer any valid insight into which of these mechanisms is at play.

      Point accepted. We have revised our manuscript to use conformational differences instead of conformational changes to describe the differences between the apo and ligand-bound states (see the last paragraph of the introduction section and the third paragraph of the discussion section).

      Reviewer #2 (Public Review):

      Comments and questions to the manuscripts:

      Comment 1: I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      We appreciate this comment. However, since we purified the endogenous BDCs and the sample we obtained was a mixture of four BDCs, the enzymatic activity of this mixture cannot accurately reflect the catalytic activity of PCC or MCC holoenzyme. We have revised the manuscript and acknowledged this limitation in the first paragraph of the results section: 

      “We did not characterize the enzyme activities of the mixed BDCs because the current methods used to evaluate the carboxylase activities of BDCs, such as measuring the ATP hydrolysis or incorporation of radio-labeled CO2, are unable to differentiate the specific carboxylase activity of each BDC.”

      Comment 2: In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      We appreciate this comment. We have shown the cryo-EM maps of the PCC and MCC holoenzymes in fig. S8 to indicate the unresolved regions in these structures. The BC domains in one layer of MCCα in the MCC-apo structure were not resolved. However, we think it would be better to show a complete structure in Fig. 1 to provide an overall view of the MCC holoenzyme. We have revised Fig. 1B and the figure legend to clearly point out which domains were not resolved in the cryo-EM map and were built in the structure through docking. We have also revised the main text to clearly describe which parts of the holoenzymes were not resolved in the cryo-EM maps and how the complete structures were built.

      Comment 3: In the introduction, I suggest the authors provide more information about the previous studies on the structure and reaction mechanisms of BDCs, what the knowledge gap is, and what problem you will resolve with a higher-resolution structure. For example, you mentioned in line 52 that G437 and A438 are catalytic residues; are these residues reported as catalytic residues, or is this based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions been revealed in previous studies?

      Point accepted. It was reported that G419 and A420 in Streptomyces coelicolor PCC, corresponding to G437 and A438 in human PCCβ, were the catalytic residues for the second-step carboxylation reaction (PMID: 15518551). The same study also reported the catalytic mechanism of the carboxyl transfer reaction. The role of biotin in the BDC-catalyzed carboxylation reactions has been extensively studied (PMIDs: 22869039, 28683917). We have revised the manuscript to introduce the catalytic mechanisms of BDCs elucidated through the investigation of prokaryotic BDCs in the fourth paragraph of the introduction section.

      Comment 4: In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      We appreciate this comment. We do not have a good explanation for why we did not observe a change in the propionyl-CoA-bound MCC structure. It is noteworthy that neither acetyl-CoA nor propionyl-CoA is the natural substrate of MCC. Recently, a cryo-EM structure of the human MCC holoenzyme in complex with its natural substrate, 3-methylcrotonyl-CoA, has been resolved (PDB code: 8J4Z). In this structure, the binding site of biotin and the conformation of the CT domain closely resemble those in our acetyl-CoA-bound MCC structure. Therefore, the movement of biotin induced by acetyl-CoA binding mimics that induced by the binding of MCC's natural substrate, 3-methylcrotonyl-CoA, indicating that, in comparison with propionyl-CoA, acetyl-CoA is closer to 3-methylcrotonyl-CoA in its ability to bind to MCC. We have discussed this possibility in the last paragraph of the discussion section. We have also added a supplementary figure (fig. S11) to compare the structures of human MCC holoenzyme in complex with acetyl-CoA and 3-methylcrotonyl-CoA.

      Comment 5: In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 6: How are the solved structures compared with the latest Alphafold3 prediction?

      Since AlphaFold3 had not been released when our manuscript was submitted, we did not compare the solved structures with the AlphaFold3 predictions. We have now carried out the predictions using AlphaFold3. Due to the token limitation of the AlphaFold3 server, we can only include two α and six β subunits of human PCC or MCC in the prediction. The overall assembly patterns of the AlphaFold3-predicted structures are similar to those of the cryo-EM structures. The RMSDs between PCCα, PCCβ, MCCα, and MCCβ in the apo cryo-EM structures and those in the AlphaFold3-predicted structures are 7.490 Å, 0.857 Å, 7.869 Å, and 1.845 Å, respectively. The PCCα and MCCα subunits adopt an open conformation in the cryo-EM structures but a closed conformation in the AlphaFold3-predicted structures, resulting in large RMSDs.
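      For readers unfamiliar with the metric, the following is a minimal sketch of how an RMSD between two structures can be computed. It assumes the two coordinate sets have already been superposed and matched residue by residue; the function name and the coordinates are hypothetical and are not taken from the authors' analysis.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched, pre-superposed
    lists of (x, y, z) coordinates, in the same units as the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha coordinates (angstroms) for three matched residues.
cryo_em = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 2.0, 0.0)]
print(f"RMSD = {rmsd(cryo_em, predicted):.3f} angstroms")
```

      Because every atom pair contributes its squared distance, a rigid-body shift of one domain (such as an open versus closed BC domain) can dominate the value even when the rest of the subunit agrees closely.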

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      DMS-MaP is a sequencing-based method for assessing RNA folding by detecting methyl adducts on unpaired A and C residues created by treatment with dimethylsulfate (DMS). DMS also creates methyl adducts on the N7 position of G, which could be sensitive to tertiary interactions with that atom, but N7-methyl adducts cannot be detected directly by sequencing. In this work, the authors adopt a previously developed method for converting N7-methyl-G to an abasic site to make it detectable by sequencing and then show that the ability of DMS to form an N7-methyl-G adduct is sensitive to RNA structural context. In particular, they look at the G-quadruplex structure motif, which is dense with N7-G interactions, is biologically important, and lacks conclusive methods for in-cell structural analysis. 

      Strengths: 

      - The authors clearly show that established methods for detecting N7-methyl-G adducts can be used to detect those adducts from DMS and that the formation of those adducts is sensitive to structural context, particularly G-quadruplexes. 

      - The authors assess the N7-methyl-G signal through a wide range of useful probing analyses, including standard folding, adduct correlations, mutate-and-map, and single-read clustering. 

      - The authors show encouraging preliminary results toward the detection of G-quadruplexes in cells using their method. Reliable detection of RNA G-quadruplexes in cells is a major limitation for the field and this result could lead to a significant advance. 

      - Overall, the work shows convincingly that N7-methyl-G adducts from DMS provide valuable structural information and that established data analyses can be adapted to incorporate the information. 

      We thank the reviewer for their time and appreciate the reviewer for their positive assessment as well as for their suggestions which we have addressed below.

      Weaknesses: 

      - Most of the validation work is done on the spinach aptamer and it is the only RNA tested that has a known 3D structure. Although it is a useful model for validating this method, it does not provide a comprehensive view of what results to expect across varied RNA structures. 

      Thank you for your insightful comments. We agree that a more comprehensive view of BASH MaP involves probing a larger variety of RNAs with known 3D-structures beyond Spinach and the poly-UG RNA. Although outside the scope of this publication, more work is needed to reveal the determinants of N7G reactivity to DMS.

      - It's not clear from this work what the predictive power of BASH-MaP would be when trying to identify G-quadruplexes in RNA sequences of unknown structure. Although clusters of G's with low reactivity and correlated mutations seem to be a strong signal for G-quadruplexes, no effort was made to test a range of G-rich sequences that are known to form G-quadruplexes or not. Having this information would be critical for assessing the ability of BASH-MaP to identify G-quadruplexes in cells. 

      - Although the authors present interesting results from various types of analysis, they do not appear to have developed a mature analysis pipeline for the community to use. I would be inclined to develop my own pipeline if I were to use this method. 

      Thank you for your suggestion. We have more clearly annotated the python scripts and GitHub repository which contain all custom scripts used for analyzing BASH MaP data. These changes will enable researchers to more easily utilize our developed pipelines.

      - There are various aspects of the DAGGER analysis that don't make sense to me:

      (1) Folding of the RNA based on individual reads does not represent single-molecule folding since each read contains only a small fraction of the possible adducts that could have formed on that molecule. As a result, each fold will largely be driven by the naive folding algorithm. I recommend a method like DREEM that clusters reads into profiles representing different conformations.

      (2) How reliable is it to force open clusters of low-reactivity G's across RNAs that don't already have known G-quadruplexes?

      (3) By forcing a G-quadruplex open it will be treated as a loop by the folding algorithm, so the energetics won't be accurate. 

      (4) It's not clear how signals on "normal" G's are treated. In Figure 5C some are wiped to 0 but others are kept as 1. 

      Thank you for your keen observations regarding the conceptual frameworks utilized in DAGGER. We have included a complementary analysis to DAGGER utilizing Spinach BASH MaP data with DANCE, an algorithm which shares an underlying architecture with DREEM, and found that DANCE analysis gave similar results to those found with DAGGER. However, we have not benchmarked DAGGER’s performance on a range of RNAs and compared the results with expectation-maximization algorithms like DREEM and DANCE.

      To minimize the effects of artificially creating loops with tertiary folding constraints, we utilized the RNA folding algorithm CONTRAfold which relies less on direct energetic calculations than other commonly used RNA folding algorithms such as RNAstructure.

      We have updated the main text to more clearly indicate how DAGGER handles signals at G’s in a range of conditions. The main text now better clarifies the specific logic used for determining which G’s contain either a 0 or a 1 in the bitvector encoding used in DAGGER analysis.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript introduces BASH MaP and DAGGER, innovative tools for analyzing RNA tertiary structures, specifically focusing on G-quadruplexes. Traditional methods have struggled to detect and analyze these structures due to their reliance on interactions on the Hoogsteen face of guanine, which are not readily observable through conventional probing that targets Watson-Crick interactions. BASH MaP employs dimethyl sulfate and potassium borohydride to enhance the detection of N7-methylguanosine by converting it into an abasic site, thereby enabling its identification through misincorporation during reverse transcription. This method provides higher precision in identifying G-quadruplexes and offers deeper insights into RNA's structural dynamics and alternative conformations in both in vitro and cellular contexts. Overall, the study is well-executed, demonstrating robust signal detection of N7-Gs with some compelling positive controls, thorough analysis, and beautifully presented figures.

      Strengths: 

      The manuscript introduces a new method to detect G-quadruplexes (G-qs) that simplifies and potentially enhances the robustness and quantification compared to previous methods relying on reverse transcription truncations. The authors provide a strong positive control, demonstrating a 70% misincorporation at endogenous N7-G within the 18S rRNA, which illustrates BASH MaP's high signal-to-noise ratio. The data concerning the detection of positive control G-qs is particularly compelling. 

      Weaknesses: 

      Figure 3E shows considerable variability in the correlations among guanosines, suggesting that the methods may struggle with specificity in determining guanosine participation within and between different quadruplexes. There is no estimation of the method's false positive discovery rate.

      Thank you for your positive assessment and for your time to come up with suggestions to improve this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors aim to develop an experimental/computational pipeline to assess the modification status of an RNA following treatment with dimethylsulfate (DMS). Building upon the more common DMS Map method, which predominantly assesses the modification status of the Watson-Crick-Franklin face of A's and C's, the authors insert a chemical processing step in the workflow prior to deep sequencing that enables detection of methylation at the N7 position of guanosine residues. This approach, termed BASH MaP, provides a more complete assessment of the true modification status of an RNA following DMS treatment and this new information provides a powerful set of constraints for assessing the secondary structure and conformational state of an RNA. In developing this work, the authors use Spinach as a model RNA. Spinach is a fluorogenic RNA that binds and activates the fluorescence of a small molecule ligand. Crystal structures of this RNA with ligand bound show that it contains a G-quadruplex motif. In applying BASH MaP to Spinach, the authors also perform the more standard DMS MaP for comparison. They show that the BASH MaP workflow appears to retain the information yielded by DMS MaP while providing new information about guanosine modifications. In Spinach, the G-quadruplex G's have the least reactive N7 positions, consistent with the engagement of N7 in hydrogen bonding interactions at G's involved in quadruplex formation. Moreover, because the inclusion of data corresponding to G increases the number of misincorporations per transcript, BASH MaP is more amenable to analysis of co-occurring misincorporations through statistical analysis, especially in combination with site-specific mutations. These co-occurring misincorporations provide information regarding what nucleotides are structurally coupled within an RNA conformation. 
      By deploying a likelihood-ratio statistical test on BASH MaP data, the authors can identify Gs in G-quadruplexes, deconvolute G-G correlation networks, base-triple interactions and even stacking interactions. Further, the authors develop a pipeline to use the BASH MaP-derived G-modification data to assist in the prediction of RNA secondary structure and identify alternative conformations adopted by a particular RNA. This seems to help with the prediction of secondary structure for Spinach RNA.

      Strengths: 

      The BASH Map procedure and downstream data analysis pipeline more fully identify the complement of methylations to be identified from the DMS treatment of RNA, thereby enriching the information content. This in turn allows for more robust computational/statistical analysis, which likely will lead to more accurate structure predictions. This seems to be the case for the Spinach RNA. 

      Weaknesses: 

      The authors demonstrate that their method can detect G-quadruplexes in Spinach and some other RNAs both in vitro and in cells. However, the performance of BASH MaP and associated computational analysis in the context of other RNAs remains to be determined. 

      We thank the reviewer for their time spent analyzing this manuscript, for their positive assessment and for their suggestions on improving this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Although the text is clear and coherent, the overall flow of the manuscript comes across as "here's a bunch of stuff I tried." Maybe you're looking to get this out quickly, but it would have been much more impactful (and enjoyable to read) as a description of a more polished final product.

      Thank you for highlighting the strengths and weaknesses of this manuscript. We have changed parts of the main text to improve the overall flow of the manuscript and make it more enjoyable to read.

      Reviewer #2 (Recommendations For The Authors): 

      I have only a few comments: 

      Major: 

      (1) Analysis of Guanosine Correlations in Figure 3E: In Figure 3E, there is a lot of variability in the correlations among guanosines. For example, G46 shows a strong correlation with G93 (within the same quadruplex) but also correlates with G91, G95 (in different quadruplexes), and G97 (not part of any quadruplex as per the model in Figure 3C). Contrarily, G86 exhibits weak correlations, and G50 along with G89 shows no significant correlations. These findings imply that BASH MaP followed by RING MaP analysis struggles to accurately distinguish between guanosines within the same or different quadruplexes in Spinach. Perhaps there are some opportunities to enhance the specificity in determining guanosine participation within quadruplexes, a great point for the authors to discuss.

      Thank you for your comments and careful analysis of the pattern of correlations produced by BASH MaP. We agree that BASH MaP followed by RING MaP analysis is unable to unambiguously distinguish between guanosines within the same or different quadruplex layers. This finding was a surprise as we initially assumed that quadruplex layers would behave in a manner like Watson-Crick base pairs and produce specific signals in the corresponding RING MaP heatmaps.  We suspect that this may be due to mutations in specific G’s being associated with altered conformations which allow other G’s to form different interactions that affect DMS reactivity.  This may be unique to the highly complex structure in Spinach.  However, we think BASH-MaP clearly provides signals that point to key residues within the G-quadruplex, even if it does not clearly identify all of them.

      This idea is supported by experiments described in Figure 4, which show that mutation of a single guanosine residue causes a complete breakdown of the hydrogen-bonding network throughout all quadruplex layers. Additionally, DMS methylation of an N7G in a quadruplex is likely to disrupt base stacking interactions in and around the quadruplex. The compounding effects of a dynamic G-quadruplex and DMS-induced changes to local base stacking properties explain both the strong correlations with G97, which is base-stacked with the quadruplex, and the inability to specifically identify the guanosines which comprise specific quadruplex quartets. We have further emphasized this point in an updated discussion section.

      (2) Potential Consolidation of Figures 3 and 4: Figure 4 appears quite similar to Figure 3 but employs M2-seq instead of relying on spontaneous mutations. It might be beneficial to merge these figures to demonstrate that M2-seq can more effectively identify correlations between guanosines in quadruplexes. 

      We agree that Figures 3 and 4 appear quite similar, but there is an important distinction to be made between RING MaP and M2-seq analysis. We suspect that the mechanism causing correlations between guanosines in quadruplexes for RING MaP is “RNA breathing”, in contrast to the spontaneous T7 RNA polymerase-induced mutation model proposed in Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114. To determine whether correlations between guanosines in Spinach BASH MaP experiments rely on spontaneous mutations, we compared the fraction of reads containing misincorporations at pairs of quadruplex guanosines over a range of DMS concentrations. The spontaneous mutation model predicts a linear dependence of paired quadruplex guanosine signals on DMS dose, while an “RNA breathing” or double-DMS hit model predicts a quadratic dependence on DMS dose (Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114). Our data may support a quadratic dependence on DMS dose for multiple pairs of G-quadruplex guanosines, while they demonstrate a linear dependence for pairs of helical G’s (Supplementary Data Fig. 9). Together, these data suggest that BASH MaP followed by RING MaP analysis detects double-DMS modification events for pairs of quadruplex guanosines. Therefore, BASH MaP and RING MaP analysis provide a complementary approach to M2 BASH MaP and reveal guanosine correlations in contexts where pre-installed mutations are incompatible, such as the study of endogenously expressed RNAs.
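      The distinction between a linear and a quadratic dose dependence can be illustrated with a minimal sketch that estimates the exponent as the slope of a log-log fit: a slope near 1 suggests linear scaling and a slope near 2 suggests quadratic scaling. The function name, the DMS doses, and the read fractions below are hypothetical, and this is not necessarily how the analysis behind Supplementary Data Fig. 9 was performed.

```python
import math


def loglog_slope(doses, fractions):
    """Least-squares slope of log(fraction) versus log(dose).

    For fraction = c * dose**k the slope equals the exponent k.
    """
    if len(doses) != len(fractions) or len(doses) < 2:
        raise ValueError("need at least two matched (dose, fraction) points")
    xs = [math.log(d) for d in doses]
    ys = [math.log(f) for f in fractions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("doses must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical DMS doses (mM) and synthetic co-misincorporation fractions.
doses_mM = [20.0, 40.0, 80.0, 160.0]
single_hit_like = [1e-5 * d for d in doses_mM]       # scales with dose
double_hit_like = [1e-7 * d ** 2 for d in doses_mM]  # scales with dose squared

print(f"linear-like slope:    {loglog_slope(doses_mM, single_hit_like):.2f}")
print(f"quadratic-like slope: {loglog_slope(doses_mM, double_hit_like):.2f}")
```

      Real data would carry noise and background misincorporations, so the fitted exponent would fall near, rather than exactly on, 1 or 2.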

      (3) Estimation of False Positive Rates: An estimation of the false positive rate for G-quadruplex identification would be invaluable. Since identification currently depends on the absence of DMS modification, it's important to consider how other factors like solvent inaccessibility or library generation might affect the detection and be misinterpreted as G-quadruplexes. Although this could be a subject of future work, some discussion by the authors would enhance the manuscript. 

      We have added a table summarizing sensitivity, positive predictive value, and false positive rate for different G-quadruplex identification schemes.  See Supplementary Table 1.
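      For reference, the three metrics follow from the standard confusion-matrix definitions, sketched below. The function name and the guanosine positions are hypothetical and do not reproduce the values in Supplementary Table 1.

```python
def identification_metrics(predicted, actual, candidates):
    """Sensitivity, positive predictive value, and false positive rate.

    predicted:  set of positions called as G-quadruplex guanosines
    actual:     set of positions known to be G-quadruplex guanosines
    candidates: set of all guanosine positions evaluated
                (assumed to contain both other sets)
    """
    tp = len(predicted & actual)               # called and truly in a quadruplex
    fp = len(predicted - actual)               # called but not in a quadruplex
    fn = len(actual - predicted)               # missed quadruplex guanosines
    tn = len(candidates - predicted - actual)  # correctly left uncalled

    def ratio(numerator, denominator):
        # Undefined ratios are reported as NaN rather than raising.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Hypothetical example: ten guanosines, four of which are in a quadruplex.
all_gs = set(range(1, 11))
known_quadruplex_gs = {1, 2, 3, 4}
called_gs = {1, 2, 3, 5}
print(identification_metrics(called_gs, known_quadruplex_gs, all_gs))
```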

      Minor: 

      (4) Line 273 Reference Correction: Please adjust the reference in line 273 to accurately reflect that the G-quadruplex experiments compare potassium with lithium, not sodium. 

      In cellulo G-quadruplex reverse transcriptase (RT) stop assays as described by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537) compared RT stops between DMS treated mRNA refolded in potassium and sodium buffers. We have clarified in the text that traditionally, G-quadruplex RT stop assays compare potassium with lithium.

      (5) Consistency in Figure 1 (Panels F and G): Aligning BASH MaP (170 mM DMS) as the y-axis in both panels F and G would visually align the data points and enhance the graphical coherence across these panels. 

      Thank you for noticing the subtleties in our data presentation and for the suggestion on how to improve our graphical coherence across panels. We specifically chose not to align BASH MaP (170 mM DMS) as the y-axis for panels F and G because we did not want the reader to mistakenly assume that the data for BASH MaP (170 mM DMS) presented in panels F and G are the same data. In panel F, BASH MaP was performed under standard DMS probing buffer conditions which utilized a pH 7.5 bicine buffer. The purpose of panel F is to show the reproducibility of BASH MaP under various DMS concentrations. In panel G, BASH MaP was performed under DMS probing buffer conditions which promote the formation of m3U using a pH 8.3 bicine buffer. The purpose of panel G is to show that the borohydride treatment and depurination steps in BASH MaP do not react with DMS-derived m1A, m3C, and m3U in a manner which prevents their measurement through cDNA misincorporation. Together, these experimental differences cause the data points for BASH MaP (170 mM DMS) to vary between panels F and G, so aligning the axes would lead to more confusion for the reader and detract from the intended message we are trying to convey through panels F and G.

      (6) Statistical Detail in Figure 1E: Incorporating a confidence interval or a P-value in Figure 1E would enrich the statistical depth and provide readers with a clearer understanding of the data's significance. 

      Thank you for the suggestion of including a p-value in Figure 1E to provide the readers with a clearer understanding of the data’s significance. The effect of combining DMS treatment and borohydride reduction on the misincorporation rate of G’s in Spinach is so dramatic that the raw data sufficiently provides the readers a clear understanding of its significance.

      (7) Reevaluation of Figure 2B: Considering the small number of Gs in single-stranded regions and base triples, it might be more informative to move Figure 2B to supplementary information. Focusing on Figure 2C, which consolidates non-quadruplex categories, could provide more impactful insights. 

      Thank you for your suggestion. It is important to initially provide an overall characterization of N7G DMS reactivity for G’s in a variety of structural contexts before more specifically looking at G-quadruplexes. Panel B is an important part of figure 2 for the following two reasons:

      First, a reader’s first question upon seeing the N7G chemical reactivity for Spinach as shown in Figure 2A is likely to be whether base-paired G’s and single-stranded G’s have similar or different DMS reactivities. Figure 2, panel B shows that generally, single-stranded G’s appear to have higher DMS reactivity than base-paired G’s except for 2 G’s which display hyper-reactivity. The basis for this hyper-reactivity is addressed in Figure 4.

      Second, panel B highlights the wide range in N7G DMS reactivities. Since the G-quadruplex G’s display a dramatically lower DMS reactivity as compared to single-stranded G’s and hyper-reactive base-paired G’s, the dynamic range of DMS reactivities was difficult to capture in a single panel. Panel C does not convey these dynamics appropriately as a stand-alone figure.

      (8) Enhancements to Figure 2G: Improving the visibility of mutation rates in this figure would help. Suggestions include coloring bars by nucleotide type for intuitive visual comparison and adjusting the y-axis to a logarithmic scale to better represent near-zero mutation rates. Additionally, employing histograms or box plots could directly compare DMS reactivities and provide a clearer analysis. 

      Thank you for your suggestions on enhancing the presentation of BASH MaP applied to an mRNA. The main purpose of figure 2G was to validate whether BASH MaP could detect G’s engaged in a G-quadruplex in a cell. In-cell G-quadruplex folding measurements as performed by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537) only identified a few G-quadruplexes which were folded, and only the 3’ end of the G-quadruplex was detected. We therefore reasoned that the 3’ most G’s of this select set of G-quadruplexes were the only validated G’s engaged in a G-quadruplex in cells. In the instance of the AKT2 mRNA, Guo and Bartel found that 4 G’s appeared to be folded in a G-quadruplex in cells (Supplementary figure 2E). These G’s are indicated at the bottom of the plot with black bars and the label “In-cell G-quadruplex guanosines”. Therefore, we hypothesized that these G’s would display low DMS reactivity with BASH MaP while other G’s in the AKT2 mRNA would display higher chemical reactivities. We followed a standard convention in displaying chemical reactivities used extensively in the field where black bars indicate low reactivity, yellow bars indicate moderate reactivity, and red bars indicate high reactivity. The data in Fig 2G directly support Guo and Bartel’s prediction of an in-cell folded G-quadruplex in the AKT2 mRNA because the 4 G’s predicted to be engaged in a G-quadruplex all displayed near-zero DMS reactivities.

      We agree that adjusting the y-axis to a logarithmic scale would better represent near-zero mutation rates. However, the purpose of figure 2G is not to compare all positions with near-zero mutation rates. Instead, our use of standard conventions in displaying chemical reactivities is sufficient for the purpose of displaying BASH MaP’s ability to validate in-cell G-quadruplex G’s.

      Later in the paper, we go a step further and create a better criterion than simple N7G DMS reactivity for identifying G’s engaged in a G-quadruplex. For further analysis of G’s with near zero DMS reactivities, see Figure 3 and Supplementary figure 4 which utilizes RING Mapper to identify lowly-reactive G’s which produce co-occurring misincorporations.

      (9) Scale Consistency in Figure 3: Ensuring that the correlation scales are uniform across Panels A, B, D, and E would facilitate easier comparison of the data, enhancing the overall coherence of the findings. Using raw correlation values could also improve clarity and interpretation. 

      Thank you for the suggestions to facilitate easier comparisons of data in Figure 3. We have ensured the correlation scales are uniform across panels A, B, D, and E to enhance the coherence of these findings. We initially visualized the data in Figure 3 by plotting raw correlation values, but we found these values differed between DMS MaP and BASH MaP datasets, likely because of the low-level background mutations introduced by the borohydride reduction step of BASH (see Supplementary figure 3A). However, performing a global normalization of correlation strength values computed by RING mapper enabled clear comparisons between DMS MaP and BASH MaP RING heatmaps and revealed structural domains consistent with the crystal structure of Spinach.

      (10) Correction on Line 506: Please update the reference to M2 BASH MaP for accuracy. 

      Thank you. We have updated the main text to incorporate this comment.

      Reviewer #3 (Recommendations For The Authors): 

      The paper describes multiple applications and multiple methods of analysis of the BASH Map data, which collectively make the manuscript more difficult to follow. The manuscript would become more readable and user-friendly if there were some overview figures to describe the sequencing pipeline and the various computational workflows that the BASH MaP data are fed into (e.g. RING Mapper, DAGGER, M2 BASH MaP, Co-occurring Misincorporations, Secondary Structure Prediction). One or more summary schemes that provide an overview would strongly assist with the clarity and overall content of the paper. 

      Thank you for your suggestions. We have incorporated a summary scheme of the various computational workflows and their use cases in Fig 7.

      Line 165. Here, misincorporation rates for all four nucleotides are discussed, but m3U is not mentioned until the following paragraph. It would be appropriate and clearer to mention this sooner.

      Thank you for your suggestion. We have restructured this section to introduce the DMS modification m3U in an earlier paragraph to increase clarity for readers.

      Line 506: spelling of DAGGER. 

      Thank you. We have updated the main text to incorporate this comment.

      Line 645: I found this paragraph difficult to follow, especially the line starting 649. I thought the logic was to exclude G's involved in tertiary interactions from base-pairing in the secondary structure prediction. Some clarification would be helpful.

      Thank you for your comments. We have restructured the paragraph to emphasize that DAGGER only applies tertiary folding constraints to sequencing reads without misincorporations at G’s engaged in tertiary interactions. We reasoned that sequencing reads with a misincorporation at a G engaged in a tertiary interaction likely come from an RNA molecule which is in an alternative tertiary conformational state. In this specific circumstance, a tertiary folding constraint may impose incorrect restrictions on the folding of RNA molecules due to distinct tertiary conformations.
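      The read-level rule described above can be illustrated with a minimal sketch. The data layout (one list of 0/1 values per read, indexed by 0-based nucleotide position) and all names are assumptions made for illustration; they are not taken from the DAGGER code.

```python
def tertiary_constraints_for_read(bitvector, tertiary_g_positions):
    """Return the positions to hold out of base-pairing for one read.

    bitvector:            list of 0/1 values, 1 = misincorporation at that position
    tertiary_g_positions: 0-based positions of G's engaged in tertiary interactions

    If the read carries a misincorporation at any tertiary G, it is treated as
    coming from an alternative tertiary conformation and gets no constraint.
    """
    if any(bitvector[pos] == 1 for pos in tertiary_g_positions):
        return set()
    return set(tertiary_g_positions)


# Hypothetical reads over an 8-nucleotide RNA with tertiary G's at positions 2 and 5.
tertiary_gs = {2, 5}
read_without_hit = [0, 1, 0, 0, 0, 0, 1, 0]  # no misincorporation at a tertiary G
read_with_hit = [0, 0, 1, 0, 0, 0, 0, 0]     # misincorporation at position 2

print(tertiary_constraints_for_read(read_without_hit, tertiary_gs))
print(tertiary_constraints_for_read(read_with_hit, tertiary_gs))
```

      Under this rule the first read would be folded with both tertiary G's constrained, while the second would be folded without tertiary constraints.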

      Line 817. "Ability to". 

      Thank you. We have updated the main text to incorporate this comment.

      Figure 6F. Mistake in the axis description. 

      Thank you. We have updated the main text to incorporate this comment.

      Consider combining the paragraphs at lines 850 and 903. 

      Thank you for the suggestion. We rearranged paragraphs in the discussion to improve clarity.

      Line 1546. The final conc of DMS would be nice to see here.

      Thank you. We have updated the main text to incorporate this comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using a knock-out mutant strain, the authors tried to decipher the role of the last gene in the mycofactocin operon, mftG. They found that MftG was essential for growth in the presence of ethanol as the sole carbon source, but not for the metabolism of ethanol, evidenced by the equal production of acetaldehyde in the mutant and wild type strains when grown with ethanol (Fig 3). The phenotypic characterization of ΔmftG cells revealed a growth-arrest phenotype in ethanol, reminiscent of starvation conditions (Fig 4). Investigation of cofactor metabolism revealed that MftG was not required to maintain redox balance via NADH/NAD+, but was important for energy production (ATP) in ethanol. Since mycobacteria cannot grow via substrate-level phosphorylation alone, this pointed to a role of MftG in respiration during ethanol metabolism. The accumulation of reduced mycofactocin points to impaired cofactor cycling in the absence of MftG, which would impact the availability of reducing equivalents to feed into the electron transport chain for respiration (Fig 5). This was confirmed when looking at oxygen consumption in membrane preparations from the mutant and would type strains with reduced mycofactocin electron donors (Fig 7). The transcriptional analysis supported the starvation phenotype, as well as perturbations in energy metabolism, and may be beneficial if described prior to respiratory activity data.

      We thank the reviewer for their thorough evaluation of our work. We carefully considered whether transcriptional data should be presented before the respirometry data. However, this would disrupt other transitions and the flow of thoughts between sections, so we prefer to keep the order of sections as is.

      While the data and conclusions do support the role of MftG in ethanol metabolism, the title of the publication may be misleading as the mutant was able to grow in the presence of other alcohols (Supp Fig S2).

      We agree that ethanol metabolism was the focus of this work and that phenotypes connected to other alcohols were less striking. We, therefore, changed “alcohol” to “ethanol” in the title of the manuscript.

      Furthermore, the authors propose that MftG could not be involved in acetate assimilation based on the detection of acetate in the supernatant and the ability to grow in the presence of acetate. The minimal amount of acetate detected in the supernatant but a comparative amount of acetaldehyde could point to disruption of an aldehyde dehydrogenase.

      We do not agree that MftG might be involved in acetaldehyde oxidation. According to our hypothesis, the disruption of an acetaldehyde dehydrogenase would lead to the accumulation of acetaldehyde. However, we observed an equal amount of acetaldehyde in cultures of M. smegmatis WT and ∆mftG grown on ethanol as well as on ethanol + glucose. Furthermore, the amount of acetate detected in the supernatants is not “minimal” as the reviewer points out but higher than or comparable to the acetaldehyde concentration (Figure 3 E and F; note that acetate concentrations are indicated in g/L, acetaldehyde concentrations in µM). Finally, the accumulation of mycofactocinols in ∆mftG mutants grown on ethanol is not in agreement with the idea of MftG being an aldehyde dehydrogenase but very well supports our hypothesis that MftG is involved in cofactor reoxidation.

      The link between mycofactocin oxidation and respiration is shown, however the mutant has an intact respiratory chain in the presence of ethanol (oxygen consumption with NADH and succinate in Fig 7C) and the NADH/NAD+ ratios are comparable to growth in glucose. Could the lack of growth of the mutant in ethanol be linked to factors other than respiration?

      Indeed, by using NADH and succinate as electron donors we show that the respiratory chain is largely intact in WT and ∆mftG grown on ethanol. Also, when mycofactocinols were used as an electron donor, we observed that respiration was comparable to succinate respiration in the WT. However, respiration was severely hampered in membranes of ∆mftG when mycofactocinols were offered as reducing agent. These findings strongly support our hypothesis that MftG is necessary to shuttle electrons from mycofactocin to the respiratory chain, while the rest of the respiratory chain stayed intact. The fact that NADH/NAD+ ratios are comparable between ethanol and glucose conditions is interesting and indirectly supports our hypothesis that mycofactocin and not NAD is the major cofactor in ethanol metabolism. Therefore, we do not see any evidence that the lack of growth of the mutant in ethanol is linked to factors other than respiration.

      To this end, bioinformatic investigation or other evidence to identify the membrane-bound respiratory partner would strengthen the conclusions.

      We generally agree that it is an important next step to identify the direct interaction partners of MftG. However, we are convinced that experimental evidence using several orthogonal approaches is required to unequivocally identify interaction partners of MftG. Nevertheless, we agree that a preliminary bioinformatics study could guide follow-up studies. We therefore attempted to predict interaction partners of MftG using D-SCRIPT and AlphaFold 2. However, our approach did not reveal any meaningful results. Thus, we prefer not to integrate this approach into the manuscript but briefly summarize our methodology here: To predict potential interaction partners of M. smegmatis mc2 155 MftG (MSMEG_1428), D-SCRIPT (Sledzieski et al. 2021, https://doi.org/10.1016/j.cels.2021.08.010) with the Topsy-Turvy model version 1 (Singh et al. 2022, https://doi.org/10.1093/bioinformatics/btac258) was employed to screen every combination of the MSMEG_1428 amino acid sequence with the amino acid sequence of every potential interaction partner from the M. smegmatis mc2 155 predicted total proteome (total 6602 combinations, UniProt UP000000757, Genome Accession CP000480). Predictions failed for eight potential interaction partners due to size constraints (MSMEG_0019, MSMEG_0400, MSMEG_0402, MSMEG_0408, MSMEG_1252, MSMEG_3715, MSMEG_4727, MSMEG_4757; all amino acid sequences ≥ 2000 AA). Afterward, the top 100 predicted interaction partners, ranked by D-SCRIPT protein-protein-interaction score, were subjected to an AlphaFold 2 multimer prediction using ColabFold batch version 1.5.5 (AlphaFold 2 with MMseqs2, Mirdita et al. 2022, https://doi.org/10.1038/s41592-022-01488-1) on a Google Colab T4 GPU with a Python 3 environment and the following parameters (msa_mode: MMseqs2 (UniRef+Environmental), num_models = 1, num_recycles = 3, stop_at_score = 100, num_relax = 0, relax_max_iterations = 200, use_templates = False). As input, the MSMEG_1428 amino acid sequence was used as protein 1 and the amino acid sequence of the potential interaction partner was used as protein 2. In addition, proteins of the electron transport chain and the dormancy regulon (dos regulon) were included as potential interaction partners. In total, 222 unique potential MftG interactions were predicted. The AlphaFold 2 model interface predicted template modelling (ipTM) score peaked at 0.45 for MftG-MftA. This score, however, lies below the threshold of 0.75, which indicates a likely false prediction of interaction (Yin et al. 2022, https://doi.org/10.1002/pro.4379). Nonetheless, the models with the highest ipTM scores (MftG with MftA, MSMEG_3233, MSMEG_4260, MSMEG_0419, MSMEG_5139, MSMEG_5140) were inspected manually using ChimeraX version 1.8 (Meng et al. 2023, https://doi.org/10.1002/pro.4792). However, no reasonable interaction was found.
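      The selection logic of the two-stage screen summarized above (rank by D-SCRIPT score, keep the top 100, add manually chosen candidates, then apply the ipTM cutoff of 0.75) can be sketched as follows. This is only an illustrative sketch, not the authors' scripts: the `Candidate` class, the function names and all scores other than the ipTM of 0.45 for MftA are hypothetical, and neither D-SCRIPT nor ColabFold is actually called.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

IPTM_THRESHOLD = 0.75  # cutoff cited in the response (Yin et al. 2022)
TOP_N = 100            # number of D-SCRIPT hits passed on to structure prediction


@dataclass
class Candidate:
    """One potential interaction partner (hypothetical container)."""
    locus: str                    # e.g. "MSMEG_3233"
    dscript_score: float          # sequence-based interaction score, higher is better
    iptm: Optional[float] = None  # filled in after the multimer prediction


def shortlist(candidates: Iterable[Candidate],
              extra: Iterable[Candidate] = (),
              top_n: int = TOP_N) -> List[Candidate]:
    """Keep the top_n candidates by D-SCRIPT score, then append manually
    chosen extras (e.g. respiratory chain proteins), dropping duplicates."""
    ranked = sorted(candidates, key=lambda c: c.dscript_score, reverse=True)[:top_n]
    seen, selected = set(), []
    for cand in [*ranked, *extra]:
        if cand.locus not in seen:
            seen.add(cand.locus)
            selected.append(cand)
    return selected


def confident_hits(scored: Iterable[Candidate],
                   threshold: float = IPTM_THRESHOLD) -> List[Candidate]:
    """Return only candidates whose ipTM reaches the confidence threshold."""
    return [c for c in scored if c.iptm is not None and c.iptm >= threshold]


if __name__ == "__main__":
    # D-SCRIPT scores are made up; 0.45 is the best ipTM reported in the response.
    screen = shortlist([Candidate("MftA", 0.9), Candidate("MSMEG_3233", 0.8)])
    screen[0].iptm = 0.45
    screen[1].iptm = 0.40  # hypothetical value
    print(confident_hits(screen))  # prints [] because no ipTM reaches 0.75
```

      With the best reported ipTM at 0.45, the threshold filter returns an empty list, which matches the authors' statement that the screen produced no confident interaction partner.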

      Reviewer #2 (Public Review):

      Summary

      Patrícia Graça et al. examined the role of the putative oxidoreductase MftG in regeneration of redox cofactors from the mycofactocin family in Mycolicibacterium smegmatis. The authors show that mftG is often co-encoded with genes from the mycofactocin synthesis pathway in M. smegmatis genomes. Using a mftG deletion mutant, the authors show that mftG is critical for growth when ethanol is the only available carbon source, and this phenotype can be complemented in trans. The authors demonstrate the ethanol associated growth defect is not due to ethanol induced cell death, but is likely a result of carbon starvation, which was supported by multiple lines of evidence (imaging, transcriptomics, ATP/ADP measurement and respirometry using whole cells and cell membranes). The authors next used LC-MS to show that the mftG deletion mutant has much lower oxidised mycofactocin (MMFT-8 vs MMFT-8H2) compared to WT, suggesting an impaired ability to regenerate mycofactocin redox cofactors during ethanol metabolism. These striking results were further supported by mycofactocin oxidation assays after over-expression of MftG in the native host, but also with recombinantly produced partially purified MftG from E. coli. The results showed that MftG is able to partially oxidise mycofactocin species. Finally, respirometry measurements with M. smegmatis membrane preparations from WT and mftG mutant cells show that the activity of MftG is indispensable for coupling of mycofactocin electron transfer to the respiratory chain. Overall, I find this study to be comprehensive and the conclusions of the paper are well supported by multiple complementary lines of evidence that are clearly presented.

      Strengths

      The major strengths of the paper are that it is clearly written and presented and contains multiple, complementary lines of experimental evidence that support the hypothesis that MftG is involved in the regeneration of mycofactocin cofactors, and assists with coupling of electrons derived from ethanol metabolism to the aerobic respiratory chain. The data appear to support the authors' hypotheses.

      We thank the reviewer for their thorough evaluation of our work.

      Weaknesses

      No major weaknesses were identified, only minor weaknesses mostly surrounding presentation of data in some figures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig 6 C and D, would it not be expected that MMFT-2H2 would be decreasing over time as MMFT-2 is increasing?

      This is true. MMFT-2H2 is indeed decreasing while MMFT-2 is increasing. However, since the y-axis is drawn on a logarithmic scale, the visible difference is not proportional to the actual changes. The increase of MMFT-2 from a very low starting point is more clearly visible than the decrease of MMFT-2H2, which was added in high quantities.
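      The effect of the logarithmic axis can be illustrated with a small numerical sketch. All quantities below are hypothetical and in arbitrary units; they are not taken from Figure 6 and only show why the same absolute amount of converted material moves the two curves very differently on a log10 axis.

```python
import math


def log10_shift(start: float, end: float) -> float:
    """Vertical displacement on a log10 axis when a value changes from start to end."""
    return math.log10(end) - math.log10(start)


# Hypothetical amounts (arbitrary units): substrate added in large excess,
# product starting close to the detection limit.
substrate_start = 1000.0
product_start = 1.0
converted = 50.0  # the same absolute amount leaves the substrate and enters the product

substrate_shift = log10_shift(substrate_start, substrate_start - converted)
product_shift = log10_shift(product_start, product_start + converted)

print(f"substrate moves {substrate_shift:+.3f} log10 units")
print(f"product moves   {product_shift:+.3f} log10 units")
```

      Under these assumed numbers the substrate curve barely moves while the product curve rises by more than one log unit, even though the absolute change is identical.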

      (2) It would be beneficial to include rationale regarding the electron acceptors tested and why FAD was not included.

      FAD is a prosthetic group of the enzyme and was always a component of the assay. The other electron acceptors were chosen as potential external electron acceptors.

      (3) Bioinformatic analysis to capture possible interacting partners of MftG

      See our response to the previous review.

      Reviewer #2 (Recommendations For The Authors):

      Questions:

      (1) The co-occurrence analysis showed that one genome encoded mftG, but not mftC - do the authors think that this is a mftG mis-annotation?

      This is a good question. We have investigated this case more closely and conclude that this particular mftG is not a misannotation. Instead, it appears that the mftC gene underwent gene loss in this organism. We added on page 8, line 15: “Only one genome (Herbiconiux sp. L3-i23) encoding a bona fide MftG did not harbor any MftC homolog. However, close inspection revealed the presence of mftD, mftF, and a potential mftA gene but a loss of mftB,C and E in this organism.”

      (2) Figure 3A - the complemented mutant strain shows enhanced growth on ethanol when compared to the WT strain with the same mftG complementation vector, suggesting that dysregulation from the expression plasmid may not be responsible for this phenotype. Have the authors conducted whole genome sequencing on the mutant/complement isolate to rule out secondary mutations?

      This is an interesting point. We have not conducted further investigations into the complemented mutant. However, we can confidently state that the complementation was successful in that it restored growth of the ∆mftG mutant on ethanol, thus confirming that the growth arrest of the mutant was due to the lack of mftG activity and not due to any secondary mutation. We also observed that both the complemented strain and the overexpression strain, both of which are based on the same overexpression plasmid, exhibited shorter lag phases, faster growth and higher final cell densities compared to the wild type. We interpret these data to mean that overexpression of mftG might relieve a growth-limiting step. Notably, this is only an interpretation; we do not make this claim. What we cannot explain at the moment is the observation that the complemented mutant grew to a higher OD than the overexpression strain. This is indeed interesting, and it might be due to an artefact or due to complex regulatory effects, which are hard to study without an in-depth characterization of the different strains involved. While this goes beyond the scope of this study, we are convinced that our main conclusions are not challenged by this phenomenon.

      (3) Figure 4C - could the yellow fluorescence that suggests growth arrest be quantified in these images similar to the size and septa/replication sites?

      In principle, this is a good suggestion. However, the amount of yellow fluorescence only differed in the starvation condition between genotypes. Since this condition was not a focus of this study, we preferred not to discuss these differences further.

      (4) Figure 4E - the complemented mutant strain has very high error, why is that? Could this phenotype not be complemented?

      It is true that the standard deviation (SD) is relatively high in this experiment. This is due to the fact that single-cell analyses based on microscopic images were conducted here - not bulk measurements of the average fluorescence. This means that the high variance partially reflects phenotypic heterogeneity of the population, rather than inefficient complementation. While it is interesting that not all cells behaved equally, a finding that deserves further investigations in the future, we conclude that the mean value is a good representative for the efficiency of the complementation.

      (5) While the whole cell extract experiment presented in Figure 6A is very clear, could the authors include SDS page or MS results of their partially purified MftG preparations used for figure B-F in the supplementary data to rule out any confounding factors that may be oxidising mycofactocin species in these preparations?

      We did not include SDS-PAGE or MS results since the enzyme preparations obtained were not pure. This is why we refer to the preparation as “partially purified fraction”. Since we were aware of the risk of confounding factors being potentially present in the preparation, we used two different expression hosts (M. smegmatis ∆mftG and E. coli) and included negative controls, i.e., a reaction using protein preparations from the same host that underwent the exact same purification steps but lacked the mftG gene. For instance, Figure 6A shows the negative control (M. smegmatis ∆mftG) and the verum (M. smegmatis ∆mftG-mftG_His6). Although this control is not shown in panels B-D for clarity, we can assure that the proposed activity of MftG has never been detected in any extract of M. smegmatis ∆mftG. Concerning MftG preparations obtained from heterologous expression in E. coli, we also performed empty vector controls and inactivated protein controls. We added a new Supplementary Figure S4 to show one example control. Taken together, the usage of two different expression hosts along with corresponding background controls clearly demonstrates that mycofactocinol oxidation only occurred in protein extracts of bacterial strains that contained the mftG gene. These data indicate that the observed mycofactocinol dehydrogenase activity is connected to MftG and not to any background activity.

      Recommendations:

      • A suggestion - revise sub-titles in the results section to be more 'results-oriented' e.g. rather than 'the role of MftG in growth and metabolism of mycobacteria' consider instead 'MftG is critical for M. smegmatis capacity to utilise ethanol as a sole carbon source for growth' or something similar.

      In principle this is a good idea for many manuscripts. However, we have the impression that this approach does not reflect the complexity and additive aspect of the sections of our manuscript.

      • For clarity, revise all figures to include p-values in the figure legend rather than above the figures (use asterisks to indicate significance).

      We are not sure whether the deletion of p-values in the figures would enhance clarity. We would prefer to leave them within figures.

      • Figure 5B -revise colour legend, it is unclear which bar on the graph corresponds to which strain.

      The figure legend was enlarged to enhance readability.

      • Page 8 - MftG and MftC should be lowercase and italicised as the authors are writing about the co-occurrence of genes encoded in genomes, not proteins.

      Good point, we changed some instances of MftG / MftC to mftG / mftC, to more specifically refer to the gene level. However, in some cases, the protein level is more appropriate, for instance, the phylogenies are based on protein sequences. That is why we used the spelling MftG / MftC in these cases.

      • Page 9 - for clarity move Figure 3 after first in text citation.

      We moved Figure 3.

      • Page 17 - for clarity move Figure 5 after first in text citation.

      We moved Figure 5. We furthermore reformatted the figure legend to fit onto the same page as the figure.

      • Page 20, line 17 - 'was attempted' change to 'was performed'. The authors did more than attempt purification, they succeeded!

      Since purification of MftG was not successful, we prefer the term “attempted” here. However, activity assays indeed indicate successful production of MftG.

      • Page 20, line 19-21 - data showing that the MftG-HIS6 complements ∆mftG could be included in supplementary information.

      Complementation was obvious by growth on media containing ethanol as a sole carbon source.

      • Page 26 line 25 - 'we also we' delete duplicated we.

      Thank you for the hint, we deleted the second instance of “we” in the manuscript.

      • Page 26 Line 26 - 'mycofactocinols were oxidised to mycofactocinols', should this read mycofactocinols were oxidised to mycofactocinones?

      Correct. We changed “mycofactocinols” to “mycofactocinones”.

      • Page 28 line 17, huc hydrogenase operon

      We added (“huc operon”).

      • Page 38 line 24, 'Two' not 'to'.

      This is a misunderstanding. “To” is correct.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.  

      Most published studies are from men living with HIV-1 subtype B and are not from the hyperacute infection phase, so a direct head-to-head comparison with the FRESH study is difficult. However, we can contrast our study with a few examples from acute infection studies as follows.

      a. Peluso et al., JCI, 2020, showed that in Caucasian men (SCOPE study) with subtype B infection who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year. In our study, we showed that in chronic treated participants genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      b. White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the rate of decay of intact viral genomes in the first phase of decay was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      c. A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of predominantly HIV-1 CRF01-AE subtype compared HIV-reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while there was a sharp decay in these infected cells in all acutely treated individuals during the first 12 weeks of therapy. Rates of decay were not provided and therefore a direct comparison with our data from the FRESH cohort is not possible.

      d. A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in an acute treated (within 100 days) and chronic treated (>180 days), predominantly male subtype B cohort. In comparison to the FRESH chronic treated group, they showed that in chronic treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated. In acute treated infection, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated. The differences in the frequency of hypermutation could be explained by differences in timing, specifically in the acute treated groups, where FRESH participants initiated ART at a median of 1 day after infection. It is also possible that sex- or race-based differences in immunological factors that impact the reservoir may play a role.

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is the site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      e. In 2017, Hiener et al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute treated and 3 chronic treated male study participants with subtype B HIV. Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses in participants who initiated therapy during acute/early infection at 6% (94% defective) and chronic infection at 3% (97% defective). In contrast, the frequencies in FRESH in acute treated participants were 52% intact and 48% defective, and in chronic infection were 13% intact and 87% defective. These differences could be attributed to the timing of treatment initiation, which in the aforementioned study ranged from 0.6-3.4 months after infection for early treatment.
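      The decay rates quoted in points a and b are expressed in different units (percent per year, percent per month, log10 copies per month). The sketch below shows how such figures relate to one another under the simplifying assumption of a single, constant exponential decay. That assumption is ours and is not made in the response; the FRESH rates are reported only for the first 6 months of treatment, so extrapolating them further is purely illustrative. All function names are hypothetical.

```python
import math


def half_life(fraction_lost_per_period: float) -> float:
    """Half-life, in the same period units, for a constant fractional loss per period."""
    if not 0.0 < fraction_lost_per_period < 1.0:
        raise ValueError("fraction lost per period must lie strictly between 0 and 1")
    return math.log(2) / -math.log(1.0 - fraction_lost_per_period)


def fraction_lost_from_log10(log10_drop_per_period: float) -> float:
    """Convert a decline given in log10 units per period into a fractional loss."""
    return 1.0 - 10.0 ** (-log10_drop_per_period)


def rescale(fraction_lost_per_period: float, n_periods: float) -> float:
    """Fractional loss after n_periods of the same constant per-period loss."""
    return 1.0 - (1.0 - fraction_lost_per_period) ** n_periods


if __name__ == "__main__":
    # Rates quoted in the response; the conversions assume constant exponential decay.
    print("25% per month   -> half-life in months:", half_life(0.25))
    print("15.7% per year  -> half-life in years: ", half_life(0.157))
    print("0.31 log10/month -> fraction lost per month:", fraction_lost_from_log10(0.31))
    print("15.7% per year  -> fraction lost per month:", rescale(0.157, 1 / 12))
```

      Putting rates on a common scale in this way makes clear that a monthly percentage during early ART and a yearly percentage during long-term ART describe very different phases of decay and should be compared with caution.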

      (2) Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size. 

      We have edited the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly (lines 443-444). We have now performed an analysis of area under the curve to compare viral burden in the two study groups and found associations with proviral DNA levels after one year. This has been added to the results section (lines 162-163).
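      For readers unfamiliar with the measure, an area-under-the-curve viral burden can be approximated with the trapezoidal rule over the sampled time points. The response does not state how the AUC was computed, so the sketch below is an assumption rather than the authors' method; the function name, the use of log10-transformed viral loads and the example measurements are all hypothetical.

```python
from typing import Sequence


def trapezoid_auc(times: Sequence[float], values: Sequence[float]) -> float:
    """Area under a sampled curve by the trapezoidal rule.

    times  : sampling times in strictly increasing order (e.g. days since detection)
    values : measurement at each time (e.g. log10 HIV RNA copies/mL)
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matched time/value pairs")
    points = list(zip(times, values))
    auc = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t1 <= t0:
            raise ValueError("times must be strictly increasing")
        auc += (t1 - t0) * (v0 + v1) / 2.0
    return auc


if __name__ == "__main__":
    # Hypothetical participant: days since detection and log10 viral load.
    days = [0, 7, 14, 28, 90]
    log10_vl = [4.5, 3.8, 2.9, 1.7, 1.3]
    print("AUC (log10 copies/mL x days):", trapezoid_auc(days, log10_vl))
```

      Unlike a single peak value, this summary reflects both the height and the duration of viremia, which is why the reviewer suggested it as a more relevant comparison between the two groups.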

      Reviewer #2 (Public reviews):

      (1) Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role. However, participants in this study were of similar age (18-23), and information on co-morbidities and coinfections is not available.

      Reviewer #3 (Public reviews):

      (1) The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and where appropriate have amended the use of the word reservoir to only refer to the proviral load after full viral suppression, i.e., undetectable viral load.

      (2) All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties on request. We have now added a section on data availability (lines 489-491).

      (3) The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represents an overall analysis of all sequences available for each participant at all time points.  This has now been explained more clearly in the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) It is recommended that the introduction includes information to set the scene regarding what is currently reported on the composition of the reservoir for those not in the immediate field of study i.e., the reported percentage of defective genomes and in which settings/populations genome intactness has been mapped, as this remains an area of limited information.

      We have now included a summary of other reported findings in the field in the introduction (lines 89-92, 94-98) and discussion (lines 345-350). A more detailed overview has been provided in the response to public reviews.

      (2) It may be beneficial to state in the main text of the paper what the purpose of the Raltegravir was and that it was only administered post-suppression. Looking at Table 1, only the hyperacute treatment group received Raltegravir and this could be seen as a confounder as it is an integrase inhibitor. Therefore, this should be explained.

      Once Raltegravir became available in South Africa, all new acute infections in the study cohort had an intensified 4-drug regimen that included Raltegravir.  A more detailed explanation has now been included in the methods section (lines 435-437).

      (3) Can the authors explain why the viral measures at 6 months post-ART are not shown for chronic-treated individuals in Figure 1 or reported on in the text?

      The 6 months post-ART time point has been added to Figure 1.

      (4) Can the authors indicate in the discussion, how the breakdown of proviral composition compares to subtype B as reported in the literature, for example, are the common sites of deletion similar, or is the frequency of hypermutation similar?

      Added to discussion (lines 345-350).

      (5) Do the numbers above the bars in Figure 3 represent the number of sampled genomes? If so, this should be stated.

      Yes, the numbers above the bars represent the number of sampled genomes. This has been added to the Figure 3 legend.

      (6) In the section starting on line 141, the introduction implies a comparison with immunological features, yet what is being compared are markers of clinical disease progression rather than immune responses. This should be clarified/corrected.

      This has been corrected (line 153).

      (7) Line 170 uses the term 'immediately' following infection, however, was this not 1-3 days after?

      We have changed the word “immediately” to “1-3 days post-detection” (line 181).

      (8) Can the sampling time-points for the two groups be given for the longitudinal sequencing analysis?

      The sequencing time points for each group are depicted in Figure 2.

      (9) Line 183 indicates that intact genomes contributed 65% of the total sequence pool, yet it's given as 35% in the paragraph above. Should this be defective genomes?

      Yes, this was a typographical error.  Now corrected to read “defective genomes” (line 193).

      (10) The section on decay kinetics of intact and defective genomes seems to overlap with the section above and would flow better if merged.

      Well noted; however, we chose to keep these sections separate.

      (11) Some references in the text are given in writing instead of numbering.

      This has been corrected.

      (12) In the clonal expansion results section, can it be indicated between which two time-points expansion was measured?

      This analysis was performed with all sequences available for each participant at all time points.  We have added this explanation to the respective Figure legend.

      Reviewer #2 (Recommendations for The Authors):

      (1) The statement on line 384 "Our data showed that early ART...preserves innate immune factors" - what innate immune factors are being referred to?

      We have removed this statement.

      (2) HLA genotyping methods are not included in the Methods section

      Now included and referenced (lines 481-483).

      (3) Are CD4:CD8 ratios available for the cohorts? This could be another informative clinical parameter to analyse in relation to HIV-1 proviral load after 1 year of ART – as done for the other variables (peak VL, and the CD4 measures).

      Yes, CD4:CD8 ratios are available. We performed the recommended analysis but found no associations with HIV-1 proviral load after 1 year of ART. We have added this to the results section (lines 163-164).

      (4) Reference formatting: Paragraph starting at line 247 (Contribution of clonal expansion...) - the two references in this paragraph are not cited according to the numbering system as for the rest of the manuscript. The Lui et al, 2020 reference is missing from the reference list - so will change all the numbering throughout.

      This has been corrected.

      Reviewer #3 (Recommendations for The Authors):

      (1) To allow comparison to past work. I suggest changing decay using % to half-life. I would also mention the multiple studies looking at total and intact HIV DNA decay rates in the intro.

      We do not have enough data points to get a good estimate of the half-life and therefore report decay as percentage per month for the first 6 months.

      (2) Line 73: variability is the wrong word as inter-individual variability is remarkably low. I think the authors mean "difference" between intact and total.

      We have changed the word variability to difference as suggested.

      (3) Line 297: I am personally not convinced that there is data that definitively shows total HIV DNA impacting the pathophysiology of infection. All of this work is deeply confounded by the impact of past viremia. The authors should talk about this in more detail or eliminate this sentence.

      We have reworded the statement to read “Total HIV-1 DNA is an important biomarker of clinical outcomes.” (Lines 308-309).

      (4) Line 317: There is no target cell limitation for reservoir cells. The vast majority of CD4+ T cells during suppressive ART are uninfected. The mechanism limiting the number of reservoir cells is necessarily not target cell limitation.

      We agree. The statement this refers to has been reworded as follows: “Considering, that the majority of CD4 T cells remain uninfected it is likely that this does not represent a higher number of target cells, and this warrants further investigation.” (lines 325-326).

      (5) Line 322: Some people in the field bristle at the concept of total HIV DNA being part of the reservoir as defective viruses do not contribute to viremia. Please consider rephrasing. 

      We acknowledge that there are differing opinions regarding total HIV DNA being part of the reservoir, as defective viruses do not contribute to viremia. However, defective HIV proviruses may contribute to persistent immune dysfunction and T cell exhaustion, which are associated with comorbidities and adverse clinical outcomes in people living with HIV. We have explained in the text that total HIV-DNA does not distinguish between replication-competent and -defective viruses that contribute to the viral reservoir.

      (6) Line 339: The under-sampling statement is an understatement. The degree of under-sampling is massive and biases estimates of clonality and sensitivity for intact HIV. Please see and consider citing work by Dan Reeves on this subject.

      We agree and have cited work by Dan Reeves (line 358).

      (7) Line 351: This is not a head-to-head comparison of biphasic decay as the Siliciano group's work (and others) does not start to consider HIV decay until one year after ART. I think it is important to not consider what happens during the first year of ART to be reservoir decay necessarily.

      Well noted.

      (8) Line 366-371: This section is underwritten. In nearly all PWH studies to date, observed reservoirs are highly clonal.

      We agree that observed reservoirs are highly clonal but have not added anything further to this section.

      (9) It would be nice to have some background in the intro & discussion about whether there is any a priori reason that clade C reservoirs, or reservoirs in South African women, might differ (or not) from clade B reservoirs observed in different study participants.

      We have now added this to the introduction (lines 94-103).

      (10) Line 248: This sentence is likely not accurate. It is probable that most of the reservoir is sustained by the proliferation of infected CD4+ T cells. 50% is a low estimate due to under-sampling leading to false singleton samples. Moreover, singletons can also be part of former clones that have contracted, which is a natural outcome for CD4+ T cells responding to antigens &/or exhibiting homeostasis. The data as reported is fine but more complex ecologic methods are needed to truly probe the clonal structure of the reservoir given severe under sampling.

      Well noted.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The present study's main aim is to investigate the mechanism of how VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetics, transcriptomics, proteomics, and ultrastructural and biochemical methods. Several observations were made to link VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, providing clues that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier for the entry of antibiotics, targeting VirR might weaken the permeability of the pathogen along with the stimulation of the immune system due to enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study is an essential area of TB biology.

      We thank the reviewer for the kind assessment of our paper.  

      Strengths: 

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of Cryo-EM technology confirmed increased thickness and altered arrangement of CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species. 

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins. 

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant. 

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain. 

      Altogether, the work is comprehensive, experiments are designed well, and conclusions are made based on the data generated after verification using multiple complementary approaches.

      We are glad that this reviewer finds our study of interest and well designed.   

      Weaknesses: 

      (1) The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. The authors suggest that EV release occurs during cell division when PG is most fragile. However, this has yet to be tested in the manuscript - the AFM of the VirR mutant, which produces thicker PG with more pore density, displays enhanced vesiculogenesis. No evidence was presented to show that the PG of the mutant is fragile, and there are differences in cell division to explain increased vesiculogenesis. These observations, counterintuitive to the authors' hypothesis, need detailed experimental verification.

      We concur with the reviewer that we do not have direct evidence showing a more fragile PG in the virR mutant and that our statement is supported by a compendium of different results. However, this statement is framed in the discussion section as a possible scenario, acknowledging that more experiments are needed to make such a connection. Nevertheless, we provide additional data on the molecular characterization of virRmut PG using MS to show a significant increase in the abundance of deacetylated muropeptides, a feature that has been linked to altered lysozyme sensitivity in other unrelated Gram-positive bacteria (Fig 8 G,H).

      (2.1) Transcriptomic data add little of substance. Transcriptomic data do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription.

      We concur with the reviewer that the information provided by transcriptomics and proteomics is somewhat fragmented and, given the low correlation between both datasets, does not help to explain the phenotype observed in the mutant. This issue has also been raised by another reviewer, so we have paid special attention to it.

      To refine the biological interpretation of the transcriptomic data, we have integrated the complemented strain (virRmut-Comp) into our analyses. This led us to narrow down the virR-dependent transcriptomic signature to the sets of genes that appear simultaneously deregulated, in either direction, in virRmut with respect to both the WT and the complemented strain. Furthermore, to identify the transcription factors whose regulatory activity appears disrupted in the mutant strain, we have resorted to an external dataset (Minch et al. 2015) and found a set of 10 transcriptional regulators whose regulons appear significantly impacted in the virRmut strain. While admittedly these improvements do not fully address the question raised by the reviewer, we found that they contribute to a more precise characterization of the VirR-dependent transcriptional signatures, as well as of the regulons in the genome-wide transcriptional regulatory network of the pathogen that appear altered because of virR disruption. We acknowledge that the lack of correlation between whole-cell lysate proteomics and transcriptomic data is intriguing, albeit not uncommon in Mycobacterium tuberculosis. However, differences in the protein cargo of the vesicles from different strains share key pathways with the transcriptomic analyses, such as the enrichments in cell wall biogenesis and peptidoglycan biosynthesis that are observed among the genes downregulated in virRmut in both cases.

      (2.2) TLCs of lipids are not quantitative. For example, the TLC image of PDIM is poor; quantitative estimation needs metabolic labeling of lipids with radioactive precursors. Further, change in PDIMs is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl-CoA).

      We also agree with the reviewer that TLC, as performed here, is not quantitative. However, we do not have access to radioactive procedures. In the new version of the manuscript, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Our results show a reduction in the pool of SL and DATs in the mutant, indicating that part of the methylmalonyl pool is diverted to the synthesis of PDIMs.

      (3) The connection of cholesterol with cell wall permeability is tenuous. Cholesterol will serve as a carbon source and contribute to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and import/export of drugs. Authors should investigate whether restoration of the normal level of permeability and EV release is not due to the maintenance of cell wall lipid balance upon cholesterol exposure of the VirR mutant.

      We concur with the reviewer that cholesterol as a sole carbon source introduces many changes in Mtb cells besides permeability. Consequently, we investigated the virRmut lipid profile upon exposure to either cholesterol or TRZ (Fig S8). Both WT and virRmut-Comp strains were included in the analysis. Polar lipid analysis revealed that either cholesterol or TRZ exposure induced a marked reduction in PIMs and cardiolipin (DPG) levels in virRmut relative to WT or complemented strains (Fig S8A). Analysis of apolar lipids indicated that, relative to glycerol MM, virRmut cultured in the presence of cholesterol or TRZ showed reduced levels of TDM and DATs compared to WT and virRmut-Comp strains (Fig S8B). These results suggest a lack of correlation between modulation of cell permeability by cholesterol and TRZ and lipid levels in the absence of VirR.

      Furthermore, regarding this section, we would like to mention that we have changed the reference used for the annotation of the DosR regulon, moving from the definition used in the previous submission (Rustad et al., "The enduring hypoxic response of Mycobacterium tuberculosis", PLoS One 3(1), e1502 (2008)) to the more recent characterization of the regulon based on ChIP-seq data reported in Minch et al. 2015. This was done to ensure coherence with the transcriptomics analyses in the new figure 4.

      (4) Finally, protein interaction data is based on experiments done once without statistical analysis. If the interaction between VirR and LCP protein is expected on the mycobacterial membrane, how the SPLIT_GFP system expressed in the cytoplasm is physiologically relevant. No explanation was provided as to why VirR interacts with the truncated version of LCP proteins and not with the full-length proteins.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript this assay has successfully been applied to interrogate interactions of domains of proteins embedded in the membrane of mycobacteria. Therefore, we believe that this assay is valid to interrogate interactions between Lcp proteins.

      Reviewer #2 (Public Review): 

      Summary: 

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis. 

      Strengths: 

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in Mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays such as targeted metabolomics, PG isolation, analysis of Diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with PG-AG ligating canonical LCP proteins, the authors have established that VirR is a central scaffold at the cell envelope remodelling process which is critical for MEV production. 

      We thank the reviewer for the kind assessment of the paper.

      Weaknesses: 

      Throughout the study, the authors have utilized a CRISPR knockout of VirR. VirR is a non-essential gene for the growth of Mtb; a null mutant of VirR would have been a better choice for the study. 

      According to Tn mutant databases and CRISPR databases, virR is a non-essential gene. However, we have tried many times, without success, to interrupt this gene using the phage-mediated allelic exchange approach. So far there is no precedent for a clean KO mutant in this gene. White et al. generated a virR mutant consisting of a deletion of a large fragment of the C-terminal part of the protein, essentially replicating the effect of the Tn insertion site in the virR Tn mutant. These precedents led us to switch to CRISPR technology.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) The authors monitored cell lysis by measuring the release of a cytoplasmic iron-responsive protein (IdeR). Since EV release is regulated by iron starvation, which is directly sensed by IdeR, another control (unrelated to iron) is needed. A much better approach would be to use hydrophobic/hydrophilic probes to measure changes in the cell wall envelope.

      Does the VirR complemented strain have a faint IdeR band in the supernatant? The authors need to clarify. Also, it's unclear whether the complementation restored normal VirR levels or not. 

      We thank the reviewer for this recommendation. Consequently, we have complemented these studies with an alternative approach based on serially diluted cultures spotted on solid medium. These results align very well with those of the western blot using IdeR levels in the supernatant as a surrogate of cell lysis.

      We also noticed the presence of a faint IdeR band in the supernatant of the complemented strain, suggestive of possible cell lysis. However, as shown in another section, this did not translate into increased levels of vesiculation. As shown in a previous paper describing VirR as a genetic determinant of vesiculogenesis, VirR levels in the complemented strains are not just restored but increased considerably. This overexpression could explain the potential artifact of a leaky phenotype in the complemented strain. In addition to that previous study, the proteomic data included in this paper clearly show a restoration of VirR levels relative to the WT strains.

      (2) Figure 2C: The data are weak; I don't see any difference in incorporating FDAAs in MM media. Even in the 7H9 medium, differences appear only at the last time point (20 h). What happens at the time point after 20 h (e.g., 48 h)? How do we differentiate between defective permeability or anabolism leading to altered PG? No statistical analysis was performed.

      We apologize for the incomplete assessment of the results in this figure. First, this figure just shows differential incorporation of FDAAs in the different strains in different media. As per previous studies (Kuru et al (2017) Nat. Protocols), these probes can freely enter cells and may be incorporated into PG by at least three different mechanisms, depending on the species: through the cytoplasmic steps of PG biosynthesis and via two distinct transpeptidation reactions taking place in the periplasm. Consequently, the differential labeling observed in virRmut relative to the WT strain may be a consequence of the enlarged PG observed in the mutant. We have repeated the experiment and generated new data. First, we cultured strains with a blue FDAA (HADA) for 48 h to ensure full labeling. Then, we washed the cells and cultured them in the presence of a second FDAA, this time green (FDL), for 5 h. The differential incorporation of FDL relative to HADA was then measured under the fluorescence microscope. This experiment showed that virRmut incorporates more FDL than the other strains, suggesting altered PG remodeling. We have also modified the figure to make the early and late time points of the time-course clearer and applied statistics.

      (3) Many genes (~ 1700) were deregulated in the mutant. Since these transcriptional changes do not correlate at the protein level in WCL, it's important to determine VirR-specificity. RNA-Seq of VirR complemented strain is important.

      We think this was an extremely important point, and we thank the reviewer for pointing this out. Following their suggestion, we have analyzed and integrated data from the complemented strain, which we have added to the GEO submission, to conclude that, in fact, differences in expression between the complemented strain and either the WT or virRmut are also common and highly significant. Although this is not completely unexpected, given the nature of our mutants and the fact that the complemented strains show significantly higher levels of expression of VirR (at both the RNA and protein levels) than the WT, it motivated us to narrow down our definition of VirR-dependent genes and adopt a combined criterion that integrates the complemented strain. Following this approach, we considered the set of genes upregulated (downregulated) in virRmut as those whose expression in that strain is, at the same time, significantly higher (lower) than in WT as well as in virRmut-Comp. Working with this integrated definition, the genes considered (399 upregulated and 502 downregulated genes) are those whose observed expression changes are more likely to be genuinely VirR-dependent rather than a non-specific consequence of the mutagenesis protocols. Despite the lower number of genes in these sets, the repetition of all our functional enrichment analyses based on this combined criterion leads us to conclusions that are largely compatible with those presented in the first version of the paper.
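
      The combined criterion described above amounts to intersecting two differential-expression calls per direction. The following is a minimal sketch of that logic, not the authors' pipeline; the function name, argument names, and gene identifiers are hypothetical placeholders.

```python
def combined_signature(up_vs_wt, up_vs_comp, down_vs_wt, down_vs_comp):
    """Return (upregulated, downregulated) gene sets under a combined criterion.

    A gene is kept only if it is significantly deregulated in the mutant, in
    the same direction, relative to BOTH the wild type and the complemented
    strain. Inputs are iterables of gene identifiers already called
    significant by an upstream differential-expression analysis.
    """
    upregulated = set(up_vs_wt) & set(up_vs_comp)
    downregulated = set(down_vs_wt) & set(down_vs_comp)
    return upregulated, downregulated


# Hypothetical gene identifiers, for illustration only.
up, down = combined_signature(
    up_vs_wt={"geneA", "geneB", "geneC"},
    up_vs_comp={"geneB", "geneC", "geneD"},
    down_vs_wt={"geneX", "geneY"},
    down_vs_comp={"geneY", "geneZ"},
)
```

      Requiring agreement against both reference strains shrinks the gene sets but filters out changes that are specific to only one comparison.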

      (4) Transcriptome data provide no clues about how VirR could mediate expression deregulation. Is there an overlap with the regulations/regulons of any Mtb transcription factors? One clue is DosR; however, DosR only regulates 50-60 genes in Mtb. 

      Again, we would like to thank the reviewer for this recommendation, which we have followed to generate a new section in the results named “VirR-dependent genes intersect the regulons of key transcriptional regulators of the responses to stress, dormancy, and cell wall remodeling”. As we explain in this new section, we resorted to the regulon annotations reported in (Minch et al. 2015), where ChIP-seq data were collected on binding events between a panel of 143 transcription factors (TFs) and DNA genome-wide. The dataset includes 7248 binding events between regulators and DNA motifs in the vicinity of targets’ promoters. After completing enrichment analyses with the resulting regulons, we identified 10 transcription factors whose intersections with the sets of up- and downregulated genes in virRmut were larger than expected by chance (one-tailed Fisher exact test, OR>2, FDR<0.1). Those regulators, which, as guessed by the referee, included DevR, control key pathways related to cell wall remodeling, stress responses, and transition to dormancy.
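
      For readers unfamiliar with this kind of enrichment test, the sketch below shows one way a one-tailed Fisher exact test with Benjamini-Hochberg correction could be applied to regulon overlaps. It is an illustration under our own assumptions, not the authors' code: the function names and the counts in the example are hypothetical, and the choice of Benjamini-Hochberg as the FDR procedure is an assumption, since the response only states "FDR<0.1". Only the thresholds (OR>2, FDR<0.1) come from the text.

```python
from math import comb


def fisher_enrichment(overlap, regulon_size, de_size, genome_size):
    """One-tailed (enrichment) Fisher exact test for a regulon/DE-set overlap.

    Returns (odds_ratio, p_value), where p_value is the hypergeometric
    probability of observing an overlap at least this large by chance.
    """
    a = overlap                                   # in regulon, deregulated
    b = regulon_size - overlap                    # in regulon, not deregulated
    c = de_size - overlap                         # deregulated, not in regulon
    d = genome_size - regulon_size - de_size + overlap
    if min(a, b, c, d) < 0:
        raise ValueError("inconsistent counts")
    total = comb(genome_size, de_size)
    tail = sum(
        comb(regulon_size, k) * comb(genome_size - regulon_size, de_size - k)
        for k in range(overlap, min(regulon_size, de_size) + 1)
    )
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                  # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_regulators(overlaps, de_size, genome_size, min_or=2.0, max_fdr=0.1):
    """overlaps maps regulator name -> (overlap, regulon_size); counts are hypothetical."""
    names = list(overlaps)
    stats = [fisher_enrichment(*overlaps[n], de_size, genome_size) for n in names]
    fdr = benjamini_hochberg([p for _, p in stats])
    return [n for n, (odds, _), q in zip(names, stats, fdr) if odds > min_or and q < max_fdr]
```

      In practice a library routine such as `scipy.stats.fisher_exact` with `alternative="greater"` would normally replace the hand-written test; the pure-Python version is used here only to keep the example self-contained.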

      (5) How many proteins that are enriched or depleted in the EVs of the VirR mutant also affected transcriptionally in the mutant? How does VirR regulate the abundance and transport of protein in EVs? 

      While the intersection between genes and proteins that appear upregulated in the virRmut strain at both the transcriptional and vesicular protein levels (N=21) was found to be larger than expected by chance (OR=2.0, p=7.0E-3), downregulated genes and proteins in virRmut (N=14) were not enriched in each other. These results indicated, at most, a scarce correlation between RNA and protein levels (a phenomenon nonetheless previously observed in Mycobacterium tuberculosis, among other organisms; see Cortés et al. 2013). Admittedly, the compilation of these omics data is insufficient, by itself, to pinpoint the specific regulatory mechanisms through which the absence of VirR impacts protein abundance in EVs. For the sake of transparency, this has been acknowledged in the discussion section of the resubmitted version of the manuscript.

      (6) The assumption that a depleted pool of methylmalonyl CoA is due to increased utilization for PDIM biosynthesis is problematic. Without flux-based measurement, we don't know if MMCoA is consumed more or produced less, more so because Acc is repressed in the VirR mutant EVs. Further, MMCoA feeds into the TCA cycle and other methyl-branched lipids. Without data on other lipids and metabolism, the depletion of MMCoA is difficult to explain.

      The differential expression statistics compiled suggest that both effects may be at play, since we observed, at the same time, a downregulation of enzymes controlling methylmalonyl synthesis from propionyl-CoA (i.e. Acc, at the protein level), as well as an upregulation of enzymes related to its incorporation into DIM/PDIMs (i.e. pps genes). Both effects, combined, would favor an increased rate of methylmalonyl production, and a slower depletion rate, thus contributing to the higher levels observed. We however concur with the reviewer that fluxomics analyses will help to shed light on this question in a more decisive manner, and we have acknowledged this in the discussion section too.

      (7) Figure 5: Deregulation of rubredoxins and copper indicates impaired redox balance and respiration in the mutant. The data is complex to connect with permeability as TRZ is mycobactericidal and also known to affect the respiratory chain. The authors need to investigate if, in addition to permeability, the presence of VirR is essential for maintaining bioenergetics.

      The data related to rubredoxins and copper have been modified after reanalyzing the transcriptomic data including the complemented strain. Nevertheless, we found that some features of the response to stresses may be impaired in the mutant, including the response to oxidative stress. In this regard, we found enhanced sensitivity of the mutant to H2O2 relative to WT and complemented strains. This piece of data is now included as Fig S3 in the new version of the manuscript.

      (8) Differential regulation of DosR regulon and cholesterol growth could also be linked to differences in metabolism, redox, and respiration. What is the phenotype of VirR mutants in terms of growth and respiration in the presence of cholesterol/TRZ?

      We thank the reviewer for this suggestion. Consequently, we have added a new section to Results that suggests that other aspects of mycobacterial physiology may be affected in the virR mutant when cultured in the presence of cholesterol or TRZ:

      “Modulation of EV levels and permeability in virRmut by cholesterol and TRZ. We next wondered what effect culturing virRmut on either cholesterol or TRZ could have on cell growth, permeability and EV production. Cholesterol has also been shown to affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability (Lu et al., 2017). We monitored the growth of virRmut cultured for 20 days in MM supplemented with glycerol, with cholesterol as a sole carbon source, or with glycerol and TRZ at 3 µg ml-1. While cholesterol significantly enhanced the growth of virRmut after 5 days relative to glycerol medium, supplementation of glycerol medium with TRZ restricted growth during the whole time-course (Fig S5A). The study of cell permeability in the same conditions indicated that the enhanced cell permeability observed in glycerol MM was reduced when virRmut was cultured with cholesterol as sole carbon source. Conversely, the presence of TRZ increased cell permeability relative to the medium containing solely glycerol (Fig S5C). As we have previously observed for the WT strain, either condition (Chol or TRZ) also modified vesiculation levels in the mutant accordingly (Fig S5B). These results strongly indicate that other aspects of mycobacterial physiology besides permeability are also affected in the virR mutant and may contribute to the observed enhanced vesiculation.”

      (9) PDIM TLC is not evident; both DimA and DimB should be clearly shown. It will also be necessary to show other methyl-branched lipids, such as SL-1 and PAT/DAT, because the increase in PDIM can take away methyl malonyl CoA from the biosynthesis of SL-1 and PAT/DAT. Studies have shown that SL-1, PAT/DAT, and PDIM are tightly regulated, where an increase in one lipid pool can affect the abundance of other lipids. Quantitative assays using 14C acetate/propionate are most appropriate for these experiments.

      We apologize that the TLC analysis was not performed with radioactive labeling; we do not have access to this approach. To address the reviewer's question of whether other methyl-branched lipids may explain the altered flux of methyl malonyl CoA, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Notably, we observed a reduction in the level of these lipids (SL1 or PAT/DAT) in virRmut cultured in glycerol relative to WT and complemented strains, suggesting that the excess of PDIM synthesis can take away methyl malonyl CoA from the biosynthesis of SL-1 and PAT/DAT in the absence of VirR (Fig S8B).

      (10) Figure 8: Interaction between VirR and Lcp proteins. Since these interactions are happening in the membrane, using a split GFP system where proteins are expressed in the cytoplasm is unlikely to be relevant.

      Also, experiments on Figure 8C are performed once, and representation needs to be clarified; split GFP needs a positive control, and negative control (CtpC) is not indicated in the figure.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript this assay has successfully been applied to interrogate interactions of domains of proteins embedded in the membrane of mycobacteria. Therefore, we believe that this assay is valid to interrogate interactions between Lcp proteins.

      Reviewer #2 (Recommendations For The Authors):  

      (1) Authors should consider making more effort to mine the omics data and integrate them. Given the amount of data that is generated with the omics, they need to be looked at together to find out threads that connect all of them. 

      In the resubmitted version of the paper, we have followed the reviewer's recommendation by incorporating new analyses that integrate the virRmut-C strain, and we have tried to place the differences found in the context of broader transcriptional regulatory networks (new figure 4), as well as in the context of metabolic pathways related to PDIM biosynthesis from methylmalonyl (figure 6I, already present in the first submission). We consider that these additions contribute to a deeper interpretation of the omics data, in line with what was suggested by the reviewer.

      (2) The interpretation given by authors in lines 387-390 is an interpretation that does not have sufficient support and, hence should be moved into discussion. 

      We thank the reviewer for this recommendation. We believe that these new analyses and integration studies now support the above statement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I recommend being explicit regarding how the animals were habituated to blood sampling.

      On lines 109-111 we have added a more detailed explanation of how mice were habituated to blood sampling. This includes details that mice were held and had their tails palpated for approximately 5 minutes per day.

      Were any mice excluded due to loss or movement of the implant over time? Any details to allow replication of long-term measurements like this should be included.

      No mice lost their cannulas during experimentation, and we have added a sentence to this effect on lines 303 to 304. We have also noted that there was a slight decrease in signal over the months of experimentation. A statement has also been added on line 318 clarifying that the two mice lost between the pregnancy and lactation stages of the experiment were euthanised due to dystocia.

      The text states that synchronized episodic activity reappeared as early as 3 days after birth, citing Figure 6c as evidence. There is no 6c. Figure 6b shows day 5 after birth.

      This has been corrected.

      The methods state mRNA levels had to be "above background" to be counted as colocalization. At how many fold/what percent above background was a cell considered positive for expression?

      Positive hybridisation was scored according to the manufacturer’s protocol and a statement to this effect has been added on line 144.

      Please ensure figure titles or the data graphs explicitly give the genotype of the mice in all figures (or state the mice are wildtype).

      Genotype has been added to figure titles where possible. Genotypes are always given in figure legends and tables and/or explicitly stated on the figure itself.

      Figure 4's title states events are "perfectly" correlated, which is a subjective term. I recommend saying "consistently" or "temporally" correlated, depending on your meaning.

      This has been amended to read “consistently correlated”

      Reviewer #2 (Recommendations For The Authors):

      The comments below aim to clarify the paper's methodology and results but do not detract from my overall enthusiasm for this work.

      - Given past studies demonstrating prolactin action in the brain, particularly the MPOA/MPN, is essential for maternal behavior, can the authors please clarify why this behavior is retained in the cam2a prlr knockout mice? The authors mention that prlr in the MPOA is only knocked down 50% compared to WT controls. Is this sufficient to retain maternal behavior?

      In our experience, 50% Prlr in the MPOA is sufficient to retain normal maternal behaviour in most animals, including the ones in this experiment (our original paper describing this showed relatively normal behaviour, for example, with vGAT- and vGlut-mediated knockouts, and even a double knockout; it was only when we achieved complete KO with an AAV-Cre that we saw failure of maternal behaviour: Brown et al, PNAS 3;114(40):10779-10784 2017). We have added a statement on lines 157-159 regarding this. We have an additional paper in preparation specifically characterising the maternal behaviour and lactation outcomes in this line of mice, and we find most animals display normal maternal behaviour, with slightly impaired milk production in later lactation.

      - Supplementary Figure 1. Can the authors please clarify the criteria for a cell to be positive for prlr? The methods state that the signal must be "above background level." How was the background measurement obtained? In the negative control?

      As per above, scoring of positive hybridisation was done according to the manufacturer’s protocol and a statement to this effect has been added on line 144.

      - Lines 310-314: This sentence describes RNAscope analysis of prlr knockdown in kisspeptin cells and refers to Extended Figure 3 - but I believe this is in Supplementary Figure 1.

      This has been corrected.

      - Figure 3-4: When mice return to estrous cycles, the amplitude of episodic kisspeptin neuron activity is the same as 24 hours after weaning, which appears much lower than in virgin females. Does this reach significance? If so, do the authors know why kisspeptin activity is still suppressed, and can they comment on why this may not affect estrous cyclicity?

      This does not reach significance – see Supplementary Table 1 (4C) for statistics. Therefore, no further analysis was done. This question would need to be examined with a follow-up experiment. Given the 5 s on, 15 s off scheduled mode of recording used here, amplitude was not an extremely accurate measure, and amplitude has been reported as relative within each mouse. There is also an additional issue of a gradual reduction in amplitude of signal over time in these long-term experiments – although it is true that much larger signals were detected after ovariectomy at the end of the experiment. At present, we have not tried to interpret whether the changes in amplitude are informative.

      - Fiber photometry studies: Please indicate whether a post-mortem examination of GCaMP transfection and fiber photometry placement was conducted, and what region of the ARC was imaged.

      Brains from these mice were collected; however, postmortem analysis of cannula placement and GCaMP6 transfection was not carried out in all mice. This was based on our experience with this method, in that the quality and characteristic pattern of activity seen, as well as corresponding LH secretion following an SE, was indicative of successful cannula placement and transfection. Incorrectly placed cannulae failed to show SEs. A trial was done with 3 mice and cannula placement was found to be in the caudal ARC (cARC) with GFP (attached to GCaMP) restricted to the cARC. A statement has been added on lines 306-313 regarding this.

      - Were male mice removed before birth? Please add to the methods section if not included.

      Yes, male mice were removed after a sperm plug was seen and were never present at parturition. We have inserted additional details on line 95 to this effect.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 172: n=7-8 per group, yet in Supplementary Figure 2, n=6 per group.

      These are referring to different groups of mice. N=7-8 is referring to the group size of mice in Figure 2 that were given mifepristone or vehicle control. In contrast the Supplementary figure 2 n number refers to the mice in the pilot study. Additional n number for the pilot study has been added on line 194.

      (2) Line 314: Extended = suppl; Figure 3 = 1.

      This has been corrected.

      (3) Line 451: Figure 6C, does not exist.

      This has been corrected.

      Line 590: Reference 23 could be replaced by Ordog T et al 1998 Am J Physiol 274,E665 because it is later and more relevant to the topic.

      This reference has been replaced with the suggested reference.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed; the main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Thank you very much for reading and commenting on our manuscript.

      Strengths:

      An extensive study on probiotic property of the Bacillus velezensis strain HBXN2020

      Thank you very much for reading and commenting on our manuscript.

      Weaknesses:

      The main results are descriptive without mechanistic insight. Additionally, most of the results and analysis parts are separated without a link or a story-telling way to deliver a concise message.

      Thank you for your comments and suggestions on our manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. The manuscript results and analysis sections have been extensively revised. We appreciate your review and feedback.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have potential benefit to serve as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate for B. velezensis - HBXN2020 and describe its genome (roughly 4 mb, 46% GC-content etc).

      Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected the strain in mice and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with that of uninfected mice. No significant differences in tissues, including the colon were observed.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory effects) of HBXN2020, the authors set out to investigate effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to that seen in healthy mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Thanks for the comments and the positive reception of the manuscript.

      (3) Mouse experiments are very convincing.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      Thank you for pointing this out. We apologize for any inconvenience caused by the lack of mechanistic research in the manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      (2) Mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

      We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Based on your suggestion, we have supplemented this in the revised manuscript. We appreciate your review and feedback, and have marked the updated contents in the revised manuscript.

      The updated contents are presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation.

      Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      Few studies about the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Thanks for your suggestion. This study serves as an exploratory investigation before the application of Bacillus velezensis. The main purpose of this study is to explore the potential of Bacillus velezensis in application. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Most of my previous comments are well addressed, here are a few examples.
      While in my last comment, I requested a Colitis Mouse Model, which will well resemble the diarrhea disease caused by Salmonella in mammals. The available statement is not convincing, please check https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2225501/, https://pubs.rsc.org/en/content/articlelanding/2020/fo/d0fo01017k please replace "colitis" with a normal infection model. The current statement is incorrect.

      Thank you for your valuable suggestion, which improves the quality of the manuscript. We have corrected this in the revised manuscript as suggested. We have marked the updated contents in the revised manuscript.

      The updated contents are presented in lines 2, 29, 38, 46, 48, 199, 204, 246, 248, 282, 307, 310, 316, 431, 433, 464, 466, 473, 494, 497, 499, 504, 513, 518, 525, 706, 710 and 735 of the revised manuscript.

      Certain parts remain to be overestimated, to my knowledge, the language and logical flow should be addressed thoroughly.

      Here are suggestions to improve the logical flow of the manuscript.

      (1) Probiotic sampling and isolation

      (2) in vitro assessment

      (3) genomic sequencing and in silico safety assessment (Crit Rev Food Sci Nutr. 2023;63(32):11244-11262), which should be included as a right ref.

      (4) in vivo assay for safety evaluation, but not biosafety (it has a different meaning!!)

      (5) infection model and protection assay.

      We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Following your suggestion, we have done our best to correct these problems in the revised manuscript. We would like to express our apologies once again and hope that the revised manuscript meets your expectations. We have marked the updated contents in the revised manuscript.

      Also, please pay attention to the logical link or transition sentences between each part to connect the dots in each part.

      We greatly appreciate your valuable comments, which improve the quality of the manuscript. Following your suggestion, we have corrected this in the revised manuscript. We have marked the updated contents in the revised manuscript.

      Finally, there are also lots of typos and errors, please improve through the text.
      For example, Line 521. "Stain", and more...

      Thanks for pointing this out. Based on your suggestion, we have corrected these errors in the revised manuscript. We have marked the updated contents in the revised manuscript.

      The updated contents are presented in lines 753, 1055 and 1087 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The revised manuscript by Wang and colleagues attempts to address concerns raised during the first round of review.

      All minor comments have been addressed and in general, the major concerns have been partially addressed in the revised manuscript.

      The outstanding concerns relate to the mechanistic basis of the observations. The authors made no attempt to address this in a meaningful manner. Secondly, the issue of comparing the responses to what would be standard therapy (such as anti-inflammatories) was also handled in a somewhat dismissive manner, referring to other ongoing/future work. The clinical utility of the findings are hard to ascertain if there is no comparison to the current gold standard therapeutic approach.

      I have no further suggestions for the authors, save for those previously made.

      Thank you for pointing this out. We apologize for any inconvenience caused by the lack of mechanistic research in the manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      Secondly, regarding the comparative trial of oral Bacillus spore treatment against the current gold standard for treatment, we have supplemented this in the revised manuscript. We appreciate your review and feedback, and have marked the updated contents in the revised manuscript.

      The updated contents are presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      This is a revision, they have addressed all my concerns, and now it is acceptable.

      Thank you very much for your comments and recognition of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript, Zhou et al describe a deaminase and reader protein-assisted RNA m5C sequencing method. The general strategy is similar to DART-seq for m6A sequencing, but the difference is that in DART-seq, m6A sites are always followed by C which can be deaminated by fused APOBEC1 to provide a high resolution of m6A sites, while in the case of m5C, no such obvious conserved motifs for m5C sites exist, therefore, the detection resolution is much lower. In addition, the authors used two known m5C binding proteins ALYREF and YBX1 to guide the fused deaminases, but it is not clear whether these two binding proteins can bind most m5C sites and compete with other m5C binding proteins.

      Thank you for your kind suggestion. RNA affinity chromatography and mass spectrometry analyses using biotin-labelled oligonucleotides with or without m5C were performed in previous reports (doi:10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y), and the results showed that ALYREF and YBX1 had a more prominent binding ability to m5C-modified oligonucleotides. Moreover, these two m5C-binding proteins are also responsible for mRNA m5C binding, so we chose to use their ability to bind m5C to construct a DRAM detection system aimed at transcriptome-wide m5C detection. We hope to propose a suitable detection strategy for RNA m5C, and there will certainly be room for optimization of the DRAM system in the future with more in-depth studies of m5C-binding proteins. We have discussed the above issue in lines 75-82 and 315-318.

      It is well known that two highly modified m5C sites exist in 28S RNA and many m5C sites exist in tRNA, the authors should validate their methods first by detecting these known m5C sites and evaluate the possible false positives in rRNA and tRNA.

      Thank you for your kind suggestion. We attempted PCR amplification of sequences flanking m5C sites 3782 and 4447 on 28S rRNA, as well as multiple m5C sites on tRNA, including m5C48 and m5C49 on tRNAVal, m5C48 and m5C49 on tRNAAsp, and m5C48 on tRNALys.

      However, Sanger sequencing revealed no valid mutations, as shown in Figure S3. We believe this outcome indicates that the DRAM system is more suited for transcriptome-wide m5C detection of mRNAs. This is supported by current reports that ALYREF and YBX1 act as m5C-binding proteins of mRNAs (doi:10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y). The above results and descriptions were added to lines 136-143.

      In mRNA, it is not clear what is the overlap between the technical replicates. In Figures 4A and 4C, they detected more than 10K m5C sites, and most of them did not overlap with sites uncovered by other methods. These numbers are much larger than expected and possibly most of them are false positives.

      Thank you for your kind suggestion. We observed significant overlap between the technical replicates by comparing the data across biological replicates, as shown in Figure S4C and described in lines 174-175. We considered a region to carry an m5C modification only when editing events were detected in at least two biological replicates, ensuring a high-stringency screening process (see the revised Methods, lines 448-455, and Figure 3F). With more in-depth research into m5C readers, we aim to achieve more accurate detection in the future.
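
      As an illustration only, the replicate criterion described above (keep a region only if editing events are seen in at least two biological replicates) could be sketched as follows. This is a minimal sketch, not the authors' pipeline: the function name, the region identifiers and the example data are all hypothetical.

```python
from collections import Counter


def high_confidence_regions(replicates, min_replicates=2):
    """Return regions with editing events in at least `min_replicates` replicates.

    `replicates` is a list of iterables, one per biological replicate, each
    holding hypothetical region identifiers where editing was detected.
    """
    counts = Counter()
    for regions in replicates:
        # Count each region at most once per replicate.
        counts.update(set(regions))
    return {region for region, n in counts.items() if n >= min_replicates}


# Hypothetical example data (not from the study).
rep1 = {"RPSA:100-140", "AP5Z1:200-240", "GENE_X:10-50"}
rep2 = {"RPSA:100-140", "GENE_Y:5-45"}
rep3 = {"AP5Z1:200-240", "RPSA:100-140"}

kept = high_confidence_regions([rep1, rep2, rep3])
```

      In this toy example, the regions seen in only one replicate (GENE_X and GENE_Y) are discarded, while the two regions supported by two or more replicates are retained.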

      Besides, it is not clear what is the detection sensitivity and accuracy since the method is neither single base resolution nor quantitative.

      Thank you for your suggestion. As shown in Figure 3G, we found that the editing window of the DRAM system exhibited enrichment of approximately 20 bp upstream and downstream of the m5C site. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these m5C motifs do not fully characterize all m5C sites, and their presence downstream of an m5C site is not guaranteed (doi: 10.1038/s41594-019-0218-x). This limitation complicates single-base resolution analysis by the DRAM system. Nevertheless, we believe that with further exploration of m5C sequence features, precise single-base resolution detection can be achieved in the future. This point is also discussed in lines 314-322.

      Regarding the quantitative level of the assay, we conducted additional experiments by progressively reducing the expression levels of the fusion proteins. Sanger sequencing revealed that the editing efficiency of A-to-G and C-to-U within the m5C region significantly decreased as fusion protein expression diminished (Figure S9). These findings suggest that the DRAM system's transfection efficiency is concentration-dependent and that the ratio of editing efficiency to transfection efficiency could aid in the quantitative analysis of m5C using the DRAM system. The relevant results were added as Figure S9 and are discussed in lines 263-271.

      There are no experiments to show that the detected m5C sites are responsive to the writer proteins such as NSUN2 and NSUN6, and the determination of the motifs of these writer proteins.

      Thank you for your kind suggestion. We have performed a motif enrichment analysis based on the sequences spanning 10 nt upstream and downstream of DRAM-editing sites. The relevant results of this analysis were added as Figure S4D and are described in lines 168-171. Unfortunately, we did not identify any clear sequence preferences for the m5C sites catalyzed by the methyltransferases NSUN2 and NSUN6, which have previously been associated with “G”-rich sequences and the “CUCCA” motif. This limitation is mainly due to the DRAM detection system’s inability to achieve single-base resolution for m5C detection, which is also explained in the above response.

      Reviewer #2:

      (1) The use of two m5C reader proteins is likely a reason for the high number of edits introduced by the DRAM-Seq method. Both ALYREF and YBX1 are ubiquitous proteins with multiple roles in RNA metabolism including splicing and mRNA export. It is reasonable to assume that both ALYREF and YBX1 bind to many mRNAs that do not contain m5C.

      To substantiate the author's claim that ALYREF or YBX1 binds m5C-modified RNAs to an extent that would allow distinguishing its binding to non-modified RNAs from binding to m5C-modified RNAs, it would be recommended to provide data on the affinity of these, supposedly proven, m5C readers to non-modified versus m5C-modified RNAs. To do so, this reviewer suggests performing experiments as described in Slama et al., 2020 (doi: 10.1016/j.ymeth.2018.10.020). However, using dot blots like in so many published studies to show modification of a specific antibody or protein binding, is insufficient as an argument because no antibody, nor protein, encounters nanograms to micrograms of a specific RNA identity in a cell. This issue remains a major caveat in all studies using so-called RNA modification reader proteins as bait for detecting RNA modifications in epitranscriptomics research. It becomes a pertinent problem if used as a platform for base editing similar to the work presented in this manuscript.

      We thank the reviewer for the valuable suggestion. Previous studies have shown that while ALYREF and YBX1 can bind mRNAs without the m5C modification, their binding affinity for m5C-modified oligonucleotides is significantly higher than for unmethylated controls. This has been demonstrated through experiments such as in vitro tractography, electrophoretic mobility shift assay (EMSA) (doi:10.1038/cr.2017.55), and UHPLC-MRM-MS/MS. Additionally, isothermal titration calorimetry measurements and PAR-CLIP experiments have shown that mutations in the key amino acids responsible for m5C binding in ALYREF and YBX1 result in a significant reduction in their ability to bind m5C (doi: 10.1038/s41556-019-0361-y).

      Although Me-RIP analysis was unsuccessful in our laboratory, likely due to the poor specificity of the m5C antibody, we alternatively performed RNA pulldown experiments. These experiments verified that the ability of DRAMmut-expressing proteins to bind RNA with m5C modification was virtually absent compared to DRAM-expressing proteins, while their binding ability with non-modified RNA was not significantly affected. The relevant RNA pulldown results were added as Figures S1E and S1F and are described in lines 110-111. Therefore, we believe that by integrating the DRAMmut group, our DRAM system could effectively exclude the false-positive mutations caused by unspecific binding of DRAM’s reader protein to non-m5C-modified mRNAs.

      (2) Since the authors use a system that results in transient overexpression of base editor fusion proteins, they might introduce advantageous binding of these proteins to RNAs. It is unclear, which promoter is driving construct expression but it stands to reason that part of the data is based on artifacts caused by overexpression. Could the authors attempt testing whether manipulating expression levels of these fusion proteins results in different editing levels at the same RNA substrate?

      Thank you for pointing this out. To investigate how different expression levels of these proteins influence A-to-G and C-to-U editing within the same m5C region, we conducted a gradient transfection using plasmid amounts of 1500 ng, 750 ng and 300 ng. This approach allowed us to progressively reduce the expression levels of the fusion proteins. Sanger sequencing revealed that the editing efficiency of A-to-G and C-to-U within the m5C region significantly decreased as fusion protein expression diminished. These findings suggest that the transfection efficiency of the DRAM system is concentration-dependent and that the ratio of editing efficiency to transfection efficiency may assist in the quantitative analysis of m5C using the DRAM system. The relevant results and hypotheses were added as Figure S9 and are discussed in lines 263-271 of the revised manuscript.

      (3) Using sodium arsenite treatment of cells as a means to change the m5C status of transcripts through the downregulation of the two major m5C writer proteins NSUN2 and NSUN6 is problematic and the conclusions from these experiments are not warranted. Sodium arsenite is a chemical that poisons every protein containing thiol groups. Not only do NSUN proteins contain cysteines but also the base editor fusion proteins. Arsenite will inactivate these proteins, hence the editing frequency will drop, as observed in the experiments shown in Figure 5, which the authors explain with fewer m5C sites to be detected by the fusion proteins.

      Thank you for pointing this out. We used bisulfite sequencing PCR to determine that the m5C levels in RPSA and AP5Z1 were significantly reduced after sodium arsenite treatment. This was followed by a significant decrease in editing frequency detected by the DRAM system in sodium arsenite-treated samples compared to untreated samples. This reduction aligns with the decreased editing efficiency observed in methyltransferase-deficient cells (as shown in Figures 2G and 2H), which initially convinced us that these results reflected the DRAM system's ability to monitor dynamic changes in m5C levels.

      However, as the reviewer pointed out, sodium arsenite treatment could potentially inactivate the fusion proteins, leading to the observed reduction in editing efficiency. This possibility has not been conclusively ruled out in our current experiments. Optimizing this validation may require the future development of more specific m5C inhibitors. In light of this, we have revised our previous results and conclusions in lines 235-244, and discussed these points in lines 308-315.

      (4) The authors should move high-confidence editing site data contained in Supplementary Tables 2 and 3 into one of the main Figures to substantiate what is discussed in Figure 4A. However, the data needs to be visualized in another way than an Excel format. Furthermore, Supplementary Table 2 does not contain a description of the columns, while Supplementary Table 3 contains a single row with letters and numbers.

      Thank you for your kind suggestion. We have visualized the data from Supplementary Tables 2 and 3 in Figure 3F, presenting it as a screening flowchart for high-confidence editing sites. In Supplementary Table 3, we have displayed only the DRAM-mutated genes, which is why it contains a single row with letters and numbers. As requested, we have included descriptions of each column and reorganized Supplementary Tables 2 and 3 accordingly.

      (5) The authors state that "plotting the distribution of DRAM-seq editing sites in mRNA segments (5'UTR, CDS, and 3'UTR) highlighted a significant enrichment near the initiation codon (Figure 3F).", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. We have replaced the expression "near the initiation codon" with "in the CDS" in lines 192-193.

      (6) The authors state that "In contrast, cells expressing the deaminase exhibited a distinct distribution pattern of editing sites, characterized by a prevalence throughout the 5'UTR.", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. This distribution was actually characterized by a prevalence throughout the "3'UTR", but not "5'UTR". We have also made the necessary changes in lines 193-195.

      (7) The authors claim in the final conclusion: "In summary, we developed a novel deaminase and reader protein assisted RNA m5C methylation approach...", which is not what the method entails. The authors deaminate As or Us close to 5mC sites based on the binding of a deaminase-containing protein.

      Thank you for your kind suggestion, and we have made the necessary changes in lines 331-334.

      (8) The authors claim that "The data supporting the findings of this study are available within the article and its Supplementary Information." However, no single accession number for the deposited sequencing data can be found in the text or the supplementary data. Without the primary data, none of the claims can be verified.

      Thank you for pointing this out. The sequencing data from this study have already been deposited in the GEO database (GEO accession number: GSE254194, GEO token: ororioukbdqtpcn), and we will ensure they are made publicly available in a timely manner.

      (a) To underscore point (1), a recent publication (https://doi.org/10.1038/s41419-023-05661-y) reported: "To further identify the potential mRNAs regulated by ALYREF, we performed RNA-seq analysis in control or ALYREF knockdown T24 cells. After knockdown of ALYREF, 143 mRNAs differentially expressed, including 94 downregulated mRNAs (NC reads >100, |Fold change | >1.5, P-value <0.05). Functional enrichment analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG) indicated that regulated mRNAs by ALYREF are chiefly enriched in canonical cancer-related pathways (Fig. S4A), including TGF-β signaling, MAPK signaling, and NF-κB signaling, strongly supporting the oncogenic function of ALYREF in tumor progression. Among these 94 downregulated genes, 11 mRNA showed a significant reduction in m5C methylation after NUSN2 silencing in T24 cells, combined with previously transcriptome-wide RNA-BisSeq data of T24 cells [21] (Fig. 4A)."

      These results translate into 94 mRNAs are regulated by ALYREF in bladder cancer-derived cells. From those, very few (11) mRNA identities respond to NSUN2-dependent RNA methylation mediated by ALYREF binding. The question then arises, is that number sufficient to claim that ALYREF is a m5C-binding protein?

      And if so, how does the identification of 10.000+ edits by DRAM-Seq compare with the 94 mRNAs that are regulated by ALYREF? Were these 94 mRNAs identified by DRAM-Seq?

      Thank you for your kind suggestion. Previous reports, including that by Yang et al. (doi: 10.1038/cr.2017.55) and the literature you refer to, have detailed the close relationship between ALYREF and m5C modification. The ALY/REF export factor (ALYREF) was identified as the first nuclear m5C reader, and many mRNAs were shown to be regulated by ALYREF; it is therefore considered to be an m5C-binding protein.

      As required, by comparing the DRAM-edited mRNAs with the reported 94 mRNAs, we found that only 55.32% of the 94 mRNAs regulated by ALYREF could be detected by the DRAM system. This indicates that the DRAM system specifically targets certain mRNAs, as illustrated in Figure S4E. The relevant results were described and discussed in lines 175-179.
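
      For readers who want to reproduce this kind of comparison, a minimal sketch of the overlap calculation is given below. It is illustrative only: the function name and the gene identifiers are hypothetical placeholders, and the count of 52 shared genes is simply the number that is arithmetically consistent with 55.32% of 94, not a figure reported in the response.

```python
def overlap_percentage(reference_genes, detected_genes):
    """Percentage of `reference_genes` that also appear in `detected_genes`."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene list is empty")
    shared = reference & set(detected_genes)
    return 100.0 * len(shared) / len(reference)


# Hypothetical placeholder identifiers standing in for the 94 reported mRNAs.
reference = [f"gene_{i}" for i in range(94)]
# Assume 52 of them are also detected, plus unrelated detected genes.
detected = reference[:52] + ["other_gene_a", "other_gene_b"]

percent = round(overlap_percentage(reference, detected), 2)
```

      Using sets means duplicated gene names in either list do not inflate the overlap.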

      (b) Line 123:

      "The deep sequencing results showed that the deamination rates of RPSA and SZRD1 were 75.5% and 27.25%, respectively. (Fig. 2A, B)."

      The Figure shows exactly the opposite of bisulfite-mediated deamination. These are the cytosines that were not deaminated by the chemical treatment and therefore can be sequenced as cytosines and not thymidines. Hence, the term deamination rate is wrong.

      Thank you for your kind suggestion. We have changed "deamination rates" to "m⁵C fraction" in lines 129-130.

      (c) Line 157:

      "DRAM-seq analysis further confirmed that DRAM was detected in an m5C-dependent manner, with minimal mutations in AP5Z1 and RPSA mRNAs in methyltransferase knockout cells compared to wild-type cells (Fig. 3C, D)."

      There is no indication of what the authors mean by minimal mutation in these Figures. The term "minimal mutation" should be reconsidered as well.

      Thank you for your kind suggestion. We intended to express that "Mutations in AP5Z1 and RPSA mRNA are reduced in methyltransferase-deficient cells." There was an issue with the initial formulation, and we have made the necessary changes in lines 165-167.

      (d) Line 167:

      "To further delineate the characteristics of the DRAM-seq data, we compared the distribution of DRAM-seq editing sites within the gene structure, specifically examining their occurrences in the 5'untranslated region (5'UTR), 3' untranslated region (3'UTR), CDS and ncRNA."

      Which part of a coding RNA is meant by "ncRNA"?

      Thank you for pointing this out. This was actually the Intergenic or Intron region, but not ncRNA. We have also corrected this labelling in Figure 3G and lines 186-189 of the revised manuscript.

      (e) Line 189:

      "Subsequently, we assessed the capacity of DRAM-seq to detect m5C on a transcriptome-wide scale, comparing its performance to BS-seq that have been previously reported with great authority."

      The term "great authority" is not a scientific term. Please, remove adulation to senior authors.

      Thank you for your kind suggestion. We removed this unsuitable expression and made the necessary changes in lines 207-208.

      (f) Line 233:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing required half to one mg of RNA input. Please, check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (g) Line 247:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing requires half to one mg of RNA input. Please, check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (h) Line 292:

      "Since m5C lacks a fixed motif, DRAM has an apparent limitation in achieving single-base resolution for detecting m5C."

      m5C deposition by NSUN2 and NSUN6 occurs in particular motifs that were coined Type I and II motifs. Hence, this statement is not correct.

      Thank you for your kind suggestion. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these m5C motifs do not fully characterize all m5C sites, and their presence downstream of an m5C site is not guaranteed (doi: 10.1038/s41594-019-0218-x). Therefore, we have corrected the expression “fixed motif” to “fixed base composition for characterizing all m5C modification sites” in line 317.

      (i) Line 390:

      "1 μl of total cellular RNA was used for sequencing library gene..."

      1 uL does not allow us to deduce which RNA mass was used for cDNA synthesis.

      Thank you for your kind suggestion. According to our cDNA synthesis protocol, we corrected “1μl” to “1μg” in lines 422-423.

      (j) Line 405:

      "...was assessed on the Agilent 5400 system (Agilent, USA) and quantified by QPCR (1.5 nM)"

      What does the 1.5 nM refer to in this sentence?

      Thank you for your kind suggestion. Here, "1.5nM" means that the concentration of the constructed library should be no less than 1.5nM. We have also revised this expression in the methods in lines 436-438.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below.

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain more or less the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of the mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and stochastic simulation. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the current time point and the previous time point in the stochastic simulation, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the burst frequency and the burst size, as well as the rate of mRNA removal. We will expand this section with explanations of all parameters and terms in the revised manuscript.
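
      To make the roles of the two copy-number variables concrete, the following is a minimal discrete-time sketch, not the authors' TASEP or Gillespie code. It assumes a fixed burst size, a per-step burst probability of burst_frequency * dt (which must not exceed 1), and independent removal of each molecule; the function and parameter names are hypothetical.

```python
import random


def simulate_mrna(burst_frequency, burst_size, removal_rate, n_steps, dt=0.1, seed=0):
    """Return a list of (mrna_prev, mrna_curr) pairs, one per time step.

    Simplifying assumptions: at most one burst per step, a constant
    burst size, and first-order removal of existing molecules.
    """
    rng = random.Random(seed)
    mrna_curr = 0
    history = []
    for _step in range(n_steps):
        mrna_prev = mrna_curr
        # A burst occurs in this step with probability burst_frequency * dt.
        produced = burst_size if rng.random() < burst_frequency * dt else 0
        # Each molecule present is removed with probability removal_rate * dt.
        removed = sum(
            1 for _molecule in range(mrna_prev) if rng.random() < removal_rate * dt
        )
        mrna_curr = mrna_prev + produced - removed
        history.append((mrna_prev, mrna_curr))
    return history


history = simulate_mrna(burst_frequency=0.5, burst_size=10, removal_rate=0.1, n_steps=200)
```

      In this sketch, the difference between the two values in each pair is the per-step change in copy number, the kind of quantity a demand-dependent initiation rate could be made to depend on.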

      (2) Overall, the paper is very long and as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing protein noise contribution, mRNA expression contribution, and bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime, protein noise is negligible compared to transcriptional noise.

      Thank you for referring to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we felt that the stochastic simulation would be able to better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout, even when the coupling between transcription and translation was not considered.

      Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I realize. Nevertheless, I found the evidence of the main idea that it is the ribosome demand generating this correlation is weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence of the hypothesis would require generation of ribosome occupancy maps of mRNA molecules at the level of single cells and at time intervals that closely match the burst frequency of the genes. This is beyond the currently available methodologies. However, there are other lines of evidence that support our model. For example, earlier work in cell-free systems has shown that constraining cellular resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, genome-wide analysis of expression noise in yeast also revealed that the association between protein noise and translational efficiency was highest in the group of genes with the most bursty transcription (Supplementary fig. S20).

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      Although we agree with the reviewer’s comment that the effect of translational efficiency on protein noise may not be as substantial as the effect of transcriptional bursting, it has been observed in studies across bacteria, yeast and Arabidopsis (Ozbudak et al., 2002; Blake et al., 2003; Wu et al., 2022). In addition, the relationship between translational efficiency and protein noise is in contrast with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the strength of the association, but to understand the basis of the influence of translational efficiency on protein noise.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We will revise the figure captions to include more details as per the reviewer’s suggestion.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      For all published datasets where we had measurements from a large number of genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). For experiments that we performed on a limited number of promoters, we used the measure of coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible. Translational efficiency refers to translation rate which is determined by both the translation initiation rate and the translation elongation rate. The noise at the protein level was quantified from the signal intensity of GFP tagged proteins, which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.
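
      For reference, the coefficient of variation mentioned above is the standard deviation divided by the mean. The snippet below is a minimal sketch with made-up intensity values; the use of the population standard deviation is an assumption, as the response does not say which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population standard deviation)."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mu


# Hypothetical per-cell GFP intensities (arbitrary units).
intensities = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(intensities)
```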

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they are not new, but we included these results for setting the baseline for comparison with simulation results that appear in the later part of the manuscript where we included translational bursting and ribosome demand in our models.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a baseline initiation rate depending on the mRNA numbers and other variables. We changed the baseline initiation rate to alter the mean protein expression levels. We will elaborate this section in the revised manuscript.

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      It is an important observation. Even though we changed the translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model (Fig. 3D) that the changes in the translation initiation rate were also linked with changes in the translation elongation rate. The translation initiation rate can only increase if the ribosomes already bound to the mRNA traverse quicker through the mRNA. This means that an increase in the translation initiation rate will occur only if the translation elongation rate is also increased, which will lead to lower traversal time of the ribosomes through the mRNA (Fig. 3D). Similarly, an increase in the translation elongation rate will allow more ribosomes to initiate translation. Thus, the parameters translation initiation rate and translation elongation rate are interconnected. This has also been observed in an experimental study by Barrington et al. (2023). Having said that, however, the models can also be expressed in terms of the translation elongation rate, instead of the translation initiation rate, and this modification will not change the results of the simulations due to interconnectedness of the initiation rate and the elongation rate.

      References

      C. L. Barrington, G. Galindo, A. L. Koch, E. R. Horton, E. J. Morrison, S. Tisa, T. J. Stasevich, O. S. Rissland. Synonymous codon usage regulates translation initiation. Cell Rep. 42, 113413 (2023).

      W. J. Blake, M. Kaern, C. R. Cantor, J. J. Collins, Noise in eukaryotic gene expression. Nature 422, 633-637 (2003).

      P. M. Caveney, S. E. Norred, C. W. Chin, J. B. Boreyko, B. S. Razooky, S. T. Retterer, C. P. Collier, M. L. Simpson, Resource Sharing Controls Gene Expression Bursting. ACS Synth Biol. 6, 334-343 (2017).

      J. R. Newman, S. Ghaemmaghami, J. Ihmels, D. K. Breslow, M. Noble, J. L. DeRisi, J. S. Weissman, Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature, 441, 840-846 (2006).

      E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, A. van Oudenaarden, Regulation of noise in the expression of a single gene. Nat Genet. 31, 69-73 (2002).

      O. K. Silander, N. Nikolic, A. Zaslaver, A. Bren, I. Kikoin, U. Alon, M. Ackermann, A genome-wide analysis of promoter-mediated phenotypic noise in Escherichia coli. PLoS Genet. 8, e1002443 (2012).

      H. W. Wu, E. Fajiculay, J. F. Wu, C. S. Yan, C. P. Hsu, S. H. Wu, Noise reduction by upstream open reading frames. Nat Plants. 8, 474-480 (2022).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review)

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 during the establishment of pluripotency during reprogramming. Due to the difficulty in sample collection and RNA-seq with low-number cells, the precise mechanisms remain unclear in early embryos. This manuscript reported the role of OCT4 and SOX2 in mouse early embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared to the control, chromatin accessibility and transcriptome were affected when Oct4 and Sox2 were deleted in early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of motifs of TFs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, by deep analysis of ATAC-seq and RNA-seq data, they found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they also uncovered the role of OS during development from the morula to ICM, which provided the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data, however, there are some issues that need to be addressed.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided to clarify itself. The knockout strategy in Fig1 is somewhat obscure, such as how is OCT4 inactivated in Oct4mKO2 heterozygotes. As shown in Figure 1, the exon of OCT4 is not deleted, and its promoter is not destroyed. Therefore, how does OCT4 inactivate to form heterozygotes?

      Thank you for your kind suggestions. We will add a detailed description of the knockout strategy in the legends for Figure 1A and 1B, as shown below:

      Figure 1A. Schemes of mKO2-labeled Oct4 KO (Oct4mKO2) and Oct4 flox alleles. In the Oct4mKO2 allele, a PGK-pac∆tk-P2A-mKO2-pA cassette was inserted 3.6 kb upstream of the Oct4 transcription start site (TSS) and a promoter-less FRT-SA-IRES-hph-P2A-Venus-pA cassette was inserted into Oct4 intron 1. The inclusion of a stop codon followed by three sets of polyadenylation signal sequences (pA) after the Venus cassette ensures both transcriptional and translational termination, effectively blocking the expression of Oct4 exons 2–5.

      Figure 1B. Schemes of EGFP-labeled Sox2 KO (Sox2EGFP) and Sox2 flox alleles. In the Sox2EGFP allele, the 5’ untranslated region (UTR), coding sequence and a portion of the 3’ UTR of Sox2 were deleted and replaced with a PGK-EGFP-pA cassette. Notably, 1,023 bp of the Sox2 3’ UTR remains intact.

      (2) Is ZP 3-Cre expressed in the zygotes? Is there any residual protein?

      Thank you for the question. While we have not directly tested for ZP3-Cre expression in zygotes, the published transcriptome and proteomics data shows that ZP3 is present at both the transcriptional and protein levels in wild-type zygotes (Deng et al., Science, 2014; Gao et al., Cell Reports, 2017). This suggests that ZP3-Cre could potentially be expressed in zygotes as well.

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      Thank you for the question. The enriched motifs in the rising ATAC-seq peaks in Oct4 KO and Sox2 KO ICMs are the GATA, TEAD, EOMES and KLF motifs, as shown in Figure 4A and Figure supplement 7.

      (4) The ordinate of Fig4c is lost.

      Thank you for the question. The y-axis is average normalized signals (reads per million-normalized pileup signals). We will add it in the revised version.

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to conduct this analysis.

      (6) If Oct4 and Sox2 truly activate sap 30 and Uhrf 1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

      Thank you for the interesting question. Unfortunately, we have not conducted this specific experiment, so we do not have direct results. However, Sap30 is a key component of the mSin3A corepressor complex, while Uhrf1 regulates the establishment and maintenance of DNA methylation. Both proteins are known to function as repressors. Therefore, we hypothesize that interfering with these two genes could alleviate repression of some genes, such as trophectoderm markers, similar to what we have observed in Oct4 KO and Sox2 KO ICMs.

      Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Major Points:

      (1) Although the authors claim that both maternal KO and maternal KO/zygotic hetero KO mice develop normally, the molecular changes in these groups appear overestimated. A wildtype control is recommended for a more robust comparison.

      Thank you for your valuable feedback. However, I’m unclear on what is meant by “the molecular changes in these groups appear overestimated.” Could the reviewer kindly provide more details or clarify which specific aspects of the molecular changes they are referring to? This would help us better address the concern.

      (2) The authors assert that OCT4 and SOX2 activate the pluripotent network via the OCT-SOX enhancer. However, the definition of this enhancer is based solely on proximity to TSSs, which is a rough approximation. Canonical enhancers are typically located in intronic and intergenic regions and marked by H3K4me1 or H3K27ac. Re-analyzing enhancer regions with these standards could be beneficial. Additionally, the definitions of "close to" or "near" in lines 183-184 are unclear and not defined in the legends or methods.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to address the concern of “enhancer”.

      The definition of "close to" or "near" in lines 183-184 is in the legend of Figure 2E and methods. In the GSEA analysis, Ensembl protein-coding genes with TSSs located within 10 kb of ATAC-seq peak centers were included.
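
      The 10 kb rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the input layout, and the choice to treat a distance of exactly 10 kb as "within" are all hypothetical, and the gene and peak coordinates are made up.

```python
def genes_near_peaks(peak_centers, gene_tss, max_distance=10_000):
    """Return genes whose TSS lies within max_distance of any peak center.

    peak_centers: iterable of (chromosome, position) tuples.
    gene_tss: dict mapping gene name to (chromosome, TSS position).
    """
    centers_by_chrom = {}
    for chrom, center in peak_centers:
        centers_by_chrom.setdefault(chrom, []).append(center)

    selected = set()
    for gene, (chrom, tss) in gene_tss.items():
        # Only peaks on the same chromosome can be "near" the gene.
        if any(abs(tss - c) <= max_distance for c in centers_by_chrom.get(chrom, [])):
            selected.add(gene)
    return selected


# Hypothetical coordinates for illustration only.
peaks = [("chr1", 100_000), ("chr2", 50_000)]
tss = {
    "GeneA": ("chr1", 105_000),  # 5 kb from a peak center
    "GeneB": ("chr1", 120_000),  # 20 kb away
    "GeneC": ("chr2", 40_000),   # exactly 10 kb away
    "GeneD": ("chr3", 50_000),   # no peak on this chromosome
}
near = genes_near_peaks(peaks, tss)
```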

      (3) There is no evidence that the decreased peaks/enhancers could be the direct targets of Oct4 and Sox2 throughout this manuscript. Figures 2 and 4 show only minimal peak annotations related to OCT and SOX motifs, and there is a lack of chromatin IP data. Therefore, claims about direct targets are not substantiated and should be appropriately revised.

      Thank you for the comment. In Figure Supplement 3C, we analyzed published Sox2 CUT&RUN data from E4.5 ICMs (Li et al., Science, 2023), which demonstrates that the reduced ATAC-seq peaks in our Sox2 KO ICMs are enriched with Sox2 CUT&RUN signals. This data suggests that decreased peaks/enhancers could be the direct targets of Sox2. Unfortunately, we were unable to find similar published data for Oct4 in embryos.

      (4) Lines 143-146 lack direct data to support the claim. Actually, the main difference in cluster 1, 11 and 3, 8, 14 is whether the peak contains OCT-SOX motif. However, the reviewer cannot get any information of peaks activated by OCT4 rather than SOX2 in cluster 1, 11.

      Thank you for the comment. As the reviewer pointed out, we agree that clusters 3, 8 and 14 are more enriched with OCT-SOX motifs than clusters 1 and 11. However, this is consistent with our observation that the accessibility of peaks in clusters 1 and 11 mainly relies on Oct4, while the accessibility of clusters 3, 8 and 14 relies on both Oct4 and Sox2. The word “activate” is probably not accurate. We will rearrange the text as below:

      “Notably, compared to the peaks dependent on Oct4 but not Sox2 (Figure 2B, clusters 1 and 11), those reliant on both Oct4 and Sox2 show greater enrichment of the OCT-SOX motif (Figure 2B, clusters 3, 8 and 14). The former group tended to be already open in the morula, while the latter group became open in the ICM.”

      Minor Points:

      (1) Lines 153-159: The figure panel does not show obvious enrichment of SOX2 signals or significant differences in H3K27ac signals across clusters, thus not supporting the claim.

      Thank you for the comments.

      Lines 153-159 reference two datasets: Figure supplement 3C and 3D.

      In Figure supplement 3C, the average plots above the heatmaps show that the decreased ATAC-seq peaks exhibited higher enrichment with Sox2 CUT&RUN signals compared to the increased or unchanged peaks.

      Regarding Figure supplement 3D, we agree that the H3K27ac signal is only slightly more enriched on the decreased peaks than the unchanged peaks. However, it's important to note that only the top 57,512 strongest of the 142,096 unchanged peaks were included in the analysis. We excluded the weaker unchanged peaks because they are less informative, but if included, they could reduce the average H3K27ac signal for the unchanged peaks.

      (2) Lines 189-190: The term "identify" is overstated for the integrative analysis of RNA-seq and ATAC-seq, which typically helps infer TF targets rather than definitively identifying them.

      Thank you for the suggestion. We will replace “identify” with “infer”. The revised version is as below:

      “In addition, integration of the ATAC-seq and RNA-seq data allowed us to infer previously unknown targets of Oct4 and Sox2, such as Sap30 and Uhrf1, which are essential for somatic cell reprogramming and embryonic development.”

      (3) The Discussion is lengthy and should be condensed.

      Thank you for the suggestion. We will shorten it.

    1. Author response:

      We thank the editors and reviewers for their valuable feedback and are committed to addressing their suggestions in a revised manuscript. We appreciate the reviewers’ recognition of the value of our findings, including the insights into the consequences of synaptic topography and the investigation of spike initiation zones in DNs, which further advance our understanding of signal processing. Our studies offer broader insights into synaptic organization and its significance for dendritic integration in an ethologically relevant context.

      We particularly appreciate the reviewer's suggestion to elaborate on the electrophysiological properties of DNs and to consider the electrotonic distance in our analysis. We also thank the reviewers for highlighting points that need clarification. In short, our models suggest that DNs effectively distribute synapses to maintain linear encoding of synapse numbers when multiple synapses are coactivated. This supports the results of an earlier study suggesting that synapse number gradients encode the location of an approaching stimulus in these neurons (Dombrovski et al., 2023).

      We also agree with the reviewers that the temporal activation of synapses is highly relevant for this system. However, we have focused on synaptic topography because the characterization of temporal patterns of VPN activity is currently lacking in the field. A more detailed investigation of temporal dynamics is therefore beyond the scope of this study.

      With the publication of the reviewed preprint, we have now made the computational pipeline and models available on GitHub (https://github.com/AusbornLab/VPN-DN-synapse-normalization).

      Reference

      Dombrovski M, Peek MY, Park J-Y, Vaccari A, Sumathipala M, Morrow C, Breads P, Zhao A, Kurmangaliyev YZ, Sanfilippo P, Rehan A, Polsky J, Alghailani S, Tenshaw E, Namiki S, Zipursky SL, Card GM. 2023. Synaptic gradients transform object location to action. Nature 613:534–542. doi:10.1038/s41586-022-05562-8

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Dr. Santamaria's group previously utilized antigen-specific nanomedicines to induce immune tolerance in treating autoimmune diseases. The success of this therapeutic strategy has been linked to expanded regulatory mechanisms, particularly the role of T-regulatory type-1 (TR1) cells. However, the differentiation program of TR1 cells remained largely unclear. Previous work from the authors suggested that TR1 cells originate from T follicular helper (TFH) cells. In the current study, the authors aimed to investigate the epigenetic mechanisms underlying the transdifferentiation of TFH cells into IL-10-producing TR1 cells. Specifically, they sought to determine whether this process involves extensive chromatin remodeling or is driven by preexisting epigenetic modifications. Their goal was to understand the transcriptional and epigenetic changes facilitating this transition and to explore the potential therapeutic implications of manipulating this pathway. 

      The authors successfully demonstrated that the TFH-to-TR1 transdifferentiation process is driven by pre-existing epigenetic modifications rather than extensive new chromatin remodeling. The comprehensive transcriptional and epigenetic analyses provide robust evidence supporting their conclusions. 

      Strengths: 

      (1) The study employs a broad range of bulk and single-cell transcriptional and epigenetic tools, including RNA-seq, ATAC-seq, ChIP-seq, and DNA methylation analysis. This comprehensive approach provides a detailed examination of the epigenetic landscape during the TFH-to-TR1 transition. 

      (2) The use of high-throughput sequencing technologies and sophisticated bioinformatics analyses strengthens the foundation for the conclusions drawn. 

      (3) The data generated can serve as a valuable resource for the scientific community, offering insights into the epigenetic regulation of T-cell plasticity. 

      (4) The findings have significant implications for developing new therapeutic strategies for autoimmune diseases, making the research highly relevant and impactful. 

      We thank the reviewer for providing constructive feedback on the manuscript.

      Weaknesses: 

      (1) While the scope of this study lies in transcriptional and epigenetic analyses, the conclusions need to be validated by future functional analyses. 

      We fully agree with the reviewer’s suggestion. We have added the following text to the Discussion to address this concern: “The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Our current studies focus on functional validation of these observations, by carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.”

      (2) This study successfully identified key transcription factors and epigenetic marks. How these factors mechanistically drive chromatin closure and gene expression changes during the TFH-to-TR1 transition requires further investigation. 

      Agreed. Please see our response to point #1 above.  

      (3) The study provides a snapshot of the epigenetic landscape. Future dynamic analysis may offer more insights into the progression and stability of the observed changes. 

      We have previously shown that the first event in the pMHCII-NP-induced TFH-TR1 transdifferentiation process involves proliferation of cognate TFH cells in the splenic germinal centers. This event is followed by immediate transdifferentiation of the proliferated TFH cells into transitional and terminally differentiated TR1 subsets. Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the transdifferentiation pathway at any given time point, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (Sole et al., 2023a). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells post treatment withdrawal, coupled to single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes, are likely to shed light on the progression and stability of the epigenetic changes reported herein.

      To address this limitation in the manuscript, we have added the following paragraph to the Discussion: “Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the TFH-TR1 cell pathway upon the termination of treatment, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (6). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells post treatment withdrawal, coupled to single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes are likely to shed light into the progression and stability of the epigenetic changes reported herein”. 

      Reviewer #1 (Recommendations for the authors): 

      The authors may consider the following suggestions to improve this study: 

      (1) The authors may include a brief background on type 1 diabetes and the model involving BDC2.5 T cells to provide context for readers who may not be familiar with these aspects. 

      We have added this information to the first paragraph in the Results section: “BDC2.5mi/I-Ag7-specific CD4+ T cells comprise a population of autoreactive T cells that contribute to the progression of spontaneous autoimmune diabetes in NOD mice. The size of this type 1 diabetes-relevant T cell specificity is small and barely detectable in untreated NOD mice, but treatment with cognate pMHCII-NPs leads to the expansion and formation of antidiabetogenic TR1 cells that retain the antigenic specificity of their precursors (3). As a result, treatment of hyperglycemic NOD mice with these compounds results in the reversal of type 1 diabetes (3).”

      (2) It is understandable that further biological and functional experiments are beyond the scope of this paper, but it would be of interest to know how the authors envision future studies based on the transcriptional and epigenetic information obtained thus far. 

      We have added the following text to the Discussion section: “The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Our current studies focus on functional validation of these observations, by carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.”

      (3) The authors may consider adjusting figures where genes are crowded or difficult to read due to small font size. 

      Figures with crowded text have been modified to facilitate reading.

      Reviewer #2 (Public Review): 

      Summary: 

      This study, based on their previous findings that TFH cells can be converted into TR1 cells, conducted a highly detailed and comprehensive epigenetic investigation to answer whether TR1 differentiation from TFH is driven by epigenetic changes. Their evidence indicated that the downregulation of TFH-related genes during the TFH to TR1 transition depends on chromatin closure, while the upregulation of TR1-related genes does not depend on epigenetic changes. 

      Strengths: 

      (1) A significant advantage of their approach lies in its detailed and comprehensive assessment of epigenetics. Their analysis of epigenetics covers chromatin open regions, histone modifications, and DNA methylation, and uses both single-cell and bulk techniques to validate their findings. As for their results, observations from different epigenetic perspectives mutually supported each other, lending greater credibility to their conclusions. This study effectively demonstrates that (1) the TFH-to-TR1 differentiation process is associated with massive closure of OCRs, and (2) the TR1-poised epigenome of TFH cells is a key enabler of this transdifferentiation process. Considering the extensive changes in epigenetic patterns involved in other CD4+ T lineage commitment processes, the similarity between TFH and TR1 in their epigenetics is intriguing.

      (2) They performed correlation analysis to answer the association between "pMHC-NP-induced epigenetic change" and "gene expression change in TR1". Also, they have made their raw data publicly available, providing a comprehensive epigenomic database of pMHC-NP-induced TR1 cells. This will serve as a valuable reference for future research.

      We thank the reviewer for his/her constructive feedback and suggestions for improvement of the manuscript.

      Weaknesses: 

      (1) A major limitation is that this study heavily relies on a premise from the previous studies performed by the same group on pMHC-NP-induced T-cell responses. This significantly limits the relevance of their conclusion to a broader perspective. Specifically, differential OCRs between Tet+ and naïve T cells were limited to only 821, as compared to 10,919 differential OCRs between KLH-TFH and naïve T cells (Figure 2A), indicating that the precursors and T cell clonotypes that responded to pMHC-NP were extremely limited. This limitation should be clearly discussed in the Discussion section. 

      We agree that this study focuses on a very specific, previously unrecognized pathway discovered in mice treated with pMHCII-NPs. Despite this apparent narrow perspective, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Furthermore, this pathway affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area. 

      We have added the following text to the Discussion to address this limitation: “Although the TFH-TR1 transdifferentiation was discovered in mice treated with pMHCII-NPs, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Importantly, the discovery of this transdifferentiation process affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported here can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area”.

      We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (Sole et al., 2023a). However, we note that our scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells). 

      This has been clarified in the revised version of the manuscript, by adding the following text to the last paragraph of the Results subsection entitled “Contraction of the chromatin in pMHCII-NP-induced Tet+ vs. TFH cells at the bulk level”: “We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (6). However, we note that scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells)”.

      (2) This article uses peak calling to determine whether a region has histone modifications, claiming that the regions with histone modifications in TFH and TR1 are highly similar. However, they did not discuss the differences in histone modification intensities measured by ChIP-seq. For example, as shown in Figure 6C, IL10 H3K27ac modification in Tet+ cells showed significantly higher intensity than KLH-TFH, while in this article, it may be categorized as "possessing same histone modification region". Discussing these intensity differences would strengthen their conclusions.

      We appreciate your suggestion to discuss differences in histone modification intensities as measured by ChIP-seq. However, we respectfully disagree with the reviewer’s interpretation of these data.

      Our study primarily focuses on the identification of epigenetic similarities and differences between pMHCII-NP-induced tetramer+ cells and KLH-induced TFH cells relative to naive T cells. The outcome of direct comparisons of histone deposition (ChIP-seq) between these cell types is summarized in the lower part of Figure 4B and detailed in Datasheet 5. Throughout this section, we mention the number of differentially enriched regions, their overlap with OCRs shared between tetramer+ TFH and tetramer+ TR1 cells based on scATAC-seq data, and the associated genes. Clearly, the epigenetic modifications that TR1 cells inherit from TFH cells were acquired by TFH cells upon differentiation from naïve T cell precursors. 

      Regarding the specific point raised by the reviewer on differences in the intensity of the H3K27Ac peaks linked to Il10 in Figure 6C, we note that the genomic tracks shown are illustrative. Thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for H3K27Ac deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells. 

      This has now been clarified by adding the following text to the end of the Results subsection entitled “H3K4me3, H3K27me3 and H3K27ac marks in genes upregulated during the TFH-to-TR1 cell conversion are already in place at the TFH cell stage”: “We note that, although in the representative chromosome track views shown in Fig. 6C there appear to be differences in the intensity of the peaks, thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for histone deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells.”

      We have also clarified this in the corresponding section of the Methods section (“ATAC-seq and ChIP-seq” under “Bioinformatic and Statistical Analyses”): “Given that peak calling alone does not account for variations in the intensity of histone mark deposition, analysis of differential histone deposition includes both qualitative and quantitative assessments. Whereas qualitative assessment involves evaluating the overall pattern and distribution of the various histone marks, quantitative assessment measures the intensity and magnitude of histone mark deposition.”

      (3) Last, the key findings of this study are clear and convincing, but some results and figures are unnecessary and redundant. Some results are largely a mere confirmation of the relationship between histone marks and chromatin status. I propose to reduce the number of figures and text that are largely confirmatory. Overall, I feel this paper is too long for its current contents. 

      We understand your concern about the potential redundancy of some results and figures. Our aim in including these analyses was to provide a comprehensive understanding of the intricate relationships between epigenetic features and transcriptomic differences. We believe that a detailed examination of these relationships is crucial for several reasons: (i) the breadth of the data allows for a thorough exploration of the relationships between histone marks, open chromatin status and transcriptional differences. This comprehensive approach helps to ensure that our conclusions are robust and well-supported; (ii) some of the results that may appear confirmatory are, in fact, important for validating and reinforcing the consistency of our findings across different contexts. These details are intended to provide a nuanced understanding of the interactions between epigenetic features and gene expression; and (iii) by presenting a detailed analysis, we aim to offer a solid foundation for future research in this area. The extensive data presented will serve as a valuable resource for others in the field who may seek to build on our findings.

      That said, we have carefully reviewed the manuscript to identify and streamline elements that might be perceived as overly redundant, while retaining the depth of analysis that we believe is essential.

      Reviewer #2 (Recommendations for the authors): 

      (1) In Figure 1E, the text states "94% (n=217/231) of the genes associated with chromatin regions that had closed during the TFH-TR1 conversion,", but n=231 do not match with n=1820 provided in Figure 1D as downregulated genes. This is one of the examples that do not match numbers among figures or lack sufficient explanations. Please check those numbers carefully and add some sentences if necessary. 

      We note that the text referring to Figure 1D describes the total number of differentially expressed genes between Tet+ TR1 and Tet+ TFH cells using the scMultiome dataset (n = 2,086 genes downregulated in the former vs. the latter; and n = 266 genes upregulated in the former vs. the latter). The text in the paragraph that follows (referring to Figure 1E) focuses exclusively on the genes that had closed chromatin regions during the TFH-to-TR1 conversion, to ascertain whether or not chromatin closure was indeed associated with such gene downregulation. 

      We have modified the first sentence in the paragraph referring to Figure 1E to clarify this point for the reader: “Further analyses focusing on the genes that had closed chromatin regions during the TFH-to-TR1 conversion, confirmed…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors have developed a valuable method based on a fully cell-free system to express a channel protein and integrate it into a membrane vesicle in order to characterize it biophysically. The study presents a useful alternative to study channels that are not amenable to being studied by more traditional methods.

      Strengths:

      The evidence supporting the claims of the authors is solid and convincing. The method will be of interest to researchers working on ionic channels, allowing them to study a wide range of ion channel functions such as those involved in transport, interaction with lipids, or pharmacology.

      Weaknesses:

      The inclusion of a mechanistic interpretation of how the channel protein folds into a protomer or a tetramer to become functional in the membrane would strengthen the study.

      Work from other labs has described key factors which can improve expression and artificial lipid integration of cell-free derived transmembrane proteins (PMIDs: 35520093, 29625253, 26270393). However, a significant number of additional experiments would be needed to elucidate the exact biophysical properties governing channel assembly of synthetically derived polycystins. We carried out additional biochemical experiments to address these concerns (see new Figure 1—figure supplement 1D, E). We used fluorescence-detection size-exclusion chromatography (FSEC) with the goal of understanding how much of the CFE-derived protomers are biochemically folding and assembling into functional tetramers upon incorporation into SUVs. When compared to recombinant protein sources from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. In the absence of chaperones found in cells, the assembly of synthetically derived protomers into tetramers is likely intrinsic to the chemical properties of the proteins, and the biophysical principles governing helical membrane proteins when inserted into the lipid membrane (PMID: 35133709). We have added our interpretation in lines 111-121.

      Reviewer #2 (Public Review):

      It is challenging to study the biophysical properties of organelle channels using conventional electrophysiology. The conventional reconstitution methods require multiple steps and can be contaminated by endogenous ionophores from the host cell lines after purification. To overcome this challenge, in this manuscript, Larmore et al. described a fully synthetic method to assay the functional properties of the TRPP channel family. The TRPP channels are an important organelle ion channel family that natively traffic to primary cilia and ER organelles. The authors utilized cell-free protein expression and reconstitution of the synthetic channel protein into giant unilamellar vesicles (GUVs), so that the single-channel properties can be measured using voltage-clamp electrophysiology. Using this innovative method, the authors characterized their membrane integration, orientation, and conductance, comparing the results to those of endogenous channels. The manuscript is well-written and may present broad interest to the ion channel community studying organelle ion channels. Particularly because of the challenges of patching native cilia cells, the functional characterization is highly concentrated in very few labs. This method may provide an alternative approach to investigate other channels resistant to biophysical analysis and pharmacological characterization.

      Thank you for evaluating our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be useful to explain how the Polycystin protein is folded under the experimental conditions used. The expression data shown in Figure 1 Supplement 1B show different protein concentrations of protomer or tetramer. However, it is not described how each form is identified and distinguished. It is also important to mention in the manuscript that this method is only applicable to membrane channels that do not require chaperones for their folding and expression into the membrane. How is the tetramer mechanistically conformed? In line 184, it is stated that this method can be leveraged for studying the effects of channel subunit composition. Would this method allow the expression of two different subunit proteins in order to produce a heteromeric channel?

      In Figure 1—figure supplement 1B, total fluorescence from the synthesized channel-GFP was measured. Protein concentration was calculated based on the linear regression of the GFP standards. Monomeric protein concentration was reported directly from total fluorescence. Tetrameric protein concentration was calculated by dividing the fluorescence by four, and subsequently calculating the concentration based on the GFP standards.
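
      The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' actual analysis script: the function names, the units, and the standard-curve values in the usage note are hypothetical, and a simple least-squares line (fluorescence = slope × concentration + intercept) is assumed for the GFP standards.

```python
# Minimal sketch (assumed workflow): convert channel-GFP fluorescence to
# protein concentration using a linear fit to GFP standards.
# All names and numbers here are illustrative, not taken from the manuscript.

def fit_standard_curve(concentrations, fluorescences):
    """Ordinary least-squares fit: fluorescence = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_f = sum(fluorescences) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (f - mean_f)
              for c, f in zip(concentrations, fluorescences))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_c
    return slope, intercept


def concentration_from_fluorescence(fluorescence, slope, intercept):
    """Invert the standard curve to get a concentration."""
    return (fluorescence - intercept) / slope


def monomer_and_tetramer_concentration(total_fluorescence, slope, intercept):
    """Monomer: read directly from total fluorescence.
    Tetramer: divide the fluorescence by four first (four GFP tags per
    channel), then read the concentration off the same standard curve."""
    monomer = concentration_from_fluorescence(total_fluorescence, slope, intercept)
    tetramer = concentration_from_fluorescence(total_fluorescence / 4.0, slope, intercept)
    return monomer, tetramer
```

      For example, with hypothetical standards at concentrations 0, 1, 2 and 4 (arbitrary units) reading 10, 110, 210 and 410 fluorescence units, the fit gives a slope of 100 and an intercept of 10. Note that dividing the raw fluorescence by four before applying the curve is not identical to dividing the monomer concentration by four unless the intercept (background) is zero.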

      This is a good point. Based on your suggestion, we carried out additional biochemical experiments (see new Figure 1—figure supplement 1D, E). We used fluorescence-detection size-exclusion chromatography (FSEC) with the goal of understanding how much of the CFE-derived protomers are biochemically folding and assembling into functional tetramers upon incorporation into SUVs. As controls we produced recombinant PKD2-GFP and PKD2L1-GFP channels as elution time standards and to compare the relative production of tetrameric channels generated when using the two expression systems. The synthetically derived polycystin channels indeed produced tetramers and protomers, which supports the feasibility of using this method to assay their functional properties. When compared to recombinant protein sources from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. We speculate that assembly of synthetically derived protomers into tetramers is likely intrinsic to the chemical properties of the proteins, and the biophysical principles governing helical membrane proteins when inserted into the lipid membrane (PMID: 35133709). Although an interesting question, a systematic analysis of these channel-lipid interactions is beyond the scope of this eLife Report but can be addressed in future studies. The limitation of using this method to characterize channels which fold and membrane integrate without the aid of molecular chaperones is now stated in lines 201-205. In principle, the CFE-GUV method can be deployed to co-express different subunits to produce heteromeric channels. We have modified the text in lines 192-197 to be clearer on this point.

      (2) The type of plasmid (and promoter) required for this methodology should be mentioned.

      Added to the methods (lines 210-211). “PKD2 and PKD2L1 are in pET19b plasmid under T7 promoter.”

      (3) Since this paper is methodological, it would be useful to have some information about the stability of the GUVs containing the synthetic channel. In Methods, it is stated that GUV vesicles are used on the same day (line 207). And in line 193 it says that the reactions (?) are placed at 4°C for storage.

      Restated in lines 226-228: GUVs are electroformed and used for electrophysiology the same day. SUVs with channel incorporated are stored at 4°C for 3 days.

      (4) A comment reasoning why the PKD2 protein is more frequently incorporated into the membrane in comparison to PKD2L1 should be included. A brief description of the differences between these two proteins would also be helpful for the reader.

      In terms of overall protein production and oligomeric assembly, more PKD2L1 channels are produced than PKD2 channels (see new Figure 1C and Figure 1—figure supplement 1D, E). In lines 149-155 we note that single-channel openings were frequently observed for the high-expressing PKD2L1 channels, but that this high expression often resulted in patch instability. As a result, GUV patches with the lower-expressing PKD2-GFP channel were more stable and thus more successfully recorded from. We have revised the text to be clearer on this point.

      (5) There are no methods for preparing hippocampal neurons or IMCD cells shown in Figure 4 Supplement 1. Instead, the method of mammalian cultures provided corresponds to HEK 293T cells.

      This information has been added to lines 273-284.

      (6) Minor:

      In Figure 2C, please include the actual % of the Cell488+Surface647+Clear lumen vesicles.

      Added

      Line 99, 108: Figures 1B and 1C are swapped. Please correct.

      Corrected in figure and figure legends.

      Line 108: misspelling: effect.

      Done

      Line 109: check sentence: verb is missing.

      Sentence now reads “Minimal changes in fluorescence were detected when a control plasmid (Ctrl) encoding a non-fluorescent protein (dihydrofolate reductase) was used in the reaction.”

      Line 145: recoding. Correct.

      Recoding changed to recordings

      Line 169: "from" is missing (recorded from MCD cilia).

      Added

      Line 169: In Table 1, the PKD2 K+ conductance magnitudes recorded from IMCD cilia were significantly smaller, not larger as stated, than those assayed using CFE-GUV system. Please correct.

      Corrected

      Line 180: "of" is missing (adaptation of CFE derived...).

      Corrected

      Line 182: "to" is missing (generalized to other channels).

      Corrected

      Line 193: "in" 4°C; correct to "at".

      Corrected

      Line 197: replace "mole" with "mol".

      Corrected

      Line 207: are used "within the" same day.

      Corrected

      Line 210: c-terminally. C-should be capital letter.

      Corrected

      Line 231: n-terminally. N- should be capital letter.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      The authors validated their method using PKD2 and PKD2L1 channels, demonstrating the potential of this approach. However, a few points merit further clarification or validation:

      (1) Stability of the protein vesicles for recording. The authors observed membrane instability during voltage transitions. It would be beneficial to discuss potential solutions to enhance stability.

      In lines 197-202, we have added a discussion of potential solutions to enhance stability. CsF in the intracellular saline could be added to stabilize the GUV membranes. CsF is frequently added to stabilize whole cell membranes in HTS planar patch-clamp recording. We did not explore this formulation because Cs+ would limit outward polycystin conductance. We also suggest, but did not test, altering the membrane formulation of GUVs with additional cholesterol to stabilize these recordings.

      (2) Validation. Further discussion on how broadly this method can be applied to other channels would strengthen the manuscript.

      We have included further discussion on this point in lines 190-206. 

      (3) Protein production estimated by a standard GFP absorbance assay. The estimation of protein production using GFP absorption may be affected by improperly folded protein. Additional validation methods could be considered.

      C-terminal GFP fluorescence has been widely used in expression systems to designate proper folding of the target protein upstream of the GFP-tag (PMID: 22848743, PMID: 21805523, PMID: 35520093). Nonetheless, we have conducted additional experiments designed to estimate the amount of assembled PKD2 and PKD2L1 channels generated using the CFE method. In the new Figure 1—figure supplement 1D, E, we carried out fluorescence-detection size-exclusion chromatography and compared channel assembly of recombinant and CFE+SUV derived PKD2-GFP and PKD2L1-GFP. Here, we clearly observed tetrameric and protomeric forms of the channels using the synthetic approach, which supports the feasibility of using this method to assay their functional properties. When compared to recombinant protein sources from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence.

      (4) Single channels were observed more frequently from PKD2 incorporated GUVs compared to PKD2L1. Does this just randomly happen or is there a reason behind this difference?

      In terms of overall protein production and oligomeric assembly, more PKD2L1 channels are produced than PKD2 channels (Figure 1C and Figure 1—figure supplement 1D, E). This is apparent whether the channels are produced recombinantly in cells or using the cell-free method (Figure 1—figure supplement 1D, E). In lines 149-155, we note that single-channel openings were frequently observed but that the high expression of PKD2L1 often resulted in patch instability. As a result, GUV patches with the lower-expressing PKD2-GFP channel were more stable and thus more successfully recorded from. As requested, we have included a brief description of the two proteins in lines 76-78.

      (5) Additional validation or clarification for examining the channel orientation may strengthen the manuscript.

      We have modified the text to make this point clearer. 

      (6) Advantage and limitations. The authors compared the recordings from hippocampal primary cilia membranes, noting differences in conductance magnitudes compared to the GUV method. Further discussing the limitations and advantages of this approach for the biophysical properties of organelle channels would be beneficial.

      We have revised the final paragraph to discuss the limitations of this method.

      (7) Including experiments that demonstrate ligand-induced activation or inhibition to further validate the current using this method would strengthen the manuscript (optional, not required).

      Despite our best attempts, exchange of the external bath to apply inhibitors (Gd3+, La3+) resulted in GUV patch instability. Our plans are to investigate ways to stabilize the high resistance seals to develop pharmacological screening using the CFE+GUV method.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their interest in our studies. In response to their comments, we have conducted additional experiments and made the necessary revisions to the manuscript. The new studies included to address the reviewers’ comments are shown in Figure 1B, 1F, Figure 2—figure supplement 1, Figure 3, Figure 3—figure supplement 1, Figure 3—figure supplement 2, Figure 3—figure supplement 3, Figure 4E, Figure 4—figure supplement 1, Figure 5, Figure 5—figure supplement 1, Figure 5—figure supplement 2D, and Figure 6. We are grateful for the critiques, which have helped us substantially improve the quality of the manuscript.

      Below, we have provided a point-by-point response to the reviewers’ comments.  

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the authors show that disruption of calcineurin, which is encoded by tax-6 in C. elegans, results in increased susceptibility to P. aeruginosa, but extends lifespan. In exploring the mechanisms involved, the authors show that disruption of tax-6 decreases the rate of defecation leading to intestinal accumulation of bacteria and distension of the intestinal lumen. The authors further show that the lifespan extension is dependent on hlh-30, which may be involved in breaking down lipids following deficits in defecation, and nhr-8, whose levels are increased by deficits in defecation. The authors propose a model in which disruption of the defecation motor program is responsible for the effect of calcineurin on pathogen susceptibility and lifespan, but do not exclude the possibility that calcineurin affects these phenotypes independently of defecation.

      We thank the reviewer for providing an excellent summary of our work. We have performed additional experiments as suggested by both the reviewers and believe we have thoroughly addressed all the reviewers' concerns.

      Reviewer #2 (Public Review):

      The manuscript titled "Calcineurin Inhibition Enhances Caenorhabditis elegans Lifespan by Defecation Defects-Mediated Calorie Restriction and Nuclear Hormone Signaling" by Priyanka Das, Alejandro Aballay, and Jogender Singh reveals that inhibiting calcineurin, a conserved protein phosphatase, in C. elegans affects the defecation motor program (DMP), leading to intestinal bloating and increased susceptibility to bacterial infection. This intestinal bloating mimics calorie restriction, ultimately resulting in an enhanced lifespan. The research identifies the involvement of HLH-30 and NHR-8 proteins in this lifespan enhancement, providing new insights into the role of calcineurin in C. elegans DMP and mechanisms for longevity.

      The authors present novel findings on the role of calcineurin in regulating the defecation motor program in C. elegans and how its inhibition can lead to lifespan enhancement. The evidence provided is solid with multiple experiments supporting the main claims.

      Strengths:

      The manuscript's strength lies in the authors' use of genetic and biochemical techniques to investigate the role of calcineurin in regulating the DMP, innate immunity, and lifespan in C. elegans. Moreover, the authors' findings provide a new mechanism for calcineurin inhibition-mediated longevity extension, which could have significant implications for understanding the molecular basis of aging and developing interventions to promote healthy aging.

      (1) The study uncovers a new role for calcineurin in the regulation of C. elegans DMP and a potential novel pathway for enhancing lifespan via calorie restriction involving calcineurin, HLH-30, and NHR-8 in C. elegans.

      (2) Multiple signaling pathways involved in lifespan enhancement were investigated with fairly strong experimental evidence supporting their claims.

      We thank the reviewer for an excellent summary of our work and for highlighting the strengths of the findings.

      Weaknesses:

      The manuscript's weaknesses include the lack of mechanistic details regarding how calcineurin inhibition leads to defects in the DMP and induces calorie restriction-like effects on lifespan.

      The exact site of calcineurin action, i.e., whether in the intestine or enteric muscles (Lee et al., 2005), and the possible molecular mechanisms linking calcineurin inhibition, DMP defects, and lifespan were not adequately explored. Although characterization of the full mechanism is probably beyond the scope of this paper, given the relative simplicity and advantages of using C. elegans as a model organism for this study, some degree of rigor is expected with additional straightforward control experiments as listed below:

      The authors state that tax-6 knockdown animals had drastically reduced expulsion events (Figure 2G), leading to irregular DMP (Lines 144-145), but did not describe the nature of DMP irregularity. For example, did the reduced expulsion events still occur with regular intervals but longer cycle lengths? Or was the rhythmicity completely abolished? The former would suggest the intestine clock is still intact, and the latter would indicate that calcineurin is required for the clock to function. Therefore, ethograms of DMP in both wild-type and tax-6 mutant animals are warranted to be included in the manuscript. Along the same line, besides the cycle length, the three separable motor steps (aBoc, pBoc, EMC) are easily measurable, with each step indicating where the program goes wrong, hence the site of action, which is precisely the beauty of studying C. elegans DMP. Unfortunately, the authors did not use this opportunity to characterize the exact behavior phenotypes of the tax-6 mutant to guide future investigations. Furthermore, it is interesting that about 64% of tax-6 (p675) animals had normal DMP. The authors attributed this to p675 being a weak allele. It would be informative to further examine tax-6 RNAi as in other experiments or to make a tax-6 null mutant with CRISPR. In addition, in one of the cited papers (Lee et al., 2005), the exact calcineurin loss-of-function strain tax-6(p675) was shown to have normal defecation, including normal EMC, while the gain-of-function mutant of calcineurin tax-6(jh107) had abnormal EMC steps. It wasn't clear from Lee et al., 2005, if the reported "normal defecation" was only referring to the expulsion step or also included the cycle length. Nevertheless, this potential contradiction and calcineurin gain-of-function mutant is highly relevant to the current study and should be further explored as a follow-up to previously reported results.
For some of the key experiments, such as tax-6's effects on susceptibility to PA14, DMP, intestinal bloating, and lifespan, additional controls, as the norm of C. elegans studies, including second allele and rescue experiments, would strengthen the authors' claims and conclusions.

      We have now included lifespan, survival on P. aeruginosa, and DMP data using an additional knockout allele, tax-6(ok2065). Additionally, we have added ethograms of DMP for both tax-6 RNAi and the tax-6(ok2065) mutant. Our observations indicate that tax-6 inhibition leads to a complete loss of DMP rhythmicity, suggesting that calcineurin is essential for maintaining the DMP clock. While characterizing the DMP, we noticed that expulsion events appeared superficial in the tax-6(ok2065) mutant, with little to no gut content released. Consequently, we examined the movement of gut content and found that both tax-6(ok2065) mutants and tax-6 knockdown animals showed significantly reduced gut content movement. The new findings on the characterization of DMP are presented in Figure 2—figure supplement 1, Figure 3, Figure 3—figure supplement 1, and Figure 3—figure supplement 2. The text in the results section reads (lines 160-176): “Next, we investigated whether the reduced number of expulsion events was due to regular intervals with longer cycle lengths or if rhythmicity was entirely disrupted upon tax-6 knockdown. To assess this, we obtained ethograms of the DMP for N2 animals grown on control and tax-6 RNAi. While animals on control RNAi displayed regular cycles of pBoc, aBoc, and EMC, the tax-6 RNAi animals exhibited disrupted rhythmicity (Figure 3A and Figure 3—figure supplement 1). Most tax-6 knockdown animals lacked the pBoc and aBoc steps and had sporadic expulsion events. Isolated pBoc events were occasionally observed, indicating a complete loss of rhythmicity in tax-6 knockdown animals. Ethograms for tax-6(ok2065) animals also showed disrupted rhythmicity (Figure 3B and Figure 3—figure supplement 2). Although the number of expulsion events appeared higher in tax-6(ok2065) animals compared to tax-6 RNAi animals (Figure 3—figure supplement 1 and 2), these expulsion events seemed superficial, releasing little to no gut content. 
This suggested slow movement of gut content in tax-6(ok2065) animals, leading to constipation and intestinal bloating. We examined gut content movement by measuring the clearance of blue dye (erioglaucine disodium salt) from the gut. The clearance was significantly slower in tax-6(ok2065) animals compared to N2 animals (Figure 3C), indicating impaired gut content movement due to the loss of tax-6. Similarly, tax-6 knockdown animals also showed significantly slowed gut content movement (Figure 3D).”

      Moreover, we have added a potential reason for the tax-6(p675) contradictory results from Lee et al., 2005 (lines 154-159): “At the 1-day-old adult stage, about 36% of tax-6(p675) animals showed irregular and slowed DMP, while the remainder had regular DMP (Figure 2H), suggesting that tax-6(p675) is a weak allele. The fraction of the animals with irregular DMP appeared to increase with age, indicating that this phenotype might be age-dependent. This may also explain why tax-6(p675) animals were reported to have a normal defecation cycle in an earlier study (Lee et al., 2005).”

      The second weakness of this manuscript is the data presentation for all survival rate curves. The authors stated that three independent experiments or biological replicates were performed for each group but only showed one "representative" curve for each plot. Without seeing all individual datasets or the averaged data with error bars, there is no way to evaluate the variability and consistency of the survival rate reported in this study.

      We now provide the data for all replicates in the source data files.
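
      As an illustration of the reviewer's request for averaged survival data with error bars, the following is a minimal sketch and not the authors' analysis. It assumes each replicate is available as a list of death or censoring times with a flag marking observed deaths; the function names `kaplan_meier`, `survival_at`, and `mean_and_sem` are hypothetical and introduced here only for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def kaplan_meier(times, observed):
    """Kaplan-Meier estimate for one replicate.

    times: day of death or censoring for each animal.
    observed: True if the death was observed, False if the animal was censored.
    Returns a list of (time, survival fraction) at each distinct death time.
    """
    events = sorted(zip(times, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        removed = 0
        # Group all animals that leave the risk set at the same time point.
        while i < len(events) and events[i][0] == t:
            deaths += 1 if events[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Survival fraction at time t (step function, 1.0 before the first death)."""
    value = 1.0
    for time, surv in curve:
        if time > t:
            break
        value = surv
    return value


def mean_and_sem(replicate_curves, t):
    """Mean and standard error of survival at time t across replicates (needs >= 2)."""
    values = [survival_at(curve, t) for curve in replicate_curves]
    return mean(values), stdev(values) / sqrt(len(values))
```

      Evaluating `mean_and_sem` on a grid of days gives one averaged curve with an error bar per time point, which can be shown alongside the individual replicate curves.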

      Overall, the authors' claims and conclusions are justified by their data, but further experiments are needed to confirm their findings and establish the detailed mechanisms underlying the observed effects of calcineurin inhibition on the DMP, calorie restriction, and lifespan in C. elegans.

      We have conducted additional experiments to elucidate the role of calcineurin in the DMP and to investigate the impact of the DMP on calorie restriction and lifespan in C. elegans, as described in the various responses to the reviewers’ comments. 

      Recommendations for the authors:

      Our specific comments to guide the authors, should they choose to revise the manuscript:

      The RNAi experiments in the eat-2 mutant background are difficult to interpret. If these animals are eating fewer bacteria, it is possible that there is also less tax-6 dsRNA being ingested and therefore less tax-6 inactivation. These experiments should be conducted with a tax-6 null allele.

      We have included lifespan experiments with the eat-2(ad465);tax-6(ok2065) double mutant, along with the individual single mutant controls, as shown in Figure 4E. These results demonstrate that the eat-2 mutation does not further extend the lifespan of the tax-6(ok2065) mutant. Additionally, we confirmed that the eat-2(ad465) mutants do not exhibit defects in feeding-based RNAi (Figure 4—figure supplement 1).

      While aak-2, hlh-30, and nhr-8 mutants may not have an eat phenotype, the negative tax-6 RNAi results should be confirmed with a tax-6 null mutant to obviate the consideration that these background mutations reduce RNAi efficacy.

      The genes hlh-30 and nhr-8 are located very close to tax-6 on chromosome IV (https://wormbase.org//#012-34-5), which made it challenging to generate double mutants. However, we tested the RNAi sensitivity of the hlh-30(tm1978) and nhr-8(ok186) mutants and confirmed that they are not defective in RNAi (Figure 5—figure supplement 1). We also found that tax-6 RNAi disrupted the DMP in both hlh-30(tm1978) and nhr-8(ok186) mutants (Figure 5—figure supplement 2). Furthermore, our results show that hlh-30(tm1978) and nhr-8(ok186) animals have increased susceptibility to P. aeruginosa upon tax-6 knockdown (Figure 6A, B), indicating that tax-6 RNAi was effective in these mutants. Since the phenotype in the aak-2 mutant was only partially observed, we did not conduct further experiments with aak-2 mutants.

      Reviewer #1 (Recommendations For The Authors):

      The low penetrance of defecation cycle defects in tax-6(p675) worms brings into question the role of the defecation deficits in the phenotypes caused by the disruption of tax-6. At the same time, the low penetrance provides a golden opportunity to test this. Do tax-6(p675) worms with a normal defecation cycle length have extended longevity? Increased susceptibility to bacterial pathogens? Smaller body size? Distended lumen? Decreased fat accumulation? Increased pha-4 and nhr-8 expression? It would be relatively straightforward to measure defecation cycle length in individual tax-6(p675) worms, bin them into normal defecation and slow defecation groups, and then compare the above-mentioned phenotypes.

      We appreciate the reviewer's interesting suggestion. However, the DMP defect phenotype in tax-6(p675) worms appears to be age-dependent, with the number of DMP-defective worms increasing as they age. Additionally, we observed that exposure to P. aeruginosa accelerates the onset of DMP defects in tax-6(p675) worms. As a result, tax-6(p675) worms are not suitable for the type of experiments the reviewer suggested. Nevertheless, we believe that the additional data using the tax-6(ok2065) mutant, along with the characterization of ethograms of DMP, firmly establishes the role of calcineurin in maintaining a regular DMP in C. elegans.

      Another way to dissect specific effects of calcineurin disruption from phenotypes resulting from defecation motor program deficits would be to further characterize other worms with deficits in defecation (flr-1, nhx-2, pbo-1 RNAi). It is mentioned that they have decreased lifespan. Do they also show increased susceptibility to bacterial pathogens? Do they show decreased fat? Is their lifespan dependent on HLH-30 and NHR-8?

      We thank the reviewer for this important suggestion. We have now included data with flr-1, nhx-2, and pbo-1 RNAi, which shows that the knockdown of these genes also enhances susceptibility to P. aeruginosa (Figure 3—figure supplement 3G). Knockdown of these genes is already known to reduce fat levels in N2 worms, and we demonstrate that they similarly reduce fat levels in hlh-30(tm1978) and nhr-8(ok186) animals (Figure 5B, C, F, G). Additionally, we found that the increased lifespan observed upon knockdown of these genes (as well as with tax-6 knockdown) is dependent on HLH-30 and NHR-8 (Figure 5A, D).

      To place "enhanced susceptibility to pathogen" within the proposed model, it would be important to examine the effect of HLH-30 and NHR-8 disruption on this phenotype. The proposed model suggests that this phenotype is independent of HLH-30 and NHR-8, but this should be tested experimentally. Similarly, it would be important to test the effect of HLH-30 and NHR-8 disruption on defecation cycle length to determine if defecation deficits are upstream or downstream of deficits in the defecation motor program

      We show that the knockdown of tax-6 leads to defects in the DMP in hlh-30(tm1978) and nhr-8(ok186) animals (Figure 5—figure supplement 2). Moreover, we show that hlh-30(tm1978) and nhr-8(ok186) animals have increased susceptibility to P. aeruginosa upon tax-6 knockdown (Figure 6A, B). These results are described as (lines 279-285): “Given that HLH-30 and NHR-8 are essential for lifespan extension upon calcineurin inhibition, we investigated whether these pathways also influence survival in response to P. aeruginosa infection following calcineurin knockdown. Both hlh-30(tm1978) and nhr-8(ok186) animals showed significantly reduced survival upon tax-6 RNAi (Figure 6A, B). These findings suggested that the reduced survival on P. aeruginosa following calcineurin inhibition is independent of HLH-30 and NHR-8 and is more likely due to increased gut colonization by P. aeruginosa resulting from DMP defects (Figure 6C).”

      Is the lifespan of tax-6(p675) increased? This would be important to measure and include in Figure 1.

      Indeed, the lifespan of tax-6(p675) mutants is increased. We have included the lifespan of tax-6(p675) and tax-6(ok2065) in Figure 1F.

      In Figure 2, disruption of tax-6 appears to result in a clear decrease in body size. To what extent is the decrease in fat/worm in Figure 3 simply a result of the worms being smaller? Perhaps, a measurement of Oil-Red-O intensity PER AREA would be a more appropriate measure.

      The ORO intensity values we had shown per animal were already area normalized. We have now indicated this in the Figure Legends.
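
      The response does not spell out how the area normalization was computed. As a hedged illustration only, one common reading is the mean staining intensity per pixel inside the outline of each animal; the function name `area_normalized_intensity` and the mask-based input are assumptions made for this sketch.

```python
def area_normalized_intensity(image, mask):
    """Mean stain intensity per pixel inside one animal's outline.

    image: 2D list of pixel intensities (for example, background-subtracted ORO signal).
    mask: 2D list of booleans of the same shape, True where the pixel belongs to the animal.
    Dividing the integrated intensity by the area removes the effect of body size.
    """
    total = 0.0
    area = 0
    for row_pixels, row_mask in zip(image, mask):
        for value, inside in zip(row_pixels, row_mask):
            if inside:
                total += value
                area += 1
    if area == 0:
        raise ValueError("mask selects no pixels")
    return total / area
```

      With this measure, a small and a large animal that stain equally strongly per pixel receive the same value, even though their integrated intensities differ.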

      There are multiple long-lived mutant strains such as clk-1 and isp-1 that have an increased defecation cycle length. To what extent do these worms exhibit phenotypes similar to tax-6 disruption? isp-1 have increased resistance to bacterial pathogens suggesting that defecation motor program deficits are not sufficient to increase susceptibility to bacterial pathogens.

      We have now examined the clk-1 and isp-1 mutants and found that these mutants exhibit reduced gut colonization by P. aeruginosa compared to N2 animals. This reduction in colonization may be attributed to the slowed pharyngeal pumping rates observed in these mutants. These findings suggest that the phenotypes associated with a slow DMP versus a disrupted DMP could be significantly different. The manuscript with the new data on these mutants reads (lines 177-192): “We then explored whether the disruption of DMP rhythmicity due to tax-6 knockdown affected P. aeruginosa responses similarly to longer but regular DMP cycles. To do this, we studied P. aeruginosa colonization in clk-1(qm30) and isp-1(qm150) mutants, which have regular but extended DMP cycles (Feng et al., 2001; Wong et al., 1995). Interestingly, both clk-1(qm30) and isp-1(qm150) mutants showed significantly reduced intestinal colonization by P. aeruginosa compared to N2 animals (Figure 3—figure supplement 3A-D). This reduced colonization could be attributed to their significantly decreased pharyngeal pumping rates (Wong et al., 1995; Yee et al., 2014), suggesting a lower intake of bacterial food in these mutants. While the survival of clk-1(qm30) animals on P. aeruginosa was comparable to N2 animals (Figure 3—figure supplement 3E), isp-1(qm150) animals exhibited significantly improved survival (Figure 3—figure supplement 3F). Conversely, knockdown of flr-1, nhx-2, and pbo-1 in N2 animals resulted in significantly reduced survival on P. aeruginosa compared to control RNAi (Figure 3—figure supplement 3G). Knockdown of these genes causes complete disruption of DMP rhythmicity, increasing gut colonization by P. aeruginosa (Singh and Aballay, 2019a). Overall, these findings demonstrated that calcineurin is crucial for maintaining the DMP ultradian clock, and its inhibition increases susceptibility to P. aeruginosa by disrupting the DMP.”

      Line 192. This statement is speculative. There is no evidence that HLH-30 is mediating lipid depletion in these worms.

      We have removed this statement. We observed that the knockdown of flr-1, nhx-2, and pbo-1 resulted in significant fat depletion in hlh-30(tm1978) animals (Figure 5B, C). Additionally, tax-6 knockdown also caused a small but significant reduction in fat levels in hlh-30(tm1978) animals. This contrasts with our initial submission, possibly due to the increased number of animals included in the analysis. These findings suggest that the increase in lifespan due to DMP defects requires HLH-30, likely through a mechanism independent of HLH-30’s role in fat depletion. We have updated the manuscript text and model (Figure 6C) accordingly.

      In Figure S2, tax-6 RNAi appears to have a more detrimental effect in pmk-1 mutants than the other mutants. The authors should comment on this.

      We have added the following sentence in the manuscript (lines 123-125): “The knockdown of tax-6 appeared to have a more pronounced effect in pmk-1(km25) mutants than in other mutants, suggesting that inhibition of tax-6 might exacerbate the adverse effects observed in pmk-1(km25) mutants.”

      Reviewer #2 (Recommendations For The Authors):

      Line 192-193: The statement is confusing and not accurate because HLH-30 did not enhance lifespan with or without calcineurin (Figure 4A and S4A, also in Lapierre 2023). The takeaway should be along the lines of calcineurin inhibition enhancing lifespan through HLH-30 or HLH-30 being required for lifespan enhancement via calcineurin inhibition.

      We have removed this statement. We now state (lines 237-239): “Knockdown of tax-6 did not extend the lifespan of hlh-30(tm1978) animals (Figure 5A), indicating that HLH-30 is required for the increased lifespan observed with calcineurin inhibition.”

      Line 261: Similar to the point above. Where is the data showing NHR-8 increases lifespan with or without calcineurin?

      We have removed this sentence.

      Figure 1 legend line 699: animals per condition per replicate >90, but in the Method section Line 317, it says more than 80 animals per condition per replicate. Could be more accurate.

      We have now specified in the Methods section that the exact number of animals per condition is provided in the source data files. Since different lifespan curves within a given figure panel had varying numbers of animals, we have indicated the lower boundary for all curves (including the replicates). The precise number of animals for each lifespan experiment is available in the source data files.

      Figures 2F and G, "tax-6" should be labeled as "tax-6 RNAi" to be consistent with other figures.

      We thank the reviewer for this suggestion and have updated the label to “tax-6 RNAi”.

      In summary, we would like to thank the reviewers again for providing constructive critiques. We believe we have fully addressed all the concerns of the reviewers by carrying out several new experiments and modifying the text. The manuscript has undergone substantial revision and has thereby improved significantly. We do hope that the evidence in support of the conclusions is found to be complete in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      - The authors should think about revising the terminology used to describe electrophysiological data in zebrafish (Fig.5): "posterior" hair cells in a neuromast are sensitive to posterior-to-anterior flow, which is currently termed "anterior". This is confusing because when "posterior" or "anterior" is used, for instance in the labels of the figure, one may get confused about whether this applies to hair-cell position or directionality of the stimulus. It would help to always use clearer terminology for the stimulus (e.g. posterior-to-anterior (P-to-A) as in Kindig 2023, or "from the tail"). Also, the authors may want to clarify what we should see in Fig.5 demonstrating that posterior hair cells, with reversed hair-bundle polarity, actually evince transduction of similar magnitude as anterior hair cells, with normal polarity of their hair bundles. 

      This nomenclature can indeed be confusing. Per the reviewers' request, we have changed the terminology to always refer to the direction of flow sensed by the hair cells: for example, HCs that respond to posterior-directed flow or to anterior-directed flow. We now denote these HCs as (A to P) and (P to A), respectively, in the figure for clarity. We have modified Figure 5, the Figure 5 legend, and the Results (starting at line 339) to reflect these changes.

      In addition, in our results we now provide more context when comparing the response magnitude of the anterior-sensing hair cells in gpr156 mutants to the response magnitude of the two different orientations of hair cells in controls.

      - Also, does it make sense that there is no defect in MET for mouse otolith organs with deleted GPR156, whereas there is a difference in the zebrafish lateral line? It would help motivate the study on mechanoelectrical transduction (see comment of Reviewer 1 below).

      We previously discussed this point and recognized that subtle effects remain possible in mouse (previously Discussion line 614). We have now modified the text in the Discussion to better emphasize this point (new line 627). The Eatock lab is currently working on developing calcium imaging in the mouse utricle to revisit this question in a future study. "Subtle effects remain possible, however, given the variance in single-cell electrophysiological data from both control and mutant mice. Nevertheless, current results are consistent with normal HC function in the Gpr156 mouse mutant, a prerequisite to interrogate how non-reversed HCs affect vestibular behavior."

      To help motivate transduction studies starting in the second Result paragraph, we added a transition at Line 205 that was indeed lacking:

      "Gpr156 inactivation could be a powerful model to specifically ask how HC reversal contributes to vestibular function. However, GPR156 may have other confounding roles in HCs besides regulating their orientation, similar to EMX2, which impacts mechanotransduction in zebrafish HCs (Kindig et al., 2023) and afferent innervation in mouse and zebrafish HCs (Ji et al., 2022; Ji et al., 2018)."

      (1) One overarching objective of this study was to use the Gpr156 KO model to discover how polarity reversal informs vestibular function (Introduction, overall summary in the last paragraph). Pairing behavioral defects with hair cell orientation is only possible if hair cell transduction is normal, which had to be tested.

      (2) The notion that experiments that produced negative results are unnecessary and not properly motivated can only apply in retrospect. At early stages we performed electrophysiology because we did not know whether transduction would be normal in the absence of GPR156. We also did not know whether innervation would be normal. The fact that both appear normal makes Gpr156 KO a better model to address the importance of orientation reversal (conclusion of the Discussion line 705).

      See also reply to Reviewer #1 below.

      Reviewer #1 (Recommendations For The Authors): 

      Fig1, panel B appears to show different focal planes for Gpr156del/+ and Gpr156del/del.

      Figure 1B did indeed show the control and mutant panels at slightly different focal planes. We swapped the right (mutant) panel image and adjusted intensities in the control image to match adjustments of the new mutant image.

      Given that this work is largely about polarity and connectivity to neurons, I do not understand the need to assess mechanosensitivity in Gpr156 mutants. Please explain in the text, as follows: "After establishing normal numbers and types of mouse vestibular HCs, we assessed whether HCs respond normally to hair bundle deflections in the absence of GPR156." We did this because... 

      Please see reply above in 'Recommendations for the authors' for comment about the need to assess mechanosensitivity. We agree that this transition was lacking, and we added an explanation as recommended:

      "Gpr156 inactivation could be a powerful model to specifically ask how HC reversal contributes to vestibular function. However, GPR156 may have other confounding roles in HCs besides regulating their orientation, similar to EMX2, which impacts mechanotransduction in zebrafish HCs (Kindig et al., 2023) and afferent innervation in mouse and zebrafish HCs (Ji et al., 2022; Ji et al., 2018)."

      Anyway, the data in Figures 2, 3 and 4 seems somewhat superfluous to the main message of the paper. 

      Please see reply above in 'Recommendations for the authors'. These data may appear superfluous in retrospect, but we could not claim that behavioral changes in Gpr156 mutants reflect the role of the line of polarity reversal if, for example, hair cell transduction was abnormal. We had to perform experiments to figure this out. We were further motivated as data began to emerge from the zebrafish lateral line that showed effects on HC transduction. Although we did not get positive results on this question in the mouse, we think the difference between models should be included as a significant part of the narrative.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive criticism and detailed assessment of our work which helped us to significantly improve our manuscript. We made significant changes to the text to better clarify our goals and approaches. To make our main goal of extracting the network dynamics clearer and to highlight the main advantage of our method in comparison with prior work we incorporated Videos 1-4 into the main text. We hope that these changes, together with the rest of our responses, convincingly demonstrate the utility of our method in producing results that are typically omitted from analysis by other methods and can provide important novel insights on the dynamics of the brain circuits. 

      Reviewer #1 (Public Review):

      (1) “First, this paper attempts to show the superiority of DyNetCP by comparing the performance of synaptic connectivity inference with GLMCC (Figure 2).”

      We believe that the goals of our work were not adequately formulated in the original manuscript, which gave rise to this apparent misunderstanding. As opposed to most of the prior work, which focused on reconstructing static connectivity from spiking data (including GLMCC), our ultimate goal is to learn the dynamic connectivity structure, i.e., to extract the time-dependent strength of the directed connectivity in the network. Because this formulation is fundamentally different from most of the prior work, the goal here is not to show “improvement” or “superiority” over prior methods that mostly focused on inference of static connectivity, but rather to thoroughly validate our approach and to show its usefulness for the dynamic analysis of experimental data.

      (2) “This paper also compares the proposed method with standard statistical methods, such as jitter-corrected CCG (Figure 3) and JPSTH (Figure 4). It only shows that the results obtained by the proposed method are consistent with those obtained by the existing methods (CCG or JPSTH), which does not show the superiority of the proposed method.”

      The major problem in designing such a dynamic model is the virtual absence of ground-truth data, either as verified experimental datasets or as synthetic data with known time-varying connectivity. In this situation, optimization of the model hyper-parameters and model verification largely become a “shot in the dark”. Therefore, to resolve this problem and make the model generalizable, we adopted a two-stage approach: in the first stage we learn static connections, and in the second stage we infer temporally varying dynamic connectivity. Dividing the problem into two stages enables us to compare the results of each stage separately to traditional descriptive statistical approaches. Static connectivity results obtained in stage 1 are compared to classical pairwise CCG (Fig.2A,B) and GLMCC (Fig.2 C,D,E), while dynamic connectivity obtained in stage 2 is compared to pairwise JPSTH (Fig.4D,E).

      Importantly, the goal here is therefore not to “outperform” the classical descriptive statistical or any other approaches, but rather to have solid guidance for designing the model architecture and optimizing the hyper-parameters. For example, to produce static weight results in Fig.2A,B that are statistically indistinguishable from the results of classical CCG, the procedure for selecting the weights that contribute to averaging is designed as shown in Fig.9 and discussed in detail in the Methods. Optimization of the L2 regularization parameter, illustrated in Fig.4 – figure supplement 1, makes it possible to produce dynamic weights very close to cJPSTH, as evidenced by the Pearson coefficient and TOST statistical tests. These comparisons demonstrate that the results of CCG and JPSTH are indeed faithfully reproduced by our model, which, we conclude, is sufficient justification to apply the model to the analysis of experimental results.

      (3) “However, the improvement in the synaptic connectivity inference does not seem to be convincing.”

      We are grateful to the reviewer for pointing out this issue, which we believe, as mentioned above, results from the failure of the original manuscript to clarify the major motivation for this comparison. The comparison of static connectivity inferred by stage 1 of our model to the results of GLMCC in Fig.2C,D,E is aimed at optimizing two further important parameters: the pair spike threshold and the peak height threshold. In Fig. 2D we show that when the peak height threshold is reduced from a rigorous 7 standard deviations (SD) to just 5 SD, our model recovers 74% of the ground truth connections, which is in fact better than the 69% produced by GLMCC for a comparable pair spike threshold of 80. As explained above, we do not intend to emphasize here that our model is “superior”, since that was not our goal, but rather use this comparison to illustrate the approach for optimizing the thresholds for unit and pair filtering, as described in detail in Fig. 11 and the corresponding section in Methods.

      To address these misunderstandings and better clarify the goal of our work, we changed the text in the Introductory section accordingly. We also incorporated Videos 1-4 from the Supplementary Materials into the main text as Video 1, Video 2, Video 3, and Video 4. In fact, these videos represent the main advantage (or “superiority”) of our model with respect to prior art: it enables inference of the time-dependent dynamics of network connectivity, as opposed to static connections.

      (4) “While this paper compares the performance of DyNetCP with a state-of-the-art method (GLMCC), there are several problems with the comparison. For example: 

      (a) This paper focused only on excitatory connections (i.e., ignoring inhibitory neurons). 

      (b) This paper does not compare with existing neural network-based methods (e.g., CoNNECT: Endo et al. Sci. Rep. 2021; Deep learning: Donner et al. bioRxiv, 2024).

      (c) Only a population of neurons generated from the Hodgkin-Huxley model was evaluated.”

      (a) In general, the model of Eq.1 is agnostic to whether the connections it recovers are excitatory or inhibitory. In fact, Fig. 5 and Fig.6 illustrate inferred dynamic weights for both excitatory (red arrows) and inhibitory (blue arrows) connections between excitatory (red triangles) and inhibitory (blue circles) neurons. Similarly, inhibitory and excitatory dynamic interactions between connections are represented in Fig. 7 for the larger network across all visual cortices.

      (b) As stated above, the goal of comparing the static connectivity results of stage 1 of our model to other approaches is to guide the choice of thresholds and the optimization of hyperparameters rather than to claim “superiority” of our model. Therefore, comparison with the “static” CNN-based model of Endo et al. or the ANN-based static model of Donner et al. (submitted to bioRxiv several months after our submission to eLife) is beyond the scope of this work.

      (c) We have chosen exactly the same sub-population of neurons from the synthetic HH dataset of Ref. 26 that is used in Fig.6 of Ref. 26 that provides direct comparison of connections reconstructed by GLMCC in the original Ref.26 and the results of our model. 

      (5) “In summary, although DyNetCP has the potential to infer synaptic connections more accurately than existing methods, the paper does not provide sufficient analysis to make this claim. It is also unclear whether the proposed method is superior to the existing methods for estimating functional connectivity, such as jitter-corrected CCG and JPSTH. Thus, the strength of DyNetCP is unclear.”

      As we explained above, we have no intention to claim that our model is more accurate than existing static approaches. In fact, it is not feasible to obtain a better estimate of connectivity than that of direct descriptive statistical methods such as CCG or JPSTH. Instead, comparisons with static (CCG and GLMCC) and temporal (JPSTH) approaches are used here to guide the choice of the model thresholds and to inform the optimization of hyper-parameters so that the prediction of the dynamic network connectivity is reliable. The main strength of DyNetCP is the inference of dynamic connectivity, as illustrated in Videos 1-4. We demonstrated the utility of the method on the largest in-vivo experimental dataset available today and extracted the dynamics of cortical connectivity in local and global visual networks. This information is unattainable with any other contemporary method we are aware of.

      Reviewer #1 (Recommendations for the Authors):

      (6) “First, the authors should clarify the goal of the analysis, i.e., to extract either the functional connectivity or the synaptic connectivity. While this paper assumes that they are the same, it should be noted that functional connectivity can be different from synaptic connectivity (see Steavenson IH, Neurons Behav. Data Anal. Theory 2023).”

      The goal of our analysis is to extract the dynamics of the spiking correlations. In this paper we intentionally avoided assigning a biological interpretation to the inferred dynamic weights. Our goal was to demonstrate that a trove of additional information on neural coding is hidden in the dynamics of neural correlations, information that is typically omitted from the analysis of neuroscience data.

      Biological interpretation of the extracted dynamic weights can follow the terminology of short-term plasticity between synaptically connected neurons (Refs 25, 33-37) or spike transmission strength (Refs 30-32,46). Alternatively, temporal changes in connection weights can be interpreted in terms of dynamically reconfigurable functional interactions of cortical networks (Refs 8-11,13,47) through which the information is flowing. We also cannot exclude an interpretation that combines both ideas. In any event, our goal here is to extract these signals for a pair (Video 1, Fig.4), a cortical local circuit (Video 2, Fig.5), and for the whole visual cortical network (Videos 3, 4 and Fig.7).

      To clarify this statement, we included a paragraph in the discussion section of the revised paper. 

      (7) “Finally, it would be valuable if the authors could also demonstrate the superiority of DyNetCP qualitatively. Can DyNetCP discover something interesting for neuroscientists from the large-scale in vivo dataset that the existing method cannot?”

      The model discovers dynamic time-varying changes in synchronous neuronal spiking (Videos 1-4) that more traditional methods like CCG or GLMCC are not able to detect. The revealed dynamics occur on very short time scales, of the order of just a few ms, during the stimulus presentation. Calculations of the intrinsic dimensionality of the spiking manifold (Fig. 8) reveal that up to 25 additional dimensions of the neural code can be recovered using our approach. These dimensions are typically omitted from the analysis of neural circuits using traditional methods.

      Reviewer #2 (Public Review):

      (1) “Simulation for dynamic connectivity. It certainly seems doable to simulate a recurrent spiking network whose weights change over time, and I think this would be a worthwhile validation for this DyNetCP model. In particular, I think it would be valuable to understand how much the model overfits, and how accurately it can track known changes in coupling strength.”

      We are very grateful to the reviewer for this insight. Verification of the model on synthetic data with known time-varying connectivity would indeed be very useful. We did generate a synthetic dataset to test some of the model performance metrics, i.e. its ability to distinguish True Positive (TP) from False Positive (FP) “serial” or “common input” connections (Fig.10A,B). Comparison of dynamic and static weights might indeed help to distinguish TP connections from artifactual FP connections.

      Generating a large synthetic dataset with known dynamic connections that mimics interactions in cortical networks is, however, a separate and non-trivial task that is beyond the scope of this work. Instead, we designed a model with an architecture where overfitting can be tested in two consecutive stages by comparison with descriptive statistical approaches – CCG and JPSTH. The static stage 1 of the model predicts correlations that are statistically indistinguishable from the CCG results (Fig.2A,B). The dynamic stage 2 of the model produces dynamic weight matrices that faithfully reproduce the cJPSTH (Fig.4D,E). The calculated Pearson correlation coefficients and TOST testing enable optimization of the L2 regularization parameter, as shown in Fig.4 – supplement 1 and described in detail in the Methods section. The ability to test the results of both stages separately against descriptive statistical results is the main advantage of the chosen model architecture: it allows us to verify that the model does not overfit and can predict changes in coupling strength at least as well as descriptive statistical approaches (see also our answer above to the Reviewer #1 questions).

      (2) “If the only goal is "smoothing" time-varying CCGs, there are much easier statistical methods to do this (c.f. McKenzie et al. Neuron, 2021. Ren, Wei, Ghanbari, Stevenson. J Neurosci, 2022), and simulations could be useful to illustrate what the model adds beyond smoothing.”

      We are grateful to the reviewer for bringing up these very interesting and relevant references, which we added to the discussion section of the paper. Of particular interest is the second one, which calculates the time-varying CCG weight (“efficacy” in that paper’s terms) on the same Allen Institute Visual dataset that our work uses. It is indeed an elegant way to extract a time-variable coupling strength similar to what our model generates. The major difference between our model and that of Ren et al., as well as GLMCC and other statistical approaches, is that DyNetCP learns the connections of an entire network jointly in one pass, rather than calculating coupling separately for each pair in the dataset without considering the relative influence of other pairs in the network. Hence, our model can infer connections beyond pairwise (see Fig. 11 and corresponding discussion in Methods) while performing the inference computationally efficiently.

      (3) “Stimulus vs noise correlations. For studying correlations between neurons in sensory systems that are strongly driven by stimuli, it's common to use shuffling over trials to distinguish between stimulus correlations and "noise" correlations or putative synaptic connections. This would be a valuable comparison for Figure 5 to show if these are dynamic stimulus correlations or noise correlations. I would also suggest just plotting the CCGs calculated with a moving window to better illustrate how (and if) the dynamic weights differ from the data.”

      Thank you for this suggestion. Note that for all weight calculations in our model, the standard jitter correction procedure of Ref. 33 (Harrison et al., Neural Com 2009) is first applied to mitigate the influence of correlated slow fluctuations (slow “noise”). Please also note that to obtain the results in Fig. 5 we split the 440 total experimental trials for this session (when the animal is running, see Table 1) randomly into 352 training and 88 validation trials by selecting 44 training trials and 11 validation trials from each configuration of contrast or grating angle. We checked that changing this random selection produced the very same results as shown in Fig.5.
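
      The per-configuration split described above can be sketched as follows. This is a minimal illustration, not the authors’ code: the function name `stratified_split`, the integer configuration labels, and the layout of 8 configurations with 55 trials each (inferred from 352/44 = 88/11 = 8 and 440/8 = 55) are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(config_labels, n_train, n_val, seed=0):
    """Pick n_train training and n_val validation trials from every configuration.

    config_labels[i] is the (hypothetical) stimulus configuration of trial i.
    Returns two sorted lists of trial indices.
    """
    rng = random.Random(seed)

    # Group trial indices by their configuration label.
    by_config = defaultdict(list)
    for trial_idx, label in enumerate(config_labels):
        by_config[label].append(trial_idx)

    train, val = [], []
    for label, trials in sorted(by_config.items()):
        if len(trials) < n_train + n_val:
            raise ValueError(f"configuration {label!r} has too few trials")
        rng.shuffle(trials)  # random selection within this configuration
        train.extend(trials[:n_train])
        val.extend(trials[n_train:n_train + n_val])
    return sorted(train), sorted(val)


# Assumed layout: 8 configurations x 55 trials = 440 trials in total.
labels = [config for config in range(8) for _ in range(55)]
train_idx, val_idx = stratified_split(labels, n_train=44, n_val=11)
```

      Passing a different `seed` draws a different random selection, which is the kind of re-draw the robustness check above refers to.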

      Comparison of descriptive statistical results of pairwise cJPSTH and the model are shown in Fig. 4D,E. The difference between the two is characterized in Fig.4 – supplement 1 in detail as evidenced by Pearson coefficient and TOST statistical tests.

      Reviewer #2 (Recommendations for the Authors):

      (4) “The method is described as "unsupervised" in the abstract, but most researchers would probably call this "supervised" (the static model, for instance, is logistic regression).”

      The model architecture is composed of two stages to make parameter optimization grounded. While the first stage is regression, the second and the most important stage is not. Therefore, we believe the term “unsupervised” is justified. 

      (5) “Introduction - it may be useful to mention that there have been some previous attempts to describe time-varying connectivity from spikes both with probabilistic models: Stevenson and Kording, Neurips (2011), Linderman, Stock, and Adams, Neurips (2014), Robinson, Berger, and Song, Neural Computation (2016), Wei and Stevenson, Neural Comp (2021) ... and with descriptive statistics: Fujisawa et al. Nat Neuroscience (2008), English et al. Neuron (2017), McKenzie et al. Neuron (2021).”

      We are very grateful to both reviewers for bringing up these very interesting and relevant references that we gladly included in the discussions within the Introduction and Discussion sections. 

      (6) “In the section "Static connectivity inferred by the DyNetCP from in-vivo recordings is biologically interpretable"... I may have missed it, but how is the "functional delay" calculated? And am I understanding right that for the DyNetCP you are just using [w_{i\to j}, w_{j\to i}] in place of the CCG?”

      The functional delay is calculated as the time lag of the maximum (or minimum) in the CCG (or static weight matrix). The static weight that the model extracts is indeed the w_i w_j product. We changed the text in this section to better clarify these definitions.
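
      As a hedged sketch of this definition, the functional delay of one pair can be read off as the lag at which the CCG (or static weight) trace reaches its extremum. The function name, the `sign` argument, and the toy numbers are hypothetical; the actual analysis may differ in binning and in how ties are handled.

```python
def functional_delay(lags_ms, values, sign="max"):
    """Return the lag (in ms) at which `values` reaches its extremum.

    lags_ms : lag of each bin, e.g. [-2, -1, 0, 1, 2]
    values  : CCG counts or static weights, one per lag bin
    sign    : "max" for an excitatory-like peak, "min" for an inhibitory-like trough
    """
    if len(lags_ms) != len(values) or len(values) == 0:
        raise ValueError("lags_ms and values must be non-empty and equally long")
    if sign not in ("max", "min"):
        raise ValueError("sign must be 'max' or 'min'")

    pick = max if sign == "max" else min
    # Index of the extremum; the first one wins if several bins tie.
    best = pick(range(len(values)), key=lambda i: values[i])
    return lags_ms[best]


# Toy trace whose peak sits one bin to the right of zero lag.
lags = [-2, -1, 0, 1, 2]
ccg = [0.1, 0.2, 0.3, 0.9, 0.2]
delay = functional_delay(lags, ccg)
```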

      (7) “P14 typo "sparce spiking" sparse”

      Fixed. Thank you. 

      (8) “Suggest rewording "Extra-laminar interactions reveal formation of neuronal ensembles with both feedforward (e.g., layer 4 to layer 5), and feedback (e.g., layer 5 to layer 4) drives." I'm not sure this method can truly distinguish common input from directed, recurrent cortical effects. Just as an example in Figure 5, it looks like 2->4, 0->4, and 3->2 are 0 lag effects. If you wanted to add the "functional delay" analysis to this laminar result that could support some stronger claims about directionality, though.”

      The time lags for the results of Fig. 5 are indeed small, but they are quantifiable. The left panel of Fig. 5A shows static results with the correlation peaks shifted by 1 ms from zero lag.

      (9) “Methods - I think it would be useful to mention how many parameters the full DyNetCP model has.”

      Overall, once the architecture of Fig.1C is established, the dynamic weight averaging procedure is selected (Fig.9), and Fourier features are introduced (Fig.10), there are just a few parameters to optimize, including the L2 regularization (Fig.4 – supplement 1) and the loss coefficient (Fig.1 – figure supplement 1A). Other variables, common to all statistical approaches, include the bin sizes in lag time and in trial time. Decreasing the bin size improves time resolution but reduces the number of spikes available in each bin for reliable inference. Therefore, the number-of-spikes threshold and other related thresholds α𝑠 , α𝑤 , α𝑝 as well as λ𝑖λ𝑗, need to be adjusted accordingly (Fig.11), as discussed in detail in the Methods, Section 4. We included this sentence in the text.

      (10) “It may be useful to also mention recent results in mice (Senzai et al. Neuron, 2019) and monkeys (Trepka...Moore. eLife, 2022) that are assessing similar laminar structures with CCGs.”

      Thank you for pointing out these very interesting references. We added a paragraph to the “Dynamic connectivity in VISp primary visual area” section comparing our results with these findings. In short, we observed that connections are distributed across the cortical depth with nearly the same maximum weights (Fig.7A), which is inconsistent with the observation in Trepka et al, 2022 of greatly diminished static connection efficacy within <200µm from the source. It is consistent, however, with the work of Senzai et al, 2019, which reveals much stronger long-distance correlations between layer 2/3 and layer 5 during waking than during sleep states. In both cases these observations represent static connections averaged over a trial time, while the results presented in Video 3 and Fig.7A show strong temporal modulation of the connection strength between all the layers during the stimulus presentation. Therefore, our results demonstrate that tracking dynamic connectivity patterns in local cortical networks can be invaluable in assessing circuit-level dynamic network organization.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors utilize recurrent neural networks (RNNs) to explore the question of when and how neural dynamics and the network's output are related from a geometrical point of view. The authors found that RNNs operate between two extremes: an 'aligned' regime in which the weights and the largest PCs are strongly correlated and an 'oblique' regime where the output weights and the largest PCs are poorly correlated. Large output weights led to oblique dynamics, and small output weights to aligned dynamics. This feature impacts whether networks are robust to perturbation along output directions. Results were linked to experimental data by showing that these different regimes can be identified in neural recordings from several experiments.

      Strengths:

      A diverse set of relevant tasks.

      A well-chosen similarity measure.

      Exploration of various hyperparameter settings.

      Weaknesses:

      One of the major connections found BCI data with neural variance aligned to the outputs.

      Maybe I was confused about something, but doesn't this have to be the case based on the design of the experiment? The outputs of the BCI are chosen to align with the largest principal components of the data.

      The reviewer is correct. We indeed expected the BCI experiments to yield aligned dynamics. Our goal was to use this as a comparison for other, non-BCI recordings in which the correlation is smaller, i.e. dynamics closer to the oblique regime. We adjusted our wording accordingly and added a small discussion at the end of the experimental results, Section 2.6.

      Proposed experiments may have already been done (new neural activity patterns emerge with long-term learning, Oby et al. 2019). My understanding of these results is that activity moved to be aligned as the manifold changed, but more analyses could be done to more fully understand the relationship between those experiments and this work.

      The on- vs. off-manifold experiments are indeed very close to our work. On-manifold initializations, as stated above, are expected to yield aligned solutions. Off-manifold initializations allow, in principle, for both aligned and oblique solutions and are thus closer to our RNN simulations. If, during learning, the top PCs (dominant activity) rotate such that they align with the pre-defined output weights, then the system has reached an aligned solution. If the top PCs hardly change, and yet the behavior is still good, this is an oblique solution. There is some indication of an intermediate result (Figure 4C in Oby et al.), but the existing analysis there did not fully characterize these properties. Furthermore, our work suggests that systematically manipulating the norm of readout weights in off-manifold experiments can yield new insights. We thus view these as relevant results but suggest both further analysis and experiments. We rewrote the corresponding section in the discussion to include these points.

      Analysis of networks was thorough, but connections to neural data were weak. I am thoroughly convinced of the reported effect of large or small output weights in networks. I also think this framing could aid in future studies of interactions between brain regions.

      This is an interesting framing to consider the relationship between upstream activity and downstream outputs. As more labs record from several brain regions simultaneously, this work will provide an important theoretical framework for thinking about the relative geometries of neural representations between brain regions.

      It will be interesting to compare the relationship between geometries of representations and neural dynamics across connected different brain areas that are closer to the periphery vs. more central.

      It is exciting to think about the versatility of the oblique regime for shared representations and network dynamics across different computations.

      The versatility of the oblique regime could lead to differences between subjects in neural data.

      Thank you for the suggestions. Indeed, this is precisely why relative measures of the regime are valuable, even in the absence of absolute thresholds for regimes. We included your suggestions in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      This paper tackles the problem of understanding when the dynamics of neural population activity do and do not align with some target output, such as an arm movement. The authors develop a theoretical framework based on RNNs showing that an alignment of neural dynamics to output can be simply controlled by the magnitude of the read-out weight vector while the RNN is being trained. Small magnitude vectors result in aligned dynamics, where low-dimensional neural activity recapitulates the target; large magnitude vectors result in "oblique" dynamics, where encoding is spread across many dimensions. The paper further explores how the aligned and oblique regimes differ, in particular, that the oblique regime allows degenerate solutions for the same target output.

      Strengths:

      - A really interesting new idea that different dynamics of neural circuits can arise simply from the initial magnitude of the output weight vector: once written out (Eq 3) it becomes obvious, which I take as the mark of a genuinely insightful idea.

      - The offered framework potentially unifies a collection of separate experimental results and ideas, largely from studies of the motor cortex in primates: the idea that much of the ongoing dynamics do not encode movement parameters; the existence of the "null space" of preparatory activity; and that ongoing dynamics of the motor cortex can rotate in the same direction even when the arm movement is rotating in opposite directions.

      - The main text is well written, with a wide-ranging set of key results synthesised and illustrated well and concisely.

      - The study shows that the occurrence of the aligned and oblique regimes generalises across a range of simulated behavioural tasks.

      - A deep analytical investigation of when the regimes occur and how they evolve over training.

      - The study shows where the oblique regime may be advantageous: allows multiple solutions to the same problem; and differs in sensitivity to perturbation and noise.

      - An insightful corollary result that noise in training is needed to obtain the oblique regime.

      - Tests whether the aligned and oblique regimes can be seen in neural recordings from primate cortex in a range of motor control tasks.

      Weaknesses:

      - The magnitude of the output weights is initially discussed as being fixed, and as far as I can tell all analytical results (sections 4.6-4.9) also assume this. But in all trained models that make up the bulk of the results (Figures 3-6) all three weight vectors/matrices (input, recurrent, and output) are trained by gradient descent. It would be good to see an explanation or results offered in the main text as to why the training always ends up in the same mapping (small->aligned; large->oblique) when it could, for example, optimise the output weights instead, which is the usual target (e.g. Sussillo & Abbott 2009 Neuron).

      We understand the reviewer’s surprise. We chose a typical setting (training all weights of an RNN with Adam) to show that we don’t have to fine-tune the setting (e.g. by fixing the output weights) to see the two regimes. However, other scenarios in which the output weights do change are possible, depending on the algorithm and details in the way the network is parameterized. Understanding why some settings lead to our scenario (no change in scale) and others don’t is not a simple question. A short explanation here, nonetheless:

      - Small changes to the internal weights are sufficient to solve the tasks.

      - Different versions of gradient descent and different ways of parametrizing the network lead to different results in which parts of the weights get trained. This goes in particular for how weight scales are introduced, e.g. [Jacot et al. 2018 Neurips], [Geiger et al. 2020 Journal of Statistical Mechanics], or [Yang, Hu 2020, arXiv, Feature learning in infinite-width networks]. One insight from these works is that plain gradient descent (GD) with small output weights leads to learning only at the output (and often divergence or unsuccessful learning). For this reason, plain GD (or stochastic GD) is not suitable for small output weights (the aligned regime). Other variants of GD, such as Adam or RMSprop, don’t have this problem because they shift the emphasis of learning to the hidden layers (here the recurrent weights). This is due to the normalization of the gradients.

      - FORCE learning [Sussillo & Abbott 2009] is somewhat special in that the output weights are simultaneously also used as feedback weights. That is, not only the output weights but also an additional low-rank feedback loop through these output weights is trained. As a side note: By construction, such a learning algorithm thus links the output directly to the internal dynamics, so that one would only expect aligned solutions – and the output weights remain correspondingly small in these algorithms [Mastrogiuseppe, Ostojic, 2019, Neural Comp].

      - In our setting, the output is not fed back to the network, so training the output alone would usually not suffice. Indeed, optimizing just the output weights is similar to what happens in the lazy training regime. These solutions, however, are not robust to noise, and we show that adding noise during the training does away with these solutions.
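
      The point about gradient normalization in the list above can be illustrated with a toy scalar example. It is only a sketch under stated assumptions: the functions `sgd_step` and `adam_first_step` are hypothetical helpers written for this illustration (not taken from any library or from our training code), and only the very first Adam update is shown, using the standard bias-corrected moment estimates.

```python
import math


def sgd_step(grad, lr):
    """Plain gradient descent: the update scales linearly with the gradient."""
    return -lr * grad


def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam update for a scalar parameter, starting from zero moments."""
    m = (1.0 - beta1) * grad          # first-moment estimate after one step
    v = (1.0 - beta2) * grad ** 2     # second-moment estimate after one step
    m_hat = m / (1.0 - beta1)         # bias correction recovers grad
    v_hat = v / (1.0 - beta2)         # bias correction recovers grad**2
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


lr = 0.01
small_grad = 1e-3   # e.g. a hidden-layer gradient scaled down by small output weights
large_grad = 1.0

sgd_small, sgd_large = sgd_step(small_grad, lr), sgd_step(large_grad, lr)
adam_small, adam_large = adam_first_step(small_grad, lr), adam_first_step(large_grad, lr)
```

      In this toy setting the plain GD updates differ by the same factor as the gradients, whereas both Adam updates have a magnitude close to `lr`. This is the sense in which normalization keeps learning in the hidden (recurrent) weights from stalling when their gradients are small.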

      To address this issue in the manuscript, we added the following sentence to section 2.2: “While explaining this observation is beyond the scope of this work, we note that (1) changing the internal weights suffices to solve the task, and that (2) the extent to which the output weights change during learning depends on the algorithm and specific parametrization [21, 27, 85].”

      - It is unclear what it means for neural activity to be "aligned" for target outputs that are not continuous time-series, such as the 1D or 2D oscillations used to illustrate most points here.

      Two of the modeled tasks have binary outputs; one has a 3-element binary vector.

      For any dynamics and output, we compare the alignment between the vector of output weights and the main PCs (the leading component of the dynamics). In the extreme of binary internal dynamics, i.e., two points {x_1, x_2}, there would only be one leading PC (the line connecting the two points, i.e. the choice decoder).
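
      One way to make this comparison concrete is a normalized correlation between the readout vector and the centered population states, sketched below. The formula ρ = ‖Xw‖ / (‖w‖ ‖X‖), the function name, and the two-point example are illustrative assumptions and not necessarily the exact measure used in the manuscript.

```python
import math


def readout_state_correlation(states, w_out):
    """Normalized correlation between a readout vector and centered states.

    states : list of state vectors x_t (one list of floats per time point)
    w_out  : readout weight vector of the same dimension
    Returns a value in [0, 1]; 1 means the readout lies along the only
    direction in which the states vary.
    """
    n, dim = len(states), len(w_out)
    mean = [sum(x[k] for x in states) / n for k in range(dim)]
    centered = [[x[k] - mean[k] for k in range(dim)] for x in states]

    # Output produced by the centered states: z_t = w . x_t
    outputs = [sum(w * xk for w, xk in zip(w_out, x)) for x in centered]

    out_norm = math.sqrt(sum(z * z for z in outputs))
    w_norm = math.sqrt(sum(w * w for w in w_out))
    x_norm = math.sqrt(sum(xk * xk for x in centered for xk in x))
    if w_norm == 0.0 or x_norm == 0.0:
        return 0.0
    return out_norm / (w_norm * x_norm)


# Binary internal dynamics: two points {x_1, x_2} that differ along the first axis.
two_points = [[3.0, 1.0], [1.0, 1.0]]
rho_aligned = readout_state_correlation(two_points, [2.0, 0.0])   # readout along x_1 - x_2
rho_oblique = readout_state_correlation(two_points, [0.0, 3.0])   # readout orthogonal to it
```

      With only two states, all variance lies along the line connecting them, so a readout along that line gives a correlation of 1 and an orthogonal readout gives 0.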

      - It is unclear what criteria are used to assign the analysed neural data to the oblique or aligned regimes of dynamics.

      Such an assignment is indeed difficult to achieve. The RNN models we showed were at the extremes of the two regimes, and these regimes are well characterized in the case of large networks (as described in the methods section). For the neural data, we find different levels of alignment for different experiments. These differences may not be strong enough to assign different regimes. Instead, our measures (correlation and relative fitting dimension) allow us to order the datasets. Here, the BCI data is more aligned than non-BCI data – perhaps unsurprisingly, given the experimental design of the former and the previous findings for the rotation task [Russo et al, 2018]. We changed the manuscript accordingly, now focusing on the relative measure of alignment, even in the absence of absolute thresholds. We are curious whether future studies with more data, different tasks, or other brain regions might reveal stronger differentiation towards either extreme.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There's so much interesting content in the supplement - it seemed like a whole other paper! It is interesting to read about the dynamics over the course of learning. Maybe you want to put this somewhere else so that more people read it?

      We are glad the reviewer appreciated this content. We think developing these analysis methods is essential for a more complete understanding of the oblique regime and how it arises, and that it should therefore be part of the current paper.

      Nice schematic in Figure 1.

      There were some statements in the text highlighting co-rotation in the top 2 PCs for oblique networks. Figure 4a looks like aligned networks might also co-rotate in a particular subspace that is not highlighted. I could be wrong, but the authors should look into this and correct it if so. If both aligned and oblique networks have co-rotation within the top 5 or so PCs, some text should be updated to reflect this.

      This is indeed the case, thanks for pointing this out! For one example, there is co-rotation for the aligned network already in the subspace spanned by PCs 1 and 3, see the figure below. We added a sentence indicating that co-rotation can take place at low-variance PCs for the aligned regime and pointed to this figure, which we added to the appendix (Fig. 17).

      While these observations are an important addition, we don’t think they qualitatively alter our results, particularly the stronger dissociation between output and internal dynamics for oblique than aligned dynamics.

      Figure 4 color labels were 'dark' and 'light'. I wasn't sure if this was a typo or if it was designed for colorblind readers? Either way, it wasn't too confusing, but adding more description might be useful.

      Fixed to red and yellow.

      Typo "Aligned networks have a ratio much large than one"

      Typo "just started to be explored"

      Typo "hence allowing to test"

      Fixed all typos.

      Reviewer #2 (Recommendations For The Authors):

      - Explain/discuss in the main text why the initial output weights reliably result in the required internal RNN dynamics (small->aligned; large->oblique) after training. The magnitude of the output weights is initially discussed as being fixed, and as far as I can tell all analytical results (sections 4.6-4.9) also assume this. But in all trained models that make up the bulk of the results (Figures 3-6) all three weight vectors/matrices (input, recurrent, and output) are trained by gradient descent. It would be good to see an explanation or results offered in the main text as to why the training always ends up in the same mapping (small->aligned; large->oblique) when it could, for example, just optimise the output weights instead.

      See the answer to a similar comment by Reviewer #1 above.

      - Page 6: explain the 5 tasks.

      We added a link to methods where the tasks are described.

      - Page 6/Fig 3 & Methods: explain assumptions used to compute a reconstruction R^2 between RNN PCs and a binary or vector target output.

      We added a new methods section, 4.4, where we explain the fitting process in Fig. 3. For all tasks, the target output was a time series with P specified target values in N_out dimensions. We thus always applied regression and did not differentiate between binary and non-binary tasks.

      - Page 8: methods and predictions are muddled up: paragraph ending "along different directions" should be followed by paragraph starting "Our intuition...". The intervening paragraph ("We apply perturbations...") should start after the first sentence of the paragraph "To test this,...".

      Right, these sentences were muddled up indeed. We put them in the correct order.

      - Page 10: what are the implications of the differences in noise alignment between the aligned and oblique regimes?

      The noise suppression in the oblique regime is a slow learning process that gradually renders the solution more stable. With a large readout, learning separates into two phases. In an early phase, a “lazy” solution is learned quickly; this solution is not robust to noise. In a second, slower phase, learning gradually leads to a more robust solution: the oblique solution. The main text emphasizes the result of this process (noise suppression), whereas the methods follow the process itself closely. This process is possibly related to other slow learning processes that fine-tune solutions, e.g., [Blanc et al. 2020, Li et al. 2021, Yang et al. 2023]. Furthermore, it would be interesting to see whether such fine-tuning happens in animals [Ratzon et al. 2024]. We added corresponding sentences to the discussion.

      - Neural data analysis:

      (i) Page 11 & Fig 7: the assignment of "aligned" or "oblique" to each neural dataset is based on the ratio of D_fit/D_x. But in all cases this ratio is less than 1, indicating fewer dimensions are needed for reconstruction than for explaining variance. Given the example in Figure 2 suggests this is an aligned regime, why assign any of them as "oblique"?

      We weakened the wording in the corresponding section, and now only state that BCI data leans more towards aligned, non-BCI data more towards oblique. This is consistent with the intuition that BCI is by construction aligned (decoder along largest PCs) and non-BCI data already showed signs of oblique dynamics (co-rotating leading PCs in the cycling task, Russo et al. 2018).

      We agree that Fig 2 (and Fig 3) could suggest distinguishing the regimes at a threshold D_fit/D_x = 1, although we hadn’t considered such a formal criterion.

      (ii) Figure 23 and main text page 11: discuss which outputs for NLB and BCI datasets were used in Figure 7 and main text; the NLB results vary widely by output type - discuss in the main text; D_fit for NLB-maze-accuracy is missing from panel D; as the criterion is D_fit/D_x, plot this too.

      We now discuss which outputs were used in Fig. 7 in its caption: the velocity of the task-relevant entity (hand/finger/cursor). This was done to have one quantity across studies. We added a sentence to the main text, p. 11, which points to Fig 22 (which used to be Fig 23) and states that results are qualitatively similar for other decoded outputs, despite some fluctuations in numerical values and decodability.

      Regarding Fig 22: D_fit for NLB-maze-accuracy was beyond the manually set y-limit (for visibility of the other data points). We also extended the figure to include D_fit/D_x. We also discovered a small bug in the analysis code which required us to rerun the analysis and reproduce the plots. This also changed some of the numbers in the main text.
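
      The D_fit/D_x ratio discussed above can be illustrated with a minimal sketch. It assumes that D_x is the number of leading PCs needed to reach a given fraction of explained variance and that D_fit is the number of leading PCs needed for the output fit to reach the same fraction of R^2. The 0.9 threshold, the function name `dims_needed`, and the example curves are hypothetical and are not taken from the study.

```python
def dims_needed(cumulative_scores, threshold=0.9):
    """Smallest number of leading PCs whose cumulative score reaches `threshold`.

    cumulative_scores[k-1] is the score obtained with the first k PCs
    (cumulative variance explained for D_x, or fit R^2 for D_fit).
    Returns None if the threshold is never reached.
    """
    for k, score in enumerate(cumulative_scores, start=1):
        if score >= threshold:
            return k
    return None


# Made-up illustrative curves, NOT data from the study.
cum_variance = [0.55, 0.80, 0.92, 0.97, 0.99]  # variance explained by top-k PCs
cum_fit_r2 = [0.10, 0.30, 0.60, 0.85, 0.95]    # R^2 of the output fit from top-k PCs

d_x = dims_needed(cum_variance)
d_fit = dims_needed(cum_fit_r2)
ratio = d_fit / d_x

# The reviewer's suggested reading compares this ratio against 1.
print(f"D_x={d_x}, D_fit={d_fit}, D_fit/D_x={ratio:.2f}")
```

      Under the reviewer's reading, a ratio below 1 means that fewer dimensions are needed to reconstruct the output than to explain the variance; whether 1 is the right cut-off is left open in the response.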

      - Discussion:

      "They do not explain why it [the "irrelevant activity"] is necessary", implies that the following sentence(s) will explain this, but do not. Instead, they go on to say:

      "Here, we showed that merely ensuring stability of neural dynamics can lead to the oblique regime": this does not explain why it is necessary, merely that it exists; and it is unclear what results "stability of neural dynamics" is referring to.

      We agree this was not a very clear formulation. We replaced these last three sentences with the following:

      “Our study systematically explains this phenomenon: generating task-related output in the presence of large, task-unrelated dynamics requires large readout weights. Conversely, in the presence of large output weights, resistance to noise or perturbations requires large, potentially task-unrelated neural dynamics (the oblique regime).”

      - The need for all 27 figures was unclear, especially as some seemed not to be referenced or were referenced out of order. Please check and clarify.

      Fig 16 (Details for network dynamics in cycling tasks) and Fig 21 (loss over learning time for the different tasks) were not referenced, and are now removed.

      We also reordered the figures in the appendix so that they would appear in the order they are referenced. Note that we added another figure (now Fig. 17) following a question from Reviewer #1.

    1. Author Response:

      Reviewer #1 (Public review):

      In this study, Deshmukh et al. provide an elegant illustration of Haldane's sieve, the population genetics concept stating that novel advantageous alleles are more likely to fix if dominant because dominant alleles are more readily exposed to selection. To achieve this, the authors rely on a uniquely suited study system, the female-polymorphic butterfly Papilio polytes.

      Deshmukh et al. first reconstruct the chronology of allele evolution in the P. polytes species group, clearly establishing the non-mimetic cyrus allele as ancestral, followed by the origin of the mimetic allele polytes/theseus, via a previously characterized inversion of the dsx locus, and most recently, the origin of the romulus allele in the P. polytes lineage, after its split from P. javanus. The authors then examine the two crucial predictions of Haldane's sieve, using the three alleles of P. polytes (cyrus, polytes, and romulus). First, they report with compelling evidence that these alleles are sequentially dominant, or put in other words, novel adaptive alleles either are or quickly become dominant upon their origin. Second, the authors find a robust signature of positive selection at the dsx locus, across all five species that share the polytes allele.

      In addition to exquisitely exemplifying Haldane's sieve, this study characterizes the genetic differences (or lack thereof) between mimetic alleles at the dsx locus. Remarkably, the polytes and romulus alleles are profoundly differentiated, despite their short divergence time (< 0.5 my), whereas the polytes and theseus alleles are indistinguishable across both coding and intronic sequences of dsx. Finally, the study reports incidental evidence of exon swaps between the polytes and romulus alleles. These exon swaps caused intermediate colour patterns and suggest that (rare) recombination might be a mechanism by which novel morphs evolve.

      This study advances our understanding of the evolution of the mimicry polymorphism in Papilio butterflies. This is an important contribution to a system already at the forefront of research on the genetic and developmental basis of sex-specific phenotypic morphs, which are common in insects. More generally, the findings of this study have important implications for how we think about the molecular dynamics of adaptation. In particular, I found that finding extensive genetic divergence between the polytes and romulus alleles is striking, and it challenges the way I used to think about the evolution of this and other otherwise conserved developmental genes. I think that this study is also a great resource for teaching evolution. By linking classic population genetic theory to modern genomic methods, while using visually appealing traits (colour patterns), this study provides a simple yet compelling example to bring to a classroom.

      In general, I think that the conclusions of the study, in terms of the evolutionary history of the locus, the dominance relationships between P. polytes alleles, and the inference of a selective sweep in spite of contemporary balancing selection, are strongly supported; the data set is impressive and the analyses are all rigorous. I nonetheless think that there are a few ways in which the current presentation of these data could lead to confusion, and should be clarified and potentially also expanded.

      We thank the reviewer for the kind and encouraging assessment of our work.

      (1) The study is presented as addressing a paradox related to the evolution of phenotypic novelty in "highly constrained genetic architectures". If I understand correctly, these constraints are assumed to arise because the dsx inversion acts as a barrier to recombination. I agree that recombination in the mimicry locus is reduced and that recombination can be a source of phenotypic novelty. However, I'm not convinced that the presence of a structural variant necessarily constrains the potential evolution of novel discrete phenotypes. Instead, I'm having a hard time coming up with examples of discrete phenotypic polymorphisms that do not involve structural variants. If there is a paradox here, I think it should be more clearly justified, including an explanation of what a constrained genetic architecture means. I also think that the Discussion would be the place to return to this supposed paradox, and tell us exactly how the observations of exon swaps and the genetic characterization of the different mimicry alleles help resolve it.

      The paradox that we refer to here is essentially the contrast of evolving new adaptive traits which are genetically regulated, while maintaining the existing adaptive trait(s) at its fitness peak. While one of the mechanisms to achieve this could be differential structural rearrangement at the chromosomal level, it could arise due to alternative alleles or splice variants of a key gene (caste determination in Cardiocondyla ants), and differential regulation of expression (the spatial regulation of melanization in Nymphalid butterflies by ivory lncRNA). In each of these cases, a new mutation would have to give rise to a new phenotype without diluting the existing adaptive traits when it arises. We focused on structural variants, because that was the case in our study system, however, the point we were making referred to evolution of novel traits in general. We will add a section in the revised discussion to address this.

      (2) While Haldane's sieve is clearly demonstrated in the P. polytes lineage (with cyrus, polytes, and romulus alleles), there is another allele trio (cyrus, polytes, and theseus) for which Haldane's sieve could also be expected. However, the chronological order in which polytes and theseus evolved remains unresolved, precluding a similar investigation of sequential dominance. Likewise, the locus that differentiates polytes from theseus is unknown, so it's not currently feasible to identify a signature of positive selection shared by P. javanus and P. alphenor at this locus. I, therefore, think that it is premature to conclude that the evolution of these mimicry polymorphisms generally follows Haldane's sieve; of two allele trios, only one currently shows the expected pattern.

      We agree with the reviewer that the genetic basis of f. theseus requires further investigation. f. theseus occupies the same level on the dominance hierarchy of dsx alleles as f. polytes (Clarke and Sheppard, 1972) and the allelic variant of dsx present in both these female forms is identical, so there exists just one trio of alleles of dsx. Based on this evidence, we cannot comment on the origin of forms theseus and polytes. They could have arisen at the same time or sequentially. Since our paper is largely focused on the sequential evolution of dsx alleles through Haldane’s sieve, we have included f. theseus in our conclusions. We think that it fits into the framework of Haldane’s sieve due to its genetic dominance over the non-mimetic female form. However, this aspect needs to be explored further in a more specific study focusing on the characterization, origin, and developmental genetics of f. theseus in the future.

      Reviewer #2 (Public review):

      Summary:

      Deshmukh and colleagues studied the evolution of mimetic morphs in the Papilio polytes species group. They investigate the timing of origin of haplotypes associated with different morphs, their dominance relationships, associations with different isoform expressions, and evidence for selection and recombination in the sequence data. P. polytes is a textbook example of a Batesian mimic, and this study provides important nuanced insights into its evolution, and will therefore be relevant to many evolutionary biologists. I find the results regarding dominance and the sequence of events generally convincing, but I have some concerns about the motivation and interpretation of some other analyses, particularly the tests for selection.

      We thank the reviewer for these insightful remarks.

      Strengths:

      This study uses widespread sampling, large sample sizes from crossing experiments, and a wide range of data sources.

      We appreciate this point. This strength has indeed helped us illuminate the evolutionary dynamics of this classic example of balanced polymorphism.

      Weaknesses:

      (1) Purpose and premise of selective sweep analysis

      A major narrative of the paper is that new mimetic alleles have arisen and spread to high frequency, and their dominance over the pre-existing alleles is consistent with Haldane's sieve. It would therefore make sense to test for selective sweep signatures within each morph (and its corresponding dsx haplotype), rather than at the species level. This would allow a test of the prediction that those morphs that arose most recently would have the strongest sweep signatures.

      Sweep signatures erode over time - see Figure 2 of Moest et al. 2020 (https://doi.org/10.1371/journal.pbio.3000597), and it is unclear whether we expect the signatures of the original sweeps of these haplotypes to still be detectable at all. Moest et al show that sweep signatures are completely eroded by 1N generations after the event, and probably not detectable much sooner than that, so assuming effective population sizes of these species of a few million, at what time scale can we expect to detect sweeps? If these putative sweeps are in fact more recent than the origin of the different morphs, perhaps they would more likely be associated with the refinement of mimicry, but not necessarily providing evidence for or against a Haldane's sieve process in the origin of the morphs.

      Our original plan was to test for signatures of sweeps in individual morphs, but the very small sample sizes for individual morphs in some species made this analysis difficult. We agree that signatures of selective sweeps cannot give us an estimate of the possible timescale of a sweep. They simply indicate that there may have been a sweep in a certain genomic region. Therefore, with just the data from selective sweeps, we cannot determine whether these occurred with the refinement of mimicry or with the origin of the mimetic phenotype itself. We have thus made no interpretations regarding time scales or causal events of the sweep. Additionally, we discuss that the results we obtained for individual alleles may represent what occurred at the point of origin of mimetic resemblance or in the course of perfecting the resemblance, although we cannot differentiate between the two at this point (lines 320 to 333).

      (2) Selective sweep methods

      A tool called RAiSD was used to detect signatures of selective sweeps, but this manuscript does not describe what signatures this tool considers (reduced diversity, skewed frequency spectrum, increased LD, all of the above?). Given the comment above, would this tool be sensitive to incomplete sweeps that affect only one morph in a species-level dataset? It is also not clear how RAiSD could identify signatures of selective sweeps at individual SNPs (line 206). Sweeps occur over tracts of the genome and it is often difficult to associate a sweep with a single gene.

      RAiSD (https://www.nature.com/articles/s42003-018-0085-8) detects selective sweeps using the μ statistic, which is a combined score of SFS, LD, and genetic diversity along a chromosome. The tool is quite sensitive and is able to detect soft sweeps. RAiSD can take a VCF variant file containing SNP data as input and uses an SNP-driven sliding-window approach to scan the genome for signatures of sweeps. Using an SNP file instead of runs of sequences prevents repeated calculations in regions that are sparse in variants, thereby optimizing execution time. Due to the nature of the input we used, the μ statistic was also calculated per site. We then annotated the SNPs based on the genes in which they occur and found that all species showing mimicry had at least one site with a signature of a sweep within the dsx locus.

      (3) Episodic diversification

      Very little information is provided about the Branch-site Unrestricted Statistical Test for Episodic Diversification (BUSTED) and Mixed Effects Model of Evolution (MEME), and what hypothesis the authors were testing by applying these methods. Although it is not mentioned in the manuscript, a quick search reveals that these are methods to study codon evolution along branches of a phylogeny. Without this information, it is difficult to understand the motivation for this analysis.

      Thank you for bringing this to our notice. We will add a few lines to the Methods about the hypothesis we were testing and the motivation behind this analysis. We will additionally cite a previous study from our group that used these and other methods to study the molecular evolution of dsx across insect lineages.

      (4) GWAS for form romulus

      The authors argue that the lack of SNP associations within dsx for form romulus is caused by poor read mapping in the inverted region itself (line 125). If this is true, we would expect strong association in the regions immediately outside the inversion. From Figure S3, there are four discrete peaks of association, and the location of dsx and the inversion are not indicated, so it is difficult to understand the authors' interpretation in light of this figure.

      We indeed observe the regions flanking dsx showing the highest association in our GWAS. This is a bit tricky to demonstrate in the figure as the genome is not assembled at the chromosome level. However, the association peaks occur on scf 908437033 at positions 2192979, 1181012 and 1352228 (Fig. S3c, Table S3) while dsx is located between 1938098 and 2045969. We will add the position of dsx in the figure legend of the revised manuscript.
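
      The coordinates quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the positions given in this response; the helper name `distance_to_interval` is hypothetical, and the sketch makes no assumption about strand or gene orientation.

```python
# Coordinates quoted in the response, all on scaffold scf 908437033.
DSX_START, DSX_END = 1938098, 2045969
PEAKS = [2192979, 1181012, 1352228]


def distance_to_interval(pos, start, end):
    """Return 0 if pos lies inside [start, end], else the distance to the nearest edge."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


for pos in sorted(PEAKS):
    d = distance_to_interval(pos, DSX_START, DSX_END)
    if d == 0:
        where = "inside dsx"
    elif pos < DSX_START:
        where = f"{d} bp below the dsx start coordinate"
    else:
        where = f"{d} bp above the dsx end coordinate"
    print(pos, where)
```

      With these numbers, none of the three peaks falls inside the dsx interval: two lie at lower coordinates (757,086 bp and 585,870 bp from the start) and one at a higher coordinate (147,010 bp from the end).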

      (5) Form theseus

      Since there appears to be only one sequence available for form theseus (actually it is said to be "P. javanus f. polytes/theseus"), is it reasonable to conclude that "the dsx coding sequence of f. theseus was identical to that of f. polytes in both P. javanus and P. alphenor" (Line 151)? Looking at the Clarke and Sheppard (1972) paper cited in the statement that "f. polytes and f. theseus show equal dominance" (line 153), it seems to me that their definition of theseus is quite different from that here. Without addressing this discrepancy, the results are difficult to interpret.

      Among the P. javanus individuals that we sampled, we obtained just one individual with f. theseus and the H P allele. However, from a previously published study (Zhang et al. 2017), we were able to add nine more individuals of this form (Fig. S4b and S7). Although we did not show these individuals in Fig. 3 (which was based on PCR amplification and sequencing of individual exons of dsx), all analyses with sequence data were performed on 10 theseus individuals in total. When comparing theseus and polytes dsx alleles, Zhang et al. observed what we now know are species-specific differences and not allele-specific differences. Our observations were consistent with these findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Question 1: The experiment that utilizes lactose or glucose supplementation to infer the importance of carbohydrate recognition by galectin-9 cannot be interpreted unequivocally owing to the growth-enhancing effect of lactose supplementation on Mtb during liquid culture in vitro.

      Thank you for this very constructive comment. We repeated the experiments after lowering the concentration of lactose or AG from 10 μg/mL to 1 μg/mL. We found that a low concentration of lactose or AG had a negligible effect on Mtb growth but still reversed the inhibitory effect of galectin-9 on mycobacterial growth (revised Fig. 2A, C). Therefore, we consider that supplementation with lactose or AG reverses galectin-9-mediated inhibition of Mtb growth largely through carbohydrate recognition by galectin-9 rather than through a growth-enhancing effect.

      Question 2: Similar to the comment above, the apparent dose-independent effect of galectin-9 on Mtb growth in vitro is difficult to reconcile with the interpretation that galectin is functioning as claimed.

      We thank the reviewer for the correction. Indeed, as the reviewer pointed out, galectin-9 inhibits Mtb growth in a dose-independent manner. We have corrected the claim in the revised manuscript (Line 114).

      Question 3: The claimed differences in galectin-9 concentration in sera from tuberculin skin test (TST)-negative or TST-positive non-TB cases versus active TB patients are not immediately apparent from the data presented.

      We appreciate your concern. The previous samples were from a cohort set up at the Max Planck Institute for Infection Biology. We have now measured galectin-9 in sera from another, independent cohort of active TB patients and healthy donors in China, and we found a higher abundance of galectin-9 in serum from TB patients than in serum from healthy donors (revised Fig. 1E).

      Question 4: Neither fluorescence microscopy nor electron microscopy analyses are supported by high-quality, interpretable images which, in the absence of supporting quantitative data, renders any claims of anti-AG mAb specificity (fluorescence microscopy) or putative mAb-mediated cell wall swelling (electron microscopy) highly speculative.

      We appreciate your concern. We have improved the procedure of the immunofluorescence assay and obtained high-quality, interpretable images with quantitative data (revised Fig. 4F). As for the electron microscopy analyses, we added a clearer label indicating the cell wall in the revised manuscript (revised Fig. 7C).

      Question 5: Finally, the absence of any discussion of how anti-AG antibodies (similarly, galectin-9) gain access to the AG layer in the outer membrane of intact Mtb bacilli (which may additionally possess an extracellular capsule/coat) is a critical omission - situating these results in the context of current knowledge about Mtb cellular structure (especially the mycobacterial outer membrane) is essential for plausibility of the inferred galectin-9 and anti-AG mAb activities.

      Exactly, AG is hidden by mycolic acids in the outer layer of Mtb cell wall. As we have discussed in the Discussion part of previous manuscript (line 285), we speculate that during Mtb replication, cell wall synthesis is active and AG becomes exposed, thereby facilitating its binding to galectin-9 or AG antibody and leading to Mtb growth arrest. It’s highly possible that galectin-9 or AG antibody targets replicating Mtb.

      To Reviewer #2 (Public Review):

      Question 1: In light of other observations that cleaved galectin-9 levels in the plasma are a biomarker for severe infection (Padilla A et al Biomolecules 2021 and Iwasaki-Hozumi H et al. Biomolecules 2021) it is difficult to reconcile the author's interpretation that the elevated gal-9 in Active TB patients (Figure 1E) contributes to the maintenance of latent infection in humans. The authors should consider incorporating these observations in the interpretation of their own results.

      Thank you for these very insightful comments. We observed elevated levels of galectin-9 in the serum of active TB patients, consistent with reports indicating that cleaved galectin-9 levels in the serum serve as a biomarker for severe infection (Iwasaki-Hozumi et al., 2021; Padilla et al., 2020). We consider that the elevated levels of galectin-9 in the serum of active TB may be an indicator of the host immune response to Mtb infection, however, the magnitude of elevated galectin-9 is not sufficient to control Mtb infection and maintain latent infection. This is highly similar to other protective immune factors such as interferon gamma, which is elevated in active TB as well (El-Masry et al., 2007; Hasan et al., 2009). We have included the discussion in the revised manuscript (line 298).

      Question 2: The anti-AG titers were measured only in individuals with active TB (Figure 3C), generally thought to be a less protective immunological state. The speculation that individuals with anti-AG titers have some protection is not founded. Further only 2 mAbs were tested to demonstrate restriction of Mtb in culture. It is possible that clones of different affinities for AG present within a patient's polyclonal AG-antibody responses may or may not display a direct growth restriction pressure on Mtb in culture. The authors should soften the claims about the presence of AG-titers in TB patients being indicative of protection.

      We appreciate your concern. As per your suggestion, we have softened the claim to that “We speculate that during Mtb infection, anti-AG IgG antibodies are induced, which potentially contribute to protection against TB by directly inhibiting Mtb replication albeit seemingly in vain.”

      References

      El-Masry, S., Lotfy, M., Nasif, W.A., El-Kady, I.M., and Al-Badrawy, M. (2007). Elevated serum level of interleukin (IL)-18, interferon (IFN)-gamma and soluble Fas in patients with pulmonary complications in tuberculosis. Acta microbiologica et immunologica Hungarica 54, 65-77.

      Hasan, Z., Jamil, B., Khan, J., Ali, R., Khan, M.A., Nasir, N., Yusuf, M.S., Jamil, S., Irfan, M., and Hussain, R. (2009). Relationship between circulating levels of IFN-gamma, IL-10, CXCL9 and CCL2 in pulmonary and extrapulmonary tuberculosis is dependent on disease severity. Scandinavian journal of immunology 69, 259-267.

      Iwasaki-Hozumi, H., Chagan-Yasutan, H., Ashino, Y., and Hattori, T. (2021). Blood Levels of Galectin-9, an Immuno-Regulating Molecule, Reflect the Severity for the Acute and Chronic Infectious Diseases. Biomolecules 11.

      Padilla, S.T., Niki, T., Furushima, D., Bai, G., Chagan-Yasutan, H., Telan, E.F., Tactacan-Abrenica, R.J., Maeda, Y., Solante, R., and Hattori, T. (2020). Plasma Levels of a Cleaved Form of Galectin-9 Are the Most Sensitive Biomarkers of Acquired Immune Deficiency Syndrome and Tuberculosis Coinfection. Biomolecules 10.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      The match between fractal and classical cycles is not one-to-one. For example, the fractal method identifies a correlation between age and cycle duration in adults that is not apparent with the classical method. This raises the question as to whether differences are due to one method being more reliable than another or whether they are also identifying different underlying biological differences. It is not clear for example whether the agreement between the two methods is better or worse than between two human scorers, which generally serve as a gold standard to validate novel methods. The authors provide some insight into differences between the methods that could account for differences in results. However, given that the fractal method is automatic it would be important to clearly identify criteria for recordings in which it will produce similar results to the classical method.

      Thank you for these insightful suggestions. In the revised Manuscript, we have added a number of additional analyses that provide a quantitative comparison between the classical and fractal cycle approaches aiming to identify the source of the discrepancies between classical and fractal cycle durations. Likewise, we assessed the intra-fractal and intra-classical method reliability as outlined below.

      Reviewer #1 (Recommendations For The Authors):

      One of the challenges in interpreting the results of the manuscript is understanding whether the differences between the two methods are due to a genuine difference in what these two methods are quantifying or simply noise/variability in each method. If the authors could provide some more insight into this, it would be a great help in assessing their findings and I think bolster the applicability of their method.

      (1) Method reliability: The manuscript clearly shows that cycle length is robustly correlated between fractal and classical in multiple datasets, however, it is hard to assign a meaningful interpretation to the correlation value (ie R = 0.5) without some reference point. This could be provided by looking at the intra-method correlation of cycle lengths. In the case of classical scoring, inter-scorer results could be compared, if the R-value here is significantly higher than 0.5 it would suggest genuine differences between the methods. In the case of fractal scoring, inter-electrode results could be compared / results with slight changes to the peak prominence threshold or smoothing window.

      In the revised Manuscript, we performed the following analyses to show the intra-method reliability:

      a) Classical cycle reliability: For the revised Manuscript, an additional scorer independently defined classical sleep cycles for all datasets and marked sleep cycles with skipped REM sleep. Likewise, we performed automatic sleep cycle detection using the R “SleepCycles” package by Blume & Cajochen (2021). We have added a new Table S8 to Supplementary Material 2 that shows the averaged cycle durations and cycle numbers obtained by the two human scorers and the automatic algorithm, as well as the inter-scorer rate agreement. We have also added a new sheet named “Classical method reliability” to the Supplementary Excel file; it reports classical cycle durations for each participant and each dataset as defined by the two human scorers and the algorithm.

      We found that the correlation coefficients between the two human scorers ranged from 0.69 to 0.91 in different datasets (in the literature, r’s > 0.7 are defined as strong scores), thus being higher than the correlation coefficients between fractal and classical cycle durations, which in turn ranged from 0.41 to 0.55 (r’s in the range of 0.3 – 0.7 are considered moderate scores). The correlation coefficients between the human raters and the automatic algorithm were notably lower, ranging from 0.30 to 0.69 (moderate scores) in different datasets, thus lying within the range of the correlation coefficients between fractal and classical cycle durations. This analysis is reported in Supplementary Material 2, section ”Intra-classical method reliability” and Table S8.
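
      The comparison above rests on Pearson correlations between cycle durations and on the quoted conventions for interpreting them. The following minimal sketch shows that computation; the function names and the example durations are hypothetical, and the "weak" label for values below 0.3 is an assumption, since the response names only the strong and moderate ranges.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def label_strength(r):
    """Label |r| using the conventions quoted in the response.

    > 0.7 is 'strong', 0.3 to 0.7 is 'moderate'; 'weak' below 0.3 is an
    assumption of this sketch.
    """
    a = abs(r)
    if a > 0.7:
        return "strong"
    if a >= 0.3:
        return "moderate"
    return "weak"


# Made-up mean cycle durations in minutes for five participants, NOT study data.
scorer_a = [92.0, 88.5, 101.0, 95.5, 110.0]
scorer_b = [90.0, 91.0, 99.5, 97.0, 108.0]

r = pearson_r(scorer_a, scorer_b)
print(f"r = {r:.2f} ({label_strength(r)})")
```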

      b) Fractal cycle reliability: In the revised Supplementary Material 2, we assessed the intra-fractal method reliability by correlating the durations of fractal cycles calculated as defined in the main text, i.e., using a minimum peak prominence of 0.94 z and a smoothing window of 101 thirty-second epochs, with those calculated using minimum peak prominences ranging from 0.86 to 1.20 z with a step size of 0.04 z and smoothing windows ranging from 81 to 121 thirty-second epochs with a step size of 10 epochs (Table S7). We found that fractal cycle durations calculated using adjacent minimum peak prominences (i.e., those that differed by 0.04 z) showed r’s > 0.92, while those calculated using adjacent smoothing windows (i.e., those that differed by 10 epochs) showed r’s > 0.84. In addition, we correlated fractal cycle durations defined using different channels and found that the correlation coefficients ranged between 0.66 – 0.67 (Table S1). Thus, most of the correlations performed to assess intra-fractal method reliability showed correlation coefficients (r > 0.6) higher than those obtained to assess inter-method reliability (r = 0.41 – 0.55), i.e., correlations between fractal and classical cycles. This analysis is reported in Supplementary Material 2, section “Intra-fractal method reliability” and Table S7. Likewise, we have added a new sheet named “Fractal method reliability” that reports the actual values for the abovementioned parameters to the Supplementary Excel file. For a discussion on potential sources of differences, see below.
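
      The parameter-sensitivity check described in b) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: it assumes that a fractal cycle is the interval between adjacent prominent peaks of the z-scored, smoothed slope series, and that smoothing is a centered moving average whose window shrinks at the edges. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev

EPOCH_MIN = 0.5  # one thirty-second epoch, in minutes


def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges (an assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def lowest_until_higher(signal, peak, step):
    """Lowest value met walking from `peak` in direction `step`
    until a higher sample or the edge of the series is reached."""
    lowest = signal[peak]
    j = peak + step
    while 0 <= j < len(signal) and signal[j] <= signal[peak]:
        lowest = min(lowest, signal[j])
        j += step
    return lowest


def prominent_peaks(signal, min_prominence):
    """Indices of strict local maxima with prominence >= min_prominence."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            base = max(lowest_until_higher(signal, i, -1),
                       lowest_until_higher(signal, i, +1))
            if signal[i] - base >= min_prominence:
                peaks.append(i)
    return peaks


def fractal_cycle_durations(slopes, min_prominence=0.94, window=101):
    """Peak-to-peak durations (minutes) of the z-scored, smoothed slope series."""
    mu, sd = mean(slopes), pstdev(slopes)
    z = [(s - mu) / sd for s in slopes]
    peaks = prominent_peaks(moving_average(z, window), min_prominence)
    return [(b - a) * EPOCH_MIN for a, b in zip(peaks, peaks[1:])]


if __name__ == "__main__":
    # Synthetic 8-hour night with a 90-minute rhythm (illustrative data only).
    slopes = [math.sin(2 * math.pi * n / 180) for n in range(960)]
    for prominence in (0.90, 0.94, 0.98):
        print(prominence, fractal_cycle_durations(slopes, min_prominence=prominence))
```

      Running the function over a grid of prominences and window lengths and correlating the per-participant mean durations between adjacent settings would reproduce the kind of comparison summarized in Table S7.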

      (2) Origin of method differences: The authors outline a few possible sources of discrepancies between the two methods (peak vs REM end, skipped REM cycle detection...) but do not quantify these contributions. It would be interesting to identify some factors that could predict for either a given night of sleep or dataset whether it is likely to show a strong or weak agreement between methods. This could be achieved by correlating measures of the proposed differences ("peak flatness", fractal cycle depth, or proportion of skipped REM cycles) with the mismatch between the two methods.

      In the revised Manuscript, we have quantified a few possible sources of discrepancies between the durations of fractal vs classical cycles and added a new section named “Sources of fractal and classical cycle mismatches” to the Results as well as new Tables 5 and S10 (Supplementary Material 2). Namely, we correlated the difference in classical vs fractal sleep cycle durations with each of the following: the amplitude of the fractal descent/ascent (to reflect fractal cycle depth), the duration of cycles with skipped REM sleep/TST, the duration of wake after sleep onset/TST, and the REM episode length of a given cycle (to reflect peak flatness). We found that a higher difference in classical vs fractal cycle duration was associated with a higher proportion of wake after sleep onset (r = 0.226, p = 0.001), shallower fractal descents (r = 0.15, p = 0.002) and longer REM episodes (r = 0.358, p < 0.001, n = 417 cycles, Table S10 in Supplementary Material 2). The rest of the assessed parameters showed no significant correlations (Table S10). We have added a new sheet named “Fractal-classical mismatch” that reports the actual values for the abovementioned parameters to the Supplementary Excel file.

      (3) Skipped REM cycles: the authors underline that the fractal method identified skipped REM cycles. It seems likely that manual identification of skipped REM cycles is particularly challenging (ie we would expect this to be a particular source of error between two human scorers). If this is indeed the case, it would be interesting to discuss, since it would highlight an advantage of their methodology that they already point out (l644).

      In the revised Manuscript, we have added the inter-scorer agreement rate for cycles with skipped REM sleep, which was 61%, i.e., 32 percentage points lower than the performance of our fractal cycle algorithm (93%). These findings are now reported in the “Skipped cycles” section of the Results and in Table S9 of Supplementary Material 2. We also discuss them in the Discussion:

      “Our algorithm detected skipped cycles in 93% of cases while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was 61% only; thus, 32% lower. We deduce that the fractal cycle algorithm detected skipped cycles since a lightening of sleep that replaces a REM episode in skipped cycles is often expressed as a local peak in fractal time series.” Discussion, section “Fractal and classical cycles comparison”, paragraph 5.

      Minor comments:

      - In the subjects where the number of fractal and classical cycles did not match, how large was the difference (ie just one extra cycle or more)? Correlating cycle numbers could be one way to quantify this.

      In the revised Manuscript, we have reported the required information for the participants with no one-to-one match (46% of all participants) as follows: 

      “In the remaining 46% of the participants, the difference between the fractal and classical cycle numbers ranged from -2 to 2 with the average of -0.23 ± 1.23 cycle. This subgroup had 4.6 ± 1.2 fractal cycles per participant, while the number of classical cycles was 4.9 ± 0.7 cycles per participant. The correlation coefficient between the fractal and classical cycle numbers was 0.280 (p = 0.006) and between the cycle durations – 0.278 (p=0.006).” Results, section “Correspondence between fractal and classical cycles”, last paragraph.

      - When discussing the skipped REM cycles (l467), the authors explain: "For simplicity and between-subject consistency, we included in the analysis only the first cycles". I'm not sure I understood this, could they clarify to which analysis they are referring to?

      In the revised Manuscript, we performed this analysis twice: using first cycles and using all cycles and therefore have rephrased this as follows:

      “We tested whether the fractal cycle algorithm can detect skipped cycles, i.e., the cycles where an anticipated REM episode is skipped (possibly due to too high homeostatic pressure). We performed this analysis twice. First, we counted all skipped cycles (except the last cycles of a night, which might lack REM episode for other reasons, e.g., a participant had/was woken up). Second, we counted only the first classical cycles (i.e., the first cycle out of the 4 – 6 cycles that each participant had per night, Fig. 3 A – B) as these cycles coincide with the highest NREM pressure. An additional reason to disregard skipped cycles observed later during the night was our aim to achieve higher between-subject consistency as later skipped cycles were observed in only a small number of participants.” Results, section “Skipped cycles”, first paragraph.

      - The inclusion of all the hypnograms as a supplementary is a great idea to give the reader concrete intuition of the data. If the limits of the sleep cycles for both methods could be added it would be very useful.

      Supplementary Material 1 has been updated such that each graph has a mark showing the onsets of fractal and classical sleep cycles, including classical cycles with skipped REM sleep.

      - The difference in cycle duration between adults and children seems stronger / more reliable for the fractal cycle method, particularly in the histogram (Figure 3C). Is this difference statistically significant?

      In the revised Manuscript, we have added a Multivariate Analysis of Variance to compare the F-values, adjusted R-squared and partial eta squared of the two approaches. The findings are as follows:

      “To compare the fractal approach with the classical one, we performed a Multivariate Analysis of Variance with fractal and classical cycle durations as dependent variables, the group as an independent variable and the age as a covariate. We found that fractal cycle durations showed higher F-values (F(1, 43) = 4.5 vs F(1, 43) = 3.1), adjusted R squared (0.138 vs 0.089) and effect sizes (partial eta squared 0.18 vs 0.13) than classical cycle durations.” Results, Fractal cycles in children and adolescents, paragraph 3.

      There have been some recent efforts to define sleep cycles in an automatic way using machine learning approaches. It could be interesting to mention these in the discussion and highlight their relevance to the general endeavour of automatizing the sleep cycle identification process.

      In the Discussion of the revised Manuscript, we have added the section on the existing automatic sleep cycle definition algorithms:

      “Even though recently, there has been a significant surge in sleep analysis incorporating various machine learning techniques and deep neural network architectures, we should stress that this research line mainly focused on the automatic classification of sleep stages and disorders almost ignoring the area of sleep cycles. Here, as a reference method, we used one of the very few available algorithms for sleep cycle detection (Blume & Cajochen, 2021). We found that automatically identified classical sleep cycles only moderately correlated with those detected by human raters (r’s = 0.3 – 0.7 in different datasets). These coefficients lay within the range of the coefficients between fractal and classical cycle durations (r = 0.41 – 0.55, moderate) and outside the range of the coefficients between classical cycle durations detected by two human scorers (r’s = 0.7 – 0.9, strong, Supplementary Material 2, Table S8).” Discussion, section “Fractal and classical cycles comparison”, paragraph 4.

      Reviewer #2 (Public Review):

      One weakness of the study, from my perspective, was that the IRASA fits to the data (e.g. the PSD, such as in Figure 1B), were not illustrated. One cannot get a sense of whether or not the algorithm is based entirely on the fractal component or whether the oscillatory component of the PSD also influences the slope calculations. This should be better illustrated, but I assume the fits are quite good.

      Thank you for this suggestion. In the revised Manuscript, we have added a new figure (Fig.S1 E, Supplementary Material 2), illustrating the goodness of fit of the data as assessed by the IRASA method.

      The cycles detected using IRASA are called fractal cycles. I appreciate the use of a simple term for this, but I am also concerned whether it could be potentially misleading? The term suggests there is something fractal about the cycle, whereas it's really just that the fractal component of the PSD is used to detect the cycle. A more appropriate term could be "fractal-detected cycles" or "fractal-based cycle" perhaps?

      We agree that these cycles are not fractal per se. In the Introduction, when we mention them for the first time, we name them “fractal activity-based cycles of sleep” and immediately after that add “or fractal cycles for short”. In the revised version, we reintroduce this abbreviation at the start of each major section and in the Abstract. Nevertheless, given that the term “fractal cycles” is used 88 times, we revert to the short name after these “reminders” to facilitate readability. We hope that this will highlight that the cycles are not fractal per se and thus reduce possible confusion while keeping the manuscript short.

      The study performs various comparisons of the durations of sleep cycles evaluated by the IRASA-based algorithm vs. conventional sleep scoring. One concern I had was that it appears cycles were simply identified by their order (first, second, etc.) but were not otherwise matched. This is problematic because, as evident from examples such as Figure 3B, sometimes one cycle conventionally scored is matched onto two fractal-based cycles. In the case of the Figure 3B example, it would be more appropriate to compare the duration of conventional cycle 5 vs. fractal cycle 7, rather than 5 vs. 5, as it appears is currently being performed.

      In cases where the number of fractal cycles differed from the number of classical cycles (34 – 55% of the participants in different datasets, as in the case of Fig.3B), we did not perform one-to-one matching of cycles. Instead, we averaged the durations of the fractal and classical cycles within each participant and only then correlated them (Fig.2C). For the subset of participants (45 – 66% in different datasets) with a one-to-one match between the fractal and classical cycles, we performed an additional correlation without averaging, i.e., we correlated the durations of individual fractal and classical cycles (Fig.S4 of Supplementary Material 2). This is stated in the Methods, section Statistical analysis, paragraph 2.
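
      The two correlation strategies described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the durations are made-up numbers, and "one-to-one match" is approximated here by an equal number of fractal and classical cycles, which is an assumption.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def averaged_correlation(fractal, classical):
    """Correlate per-participant mean durations (usable when cycle counts differ)."""
    ids = sorted(fractal.keys() & classical.keys())
    return pearson_r([mean(fractal[p]) for p in ids],
                     [mean(classical[p]) for p in ids])


def matched_cycle_pairs(fractal, classical):
    """Pair individual cycles by order, only for participants with equal cycle counts."""
    pairs = []
    for p in sorted(fractal.keys() & classical.keys()):
        if len(fractal[p]) == len(classical[p]):
            pairs.extend(zip(fractal[p], classical[p]))
    return pairs


if __name__ == "__main__":
    # Hypothetical cycle durations in minutes, keyed by participant.
    fractal = {"p1": [80, 100], "p2": [90, 90, 90], "p3": [100, 120]}
    classical = {"p1": [85, 95, 90], "p2": [88, 92, 96], "p3": [110, 110]}
    print(averaged_correlation(fractal, classical))
    xs, ys = zip(*matched_cycle_pairs(fractal, classical))
    print(pearson_r(xs, ys))
```

      Averaging first sidesteps the question of which fractal cycle corresponds to which classical cycle, at the cost of discarding within-night variability.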

      There are a few statements in the discussion that I felt were either not well-supported. L629: about the "little biological foundation" of categorical definitions, e.g. for REM sleep or wake? I cannot agree with this statement as written. Also about "the gradual nature of typical biological processes". Surely the action potential is not gradual and there are many other examples of all-or-none biological events.

      In the revised Manuscript, we have removed these statements from both Introduction and Discussion.

      The authors appear to acknowledge a key point, which is that their methods do not discriminate between awake and REM periods. Thus their algorithm essentially detected cycles of slow-wave sleep alternating with wake/REM. Judging by the examples provided this appears to account for both the correspondence between fractal-based and conventional cycles, as well as their disagreements during the early part of the sleep cycle. While this point is acknowledged in the discussion section around L686. I am surprised that the authors then argue against this correspondence on L695. I did not find the "not-a-number" controls to be convincing. No examples were provided of such cycles, and it's hard to understand how positive z-values of the slopes are possible without the presence of some wake unless N1 stages are sufficient to provide a detected cycle (in which case, then the argument still holds except that its alterations between slow-wave sleep and N1 that could be what drives the detection).

      In the revised Manuscript, we have removed the “NaN analysis” from both Results and Discussion. We have replaced it with the correlation between the difference between the durations of the classical and fractal cycles and proportion of wake after sleep onset. The finding is as follows:

      “A larger difference between the durations of the classical and fractal cycles was associated with a higher proportion of wake after sleep onset in 3/5 datasets as well as in the merged dataset (Supplementary Material 2, Table S10).” Results, section “Fractal cycles and wake after sleep onset”, last two sentences. This is also discussed in Discussion, section “Fractal cycles and age”, paragraph 1, last sentence. 

      To me, it seems important to make clear whether the paper is proposing a different definition of cycles that could be easily detected without considering fractals or spectral slopes, but simply adjusting what one calls the onset/offset of a cycle, or whether there is something fundamentally important about measuring the PSD slope. The paper seems to be suggesting the latter but my sense from the results is that it's rather the former.

      Thank you for this important comment. Overall, our paper suggests that the fractal approach might reflect the cycling nature of sleep in a more precise and sensitive way than classical hypnograms. Importantly, neither fractal nor classical methods can shed light on the mechanism underlying sleep cycle generation due to their correlational approach. Despite this, the advantages of fractal over classical methods mentioned in our Manuscript are as follows:

      (1) Fractal cycles are based on a real-valued metric with known neurophysiological functional significance, which introduces a biological foundation and a more gradual impression of nocturnal changes compared to the abrupt changes that are inherent to hypnograms that use a rather arbitrary assigned categorical value (e.g., wake=0, REM=-1, N1=-2, N2=-3 and SWS=-4, Fig.2 A).

      (2) Fractal cycle computation is automatic and thus objective, whereas classical sleep cycle detection is usually based on the visual inspection of hypnograms, which is time-consuming, subjective and error-prone. Few automatic algorithms are available for sleep cycle detection, and the one tested here correlated only moderately with classical cycles detected by human raters (r’s = 0.3 – 0.7 in different datasets).

      (3) Defining the precise end of a classical sleep cycle with skipped REM sleep, which is common in children, adolescents and young adults, using a hypnogram is often difficult and arbitrary. The fractal cycle algorithm could detect such cycles in 93% of cases, while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was only 61%, i.e., 32 percentage points lower.

      (4) The fractal analysis showed a stronger effect size, higher F-value and R-squared than the classical analysis for the cycle duration comparison in children and adolescents vs young adults. The first and second fractal cycles were significantly shorter in the pediatric compared to the adult group, whereas the classical approach could not detect this difference.

      (5) Fractal – but not classical – cycle durations correlated with the age of adult participants.

      These bullets are now summarized in Table 5 that has been added to the Discussion of the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech.

      Strengths:

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications.

      Weaknesses:

      Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results.

      We greatly appreciate the reviewers for the insightful comments and constructive suggestions. Below are the revisions we plan to make:

      (1) Experimental Procedure: We will provide a more detailed description of the stimuli and comprehension tests in the revised manuscript. Additionally, we will upload the corresponding audio files and transcriptions as supplementary data to ensure full transparency. 

      (2) Statistics/Analyses: In response to the reviewer's suggestions, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state similar to the baseline state described by Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states showed network activity levels that deviated significantly from zero. Furthermore, we regenerated the null distribution for behavior-brain state correlations using a circular shift approach, and the results remain largely consistent with our previous findings. We have also made other adjustments to the analyses and introduced some additional analyses, as per the reviewer's recommendations. These changes will be incorporated into the revised manuscript.

      (3) Interpretation/Rationale: We will expand on the interpretation of the relationship between state occurrence and semantic coherence. Specifically, we will highlight that higher semantic coherence may enable the brain to more effectively accumulate information over time. State #2 appears to be involved in the integration of information over shorter timescales (hundreds of milliseconds), while State #3 is engaged in longer timescales (several seconds). 

      Reviewer #2 (Public review):

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension.

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions.

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      We have regenerated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence. 
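
      For readers unfamiliar with the approach, a circular-shift null can be sketched as below. This is a minimal illustration under stated assumptions, not the analysis code: the one-sided test, the number of permutations, the single-time-course setting and all function names are hypothetical choices made for the example.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def circular_shift_p_value(state_course, feature, n_perm=1000, seed=0):
    """Observed r and one-sided p-value under a circular-shift null.

    Shifting the state time course with wrap-around keeps its
    autocorrelation intact while breaking its alignment with the stimulus.
    """
    rng = random.Random(seed)
    state_course = list(state_course)
    n = len(state_course)
    observed = pearson_r(state_course, feature)
    exceed = 0
    for _ in range(n_perm):
        shift = rng.randrange(1, n)  # non-zero shift, wraps around the end
        shifted = state_course[shift:] + state_course[:shift]
        if pearson_r(shifted, feature) >= observed:
            exceed += 1
    # Add-one correction so the p-value can never be exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

      Unlike shuffling individual time points, this null preserves the temporal smoothness of the state expression, which is the concern raised in the comment above.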

      We note that in other studies which examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.

      We will update Figure 3 and relevant supplementary figures to reflect the new null distribution generated via circular shift. Furthermore, we will expand the discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that the states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations.

      We identified matching states across analyses based on the similarity of their activity patterns across the nine networks. For each candidate state identified in another analysis, we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis, and assigned it to the predefined state it most closely resembled. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly.

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of the speech comprehension task with the resting and incomprehensible speech conditions, there was some degree of overlap or "confusion." In Figure 5A, two candidate states showed the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state was assigned to State #3 based on its ranking of similarity. This strategy was also applied to the naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.
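
      The labelling rule described above can be sketched as a greedy assignment over correlation-ranked pairs. This is a hedged illustration rather than the procedure actually run: the greedy tie-breaking, the function names and the nine-element example patterns are all assumptions introduced for the sketch.

```python
def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def match_states(candidates, references):
    """Label candidate states with reference states, most similar pairs first.

    Returns (labels, ambiguous): `labels` maps candidate -> reference name;
    `ambiguous` lists candidates whose best-matching reference was already
    taken (the cases that would be marked with a prime symbol).
    """
    sims = {(c, r): pearson_r(c_pat, r_pat)
            for c, c_pat in candidates.items()
            for r, r_pat in references.items()}
    labels, ambiguous, taken = {}, [], set()
    for (c, r), _ in sorted(sims.items(), key=lambda kv: kv[1], reverse=True):
        if c in labels or r in taken:
            continue
        labels[c] = r
        taken.add(r)
        best = max(references, key=lambda ref: sims[(c, ref)])
        if best != r:
            ambiguous.append(c)
    return labels, ambiguous


if __name__ == "__main__":
    # Hypothetical activity patterns over nine networks.
    references = {
        "State #1": [1, 0, -1, 0, 0, 0, 1, 0, -1],
        "State #2": [0, 1, 0, -1, 0, 1, 0, -1, 0],
        "State #3": [1, 1, 1, 0, 0, 0, -1, -1, -1],
    }
    candidates = {
        "cand_a": [1, 0, -1, 0, 0, 0, 1, 0, -1],
        "cand_b": [0, 1, 0, -1, 0, 1, 0, -1, 0],
        "cand_c": [0.5, 1.5, 0.5, -1, 0, 1, -0.5, -1.5, -0.5],
    }
    print(match_states(candidates, references))
```

      In the example, `cand_c` resembles State #2 most but loses it to `cand_b`, so it falls back to State #3 and is flagged as ambiguous.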

      In the revised manuscript, we will illustrate in detail how the correspondence of states across analyses was established.

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015). Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes.

      The temporal window hypothesis indeed provides a better explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to the short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation. 

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.

    1. Author response:

      Joint Public Review:

      Strengths:

      The insulin-dependent signaling in the central nervous system is relatively understudied. This explorative study delves into several interesting and clinically relevant possibilities, examining how insulin-dependent signaling and its crosstalk with WNK kinases might affect brain circuits involved in memory formation and/or anxiety. Therefore, these findings might inspire follow-up studies performed in disease models for disorders that exhibit impaired glucose metabolism, deficient memory, or anxiety, such as Diabetes mellitus, Alzheimer's disease, or most psychiatric disorders.

      The graphical presentation of the figures is of high quality, which helps the reader to obtain a good overview and easily understand the experimental design, results, and conclusions.

      The behavioral studies are well conducted and provide valuable insights into the role of WNK kinases in glucose metabolism and their effect on learning and memory. Additionally, the authors evaluate the levels of basal and induced anxiety in Figures 1 and 2, enhancing our understanding of how WNK signaling might engage in cognitive function and anxiety-like behavior, particularly in the context of altered glucose metabolism.

      We thank the reviewers for recognizing the strengths of our study.

      Weaknesses:

      The study used the WNK463 inhibitor as the only tool to manipulate WNK1-4 activity. This inhibitor seems selective; however, it has been reported to inhibit the individual WNK kinases with different efficiencies (e.g. PMID: 31017050, PMID: 36712947). Additionally, the authors neither analyze nor report the expression profiles or activity levels of WNK1, WNK2, WNK3, and WNK4 within the relevant brain regions (i.e. hippocampus, cortex, amygdala). Combined, these weaknesses raise concerns about the direct involvement of WNK kinases within the selected brain regions and behavior circuits. It would be beneficial if the authors provided gene profiling for WNK1, 2, 3, and 4 (e.g. using the Allen Brain Atlas). To confirm the observations, the authors should either add results from using other WNK inhibitors or, preferentially, analyze knock-down or knock-out animals/tissue targeting the single kinases.

      We thank the reviewers for the suggestions. To address this criticism, and as recommended, we plan to include gene profiling for WNK1-4 in the brain from the Allen Brain Atlas. Additionally, we plan to include the effect of WNK1 knockdown on pAKT levels in immortalized SHSY5Y cells.

      The authors do not report any data on whether the global inhibition of WNKs affects insulin levels. Since the authors wish to demonstrate the synergistic effect of simultaneous insulin treatment and WNK1-4 inhibition, such data are missing.

      To address this critique, we have planned to include plasma insulin levels upon global inhibition of WNKs using WNK463 in C57BL/6J mice.

      The study discovered that the Sortilin receptor binds to OSR1, leading the authors to speculate that Sortilin may be involved in the insulin-dependent GLUT4 surface trafficking. However, the authors do not provide any evidence supporting Sortilin's involvement in insulin- or WNK-dependent GLUT4 trafficking. Thus, this conclusion should be qualified, rephrased, or additional data included.

      We thank the reviewers for suggesting experiments that will significantly enhance the clarity of our conclusions. We plan to include immunofluorescence staining data for sortilin localization in SHSY5Y cells under conditions of DMSO, insulin and/or WNK463 treatment. These data should indicate whether WNK463 treatment affects the localization of sortilin in the Golgi network, which previous studies have shown to affect sortilin-dependent GLUT4 trafficking.

    1. Author response:

      We would like to thank the reviewers for their positive evaluation of our work, and their comments inspiring useful discussion. We will provide an in-depth response once one of the key authors has returned from parental leave (in some months), but below we share initial thoughts:  

      Both reviewers asked to see more gaze data to understand how eye movements in patients with achromatopsia might drive our results. We will expand our analyses of eye tracking data and discuss the implications in more depth, but would like to note that our key findings (no change in signal coverage in the foveal rod-scotoma projection zone in achromats, and changes in connective fields) are both robust to eye movement, and unlikely to be driven by gaze differences. Where this is less clear (i.e., population Receptive Field eccentricities are shifted outwards and increased in size), we have highlighted this and avoided drawing strong conclusions. 

      Reviewer 1 questioned why smaller connective fields (CFs) were observed in achromats, suggesting that their flatter V1 eccentricity tuning should predict larger CFs. It’s not straightforward to predict how V1's population receptive field (pRF) tuning profile shapes V3's sampling extent, as CFs are driven, but not dictated by V1 - they combine and integrate V1 signals. As we’re dealing with an atypically developed visual system, assumptions about expected relationships are complicated further. We believe that the most relevant aspect of pRF data to the interpretability of V3 CF extent, is the ratio between V1 and V3 pRF sizes. Our outcomes show that pRF sizes in achromats, while larger in V1, are more normalized in V3, predicting more local V3 sampling from V1. This is what our quantifications of CF size show across two independent measures with different stimuli. We will provide further data to address reviewer 1's various queries about the potential causes of the pRF eccentricity shifts in achromats, the relationship between pRFs and CFs, and methodological details of CF fits.

      We thank the reviewers again for their insightful comments and look forward to providing more comprehensive responses to their queries, substantiated with data, as soon as possible.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The findings of Ziolkowska and colleagues show that a specific projection from the nucleus reuniens of the thalamus (RE) to dorsal hippocampal CA1 neurons plays an important role in fear extinction learning in male and female mice. In and of itself, this is not a particularly new finding, although the authors' identification of structural alterations from within dorsal CA1 stratum lacunosum moleculare (SLM) as a candidate mechanism for the learning-related plasticity is potentially novel and exciting. The authors use a range of anatomical and functional approaches to demonstrate structural synaptic changes in dorsal CA1 that parallel the necessary role of RE inputs in modulating extinction learning. Yet, the significance of these findings is substantially limited by several technical shortcomings in the experimental design, and the authors' central interpretation. Otherwise, there remain several strengths in the design and interpretation that offset some of these concerns.

      Given that much is already known about the role of RE and hippocampus in modulating fear learning and extinction, it remains unclear whether addressing these concerns would substantially increase the impact of this study beyond the specific area of speciality. Below, several major weaknesses will be highlighted, followed by several miscellaneous comments.

      Methodological:

      (1) One major methodological weakness in the experimental design involves the widespread misapplication of Ns used for the statistical analyses. Much of the anatomical analyses of structural synaptic changes in the RE-CA1 pathway use N = number of axons (Figs. 1, 2), N = number of dendrites (Figs. 3, 4), and N = number of sections (Fig. 7; note that there are 7 figures in total). In every instance, N = animal number should be used. It is unclear which of these results would remain significant if N = animal number were used in each or how many more animals would be required. This is problematic since these data comprise the main evidence for the authors' central conclusion that specific structural synaptic changes are associated with fear extinction learning.

      We do agree with the reviewer that N = animal number is the preferred way to present data in most of our experiments. However, in some experimental groups we observed a very low number of entries. For example, in the 5US group we found RE+/+ spines only in 3 out of 6 analyzed animals. We believe that this observation is not due to technical problems as mCherry virus transduction required to find RE+/+ spines is similar in all experimental groups and we analyzed similar volumes of tissue. While this result still allows the calculation of density of RE+/+ spines per animal it generates no entries for spine area and PSD95 mean gray value if N = animal number. Hence, we decided to use N=animals to calculate spines and boutons densities, and N=dendritic spines/boutons to calculate other spine/bouton parameters.
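
      The trade-off described above can be illustrated with a minimal sketch. It assumes spine-level measurements are stored as (animal ID, value) pairs; the function name, the IDs, and the numbers are hypothetical and are not taken from the study. Averaging per animal yields one value per animal, and an animal in which no spines of the relevant class were found contributes no entry for that parameter.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def per_animal_means(
    measurements: Iterable[Tuple[str, float]],
    animals: Iterable[str],
) -> Dict[str, Optional[float]]:
    """Collapse spine-level values to one mean per animal (N = animals).

    Animals with no measured spines map to None, i.e. they provide
    no data point for this parameter.
    """
    by_animal: Dict[str, List[float]] = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {
        animal_id: (mean(by_animal[animal_id]) if animal_id in by_animal else None)
        for animal_id in animals
    }


# Hypothetical example: three animals, spines found in only two of them.
spine_values = [("m1", 1.0), ("m1", 3.0), ("m2", 5.0)]
summary = per_animal_means(spine_values, ["m1", "m2", "m3"])
usable = [v for v in summary.values() if v is not None]
```

      With N = animals, the group in this toy example would be represented by two values instead of three spine-level values, which is the loss of entries the response refers to.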

      (2) There is a lack of specific information regarding what constitutes learning with respect to behavioral freezing. It is never clearly stated what specific intervals are used over which freezing is measured during acquisition, extinction, and in extinction retrieval tests. Additionally, assessment of freezing during retrieval at 5- and 30-min time points doesn't lay to rest the possibility that there were differences in the decay rate over the 30-min period (also see below).

      We added a detailed description of how learning was assessed.

      ln 125-134: For assessment of learning we used percent of time spent by animals freezing (% freezing). Freezing behavior was defined as complete lack of movement, except respiration. To assess within-session learning (working memory) we compared pre- and post-US freezing frequency (the first 148 sec vs last 30 sec) during the CFC session (day 1). To assess formation of long-term contextual fear memory, we compared pre-US freezing (day 1) and the first 5 minutes of the Extinction session (day 2). To assess within session contextual fear extinction we ran 2-way ANOVA to assess the effect of time and manipulation on freezing frequency. Freezing data were analyzed in 5-minute bins. To assess formation of long-term contextual fear extinction memory we compared the first 5 minutes of the Extinction session (day 2) and Test session (day 3).

      As suggested by the reviewer, we also added data for all six 5-minute bins of the Extinction sessions.
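
      As an illustration of the binning described above, the following minimal sketch computes percent freezing per 5-minute bin from a per-sample record of freezing (True) versus movement (False). It is not the VideoFreeze scoring algorithm; the function name, the one-sample-per-second rate, and the example data are assumptions made for illustration only.

```python
from typing import List, Sequence


def percent_freezing_by_bin(
    frozen: Sequence[bool],
    sample_rate_hz: float = 1.0,   # assumed sampling rate, not from the study
    bin_minutes: float = 5.0,
) -> List[float]:
    """Return % of samples scored as freezing in each consecutive time bin.

    A trailing partial bin, if any, is scored over the samples it contains.
    """
    samples_per_bin = int(round(bin_minutes * 60 * sample_rate_hz))
    if samples_per_bin <= 0:
        raise ValueError("bin must contain at least one sample")
    percentages = []
    for start in range(0, len(frozen), samples_per_bin):
        chunk = frozen[start:start + samples_per_bin]
        percentages.append(100.0 * sum(chunk) / len(chunk))
    return percentages


# Hypothetical 15-minute record sampled once per second.
session = [True] * 300 + [False] * 300 + [True] * 150 + [False] * 150
bins = percent_freezing_by_bin(session)
```

      A 30-minute session sampled at the same rate would yield six bins, matching the six 5-minute bins reported for the Extinction sessions.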

      (3) A minor-to-moderate methodological weakness concerns the authors' decision to utilize saline injected groups as controls for the chemogenetics experiments (Figs. 5, 6). The correct design is to have a CNO-only group with the same viral procedure sans hM4Di. This concern is partly mitigated by the inclusion of a CNO vs. saline injection control experiment (Fig. 6).

      Figure 5 does not describe a chemogenetic experiment.

      We added new groups with control virus (CNO vs saline) to Figure 6 (now Fig. 6D and H). 

      The chemogenetic experiment shown in Figure 7 has all 4 experimental groups (Control vs hM4Di and saline vs CNO).

      (4) In the electron microscopic analyses of dendritic spines (Fig. 5), comparison of only the fear acquisition versus extinction training, and the lack of inclusion of a naïve control group, makes it difficult to understand how these structural synaptic changes are occurring relative to baseline. It is noteworthy that the authors utilize the tripartite design in other anatomical analyses (Fig. 2-4).

      We added data for the Naive mice to Figure 5.

      (5) Interpretation:

      The main interpretive weakness in the study is the authors' claim that their data shows a role for the RE-CA1 pathway in memory consolidation (i.e., see Abstract). This claim is based on the premise that, although RE-CA1 pathway inactivation with CNO treatment 30 min prior to contextual fear extinction did not affect freezing at 5- and 30-min time points relative to saline controls, these rats showed greater freezing when tested on extinction retrieval 24 h thereafter. First, the data do not rule out possible differences in the decay rate of freezing during extinction training due to CNO administration. Next, the fact that CNO is given prior to training still leaves open the possibility that acquisition was affected, even if there were not any frank differences in freezing. Support for this latter possibility derives from the fact that mice tested for extinction retrieval as early as 5 min after extinction training (Fig. 6C) showed the same impairments as mice tested 24 h later (Figs. 6A). Further, all the structural synaptic changes argued to underlie consolidation were based on analysis at a time point immediately following extinction training, which is too early to allow for any long-term changes that would underlie memory consolidation, but instead would confer changes associated with the extinction training event.

      We do agree with the reviewer that our data do not allow us to conclude whether the RE-CA1 pathway is involved in acquisition or consolidation of CFE memory. Therefore, we avoid those terms in the manuscript and conclude only that RE→CA1 participates in CFE.

      Reviewer #2 (Public review):

      Summary:

      Ziółkowska et al. characterize the synaptic mechanisms underlying the RE-dCA1 contribution to the consolidation of fear memory extinction. In particular, they describe a layer-specific modulation of RE-dCA1 excitatory synapses associated with contextual fear extinction, which is impaired by transient chemogenetic inhibition of this pathway. These results indicate that RE activity-mediated modulation of synaptic morphology contributes to the consolidation of contextual fear extinction.

      Strengths:

      The manuscript is well conceived, the statistical analysis is solid and methodology appropriate. The strength of this work is that it nicely builds up on existing literature and provides new molecular insight on a thalamo-hippocampal circuit previously known for its role in fear extinction. In addition, the quantification of pre- and post-synapses is particularly thorough.

      Weaknesses:

      The findings in this paper are well supported by the data, but a more detailed description of the methods is needed.

      (1) In the paragraph Analysis of dCA1 synapses after contextual fear extinction (CFE), more experimental and methodological data should be given in the text: 

      - how was PSD95 used for the analysis, what was the difference between RE. Even if Thy1-GFP mice were used in Fig.2, it appears they were not used for bouton size analysis. To improve clarity, I suggest moving panel 2C to Figure 3. It is not clear whether all RE axons were indiscriminately analysed in Fig. 2 or if only the ones displaying colocalization with both PSD95 and GFP were analysed. If GFP was not taken into account here, analysed boutons could reflect synapses onto inhibitory neurons and this potential scenario should be discussed.

      PSD-95 immunostaining in close apposition to boutons was used to identify RE boutons innervating CA1 (Figures 1 and 2). In these cases the PSD-95 signal was not quantified. PSD-95 in close apposition to dendritic spines was used as a proxy of PSDs in CA1 (Figures 3, 4 and 7). In these cases we assessed the integrated mean gray value of the PSD-95 signal per dendritic spine (Figures 3 and 4) or per ROI (Figure 7). This is explained in detail in the section Confocal microscopy and image quantification (ln 149-172).

      GFP signal was not taken into account during boutons analysis. This is explained in the materials and methods section Confocal microscopy and image quantification (ln 149-172).

      We indicate that PSD-95 is a marker of excitatory synapses located both on excitatory and inhibitory neurons.

      Ln 258: RE boutons were identified in SO and SLM as axonal thickenings in close apposition to PSD-95-positive puncta (a synaptic scaffold used as a marker of excitatory synapses located both on excitatory and inhibitory neurons) (Kornau et al., 1995; El-Husseini et al., 2000; Chen et al., 2011; Dharmasri et al., 2024).

      We also cite literature demonstrating that RE projects to the hippocampal formation and forms asymmetric synapses with dendritic spines and dendrites, suggesting innervation of excitatory synapses on both excitatory and aspiny inhibitory neurons (ln 673).

      As advised by the reviewer the Figure 2C panel was moved to Figure 3 (now it is Fig 3A).

      (2) in the methods: The volume of intra-hippocampal CNO injections should be indicated. The concentration of 3 uM seems pretty low in comparison with previous studies. CNO source is missing.

      This section has been rewritten to be clearer. The concentration of CNO was chosen based on a previous study (Stachniak et al., 2014).

      ln 103: Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannula (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over the dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before the Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.

      (3) More details of what software/algorithm was used to score freezing should be included. 

      Freezing was automatically scored with VideoFreeze™ Software (Med Associates Inc.).

      (4) Antibody dilutions for IHC should be indicated. Secondary antibody incubation time should be indicated.

      The missing information is added.

      ln 144: Next, sections were incubated at 4°C overnight with primary antibodies directed against PSD-95 (1:500, Millipore, MAB 1598), washed three times in 0.3% Triton X-100 in PBS and incubated at room temperature for 90 minutes with a secondary antibody conjugated to Alexa Fluor 647 (1:500, Invitrogen, A31571).

      (5) No statement about code and data availability is present.

      The statements are added.

      ln 785: Raw data and the code used for analysis of confocal data are available at OSF (https://osf.io/bnkpx/).

      Reviewer #3 (Public review):

      Summary:

      This paper examined the role of nucleus reuniens (RE) projections to dorsal CA1 neurons in context fear extinction learning. First, they show that RE neurons send excitatory projections to the stratum oriens (SO) and the stratum lacunosum moleculare (SLM), but not the stratum radiatum (SR). After mice undergo context fear conditioning, the synaptic connections between RE and dCA1 neurons in the SLM (but not the SO) are weakened (reduced bouton and spine density). This weakening is reversed by extinction learning, which leads to enhanced synaptic connectivity between RE inputs and dendrites in the SLM. Control experiments demonstrate that the observed changes are due to extinction and not caused by simple exposure to the context. Extinction learning also induced increases in the size (volume and surface area) of the post-synaptic density (PSD) in SLM. To establish the functional role of RE inputs to dCA1, the researchers used an inhibitory DREADD to silence this pathway during extinction learning. They observe that extinction memory (measured 2 hours or 24 hours later) is impaired by this inhibition. Control experiments show that the extinction memory deficit is not simply due to increased freezing caused by inactivation of the pathway or injections of CNO. Inhibiting the RE projection during extinction learning also reduced PSD-95 protein levels in the spines of dCA1 neurons.

      Strengths:

      Based on their results, the authors conclude that, "the RE→SLM pathway participates in the updating of fearful context value by actively regulating CFE-induced molecular and structural synaptic plasticity in the SLM.". I believe the data are generally consistent with this hypothesis, although there is an important control condition missing from the behavioral experiments.

      Weaknesses:

      (1) A defining feature of extinction learning is that it is context specific (Bouton, 2004). It is expressed where it was learned, but not in other environments. Similarly, it has been shown that internal contexts (or states) also modulate the expression of extinction (Bouton, 1990). For example, if a drug is administered during extinction learning, it can induce a specific internal state. If this state is not present during subsequent testing, the expression of extinction is impaired just as it is when the physical context is altered (Bouton, 2004). It is possible that something similar is happening in Figure 6. In these experiments, CNO is administered to inactivate the RE-dCA1 projection during extinction learning. The authors observe that this manipulation impairs the expression of extinction the next day (or 2-hours later). However, the drug is not given again during the test. Therefore, it is possible that CNO (and/or inactivation of the RE-dCA1 pathway) induces a state change during extinction that is not present during subsequent testing. Based on the literature cited above, this would be expected to disrupt fear extinction as the authors observed. To determine if this alternative explanation is correct, the researchers need to add groups that receive CNO during extinction training and subsequent extinction testing. If the deficits in extinction expression reported in Figure 6 result from a state change, then these groups should not exhibit an impairment. In contrast, if the authors' account is correct, then the expression of extinction should still be disrupted in mice that receive CNO during training and testing.

      We do agree with the reviewer that such an experiment would be interesting. However, it could be also confusing as we could not distinguish whether the possible behavioral effects are related to the state-dependent aspects of CFE or impaired recall of CFE. Importantly, previous studies showed that RE is crucial for extinction recall (Totty et al., 2023). We also show that CFE memory is impaired not only when the animals recall CFE without CNO (day 3) but also with CNO (day 4) (Figure 6C). Moreover, we do not see the effects of CNO on CFE in the control groups (Figure 6D and H). So we believe that it is unlikely that CNO results in state-dependent CFE.

      (2) In their analysis of dCA1 synapses after contextual fear extinction (CFE) (Figure 4), the authors should have compared Ctx and Ctx-Ctx animals against naïve animals (as they did in Figure 3) when comparing 5US and Ext with naïve animals. Otherwise, the authors cannot make the following conclusion; "since changes of SLM synapses were not observed in the animals exposed to the familiar context that was not associated with the USs, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general.".

      We assume that the key experimental groups needed to draw conclusions about synaptic plasticity related to a particular behavior are the groups that differ by just one factor/experience. For CFE, these would be mice sacrificed immediately before and after the CFE session (Figures 2 & 3); to draw conclusions about the effects of re-exposure to the neutral context, on the other hand, mice sacrificed before and after the second exposure to the neutral context are needed (Figure 4). The naive group, as it differs by at least two manipulations from the Ext and Ctx-Ctx groups, is interesting but not crucial in either case. This group would be necessary if we focused on the memories of FC or novel context. However, these topics are not the main focus of the current manuscript. Still, the naive group is shown in Figures 2 & 3 to check whether CFE brings spine parameters to the levels observed in mice with low freezing.

      We have re-written the cited paragraph to be more precise in our conclusions. 

      "Overall, our data demonstrate that synapses in all dCA1 strata undergo structural or molecular changes relevant to CFC and/or CFE. However, only in SLM CFE-induced synaptic changes are likely to be directly regulated by RE inputs as they appear on RE+ dendrites and spines. Since such changes of SLM synapses were not observed in the animals re-exposed to the neutral context, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general."

      (3) In the materials and methods section, the description of cannula placements is confusing and needs to be rewritten.

      This section has been rewritten.

      ln 103: Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannula (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over the dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before the Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.

    1. Author response:

      We are grateful to the reviewers for their thoughtful and constructive feedback on our manuscript. Based on the Public Reviews, we will address the concerns raised by each reviewer through a combination of new analyses, clarifications, and expanded discussion as outlined below:

      Reviewer #1:

      (1) Integration of Positive Selection Results:

      We will enhance the integration of positive selection analyses throughout the manuscript. Specifically, we will discuss how the positively selected sites in primates, including site 193, inform IFIT1 function. We will expand the discussion to explain how PAML, FUBAR, and MEME complement each other and why MEME did not detect site 193 in primates. Additionally, we will provide a rationale for focusing on the three sites identified in primates and address the overlap with bat orthologs.

      (2) Expression Levels and Antiviral Activity:

      We acknowledge the variability in IFIT1 ortholog expression levels. To address this, we will quantify and normalize protein expression to GAPDH across all orthologs, allowing for a more accurate comparison of antiviral activity. We will revise the text to clarify that species-specific differences in viral suppression may be influenced by expression levels.

      (3) Clarification of Terminology and Data Interpretation:

      We will refine our description of the antiviral effects observed for SINV in Figure 4E. We will also revise statements related to protein expression in the relevant sections to improve accuracy.

      (4) Cohesion of Data:

      We will work to more tightly connect the evolutionary analysis with the functional virology data, framing the manuscript around how positive selection shapes IFIT1 function across species. 

      Reviewer #2:

      (1) Recombination Analysis of IFIT1:

      We will conduct a recombination analysis using GARD from the HyPhy package to ensure that the signatures of positive selection are not confounded by recombination between IFIT1 and IFIT1B. 

      (2) Clarification of IFIT1 Homologs Studied:

      We will provide additional details on how IFIT1 orthologs were selected, including addressing the relationship between IFIT1 and IFIT1B. We will support this by presenting additional sequence comparisons to demonstrate the orthology of the proteins studied.

      (3) Chimpanzee IFIT1 Loss of Function:

      We will revise the discussion of chimpanzee IFIT1 to better reflect the data. 

      (4) Presentation of Antiviral Specificity Data:

      We will include a supplementary table listing the percentage of infection normalized to control by VSV and VEEV for each ortholog to allow for clearer comparisons.

      Additionally, we will provide an alternative visualization to better compare the data sets. 

      Reviewer #3:

      (1) Alternative Hypotheses for IFIT1 Antiviral Activity such as IFIT1-IFIT interactions:

      We will expand the discussion to consider alternative hypotheses, including the potential for IFIT1 activity to be regulated through interactions with other IFIT family members. Therefore, we will address how IFIT1-IFIT interactions may be broadly applicable to our findings with IFIT1 orthologs. In addition, we will clarify that we do not conclude that residues 362/4/6 are the sole drivers of antiviral specificity across the orthologs tested in this study.

      (2) Generalization of Findings Across Orthologs:

      We acknowledge that the functional importance of residues 362/4/6 may not be generalizable across all orthologs. We will discuss this limitation more explicitly in the manuscript, while also expanding on how these findings apply specifically to primate IFIT1 orthologs.

      We believe that these revisions will address the key concerns raised by the reviewers and strengthen the manuscript. We look forward to submitting the revised version for further consideration.

    1. Author Response:

      We are grateful to the reviewers for their encouraging comments and constructive suggestions. These suggestions will be valuable to improve the revised manuscript.

      Reviewer 1:

      PD-1 signaling is suppressive to the establishment of cytokine-producing effector cells in general. However, as the reviewer pointed out, one of the results in Fig. 2H showing a decrease of IFN-gamma-producing cells is against this trend. The data indicate percentages of cytokine-producing cells, which are not always consistent with the absolute number of activated T cells. Nonetheless, we plan additional experiments in order to address the question.

      For PD-1YFYF experiments in Figs. 3-5, there were moderate changes in cytokine production between wild-type and mutant PD-1. We conducted gene transduction to newly prepared T cells in each experiment. In addition, to monitor the immunosuppressive effect of PD-1 agonist antibodies, these T cells were stimulated using PD-L1-deficient APC. Therefore, we think these differences in cytokine levels most likely reflect technical variation rather than a specific function of PD-1YFYF.

      Anti-PD-L1 mAb was used for optimal blockade of the PD-1/PD-L1 interaction, and the concentration of antibody (5 μg/ml) is within the normal range for this purpose. We used variable concentrations of OVA peptide to set up experiments with different intensities of TCR stimulation. TCR signal intensity has been shown to affect CD4+ T cell differentiation into Th1 and Th2 cells. We lowered the peptide concentration to test the effect of PD-1 signals under suboptimal TCR stimulation.

      Reviewer 2:

      Antigen-specific T cells from immunized mice are not ideal for Th differentiation studies because T cells activated in response to the antigen might have already undergone functional differentiation in vivo. Incorporating the reviewer’s suggestion, we will test an alternative approach, including human CD4+ T cells.

      For the allergy model, we will expand the analysis for inflammatory effectors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The main research question could be defined more clearly. In the abstract and at some points throughout the manuscript, the authors indicate that the main purpose of the study was to assess whether the allocation of endogenous attention requires saccade planning [e.g., ll.3-5 or ll.247-248]. While the data show a coupling between endogenous attention and saccades, they do not point to a specific direction of this coupling (i.e., whether endogenous attention is necessary to successfully execute a saccade plan or whether a saccade plan necessarily accompanies endogenous attention).

      Thanks for the suggestion. We have modified the text in the abstract and at various points in the text to make it more clear that the study investigates the relationship between attention and saccades in one particular direction, first attentional deployment and then saccade planning.

      Some of the analyses were performed only on subgroups of the participants. The reporting of these subgroup analyses is transparent and data from all participants are reported in the supplementary figures. Still, these subgroup analyses may make the data appear more consistent, compared to when data is considered across all participants. For instance, the exogenous capture in Experiments 1 and 2 appears much weaker in Figure 2 (subgroup) than Figure S3 (all participants). Moreover, because different subgroups were used for different analyses, it is often difficult to follow and evaluate the results. For instance, the tachometric curves in Figure 2 (see also Figure 3 and 4) show no motor bias towards the cue (i.e., performance was at ~50% for rPTs <75 ms). I assume that the subsequent analyses of the motor bias were based on a very different subgroup. In fact, based on Figure S2, it seems that the motor bias was predominantly seen in the unreliable participants. Therefore, I often found the figures that were based on data across all participants (Figures 7 and S3) more informative to evaluate the overall pattern of results.

      Indeed, our intent was to dissociate the effects on saccade bias and timing as clearly as possible, even if that meant having to parse the data into subgroups of participants for different analyses. We do think conceptually this is the better strategy, because the bias and timing effects were distinct and not strongly correlated with specific participants or task variants. For instance, the unreliable participants were somewhat more consistently biased in the same direction, but the reliable participants also showed substantial biases, so the difference in magnitude was relatively modest. This can be more easily appreciated now that the reliable and unreliable participants are indicated in Figures 3 and 5. The impact of the bias is also discussed further in the last paragraphs of the Results, which note that the bias was not a reliable predictor of overall success during informed choices.

      Reviewer #3 (Public Review):

      (1) In this experimental paradigm, participants must decide where to saccade based on the color of the cue in the visual periphery (they should have made a prosaccade toward a green cue and an antisaccade away from a magenta cue). Thus, irrespective of whether the cue signaled that a prosaccade or an antisaccade was to be made, the identity of the cue was always essential for the task (as the authors explain on p. 5, lines 129-138). Also, the location where the cue appeared was blocked, and thus known to the participants in advance, so that endogenous attention could be directed to the cue at the beginning of a trial (e.g., p. 5, lines 129-132). These aspects of the experimental paradigm differ from the classic prosaccade/antisaccade paradigm (e.g. Antoniades et al., 2013, Vision Research). In the classic paradigm, the identity of the cues does not have to be distinguished to solve the task, since there is only one stimulus that should be looked at (prosaccade) or away from (antisaccade), and whether a prosaccade or antisaccade was required is constant across a block of trials. Thus, in contrast to the present paradigm, in the classic paradigm, the participants do not know where the cue is about to appear, but they know whether to perform a prosaccade or an antisaccade based on the location of the cue.

      The present paradigm keeps the location of the cue constant in a block of trials by intention, because this ensures that endogenous attention is allocated to its location and is not overpowered by the exogenous capture of attention that would happen when a single stimulus appeared abruptly in the visual field. Thus, the reason for keeping the location of the cue constant seems convincing. However, I wondered what consequences the constant location would have for the task representations that persist across the task and govern how attention is allocated. In the classic paradigm, there is always a single stimulus that captures attention exogenously (as it appears abruptly). In a prosaccade block, participants can prioritize the visual transient caused by the stimulus, and follow it with a saccade to its coordinates. In an antisaccade block, following the transient with a saccade would always be wrong, so that participants could try to suppress the attention capture by the transient, and base their saccade on the coordinates of the opposite location. Thus, in prosaccade and antisaccade blocks, the task representations controlling how visual transients are processed to perform the task differ. In the present task, prosaccades and antisaccades cannot be distinguished by the visual transients. Thus, such a situation could favor endogenous attention and increase its influence on saccade planning, even though saccade planning under more naturalistic conditions would be dominated by visual transients. I suggest discussing how this (and vice versa the emphasis on visual transients in the classic paradigm) could affect the generality of the presented findings (e.g., how does this relate to the interpretation that saccade plans are obligatorily coupled to endogenous attention? See, Results, p. 10, lines 306-308, see also Deubel & Schneider, 1996, Vision Research).

      Great discussion point. There are indeed many ways to set up an experiment where one must either look to a relevant cue or look away from it. Furthermore, it is also possible to arrange an experiment where the behavior is essentially identical to that in the classic antisaccade task without ever introducing the idea of looking away from something (Oor et al., 2023). More important than the specific task instructions or the structure of the event sequence, we think the fundamental factors that determine behavior in all of these cases are the magnitudes of the resulting exogenous and endogenous signals, and whether they are aligned or misaligned. Under urgent conditions, consideration of these elements and their relevant time scales explains behavior in a wide variety of tasks (see Salinas and Stanford, 2021). Furthermore, a recent study (Zhu et al., 2024) showed that the activation patterns of neurons in monkey prefrontal cortex during the antisaccade task can be accurately predicted from their stimulus- and saccade-related responses during a simpler task (a memory guided saccade task). This lends credence to the idea that, at the circuit level, the qualities that are critical for target selection and oculomotor performance are the relative strengths of the exogenous and endogenous signals, and their alignment in space and time. If we understand what those signals are, then it no longer matters how they were generated. The Discussion now includes a paragraph on this issue.

      (2) Discussion (p. 16, lines 472-475): The authors suppose that "It is as if the exogenous response was automatically followed by a motor bias in the opposite direction. Perhaps the oculomotor circuitry is such that an exogenous signal can rapidly trigger a saccade, but if it does not, then the corresponding motor plan is rapidly suppressed regardless of anything else.". I think this interesting point should be discussed in more detail. Could it also be that instead of suppression, other currently active motor plans were enhanced? Would this involve attention? Some attention models assume that attention works by distributing available (neuronal) processing resources (e.g., Desimone & Duncan, 1995, Annual Review of Neuroscience; Bundesen, 1990, Psychological Review; Bundesen et al., 2005, Psychological Review) so that the information receiving the largest share of resources results in perception and is used for action, but this happens without the active suppression of information.

      The rebound seen after the exogenously driven changes is certainly interesting, and we agree that it could involve not only the suppression of a specific motor plan but also enhancement of another (opposite) plan. However, we think that, given the lack of prior data with the requisite temporal precision, further elaboration of this point would just be too speculative in the context of the point that we are trying to make, which is simply that the underlying choice dynamics are more rapid and intricate than is generally appreciated.

      (3) Methods, p. 19, lines 593-596: It is reported that saccades were scored based on their direction. I think more information should be provided to understand which eye movements entered the analysis. Was there a criterion for saccade amplitude? I think it would be very helpful to provide data on the distributions of saccade amplitudes or on their accuracy (e.g. average distance from target) or reliability (e.g. standard deviation of landing points). Also, it is reported that some data was excluded from the analysis, and I suggest reporting how much of the data was excluded. Was the exclusion of the data related to whether participants were "reliable" or "unreliable" performers?

      The reported results are based on all saccades (detected according to a velocity threshold) that were produced after the go signal and in a predominantly horizontal direction (within ± 60° of the cue or non-cue), which were the vast majority (> 99%). Indeed, most saccades were directed to the choice targets, with 95% of them within ± 14.2° of the horizontal plane. The excluded (non-scored) trials were primarily fixation breaks plus a small fraction of trials with blinks, which compromised saccade determination. There was no explicit amplitude criterion; applying one (for instance, excluding any saccades with amplitude < 2°) produced minimal changes to the data. Overall, saccade amplitudes were distributed unimodally with a median of 7.7° and a 95% confidence interval of [3.7°, 9.7°], whereas the choice targets were located at ± 8° horizontally. This is now reported in the Methods.
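
      The scoring criteria described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `score_saccade`, the sign convention for `cue_side`, and the optional `min_amplitude_deg` argument are hypothetical, and the sketch assumes saccades have already been detected by a velocity threshold. Only the ± 60° direction window and the example 2° amplitude cutoff come from the response.

```python
import math

def score_saccade(dx_deg, dy_deg, cue_side, max_angle_deg=60.0, min_amplitude_deg=None):
    """Classify one saccade vector as 'cue', 'non-cue', or None (not scored).

    dx_deg, dy_deg: horizontal and vertical displacement in degrees of visual angle.
    cue_side: +1 if the cue is on the right, -1 if on the left (hypothetical convention).
    """
    amplitude = math.hypot(dx_deg, dy_deg)
    if amplitude == 0:
        return None
    # Optional amplitude criterion (the response reports that none was applied).
    if min_amplitude_deg is not None and amplitude < min_amplitude_deg:
        return None
    # Angle between the saccade vector and the horizontal axis, in [0, 90] degrees.
    angle_from_horizontal = math.degrees(math.atan2(abs(dy_deg), abs(dx_deg)))
    if angle_from_horizontal > max_angle_deg:
        return None  # predominantly vertical, so not scored
    toward_cue = (dx_deg > 0) == (cue_side > 0)
    return "cue" if toward_cue else "non-cue"
```

      With this convention, a rightward saccade of roughly 8° on a trial with the cue on the right is scored as directed to the cue, whereas a mostly vertical movement is left unscored.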

      As far as data exclusion, analyses were based on urgent trials (gap > 0); non-urgent (gap < 0) trials were excluded from calculation of the tachometric curves simply because they might correspond to a slightly different regime (go signal after cue onset) and to long processing times in the asymptotic range (rPT in 200–300 ms) or beyond, which are not as informative. However, including them made no appreciable difference to the results. No data were excluded based on participant performance or identity; all psychometric analyses were carried out after the selection of trials based on the scoring criteria described above. This is now stated in the Methods.

      (4) Results, p. 9, lines 262-266: Some data analyses are performed on a subset of participants that met certain performance criteria. The reasons for this data selection seem convincing (e.g. to ensure empirical curves were not flat, line 264). Nevertheless, I suggest to explain and justify this step in more detail. In addition, if not all participants achieved an acceptable performance and data quality, this could also speak to the experimental task and its difficulty. Thus, I suggest discussing the potential implications of this, in particular, how this could affect the studied mechanisms, and whether it could limit the presented findings to a special group within the studied population.

      The ideal (i.e., best) analysis for determining the cost of an antisaccade for each individual participant (Fig. 4c) was based on curve fitting and required task performance to rise consistently above chance at long rPTs in both pro and anti trials. This is why the mentioned conditions on the fits were imposed. This is now explained in the text. This ideal analysis was not viable for all tachometric curves, not only because of task difficulty but also because of high variability or high bias in a particular experiment/condition. It is true that the task was somewhat difficult, but this manifested in various ways across the dataset, so attempting to draw a clean-cut classification of participants based on “difficulty” may not be easy or all that informative (as can be gleaned from Fig. S1). There simply was a range of success levels, as one might expect from any task that requires some nontrivial cognitive processing. Also note that no participants were excluded outright from analysis. Thus, at the mentioned point in the text, we simply note that a complementary analysis is presented later that includes all participants and all conditions and provides a highly consistent result (namely, Fig. 7e). Then, in the last section of the Results, where Fig. 7 is presented, we point out that there is considerable variance in performance at long rPTs, and that it relates to both the bias and the difficulty of the task across participants.

      Reviewer #1 (Recommendations For The Authors):

      (1) I have some questions related to the initial motor bias:

      a) Based on Figure S3, which shows the tachometric curves using data from all participants, there only seems to be a systematic motor bias in Experiments 1 and 3 but no bias in Experiments 2 and 4. It is unclear to me why this is different from the data shown in Figure 7.

      For the bars in Fig. 7, accuracy (% correct) was computed for each participant and then averaged across participants, whereas for the data in Fig. S3, trials were first pooled across participants and then accuracy was computed for each rPT bin. The different averaging methods produce slightly different results because some participants had more trials in the guessing range than others, and different biases.  
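
      The difference between the two averaging methods can be illustrated with a toy example. The numbers below are made up purely for illustration and are not the study's data; the function names are likewise hypothetical.

```python
def mean_of_participant_accuracies(trials_by_participant):
    """Compute accuracy per participant, then average across participants."""
    per_participant = [sum(t) / len(t) for t in trials_by_participant.values()]
    return sum(per_participant) / len(per_participant)

def pooled_accuracy(trials_by_participant):
    """Pool all trials across participants, then compute a single accuracy."""
    pooled = [outcome for t in trials_by_participant.values() for outcome in t]
    return sum(pooled) / len(pooled)

# Hypothetical outcomes (1 = correct, 0 = incorrect) within one rPT bin.
example = {
    "P1": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 10 trials, 40% correct
    "P2": [1, 1],                           # 2 trials, 100% correct
}

per_participant_mean = mean_of_participant_accuracies(example)  # (0.4 + 1.0) / 2
pooled_mean = pooled_accuracy(example)                          # 6 correct of 12 trials
```

      Here the participant-wise mean is 0.7 while the pooled accuracy is 0.5, because the participant with more trials in the bin dominates the pooled estimate.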

      b) Based on Figure 7 (and Figure S3), there was no motor bias in Experiment 4. Based on the correlations between motor bias and time difference between pro and antisaccades, I would expect that the rise points between pro and antisaccades would be more similar in this Experiment. Was this the case?

      No. Figs. 3c and S3d show that the rise times of pro and anti trials for Experiment 4 still differ by about 30 ms (around the 75% correct mark), and the rest of the panels in those figures show that the difference is similar for all experiments. What happens is that Figs. 7 and S3 show that on average the bias is zero for Experiment 4, but that does not mean that the average difference in rise times is zero because there is an offset in the data (correlation is not the same as regression). The most relevant evidence is in Fig. 6c, which shows that, for an overall bias of zero, one would still expect a positive difference in rise times of about 25–30 ms. This figure now includes a regression line, and the corresponding text now explains the relationship between bias and rise times more clearly. Thanks for asking; this is an important point that was not sufficiently elaborated before.
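
      The distinction between a correlation and a regression with a nonzero offset can be made concrete with a small least-squares sketch. The data points below are hypothetical and were chosen only so that the intercept falls in the 25–30 ms range mentioned above; they are not the values plotted in Fig. 6c, and `fit_line` is an illustrative helper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: motor bias (arbitrary units) and anti-minus-pro
# difference in rise time (ms) for five imaginary participants.
bias = [-0.2, -0.1, 0.0, 0.1, 0.2]
rise_diff_ms = [18.0, 23.0, 28.0, 33.0, 38.0]

slope, intercept = fit_line(bias, rise_diff_ms)
# The intercept is the predicted rise-time difference when the bias is zero.
predicted_at_zero_bias = intercept
```

      In this toy data set the bias and the rise-time difference are perfectly correlated, yet the predicted difference at zero bias equals the intercept (28 ms), not zero.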

      c) If I understand correctly, the initial motor bias was predominantly observed in participants who were classified as 'unreliable performers' (comparing Figure S2 and Figure 2). Was there a correlation between the motor bias and overall success in the task? In other words: Was a strong motor bias generally disadvantageous?

      Good question. Participants classified as ‘unreliable’ were somewhat more consistently biased in the same direction than those classified as ‘reliable’, but the distinction in magnitude was not large. This can be better appreciated now in Fig. 5 by noting the mix of black (reliable) and gray labels (unreliable) along the x axes. The unreliable participants were also, by definition, less accurate in their asymptotic performance in at least one experiment (Fig. S1). In general, however, this classification was used simply to distinguish more clearly the two main effects in the data (timing cost and bias). In fact, the motor bias was not a reliable predictor of performance during informed choices: across all participants, the mean accuracy in the asymptotic range (rPT > 200 ms) had a weak, non-significant correlation with the bias (ρ = ‒0.07, p = 0.7). So, no, the motor bias did not incur an obvious disadvantage in terms of overall success in the task. Its more relevant effect was the asymmetry in performance that it promoted between pro- and antisaccade trials (Fig. 6c). This is now explained at the end of the Results.

      (2) One of the key analyses of the current study is the comparison of the rPT required to make informed pro and antisaccades (ll.246 ff). I think it would be informative for readers to see the results of this analysis separately for all four experiments. For instance, based on Figure 4a and b, it looks like the rise points were actually very similar between pro and antisaccades in Experiment 1.

      We agree that the ideal analysis would be to compute the performance rise point for pro- and antisaccade curves for each experiment and each participant, but as is now noted in the text, this requires a steady and substantial rise in the tachometric curve, which is not always obtained at such a fine-grained level; the underlying variability can be glimpsed from the individual points in Fig. 7a, b. Indeed, in Fig. 4a, b the mean difference between pro and anti rise points appears small for Experiment 1 — but note that the two panels include data from only partially overlapping sets of participants; the figure legend now makes this more clear. Again, this is because the required fitting procedure was not always reliable in both conditions (pro and anti) for a given subject in a given experiment. Thus, panels a and b cannot be directly compared. The key results are those in Fig. 4c, which compare the rise points in the two conditions for the same participants (11 of them, for which both rise points could be reliably determined). In that case the mean difference is evident, and the individual effect consistent for 9 of the 11 participants (as now noted).

      A similar comparison for Experiments 1 or 2 individually would include fewer data points and lose statistical power. However, on average, the results for Experiments 1 and 2 (separately) were indeed very similar; in both cases, the comparison between pro and anti curves pooled across the same qualifying participants as in Fig. 4c produced results that were nearly identical to those of Fig. 4d (as can be inferred from Fig. 2a, b). Furthermore, results for the four individual experiments pooled across all participants are presented in Figure S3, which shows delayed rises in antisaccade performance consistent with the single participant data (Fig. 4c).

      (3) Figure 3: It would be helpful to indicate the reliable performers that were used for Figure 3a in the bar plots in Figure 3b. Same for Figures 3c and d.

      Done. Thanks for the suggestion.

      (4) Introduction: The literature on the link between covert attention and directional biases in microsaccades seems relevant in the context of the current study (e.g., Hafed et al., 2002, Vision Res; Engbert & Kliegl, 2003, Vision Res; Willett & Mayo, 2023, Proc Natl Acad Sci USA).

      Yes, thanks for the suggestion. The introduction now mentions the link between attentional allocation and microsaccade production.

      (5) ll.395ff & Figure 7f: Please clarify whether data were pooled across all four experiments for this analysis.

      Yes, the data were pooled, but a positive trend was observed for each of the four experiments individually. This is now stated.

      (6) ll.432-433: There is evidence that the attentional locus and the actual saccade endpoint can also be dissociated (e.g., Wollenberg et al., 2018, PLoS Biol; Hanning et al., 2019, Proc Natl Acad Sci USA).

      True. We have rephrased accordingly. Thanks for the correction.

      (7) ll.438-440: This sentence is difficult to parse.

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written and compelling. The biggest issue for me was keeping track of the specifics of the individual experiments. I think some small efforts to reinforce those details along the way would help the reader. For example, in the Figure 3 legend, I found the parenthetical phrase "(high luminance cue, low luminance non-cue)" immensely helpful. It would be helpful and trivial to add the corresponding phrase after "Experiment 4" in the same legend.

      Thanks for the suggestion. Legends and/or labels have been expanded accordingly in this and other figures.

      Line 314: "..had any effect on performance,..." Should there be a callout to Figure 2 here?

      Done.

      It wasn't clear to me why the specific high and low luminance values (48 and 0.25) were chosen. I assume there was at least some quick perceptual assessment. If that's the case or if the values were taken from prior work, please include that information.

      Done.

      Reviewer #3 (Recommendations For The Authors):

      Minor points. Please note that the comments made in the public review above are not repeated here.

      (1) Introduction, p. 2, lines 41-45: It is mentioned that the effects of covert attention or a saccade can be quite distinct. I suggest specifying in what way.

      Done.

      (2) Introduction, p. 2, lines 46-47: It is said that the relation between attention and saccade planning was still uncertain and then it is stressed that this was the case for more natural viewing conditions. However, the discussed literature and the experimental approach of the current study still rely on experimental paradigms that are far from natural viewing conditions. Thus, I suggest either discussing the link between these paradigms and natural viewing in more detail or leaving out the reference to natural viewing at this point (I think the latter suggestion would fit the present paper best).

      We followed the latter suggestion.

      (3) Introduction (e.g. p. 3, lines 55-58): The authors discuss the effects that sustaining fixation might have on attention and eye movements. Recently, it has been found that maintaining fixation can ameliorate cognitive conflicts that involve spatial attention (Krause & Poth, 2023, iScience). It seems interesting to include this finding in the discussion, because it supports the authors' view that it is necessary to study fixation and eye movements rather than eye movements alone to uncover their interplay with attention and decision-making.

      Thanks for the reference. The reported finding is certainly interesting, but we find it somewhat tangential to the specific point we make about strong fixation constraints — which is that they suppress internally driven motor activity, including biases, that are highly informative of the relationship between attention and saccade planning (lines 466‒472, 541‒561). Whether fixation state has other subtle consequences for cognitive control is an intriguing, important issue, for sure. But we would rather maintain the readers’ focus on the reasons why less restrictive fixation requirements are relevant for understanding the deployment of attention.

      (4) Results, p. 9, lines 264-266: It is reported that "The rise points were statistically the same across experiments for both prosaccades (p=0.08, n=10, permutation test)...", but the p-value seems quite close to significance. I suggest mentioning this and phrasing the sentence a bit more carefully.

      We now refer to the rise points as “similar”.

      (5) Figure 7 a-d: It might help readers who first skim through the figures before reading the text to use other labels for the bins on the x-axis that spell out the name of the phase in the trial. It might also help to visualize the bins on the plot of a tachometric function (in this case, changing the labels could be unnecessary).

      Thanks for the suggestion. We added an insert to the figure to indicate the correspondence between labels and time bins more intuitively.

      (6) Methods, p. 18, lines 566-567: On some trials, participants received an auditory beep as a feedback stimulus. As this could induce a burst of arousal, I wondered how it affected the subsequent trials.

      This is an interesting issue to ponder. We agree that, in principle, the beep could have an impact on arousal. However, what exactly would be predicted as a consequence? The absence of a beep is meant to increase the urgency of the participant, so some effect of the beep event on RT would be expected anyway as per task instructions. Thus, it is unclear whether an arousal contribution could be isolated from other confounds. That said, three observations suggest that, at most, an independent arousal effect would be very small. First, we have performed multisensory experiments (unpublished) with auditory and visual stimuli, and have found that it is difficult to obtain a measurable effect of sound on an urgent visual choice task unless the experimental conditions are particularly conducive; namely, when the visual stimuli are dim and the sound is loud and lateralized. None of these conditions applies to the standard feedback beep. Second, because most trials are on time, the meaningful feedback signal is conveyed by the absence of the beep. But this signal to alter behavior (i.e., respond sooner) has zero intensity and is therefore unlikely to trigger a strong exogenous, automatic response. Finally, in our data, we can parse the trials that followed a beep (the majority) from those that did not (a minority). In doing so, we found no differences with respect to perceptual performance; only minor differences in RT that were identical for pro- and antisaccade trials. All this suggests to us that it is very unlikely that the feedback alters arousal significantly on specific trials, somehow impacting the tachometric curve (a contribution to general arousal across blocks or sessions is possible, of course, but would be of little consequence to the aims of the study).

      (7) Methods, p. 18, lines 574-577: I suggest referring to the colors or the conditions in the text as it was done in the experiments, just to prevent readers being confused before reading the methods.

      We appreciate the thought, but think that the study is easier to understand by pretending, initially, that the color assignments were fixed. This is a harmless simplification. Mentioning the actual color assignments early on would be potentially more confusing and make the description of the task longer and more contrived.

      (8) Methods, p. 18, Table 1: Given that the authors had a spectrophotometer, I suggest providing (approximate) measurements for the stimulus colors in addition to the luminance (i.e. not just RGB values).

      Unfortunately, we have since switched the monitor in our setup, so we don’t have the exact color measurements for the stimuli used at the time. We will keep the suggestion in mind for future studies though.

      References

      Oor EE, Stanford TR, Salinas E (2023) Stimulus salience conflicts and colludes with endogenous goals during urgent choices. iScience 26:106253.

      Salinas E, Stanford TR (2021) Under time pressure, the exogenous modulation of saccade plans is ubiquitous, intricate, and lawful. Curr Opin Neurobiol 70:154-162.

      Zhu J, Zhou XM, Constantinidis C, Salinas E, Stanford TR (2024) Parallel signatures of cognitive maturation in primate antisaccade performance and prefrontal activity. iScience. doi: https://doi.org/10.1016/j.isci.2024.110488.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for insightful feedback on how we could improve the manuscript. We have revised the manuscript and addressed the points raised.

      Regarding the technical issues raised about the quality of patch clamp recordings (Reviewer 2), we acknowledge that the upper limit of the access resistance cutoff should be lower and that the accepted change should be 10-20%. To this end, we have revised the manuscript to more accurately detail the quality metrics used. The access resistance for the neurons in paired recordings was below 40 MΩ (similar to the metric used by Kolb et al. 2019), and if the access resistance rose above 50 MΩ, we stopped recording from that neuron. Furthermore, neurons with access resistance above 50 MΩ were included in the histogram to show the total number of neurons patched, not only those used in paired recordings. As this was done with an automated robotic system, the neurons would still undergo an initial voltage clamp and current clamp protocol before the pipette would release the neuron and patch another cell. To the point of Reviewer 2, this patch-walk protocol could alternatively be implemented using manual recording approaches, and this point has been included in the revised manuscript.
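
      The two access resistance thresholds stated above can be expressed as a simple decision rule. This is a hedged sketch rather than the control code of the robotic system: the function name `access_status` and the returned labels are hypothetical, and only the 40 MΩ and 50 MΩ values are taken from the response.

```python
INCLUSION_MAX_MOHM = 40.0  # paired recordings used neurons below this access resistance
ABORT_ABOVE_MOHM = 50.0    # recording from a neuron was stopped above this value

def access_status(initial_mohm, current_mohm):
    """Return 'exclude', 'stop', or 'continue' for one neuron (hypothetical labels)."""
    if initial_mohm >= INCLUSION_MAX_MOHM:
        return "exclude"   # patched, but not used for paired recordings
    if current_mohm > ABORT_ABOVE_MOHM:
        return "stop"      # access resistance drifted too high during recording
    return "continue"
```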

      Regarding the spatial restrictions (Reviewer 3), we agree that the average intersomatic distance is higher than ideal. This was likely due to failed patch attempts; for instance, if one pipette successfully achieved the whole-cell configuration and the other pipette had several sequential failed patch attempts, the intersomatic distance (ISD) would increase with each failed attempt due to the user-selected index of cells. Ideally, the pipettes would be walking across a slice with low ISD if the whole-cell success rate was closer to 100%. To overcome this challenge in future work, automated cell identification and tracking could enable the path planning to be continuously updated after each patch attempt. Given the whole-cell success rate for a given electrophysiologist, we believe that the automated robot could be improved in later versions to include route-planning algorithms that minimize the distance between neurons. Alternatively, this patch-walk system could be integrated with manual recording approaches to improve their connectivity yields.
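
      One simple form such route planning could take is a greedy nearest-neighbor rule, in which the next target is always the closest untried cell. The sketch below is an assumption for illustration only: the response does not specify an algorithm, the function names are hypothetical, and nearest-neighbor selection is a heuristic that does not guarantee the shortest overall route.

```python
import math

def next_target(current_xy, candidates):
    """Pick the untried candidate soma closest to the current position."""
    return min(candidates, key=lambda c: math.dist(current_xy, c))

def greedy_route(start_xy, cells):
    """Order cells by repeatedly moving to the nearest remaining one."""
    remaining = list(cells)
    route, position = [], start_xy
    while remaining:
        nxt = next_target(position, remaining)
        remaining.remove(nxt)
        route.append(nxt)
        position = nxt
    return route

# Hypothetical soma coordinates in micrometers.
route = greedy_route((0.0, 0.0), [(100.0, 0.0), (10.0, 0.0), (20.0, 5.0)])
```

      Re-running `next_target` after every failed attempt, rather than following a fixed user-selected index, would keep the candidate closest to the successfully patched cell at the front of the queue.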

      For the point raised about morphological identification, we believe that while important, morphological identification is outside the scope of this project. Future work will include neuronal reconstruction. Regarding the other points, we will amend the manuscript to highlight other key metrics such as the maximum time we could hold a neuron under the whole-cell configuration. Additionally, we agree with Reviewer 3 that some of the current language may cause confusion, and we will amend it accordingly.

      To all the reviewers, thank you for your time, understanding, and the opportunity to improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      As the scientific community identifies increasing numbers of genetic variants that cause rare human diseases, a challenge is how the field can most quickly identify pharmacological interventions to address known deficits. The authors point out that defining phenotypic outcomes required for drug screen assays is often challenging, and emphasize how invertebrate models can be used for quick ID of compounds that may address genetic deficits. A major contribution of this work is to establish a framework for potential intervention drug screening based on quantitative imaging of morphology and mobility behavior, using methods that the authors show can define subtle phenotypes in a high proportion of disease gene knockout mutants. 

      Overall, the work constitutes an elegant combination of previously developed high-volume imaging with highly detailed quantitative phenotyping (and some paring down to specific phenotypes) to establish proof of principle on how the combined applications can contribute to screens for compounds that may address specific genetic deficits, which can suggest both mechanism and therapy. 

      In brief, the authors selected 25 genes for which loss of function is implicated in human neuromuscular disease and engineered deletions in the corresponding C. elegans homologs. The authors then imaged morphological features and behaviors prior to, during, and after blue light stimuli, quantitating features and clustering outcomes as they elegantly developed previously (PMID 35322206; 30171234; 30201839). In doing so, phenotypes in 23/25 tested mutants could be separated enough to distinguish WT from mutant, half of those with adequate robustness to permit high-throughput screens, an outcome that supports the utility of general efforts to ID phenotypes in C. elegans disease orthologs using this approach. A detailed discussion of 4 ciliopathy gene defects and NALCN-related channelopathy mutants reveals both expected and novel phenotypes, validating the basic approach to modeling vetted targets and underscoring that quantitative imaging approaches reiterate known biology. The authors then screened a library of nearly 750 FDA-approved drugs for the capacity to shift the unc-80 NALCN channel-disrupted phenotype closer to the wild type. Top "mover" compounds shift outcomes in the experimental outcome space; the analysis also reveals how "side effects" can be evaluated to prioritize compounds that confer the fewest changes of other parameters away from the center.

      Strengths: 

      Although the imaging and data analysis approaches have been reported and the screen is limited in scope and intervention exposure, it is important that the authors strongly combine individual approach elements to demonstrate how quantitative imaging phenotypes can be integrated with C. elegans genetics to accelerate the identification of potential modulators of disease (easily extendable to other goals). Generation of deletion alleles and documentation of their associated phenotypes (available in supplemental data) provide potentially useful reagents/data to the field. The capacity to identify "over-shooting" of compound applications with suggestions for scale back and to sort efficacious interventions to minimize other changes to behavioral and physical profiles is a strong contribution. 

      Weaknesses: 

      The work does not have major weaknesses, although it may be possible to expand the discussion to increase utility in the field: 

      (1) Increased discussion of the challenges and limitations of the approach may enhance successful adaptation and application in the field.

      It is quite possible that morphological and behavioral phenotypes have nothing to do with disease mechanisms and rather reflect secondary outcomes, such that positive hits will address "off-target" consequences. 

      This is possible and can only be determined with human data. We now discuss the possibility in the discussion.

      The deletion approach is adequately justified in the text, but the authors may make the point somewhere that screening target outcomes might be enhanced by the inclusion of engineered alleles that match the human disease condition. Their work on sod-1 alleles (PMID 35322206) might be noted in this discussion. 

      We agree and now mention this work in the discussion. We are currently working on a collection of strains with patient-specific mutations.

      Drug testing here involved a strikingly brief exposure to a compound, which holds implications for how a given drug might engage in adult animals. The authors might comment more extensively on extended treatments that include earlier life or more extended targeting. The assumption is that different exposure periods and durations could be administered, but if the authors are aware of challenges associated with more prolonged applications, larger scale, etc., it would be useful to note them.

      More prolonged applications are definitely possible. We chose short treatments for this screen to model the potential for changing neural phenotypes once developmental effects of the mutation have already occurred. We now briefly discuss this choice and the potential of longer treatments in the discussion.

      (2) More justification of the shift to only a few target parameters for judging compound effectiveness. 

      - In the screen in Figure 4D and text around 313, 3 selected core features of the unc-80 mutant (fraction that blue-light pause, speed, and curvature) were used to avoid the high replicate requirements to identify subtle phenotypes. Although this strategy was successful as reported in Figure 5, the pared-down approach seems a bit at odds with the emphasis on the range of features that can be compared mutant/wt with the authors' powerful image analysis. Adding details about the reduced statistical power upon multiple comparisons, with a concrete example calculated, might help interested scientists better assess how to apply this tool in experimental design.

      To empirically test the effect of including more features on the subsequent screen, we have repeated the analysis using increasing numbers of features. In a new supplementary figure we find increasing the number of features reduces our power to detect rescue. At 256 features, we would not be able to detect any compounds that rescued the disease model phenotype.
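
      As a rough illustration of why power falls as more features are tested, the sketch below computes the approximate power of a single two-sided two-sample z-test after Bonferroni correction. This is a simplified example under stated assumptions, not the analysis used in the manuscript: the effect size (0.5), the group size (30), the normal approximation, and the choice of Bonferroni correction are all hypothetical. Only the feature counts (3, 256, 8289) come from the text above.

```python
from statistics import NormalDist


def power_two_sample(effect_size, n_per_group, n_features, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for one feature
    after Bonferroni correction across n_features tests.

    effect_size is a standardized mean difference (hypothetical value).
    """
    norm = NormalDist()
    # Bonferroni: split the family-wise alpha across all tested features.
    alpha_corrected = alpha / n_features
    z_crit = norm.inv_cdf(1 - alpha_corrected / 2)
    # Expected z statistic for two groups of equal size.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of exceeding the critical value in either tail.
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Feature counts mentioned in the text; other inputs are illustrative.
    for n_features in (1, 3, 256, 8289):
        print(n_features, round(power_two_sample(0.5, 30, n_features), 3))
```

      With the effect size and sample size held fixed, the computed power decreases monotonically as the number of tested features grows, which is the qualitative trend described in the response.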

      (3) More development of the side-effect concept. The side effects analysis is interesting and potentially powerful. Prioritization of an intervention because of minimal perturbation of other phenotypes might be better documented and discussed a bit further; how reliably does the metric of low side effects correlate with drug effectiveness? 

      Ultimately this can only be determined with clinical trial data on multiple drugs, but there are currently no therapeutic options for UNC80 deficiency in humans. We have included some extra discussion of the side effect concept.

      Reviewer #2 (Public Review): 

      Summary and strengths: 

      O'Brien et al. present a compelling strategy to both understand rare disease that could have a neuronal focus and discover drugs for repurposing that can affect rare disease phenotypes. Using C. elegans, they optimize the Brown lab worm tracker and Tierpsy analysis platform to look at the movement behaviors of 25 knockout strains. These gene knockouts were chosen based on a process to identify human orthologs that could underlie rare diseases. I found the manuscript interesting and a powerful approach to making genotype-phenotype connections using C. elegans. Given the rate at which rare Mendelian diseases are found and candidate genes suggested, human geneticists need to consider orthologous approaches to understand the disease and seek treatments on a rapid time scale. This approach is one such way. Overall, I have a few minor suggestions and some specific edits. 

      Weaknesses: 

      (1) Throughout the text on figures, labels are nearly impossible to read. I had to zoom into the PDF to determine what the figure was showing. Please make text in all figures a minimum of 10-point font. Similarly, the Figure 2D point type is impossible to read. Points should be larger in all figures. Gene names should be in italics in all figures, following C. elegans convention. 

      We have updated all figures with larger labels and, where necessary, split figures to allow for better readability. We’ve also corrected italicisation.

      (2) I have a strong bias against the second point in Figure 1A. Sequencing of trios, cohorts, or individuals NEVER identifies causal genes in the disease. This technique proposes a candidate gene. Future experiments (oftentimes in model organisms) are required to make those connections to causality. Please edit this figure and parts of the text. 

      We have removed references to causation. We were thinking of cases where a known variant is found in a patient where causality has already been established rather than cases of new variant discovery.

      (3) How were the high-confidence orthologs filtered from 767 to 543 (lines 128-131)? Also, the choice of the final list of 25 genes is not well justified. Please expand more about how these choices were made. 

      We now explain the extra keyword filtering step. For the final filtering step, we simply examined the list and chose 25. There is therefore little justification to provide and we acknowledge these cannot be seen as representative of the larger set according to well-defined rules. The choice was based on which genes we thought would be interesting using their descriptions or our prior knowledge (“subjective interestingness” in the main text).

      (4) Figures 3 and 4, why show all 8289 features? It might be easier to understand and read if only the 256 Tierpsy features were plotted in the heat maps. 

      In this case, we included all features because they were all tested for differences between mutants and controls. By consistently using all features for each fingerprint we can be sure that the features that are different that we want to highlight in box plots can be referred to in the fingerprint.

      (5) The unc-80 mutant screen is clever. In the feature space, it is likely better to focus on the 256 less-redundant Tierpsy features instead of just a number of features. It is unclear to me how many of these features are correlated and not providing more information. In other words, the "worsening" of less-redundant features is far more of a concern than the "worsening" of 1000 correlated features. 

      This is a good point. We’ve redone the analysis using the Tierpsy 256 feature set and included this as a supplementary figure. We find that the same trend exists when looking at this reduced feature set.

      Reviewer #3 (Public Review): 

      In this study, O'Brien et al. address the need for scalable and cost-effective approaches to finding lead compounds for the treatment of the growing number of Mendelian diseases. They used state-of-the-art phenotypic screening based on an established high-dimensional phenotypic analysis pipeline in the nematode C. elegans. 

      First, a panel of 25 C. elegans models was created by generating CRISPR/Cas9 knock-out lines for conserved human disease genes. These mutant strains underwent behavioral analysis using the group's published methodology. Clustering analysis revealed common features for genes likely operating in similar genetic pathways or biological functions. The study also presents results from a more focused examination of ciliopathy disease models. 

      Subsequently, the study focuses on the NALCN channel gene family, comparing the phenotypes of mutants of nca-1, unc-77, and unc-80. This initial characterization identifies three behavioral parameters that exhibit significant differences from the wild type and could serve as indicators for pharmacological modulation. 

      As a proof-of-concept, O'Brien et al. present a drug repurposing screen using an FDA-approved compound library, identifying two compounds capable of rescuing the behavioral phenotype in a model with UNC80 deficiency. The relatively short time and low cost associated with creating and phenotyping these strains suggest that high-throughput worm tracking could serve as a scalable approach for drug repurposing, addressing the multitude of Mendelian diseases. Interestingly, by measuring a wide range of behavioural parameters, this strategy also simultaneously reveals deleterious side effects of tested drugs that may confound the analysis. 

      Considering the wealth of data generated in this study regarding important human disease genes, it is regrettable that the data is not actually made accessible. This diminishes the study's utility. It would have a far greater impact if an accessible and user-friendly online interface were established to facilitate data querying and feature extraction for specific mutants. This would empower researchers to compare their findings with the extensive dataset created here. Otherwise, one is left with a very limited set of exploitable data. 

      We have now made the feature data available on Zenodo (https://doi.org/10.5281/zenodo.12684118) as a matrix of feature summaries and individual skeleton timeseries data (the feature matrix makes it more straightforward to extract the data from particular mutants for reanalysis). We have also created a static html version of the heatmap in Figure 2 containing the entire behavioural feature set extracted by Tierpsy. This can be opened in a browser and zoomed for detailed inspection. Mousing over the heatmap shows the names of features at each position making it easier to arrive at intuitive conclusions like ‘strain A is slow’ or ‘strain B is more curved’.

      Another technical limitation of the study is the use of single alleles. Large deletion alleles were generated by CRISPR/Cas9 gene editing. At first glance, this seems like a good idea because it limits the risk that background mutations, present in chemically-generated alleles, will affect behavioral parameters. However, these large deletions can also remove non-coding RNAs or other regulatory genetic elements, as found, for example, in introns. Therefore, it would be prudent to validate the behavioral effects by testing additional loss-of-function alleles produced through early stop codons or targeted deletion of key functional domains. 

      We have added a note in the main text on limitations of deletion alleles. We like the idea of making multiple alleles in future studies, especially in cases where a project is focussed on just one or a few genes.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors): 

      Note that none of the above suggestions or the one immediately below are considered mandatory. 

      One additional minor point: The dual implication of mevalonate perturbations for NALCN deficiencies is striking. At the same time, the mevalonate pathway is critical for embryo viability among other things, which prompts questions about how reproductive physiology is integrated in this screen approach. It appears that sterilization protocols are not used to prepare screen target animals, but it would be useful to know if there were a signature associated with drug-induced sterility that might help identify one potential common non-interesting outcome of compound treatments in general. In this work, the screen treatment is only 4 hours, which is probably too short to compromise reproduction, but as noted above, it is likely users would intend to expose test subjects for much longer than 4-hour periods.

      This is an interesting point. In its current form our screen doesn’t assess reproductive physiology. This is something that we will consider in ongoing projects.

      Figures 

      Figure 1D might be omitted or moved to supplement. 

      We have removed 1D and moved figure 1E as a standalone table (Table 1) to improve readability.

      Figure 2D "key" is hard to make out size differences for prestim, bluelight, and poststim -more distinctive symbols should be used. 

      We have increased the size of the symbols so that the key is easier to read.

      Line 412 unc-25 should be in italics 

      Corrected

      Reviewer #2 (Recommendations For The Authors): 

      Specific edits: 

      All of the errors below have been corrected.

      Line 47, "loss of function" should be hyphenated because it is a compound adjective that modifies mutations. 

      Line 50, "genetically-tractable" should not be hyphenated because it is not a compound adjective. It is an adverb-adjective pair. Line 102 has the same grammatical issue. 

      Line 85, "rare genetic diseases" do not "affect nervous system function". The disease might have deficits in this function, but the disease does not do anything to function. 

      Line 86, it should be mutations not mutants. Mutations are changes to DNA. Mutants are individuals with mutations. 

      Throughout, wild-type should be hyphenated when it is used as a compound adjective. 

      Figure 4, asterisks is spelled incorrectly. 

      Reviewer #3 (Recommendations For The Authors): 

      - As stated in the public review, the utility of the study is limited by the lack of access to the complete dataset. The wealth of data produced by the study is one of its major outputs. 

      We have made the data publicly available on Zenodo. We appreciate the request.

      - Describe the exact break-points of the different alleles, because it was not readily feasible to derive them from the gene fact sheets provided in the supplementary materials. 

      We have now provided the start position and total length of deletion for each gene in the gene fact sheets.

      - Figure 1C: what does "Genetic homology"/"sequence identity" refer to? How were these values calculated? 

      UNC-49 is clearly not 95% identical to vertebrate GABAR subunits at the protein level. 

      We have changed the axis label to “BLAST % Sequence Identity” to clarify that these values are calculated from BLAST sequence alignments on the WormBase and Alliance of Genome Resources webpages.

      - Figure 1E : The data presented in Figure 1E appears somewhat unreliable. For example, a cursory check showed: 

      (1) Wrong human ortholog: unc-49 is a Gaba receptor, not a Glycine receptor as indicated in the second column. 

      (2) Wrong disease association: dys-1 is not associated with Bardet-Biedl syndrome; overall the data indicated in the table does not seem to fully match the HPO database. 

      (3) Inconsistent disease association: why don't the avr-14 and glc-2 (and even unc-49) profiles overlap/coincide given that they present overlapping sets of human orthologs. 

      Thank you for catching this! We have corrected gene names which were mistakenly pasted. We have also made this a standalone table (Table 1) for improved readability.

      - Error in legend to figure 4I : "with ciliopathies and N2" > ciliopathies should be "NALCN disease". 

      - Error at line 301: "Figures 2E-H" should be "Figures 4E-H". 

      Corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues identify biallelic variants of DNAH3 in four unrelated Han Chinese infertile men through whole-exome sequencing, which contribute to abnormal sperm flagellar morphology and ultrastructure. To investigate the importance of DNAH3 in male infertility, the authors generated crispant Dnah3 knockout (KO) male mice. They observed that KO mice are also infertile, showing a severe reduction in sperm movement with abnormal IDA (inner dynein arms) and mitochondrion structure. Moreover, nonfunctional DNAH3 expression decreased the expression of IDA-associated proteins in the spermatozoa of patients and KO mice, which are involved in the disruption of sperm motility. Interestingly, the infertility of patients and KO mice is rescued by intracytoplasmic sperm injection (ICSI). Taken together, the authors propose that DNAH3 is a novel pathogenic gene for asthenoteratozoospermia and male infertility.

      Strengths:

      This work investigates the role of DNAH3 in sperm mobility and male infertility. By using gold-standard molecular biology techniques, the authors demonstrate with exquisite resolution the importance of DNAH3 in sperm morphology, showing strong evidence of its role in male infertility. Overall, this is a very interesting, well-written, and appealing article. All aspects of the study design and methods are well described and appropriate to address the main question of the manuscript. The conclusions drawn are consistent with the analyses conducted and supported by the data.

      Weaknesses:

      The paper is solid, and in its current form, I have not detected relevant weaknesses.

      We thank the reviewer very much for the comments.

      Reviewer #2 (Public Review):

      Wang et al. investigated the role of dynein axonemal heavy chain 3 (DNAH3) in male infertility. They found that variants of DNAH3 were present in four infertile men, and the deficiency of DNAH3 in sperm affects sperm mobility. Additionally, they showed that Dnah3 knockout male mice are infertile. Furthermore, they demonstrated that DNAH3 influences inner dynein arms by regulating several DNAH proteins. Importantly, they showed that intracytoplasmic sperm injection (ICSI) can rescue the infertility in Dnah3 knockout mice and two patients with DNAH3 variants.

      Strengths:

      The conclusions of this paper are well-supported by data.

      Weaknesses:

      The sample/patient size is small; however, the findings are consistent with those of a recent study on DNAH3 in male infertility involving 432 patients.

      We extend our sincere gratitude to the expert reviewers for their valuable comments and insightful suggestions.

      A cohort of 587 unrelated infertile men with asthenoteratozoospermia was recruited to investigate the potential genetic etiology using WES. In addition to mutations in DNAH3 identified in four patients, mutations in several other genes previously reported by our group were also identified, including CFAP65 (Zhang et al., 2019. PMID: 31571197), DNAH8 (Yang et al., 2020. PMID: 32681648), DNAH12 (Li et al., 2022. PMID: 34791246), FISIP2 (Zheng et al., 2023. PMID: 35654582), CEP128 (Zhang et al., 2022. PMID: 35296684), CEP78 (Zhang et al., 2022. PMID: 36206347), CT55 (Zhang et al., 2023. PMID: 36481789), SPATA20 (Wang et al., 2023. PMID: 36415156), TENT5D (Zhang et al., 2024. PMID: 38228861), CFAP52 (Jin et al., 2023. PMID: 38126872), CEP70 (Ruan et al., 2023. PMID: 36967801), and PRSS55 (Liu et al., 2022. PMID: 35821214), as well as other unreported variants.

      Reviewer #3 (Public Review):

      Summary:

      (1) To further explore the genetic basis of asthenoteratozoospermia, the authors performed whole-exome sequencing analyses among infertile males affected by asthenoteratozoospermia. Four unrelated Han Chinese patients were found to carry biallelic variations of DNAH3, a gene encoding IDA-associated protein.

      (2) To verify the function of the IDA-associated protein DNAH3, the authors generated a Dnah3-KO mouse model and revealed that the loss of DNAH3 leads to severe male infertility as a result of the severe reduction in sperm movement with abnormal IDA and mitochondrion structures.

      (3) Mechanistically, they confirmed decreased expression of IDA-associated proteins (including DNAH1, DNAH6 and DNALI1) in the spermatozoa from patients with DNAH3 mutations and Dnah3-KO male mice.

      (4) Then, they also found that male infertility caused by DNAH3 deficiency could be rescued by intracytoplasmic sperm injection (ICSI) treatment in humans and mice.

      Strengths:

      (1) In addition to existing research, the authors provided novel variants of DNAH3 as important factors leading to asthenoteratozoospermia. This further expands the spectrum of pathogenic variants in asthenoteratozoospermia.

      (2) By mechanistic studies, they found that DNAH3 deficiency led to decreased expression of IDA-associated proteins, which may be used to explain the disruption of sperm motility and reduced fertility caused by DNAH3 deficiency.

      (3) Then, successful ICSI outcomes were observed in patients with DNAH3 mutations and Dnah3 KO mice, which will provide an important reference for genetic counselling and clinical treatment of male infertility.

      We are very grateful for the reviewer's careful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      I have carefully read the revised versions of this manuscript, and I would like to thank the authors for addressing all my previous concerns.

      I have no additional comments or suggestions.

      We thank the reviewer for reviewing our revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      (1) Statistical analyses should be provided alongside the quantification (Fig S1B, S7C).

      According to the suggestions of the reviewer, we have added statistical analyses of the corresponding quantification in the legends of Figure S1 and Figure S7.

      (2) The numbers of sperms counted in Fig S1A should be listed.

      In response to the reviewer's valuable suggestion, we have listed the corresponding ratios of the different morphological defects in the sperm tails of the patients in Figure S1A.

      (3) Due to the high similarities in experimental design, data and conclusions between the current study and previously published work by Meng et al. (2024), as well as the very similar titles of the two studies, it is crucial to emphasize the differences in the Discussion section.

      Many thanks for the reviewer's kind suggestions for our revised manuscript.

      Employing whole-exome sequencing (WES) on infertile men to identify candidate variants, followed by in-silico and functional analysis of these variants, and generating mouse models using CRISPR-Cas9 technology, has proven to be an efficient and widely used approach for uncovering the causative genes of male infertility associated with sperm defects. Both our study and the recent work by Meng et al. utilized this approach to verify whether DNAH3 mutations are a cause of asthenoteratozoospermia. Additionally, we have also updated the title of our study to: 'DNAH3 deficiency causes flagellar inner dynein arm loss and male infertility in humans and mice'.

      Meng et al. reported DNAH3 mutations in asthenoteratozoospermia-affected patients, revealing multiple morphological defects in the sperm tail. Moreover, ultrastructural abnormalities of the flagellar axoneme were evident in these patients, characterized by a disrupted '9+2' arrangement and the notable absence of IDAs. Additionally, they generated Dnah3 KO mice, which were infertile and exhibited moderate morphological abnormalities. While the '9+2' microtubule arrangement in the flagella of their Dnah3 KO mice remained intact, the IDAs on the microtubules were partially absent. In our study, we observed similar phenotypic differences between DNAH3-deficient patients and Dnah3 KO mice. Both studies suggest that DNAH3 plays a crucial role in human and mouse male reproduction.

      However, there are notable differences between the two studies. Firstly, the phenotypes of Dnah3 KO mice showed slight differences. Meng et al. generated two Dnah3 KO mouse models (KO1 and KO2), both of which exhibited significantly higher sperm motility and progressive motility than those in our study, where nearly all sperm were completely immobile. Furthermore, their Dnah3 KO2 mice even displayed motility comparable to WT mice and retained partial fertility. We speculate that these differences may be attributed to variations in mouse genetic background or the presence of a truncated DNAH3 protein resulting from specific knockout strategies. Secondly, we conducted additional research and uncovered novel findings. We revealed that male infertility caused by DNAH3 mutations follows an autosomal recessive inheritance pattern, as confirmed through Sanger sequencing of the patients' parents. We also discovered the dynamic expression and localization of DNAH3 during spermatogenesis in humans and mice through immunofluorescent staining. We further found that DNAH3 deficiency had no impact on ciliary development in the oviduct or on oogenesis in mice, resulting in normal female fertility. Moreover, in the absence of DNAH3 in both humans and mice, the expression of IDA-associated proteins, including DNAH1, DNAH6 and DNALI1, was decreased, while the expression of ODA-associated proteins remained unaffected, indicating that DNAH3 is involved in sperm axonemal development, specifically through its role in the assembly of IDAs. Collectively, our study corroborates the findings of Meng et al. and provides additional unique insights, comprehensively elucidating the critical role of DNAH3 in human and mouse spermatogenesis.

      We have added these discussions in line 275 to line 306.

      Reviewer #3 (Recommendations for The Authors):

      I have no more recommendations for the authors.

      We thank the reviewer for reviewing our revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1. The experiments are carefully designed and the data is convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      The authors have addressed most of my previous concerns.

      Thank you so much for acknowledging our research. It is incredibly rewarding to see our work recognized. We hope that our findings will inspire new perspectives and foster further exploration in this area.

      Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on revised version:

      The authors demonstrate the correlation between overexpression of atg6 and higher stability and activity of npr1. They claim a novel activity of atg6 in the nucleus.

      Overall, the experimental scope of the study is solid, however, the over-interpretation of the results substantially reduces the significance and value of this study for the target plant immunity readership.

      Thank you very much for your constructive and insightful comments, as well as for acknowledging the experimental scope of this study. In addition, we have made every effort to address the over-interpretation of the results, as per your comments, ensuring they are more accurate and concise. In the revised version, the modified content has been highlighted in blue to clearly indicate the changes made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed most of my concerns. I have no further comments.

      Thank you so much for acknowledging our research. It is incredibly rewarding to see our work recognized. We hope that our findings will inspire new perspectives and foster further exploration in this area.

      Reviewer #2 (Recommendations For The Authors):

      As I previously commented, in fig. 2a and c, the discrepancy between levels of atg6-mcherry in microscope image vs WB has to be explained. The explanation provided by the authors is incomplete and may mislead. The most likely reason for the difference is that the fluorescence signal in fig. 2a is predominantly from free mCherry, rather than the atg6-mcherry fusion. This has to be included in the main text to avoid misleading the reader.

      Thank you very much for your constructive and insightful comments. In response, we have incorporated the necessary explanations into the revised manuscript (lines 160-164).

      In fig. 1B, the PD fraction has to show the size range of free GST. Also, please use "anti" to indicate that these are immunoblots.

      Thank you for pointing this out. In the revised manuscript, we identified the range of free GST and used "anti:GST and anti:His" to indicate that these are immunoblots.

      In fig 1C, the WB has to show the free GFP band in the input and IP fractions together with NPR1, rather than in separate blots.

      Thank you for bringing this to our attention. Fig. 1c has been replaced, and the updated image now shows the free GFP band in the input and IP fractions together with NPR1-GFP.

      In fig. 1d, the bifc signal has to be quantified from multiple images across the biological repeats. Also, there's no significance in showing the chlorophyll autofluorescence. What is the purpose of this? They need to use a nuclear marker instead.

      Thank you for your suggestion. Based on your input, we utilized ImageJ software to quantify the YFP fluorescence signal. A total of n = 15 independent images were analyzed, and the corresponding results have been added to Figure 1e. Monitoring chlorophyll autofluorescence serves as a useful background signal, aiding in the distinction between the fluorescence signal of the target protein and background noise. This approach helps reduce potential signal overlap or interference during the experiment, thereby enhancing the reliability of the results.

      Please provide a sequence alignment with multiple ATGs to show the conservation of the presumed bipartite NLS. This information has to be included in the main data.

      Thank you very much for your constructive and insightful comments. We analyzed the putative nuclear localization signal (NLS) in the ATG6 protein sequence using the online INSP (Identification of Nuclear Signal Peptide) prediction software (http://www.csbio.sjtu.edu.cn/bioinf/INSP/). The prediction results indicated the presence of a potential nuclear localization sequence "FLKEKKKKK" within the ATG6 protein, spanning from the 217th to the 223rd amino acid. Additionally, we utilized INSP to investigate the nuclear localization sequences of various ATG proteins (TaATG6a [1], TaATG6b [1], TaATG6c [1], SlATG8h [2]) that have been previously reported to localize in the nucleus. This analysis revealed a relatively conserved NLS sequence motif: "E/K-K/E-K-K-L/K-K" in these ATG proteins. In line with your suggestion, the results of this sequence comparison have been incorporated into the revised manuscript as Figure 2c. The revised manuscript includes a description of the corresponding results. (lines 146-156).
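
      For readers who want to scan other sequences for the same pattern, the motif quoted above can be expressed as a regular expression. This is a minimal sketch and not part of the INSP prediction described in the response: the regex is one assumed reading of "E/K-K/E-K-K-L/K-K", the helper name `find_motif` is hypothetical, and the only test sequence is the nine-residue fragment quoted in the text, so the reported index is relative to that fragment rather than to the full ATG6 protein.

```python
import re

# Assumed regex reading of the motif "E/K-K/E-K-K-L/K-K" quoted in the text.
NLS_MOTIF = re.compile(r"[EK][EK]KK[LK]K")


def find_motif(sequence):
    """Return (start_index, matched_text) for the first motif hit, or None.

    start_index is 0-based and relative to the sequence that was passed in.
    """
    match = NLS_MOTIF.search(sequence.upper())
    return (match.start(), match.group()) if match else None


if __name__ == "__main__":
    # Fragment quoted in the response; not the full-length protein.
    print(find_motif("FLKEKKKKK"))  # (2, 'KEKKKK')
```

      A regex match only shows that a sequence is compatible with the motif; it does not by itself indicate nuclear localization.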

      Fig. 3d and f, how many blots are used for this quantification? Please include all the individual analyzed blots in the supplementary data. In addition, if you present such quantification with error bars, then statistical analysis is required.

      Thank you for pointing this out. In Figure 3d, three independent blots were utilized for this quantification. In Figure 3f, two independent blots were used. The individual analyzed blots have been included in the supplementary Figure 7. We also conducted a statistical analysis as shown in Fig 3d and f, with a detailed description included in the legend section (lines 858 and 861).

      In fig. 4, please indicate what is the normalizing gene. Also, what are the error bars?

      Thank you for pointing this out. In Fig. 4, values are means ± SD (n = 3 biological replicates). The AtActin gene was used as the internal control. We have included a detailed description in the figure notes.

      In fig. 4b the labeling is missing.

      Thank you for bringing this to our attention. We have included the labeling for Fig. 4 in the revised manuscript.

      Lines 236-239: this statement contradicts the data in fig. 5b: the levels of NPR1-GFP are actually reduced in the presence of atg6 at 24h. So, this result has to be described more accurately by stating that the increase is transient, and it is evident more at 8h, but not at 20-24h.

      Thank you very much for your constructive and insightful comments. We have revised the description of this section to provide a more accurate account of the results (lines 253-258).

      Reference

      (1) Yue J, Sun H, Zhang W, et al. Wheat homologs of yeast ATG6 function in autophagy and are implicated in powdery mildew immunity. BMC Plant Biol. 2015;15:95.

      (2) Li F, Zhang M, Zhang C, et al. Nuclear autophagy degrades a geminivirus nuclear protein to restrict viral infection in solanaceous plants. New Phytol. 2020;225:1746-1761.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Freas et al. investigated if the exceedingly dim polarization pattern produced by the moon can be used by animals to guide a genuine navigational task. The sun and moon have long been celestial beacons for directional information, but they can be obscured by clouds, canopy, or the horizon. However, even when hidden from view, these celestial bodies provide directional information through the polarized light patterns in the sky. While the sun's polarization pattern is famously used by many animals for compass orientation, until now it has never been shown that the extremely dim polarization pattern of the moon can be used for navigation. To test this, Freas et al. studied nocturnal bull ants by placing a linear polarizer, shifted 45 degrees relative to the moon's natural polarization pattern, over the homing path of freely navigating ants. They recorded the homing direction of an ant before entering the polarizer, under the polarizer, and again after leaving the area covered by the polarizer. The results very clearly show that ants walking under the linear polarizer change their homing direction by about 45 degrees in comparison to the homing direction under the natural polarization pattern and change it back after leaving the area covered by the polarizer again. These results can be repeated throughout the lunar month, showing that bull ants can use the moon's polarization pattern even under crescent moon conditions. Finally, the authors show that the degree to which the ants change their homing direction is dependent on the length of their home vector, just as it is for the solar polarization pattern.

      The behavioral experiments are very well designed, and the statistical analyses are appropriate for the data presented. The authors' conclusions are nicely supported by the data and clearly show that nocturnal bull ants use the dim polarization pattern of the moon for homing, in the same way many animals use the sun's polarization pattern during the day. This is the first proof of the use of the lunar polarization pattern in any animal.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to understand whether polarised moonlight could be used as a directional cue for nocturnal animals homing at night, particularly at times of night when polarised light is not available from the sun. To do this, the authors used nocturnal ants, and previously established methods, to show that the walking paths of ants can be altered predictably when the angle of polarised moonlight illuminating them from above is turned by a known angle (here +/- 45 degrees).

      Strengths: 

      The behavioural data are very clear and unambiguous. The results clearly show that when the angle of downwelling polarised moonlight is turned, ants turn in the same direction. The data also clearly show that this result is maintained even for different phases (and intensities) of the moon, although during the waning cycle of the moon the ants' turn is considerably less than may be expected.

      Weaknesses: 

      The final section of the results - concerning the weighting of polarised light cues into the path integrator - lacks clarity and should be reworked and expanded in both the Methods and the Results (also possibly with an extra methods figure). I was really unsure of what these experiments were trying to show or what the meaning of the results actually is.

      We have rewritten these sections and added a methods panel to Figure 6.

      Impact: 

      The authors have discovered that nocturnal bull ants while homing back to their nest holes at night, are able to use the dim polarised light pattern formed around the moon for path integration. Even though similar methods have previously shown the ability of dung beetles to orient along straight trajectories for short distances using polarised moonlight, this is the first evidence of an animal that uses polarised moonlight in homing. This is quite significant, and their findings are well supported by their data.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript presents a series of experiments aimed at investigating orientation to polarized lunar skylight in a nocturnal ant, the first report of its kind that I am aware of.

      Strengths: 

      The study was conducted carefully and is clearly explained here. 

      Weaknesses: 

      I have only a few comments and suggestions, that I hope will make the manuscript clearer and easier to understand.

      Time compensation or periodic snapshots 

      In the introduction, the authors compare their discovery with that in dung beetles, which have only been observed to use lunar skylight to hold their course, not to travel to a specific location as the ants must. It is not entirely clear from the discussion whether the authors are suggesting that the ants navigate home by using a time-compensated lunar compass, or that they update their polarization compass with reference to other cues as the pattern of lunar skylight gradually shifts over the course of the night - though in the discussion they appear to lean towards the latter without addressing the former. Any clues in this direction might help us understand how ants adapted to navigate using solar skylight polarization might adapt use to lunar skylight polarization and account for its different schedule. I would guess that the waxing and waning moon data can be interpreted to this effect.

      Added a paragraph discussing this distinction in mechanisms and the limits of the current data set in untangling them. An interesting topic for a follow up to be sure.

      Effects of moon fullness and phase on precision 

      As well as the noted effect on shift magnitudes, the distributions of exit headings and reorientations also appear to differ in their precision (i.e., mean vector length) across moon phases, with somewhat shorter vectors for smaller fractions of the moon illuminated. Although these distributions are a composite of the two distributions of angles subtracted from one another to obtain these turn angles, the precision of the resulting distribution should be proportional to the original distributions. It would be interesting to know whether these differences result from poorer overall orientation precision, or more variability in reorientation, on quarter moon and crescent moon nights, and to what extent this might be attributed to sky brightness or degree of polarization.

      See below for our response to this and the next reviewer comment.

      N.B. The Watson-Williams tests for difference in mean angle are also sensitive to differences in sample variance. This can be ruled out with another variety of the test, also proposed by Watson and Williams, to check for unequal variances, for which the F statistic is F = [(n2-1)*(n1-R1)] / [(n1-1)*(n2-R2)] or its inverse, whichever is >1.

      We have looked at the amount of variance from the mean heading direction in terms of both the shifts and the reorientations and found no significant difference in variance between all relevant conditions. It is possible (and probably likely) that with a higher n we might find these differences but with the current data set we cannot make statistical statements regarding degradations in navigational precision.  

      As an additional analysis to address the Watson-Williams test's sensitivity to changes in variance, we have added var test comparisons for each pairing; this is a well-established test for comparing variances. None of these were significantly different, suggesting the observed differences in the WW tests are due to changes in the mean vector and not the distribution. We have added this test to the text.
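
      The variance check suggested by the reviewer can be sketched as follows. This is a minimal illustration of the F statistic quoted in the comment, not the analysis actually run for the manuscript; the function names and the example headings are hypothetical, and the approximation is usually recommended only for fairly concentrated samples.

```python
import math

def resultant_length(angles_deg):
    """Resultant vector length R (between 0 and n) for headings in degrees."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s)

def concentration_f(sample1, sample2):
    """F = [(n2-1)(n1-R1)] / [(n1-1)(n2-R2)], or its inverse, whichever is >= 1."""
    n1, n2 = len(sample1), len(sample2)
    r1, r2 = resultant_length(sample1), resultant_length(sample2)
    f = ((n2 - 1) * (n1 - r1)) / ((n1 - 1) * (n2 - r2))
    return f if f >= 1 else 1.0 / f

# Hypothetical headings (degrees): one tightly clustered, one more scattered.
tight = [0, 10, 350, 5, 355]
wide = [0, 40, 320, 20, 340]
print(concentration_f(tight, wide))
```

      The resulting F would then be compared against a critical value from the F distribution; that step is left out of the sketch.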

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I have only very few minor suggestions to improve the manuscript: 

      (1) While I fully agree with the authors that their study, to the best of my knowledge, provides the first proof (in any animal) of the use of the moon's polarization pattern, the many repetitions of this fact disturb the flow of the text and could be cut at several instances. 

      Yes, it is indeed repeated to an annoying degree. 

      We have removed these beyond bookending mentions (Abstract and Discussion).

      (2) In my opinion, the authors did not change the "ambient polarization pattern" when using the linear polarization filter (e.g., l. 55, 170, 177 ...). The linear polarizer presents an artificial polarization pattern with a much higher degree of polarization in comparison to the ambient polarization pattern. I would suggest re-phrasing this, to emphasize the artificial nature of the polarization pattern under the polarizer.

      We have made these suggested changes throughout the text to clarify. We no longer say the ambient pattern was changed.

      (3) Line 377: I do not see the link between the sentence and Figure 7 

      Changed where in the discussion we refer to Figure 7.

      (4) Figure 7 upper part: In my opinion, the upper part of Figure 7 does not add any additional value to the illustration of the data as compared to Figure 5 and could be cut.

      We thought it might be easier for some readers to see the shifts as a dial representation with the shift magnitude converted to 0-100% rather than the shifts in Figure 5. This makes it somewhat like a graphical abstract summarising the whole study.

      I agree that Figure 5 tells the same story but a reader that has little background in directional stats might find figure 7 more intuitive. This was the intent at least. 

      If it becomes a sticking point, then we can remove the upper portion.  
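
      For readers wondering how a shift maps onto the 0-100% dial, one plausible reading is sketched below. It assumes the percentage is the observed shift divided by the 45° rotation of the filter; the response does not state the formula, and the function name is hypothetical.

```python
FILTER_ROTATION_DEG = 45.0  # rotation of the polarizer's e-vector (assumed reference value)

def shift_as_percent(shift_deg, rotation_deg=FILTER_ROTATION_DEG):
    """Express an observed heading shift as a percentage of the filter rotation."""
    return 100.0 * abs(shift_deg) / rotation_deg

# A shift of half the rotation would sit at the midpoint of the dial.
print(shift_as_percent(22.5))
```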

      Reviewer #2 (Recommendations For The Authors): 

      Minor corrections and queries 

      Line 117: THE majority 

      Corrected

      Lines 129-130: Do you have a reference to support this statement? I am unaware of experiments that show that homing ants count their steps, but I could have missed it.

      We have added the references that unpack the ant pedometer.  

      Line 140: remove "the" in this line. 

      Removed

      Line 170: We need more details here about the spectral transmission properties of the polariser (and indeed which brand of filter, etc.). For instance, does it allow the transmission of UV light?

      Added

      Line 239: "...tested identicALLY to ...." 

      Corrected

      Lines 242-258 (Vector testing): I must admit I found the description of these experiments very difficult to follow. I read this section several times and felt no wiser as a result. I think some thought needs to be given to better introduce the reader to the rationale behind the experiment (e.g., start by expanding lines 243-246, and maybe add a methods figure that shows the different experimental procedures).

      I have rewritten this section of the methods to clearly state the experimental rationale and to be clearer as to the methodology.

      Also added a methods panel to Figure 6.

      Line 247: "reoriented only halfway". What does this mean? Do you mean with half the expected angle?

      Yes, this is a bit unclear. We have altered for clarity:

      ‘only altered their headings by about half of the 45° e-vector shift (25.2°± 3.7°), despite being tested on near-full-moon nights.’

      Results section (in general): In Figure 1 (which is a very nice figure!) you go to all the trouble of defining b degrees (exit headings) and c degrees (reorientation headings), which are very intuitive for interpreting the results, and then you totally abandon these convenient angles in favour of an amorphous Greek symbol Phi (Figs. 2-6) to describe BOTH exit and reorientation headings. Why?? It becomes even more confusing when headings described by Phi can be typically greater than 300 degrees in the figures, but they are never even close to this in the text (where you seem to have gone back to using the b degrees and c degrees angles, without explicitly saying so). Personally, I think the b degrees and c degrees angles are more intuitive (and should be used in both the text and the figures), but if you do insist on using Phi then you should use it consistently in both the text and the figures. 

      Replaced Phi with b° and c° for both figures and in the text.

      Finally, for reorientation angles in Figure 4A, you say that the angle is 16.5 degrees. This angle should have been 143.5 degrees to be consistent with other figures. 

      Yes, the reorientation was erroneously copied from the shift data (it is identical in both the +45 shift and reorientation for Figure 4A). This has now been corrected.

      Line 280, and many other lines: Wherever you refer to two panels of the same figure, they should be written as (say) Figure 2A, B not Figure 2AB.

      Changed as requested throughout the text.

      Line 295 (Waxing lunar phases): For these experiments, which nest are you using? 1 or 2?

      We have added that this is nest 1. 

      Figure 3B: The title of this panel should be "Waxing Crescent Moon" I think. 

      Ah yes, this is incorrect in the original submission. I have fixed this.

      Lines 312-313: Here it sounds as though the ants went right back to the full +/- 45 degrees orientations when they clearly didn't (it was -26.6 degrees and 189.9 degrees). Maybe tone the language down a bit here.

      Changed this to make clear the orientation shift is only ‘towards’ the ambient lunar e-vector.

      Line 327: Insert "see" before "Figure 5" 

      Added

      Line 329: See comment for Line 295. 

      We have added that this is nest 1. 

      Lines 357-373 (Vector testing): Again, because of the somewhat confusing methods section describing these experiments, these results were hard to follow, both here and in the Discussion. I don't really understand what you have shown here. Re-think how you present this (and maybe re-working the Methods will be half the battle won). 

      I have rewritten these sections to make clear that these are ants tested with different vector lengths (6 m vs. 2 m) at the same location. Hopefully this is much clearer, but if these portions remain confusing, a full rename of the conditions may be in order. Names such as 'long vector' and 'short vector' would help, but would not truly describe the purpose of the test, which is to control for location; hence the current condition names. As it stands, I hope the new clarifications adequately describe the reasoning while keeping the condition names. Of course, I am happy to make more changes here, as making this clear to readers is important for driving home that the path integrator is in play.

      See current change to results as an example: ‘Both foragers with a long ~6m remaining vector (Halfway Release), or a short ~2m remaining vector (Halfway Collection & Release), tested at the same location, exhibited significant shifts to the right of initial headings when the e-vector was rotated clockwise +45°.’

      Line 361: I think this should be 16.8 not 6.8 

      Yes, you are correct. Fixed in text (16.8).

      Line 365: I think this should be -12.7 not 12.7 

      Yes, you are correct. Fixed in text (–12.7).

      Line 408: "morning twilight". Should this be "morning solar twilight"? Plus "M midas" should be "M. midas"

      Added and fixed respectively.

      Line 440. "location" is spelt wrong. 

      Fixed spelling.

      Line 444: "...WITH longer accumulated vectors, ..." 

      Added ‘with’ to sentence. 

      Line 447: Remove "that just as"

      Removed.

      Line 448: "Moonlight polarised light" should be "Polarised moonlight" 

      Corrected.

      Lines 450-453: This sentence makes little sense scientifically or grammatically. A "limiting factor" can't be "accomplished". Please rephrase and explain in more detail.

      This sentence has been rephrased:

      ‘The limiting factors to lunar cue use for navigation would instead be the ant’s detection threshold to either absolute light intensity, polarization sensitivity and spectral sensitivity. Moonlight is less UV rich compared to direct sunlight and the spectrum changes across the lunar cycle (Palmer and Johnsen 2015).’

      Line 474: Re-write as "... due to the incorporation of the celestial compass into the path integrator..."

      Added.

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments 

      Line 84 I am not sure that we can infer attentional processes in orientation to lunar skylight, at least it has not yet been investigated.

      Yes, this is a good point. We have changed ‘attend’ to ‘use’.  

      Line 90 This description of polarized light is a little vague; what is meant by the phrase "waves which occur along a single plane"? (What about the magnetic component? These waves can be redirected, are they then still polarized? Circular polarization?). I would recommend looking at how polarized light is described in textbooks on optics.

      We have rewritten the polarised light section to be clearer using optics and light physics for background. 

      Line 92 The phrase "e-vector" has not been described or introduced up to this point.

      We now introduce e-vector and define it. 

      ‘Polarised light comprises light waves which occur along a single plane and are produced as a by-product of light passing through the upper atmosphere (Horváth & Varjú 2004; Horváth et al., 2014). The scattering of this light creates an e-vector pattern in the sky, which is arranged in concentric circles around the sun or moon's position with the maximum degree of polarisation located 90° from the source. Hence when the sun/moon is near the horizon, the pattern of polarised skylight is particularly simple with uniform direction of polarisation approximately parallel to the north-south axes (Dacke et al., 1999, 2003; Reid et al. 2011; Zeil et al., 2014).’

      Happy to make further changes as well.  

      Line 107 Diurnal dung beetles can also orient to lunar skylight if roused at night (Smolka et al., 2016), provided the sky is bright enough. Perhaps diurnal ants might do the same?

      Added the diurnal dung beetles mention as well as the reference.

      Also, a very good suggestion using diurnal bull ants.

      Line 146 Instead of lunar calendar the authors appear to mean "lunar cycle". 

      Changed

      Line 165 In Figure 1B, it looks like visual access to the sky was only partly "unobstructed". Indeed foliage covers at least part of the sky right up to the zenith.

      We have added that the sky is partially obstructed. 

      Line 179 This could also presumably be checked with a camera? 

      For this testing we tried to keep equipment to a minimum for a single researcher walking to and from the field site given the lack of public transport between 1 and 4am. But yes, for future work a camera based confirmation system would be easier. 

      Line 243 The abbreviation "PI" has not been described or introduced up to this point.

      Changed to ‘path integration derived vector lengths….’

      Line 267 The method for comparing the leftwards and rightwards shifts should be described in full here (presumably one set of shifts was mirrored onto the other?).

      We have added the below description to indicate the full description of the mirroring done to counterclockwise shifts.

      ‘To assess shift magnitude between −45° and +45° foragers within conditions, we calculated the mirror of shift in each −45° condition, allowing shift magnitude comparisons within each condition. Mirroring the −45° conditions was calculated by mirroring each shift across the 0° to 180° plane and was then compared to the corresponding unaltered +45 condition.’
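
      The mirroring described above can be sketched in a few lines. This is an illustration under the assumption that headings are expressed in degrees on a 0-360 scale with 0° as the unshifted direction; the function name and sample values are hypothetical, not taken from the manuscript.

```python
def mirror_across_0_180(angle_deg):
    """Reflect a heading across the 0°-180° axis, returning a value in [0, 360)."""
    return (-angle_deg) % 360.0

# Hypothetical counterclockwise shifts recorded in a -45° condition.
ccw_shifts = [330.0, 315.0, 340.0]
mirrored = [mirror_across_0_180(a) for a in ccw_shifts]
print(mirrored)  # values now lie on the clockwise side, comparable to the +45° condition
```

      After mirroring, the −45° shifts fall on the same side of 0° as the +45° shifts, so the two sets can be compared directly for magnitude.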

      Discussion Might the brightness and spectrum of lunar skylight also play a role here?

      We have added a section to the discussion to mention the aspects of moonlight which may be important to these animals, including the spectrum, brightness and polarisation intensity.  

      Line 451 The sensitivity threshold to absolute light intensity would not be the only limiting factor here. Polarization sensitivity and spectral sensitivity may also play a role (moonlight is less UV rich than sunlight and the spectrum of twilight changes across the lunar cycle: Palmer & Johnsen, 2015). 

      Added this clarification.

      Line 478 Instead of the "masculine ordinal" symbol used (U+006F) here a degree symbol (U+00B0) should be used.

      Ah thank you, we have replaced this everywhere in the text.  

      Line 485 It should be possible to calculate the misalignment between polarization pattern before and after this interruption of celestial cues. Does the magnitude of this misalignment help predict the size of the reorientation?

      Reorientations are highly correlated with the shift size under the filter, which makes sense as larger shifts mean that foragers need to turn back more to reorient to both the ambient pattern and to return to their visual route. Reorientation sizes do not show a consistent reduction compared to under-the-filter shifts when the lunar phase is low and is potentially harder to detect.

      I have reworked this line in the text, as I do not think there is much evidence for misalignment. It might be more precise to say that overnight periods where the moon is not visible may adversely impact the path integrator estimate, though the full impact of this celestial cue gap is currently unknown, as is whether other cues might also play a role.

      Line 642 "from their" should be "relative to" 

      Changed as requested

      Figure 1B Some mention should be made of the differences in vegetation density. 

      Added a sentence to the figure caption discussing the differences in both vegetation along the horizon and canopy cover.

      Figures 2-6 A reference line at 0 degrees change might help the reader to assess the size of orientation changes visually. Confidence intervals around the mean orientation change would also help here.

      We have now added circular grid lines and confidence intervals to the circular plots. These should help make the heading changes clear to readers.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment 

      This is a valuable study in the Jurkat T cell line that calls attention to phosphorylation of formin-like 1 β and its role in polarization of CD63 positive extracellular vesicles (referred to as exosomes). The evidence presented in the Jurkat model is solid, but concerns have been raised about the statistical analysis and more details would be required to fully assess the significance of the results. For example, ANOVA is the method described, but it requires large amounts of normally distributed data in multiple groups and cannot be used to make pairwise comparisons within groups, which would require a post-hoc method (which is not discussed). In addition, the data showing formin-like 1 β in primary human T cells without and with a CAR are provided without quantification and do not investigate any of the novel claims, so they do not address the relevance of formin-like 1 β beyond the Jurkat model. Nonetheless, the consistent trends in the body of the study provide solid support for the claims.

      We acknowledge this general statement on statistics. Thus, we have now discussed and provided more details on the post-hoc method (Tukey), as a new Supplementary data S13 (p-values after applying Tukey's post-hoc method to the one-way ANOVA for all the pairwise comparisons). Additionally, we have now provided quantitative data on the percentage of primary cells with and without CAR that show FMNL1 accumulations at the immune synapse (Suppl. Fig. S7). Regarding the data in primary human T cells, we have already changed the title of the manuscript to strictly adjust it to the main body of the data and our conclusions in the well-established Jurkat synapse model. We also want to emphasize that we have not intended to extrapolate the relevance of our data regarding FMNL1 and exosomes beyond the Jurkat model. Thus, we have included some additional sentences and/or nuances in the Discussion to somewhat soften our statements in this regard (i.e. “…provided that the FMNL1 effect on exosome secretion in Jurkat cells can be extended to primary T lymphocytes”) and to clarify this important point.

      Reviewer 1:

      (1) The main findings have been obtained in clones of Jurkat cells. They have not been confirmed in primary T cells. The only experiment performed in primary cells is shown in Figure S7 (primary human T lymphoblasts) for which only the distribution of FMNL1 is shown without quantification. No results presenting the effect of FMNL1 KO and expression of mutants in primary T cells are shown.

      The referee is right regarding the extension of exosome secretion studies to primary human T lymphocytes. Unfortunately, it is well known that primary T lymphocytes are extremely difficult to transfect. Moreover, the expression of our large bi-cistronic plasmids (>15 Kb) is very inefficient, coupled with the challenge of expressing large proteins, such as the 180 kDa YFP-FMNL1 chimeric variants. The convergence of all these undesirable factors synergistically hampers these studies and we have been unable to consistently achieve enough transfection efficiency to perform these experiments. However, the role of FMNL1 on MTOC/MVB polarization in Jurkat cells, confirmed in this manuscript, has been already extended to primary CD8+ T cell clones (DOI: 10.1016/j.immuni.2007.01.008). Given that exosome secretion requires MTOC/MVB polarization both in Jurkat and primary T lymphoblasts (10.1038/cdd.2010.184, 10.3389/fimmu.2019.00851), this suggests FMNL1 may also control exosome secretion in primary T cells, although the formal demonstration will require further research.

      A new sentence has been included in the Discussion to address this important point. Regarding the second request, we have quantified the images mentioned in Suppl. Fig. S7, and the percentages of fixed T cells showing FMNL1 accumulations at the immune synapse are included in the figure legend.

      (2) In-depth analysis of the defect in actin remodeling (quantification of the images, analysis of some key actors of actin remodeling) is still lacking. Only F-actin is shown; no attempt to look more precisely at actors of actin remodeling has been made.

      The referee is right. Since we have obtained new results on the role of FMNL1 on actin remodeling, we have focused on this formin, which is already a key actor in this process. In this context, we have previously shown that the formin Dia1, another major actor of actin remodeling in T lymphocytes along with FMNL1 (DOI: 10.1016/j.immuni.2007.01.008), does not undergo phosphorylation upon PKC activation (Suppl. Fig. 5 in https://doi.org/10.1080/20013078.2020.1759926). Since our aim was to unravel the PKC-mediated pathway controlling actin remodeling, we have ruled out more studies on Dia1. Therefore, we have included a new sentence to emphasize the specific role of FMNL1 phosphorylation, but not Dia1, in this regard. Nonetheless, future studies aimed at identifying new important players in this or related pathways could offer significant insights.

      (3) The defect in the secretion of extracellular vesicles is still very preliminary. Examples of STED images given by the authors are nice, yet no quantification is performed.

      The referee is right regarding this point and we acknowledge this comment. Accordingly, we have now quantified the STED images and provided numerical data on the percentages of cells exhibiting the observed phenotypes (see the figure legend for Fig. 10).

      (4) Results shown in Figure S12 on the colocalization of proteins phosphorylated on Ser/Thr are still not convincing. It seems indeed that "phospho-PKC" is labeling more preferentially the CMAC positive cells (Raji) than the Jurkat T cells. It is thus particularly difficult to conclude on the colocalization and even more on the recruitment of phosphorylated-FMNL1 at the IS. Thus, these experiments are not conclusive and cannot be the basis even for their cautious conclusion: "Although all these data did not allow us to infer that FMNL1b is phosphorylated at the IS due to the resolution limit of confocal and STED microscopes, the results are compatible with the idea that both endogenous FMNL1 and YFP-FMNL1bWT are specifically phosphorylated at the cIS".

      The referee may be correct regarding the detail of the "phospho-PKC" labeling. However, it cannot be overlooked that Raji cells also contain proteins that are or may be potential PKC substrates. As a matter of fact, Raji cells also express FMNL1. In addition, MHCII triggering in B cells induces PKC activation (https://doi.org/10.1002/eji.200323351). Regarding which cell type is preferentially labeled, this is a variable topic depending on the analyzed synapse. 

      It is true that there are likely several PKC substrates, both in Jurkat and in Raji cells, but our point is that one of these substrates either colocalizes with FMNL1 or is FMNL1 itself. We do not claim at any point that FMNL1 is the only PKC substrate, either in Jurkat or in Raji cells.

      Apparently, either the referee has overlooked our results or we did not emphasize them sufficiently. Our results effectively validated the PKC substrate antibody, both on endogenous phospho-FMNL1 and phospho-YFP-FMNL1β by WB (Fig. 3). Moreover, the phospho-PKC does not recognize YFP-FMNL1β S1086A or S1086D variants (Fig. 3). Last, but not least, when FMNL1 is interfered in the Jurkat cell, the phospho-PKC does not colocalize with FMNL1, but it strongly colocalizes at the synapse with expressed YFP-FMNL1βWT in the Jurkat cell (Fig. S11). Indeed YFP-FMNL1β belonged to the Jurkat cell. Taken together these results demonstrate: 1. the specificity of phospho-PKC antibody, 2. the phospho-PKC antibody certainly recognizes phosphorylated YFP-FMNL1β but not its non-phosphorylatable mutant variants, 3. the colocalization of phospho-PKC with anti-FMNL1 is specific. We have included some sentences to clarify these points and to avoid possible misunderstandings by potential readers. We thank the referee for this clarifying point, and we firmly believe our mentioned cautious conclusion is strictly correct, although we have tuned it to consider the possibility that a different PKC substrate could be closely associated to FMNL1, producing the observed colocalization: “Although all these data do not yet allow us to infer that FMNL1b is phosphorylated at the IS due to the resolution limits of super resolution microscopy and the possibility that another PKC substrate may be associated to FMNL1 or very close to FMNL1, in a strictly S1086-dependent manner”.

      To clear any doubt regarding which cell is labelled with phospho-PKC, we have changed the lower panels in Suppl. Fig. S12, and it is now more evident that FMNL1 and phospho-PKC belong to the Jurkat cell.

      The study would benefit from a more careful statistical analysis. The dot plots showing polarity are presented for one experiment. Yet, the distribution of the polarity is broad. Results of the 3 independent experiments should be shown and a statistical analysis performed on the independent experiments.

      The referee is right, and we have now included further post-hoc analysis data (Tukey’s test) in Suppl. Fig. S13. Tukey’s test values were included for all the dot plot figures. We have not included all the plots from the 3 different experiments since the manuscript already contains 10+12 multi-panel figures and is too large. However, we have stated in the figure legends that the experiments shown are representative of the data obtained from 3 independent experiments. The referee’s consideration regarding the broad distribution of polarity data is correct. We included in the first version of the manuscript a sentence in this regard that may have been overlooked: “Remarkably, one important feature of the IS consists of both the onset of the initial cell-cell contacts and the establishment of a mature, fully productive IS, are intrinsically stochastic, rapid and asynchronous processes (87, 88) (43). Thus, the score of the PI corresponding to the distance of MTOC/MVB with respect the IS (42) may be contaminated by background MTOC/MVB polarization, in great part due to the stochastic nature of IS formation (87)”.

    1. Author response:

      • The study does not clearly establish the relationship between Type 1 IFN and cancer therapy, and more robust data are needed to support the claim that tumor growth inhibition occurs via Type 1 IFN upregulation following ORMDL3 knockdown.

      We thank the reviewer for raising this concern. In Figure 6 we detected the expression of IFNB1 and ISGs in MC38 and LLC tumors upon ORMDL3 knockdown. At the same time, we also used IHC to explore the abundance of RIG-I and ORMDL3 in these tumors. In addition, in Figure S5 we performed western blots to detect the expression of RIG-I with or without ORMDL3 knockdown. All these results support our hypothesis that ORMDL3 is a negative regulator of interferon via modulating RIG-I abundance.

      • There is ambiguity regarding whether ORMDL3 has a positive or negative role in the Type 1 IFN pathway, especially given conflicting findings in the literature that link higher ORMDL3 levels to increased Type 1 IFN expression.

      We appreciate the reviewer’s concern. In our system and experiments, we validated that ORMDL3 is a negative regulator of interferon, although there is also literature that links higher ORMDL3 levels to an increased type-I IFN response. ORMDL3 has been reported to be associated with rhinovirus-induced childhood asthma (Nature. 2007;448(7152):470-473; N Engl J Med. 2013 Apr 11;368(15):1398-407), and the ORMDL3 level is positively associated with rhinovirus abundance (N Engl J Med. 2013 Apr 11;368(15):1398-407). There are reports indicating that ORMDL3 supports the replication of rhinovirus (for example, Am J Respir Cell Mol Biol. 2020 Jun;62(6):783-792). This phenomenon is consistent with our findings that higher ORMDL3 expression leads to lower interferon production, which facilitates viral replication. We believe that the differing conclusions reached in these studies are due to different experimental conditions and different stimuli. In our research, we provided comprehensive studies at the molecular, cellular, and animal levels to support the conclusion that ORMDL3 is a negative regulator of type-I interferon.

      • The use of certain experimental models, such as HEK293T cells (which are not typical Type 1 IFN producers), raises concerns about the validity and generalizability of the results. Further clarity is needed regarding the rationale for using the same tag in overexpression experiments.

      We thank the reviewer for this suggestion. Besides HEK293T, in Figure 1C and 1D we also overexpressed ORMDL3 in A549 cells and BMDMs and stimulated them with polyI:C or polyG:C. Our results showed that ORMDL3 especially inhibits RLR signaling. Additionally, in Figure 3H we found that endogenous RIG-I expression decreased when we overexpressed ORMDL3 in BMDMs. Regarding the issue of using the same protein tag, we plan to use different tags to validate our results.

      • The manuscript contains several inconsistencies and lacks detailed explanations of critical areas, such as the mechanism by which ORMDL3 facilitates USP10 transfer to RIG-I despite no direct interaction between ORMDL3 and RIG-I.

      Some ERMC (ER-mitochondria contact) proteins mediate the interaction between the ER and mitochondria. ORMDL3 localizes to the ER, and it has been reported to be associated with calcium transport. Meanwhile, calcium transfer between the ER and mitochondria plays an important role in protein synthesis. It is possible that some ERMC proteins mediate the interaction between ORMDL3 and MAVS. In addition, we also validated that ORMDL3 interacts with USP10 (Figure 5B). Although ORMDL3 and RIG-I do not interact directly, we propose a mechanistic model in which ORMDL3 and MAVS recruit USP10 and RIG-I to ERMCS, respectively, so that USP10 can form a complex with RIG-I (Figure 5C) and regulate the stability of RIG-I upon RNA sensing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Because tRNA-sequencing methods have not been widely used (compared to mRNA-seq), many readers would not be familiar with the characteristics of different methods introduced in this study (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA; bowtie2-based, SHRiMP, and mimseq; what are the main features of "Salmon?"). The manuscript will read better when the basic features of these methods are described in the manuscript, however brief.

      Introduction page 4 now clarifies the differences between bowtie2, SHRiMP and mimseq in a little more detail. Results page 9 briefly summarises the differences between the tRNA-Seq methods. Results page 14 clarifies how Decision and Salmon work.

      Reviewer 2:

      (1) The explanation of the parameter D for bowtie2 sounds ambiguous. "How much effort to expend" needs to be explained in more detail.

      Results page 6 gives a more precise explanation of the D parameter.

      (2) Please provide optimal parameters (L and D) for tRNA-seq alignment.

      I do not think it is possible to determine an optimal setting here. It will depend on the species, the frequency of misincorporations due to modifications (tRNA-Seq protocol specific) and how long one is willing to let bowtie2 continue searching for a better match. The point of Figure 1a is that D needs to be increased if L is decreased and an error is allowed in the seed. I think the sentence in the results section on Figure 1a is the appropriate way to express this without committing to a single ‘optimal’ parameterisation: ‘We observed that when an error in the seed is allowed, as the seed length is decreased, there needs to be a concomitant increase in effort expended to allow bowtie2 more opportunities to find the best possible alignment, especially with respect to the Transcript ID’.
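
      The trade-off described above can be sketched as a small Python helper that assembles a bowtie2 command line in which the seed length (`-L`), the number of mismatches allowed in the seed (`-N`) and the number of consecutive failed seed extensions tolerated before bowtie2 stops searching (`-D`) are set together. This is only an illustration: the index name, the file names and the values L=10 and D=100 are assumptions chosen for the example, not a recommended parameterisation.

```python
def bowtie2_command(index, reads, out_sam, seed_len, effort, seed_mismatches=1):
    """Build (but do not run) a bowtie2 command line as a list of arguments.

    seed_len        -> bowtie2 -L (seed length)
    effort          -> bowtie2 -D (failed seed extensions allowed in a row)
    seed_mismatches -> bowtie2 -N (mismatches allowed in the seed, 0 or 1)
    """
    if not 3 < seed_len < 32:
        raise ValueError("bowtie2 requires 3 < -L < 32")
    if seed_mismatches not in (0, 1):
        raise ValueError("bowtie2 accepts only -N 0 or -N 1")
    if effort < 1:
        raise ValueError("-D must be a positive integer")
    return [
        "bowtie2",
        "-x", index,        # hypothetical index basename
        "-U", reads,        # unpaired reads
        "-S", out_sam,      # SAM output
        "-N", str(seed_mismatches),
        "-L", str(seed_len),
        "-D", str(effort),
    ]


# Hypothetical index and file names, illustrative parameter values.
cmd = bowtie2_command("trna_index", "reads.fastq", "aligned.sam",
                      seed_len=10, effort=100)
print(" ".join(cmd))
```

      The resulting list could be passed to `subprocess.run` on a system where bowtie2 is installed. A shorter seed with one permitted mismatch produces many more candidate seed hits, which is why `-D` is raised alongside it.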

      (3) I think the authors chose L=10 and D=100 based on Figure 1A. Which dataset did you choose for this parameterization among ALL-tRNAseq, DM-tRNAseq, mim-tRNAseq, QuantM-tRNA-seq, and YAMAT-seq?

      Figure 1A is based on simulation of full-length reads with only sequencing errors, i.e. not from any tRNA-Seq method in particular. This is stated in the results text and I’ve clarified it in the figure legend.

      (4) Salmon does not need a read alignment process such as Bowtie2. Hence, it is not clear "Only results from alignment with bowtie2" in Figure legend for Figure 4a.

      I’m using Salmon in ‘alignment-mode’, taking the alignments from bowtie2. I’ve clarified this in results page 14.
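
      For readers unfamiliar with this mode, a minimal sketch of how such a call might be assembled is given below. In alignment-based mode, `salmon quant` is given the transcript sequences (`-t`) and an existing alignment file (`-a`) instead of a Salmon index. The file names and output directory are hypothetical, and the automatic library type (`-l A`) is an assumption made for the example.

```python
def salmon_alignment_mode_command(transcripts_fasta, alignments, out_dir,
                                  lib_type="A"):
    """Build (but do not run) a `salmon quant` call in alignment-based mode.

    No Salmon index (-i) is passed: the reads were already aligned to the
    transcript sequences by another aligner (here assumed to be bowtie2).
    """
    return [
        "salmon", "quant",
        "-t", transcripts_fasta,  # sequences the reads were aligned against
        "-l", lib_type,           # library type; "A" lets Salmon infer it
        "-a", alignments,         # existing alignments, e.g. from bowtie2
        "-o", out_dir,
    ]


# Hypothetical file names.
salmon_cmd = salmon_alignment_mode_command("trna_transcripts.fa",
                                           "aligned.bam", "salmon_out")
print(" ".join(salmon_cmd))
```

      Any conversion of the aligner output to the format Salmon expects is not shown here.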

    1. Author response:

      We thank both reviewers for their thorough and insightful feedback, which will contribute to improving our manuscript. In summary, the key concerns raised include the potential induction of GLV volatiles due to plant handling, limitations in the design of the "wind tunnel" bioassay, and the need for a deeper analysis of specific volatile compounds that contribute to the success of push-pull systems. We are happy to revise the entire manuscript according to all comments of the reviewers. This includes clarification of our methodology and providing a more reflective discussion on how physical stress might have influenced volatile emissions. Additionally, we will conduct new experiments with a modified bioassay setup to address concerns about directional cues and airflow control, minimizing cross-contamination. While the identification of individual compounds was beyond the scope of this study, we acknowledge its importance and propose it as a direction for future research.

      Reviewer #1 (Public review):

      Summary:

      The manuscript of Odermatt et al. investigates the volatiles released by two species of Desmodium plants and the response of herbivores to maize plants alone or in combination with these species. The results show that Desmodium releases volatiles in both the laboratory and the field. Maize grown in the laboratory also released volatiles, in a similar range. While female moths preferred to oviposit on maize, the authors found no evidence that Desmodium volatiles played a role in lowering attraction to or oviposition on maize.

      Strengths:

      The manuscript is a response to recently published papers that presented conflicting results with respect to whether Desmodium releases volatiles constitutively or in response to biotic stress, the level at which such volatiles are released, and the behavioral effect it has on the fall armyworm. These questions are relevant as Desmodium is used in a textbook example of pest-suppressive sustainable intercropping technology called push-pull, which has supported tens of thousands of smallholder farmers in suppressing moth pests in maize. A large number of research papers over more than two decades have implied that Desmodium suppresses herbivores in push-pull intercropping through the release of large amounts of volatiles that repel herbivores. This premise has been questioned in recent papers. Odermatt et al. thus contribute to this discussion by testing the role of odors in oviposition choice. The paper confirms that ovipositing FAW preferred maize, and also confirmed that odors released from Desmodium appeared not important in their bioassays.

      The paper is a welcome addition to the literature and adds quality headspace analyses of Desmodium from the laboratory and the field. Furthermore, the authors, some of whom have since long contributed to developing push-pull, also find that Desmodium odors are not significant in their choice between maize plants. This advances our knowledge of the mechanisms through which push-pull suppresses herbivores, which is critically important to evolving the technique to fit different farming systems and translating this mechanism to fit with other crops and in other geographical areas.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Below I outline the major concerns:

      (1) Clear induction of the experimental plants, and lack of reflective discussion around this: from literature data and previous studies of maize and Desmodium, it is clear that the plants used in this study, particularly the Desmodium, were induced. Maize appeared to be primarily manually damaged, possibly due to sampling (release of GLV, but little to no terpenoids, which is indicative of mostly physical stress and damage, for example, one of the coauthor's own paper Tamiru et al. 2011), whereas Desmodium releases a blend of many compounds (many terpenoids indicative of herbivore induction). Erdei et al. also clearly show that under controlled conditions maize, silver leaf and green leaf Desmodium release volatiles in very low amounts. While the condition of the plants in Odermatt et al. may be reflective of situations in push-pull fields, the authors should elaborate on the above in the discussion (see comments) such that the readers understand that the plant's condition during the experiments. This is particularly important because it has been assumed that Desmodium releases typical herbivore-induced volatiles constitutively, which is not the case (see Erdei et al. 2024). This reflection is currently lacking in the manuscript.

      We acknowledge the need for a more reflective discussion on the possible causes of GLV (green leaf volatiles) emission, particularly regarding physical damage. Although the field plants were carefully handled, it is possible that some physical stress may have contributed to the release of GLVs. We will ensure the revised manuscript reflects this nuanced interpretation. However, we will also explain more clearly that our aim was to capture the volatile emission of plants used by farmers under realistic conditions and moth responses to these plants, not to be able to attribute the volatile emission to a specific cause. We think that this is also clear in the manuscript. However, we plan to revise relevant passages throughout the manuscript to ensure that we do not make any claims about the reason for volatile emissions, and that our claims regarding these plants and their headspace being representative of the system as practiced by farmers are supported. In the revised manuscript we will explain better that the volatile profiles comprise a majority of non-GLV compounds. As shown in figure 1, the majority of the substances that were found in the headspace of the sampled plants of Desmodium intortum or Desmodium incanum are non-GLV monoterpenes, sesquiterpenes, or aromatic compounds. We will also note that the experimental plants used in the study were grown in insect proof screenhouses and were checked for any insect damage before volatile collection and bioassay.

      (2) Lack of controls that would have provided context to the data: The experiments lack important controls that would have helped in the interpretation:

      (2a) The authors did not control the conditions of the plants. To understand the release of volatiles and their importance in the field, the authors should have included controlled herbivory in both maize and Desmodium. This would have placed the current volatile profiles in a herbivory context. Now the volatile measurements hang in midair, leading to discussions that are not well anchored (and should be rephrased thoroughly, see eg lines 183-188). It is well known that maize releases only very low levels of volatiles without abiotic and biotic stressors. However, this changes upon stress (GLVs by direct, physical damage and eg terpenoids upon herbivory, see above). Erdei et al. confirm this pattern in Desmodium. Not having these controls, means that the authors need to put the data in the context of what has been published (see above).

      We appreciate this concern. Our study aimed to capture the real-world conditions of push-pull fields, where Desmodium and maize grow in natural environments without the direct induction of herbivory for experimental purposes. We will update the discussion to provide better context based on existing literature regarding the volatile release under stress conditions. We agree that in further studies it would be important to carry out experiments under different environmental conditions, including herbivore damage. However, this was not within the scope of the present study.

      (2b) It would also have been better if the authors had sampled maize from the field while sampling Desmodium. Together with the above point (inclusion of herbivore-induced maize and Desmodium), the levels of volatile release by Desmodium would have been placed into context.

      We acknowledge that sampling maize and other intercrop plants, such as edible legumes, alongside Desmodium in the push-pull field would have allowed us to make direct comparisons of the volatile profiles of different plants in the push-pull system under shared field conditions. Again, this should be done in future experiments but was beyond the scope of the present study. Given the number of samples we could handle within our cost and workload constraints, we chose to focus on Desmodium because there is much less literature on the volatile profiles of field-grown Desmodium than of maize plants in the field: we are aware of one study attempting to measure field volatile profiles from Desmodium intortum (Erdei et al. 2024) and no study attempting this for Desmodium incanum. We will point out this justification for our focus on Desmodium in the manuscript. Additionally, we will suggest in the discussion that future studies should measure volatile profiles from maize and intercrop legumes alongside Desmodium and border grass in push-pull fields.

      (2c) To put the volatiles release in the context of push-pull, it would have been important to sample other plants which are frequently used as intercrop by smallholder farmers, but which are not considered effective as push crops, particularly edible legumes. Sampling the headspace of these plants, both 'clean' and herbivore-induced, would have provided a context to the volatiles that Desmodium (induced) releases in the field - one would expect unsuccessful push crops to not release any of these 'bioactive' volatiles (although 'bioactive' should be avoided) if these odors are responsible for the pest suppressive effect of Desmodium. Many edible intercrops have been tested to increase the adoption of push-pull technology but with little success.

      Again, we very much agree that such measurements are important for the longer-term research program in this field. But again, for the current study this would have exploded the size of the required experiment. Regarding bioactivity, we have been careful to use the phrase "potentially bioactive", or to cite other studies showing bioactivity, where we have not demonstrated bioactivity ourselves.

      Because of the lack of the above, the conclusions the authors can draw from their data are weakened. The data are still valuable in the current discussion around push-pull, provided that a proper context is given in the discussion along the points above.

      We agree that our study is limited to its specific aims. The revisions will make these aims more explicit and help to avoid misleading claims.

      (3) 'Tendency' of the authors to accept the odor hypothesis (i.e. that Desmodium odors are responsible for repelling FAW and thereby reduce infestation in maize under push-pull management) in spite of their own data: The authors tested the effects of odor in oviposition choice, both in a cage assay and in a 'wind tunnel'. From the cage experiments, it is clear that FAW preferred maize over Desmodium, confirming other reports (including Erdei et al. 2024). However, when choosing between two maize plants, one of which was placed next to Desmodium to which FAW has no tactile (taste, structure, etc), FAW chose equally. Similarly in their wind tunnel setup (this term should not be used to describe the assay, see below), no preference was found either between maize odor in the presence or absence of Desmodium. This too confirms results obtained by Erdei et al. (but add an important element to it by using Desmodium plants that had been induced and released volatiles, contrary to Erdei et al. 2024). Even though no support was found for repellency by Desmodium odors, the authors in many instances in the manuscript (lines 30-33, 164-169, 202, 279, 284, 304-307, 311-312, 320) appear to elevate non-significant tendencies as being important. This is misleading readers into thinking that these interactions were significant and in fact confirming this in the discussion. The authors should stay true to their own data obtained when testing the hypothesis of whether odors play a role in the pest-suppressive effect of push-pull.

      We appreciate this feedback and agree that we may have overstated claims that could not be supported by strict significance tests. However, we believe that non-significant tendencies can still provide valuable insights. In the revised version of the manuscript, we will ensure a clear distinction between statistically significant findings and non-significant trends and remove any language that may imply stronger support for the odor hypothesis than what the data show.

      (4) Oviposition bioassay: with so many assays in close proximity, it is hard to certify that the experiments are independent. Please discuss this in the appropriate place in the discussion.

      We have pointed this out in lines 275–279 of the submitted manuscript. Furthermore, we include detailed captions to figure 4 - supporting figure 3 & figure 4 - supporting figure 4. We are aware that in all such experiments there is a danger of between-treatment interference, which we will point out for our specific case. We will also mention that this common caveat does not invalidate experimental designs that use replication and randomization, and that insects can be assumed to select suitable oviposition sites against a background of such confounding factors under realistic conditions. We will also mention explicitly that with our experimental setup we tried to minimize interference between treatments by spacing and temporal staggering.

      (5) The wind tunnel has a number of issues (besides being poorly detailed):

      (5a) The setup which the authors refer to as a 'wind tunnel' does not qualify as a wind tunnel. First, there is no directional flow: there are two flows entering the setup at opposite sides. Second, the flow is way too low for moths to orient in (in a wind tunnel wind should be presented as a directional cue. Only around 1.5 l/min enters the wind tunnel in a volume of 90 l approximately, which does not create any directional flow. Solution: change 'wind tunnel' throughout the text to a dual choice setup /assay.)

      We agree with these criticisms and will change the terminology accordingly. We also plan to conduct an additional experiment with a no-choice arena that provides conditions closer to a true wind tunnel. The setup of the added experiment features an odor entry point at only one side of the chamber to create a more directional airflow. Each treatment (maize alone, maize + D. intortum, maize + D. incanum, and a control with no plants) will be tested separately, with only one treatment conducted per evening to avoid cross-contamination.

      (5b) There is no control over the flows in the flight section of the setup. It is very well possible that moths at the release point may only sense one of the 'options'. Please discuss this.

      We will add this to the discussion. The newly planned assays also address this concern by using a setup with laminar flow.

      (5c) Too low a flow (1.5 l per minute) implies largely stagnant air, which means cross-contamination between experiments. An experiment takes 5 minutes, but it takes minimally 1.5 hours at these flows to replace the flight chamber air (but in reality much longer as the fresh air does not replace the old air, but mixes with it). The setup does not seem to be equipped with e.g. fans to quickly vent the air out of the setup. See comments in the text. Please discuss the limitations of the experimental setup at the appropriate place in the discussion.

      We will add these limitations to the discussion and will address these concerns with new experiments (see answer 5a).
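
      The air-exchange concern raised in (5c) can be illustrated with a rough calculation. The sketch below assumes an idealised, perfectly mixed chamber in which old air is diluted exponentially, and it takes the reviewer's approximate figures (a chamber of about 90 l and an inflow of about 1.5 l/min) as inputs. Whether 1.5 l/min is the total inflow or the flow per inlet is not stated, so the flow is left as a parameter. None of these values are measurements from the study.

```python
import math


def nominal_turnover_minutes(volume_l, flow_l_per_min):
    """Time for the inflow to deliver one chamber volume of fresh air."""
    return volume_l / flow_l_per_min


def well_mixed_minutes_to_replace(volume_l, flow_l_per_min, fraction):
    """Time until `fraction` of the original air has been flushed out,
    assuming perfect mixing: remaining(t) = exp(-flow * t / volume)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return -(volume_l / flow_l_per_min) * math.log(1.0 - fraction)


# Approximate figures quoted in the review; treated here as assumptions.
CHAMBER_L = 90.0
INFLOW_L_PER_MIN = 1.5

one_volume = nominal_turnover_minutes(CHAMBER_L, INFLOW_L_PER_MIN)
flush_95 = well_mixed_minutes_to_replace(CHAMBER_L, INFLOW_L_PER_MIN, 0.95)
print(f"one nominal volume: {one_volume:.0f} min")
print(f"95% flushed (well mixed): {flush_95:.0f} min")
```

      Under perfect mixing, flushing most of the old air takes several times longer than delivering one nominal chamber volume, which is the reason a 5-minute trial at this flow cannot be assumed to start with clean air.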

      (5d) The stimulus air enters through a tube (what type of tube, diameter, length, etc) containing pressurized air (how was the air obtained into bags (type of bag, how is it sealed?), and the efflux directly into the flight chamber (how, nozzle?). However, it seems that there is no control of the efflux. How was leakage prevented, particularly how the bags were airtight sealed around the plants? 

      We will add the missing information to the methods and provide details about types of bags, manufacturers, and pre-treatments. In short, Teflon tubes connected bagged plants to the bioassay setup and air was pumped in at an overpressure, so leakage was not eliminated but contamination from ambient air was avoided.

      (5e) The plants were bagged in very narrowly fitting bags. The maize plants look bent and damaged, which probably explains the GLVs found in the samples. The Desmodium in the picture (Figure 5 supplement), which we should assume is at least a representative picture?) appears to be rather crammed into the bag with maize and looks in rather poor condition to start with (perhaps also indicating why they release these volatiles?). It would be good to describe the sampling of the plants in detail and explain that the way they were handled may have caused the release of GLVs.

      We will include a more detailed description of the plant handling and bagging processes in the methods to clarify how the plants were treated during all assays reported in the submitted manuscript and the newly planned assays. This will address concerns about the possible influence of plant stress, such as GLV emission due to bagging, on the results. We respectfully disagree that the maize plants were damaged and that the Desmodium plants were not representative of those encountered in the field. The Desmodium plant pictured was D. incanum, which has sparser foliage and smaller leaves than D. intortum.

      (6) Figure 1 seems redundant as a main figure in the text. Much of the information is not pertinent to the paper. It can be used in a review on the topic. Or perhaps if the authors strongly wish to keep it, it could be placed in the supplemental material.

      We think that Figure 1 provides essential information about the push-pull system and the FAW. To our knowledge, this partly contradictory evidence has not so far been synthesized in the literature. We realize that such a figure would more commonly be provided in a review article, but we do not think that the small number of studies on this topic so far justifies a stand-alone review. Instead, the introduction to our manuscript includes a brief review of these few studies, complemented by the visual summary provided in Figure 1 and a detailed supplementary table. We will revise the figure and associated text in the introduction to highlight its relevance for the current study and to reduce redundant information.

      Reviewer #2 (Public review):

      Based on the controversy of whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of the volatiles from Desmodium plants in the push-pull system on behavior of FAW oviposition. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, the serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for improvement of this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moth were confirmed. However, it would be better and useful to identify the specific compounds that are crucial for the success of the push-pull system.

      We fully agree that identifying specific volatile compounds responsible for the push-pull effect would provide valuable insights into the underlying mechanisms of the system. However, the primary focus of this study was to address the still unresolved question whether Desmodium emits volatiles at all under field conditions, and the secondary aim was to test whether we could demonstrate a behavioral effect of Desmodium headspace on FAW moths. Before conducting our experiments, we carefully considered the option of using single volatile compounds and synthetic blends in bioassays. We decided against this because we judged that the contradictory evidence in the literature was not a sufficient basis for composing representative blends. Furthermore, we think it is an important first step to test for behavioral responses to the headspaces of real plants. We consider bioassays with pure compounds to be important for confirmation and more detailed investigation in future studies. There was also contradictory evidence in the literature regarding moth responses to plants. We thus opted to focus on experiments with whole plants to maintain ecological relevance.

      (2) That would be good to add "symbols" of significance in Figure 4 (D).

      We report the statistical significance of the parameters in Figure 4 (D) in Table 3. While testing significance between groups is a standard approach, we used a more robust model-based analysis to assess the effects of multiple factors simultaneously. We will clarify this in the figure legend and provide a cross-reference to Table 3 for readers to easily find the statistical details.

      (3) Figure A is difficult for readers to understand.

      Unfortunately, it is not entirely clear which specific figure is being referred to as "Figure A" in this comment. We kindly request further clarification on which figure needs improvement, and we will make adjustments accordingly to ensure that all figures are easily comprehensible for readers.

      (4) It will be good to deeply discuss the functions of important volatile compounds identified here with comparison with results in previous studies in the discussion better.

      Our study does not provide strong evidence that specific volatiles from Desmodium plants are important determinants of FAW oviposition or choice in the push-pull system. Therefore, we prefer to refrain from detailed discussions of the potential importance of individual compounds. However, in the revised version, we will indicate specifically which of the volatiles we identified overlap with those previously reported from Desmodium, as only the total numbers are summarized in the discussion of the submitted paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Removing claims of causality: To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly:

      "Electrophysiological dynamics of salience, default mode, and frontoparietal networks during episodic memory formation and recall: A multi-experiment iEEG replication".

      Control analyses directly comparing AI and IFG: As per the reviewer’s suggestion, we have carried out additional control analyses by directly comparing the net inward/outward balance between the AI and the IFG. Our analysis revealed that the net outflow for the AI is significantly higher compared to the IFG during both encoding and recall phases, a pattern that was replicated across all four experiments. 

      These findings further highlight the unique role of the AI as a key hub in coordinating network interactions during episodic memory formation and retrieval, distinguishing it from a key anatomically adjacent prefrontal region implicated in cognitive control.

      We have incorporated these results into the manuscript (see new Figure S6 and updated Results section). 
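
      To make the quantity being compared concrete, the sketch below shows one way a net outflow could be computed from pairwise directed-influence values: the sum of a region's outgoing values minus the sum of its incoming values. This definition, the function name, the target labels and all numbers are illustrative assumptions and are not taken from the manuscript or its data.

```python
def net_outflow(directed, node):
    """Outgoing minus incoming directed influence for `node`.

    `directed` maps (source, target) pairs to a non-negative value such as
    a phase transfer entropy estimate.
    """
    outgoing = sum(v for (src, dst), v in directed.items() if src == node)
    incoming = sum(v for (src, dst), v in directed.items() if dst == node)
    return outgoing - incoming


# Toy values for illustration only; not results from the study.
toy = {
    ("AI", "DMN"): 0.50, ("DMN", "AI"): 0.25,
    ("AI", "FPN"): 0.50, ("FPN", "AI"): 0.25,
    ("IFG", "DMN"): 0.25, ("DMN", "IFG"): 0.25,
    ("IFG", "FPN"): 0.25, ("FPN", "IFG"): 0.25,
}

ai_net = net_outflow(toy, "AI")
ifg_net = net_outflow(toy, "IFG")
print(ai_net, ifg_net)
```

      A positive value marks a region that sends more than it receives. In an actual analysis, one such value per electrode pair or participant would then enter the statistical comparison between regions.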

      Control analyses directly comparing task with resting state: As per the reviewer’s suggestion, we compared the AI's net outflow during task periods to resting state, finding significantly higher outflow during both encoding and recall across all experiments (ps < 0.05). These results provide further evidence for enhanced net directed information flow from the AI to the DMN and FPN during memory processing compared to the resting state.

      We have incorporated these results into the manuscript (see new Figure S9 and updated Results section). 

      Control analysis using every region of the brain outside the considered networks: We appreciate the reviewer's suggestion to conduct additional control analyses. However, we have concerns about implementing this approach for several reasons:

      (1) Hypothesis-driven research: Our study was designed based on a strong hypothesis derived from prior fMRI studies, which have consistently shown that the salience network (SN), anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the default mode network (DMN) and frontoparietal network (FPN) across diverse cognitive tasks.

      (2) Risk of p-hacking: Running analyses on a large number of brain regions outside our networks of interest without a priori hypotheses could lead to p-hacking, a practice strongly criticized in the scientific community, including by eLife editors (Makin & Orban de Xivry, 2019). Such an approach could potentially yield spurious results and undermine the validity of our findings.

      (3) Principled control region selection: Our choice of the inferior frontal gyrus (IFG) as a control region was hypothesis-driven, based on its (a) anatomical adjacency to the AI, (b) involvement in cognitive control functions, including response inhibition, and (c) frequent coactivation with the AI in fMRI studies.

      (4) Robustness of current findings: Our PTE analysis involving the IFG, along with the additional control analyses requested by the reviewer (comparing the task-related net balance of the AI with the IFG and with resting state, see response to reviewer comment 2.1), strongly support a key role for the AI in orchestrating large-scale network dynamics during memory processes.

      (5) Specificity of findings: The contrast between AI and IFG results demonstrates that our observed patterns are not general to all task-active regions but are specific to the AI's role in network coordination. 
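
      The multiple-comparisons concern in point (2) can be illustrated with a short calculation. This is a minimal sketch, assuming independent and uncorrected tests at a hypothetical threshold of alpha = 0.05; the function name and the region counts are illustrative and are not taken from the manuscript.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent, uncorrected tests, each run at level alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Hypothetical numbers of additional control regions.
    for n_regions in (1, 5, 20, 50):
        rate = familywise_error_rate(0.05, n_regions)
        print(f"{n_regions:>3} regions: P(at least one false positive) = {rate:.3f}")
```

      Under these assumptions the chance of a spurious finding grows quickly with the number of regions tested, which is why exploratory comparisons of this kind are usually paired with an explicit correction for multiple comparisons.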

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results. 

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      These revisions, combined with our rigorous methodologies and comprehensive analyses, provide compelling support for the central claims of our manuscript. We believe these changes significantly enhance the scientific contribution of our work.

      Our point-by-point responses to the reviewers' comments are provided below.

      Reviewer 1:

      (1.1) Because phase-transfer entropy is referenced as a "causal" analysis in this investigation (PTE), I believe it is important to highlight for readers recent discussions surrounding the description of "causal mechanisms" in neuroscience (see "Confusion about causation" section from Ross and Bassett, 2024, Nature Neuroscience). A large proportion of neuroscientists (myself included) use "causal" only to refer to a mechanism whose modulation or removal (with direct manipulation, such as by lesion or stimulation) is known to change or control a given outcome (such as a successful behavior). As Ross and Bassett highlight, it is debatable whether such mechanistic causality is captured by Granger "causality" (a.k.a. Granger prediction) or the parametric PTE, and imprecise use of "causation" may be confusing. The authors have defined in the revised Introduction what their definition of "causality" is within the context of this investigation. 

      We appreciate the reviewer's feedback in terms of the terminology used in our manuscript. To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly. 

      Reviewer 2:

      (2.1) Clarifying the new control analyses. The authors have been responsive to our feedback and implemented several new analyses. The use of a pre-task baseline period and a control brain region (IFG) definitively help to contextualize their results, and the findings shown in the revision do suggest that (1) relative to a pre-task baseline, directed interactions from the AI are stronger and (2) relative to a nearby region, the IFG, the AI exhibits greater outward-directed influence. 

      However, it is difficult to draw strong quantitative conclusions from the analyses as presented, because they do not directly statistically contrast the effect in question (directed interactions with the FPN and DMN) between two conditions (e.g. during baseline vs. during memory encoding/retrieval). As I understand it, in their main figures the authors ask, "Is there statistically greater influence from the AI to the DMN/FPN in one direction versus another?" And in the AI they show greater "outward" PTE than "inward" PTE from other networks during encoding/retrieval. The balance of directed information favors an outward influence from the AI to DMN/FPN. 

      But in their new analyses, they simply show that the degree of "outward" PTE is greater during task relative to baseline in (almost) all tasks. I believe a more appropriately matched analysis would be to quantify the inward/outward balance during task states, quantify the inward/outward balance during rest states, and then directly statistically compare the two. It could be that the relative balance of directed information flow is nonsignificantly changed between task and rest states, which would be important to know. 

      We thank the reviewer for this suggestion. We have now run an additional analysis directly comparing the inward/outward balance during the task versus the rest states. To quantify this balance, we calculated the net outflow as the difference between the total outgoing information and the total incoming information (PTE(out)–PTE(in)). This analysis revealed that net outflow during task periods is significantly higher compared to rest, during both encoding and recall, and across the four experiments (ps < 0.05). These results provide further evidence that the AI's net directed information flow to the DMN and FPN is enhanced during memory processing compared to the resting state. These new results have now been included in the revised manuscript (page 12).
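
      The net-outflow comparison described above can be sketched as follows. This is an illustrative example only: the PTE values are made up, the unit of observation (for example, an electrode pair or a participant) is an assumption, and the sign-flip permutation test is a stand-in chosen for the sketch, not necessarily the statistical test used in the manuscript.

```python
import random
from statistics import mean


def net_outflow(pte_out, pte_in):
    """Net outflow per observation: PTE(out) - PTE(in)."""
    if len(pte_out) != len(pte_in):
        raise ValueError("pte_out and pte_in must have the same length")
    return [o - i for o, i in zip(pte_out, pte_in)]


def paired_sign_flip_test(a, b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on the paired differences a - b.

    Returns the observed mean difference and a permutation p-value.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical PTE values for eight observations (not real data).
    task_out = [1.30, 1.25, 1.40, 1.35, 1.28, 1.33, 1.38, 1.31]
    task_in = [1.10, 1.12, 1.15, 1.11, 1.09, 1.14, 1.13, 1.10]
    rest_out = [1.15, 1.14, 1.18, 1.16, 1.12, 1.17, 1.19, 1.13]
    rest_in = [1.12, 1.13, 1.16, 1.12, 1.10, 1.15, 1.14, 1.11]

    task_net = net_outflow(task_out, task_in)
    rest_net = net_outflow(rest_out, rest_in)
    diff, p_value = paired_sign_flip_test(task_net, rest_net)
    print(f"mean task-minus-rest net outflow: {diff:.3f}, p = {p_value:.4f}")
```

      The same two functions would apply to a comparison between two regions (for instance, net outflow of the AI versus the IFG), with the paired inputs swapped accordingly.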

      Likewise, a similar principle applies to their IFG analysis. They show that the IFG tends to have an "inward" balance of influence from the DMN/FPN (the opposite of the AIs effect), but this does not directly answer whether the AI occupies a statistically unique position in terms of the magnitude of its influence on other regions. More appropriate, as I suggest above, would be to quantify the relative balance inward/outward influence, both for the IFG and the AI, and then directly compare those two quantities. (Given the inversion of the direction of effect, this is likely to be a significant result, but I think it deserves a careful approach regardless.) 

      We appreciate the reviewer's suggestion and have directly compared the net inward/outward balance between the AI and the IFG. Specifically, we compared the net outflow (PTE(out)–PTE(in)) for the AI with that for the IFG. This analysis revealed that the net outflow for the AI is significantly higher compared to the IFG during both encoding and recall, and across the four experiments. These findings further highlight a key role for the AI in orchestrating large-scale network dynamics during memory processes. The AI's pattern of directed information flow stands in contrast to that of the IFG, despite their anatomical proximity and shared involvement in cognitive control processes. This dissociation underscores the specificity of the AI's function in coordinating network interactions during memory formation and retrieval. These new results have now been included in our revised manuscript (page 11).

      (2.2) Consider additional control regions. The authors justify their choice of IFG as a control region very well. In my original comments, I perhaps should have been more clear that the most compelling control analyses here would be to subject every region of the brain outside these networks (with good coverage) to the same analysis, quantify the degree of inward/outward balance, and then see how the magnitude of the AI effect stacks up against all possible other options. If the assertion is that the AI plays a uniquely important role in these memory processes, showing how its influence stacks up against all possible "competitors" would be a very compelling demonstration of their argument. 

      We thank the reviewer for this suggestion. However, please note that running a large number of analyses across a large number of brain regions (every region of the brain outside these networks) and comparing their dynamics to the AI without a hypothesis or solid principle amounts to p-hacking, which has previously been strongly criticized by the eLife editors (Makin & Orban de Xivry, 2019). Our study was driven by a solid hypothesis based on prior fMRI studies that have shown that the SN, anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the DMN and FPN across diverse cognitive tasks (Bressler & Menon, 2010; Cai et al., 2016; Cai, Ryali, Pasumarthy, Talasila, & Menon, 2021; Chen, Cai, Ryali, Supekar, & Menon, 2016; Kronemer et al., 2022; Raichle et al., 2001; Seeley et al., 2007; Sridharan, Levitin, & Menon, 2008). Moreover, our selection of the IFG as a control region for comparison was also strongly hypothesis-driven, due to its anatomical adjacency to the AI, its involvement in a wide range of cognitive control functions including response inhibition (Cai, Ryali, Chen, Li, & Menon, 2014), and its frequent co-activation with the AI in fMRI studies. Furthermore, the IFG has been associated with controlled retrieval of memory (Badre, Poldrack, Paré-Blagoev, Insler, & Wagner, 2005; Badre & Wagner, 2007; Wagner, Paré-Blagoev, Clark, & Poldrack, 2001), making it a compelling region for comparison. Our findings from the PTE analysis involving the IFG, together with the additional control analyses requested by the reviewer (directly comparing the task-related net balance of the AI with the IFG and with resting state; please see response to reviewer comment 2.1), strongly highlight a key role of the AI in orchestrating large-scale network dynamics during memory processes.

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results.

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      (2.3) Reporting of successful vs. unsuccessful memory results. I apologize if I was not clear in my original comment (2.7, pg. 13 of the response document) regarding successful vs. unsuccessful memory. The fact that no significant difference was found in PTE between successful/unsuccessful memory is a very important finding that adds valuable context to the rest of the manuscript. I believe it deserves a figure, at least in the Supplement, so that readers can visualize the extent of the effect in successful/unsuccessful trials. This is especially important now that the manuscript has been reframed to focus more directly on claims regarding episodic memory processing; if that is indeed the focus, and their central analysis does not show a significant effect conditionalized on the success of memory encoding/retrieval, it is important that readers can see these data directly.

      As per the reviewer’s suggestion, we have now included figures showing the results of the successful versus unsuccessful memory comparison in the Supplementary Materials of the revised manuscript (Figures S10, S11).

      (2.4) Claims regarding causal relationships in the brain. I understand that the authors have defined "causal" in a specific way in the context of their manuscript; I do believe that as a matter of clear and transparent scientific communication, the authors nonetheless bear a responsibility to appreciate how this word may be erroneously interpreted/overinterpreted and I would urge further review of the manuscript to tone down claims of causality. Reflective of this, I was very surprised that even as both reviewers remarked on the need to use the word "causal" with extreme caution, the authors added it to the title in their revised manuscript.

      We thank the reviewer for this suggestion. To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly. 

      References 

      Badre, D., Poldrack, R. A., Paré-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47(6), 907-918. doi:10.1016/j.neuron.2005.07.023

      Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45(13), 2883-2901. doi:10.1016/j.neuropsychologia.2007.06.015

      Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: emerging methods and principles. Trends in Cognitive Sciences, 14(6), 277-290. doi:10.1016/j.tics.2010.04.004

      Cai, W., Chen, T., Ryali, S., Kochalka, J., Li, C. S., & Menon, V. (2016). Causal Interactions Within a Frontal-Cingulate-Parietal Network During Cognitive Control: Convergent Evidence from a Multisite-Multitask Investigation. Cereb Cortex, 26(5), 2140-2153. doi:10.1093/cercor/bhv046

      Cai, W., Ryali, S., Chen, T., Li, C. S., & Menon, V. (2014). Dissociable roles of right inferior frontal cortex and anterior insula in inhibitory control: evidence from intrinsic and task-related functional parcellation, connectivity, and response profile analyses across multiple datasets. J Neurosci, 34(44), 14652-14667. doi:10.1523/jneurosci.3048-14.2014

      Cai, W., Ryali, S., Pasumarthy, R., Talasila, V., & Menon, V. (2021). Dynamic causal brain circuits during working memory and their functional controllability. Nat Commun, 12(1), 3314. doi:10.1038/s41467-021-23509-x

      Chen, T., Cai, W., Ryali, S., Supekar, K., & Menon, V. (2016). Distinct Global Brain Dynamics and Spatiotemporal Organization of the Salience Network. PLOS Biology, 14(6), e1002469. doi:10.1371/journal.pbio.1002469

      Kronemer, S. I., Aksen, M., Ding, J. Z., Ryu, J. H., Xin, Q., Ding, Z., . . . Blumenfeld, H. (2022). Human visual consciousness involves large scale cortical and subcortical networks independent of task report and eye movement activity. Nat Commun, 13(1), 7342. doi:10.1038/s41467-022-35117-4

      Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife, 8. doi:10.7554/eLife.48175

      Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proc Natl Acad Sci U S A, 98(2), 676-682. doi:10.1073/pnas.98.2.676

      Seeley, W. W., Menon, V., Schatzberg, A. F., Keller, J., Glover, G. H., Kenna, H., . . . Greicius, M. D. (2007). Dissociable Intrinsic Connectivity Networks for Salience Processing and Executive Control. Journal of Neuroscience, 27(9), 2349-2356. doi:10.1523/JNEUROSCI.5587-06.2007

      Sridharan, D., Levitin, D. J., & Menon, V. (2008). A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proceedings of the National Academy of Sciences, 105(34), 12569-12574. doi:10.1073/pnas.0800005105

      Wagner, A. D., Paré-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning: left prefrontal cortex guides controlled semantic retrieval. Neuron, 31(2), 329-338. doi:10.1016/s0896-6273(01)00359-2

    1. Author response:

      Reviewer #1 (Public review): 

      The authors survey the ultrastructural organization of glutamatergic synapses by cryo-ET and image processing tools using two complementary experimental approaches. The first approach employs so-called "ultra-fresh" preparations of brain homogenates from a knock-in mouse expressing a GFP-tagged version of PSD-95, allowing Peukes and colleagues to specifically target excitatory glutamatergic synapses. In the second approach, direct in-tissue (using cortical and hippocampal regions) targeting of the glutamatergic synapses employing the same mouse model is presented. In order to ascertain whether the isolation procedure causes any significant changes in the ultrastructural organization (and possibly synaptic macromolecular organization) the authors compare their findings using both of these approaches. The quantitation of the synaptic cleft height reveals an unexpected variability, while the STA analysis of the ionotropic receptors provides insights into their distribution with respect to the synaptic cleft.

      The main novelty of this study lies in the continuous claims by the authors that the sample preservation methods developed here are superior to any others previously used. This leads them as well to systematically downplay or directly ignore a substantial body of previous cryo-ET studies of synaptic structure. Without comparisons with the cryo-ET literature, it is very hard to judge the impact of this work in the field. Furthermore, the data does not show any better preservation in the so-called "ultra-fresh" preparation than in the literature, perhaps to the contrary as synapses with strangely elongated vesicles are often seen. Such synapses have been regularly discarded for further analysis in previous synaptosome studies (e.g. Martinez-Sanchez 2021). Whilst the targeting approach using a fluorescent PSD95 marker is novel and seems sufficiently precise, the authors use a somewhat outdated approach (cryo-sectioning) to generate in-tissue tomograms of poor quality. To what extent such tomograms can be interpreted in molecular terms is highly questionable. The authors also don't discuss the physiological influence of 20% dextran used for high-pressure freezing of these "very native" specimens.

      Lastly, a large part of the paper is devoted to image analysis of the PSD which is not convincing (including a somewhat forced comparison with the fixed and heavy-metal staining room temperature approach). Despite being a technically challenging study, the results fall short of expectations. 

      Our manuscript contains a discussion of both conventional EM and cryoET of synapses. We apologise if we have omitted referencing or discussing any earlier cryoET work. This was certainly not our intention, and we include a more complete discussion of published cryoET work on synapses in our revised manuscript.

      The reviewer is concerned that the synaptic vesicles in some synapse tomograms are “stretched” and that this may reflect poor preservation.  We would like to point out that such non-spherical synaptic vesicles have also been previously reported in cryoET of primary neurons grown on EM grids (Tao et al., J. Neuro, 2018). Indeed, there is no reason per se to suppose synaptic vesicles are always spherical and there are many diverse families of proteins expressed at the synapse that shape membrane curvature (BAR domain proteins, synaptotagmin, epsins, endophilins and others). We will add further discussion of this issue in the revised manuscript.

      The reviewer regards ‘cryo-sectioning’ as outdated and cryoET data from these preparations as “poor quality”. We respectfully disagree. Preparing brain tissues for cryoET is generally considered to be challenging. The first successful demonstration of preparing such samples predates the cryoEM resolution revolution (with electron counting detectors): Zuber et al. (Proc. Natl. Acad. Sci., 2005) prepared cryo-sections/CEMOVIS of in vitro brain cultures. We followed this technique to prepare tissue cryo-sections for cryoET in our manuscript. CryoFIB-SEM liftout has since been developed as an alternative method to prepare tissue samples for cryoET (Mahamid et al., J. Struct. Biol., 2015), and only more recently has this method become available to more laboratories. Both techniques introduce damage, as has been described (Han et al., J. Microsc., 2008; Lucas et al., Proc. Natl. Acad. Sci., 2023). Importantly, no like-for-like, quantitative comparison of these two methodologies has yet been performed. We have recently demonstrated that the molecular structure of amyloid fibrils within human brain is preserved down to the protein fold level in samples prepared by cryo-sectioning (Gilbert et al., Nature, 2024). We will add further detail on the process by which we excluded poor quality tomograms from our analysis, which we described in detail in our methods section.

      The reviewer asks what the physiological effect is of adding 20% w/v ~40,000 Da dextran. This is a reasonable concern, since this could in principle exert osmotic pressure on the tissue sample. While we did not investigate this ourselves, earlier studies have (Zuber et al., 2005), showing that this concentration of dextran did not damage cell membranes and had no detectable effect on cell structure.

      The reviewer is not convinced by our analysis of the apparent molecular density of macromolecules in the postsynaptic compartment that in conventional EM is called the postsynaptic density. However, the reviewer provides no reasoning for this assessment nor alternative approaches that could be attempted. We would like to add that we have tested multiple different approaches to objectively measure molecular crowding in cryoET data, that give comparable results. We believe that our conclusion – that we do not observe an increased molecular density conserved at the postsynaptic membrane, and that the PSD that we and others observed by conventional EM does not correspond to a region of increased molecular density - is well supported by our data.  We and the other reviewers consider this an important and novel observation.

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to visualize the molecular architecture of the adult forebrain glutamatergic synapses in a near-native state. To this end, they use a rapid workflow to extract and plunge-freeze mouse synapses for cryo-electron tomography. In addition, the authors use knock-in mice expressing PSD95-GFP in order to perform correlated light and electron microscopy to clearly identify pre- and postsynaptic membranes. By thorough quantification of tomograms from plunge- and high-pressure frozen samples, the authors show that the previously reported 'post-synaptic density' does not occur at high frequency and is therefore not a defining feature of a glutamatergic synapse.

      Subsequently, the authors are able to reproduce the frequency of post-synaptic density when preparing conventional electron microscopy samples, thus indicating that density prevalence is an artifact of sample preparation. The authors go on to describe the arrangement of cytoskeletal components, membraneous compartments, and ionotropic receptor clusters across synapses.

      Demonstrating that the frequency of the post-synaptic density in prior work is likely an artifact and not a defining feature of glutamatergic synapses is significant. The descriptions of distributions and morphologies of proteins and membranes in this work may serve as a basis for the future of investigation for readers interested in these features.

      Strengths: 

      The authors perform a rigorous quantification of the molecular density profiles across synapses to determine the frequency of the post-synaptic density. They prepare samples using two cryogenic electron microscopy sample preparation methods, as well as one set of samples using conventional electron microscopy methods. The authors can reproduce previous reports of the frequency of the post-synaptic density by conventional sample preparation, but not by either of the cryogenic methods, thus strongly supporting their claim. 

      We thank the reviewer for their generous assessment of our manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      The authors use cryo-electron tomography to thoroughly investigate the complexity of purified, excitatory synapses. They make several major interesting discoveries: polyhedral vesicles that have not been observed before in neurons; analysis of the intermembrane distance, and a link to potentiation, essentially updating distances reported from plastic-embedded specimen; and find that the postsynaptic density does not appear as a dense accumulation of proteins in all vitrified samples (less than half), a feature which served as a hallmark feature to identify excitatory plastic-embedded synapses. 

      Strengths: 

      (1) The presented work is thorough: the authors compare purified, endogenously labeled synapses to wild-type synapses to exclude artifacts that could arise through the homogenization step, and, in addition, analyse plastic-embedded, stained synapses prepared using the same quick workflow, to ensure their findings have not been caused by way of purification of the synapses. Interestingly, the 'thick lines of PSD' are evident in most of their stained synapses.

      (2) I commend the authors on the exceptional technical achievement of preparing frozen specimens from a mouse within two minutes.

      (3) The approaches highlighted here can be used in other fields studying cell-cell junctions.

      (4) The tomograms will be deposited upon publication which will enable neurobiologists and researchers from other fields to carry on data evaluation in their field of expertise since tomography is still a specialized skill and they collected and reconstructed over 100 excellent tomograms of synapses, which generates a wealth of information to be also used in future studies.

      (5) The authors have identified ionotropic receptor positions and that they are linked to actin filaments, and appear to be associated with membrane and other cytosolic scaffolds, which is highly exciting.

      (6) The authors achieved their aims to study neuronal excitatory synapses in great detail, were thorough in their experiments, and made multiple fascinating discoveries. They challenge dogmas that have been in place for decades and highlight the benefit of implementing and developing new methods to carefully understand the underlying molecular machines of synapses.

      Weaknesses: 

      The authors show informative segmentations in their figures but none have been overlaid with any of the tomograms in the submitted videos. It would be helpful for data evaluation by a broad audience to be able to view these together as videos to study these tomograms and extract more information. Deposition of segmentations associated with the tomograms would be tremendously helpful to neurobiologists, cryo-ET method developers, and others to push the boundaries.

      Impact on community: 

      The findings presented by Peukes et al. pertaining to synapse biology change dogmas about the fundamental understanding of synaptic ultrastructure. The work presented by the authors, particularly the associated change of intermembrane distance with potentiation and the distinct appearance of the PSD as an irregular amorphous 'cloud' will provide food for thought and an incentive for more analysis and additional studies, as will the discovery of large membranous and cytosolic protein complexes linked to ionotropic receptors within and outside of the synaptic cleft, which are ripe for investigation. The findings and tomograms available will carry far in the synapse fields and the approach and methods will move other fields outside of neurobiology forward. The method and impactful results of preparing cryogenic, unlabelled, unstained, near-native synapses may enable the study of how synapses function at high resolution in the future.

      We thank the reviewer for their supportive assessment of our manuscript.  We thank the reviewer for suggesting overlaying segmentations with videos of the raw tomographic volumes. We will include this in our revised manuscript.

    1. Author response:

      Response to Reviewer #1:

      “Claiming a possible therapeutic role for this gene is a bit far-fetched at the present state of the art”.

      We agree that while the therapeutic relevance of Svep1 is not clear at this point, this potential is always something we consider in interpreting our data.

      Response to Reviewer #2:

      a. “The weakness of this paper is that it does not present a convincing explanation for how Svep1 regulates any of the phenotypes described. In this regard, a demonstration of a genetic interaction between Svep1 and FGF9 mutants or a careful characterization of a tissue-specific knockdown of Svep1, could be insightful. In addition, a comparison of the phenotype of Svep1 mutants and the phenotypes of other mutants affecting ECM components would be worthwhile”. 

      We agree that additional experiments are needed to determine how exactly Svep1 contributes to the phenotypes described. While our preliminary data point to an interaction of Svep1 and Fgf9, we agree that additional data are needed to prove that such interaction is a primary driver of the phenotypes observed.

      b. “A minor weakness is that the title of the paper is not fully supported by the data presented. While the defects in the morphogenesis of the distal lung in Svep1 mutants presage a defect in alveolar differentiation, this cannot be formally demonstrated since the animals die soon after birth”

      The reviewer is correct that we cannot formally demonstrate this in the current model. The profound defects observed in Svep1 mutants lead to early death, making it challenging to study the full process of alveolarization. However, it is important to note that lung morphogenesis is a continuous process in which earlier developmental phases lay the groundwork for subsequent stages. During the branching phase, the fate of alveolar cell types is established, while the saccular stage serves as a critical foundation for alveolar development, where alveolar cells begin to differentiate. We believe that the significant abnormalities in cellular differentiation observed prior to the bulk of alveolarization indicate likely defects in the later stages of alveolar differentiation. Therefore, while the model limits our ability to directly assess alveolarization, we anticipate that defects in cellular differentiation will continue to manifest beyond the saccular stage in Svep1 mice.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Zhang et al. analyzed 17 specimens of Cindarella eucalla with 3D technology and discussed the anatomical findings, the relationship to other artiopods, and the ecology of the animal. The results are excellent and the findings are very interesting. However, the discussion needs to be extended, as the point the authors are trying to make is not always clear. I also recommend some restructuring of the discussion. Overall this is an important manuscript, and I'm looking forward to reading the edited version.

      Strengths:

      The analyses and the 3D data are excellent and provide new information.

      Weaknesses:

      The discussion - the authors provide information for the findings, but do not discuss them in detail. More information is needed.

      We are committed to enhancing the quality of our manuscript further and, in response to your comments, will implement the following improvements:

      (1) Comparative Analysis of Eyes: We will expand our discussion to include a detailed comparative analysis of the eyes of Cindarella eucalla with those of other artiopods (e.g. Xandarellids, trilobites, living insects), focusing on morphology, size, and other relevant characteristics.

      (2) Segmental Mismatch Discussion: We will provide an in-depth exploration of the specifics and significance of the segmental mismatch to offer a clearer understanding of its implications. We will also compare the characteristics of this mismatch in our focal species with those observed in extant arthropods, such as spiders and myriapods. This comparison will be further enriched by integrating our phylogenetic analysis, thereby providing a broader evolutionary context.

      (3) Methodological Clarity: We will provide more detailed information on the parameters used for the analyses in the Methods section, especially the phylogenetic sections and the X-ray tomography section.

      (4) Phylogenetic Analysis: We will engage in a more in-depth discussion of certain characters (e.g. anterior sclerite, hypostome, endopodite, segmental mismatch, etc.) within our phylogenetic analyses to clarify their relevance and contribution to our findings.

      Reviewer #2 (Public review):

      Summary:

      Zhang et al. present very well-illustrated specimens of the artiopodan Cindarella eucalla from the Chengjiang Biota. Multiple specimens are shown with preserved appendages, which is rare for artiopodans and will greatly help our understanding of this taxon. The authors use CT scanning to reveal the ventral organization of this taxon. The description of the taxon needs some modification, specifically expansion of the gut and limb morphology. The conclusion that Cindarella was a fast-moving animal is very weak; comparisons with extant fast animals and possibly FEA analyses are necessary to support such a claim. Although the potential insights provided by such well-preserved fossils could be valuable, the claims made are tenuous given the available evidence presented herein.

      Strengths:

      The images produced through CT scanning specimens reveal the very fine detail of the appendages and are well illustrated. Specimens preserve guts and limbs, which are informative both for the phylogenetic position and ecology of this taxon. The limbs are very well preserved, with protopodite, exopodite, and endopodites visible. Addressing the weaknesses below will make the most of this compelling data that demonstrates the morphology of the limbs well.

      Weaknesses:

      Although this paper includes very well-illustrated fossils, including new information on the eyes, guts, and limbs of Cindarella, the data are not fully explained, and the conclusions are weakly supported.

      The authors suggest the preservation of complex ramifying diverticula, but these should be better illustrated and the discussion of the gut diverticula should be longer, especially as gut morphology can provide insights into the feeding strategy.

      The conclusion that Cindarella eucalla was fast, sediment feeding in a muddy environment, is not well supported. These claims seem to be tenuously made without any evidence to support them. The authors should add a new section in the discussion focused on feeding ecology where they explicitly compare the morphology to suspension-feeding artiopodans to justify whether it fed that way or not. To further explore feeding, the protopodite morphology needs to be more carefully described and compared to other known taxa. The function of endites on the endopodite to stir up sediment for particle feeding in a muddy environment would also need to be more thoroughly discussed and compared with modern analogs. The impact of their findings is not highlighted in the discussion, which is currently more of a review of what has been previously said and should focus more on what insights are provided by the great fossils illustrated by the authors.

      The authors argue that their data support fast escaping capabilities, but it is not clear how they reached that conclusion based on the data available. Is there a way this can be further evaluated? The data are impressive, so including comparisons with extant taxa that display fast escaping strategies would help the authors make their case more compelling. The authors also claim that the limbs of Cindarella are strong, but again the basis for this conclusion is unclear. Comparison with the limbs of other taxa to show their robustness would be useful. To actually test how these limbs deal with the force and strain applied to them by a sudden burst of movement, the authors could conduct Finite Element Analyses.

      Here are the key points we plan to address:

      (1) Gut and Limb Morphology: We will expand our description of the gut and limb morphology of C. eucalla, providing a more detailed comparison and analysis. This will include a revised discussion on the function and ecological implications of these features.

      (2) Fast-Moving Animal Claim: We acknowledge your concern about the conclusion that C. eucalla was a fast-moving animal. We will conduct a more detailed comparison among C. eucalla and other Cambrian artiopods and living arthropods, focusing on morphological and functional aspects. We will also reconsider our claim and will be more cautious in our conclusions. If the comparison proves insufficient, we will remove this assertion from the manuscript. However, we may not conduct Finite Element Analysis, as a comprehensive and cautious analysis would require a massive project to complete.

      (3) Sediment Feeding in a Muddy Environment: We will revise the section discussing the feeding ecology of C. eucalla. We will enhance this section by comparing the morphology of C. eucalla to that of suspension-feeding artiopods, which will help to substantiate our claims. Additionally, we will expand the discussion to include a more detailed examination of endites, gnathobases, and other relevant anatomical structures.

      (4) Impact of Findings: We will endeavor to highlight the impact of our findings in the discussion, focusing on the insights provided by the well-preserved fossils illustrated in our study.

      Reviewer #3 (Public review):

      This paper provides an interesting description of the ventral parts of the Cambrian xandarellid Cindarella eucalla, derived from exceptionally preserved specimens of the Chengjiang Biota. These morphological data are useful for our broad understanding and future research on Xandarellida, and are generally well-represented in the description and accompanying figures. The strengths of this work rest in this morphological description of exceptional fossil material, and this is generally well supported. In addition, the authors put this description in the context of the morphology of other xandarellids and Cambrian arthropod groups, with most of these parallels being useful and reasonably supported, though in several places homology is assumed and this currently lacks evidence. The manuscript goes on to use these morphological data and comparisons to other groups (particularly trilobites) to make suggestions for the ecology of Cindarella eucalla and other xandarellids. The majority of my comments on this work relate to this latter aim - the ecological conclusions drawn are generally derived through morphological comparisons, where a specific morphology has been suggested as an adaption to a particular ecological function in another extinct arthropod group. However, the original suggestions for ecological function are untested, and so remain hypotheses. Despite this, they are frequently presented as truisms to enable ecological conclusions to be drawn for Cindarella eucalla. I have listed my comments and queries on the study below for the authors to address or respond to, and I hope they are useful to the authors.

      Comments:

      There are a number of ecological and functional morphology conclusions stated that seem put too strongly to be considered sufficiently supported by the evidence given. These relate to both the description of C. eucalla, and comparisons to other extinct arthropod taxa (notably trilobites). Many of these latter statements are assumptions of functional morphology and should not be repeated as truisms; rather, they represent suggested functions and ecologies based on the known morphological descriptions. This aspect occurs throughout the article, and, for me, is the primary concern.

      We plan to address the following points upon revision:

      (1) Homology Assumptions: You pointed out that we have assumed homology in certain instances without sufficient evidence. We will revise the manuscript to include a more detailed analysis of the anterior sclerite and exite, considering phylogenetic relationships and morphological comparisons to provide a more robust discussion.

      (2) Ecological and Functional Morphology: We acknowledge that our conclusions regarding the ecological function were presented with too much certainty. We will adopt a more cautious approach in our discussion, ensuring that our ideas are clearly labeled as such and are supported by a comparison of relevant studies on Cambrian artiopods and extant arthropods, including fluid dynamics, functional morphology, etc. We will re-evaluate the ecological function section, and if it does not add value and clarity to the manuscript (that is, if our speculations do not contribute to the understanding of the specimen or may lead to misunderstandings), we will remove the relevant parts. We believe these changes will reflect a more cautious and rigorous approach to the ecological and functional interpretations of C. eucalla.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers and editor for their positive assessment of our work. For the Version of Record, we have made small revisions addressing the remaining concerns of reviewer #3. We have also reformatted the supplementary material to conform to eLife’s style.

      While the manuscript was under review, we discussed our work with Bill Bialek, who suggested clarifying the effect of cell rearrangements on genetic patterns. Using the tracked cell trajectories we found that the highly coordinated intercalations in the germ band preserve the relative AP positions of cells. We have added an Appendix subsection (Appendix 1.5) explaining this finding and highlighting its relevance in a short paragraph added to the discussion.

      Reviewer #2

      Main comment from 1st review:

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      Comments on the revised version:

      My main concern was that the authors did not use the analysis of mutant contexts such as Snail and Twist to confirm their predictions. They made a series of modifications, clarifying their conclusions. In particular, they have now included an analysis of a Snail mutant and show that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished, supporting the idea that isogonal strain could be used as an indicator of external forces (Fig7 and S6).

      They further discuss their results in the context of what was published regarding the mutant backgrounds (fog, torso-like, scab, corkscrew, ksr) where midgut invagination is disrupted and where the germ band buckles, and propose that this supports the importance of internal versus external forces driving GBE.

      Overall, these modifications, in addition to clarifications in the text, clearly strengthen the manuscript.

      We thank the reviewer for assessing our manuscript again and are happy to hear that they find the added data on the snail mutant convincing and that our revised manuscript is stronger.

      Reviewer #3

      In their article "The Geometric Basis of Epithelial Convergent Extension", Brauns and colleagues present a physical analysis of Drosophila axis extension that couples in toto imaging of cell contours (previously published dataset), force inference, and theory. They seek to disentangle the respective contributions of active vs passive T1 transitions in the convergent extension of the lateral ectoderm (or germband) of the fly embryo.

      The revision made by the authors has greatly improved their work, which was already very interesting, in particular the use of force inference throughout intercalation events to identify geometric signatures of active vs passive T1s, and the tension/isogonal decomposition. The new analysis of the Snail mutant adds a lot to the paper and makes their findings on the criteria for T1s very convincing.

      Regarding the tissue-scale issues raised during the first round of review: although I do not find the new arguments fully convincing (see below), the authors did put a lot of effort into discussing the role of the adjacent posterior midgut (PMG) in extension, which is already great. That will certainly provide interested readers with enough material and references to dive into that question.

      We appreciate the referee’s positive assessment of our manuscript and their careful reading and constructive feedback. In particular, we are happy to hear that the referee finds our added data on the snail mutant very convincing and finds that the extended discussion on the role of the PMG is helpful. We address the remaining concerns in our detailed response below.

      I still have some issues with the authors' interpretation on the role of the PMG, and on what actually drives the extension. Although it is clear that T1 events in the germ band are driven by active local tension anisotropy (which the authors show but was already well-established), it does not show that the tissue extension itself is powered by these active T1s. Their analysis of "fence" movies from Collinet et al 2015 (Tor mutants and Eve RNAi) is not fully convincing. Indeed, as the authors point out themselves, there is no flow in Tor mutant embryos, even though tension anisotropy is preserved. They argue that in Tor embryos the absence of PMG movement leaves no room for the germband to extend properly, thus impeding the flow. That suggests that the PMG acts as a barrier in Tor mutants - What is it attached to, then?

      We thank the referee for pointing out this omission: The PMG is attached to the vitelline membrane in the scab domain (Munster et al. Nature 2019) and is also obstructed from moving by more anterior-lying tissue (amnioserosa). It therefore acts as an obstacle for germ band extension if it fails to invaginate (e.g. in a Tor embryo). We have clarified this in the discussion of the Tor mutants.

      The authors also argue that the posterior flow is reduced in "fenced" Eve RNAi embryos (which have less/no tension anisotropy), to justify their claim that it is the anisotropy that drives extension. However, previous data, including some of the authors' (Irvine and Wieschaus, 1994 - Fig 8), show that the first, rapid phase of germband extension is left completely unaffected in Eve mutants (that lack active tension anisotropy). Although intercalation in Eve mutants is not quantified in that reference, this was later done by others, showing that it is strongly reduced.

      The quantification of GBE in Irvine and Wieschaus 1994 was based on the position of the PMG from bright field imaging, making it hard to distinguish the contributions of ventral furrow, PMG, and germ band, particularly during the early phase of GBE where all these processes happen simultaneously. More detailed quantifications based on PIV analysis of in toto light-sheet imaging show significantly reduced tissue flow in eve mutants after the completion of ventral furrow invagination (Lefebvre et al., eLife 2023). That the initial fast flow is driven by ventral furrow invagination, not by the PMG is apparent from twist/snail embryos where the initial phase is significantly slower (Lefebvre et al., eLife 2023, Gustafson et al., Nat Comms 2022). We have added these references to the re-analysis and discussion of the Collinet et al 2015 experiments.

      Similarly, the Cyto-D phenotype from Clement et al 2017, in which intercalation is also strongly reduced, also displays normal extension.

      We agree that a careful quantification of tissue flow in Cyto-D-treated embryos would be interesting. Whether they show normal extension is not clear from the Clement et al. 2017 paper, as no quantification of total tissue flow is performed and no statements regarding extension are made there.

      Reviewer #3 (Recommendations For The Authors):

      • A lot of typos / grammar mistakes / repetitions are still found here and there in the paper. Authors should plan a careful re-reading prior to final publication.

      We have carefully checked the manuscript and fixed the typos and grammar mistakes.

      • I failed to point to a very relevant reference in the previous round of review, which I think the authors should cite and comment on: a review by Guirao & Bellaiche on the mechanics of intercalation in the fly germband, which notably discusses the passive/active and stress-relaxing/stress-generating nature of T1s (Guirao and Bellaiche, Current Opinion in Cell Biology 2017), in particular figures 1 and 2.

      We thank the referee for pointing us to this relevant reference which we now cite in the introduction.

      • Any new arguments/discussion the authors see fit to include in the paper to comment on the Eve/Tor phenotypes. As far as I am concerned, I am not fully convinced at the moment (see review), but I think the paper has other great qualities and findings, and now (since the first round of review) sufficiently discusses that particular matter. I leave it up to the authors how much (more) they want to delve into this in their final version!

      We have added clarifications and references to the discussion of the Eve/Tor phenotypes.


      The following is the authors’ response to the original reviews.

      Public Review:

      Joint Public Review:

      Summary:

      Brauns et al. work to decipher the respective roles of active versus passive contributions to cell shape changes during germ band elongation. Using a novel quantification tool of local tension, their results suggest that epithelial convergent extension results from internal forces.

      Reading this summary, and the eLife assessment, we realized that we failed to clearly communicate important aspects of our findings in the first version of our manuscript. We therefore decided to largely restructure and rewrite the abstract and introduction to emphasize that:

      ● Our analysis method identifies active vs passive contributions to cell and tissue shape changes during epithelial convergent extension

      ● In the context of Drosophila germ band extension, this analysis provides evidence for a major role for internal driving forces rather than external pulling force from neighboring tissue regions (posterior midgut), thus settling a question that has been debated due to apparently conflicting evidence from different experiments.

      ● Our findings have important implications for local, bottom-up self-organization vs top-down genetic control of tissue behaviors during morphogenesis.

      Strengths:

      The approach developed here, tension isogonal decomposition, is original and the authors made the demonstration that we can extract comprehensive data on tissue mechanics from this type of analysis.

      They present an elegant diagram that quantifies how active and passive forces interact to drive cell intercalations.

      The model qualitatively recapitulates the features of passive and active intercalation for a T1 event.

      Regions of high isogonal strains are consistent with the proximity of known active regions.

      We think this statement is somewhat ambiguous and does not summarize our findings precisely. A more precise statement would be that high isogonal strain identifies regions of passive deformation, which is caused by adjacent active regions.

      They define a parameter (the LTC parameter) which encompasses the geometry of the tension triangles and allows the authors to define a criterion for T1s to occur.

      The data are clearly presented, going from cellular scale to tissue scale, and integrating modeling approaches to complement the thoughtful description of tension patterns.

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      We fully agree that a full tissue scale model is crucial to support the claims about tissue scale self-organization we make in the discussion. However, the full analysis of such a model is beyond the scope of the present manuscript. We have therefore split off that analysis into a companion manuscript (Claussen et al. 2023). In this paper, we show that the key results of the tissue-scale analysis of the Drosophila embryo, in particular the order-to-disorder transition associated with slowdown of tissue flow, are reproduced and rationalized by our model.

      We now refer more closely to this companion paper to point the reader to the results presented there.

      Major points:

      (1) The authors mention that from their analysis, they can predict what is the tension threshold required for intercalations in different conditions and predict that in Snail and Twist mutants the T1 tension threshold would be around √2. Since movies of these mutants are most probably available, it would be nice to confirm these predictions.

      This is an excellent suggestion. We have included an analysis of a recording of a Snail mutant, which is presented in the new Figures 4 and S6. As predicted, we find that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished. Further, in the absence of isogonal deformation, T1 transitions indeed occur at a critical tension of approx. √2, as predicted by our model. Both of these results provide important experimental evidence for our model and for isogonal strain as a reliable indicator of external forces.
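
      The critical value of approx. √2 can be checked with a short geometric sketch. This is an illustration only, not the authors' code, and it rests on two assumptions: (i) a symmetric quartet in which the four outer interfaces all carry tension 1, and (ii) in the absence of isogonal deformation, the cell vertices sit at the circumcenters of the two tension triangles, so that the central interface length equals the distance between those circumcenters. All function names are hypothetical.

```python
import math


def circumcenter(a, b, c):
    """Circumcenter of the triangle with 2D vertices a, b, c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy


def central_interface_length(t_central, t_neighbor=1.0):
    """Signed length of the central interface for a symmetric quartet.

    The two tension triangles form a kite: they share an edge of length
    t_central (placed on the y-axis) and have four outer edges of length
    t_neighbor. Under assumption (ii), the interface connects the two
    circumcenters; a negative value means the interface has collapsed.
    """
    half = 0.5 * t_central
    h = math.sqrt(t_neighbor**2 - half**2)  # apex distance from shared edge
    lower, upper = (0.0, -half), (0.0, half)
    right = circumcenter(lower, upper, (h, 0.0))
    left = circumcenter(lower, upper, (-h, 0.0))
    return right[0] - left[0]


def critical_tension(lo=1.0, hi=1.9, iterations=60):
    """Bisection for the relative tension at which the interface vanishes."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if central_interface_length(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(critical_tension())  # close to math.sqrt(2)
```

      In this symmetric setting the interface length reduces to (1 - T²/2)/h, which changes sign at T = √2, i.e. when the kite of tension triangles becomes a square.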

      (2) While the formalism is very elegant and convincing, and also convincingly allows making sense of the data presented in the paper, it is not all that clear whether the claims are compatible with previous experimental observations. In particular, it has been reported in different papers (including Collinet et al NCB 2015, Clement et al Curr Biol 2017) that affecting the initial Myosin polarity or the rate of T1s does not affect tissue-scale convergent extension. The authors should analyze and discuss the Tor phenotype (no extension with myosin anisotropy) and the Eve/Runt phenotype (extension without Myosin anisotropy), which seem to contradict an extension mostly driven by myosin anisotropy.

      We are happy to read that the referees find our approach elegant and convincing. The referees correctly point out that we have failed to clearly communicate how our findings connect to the existing literature on Drosophila GBE. Indeed, the conflicting results reported in the literature on what drives GBE – internal forces (myosin anisotropy) or external forces (pulling by the posterior midgut) – were a motivation for our study. We have extensively rewritten the introduction, results section (“Isogonal strain identifies regions of passive tissue deformation”), and discussion (“Internal and external contributions to germ band extension”) in response to the referee’s request.

      In brief, distinguishing active internal vs passive external driving of tissue flow has been a fundamental open question in the literature on morphogenesis. Our tension-isogonal decomposition now provides a way to answer this question on the cell scale, by identifying regions of passive deformation due to external forces. As we now explain more clearly, our analysis shows that germ band extension is predominantly driven by internal tension dynamics, and not pulling forces from the posterior midgut.

      We put this cell-scale evidence into the context of previous experimental observations on the tissue scale, namely genetic mutants (fog, torso-like, scab, corkscrew, ksr) in which posterior midgut invagination is disrupted (Muenster et al. 2019, Smits et al. 2023). In these mutants, the germ band buckles, forming ectopic folds, or twists into a corkscrew shape as it extends, pointing towards a buckling instability characteristic of internally driven extensile flows.

      To address the apparently conflicting evidence from Collinet et al. 2015, we carried out a quantitative re-analysis of the data presented in that reference (see new SI section 3 and Fig. S11). The results support the conclusion that the majority of GBE flow is driven internally, thus resolving the apparent conflict.

      Lastly, as far as we understand, Clement et al. 2017 appears to be compatible with our picture of active T1 transitions. Clement et al. report that the actin cortex, when loaded by external forces, behaves visco-elastically with a relaxation time of the order of minutes, in line with our model for emerging interfaces post T1.

      We again thank the referees for prompting us to address these important issues and believe that including their discussion has significantly strengthened our manuscript.

      Recommendations for the authors:

      Minor points:

      - Fig 2: the authors should state in the main text at which scale the inverse problem is solved (the intercalating quartet, if I understood correctly from the methods?), and they should explain and justify their choice (why not compute the inverse at a larger scale?).

      We have rephrased the first sentence of the section “Cell scale analysis” to clarify that we use local tension inference. This local inference is informative about the relative tension of one interface to its four neighbors. The focus on this local level is justified because we are interested in local cell behaviors, namely rearrangements. Tension inference is also most robust on the local level, since this is where force balance, the underlying physical determinant of the link between mechanics and geometry, resides. In global tension inference, spurious large scale gradients can appear when small deviations from local force balance accumulate over large distances. We have added a paragraph in SI Sec. 1.4 to explain these points.

      - Fig 2: how should one interpret that tension after passive intercalation (amnioserosa) is higher than before? In Fig 2E, tension has not converged yet on the plot; what happens after 20 minutes?

      Recall that the inferred tension is the total tension on an interface. While on contracting interfaces, the majority of this tension will be actively generated by myosin motors, on extending interfaces there is also a contribution carried by passive crosslinkers. The passive tension can be effectively viewed as viscous dissipation on the elongating interface as crosslinkers turn over (Clement et al. 2017). Note that this passive tension is explicitly accounted for in the model presented in Fig. 5. Notably, it is crucial for the T1 process to resolve in a new extending junction. In the amnioserosa, the tension post T1 remains elevated because the amnioserosa is continually stretched by the convergence of the germ band. The tension hence does not necessarily converge back to 1. However, our estimates for the tension after 20 mins post T1 are very noisy because most of the T1s happen relatively late in the movie (past the 25 min mark) and therefore there are only a few T1s where we can track the post-T1 dynamics for more than 20 mins.

      We have added a brief explanation of the high post-T1 tension at the end of the section entitled “Relative tension dynamics distinguishes active and passive intercalations”. Further, we have moved up the section describing the minimal model right after the analysis of the relative tension during intercalations. We believe that this helps the reader better understand these findings before moving on to the tension-isogonal decomposition which generalizes them to the tissue scale.

      Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape 2) tension shape 3) isogonal shape works exactly. A more detailed explanation and a clearer illustration of what a quartet is and its labels could help.

      We have added a more detailed explanation in the main text. See our response to the longer question regarding this point below.

      -What exactly defines the boundary curve in figure 3E? How is it computed?

      We have added a sentence in the caption for Fig. 3E explaining that the boundary curve is found by solving Eq. (1) with l set to zero for the case of a symmetric quartet. We have also added a brief explanation immediately below Eq. (1) pointing out that this equation defines the T1 threshold in the space of local tensions T_i in terms of the isogonal length l_iso.

      -The authors should consider incorporating some details described in the SI file to the main text to clarify some points, as long as the accessible style of the manuscript can be kept. The points mentioned below may also be clarified in the SI doc. The specific points that could be elaborated are: Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape 2) tension shape 3) isogonal shape works exactly. A more detailed explanation and a clearer illustration of what a quartet is and its labels could help. The mapping to Maxwell-Cremona space is fine, but which subset is the quartet? For a set of 4 cells with two shared vertices and a junction, aren't there 5 different tension vectors? Are we talking about two closed force triangles? Separately, how do you exactly decompose the deformation (of 4 full cell shapes or a subset?) into isogonal and non-isogonal parts? What is the least squares fit done over - is this system underdetermined? Is this statistically averaged or computed per quartet and then averaged?

      We thank the referees for pointing us to unclear passages in our presentation. We hope that our revisions have resolved the referee’s questions. As described above, we have clarified the tension-isogonal decomposition in the main text. We have also revised the corresponding SI section (1.5) to address the above questions. A sketch of the quartet with labels is found in SI Fig. S7A which we now refer to explicitly in the main text.

      We always consider force-balance configurations, i.e. closed force triangles. Therefore in the “kite” formed by two adjacent tension triangles, only three tension vectors are independent.

      The decomposition of deformation is performed as follows: For each of the four cells, the center of mass c_i is calculated. Next, tension inference is performed to find the two tension triangles with tension vectors T_ij. Now there are three independent centroidal vectors c_j - c_i and three corresponding independent tension vectors T_ij. We define the isogonal deformation tensor I_quartet as the tensor that maps the centroidal vectors to the tension vectors. In general this is not possible exactly, because I_quartet has only three independent components, but there are six equations.
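
      The overdetermined system described above (six equations, three unknowns) is the kind of problem a least-squares fit resolves, as the referees' question suggests. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that the three independent components correspond to a symmetric 2x2 tensor S and minimizes the summed squared mismatch between S c_k and t_k over the three vector pairs. The names fit_symmetric_tensor and solve3 are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    n = 3
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_symmetric_tensor(centroid_vecs, tension_vecs):
    """Least-squares symmetric 2x2 tensor S such that S c_k ~ t_k.

    Unknowns are p = (s_xx, s_xy, s_yy). Each vector pair contributes two
    linear equations, so three pairs give six equations for three unknowns.
    The normal equations (A^T A) p = A^T b are accumulated and solved.
    """
    normal = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for (cx, cy), (tx, ty) in zip(centroid_vecs, tension_vecs):
        equations = [((cx, cy, 0.0), tx), ((0.0, cx, cy), ty)]
        for row, target in equations:
            for i in range(3):
                rhs[i] += row[i] * target
                for j in range(3):
                    normal[i][j] += row[i] * row[j]
    s_xx, s_xy, s_yy = solve3(normal, rhs)
    return [[s_xx, s_xy], [s_xy, s_yy]]


def apply_tensor(tensor, vec):
    """Apply a 2x2 tensor to a 2D vector."""
    return (tensor[0][0] * vec[0] + tensor[0][1] * vec[1],
            tensor[1][0] * vec[0] + tensor[1][1] * vec[1])
```

      For generic input the fit leaves a nonzero residual, which measures how far the quartet is from being related to its tension triangles by a single symmetric tensor. With NumPy available, numpy.linalg.lstsq applied to the same 6x3 system would give the same result.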

      The plots in Fig. 3C, C’ are obtained by performing this decomposition for each intercalating quartet individually. The data is then aligned in time and ensemble averages are calculated for each timepoint.

      For tissue-scale analysis in Fig. 6, the decomposition is performed for individual vertices (i.e. the corresponding centroidal and tension triangles) and then averaged locally to find the isogonal strain fields shown in Fig. 6B, B’.

      - Line 468: "Therefore, tissue-scale anisotropy of active tension is central to drive and orient convergent-extension flow [10, 57, 59, 60]." Authors almost never mention the contribution of the PMG to tissue extension. Yet it is known to be crucial (convergent extension in Tor mutants is very much affected). Please discuss this point further.

      The referees raise an important point: as discussed in our response to major point (2), we now explicitly discuss the role of internal (active tension) and external (PMG pulling) forces during germ band extension. Please see our response to major point (2) for the changes we made to the manuscript to address this.

      In particular, we now explain that in mutants where PMG invagination is impaired (fog, torso-like, torso, scab, corkscrew), the germ band buckles out of plane or extends in a twisted, corkscrew fashion (Smits et al. 2023). This shows that the germ band generates extensile forces largely internally. In torso mutants, the now stationary PMG acts as a barrier which blocks GBE; the germ band buckles as a response.

      The role of PMG invagination hence lies not in creating pulling forces to extend the germ band, but rather in “making room” to allow for its orderly extension. As shown by the genetic mutants just discussed, the synchronization of PMG invagination and GBE is crucial for successful gastrulation.

      -Typos:

      Line 74: how are intercalations are

      Line 84: vertices vertices

      Line 233: very differently

      Line 236: are can

      Line 390: energy which is the isogonal mode must

      Line 1585: reveals show

      Line 603: area

      Line 618: in terms of on the

      We have fixed these typos.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide evidence that 1) non time-reversible models sometimes perform better than general time-reversible models when inferring phylogenetic trees out of simulated viral genome sequence data sets, and that 2) non time-reversible models can fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work. However, the methods are incomplete in supporting the main conclusion of the manuscript, that is that non time-reversible models should be incorporated in the model selection process for these data sets.

      The non-reversible models should be incorporated in the model selection process not because they perform significantly better, but because they do not perform worse than the reversible models and because the true biochemical processes of nucleotide substitution do support non-reversibility.

      Reviewer #1 (Public Review):

      The study by Sianga-Mete et al revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new; previous works already showed that non-reversible, and also covarion, substitution models can fit the real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising. Specific comments are shown below.

      True.

      Major comments

      It is well known that non-reversible models can fit the real data better than the commonly used reversible substitution models, see for example,

      https://academic.oup.com/sysbio/article/71/5/1110/6525257

      https://onlinelibrary.wiley.com/doi/10.1111/jeb.14147?af=R

      The manuscript indicates that the results (better fitting of non-reversible models compared to reversible models) are surprising but I do not think so, I think the results would be surprising if the reversible models provide a better fitting.

      I think the introduction of the manuscript should be increased with more information about non-reversible models and the diverse previous studies that already evaluated them. Also I think the manuscript should indicate that the results are not surprising, or more clearly justify why they are surprising.

      The surprise in the findings is in NREV12 performing better than NREV6 for double stranded DNA viruses as it was expected that NREV6 would perform better given the biochemical processes discussed in the introduction.

      In the introduction and/or discussion I missed a discussion about the recent works on the influence of substitution model selection on phylogenetic tree reconstruction. Some works indicated that substitution model selection is not necessary for phylogenetic tree reconstruction, https://academic.oup.com/mbe/article/37/7/2110/5810088 https://www.nature.com/articles/s41467-019-08822-w https://academic.oup.com/mbe/article/35/9/2307/5040133

      While others indicated that substitution model selection is recommended for phylogenetic tree reconstruction, https://www.sciencedirect.com/science/article/pii/S0378111923001774 https://academic.oup.com/sysbio/article/53/2/278/1690801 https://academic.oup.com/mbe/article/33/1/255/2579471

      The results of the present study seem to support this second view. I think this study could be improved by providing a discussion about this aspect, including the specific contribution of this study to that.

      In our conclusion we have stated that: The lack of available data regarding the proportions of viral life cycles during which genomes exist in single and double stranded states makes it difficult to rationally predict the situations where the use of models such as GTR, NREV6 and NREV12 might be most justified: particularly in light of the poor overall performance of NREV6 and GTR relative to NREV12 with respect to describing mutational processes in viral genome sequence datasets. We therefore recommend case-by-case assessments of NREV12 vs NREV6 vs GTR model fit when deciding whether it is appropriate to consider the application of non-reversible models for phylogenetic inference and/or phylogenetic model-based analyses such as those intended to test for evidence of natural selection or the existence of molecular clocks.

      The real data was downloaded from the Los Alamos HIV database. I am wondering if there was any criterion for selecting the sequences or if all the sequences of the database for every studied virus category were analysed. Also, was any quality filter applied? How were gaps and ambiguous nucleotides considered? Notice that these aspects could affect the fitting of the models with the data.

      We selected a varying number of sequences from the database for each virus type studied. Using the software AliView, we applied quality filtering by re-aligning the sequences for each virus type.

      How the non-reversible model and the data are compared considering the non-reversible substitution process? In particular, given an input MSA, how to know if the nucleotide substitution goes from state x to state y or from state y to state x in the real data if there is not a reference (i.e., wild type) sequence? All the sequences are mutants and one may not have a reference to identify the direction of the mutation, which is required for the non-reversible model. Maybe one could consider that the most abundant state is the wild type state but that may not be the case in reality. I think this is a main problem for the practical application of non-reversible substitution models in phylogenetics.

      True.

      Reviewer #2 (Public Review):

      The authors evaluate whether non time reversible models fit better data presenting strand-specific substitution biases than time reversible models. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporates a certain degree of non time-reversibility. Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice. However, I miss an experiment that links the two findings to support the conclusion: in particular, an experiment that solves the following question: does the best-fit model also lead to better tree topologies?

      Because NREV12 leads to inferred trees that are closer to the true generating tree than GTR does, this shows that the best-fit model (in this case NREV12) leads to better tree topologies.

      On simulated data, the significance of the difference between GTR and NREV12 inferences is evaluated using a paired t test. I miss a rationale or a reference to support that a paired t test is suitable to measure the significance of the differences of the wRF distance. Also, the results show that on average NREV12 performs better than GTR, but a pairwise comparison would be more informative: for how many sequence alignments does NREV12 perform better than GTR?

      We have used the paired t-test because it is the most widely used test for comparing mean values between two matched samples where the differences between pairs are normally distributed. The wRF distances meet these conditions.

      The paired t-test contains the pairwise comparison, and the side-by-side boxplots show the pairwise wRF comparisons.
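      To make the two kinds of comparison discussed here concrete, the following minimal sketch computes both the paired t statistic and the per-alignment counts the reviewer asks about. The function name `paired_summary` and the distance values are hypothetical placeholders, not results from the study.

```python
import math
import statistics


def paired_summary(wrf_gtr, wrf_nrev12):
    """Summarise paired wRF distances (one pair per simulated alignment).

    Returns the paired t statistic and how often NREV12 gave a smaller,
    equal, or larger distance to the true tree than GTR.
    """
    if len(wrf_gtr) != len(wrf_nrev12):
        raise ValueError("inputs must be paired (same length)")

    # Positive difference: NREV12 tree is closer to the true tree.
    diffs = [g - n for g, n in zip(wrf_gtr, wrf_nrev12)]
    n_pairs = len(diffs)

    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(n_pairs))

    return {
        "t_statistic": t_stat,
        "degrees_of_freedom": n_pairs - 1,
        "nrev12_better": sum(d > 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
        "gtr_better": sum(d < 0 for d in diffs),
    }


# Hypothetical wRF distances for four simulated alignments.
print(paired_summary([0.5, 0.6, 0.7, 0.4], [0.4, 0.6, 0.5, 0.5]))
```

      The t statistic summarises the mean paired difference, while the win, tie and loss counts show how many individual alignments favour each model, which a mean alone does not reveal.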

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The reversible and non-reversible models used in this study assume that all the sites evolve under the same substitution matrix, which can be unrealistic. This aspect could be mentioned.

      Done.

      The manuscript indicates that "a phylogenetic tree was inferred from an alignment of real sequences (Avian Leukosis virus) with an average sequence identity (API) of ~90%.". I was wondering under which substitution model that phylogenetic tree reconstruction was performed? could the use of that model bias posterior results in terms of favoring results based on such a model?

      We have stated on page ….. that the GTR+G model was used to reconstruct the tree. The use of the GTR+G model could indeed bias the posterior results, as we have stated on page ….

      I was wondering which specific R function was used to calculate the weighted Robinson-Foulds metric. I think this should be included in the manuscript.

      We stated that we used the weighted Robinson-Foulds metric (wRF; implemented in the R phangorn package (Schliep, 2011)).

      Despite a minority, several datasets fitted better with a reversible model than with a non-reversible model. I think that should be clearly indicated.

      In addition, in my opinion the AIC does not sufficiently penalize the number of model parameters and favors the non-reversible models over the reversible models, but this is only my opinion based on the definition of AIC and it is not supported. Thus, I think the comparison between phylogenetic trees reconstructed under different substitution models was a good idea (but see also my second major comment).

      Noted.

      When comparing phylogenetic trees I was wondering if one should consider the effect of the estimation method and quality of the studied data? For example, should bootstrap values be estimated for all the ancestral nodes and only ancestral nodes with high support be evaluated in the comparison among trees?

      Yes, the estimation method and the quality of the studied data should be considered. This does not matter when using RF, but it does for weighted RF. When building the trees with RAxML, only high-support nodes are added to the tree.

      In Figure 3, I do not see (by eye) significant differences among the models. I see in the legend that the statistical evaluation was based on a t test but I am not much convinced. Maybe it is only my view. Exactly, which pairs of datasets are evaluated with the t test? Next, I would expect that the influence of the substitution model on the phylogenetic tree reconstruction is higher at large levels of nucleotide diversity because with more substitution events there is more information to see the effects of the model. However, the t test seems to show that differences are only at low levels of nucleotide diversity (and large DNR), what could be the cause of this?

      The paired t-tests compare the wRF distances of the inferred true tree from the trees simulated using the GTR model versus the wRF distances of the inferred true tree from the trees simulated using the NREV12 model.

      The influence of the NREV12 model on the reconstructed tree may not be significantly higher at large levels of nucleotide diversity because, beyond a certain level, the DNR values are simply unrealistic.

      Can the user perform substitution model selection (i.e., AIC) among reversible and non-reversible substitution models with IQTREE? If yes, then doing that should be the recommendation from this study, correct?

      But, can DNR be estimated from a real dataset? DNR seems to be the key factor (Figure 3) for the phylogenetic analysis under a proper model.

      Substitution model selection can be performed among reversible and non-reversible models using both HyPhy and IQTREE. We have recommended that model tests be done as a first step before tree building. Estimating DNR from real datasets requires a substitution rate matrix of a non-reversible model.

      The manuscript has many text errors (including typos and incorrect citations). For example, many citations in page 20 show "Error! Reference source not found.". I think authors should double check the manuscript before submitting. Also, some text is not formally written. For example, "G represents gamma-distributed rates", rates of what? The text should be clear for readers that are not familiar with the topic (i.e., G represents gamma-distributed substitution rates among sites). In general, I recommend a detailed revision of the whole text of the manuscript.

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors reference Baele et al., 2010 for describing NREV6 and NREV12. I suggest using the same name used in the referenced paper: GNR-SYM and GNR respectively. Although I do not think there is a standard name for these models, I would use a previously used one.

      We have built studies based on the names NREV6 and NREV12. We would like to keep the naming as standard for our studies.

      GTR and NREV12 models are already described in many other papers. I do not see the need to include such an extensive description. Also, a reference should be included to the discrete Gamma rate categories [1]

      We included the extensive description to give readers who are not very familiar with these models a better understanding, since we have given the models our own names, which differ from those used in other papers.

      We have added referencing for the discrete gamma rate as recommended. (Yang, 1994)

      To evaluate the exhaustiveness and correctness of the results, I would recommend publishing as supplementary material the simulated data sets or the scripts for generating the data set, the scripts or command lines for the analysis, and the versions of the software used (e.g., IQTREE). Also, to strongly support the main conclusion of the manuscript, I suggest adding to the simulations section results the RF-distances of the best-fit selected model under AIC, AICc, and BIC as well.

      We can go ahead and submit all the needed datasets. The simulated data RF-Distances results are available and will be submitted. We cannot however add them to the main document as this will create very long data tables.

      In some instances, it is mentioned that the selection criterion used is AIC, while in others, AIC-c is referenced. Even in the table captions, both terms are mixed. It should be made clearer which criterion is being employed, as AIC is not suitable for addressing the overparameterization of evolutionary models, given that it does not account for the sample size. A previous pre-print of this article [2] does not mention AIC-c, but also explicitly includes the formulas for AIC that do not take the sample size into account, and reports the same results as this manuscript, what indicates that AIC and not AIC-c was used here. This should be clarified. It is recommended to use AIC-c instead of AIC, especially if the sample size to model parameters ratio is low [3]. Two things may be appointed here: some authors consider tree branch lengths as model free parameters and others do not. In this paper it is not specified how the model parameters are counted. AIC tends to select more parameterized models than AIC-c, and overparameterization can lead to different tree inferences, as evidenced in Hoff et al., 2016. Therefore, it is expected that NREV12 is more frequently selected than NREV6 and GTR.
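      The distinction between the criteria raised in this comment can be illustrated with the standard textbook formulas. In the sketch below, the log-likelihoods, parameter counts and sample size are hypothetical, and whether branch lengths are counted in k and what is used as the sample size n (here assumed to be the number of alignment sites) are choices the analyst must state.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (no sample-size term)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC; n is the sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits: (log-likelihood, number of free parameters).
fits = {"simpler model": (-10250.0, 8), "richer model": (-10245.0, 11)}
n_sites = 1000  # assumed sample size (alignment length)

for name, (lnl, k) in fits.items():
    print(name, aic(lnl, k), round(aicc(lnl, k, n_sites), 2),
          round(bic(lnl, k, n_sites), 2))
```

      Because the AICc correction term grows as k approaches n, the two criteria can rank models differently when the ratio of sample size to parameters is low, which is the situation the comment describes.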

      In my opinion, a pairwise comparison between GTR and NREV12 performance is of great interest here, and the whiskers plots are not useful. Scatterplots would display the results better.

      Boxplots are meant to offer a simplified view of the results, as the paired t-tests do all of the comparisons. We shall provide the scatter plots as supplementary information so that readers can get full detailed plots as recommended.

      Some references are missing

      Missing references added

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper seeks to understand the upstream regulation and downstream effectors of glycolysis in retinal progenitor cells, using mouse retinal explants as the main model system. The paper presents evidence that high glycolysis in retinal progenitor cells is required for their proliferation and timely differentiation into photoreceptors. Retinal glycolysis increases after the deletion of Pten. The authors suggest that high glycolysis controls cell proliferation and differentiation by promoting intracellular alkalinization, beta-catenin acetylation and stabilization, and consequent activation of the canonical Wnt pathway.

      Strengths:

      (1) The experiments showing that PFKFB3 overexpression is sufficient to increase the proliferation of retinal progenitors (which are already highly dividing cells) and photoreceptor differentiation are striking and the result is unanticipated. It suggests that glycolytic flux is normally limiting for proliferation in embryos.

      In our BrdU birthdating experiment, we showed that PFKB3 expression drives the precocious differentiation of retinal progenitor cells (RPCs) into photoreceptors. However, we did not determine if there is an associated change in the number of dividing RPCs. To examine the proliferative status of PFKB3-overexpressing RPCs, we will perform short-term BrdU labeling to measure the number of RPCs in S-phase of the cell cycle. Additionally, we will count the number of RPCs expressing pHH3, a mitotic marker, and Ki67, a marker of cycling cells in all cell cycle phases.

      (2) Likewise the result that an increase in pH from 7.4 to 8.0 is sufficient to increase proliferation implies that pH regulation may have instructive roles in setting the tempo of retinal development and embryonic cell proliferation. Similarly, the results show that acetate supplementation increases proliferation (I think this result should be moved to the main figures).

      We thank the reviewer for these positive comments on our work. We will move the acetate data to the main figure as requested.

      Weaknesses:

      (1) Epistatic experiments to test if changes in pH mediate the effects of glycolysis on photoreceptor differentiation, or if Wnt activation is the main downstream effector of glycolysis in controlling differentiation are not presented.

      Traditionally, epistasis is tested using double knock-out (DKO) studies with null mutant alleles. If two genes operate in the same pathway, the downstream phenotype prevails, whereas phenotypic worsening is observed if two genes act in parallel pathways. Our data suggests the following order of events: Pten↓ → glycolysis↑ → intracellular pH↑ → Wnt signaling↑ → photoreceptor differentiation. In this model, Wnt signaling is the downstream-most effector. To test our epistatic model, we will assess RPC proliferation and the differentiation of Crx+ photoreceptor precursors with the following assays:

      (1) To confirm that Wnt signaling acts downstream of Pten, we will generate DKOs of Pten and Ctnnb1, a downstream effector of Wnt signaling. We know that fewer photoreceptors are generated in single Pten-cKO and Ctnnb1-cKO retinas, with a disruption of the outer nuclear layer only in Ctnnb1-cKOs. If Pten and Wnt act in the same pathway, Pten;Ctnnb1 DKOs will resemble single Ctnnb1-cKOs.

      (2) While epistasis is traditionally examined using genetic mutants, we will perform proxy experiments using pharmacological agents. To test whether Wnt activation acts downstream of a pH increase, we will activate Wnt signaling with recombinant Wnt3a at high and low pH. While low pH inhibits photoreceptor differentiation, if Wnt signaling is downstream, it should promote differentiation even at low pH. Conversely, we will alter pH in the presence of a Wnt inhibitor, FH535, which should block the positive effects of high pH on photoreceptor differentiation.

      (3) To test whether Wnt activation acts downstream of glycolysis to increase photoreceptor differentiation, we will apply recombinant Wnt3a to retinal explants while simultaneously inhibiting glycolysis with 2DG.  While 2DG inhibits photoreceptor differentiation, if Wnt signaling is downstream, it should still be able to promote differentiation. 

      (4) To test whether pharmacological inhibition of Wnt signaling reverses the effects of high glycolytic activity in Pten cKO retinas, we will treat wild-type and Pten-cKO retinas with the Wnt inhibitor FH535 and/or the glycolytic inhibitor 2DG.

      (2) It is likely that metabolism changes ex vivo vs in vivo, and therefore stable isotope tracing experiments in the explants may not reflect in vivo metabolism.

      We agree with the reviewer that metabolism likely changes ex vivo compared to in vivo. However, we did not perform stable isotope tracing experiments to directly examine glycolytic flux in this study. While outside the scope of the current study, this type of analysis is an important future direction that we will bring up in the discussion.

      (3) The retina at P0 is composed of both progenitors and differentiated cells. It is not clear if the results of the RNA-seq and metabolic analysis reflect changes in the metabolism of progenitors, or of mature cells, or changes in cell type composition rather than direct metabolic changes in a specific cell type.

      We mined a scRNA-seq dataset to show that Pgk1, a rate-limiting enzyme for glycolysis, is specifically elevated in early-stage RPCs versus later stage. We have since analysed additional glycolytic pathway genes, and observed a similar enrichment of Pfkl, Eno1 and Slc16a3 transcripts in early RPCs, while other genes were equally expressed in both early and late RPCs.

      To functionally demonstrate that there are differences in glycolysis between early and late RPCs, we will use CD133 to sort RPCs at E15 (early) and P0 (late). We will perform qPCR on sorted cells to validate the transcriptional differences in glycolytic gene expression. Additionally, we will perform two proxy measures of glycolysis: 1) We will measure lactate levels in sorted RPCs at both stages, and 2) We will use a Seahorse assay and assess ECAR in sorted RPCs at both stages.

      (4) The biochemical links between elevated glycolysis and pH and beta-catenin stability are unclear. White et al found that higher pH decreased beta-catenin stability (JCB 217: 3965) in contrast to the results here. Oginuma et al found that inhibition of glycolysis or beta-catenin acetylation does not affect beta-catenin stability (Nature 584:98), again in contrast to these results. Another paper showed that acidification inhibits Wnt signaling by promoting the expression of a transcriptional repressor and not via beta-catenin stability (Cell Discovery 4:37). There are also additional papers showing increased pH can promote cell proliferation via other mechanisms (e.g. Nat Metab 2:1212). It is possible that there is organ-specificity in these signaling pathways however some clarification of these divergent results is warranted.

      The pleiotropic actions of Wnt signaling on cell proliferation and differentiation are well known, even shifting from pro-proliferative to anti-proliferative depending on tissue or cell type. It is thus not surprising that different studies found unique effects of pH and glycolysis on β-catenin modifications and the activation of downstream signaling. Thus, as suggested by the reviewer, the difference between our data and other studies could be attributed to tissue and organism. In our revision, we will more fully assess our findings in the context of published studies, as recommended by the reviewer.

      To summarize our data, in the developing retina, we found that non-phosphorylated β-catenin protein levels increase in Pten-cKO retinas in vivo, while conversely, non-phosphorylated β-catenin protein levels decrease upon 2DG treatment and at low pH 6.5 in vitro.

      The Oginuma et al. 2020 (Nature 584: 98-101) study was performed on the chick tailbud and investigated lineage decisions by neuromesodermal progenitors in the presomitic mesoderm. In this context, WNT activity, glycolysis and pHi all decline in tandem, complementary to our findings. However, Oginuma et al. found that while phosphorylated and non-phosphorylated β-catenin levels do not vary, K49 β-catenin acetylation is reduced at low pHi. In their system, K49 β-catenin acetylation is associated with a switch in cell fate choice from neural to mesodermal in the chick tailbud. We will now assess this modification.

      Hauck et al. 2021 (Cell Death & Differentiation 28:1398-1417) found that by mutating Pkm, a rate-limiting glycolytic enzyme, β-catenin can more efficiently shuttle to the nucleus to activate Wnt-signaling and promote cardiomyocyte proliferation. This study highlights the importance of examining β-catenin protein levels in both cytoplasmic and nuclear fractions. They also examined transcriptional targets of Wnt signaling, such as Axin2, Ccnd1, Myc, Sox2 and Tnnt3, which we will also now assess.

      In a separate study in cancer cells, high pH leads to increased expression of Ccnd1, a β-catenin target gene, and promotes proliferation (Koch et al. 2020. Nat Metab. 2:1212-1222). These findings are consistent with our demonstration that β-catenin levels are stabilized at pH 8, and RPC proliferation is enhanced. A separate study by Melnik et al 2018 (Cell Discovery 4:37) performed in cancer cells found that acidification induced by metformin indirectly suppresses Wnt signaling by activating the DDIT3 transcriptional repressor, consistent with our data showing low pH suppresses β-catenin stability. Melnik et al also used Mcl inhibitors, as we did in our study, and showed that this treatment blocked Wnt signaling. While we did not look at the impact of CNCn on Wnt signaling, we did see a decline in proliferation, as expected if Wnt levels are low. The relationship between CNCn and Wnt activity will now be assessed.

      The one study that fits less well is from Czowski and White (BioRxiv), where they found that higher pH levels decrease β-catenin levels in the cytoplasm, nucleus and junctional complexes in MDCK cells. In this study, the authors altered pH using inhibitors for a sodium-proton exchanger and a sodium bicarbonate transporter. The Oginuma paper instead used the ionophores nigericin and valinomycin to equilibrate intracellular pHi to media pH, which we will now incorporate into our study.

      In summary, to more comprehensively examine the link between Pten loss, glycolytic activity, pHi and Wnt signaling, we will examine levels of phosphorylated, non-phosphorylated and K49 acetylated β-catenin after each manipulation (i.e., Pten loss, pH manipulations, CNCn treatment, glycolysis inhibition, acetate treatments). For pH manipulations, we will use nigericin and valinomycin to equilibrate pH. These studies will be performed on cytoplasmic and nuclear fractions from CD133+ MACS-enriched RPCs, to add cell type and stage specificity to our study. We will also use qPCR to examine Wnt signaling genes, such as Axin2, Ccnd1, Myc, Sox2 and Tnnt3.

      (5) The gene expression analysis is not completely convincing. E.g. the expression of additional glycolytic genes should be shown in Figure 1. It is not clear why Hk1 and Pgk1 are specifically shown, and conclusions about changes in glycolysis are difficult to draw from the expression of these two genes. The increase in glycolytic gene expression in the Pten-deficient retina is generally small.

      See response to point 3.

      (6) Is it possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation?

      We thank the reviewer for this excellent suggestion. We will examine the impact of  2DG on the differentiation of other retinal cell types, including bipolar and amacrine cells and Muller glia. For technical reasons, we will exclude ganglion cells, which die in culture and are not possible to examine in explants, and horizontal cells, which are a rare cell type, and hence, difficult to accurately quantify.

      (7) Are the prematurely-born cells caused by PFKFB3 overexpression photoreceptors as assessed by morphology or markers (in addition to position)?

      We will immunostain treated retinas with additional cell-type specific markers to examine rod and cone photoreceptor numbers and morphologies.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Hanna et al., addresses the question of energy metabolism in the retina, a neuronal tissue with an inordinately high energy demand. Paradoxically, the retina appears to employ to a large extent glycolysis to satisfy its energetic needs, even though glycolysis is far less efficient than oxidative phosphorylation (OXPHOS). The focus of the present study is on the early development of the retina and the retinal progenitor cells (RPCs) that proliferate and differentiate to form the seven main classes of retinal neurons. The authors use different genetic and pharmacological manipulations to drive the metabolism of RPCs or the retina towards higher or lower glycolytic activity. The results obtained suggest that increased glycolytic activity in early retinal development produces a more rapid differentiation of RPCs, resulting in a more rapid maturation of photoreceptors and photoreceptor segment growth. The study is significant in that it shows how metabolic activity can determine cell fate decisions in retinal neurons.

      Strengths:

      This study provides important findings that are highly relevant to the understanding of how early metabolism governs the development of the retina. The outcomes of this study could also be relevant for human diseases that affect early retinal development, including retinopathy of prematurity, where increased oxygenation likely causes a disturbance of energy metabolism.

      We thank the reviewer for these positive comments on our study.

      Weaknesses:

      The restriction to only relatively early developmental time points makes it difficult to assess the consequences of the different manipulations on the (more) mature retina. Notably, it is conceivable that early developmental manipulations, while producing relevant effects in the young post-natal retina, may "even out" and may no longer be visible in the mature, adult retina.

      While we agree that it would be interesting to observe the long-term consequences of our manipulations, we are limited by our retinal explant model, which can at best be cultured for 2 weeks in vitro. Additional limitations include the lack of photoreceptor outer segment development in our in vitro model. However, we can perform more extensive analyses of our genetic models in vivo (i.e., Pten-cKO, cyto-PFKB3-GOF, Ctnnb1-cKO). For these lines, we will focus on more in-depth analyses of photoreceptor differentiation and outer segment maturation using additional markers and one later stage of development.

      Reviewer #3 (Public review):

      Summary:

      This study examines the metabolic regulation of progenitor proliferation and differentiation in the developing retina. The authors observe dynamic changes in glycolytic gene expression in retinal progenitors and use various strategies to test the role of glycolysis. They find that elevated glycolysis in Pten-cKO retinas results in alteration of RPC fate, while inhibition of glycolysis has converse effects. They specifically test the role of elevated glycolysis using dominant active cytoPFKB3, which demonstrates the selective effects of elevated glycolysis on progenitor proliferation and rod differentiation. They then show that elevated glycolysis modulates both pHi and Wnt signaling, and provide evidence that these pathways impact proliferation and differentiation of progenitors, particularly affecting rod photoreceptor differentiation.

      Strengths:

      This is a compelling and rigorous study that provides an important advance in our understanding of metabolic regulation of retina development, addressing a major gap in knowledge. A key strength is that the study utilizes multiple genetic and pharmacological approaches to address how both increased or decreased glycolytic flux affect retinal progenitor proliferation and differentiation. They discover elevated Wnt signaling pathway genes in Pten cKO retina, revealing a potential link between glycolysis and Wnt pathway activation. Altogether the study is comprehensive and adds to the growing body of evidence that regulation of glycolysis plays a key role in tissue development.

      We thank the reviewer for these positive comments on our study.

      Weaknesses:

      (1) Following the expression of cytoPFKB3, which results in increased glycolytic flux, BrdU labeling was performed at E12.5 and increased labeled cells were detected in the outer nuclear layer. However, whether these are cones or rods is not established. The rest of the analysis is focused on the precocious maturation of rhodopsin-labeled outer segments, and the major conclusions emphasize rod photoreceptor differentiation. Therefore, it is unclear whether there is an effect on cone differentiation for either Pten cKO or cytoPFKB3 transgenic retina. It is also not established whether rods are born precociously. Presumably, this would be best detected by BrdU labeling at later embryonic stages.

      We agree with the reviewer that we should expand our study to also examine cone differentiation and outer segment maturation, which we will now do by adding additional markers to our study.

      (2) The authors find that there is upregulation of multiple Wnt pathway components in Pten cKO retina. They further show that inhibiting Wnt signaling phenocopies the effects of reducing glycolysis. However, they do not test whether pharmacological inhibition of Wnt signaling reverses the effects of high glycolytic activity in Pten cKO retinas. Thus the argument that Wnt is a key downstream effector pathway regulating rod photoreceptor differentiation is weak.

      See Reviewer 1, point 1

      (3) The use of sodium acetate to force protein acetylation is quite non-specific and will have effects beyond beta-catenin acetylation (which the authors acknowledge). Thus it is a stretch to state that "forced activation of beta-catenin acetylation" mimics the impact of Pten loss/high glycolytic activity in RPCs since the effects could be due to acetylation of other proteins.

      As outlined in our response to Reviewer #1, point 4, we will now assess K49 β-catenin acetylation levels, as conducted by Oginuma et al. This analysis will allow us to determine whether β-catenin acetylation is altered with manipulations of Pten, glycolysis, pH or acetate treatments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary

      This study has as its goal to determine how the structure and function of the circuit that stabilizes gaze in the larval zebrafish depends on the presence of the output cells, the motor neurons. A major model of neural circuit development posits that the wiring of neurons is instructed by their postsynaptic cells, transmitting signals retrogradely on which cells to contact and, by extension, where to project their axons. Goldblatt et al. remove the motor neurons from the circuit by generating null mutants for the phox2a gene. The study then shows that, in this mutant that lacks the isl1-labelled extraocular motor neurons, the central projection neurons have 1) largely normal responses to vestibular input; 2) normal gross morphology; 3) minimally changed transcriptional profiles. From this, the authors conclude that the wiring of the circuit is not instructed by the output neurons, refuting the major model.

      Strengths

      I found the manuscript to be exceptionally well-written and presented, with clear and concise writing and effective figures that highlight key concepts. The topic of neural circuit wiring is central to neuroscience, and the paper's findings will interest researchers across the field, and especially those focused on motor systems. 

      The experiments conducted are clever and of a very high standard, and I liked the systematic progression of methods to assess the different potential effects of removing phox2a on circuit structure and function. Analyses (including statistics) are comprehensive and appropriate and show the authors are meticulous and balanced in most of the conclusions that they draw. Overall, the findings are interesting, and with a few tweaks, should leave little doubt about the paper's main conclusions. 

      We are grateful for the Reviewer’s enthusiasm for our manuscript and recognition of the advance to the vestibular and motor systems fields. We particularly appreciate their suggestions for experiments to improve the characterization of our phox2a mutant line. We hope the Reviewer finds the results of the added experiment adequately address the points they raise. 

      Weaknesses/Recommendations

      (1) The main point is the incomplete characterisation of the effects of removing phox2a on the extra-ocular motor neurons. Are these cells no longer there, or are they there but no longer labelled by isl1:GFP? If they are indeed removed, might they have developed early on, and subsequently lost? These questions matter as the central focus of the manuscript is whether the presence of these cells influences the connectivity and function of their presynaptic projection neurons. Therefore, for the main conclusions to be fully supported by the data, the authors would need to test whether 1) the motor neurons that otherwise would have been labelled by the isl1:GFP line are physically no longer there; 2) that this removal (if, indeed, it is that) is developmental. If these experiments are not feasible, then the text should be adjusted to take this into account. 

      Show (e.g., with DAPI or some other staining) whether there are still cells where you would have expected to see nIII/nIV extraocular motor neurons. If this is done in a developmental timeline both main "concerns" are addressed in one go. If this doesn't work for some reason, then I'd suggest adjusting the discussion section to note this caveat. I realise it is commonplace in zebrafish and rodent papers to equate the two, but it should also be considered that the isl1:GFP does not report which cells are isl1+ 100% faithfully. 

      We thank the Reviewer for their suggestion. We’ve included the results of this experiment in (new) Supplemental Figure 1 and have updated the Results accordingly (text lines 69-72). 

      Briefly: We performed fluorescent in situ hybridization for vachta, a marker for cholinergic motor neurons, at 2 dpf, when nIII/nIV differentiation is complete and prior to synaptogenesis with both their pre- and postsynaptic partners. We included a DAPI stain. We find that while loss of phox2a does not physically remove neurons from the region that contains nIII/nIV motor neurons, neurons in this region no longer express vachta. The presence of neurons at an early stage (2 dpf) that have lost expression of both a transcription factor (isl1) and a motor neuron marker (vachta) supports our contention that, while cells are there, they should not be considered motor neurons.

      While the reviewer did not suggest it directly, we note that there is a more laborious way to determine “what happens to cells that would have been phox2a+ but no longer express phox2a?” Specifically, one could target a reporter transgene to the endogenous phox2a locus on the phox2a mutant background. Regrettably, generating such a knock-in reporter is difficult and success is far from assured.

      Previously (Greaney et al. 2017, DOI: 10.1002/cne.24042), we compared expression patterns in nIV to those observed after retro-orbital dye fills. We never saw neurons labeled by dye that were not also GFP+. However, it was not possible to perform a similar analysis for nIII, so we acknowledge the limits of the isl1:GFP reporter.

      (2) A further point to address is the context of the manipulation. If the phox2a removal does indeed take out the extra-ocular motor neurons, what percentage of postsynaptic neurons to the projection neurons are still present?

      In other words, how does the postsynaptic nMLF output relate to the motor neurons? If, for instance, the nMLF (which, as the authors state, are likely still innervated by the projection neurons) are the main output of the projection neurons, then this would affect the interpretation of the results.

      Is there quantitative information on the projection neuron outputs to address the second point (i.e., how much of the projection neurons' output is the extra-ocular motor neurons)? If not, it should be discussed how this could affect the conclusions. 

      Qualitatively, projection neurons form more robust arbors to the nMLF than to their nIII/nIV partners (see: Schoppik et al. 2017, DOI: 10.1523/JNEUROSCI.1711-17.2017). We expect this is proportional to the size of each downstream target. 

      The Reviewer makes an interesting point here. These projection neurons innervate several downstream nuclei that could potentially influence their development; we’ve considered this in the Discussion based on existing literature and in the context of our own findings. A precise dissection of each target population’s contribution would be interesting and important for larger questions about neural circuits for balance (see Sugioka et al. 2023, DOI: 10.1038/s41467-023-36682-y). However, we feel this analysis is outside our study’s scope, given that our aim here was to evaluate a standing hypothesis restricted to the contribution of nIII/nIV motor neurons. 

      Less important, but still useful: 

      - Figure 4C/D: I found these panels difficult to interpret. Perhaps split them up so each panel does a little less heavy lifting? Do the main panels in C show all axons? Where are the "two remaining nIII/nIV neurons" in D? 

      We’ve split the panels in 4C as suggested and adjusted the caption text in 4D to clarify the “remaining neurons” were simply not eliminated following phox2a knockout. We presume they are instead phox2b+. 4C shows all axons labeled by our transgenic line that follow the medial longitudinal fasciculus.  

      Extremely minor: 

      - line 28: "tantamount" --> "paramount"? 

      - some figure legends say DeltaFF, instead of DeltaF/F 

      - line 192: "the any" 

      These have been corrected; we thank the reviewer for their attention to detail. 

      Reviewer 2:

      Summary

      This study was designed to test the hypothesis that motor neurons play a causal role in circuit assembly of the vestibulo-ocular reflex circuit, which is based on the retrograde model proposed by Hans Straka. This circuit consists of peripheral sensory neurons, central projection neurons, and motor neurons. The authors hypothesize that loss of extraocular motor neurons, through CRISPR/Cas9 mutagenesis of the phox2a gene, will disrupt sensory selectivity in presynaptic projection neurons if the retrograde model is correct. 

      Strengths

      The work presented is impressive in both breadth and depth, including the experimental paradigms. Overall, the main results were that the loss of function paradigm to eliminate extraocular motor neurons did not 1) alter the normal functional connections between peripheral sensory neurons and central projection neurons, 2) affect the position of central projection neurons in the sensorimotor circuit, or 3) significantly alter the transcriptional profiles of central projection neurons. Together, these results strongly indicate that retrograde signals from motor neurons are not required for the development of the sensorimotor architecture of the vestibulo-ocular circuit. 

      We are grateful for the excellent summary of our manuscript and support for our aim, which was indeed to evaluate Hans Straka’s model for the development of the vestibulo-ocular reflex circuit.  

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions

      The results of this study showed that extraocular motor neurons were not required for central projection neuron specification in the vestibulo-ocular circuit, which countered the prevailing retrograde hypothesis proposed for circuit assembly. A concern is that the results presented may be limited to this specific circuit and may not be generalizable to other circuit assemblies, even to other sensorimotor circuits. 

      Impact

      As mentioned above, this study provides valuable new insights into the developmental organization of the vestibulo-ocular circuit. However, different circuits likely utilize various mechanisms, extrinsic or intrinsic (or both), to establish proper functional connectivity. So, the results shown here, although they begin to explain the developmental organization of the vestibulo-ocular circuit, are not likely to be generalizable to other circuits; though this remains to be seen. At a minimum, this study provides a starting point for the examination of patterning of connections in this and other sensorimotor circuits.

      Weaknesses/Recommendations

      A concern is that the results presented may be limited to this specific circuit and may not be generalizable to other circuit assemblies, even to other sensorimotor circuits. However, different circuits likely utilize various mechanisms, extrinsic or intrinsic (or both), to establish proper functional connectivity. So, the results shown here, although they begin to explain the developmental organization of the vestibulo-ocular circuit, are not likely to be generalizable to other circuits; though this remains to be seen. 

      We agree with the Reviewer that — of course — a diverse array of developmental mechanisms shape sensorimotor circuit architecture. However, prior findings in the spinal cord (Wang & Scott 2000, Sürmeli et al. 2011, Bikoff et al. 2016, Sweeney et al. 2018, Shin et al. 2020) support our primary conclusion that motor neurons are dispensable for specification of premotor partners. The Recommendation ends with “though this remains to be seen.” We infer that the Reviewer does not have a counterexample at hand for a circuit where motor neurons determine the fate of their partners. Therefore, the preponderance of evidence argues that our work is likely to generalize to other circuits. However, we acknowledge the limitations of our work and we have tempered any claims to generality in the text.

      Lines 156-57: The authors should consider and discuss explicitly the potential of compensatory mechanisms in the CRISPR/Cas9 mutants that may permit synaptogenesis of the projection neurons even though their MN partners are absent. 

      We agree with the Reviewer that careful consideration of compensation is merited when using mutants. There are two synapses that the comment might refer to: those between projection neurons and motor neurons, and those between sensory afferents and projection neurons. Projection neurons fail to form any synapses at the region that would contain their motor neuron (nIII/nIV) partners (see Fig. 4C), so there is no question of compensation there. Figure 1B shows that there is no phox2a expression in sensory or central projection neurons. Consequently, even if there were a gene that perfectly compensated for the loss of phox2a, it wouldn’t be active in sensory or central projection neurons. We therefore do not believe that compensatory expression of other genes plays any role here. 

      Line 162: Is this an accurate global statement or should it be restricted to the evidence provided in this report?

      We’ve clarified this line, which summarizes findings described in previous results sections of this report.

      Reviewer 3:

      Summary

      In this manuscript by Goldblatt et al. the authors study the development of a well-known sensorimotor system, the vestibulo-ocular reflex circuit, using Danio rerio as a model. The authors address whether motor neurons within this circuit are required to determine the identity, upstream connectivity and function of their presynaptic partners, central projection neurons. They approach this by generating a CRISPR-mediated knockout line for the transcription factor phox2a, which specifies the fate of extraocular muscle motor neurons. After showing that phox2a knockout ablates these motor neurons, the authors show that functionally, morphologically, and transcriptionally, projection neurons develop relatively normally.

      Overall, the authors present a convincing argument for the dispensability of motor neurons in the wiring of this circuit, although their claims about the generalizability of their findings to other sensorimotor circuits should be tempered. The study is comprehensive and employs multiple methods to examine the function, connectivity and identity of projection neurons.

      We appreciate the Reviewer’s support for our manuscript and have implemented their thoughtful suggestions on how to improve the clarity and presentation of our conclusions. We acknowledge the shared consideration with Reviewer 2 as to the generalizability of our findings, and have tempered the language in our revision. 

      In the introduction the authors set up the controversy on whether or not motor neurons play an instructive role in determining "pre-motor fate". This statement is somewhat generic and a bit misleading as it is generally accepted that many aspects of interneuron identity are motor neuron-independent. The authors might want to expand on these studies and better define what they mean by "fate", as it is not clear whether the studies they are citing in support of this hypothesis actually make that claim. 

      We appreciate the Reviewer’s attention to this important consideration. We agree that there are numerous, and often ambiguous, ways to define cell fate. We’ve modified our manuscript to read “…for and against an instructive role in establishing connectivity” (line 19) to reflect that connectivity is the most pertinent readout of cell fate in (most) studies cited there, as well as in our model system (lines 25-26: “Subtype fate, anatomical connectivity, and function are inextricably linked: directionally-tuned sensory neurons innervate nose-up/nose-down subtypes of projection neurons, which in turn innervate specific motor neurons…”). We’ve expanded on the prior studies mentioned above in relevant sections of the Results and Discussion. 

      Although it appears unchanged from their images, the authors do not explicitly quantitate the number of total projection neurons in phox2a knockouts. 

      We have added this quantification (text lines 95-96); the number of projection neurons per hemisphere is unchanged in control and mutant larvae.

      For figures 2C and 3C, please report the proportion of neurons in each animal, either showing individual data points here or in a separate supplementary figure; and please perform and report the results of an appropriate statistical test. 

      Generally, we agree that per-animal sampling can provide important metrics. We’ve added a line in the appropriate Methods section with the mean/standard deviation number of neurons sampled per animal for each genotype (lines 408-410). However, our extensive prior work using this transgenic line (Goldblatt et al. 2023, DOI: 10.1016/j.cub.2023.02.048) argued that a per-animal breakdown can be misleading. Due to occasional technical aberrations, variation in transgenic line expression, and limitations of our registration method, we cannot sample 100% of the projection nucleus (~50 neurons/hemisphere) in each animal. Likewise, the topography of the nucleus in WT animals, both for up/down subtypes (Fig. 2) and impulse responsive/unresponsive neurons (Fig. 3), means that subtypes may not be proportionally sampled on a per-animal basis. While such problems would likely resolve if we took data from ~50-75 animals for each condition, at a throughput of ~2 animals/day and 1-2 experimental days/week on shared instrumentation the throughput simply isn’t there. We therefore believe the data is best represented as an aggregate.
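
      As a rough illustration of the throughput argument, the figures quoted in this response can be turned into an estimate of imaging time per condition. This is only a sketch: the function name `weeks_needed` is hypothetical, and it assumes uninterrupted weekly access to the instrument with no failed preparations.

      ```python
      def weeks_needed(animals, animals_per_day, days_per_week):
          """Weeks of imaging needed to collect `animals` at the given throughput."""
          return animals / (animals_per_day * days_per_week)

      # Figures quoted in the response: ~50-75 animals per condition,
      # ~2 animals/day, 1-2 experimental days/week.
      best_case = weeks_needed(50, 2, 2)   # fewest animals, most days per week
      worst_case = weeks_needed(75, 2, 1)  # most animals, fewest days per week
      print(f"{best_case:.1f} to {worst_case:.1f} weeks per condition")
      ```

      Under these assumptions the estimate spans roughly three to nine months of instrument time for each condition, which is the scale of effort the response describes as impractical.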

      In the topographical mapping of calcium responses (figures 2D, E and 3D), the authors say they see no differences but this is hard to appreciate based on the 3D plotting of the data. Quantitating the strength of the responses across the 3-axes shown individually and including statistical analyses would help make this point, especially since the plots look somewhat qualitatively different. 

      We have added a supplemental table (new Table 2) with statistical comparisons of projection neuron topography (both to tonic and impulse stimuli) across genotypes for additional clarification. Quantitatively, we find that differences in projection neuron position (max observed: approx. 5 microns) are within the limits of our expected error in registering neurons across larvae to a standardized framework, given the small size of the nucleus (approx. 40 microns in each spatial axis) and each individual neuron (approx. 5 microns in diameter).

      The transcriptional analysis is very interesting, however, it is not clear why it was performed at 72 hpf, while functional experiments were performed at 5 days. Is it possible that early aspects of projection neuron identity are preserved, while motor neuron-dependent changes occur later? The authors should better justify and discuss their choice of timepoint. 

      As suggested, we have updated the manuscript to justify the choice of timepoint (text lines 176-177). We agree with the Reviewer that choosing the “right” timepoint for transcriptional analysis is key. The comment underscores the challenges in balancing the amount of time past neurogenesis (24-54 hpf) when potential fate markers could change, with the timecourse of synaptogenesis (2-4 dpf) and functional maturation (5 dpf). We hypothesized that selecting an intermediate timepoint (72 hpf, during peak synaptogenesis), would enable the highest resolution of both fate markers expressed at the end of neurogenesis (54 hpf) and wiring specificity molecules. We point the Reviewer to recent studies in comparable systems that proposed subtype diversity is most resolvable during synaptogenesis as further justification (see: Ozel et al. 2022, DOI: 10.1038/s41586-020-2879-3 and Li et al. 2017, DOI: 10.1016/j.cell.2017.10.019). However, we acknowledge that the ideal experiment would have been a transcriptional timecourse that would have directly addressed the question. 

      The inclusion of heterozygotes as controls is problematic, given that the authors show there are notable differences between phox2a+/+ and phox2a+/- animals; pooling these two genotypes could potentially flatten differences between controls and phox2a-/-. 

      We agree that this is an important limitation on our interpretations and have noted this more explicitly in the appropriate Results section (line 204).  

      Projection neurons appear to be topographically organized and this organization is maintained in the absence of motor neurons. Are there specific genes that delineate ventral and dorsal projection neurons? If so, the authors should look at those as candidate genes as they might be selectively involved in connectivity. Showing that generic synaptic markers (Figure 4E) are maintained in the entire population is not convincing evidence that these neurons would choose the correct synaptic partners.

      We agree with the Reviewer that Figure 4E is limited and that the most convincing molecular probe would be against a subtype-specific marker gene, ideally the one(s) that establish subtype-specific connectivity. To date, few such markers have been identified in any system, and, to the best of our knowledge, no reported markers differentiate dorsal (nose-up) from ventral (nose-down) projection neurons. We are currently evaluating candidates, but will not include that data here until the relevant genes are established as veridical subtype markers with defined roles in subtype fate specification and connectivity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors describe the participation of the Hes4-BEST4-Twist axis in controlling the process of epithelial-mesenchymal transition (EMT) and the advancement of colorectal cancers (CRC). They assert that this axis diminishes the EMT capabilities of CRC cells through a variety of molecular mechanisms. Additionally, they propose that reduced BEST4 expression within tumor cells might serve as an indicator of an adverse prognosis for individuals with CRC.

      Strengths:

      • Exploring the correlation between the Hes4-BEST4-Twist axis, EMT, and the advancement of CRC is a novel perspective and gives readers a fresh standpoint.
      • The whole transcriptome sequence analysis (Figure 5) showing low expression of BEST4 in CRC samples will be of broad interest to cancer specialists as well as cell biologists, although further corroborative data is essential to strengthen these findings (See Weaknesses).

      Weaknesses:

      (1) The authors employed three kinds of CRC cell lines, but not untransformed cells such as intestinal epithelial organoids which are commonly used in recent research.

      We sincerely thank the reviewer for raising this issue. We acknowledge the potential of intestinal epithelial organoids as a valuable model for this study and will consider establishing this system in future research; however, this falls outside the scope of our current work.

      (2) The authors use three different human CRC cell lines with a lack of consistency in the selection of them. Please clarify 1) how these lines are different from each other, 2) why they pick up one or two of them for each experiment. To be more convincing, at least two lines should be employed for each in vitro experiment.

      We apologize for any confusion caused to the reviewer. In our study, we employed HCT116 and Caco2 cell lines to investigate the effects of BEST4 overexpression on the biological functions of CRC cells and its involvement in EMT. The selection of HCT116, a human CRC cell line, was based on its relatively lower expression level of BEST4 compared to other CRC cell lines. Conversely, Caco2 is a human colon adenocarcinoma cell line that closely resembles differentiated intestinal epithelial cells and exhibits microvilli structures. Given that BEST4 serves as a marker for intestinal epithelial cells, these two cell lines were chosen for investigating the in vitro effects of overexpressing BEST4 on proliferation, clonality, invasion, and migration of colon cancer tumor cells and on expression of downstream EMT-related markers. Similarly, we selected the HCT-15 cell line derived from human CRC for BEST4 knockout due to its comparatively higher expression level of BEST4 among CRC cell lines. We employed CRISPR/Cas9 gene-editing technology to knock out BEST4 instead of utilizing shRNA to downregulate BEST4 expression, which limited our selection to a single cell line.

      (3) The authors demonstrated associations between BEST4 and cell proliferation/ viability as well as migration/invasion, utilizing CRC cell lines, but it should be noted that these findings do not indicate a tumor-suppressive role of BEST4 as mentioned in line 120. Furthermore, while the authors propose that "BEST4 functions as a tumor suppressor in CRC" in line 50, there seems no supporting data to suggest BEST4 as a tumor suppressor gene.

      We apologize for these inaccurate expressions, and we have made the necessary modifications to the corresponding parts in the manuscript.

      (4) The HES4-BEST4-Twist1 axis likely plays a significant role in CRC progression via EMT but not CRC initiation. Some sentences could lead to a misunderstanding that the axis is important for CRC initiation.

      We apologize for these inaccurate expressions, and we have made the necessary modifications to the corresponding parts in the manuscript.

      (5) The authors mostly focus on the relationship of the HES4-BEST4-Twist1 axis with EMT, but their claims sometimes appear to deviate from this focus.

      We apologize for confusing the reviewer. The objectives of our study are as follows: (1) to establish the role of BEST4 in CRC growth both in vitro and in vivo; (2) to determine the underlying molecular mechanisms by which BEST4 interacts with Hes4 and Twist1, thereby regulating EMT; and (3) to investigate the correlation between BEST4 expression and prognosis of CRC. We have made the necessary modifications to the corresponding parts in the manuscript.

      (6) Some experiments do not appear to have a direct relevance to their claims. For example, the analysis using the xenograft model in Figure 2E-J is not optimal for analyzing EMT. The authors should analyze metastatic or invasive properties of the transplanted tumors if they intend to provide some supporting evidence for their claims.

      We sincerely thank the reviewer for raising this issue. EMT transforms epithelial cells into cells with a spindle-shaped, fibroblast-like morphology; the cells acquire mesenchymal characteristics along with invasive and migratory abilities, and their expression switches from epithelial E-cadherin and ZO-1 to mesenchymal vimentin (Dongre and Weinberg, 2019). The whole process is regulated by transcription factors of the Snail family and Twist1 (Dongre and Weinberg, 2019). We utilized the xenograft model with overexpressed BEST4 to analyze lysates of tumor tissue, revealing that BEST4 upregulated E-cadherin and downregulated vimentin and Twist1 (Figure 2I). These findings indicate that BEST4 inhibits EMT in vivo. Deletion of BEST4 may enable these cells to acquire invasive and migratory abilities, leading to metastasis in vivo. Therefore, we subsequently evaluated the effect of BEST4 on metastatic potential in a CRC liver metastasis model by intrasplenically injecting HCT-15 cell lines with knockout of BEST4 (BEST4gRNA), wild-type control (gRNA), or knockout with rescue (BEST4-Rescued) into BALB/c nude mice. Our observations revealed a twofold increase in liver metastatic nodules in the absence of BEST4 compared to the control group (Fig. 2J-L). Although further in vivo experiments are required for confirmation, our research suggests a potential role for BEST4 in counteracting EMT induction in vivo.

      (7) In Figure 4H, ZO-1 and E-cad expression looks unchanged in the BEST4 KD.

      We sincerely thank the reviewer for raising this issue. We have revised the corresponding sections of the manuscript and quantified all Western blot data, including those presented in the supplementary file, to confirm that the differences are statistically significant.

      (8) The in vivo and in vitro data supporting the whole transcriptome sequence analysis (Figure 5) is mostly insufficient. Including the following experiments will substantiate their claims: 1) BEST4 and HES4 immunostaining of human surgical tissue samples, 2) qPCR data of HES4, Twist1, Vimentin, etc. as shown in Figure 5C, 5D.

      We sincerely thank the reviewer for raising this issue.

      (1) Due to the substandard quality of the available BEST4 antibodies, we opted to evaluate the clinical significance of BEST4 in CRC by assessing mRNA levels instead of protein levels by immunohistochemistry (IHC). After testing multiple antibodies for western blotting, only one (1:800; LsBio, LS-C31133) accurately indicated BEST4 protein expression, while still exhibiting some non-specific bands. Consequently, we decided to transfect an HA-tagged BEST4 plasmid into the CRC cell line and used HA as a marker for BEST4 expression. Unfortunately, none of the antibodies employed for IHC were suitable, as they failed to accurately distinguish between positive and negative staining for BEST4 and showed significant non-specific staining (data not shown). The challenge in detecting BEST4 protein in colorectal cancer tissues may be attributed to its low expression levels. Our findings are consistent with previous reports from the Human Protein Atlas database (https://www.proteinatlas.org/ENSG00000142959-BEST4/pathology), which also did not detect any BEST4 protein expression in colorectal cancer tissues through IHC analysis.

      (2) qPCR data on E-cadherin, Twist1, and Vimentin mRNA expression in CRC tissue have already been published in other studies (Christou et al., 2017; Lazarova and Bordonaro, 2016; Zhu et al., 2015). These studies found that E-cadherin is downregulated, while Twist1 and Vimentin are upregulated, in CRC tissue compared to adjacent normal tissue. Analysis of mRNA expression data from colorectal cancer samples and normal samples in the publicly available TCGA and GTEx databases also revealed a significant downregulation of Hes4 expression in colorectal cancer tissues, which will be our next research objective.

      (9) Some statements are inconsistent probably due to grammatical errors. (For example, some High/low may be reversed in lines 234-244.)

      We apologize for these mistakes. We have made corrections to this section in the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Using in vitro and in vivo approaches, the authors first demonstrate that BEST4 inhibits intestinal tumor cell growth and reduces their metastatic potential, possibly via downstream regulation of TWIST1.

      They further show that HES4 positively upregulates BEST4 expression, with direct interaction with BEST4 promoter region and protein. The authors further expand on this with results showing that negative regulation of TWIST1 by HES4 requires BEST4 protein, with BEST4 required for TWIST1 association with HES4. Reduction of BEST1 expression was shown in CRC tumor samples, with correlation of BEST4 mRNA levels with different clinicopathological factors such as sex, tumor stage, and lymph node metastasis, suggesting a tumor-suppressive role of BEST4 for intestinal cancer.

      Strengths:

      • Good quality western blot data.

      • Multiple approaches were used to validate the findings.

      • Logical experimental progression for readability.

      • Human patient data / In vivo murine model / Multiple cell lines were used, which supports translatability / reproducibility of findings.

      We sincerely thank Reviewer #2 for constructive feedback on this work.

      Weaknesses:

      (1) Interpretation of figures and data (unsubstantiated conclusions).

      We apologize for this confusing presentation. We have made corrections to this section in the manuscript.

      (2) Figure quality.

      We apologize for the poor quality of the figures. The figure resolution was drastically reduced when the manuscript was converted to PDF on the publisher's website. The figures have been re-uploaded, and we have confirmed that each image has a resolution exceeding 300 dpi.

      (3) Figure legends lack information.

      We sincerely thank the reviewer for raising this issue. We have provided detailed figure legends, including supplementary figure legends, on pages 36-43 of the manuscript. We have rechecked this section and made improvements and additions.

      (4) Lacking/shallow discussion.

      We apologize for our shallow discussion. We have supplemented and improved some parts of the discussion.

      (5) Requires more information for reproducibility regarding materials and methods.

      We sincerely thank the reviewer for raising this issue. We have provided detailed information on materials and methods to support reproducibility on pages 18-29 and 43-47 of the manuscript. We have rechecked this section and made improvements and additions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      We sincerely thank Reviewer #1 for constructive feedback on this work.

      Minor comments:

      (1) Line 73: "Variant 4" is not precise. The term "variant" should mean mutation in the gene or different transcription.

      We apologize for using an inaccurate expression. We have now changed Variant 4 to Bestrophin 4.

      (2) Line 78. Is it correct that BEST4+ cells coexist with Hes4+ cells?

      According to a previous study published in Nature (Parikh et al., 2019), BEST4+ cells originate from the absorptive lineage and express the transcription factor Hes4. Additionally, we observed nuclear co-localization of BEST4 and Hes4 in HCT116 cells by immunofluorescence staining (Figure 3E).

      (3) Line 85. The reason "Best4 may be associated with Twist1" is unclear.

      We apologize for the lack of clarity in our previous statement. A recent single-cell RNA-sequencing analysis found that a subset of mature colonocytes expresses BEST4 (Parikh et al., 2019). Additionally, this subset coexists with hairy/enhancer of split 4 positive (Hes4+) cells (Parikh et al., 2019). Previous research has demonstrated the antagonistic role of Hes4 in regulating Twist1 through protein-protein interaction, which governs the differentiation of the bone marrow stromal/stem cell lineage (Cakouros et al., 2015). Based on these findings, we speculate that there may be an interactive regulation among BEST4, Hes4 and Twist1, potentially influencing cell polarity during epithelial-mesenchymal transition in colorectal cancer. We have made corrections to this section in the manuscript.

      (4) Line 87. Grammatical error (Establishing the role BEST4).

      We apologize for the grammatical error in this section. We have corrected it in the manuscript.

      (5) Please clarify the reason the authors do not show any data of BEST4-overexpressing Caco2 cells in Figure 2?

      We apologize for our oversight in not including these data in Figure 2. They have now been added.

      (6) In line 145, the authors did not show any tumorigenic properties.

      We apologize for this confusing presentation. We have made corrections to this section in the manuscript.

      (7) Figure 3 shows 1) HES4 regulates BEST4 promotor activity, and 2) HES4 and BEST4 colocalized in nuclei, but these are very different biological processes. Please clarify how these two relate to each other.

      Trajectory analysis identified colonocytes expressing the basic helix-loop-helix (bHLH) transcription factor Hairy/enhancer of split 4 (Hes4+) within the BEST4-expressing colonic epithelial lineage (BEST4+). Although promoter regulation and nuclear colocalization are very different biological processes, the recent identification of a heterogeneous BEST4+ and Hes4+ subgroup in a human colonic epithelial lineage (Parikh et al., 2019) led us to consider their potential role in regulating CRC progression. We first observed upregulation of both endogenous BEST4 mRNA and protein levels in Hes4-overexpressing cells compared to the control transfectant, indicating that Hes4 is a potential upstream activator regulating BEST4 function. We then confirmed that Hes4 interacted with BEST4, binding directly to its upstream promoter at the region of 1470-1569 bp and enhancing its promoter activity, as analysed by Co-IP, dual-luciferase assay and ChIP-qPCR, respectively. In addition, the two proteins co-localized in the nucleus, as shown by immunofluorescence staining after transient co-transfection of Hes4 and BEST4 into HCT116 cells, indicating that BEST4 interacts with Hes4 at both transcriptional and translational levels (Figure 3; Figure 3-supplemental figure 1).

      (8) In line 182-185, please clarify the reason BEST4 mediates the inhibition of the Twist 1 promotor activity by Hes4.

      Because Hes4-driven commitment of human bone marrow stromal/stem cells to lineage-specific development is mediated by Twist1 downregulation (Cakouros et al., 2015), and because we observed direct interaction between BEST4 and Hes4 in HCT116 cells, it is plausible that they act through Twist1 to regulate EMT. In the present study, we found that Twist1 colocalized with BEST4 in the nucleus, and that their interaction destabilized Twist1 and significantly inhibited EMT induction. Hes4 caused the same effect; however, it required intermediation through BEST4. Although the mechanistic details of their interrelationship remain to be elucidated, the present study identified a Hes4-BEST4-Twist1 axis governing the development of CRC, at least partially by counteracting EMT induction.

      (9) In line 205, please rephrase "BEST4-mediated upstream Hes4" to be clearer.

      We apologize for this confusing presentation. We have made corrections to this section in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely thank Reviewer #2 for constructive feedback on this work.

      Major Comments:

      (1) The general quality of the figures requires improvement (text in some figures is illegible, and the resolution of the images is low) with more proofreading of the text for clarity. In addition, the resolution of the histology in Fig 2K does not allow a proper evaluation of the data.

      We apologize for the poor quality of the figures. The figure resolution was drastically reduced when the manuscript was converted to PDF on the publisher's website. The figures have been re-uploaded, and we have confirmed that each image has a resolution exceeding 300 dpi. In addition, Figure 2K has been further enhanced and enlarged.

      (2) While the authors show that the HES4/BEST4 complex interacts with the TWIST1 protein, they do not expand on the mechanisms underpinning the post-translational or transcriptional regulation of TWIST1. We would like the authors to prove or further speculate on the mechanisms behind this regulation in the discussion.

      Our present study showed that BEST4 inhibited EMT in conjunction with downregulation of Twist1 in both HCT116 and Caco2 CRC cell lines. A previous study has shown an antagonistic role of Hes4 in regulating Twist1 via protein-protein interaction that controls bone marrow stromal/stem cell lineage differentiation (Cakouros et al., 2015). We speculate that there is an interactive regulation among Hes4, BEST4 and Twist1 by which they deter the loss of cell polarity during EMT in CRC. In the present study, we found that BEST4 mediates the inhibition of Twist1 by Hes4 at both the transcriptional and translational levels. Twist1 colocalized with BEST4 in the nucleus, and their interaction destabilized Twist1 and significantly inhibited EMT induction. Hes4 caused the same effect; however, it required intermediation through BEST4. The present study identified a Hes4-BEST4-Twist1 axis governing the development of CRC, at least partially by counteracting EMT induction. We agree that further studies are needed to elucidate the mechanistic details of their interrelationship, but these are beyond the scope of the current work.

      (3) The authors need to show or argue why TWIST1 is necessary for the phenotypes observed, e.g. metastasis/proliferation.

      We apologize for the lack of clarity on this point. EMT converts epithelial cells into cells with a spindle-shaped, fibroblast-like morphology and mesenchymal characteristics, enabling them to acquire invasive and migratory abilities, with expression switching from epithelial E-cadherin and ZO-1 to mesenchymal vimentin (Dongre and Weinberg, 2019). In advanced stages, EMT may occur as CRC metastasizes to distal organs (Pastushenko and Blanpain, 2019; Sunlin Yong, 2021; Yeung and Yang, 2017; Zhang et al., 2021). The whole process is regulated by transcription factors of the Snail family and Twist1 (Dongre and Weinberg, 2019). Twist1 (a basic helix-loop-helix transcription factor) reprograms EMT by repressing the expression of E-cadherin and ZO-1 (Nagai et al., 2016; Yang et al., 2004) and simultaneously inducing several mesenchymal markers, typically vimentin (Bulzico et al., 2019; Meng et al., 2018; Nagai et al., 2016; Yang et al., 2004), which is a pivotal predictor of CRC progression (Vesuna et al., 2008; Yang et al., 2004; Yusup et al., 2017; Zhu et al., 2015). Overexpression of Twist1 significantly enhances the migration and invasion capabilities of colorectal cancer cells; furthermore, it is closely associated with metastasis and poor prognosis in patients with colorectal cancer (Yusup et al., 2017; Zhu et al., 2015). We have supplemented and improved these parts of the introduction and discussion.

      (4) The authors sufficiently prove that HES4/BEST4 regulates TWIST1 downregulation, however, we believe the findings are not enough to show *direct* regulation (refer also to line 273). At least rephrasing the conclusions would be adequate, also while referring to the working model depicted in Fig. 5G.

      We apologize for this inaccurate presentation. Although the interaction may not be direct, our co-immunoprecipitation (Co-IP) results demonstrated nuclear colocalization of Twist1 and BEST4 (Figure 4D; Figure 4-supplemental figure 1A). Furthermore, their interaction destabilized Twist1 and significantly inhibited the induction of EMT. We have made corrections to this section in the manuscript.

      (5) The discussion is very short and not satisfactory; is BEST4 an evolutionary conserved protein (besides the channel region)? Any speculation on which domain(s) is(are) important for the interaction with HES4 and TWIST1? How do the findings in the current study compare with recent, potentially contradicting data indicating a pro-tumorigenic function of BEST4 for CRC, including its upregulation (and not downregulation) in malignant intestinal tissues, and activation of PI3K/AKT signaling (PMID: 35058597)?

      We apologize for our shallow discussion. We have supplemented and improved some parts of the discussion. The bestrophins are a highly conserved family of integral membrane proteins initially discovered in Caenorhabditis elegans (Sonnhammer and Durbin, 1997). Homologous sequences can be found across animals, fungi, and prokaryotes, while they are absent in protozoans and plants (Hagen et al., 2005). Conservation is primarily observed within the N-terminal 350–400 amino acids, featuring an invariant arginine-phenylalanine-proline (RFP) motif with unknown functional properties (Milenkovic et al., 2008). Mutations in this region can lead to the development of vitelliform macular dystrophy. The C-terminus, by contrast, is a potential site for protein modification and function (Marmorstein et al., 2002; Miller et al., 2019). There are currently no published studies on the functional roles of the different domains of BEST4. Although the domain crucial for the interaction with HES4 and TWIST1 is yet to be determined and requires further investigation, our findings demonstrate that Hes4 directly binds to the upstream promoter region of BEST4 at 1470-1569 bp, thereby enhancing its promoter activity. These results provide valuable insights for future research.

      We sincerely thank the reviewer for bringing this publication to our attention. We have read the study carefully and would like to point out the following.

      a) Firstly, the study reported that BEST4 expression is upregulated in clinical CRC samples, which contradicts our own results as well as other published studies. RNA-seq of tissue samples from 95 human individuals representing 27 different tissues was performed to determine the tissue specificity of all protein-coding genes, and the results indicated that the BEST4 gene is predominantly expressed in the colon (Fagerberg et al., 2014). In addition, BEST4 was reported to be exclusively expressed by human absorptive cells and could be induced during human absorptive cell differentiation (Ito et al., 2013). More recently, single-cell profiling of healthy human colonic epithelial cells by Simmons's group, published in Nature, further showed that human absorptive colonocytes distinctly express BEST4 and that BEST4 is dysregulated in colorectal cancer patients (Parikh et al., 2019). Furthermore, analysis of RNA-seq expression data from colorectal cancer samples and normal samples in the publicly available TCGA and GTEx databases also revealed a significant downregulation of BEST4 expression in colorectal cancer tissues, which is consistent with our findings. Together, this literature demonstrates a close relationship between BEST4 and the normal function of the human colon, and provides evidence for its loss in colorectal cancer patients.

      b) Their study showed increased BEST4 protein levels in colorectal cancer patients by Western blot. However, the antibody they used (Abcam, ab188823) was only suitable for IHC-P and not for Western blot. In our study, we also used Western blot to detect BEST4 expression in colorectal cancer tissues and adjacent normal tissues. The results revealed decreased BEST4 protein levels in colorectal cancer patients. The antibody we used was specifically designed for Western blot detection (1:800; LsBio, LS-C31133 https://www.lsbio.com/antibodies/best4-antibody-n-terminus-wb-western-ls-c31133/29602).

      c) The study reported an upregulation of BEST4 protein levels in colorectal cancer patients using immunohistochemistry (IHC). However, IHC data for BEST4 in colorectal cancer tissues are also available in public protein expression databases such as the Human Protein Atlas. These data show minimal BEST4 protein in colorectal cancer tissues (https://www.proteinatlas.org/ENSG00000142959-BEST4/pathology), contradicting their findings but aligning with our own observations.

      d) Single-cell genomics analysis has shown that the OTOP2 and BEST4 genes are expressed only in a subset of normal colorectal epithelial cells (Parikh et al., 2019). BEST4 and Otopetrin 2 (OTOP2), which encodes a proton-selective ion channel protein, were reported to be distinctly expressed in normal absorptive colonocytes and to colocalize with each other (Drummond et al., 2017; Ito et al., 2013; Parikh et al., 2019). OTOP2 has recently been shown to inhibit the development of CRC under regulation by wild-type p53 (Qu et al., 2019), whereas the role of BEST4 in CRC is less well studied; this suggests that BEST4 may also inhibit colorectal cancer. The gene set enrichment analysis (GSEA) conducted in their study revealed a significant enrichment of gene signatures associated with oncogenic signaling and metastasis, such as the PI3K/Akt signaling pathway, in patients with higher BEST4 expression compared to those with lower BEST4 expression. However, our GSEA did not show any significant enrichment of the PI3K/Akt signaling pathway in patients with higher BEST4 expression. In contrast to their findings, our BEST4-overexpressing cell line did not exhibit a significant increase in phosphorylated Akt levels. In summary, our findings align with previous literature and public database analyses, providing evidence for the downregulation of BEST4 in colorectal cancer tissues and for its anticancer potential. Discrepancies with other studies may be attributed to differences in experimental models, protocols, preparations or experimental conditions.

      Minor Comments:

      (1) Western blot data should be quantified.

      We sincerely thank the reviewer for raising this point. We have quantified all the Western blot data and included the results in the supplementary file.

      (2) Errors in labelling figures in the text should be corrected (Line 214 and more).

      We apologize for these mistakes. We have made corrections to this section in the manuscript.

      (3) The authors used the human HES4 gene, which is indicated with the incorrect nomenclature. The gene and protein nomenclature should be correctly used.

      We apologize for these mistakes. We have made corrections to this section in the manuscript.

      (4) Methods and Materials for certain assays should be further clarified; e.g transwell migration/invasion assays (reference to previous publication? transwell inserts used, etc.)

      We sincerely thank the reviewer for raising this issue. We have expanded and updated the respective sections.

      (5) Figure 2K: Quality of histology is insufficient.

      We apologize for the poor quality of the figures. Figure 2K has been further enhanced and enlarged.

      (6) Figure 2K: Can the authors speculate on whether there is any increase in proliferation through BEST4-ko in HCT15 cells (with overexpression of BEST4 leading to reduced proliferation) and how this may impact the metastatic assay or engraftment/seeding onto the liver?

      Our in vitro experiments demonstrated that ablation of BEST4 in HCT-15 cells resulted in increased cell proliferation, clonogenesis, migration and invasion (Figure 1 and Figure 1-supplemental figure 1). These findings suggest that BEST4 knockout may contribute to tumor proliferation in vivo; however, further research is required for confirmation. EMT converts epithelial cells into cells with a spindle-shaped, fibroblast-like morphology and mesenchymal characteristics, enabling them to acquire invasive and migratory abilities (Dongre and Weinberg, 2019). In advanced stages, EMT may occur as CRC metastasizes to distal organs (Pastushenko and Blanpain, 2019; Sunlin Yong, 2021; Yeung and Yang, 2017; Zhang et al., 2021). Our study demonstrated that BEST4 inhibits EMT in colorectal cancer (CRC) both in vitro and in vivo. Conversely, ablation of BEST4 promotes EMT by upregulating the expression of EMT-related genes, thereby facilitating the metastasis of colorectal cancer cells to the liver.

      (7) Figure 2L: Authors should indicate in the figure that the BEST4-rescued is at 0 (and not blank).

      We sincerely thank the reviewer for raising this issue. We have made corrections to this section in the manuscript.

      (8) Figure 3B: Authors should introduce the usage of the new LS174T cell line in the text.

      We sincerely thank the reviewer for raising this issue. The human colorectal cancer cell line LS174T was selected for Hes4 knockdown because it expresses Hes4 at a higher level than other CRC cell lines. We have made corrections to this section in the manuscript.

      (9) Figure 3F: Why is there less FLAG in the input, compared to the IP?

      We sincerely thank the reviewer for raising this issue. Cell lysates (20 µg) were used for input and 500 µg for IP, according to the manufacturer's protocols.

      (10) Figure 5F-G: the quality of the figure is not good enough for interpretation.

      Again, we apologize for the poor quality of the figures caused by manuscript conversion. We have made corrections to this section in the manuscript.

      (11) Table 1: Conclusions made by the authors are wrong (lines 237-239); instead "high BEST4 expression more prevalent in females" and "low BEST4 expression more prevalent among CRC patients with advanced tumor stage". And how are low and high BEST4 expressions defined (the same applies to the data in Fig. 5F)?

      We apologize for these mistakes. We set the cutoff-high (50%) and cutoff-low (50%) values to split the samples into high-expression and low-expression cohorts. We have made corrections to this section in the manuscript.
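
      As an illustration only, a 50% cutoff of this kind can be sketched as a median split. The sample names and expression values below are hypothetical, and the handling of values exactly equal to the median (assigned here to the low cohort) is an assumption, not something stated in the response.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into high and low cohorts at the 50% (median) cutoff.

    `expression` maps a sample ID to its expression value. Values equal to
    the median go to the low cohort; this tie rule is an assumption.
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return cutoff, high, low


# Hypothetical expression values for four samples (not data from the study).
samples = {"P1": 1.0, "P2": 3.0, "P3": 2.0, "P4": 4.0}
cutoff, high, low = split_by_median(samples)
print(cutoff, high, low)
```

      With an even number of samples and no ties, the two cohorts are of equal size.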

      (12) In all Figure legends, there should be an indication of the type of statistical tests that were applied, as well as information on the number of independent experiments that were performed and provided the same results

      We sincerely thank the reviewer for raising this issue. The types of statistical tests applied are indicated in the Statistical analysis section of the Materials and Methods. The number of independent experiments is provided in the figure legends.

      References

      Bulzico, D., Pires, B.R.B., PAS, D.E.F., Neto, L.V., and Abdelhay, E. (2019). "Twist1 Correlates With Epithelial-Mesenchymal Transition Markers Fibronectin and Vimentin in Adrenocortical Tumors". Anticancer research 39, 173-175. 10.21873/anticanres.13094.

      Cakouros, D., Isenmann, S., Hemming, S.E., Menicanin, D., Camp, E., Zannetinno, A.C., and Gronthos, S. (2015). "Novel basic helix-loop-helix transcription factor hes4 antagonizes the function of twist-1 to regulate lineage commitment of bone marrow stromal/stem cells". Stem Cells Dev 24, 1297-1308. 10.1089/scd.2014.0471.

      Christou, N., Perraud, A., Blondy, S., Jauberteau, M.O., Battu, S., and Mathonnet, M. (2017). "E-cadherin: A potential biomarker of colorectal cancer prognosis". Oncol Lett 13, 4571-4576. 10.3892/ol.2017.6063.

      Dongre, A., and Weinberg, R.A. (2019). "New insights into the mechanisms of epithelial-mesenchymal transition and implications for cancer". Nature reviews. Molecular cell biology 20, 69-84. 10.1038/s41580-018-0080-4.

      Drummond, C.G., Bolock, A.M., Ma, C., Luke, C.J., Good, M., and Coyne, C.B. (2017). "Enteroviruses infect human enteroids and induce antiviral signaling in a cell lineage-specific manner". Proceedings of the National Academy of Sciences of the United States of America 114, 1672-1677. 10.1073/pnas.1617363114.

      Fagerberg, L., Hallstrom, B.M., Oksvold, P., Kampf, C., Djureinovic, D., Odeberg, J., Habuka, M., Tahmasebpoor, S., Danielsson, A., Edlund, K., et al. (2014). "Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics". Mol Cell Proteomics 13, 397-406. 10.1074/mcp.M113.035600.

      Hagen, A.R., Barabote, R.D., and Saier, M.H. (2005). "The bestrophin family of anion channels: identification of prokaryotic homologues". Molecular membrane biology 22, 291-302. 10.1080/09687860500129711.

      Ito, G., Okamoto, R., Murano, T., Shimizu, H., Fujii, S., Nakata, T., Mizutani, T., Yui, S., Akiyama-Morio, J., Nemoto, Y., et al. (2013). "Lineage-specific expression of bestrophin-2 and bestrophin-4 in human intestinal epithelial cells". PLoS One 8, e79693. 10.1371/journal.pone.0079693.

      Lazarova, D.L., and Bordonaro, M. (2016). "Vimentin, colon cancer progression and resistance to butyrate and other HDACis". Journal of cellular and molecular medicine 20, 989-993. 10.1111/jcmm.12850.

      Marmorstein, L.Y., McLaughlin, P.J., Stanton, J.B., Yan, L., Crabb, J.W., and Marmorstein, A.D. (2002). "Bestrophin interacts physically and functionally with protein phosphatase 2A". The Journal of biological chemistry 277, 30591-30597. 10.1074/jbc.M204269200.

      Meng, J., Chen, S., Han, J.X., Qian, B., Wang, X.R., Zhong, W.L., Qin, Y., Zhang, H., Gao, W.F., Lei, Y.Y., et al. (2018). "Twist1 Regulates Vimentin through Cul2 Circular RNA to Promote EMT in Hepatocellular Carcinoma". Cancer research 78, 4150-4162. 10.1158/0008-5472.Can-17-3009.

      Milenkovic, V.M., Langmann, T., Schreiber, R., Kunzelmann, K., and Weber, B.H. (2008). "Molecular evolution and functional divergence of the bestrophin protein family". BMC evolutionary biology 8, 72. 10.1186/1471-2148-8-72.

      Miller, A.N., Vaisey, G., and Long, S.B. (2019). "Molecular mechanisms of gating in the calcium-activated chloride channel bestrophin". eLife 8. 10.7554/eLife.43231.

      Nagai, T., Arao, T., Nishio, K., Matsumoto, K., Hagiwara, S., Sakurai, T., Minami, Y., Ida, H., Ueshima, K., Nishida, N., et al. (2016). "Impact of Tight Junction Protein ZO-1 and TWIST Expression on Postoperative Survival of Patients with Hepatocellular Carcinoma". Digestive diseases (Basel, Switzerland) 34, 702-707. 10.1159/000448860.

      Parikh, K., Antanaviciute, A., Fawkner-Corbett, D., Jagielowicz, M., Aulicino, A., Lagerholm, C., Davis, S., Kinchen, J., Chen, H.H., Alham, N.K., et al. (2019). "Colonic epithelial cell diversity in health and inflammatory bowel disease". Nature 567, 49-55. 10.1038/s41586-019-0992-y.

      Pastushenko, I., and Blanpain, C. (2019). "EMT Transition States during Tumor Progression and Metastasis". Trends in cell biology 29, 212-226. 10.1016/j.tcb.2018.12.001.

      Qu, H., Su, Y., Yu, L., Zhao, H., and Xin, C. (2019). "Wild-type p53 regulates OTOP2 transcription through DNA loop alteration of the promoter in colorectal cancer". FEBS open bio 9, 26-34. 10.1002/2211-5463.12554.

      Sonnhammer, E.L., and Durbin, R. (1997). "Analysis of protein domain families in Caenorhabditis elegans". Genomics 46, 200-216. 10.1006/geno.1997.4989.

      Sunlin Yong, Z.W., Tang Yuan, Chuang Cheng, Dan Jiang (2021). "Comparison of MMR protein and Microsatellite Instability Detection in Colorectal Cancer and Its Clinicopathological Features Analysis". Journal of Medical Research 50, 61-66. 10.11969/j.issn.1673-548X.2021.05.015

      Vesuna, F., van Diest, P., Chen, J.H., and Raman, V. (2008). "Twist is a transcriptional repressor of E-cadherin gene expression in breast cancer". Biochem Biophys Res Commun 367, 235-241. 10.1016/j.bbrc.2007.11.151.

      Yang, J., Mani, S.A., Donaher, J.L., Ramaswamy, S., Itzykson, R.A., Come, C., Savagner, P., Gitelman, I., Richardson, A., and Weinberg, R.A. (2004). "Twist, a master regulator of morphogenesis, plays an essential role in tumor metastasis". Cell 117, 927-939. 10.1016/j.cell.2004.06.006.

      Yeung, K.T., and Yang, J. (2017). "Epithelial-mesenchymal transition in tumor metastasis". Molecular oncology 11, 28-39. 10.1002/1878-0261.12017.

      Yusup, A., Huji, B., Fang, C., Wang, F., Dadihan, T., Wang, H.J., and Upur, H. (2017). "Expression of trefoil factors and TWIST1 in colorectal cancer and their correlation with metastatic potential and prognosis". World journal of gastroenterology 23, 110-120. 10.3748/wjg.v23.i1.110.

      Zhang, N., Ng, A.S., Cai, S., Li, Q., Yang, L., and Kerr, D. (2021). "Novel therapeutic strategies: targeting epithelial-mesenchymal transition in colorectal cancer". The Lancet. Oncology 22, e358-e368. 10.1016/s1470-2045(21)00343-0.

      Zhu, D.J., Chen, X.W., Zhang, W.J., Wang, J.Z., Ouyang, M.Z., Zhong, Q., and Liu, C.C. (2015). "Twist1 is a potential prognostic marker for colorectal cancer and associated with chemoresistance". American journal of cancer research 5, 2000-2011.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      One major issue arises in Figure 4, the recording of VLPO Ca2+ activity. In Lines 211-215, they stated that they injected AAV2/9-DBH-GCaMP6m into the VLPO, while activating LC NE neurons. As they claimed in line 157, DBH is a specific promoter for NE neurons. This implies an attempt to label NE neurons in the VLPO, which is problematic because NE neurons are not present in the VLPO. This raises concerns about their viral infection strategy since Ca activity was observed in their photometry recording. This means that DBH promoter could randomly label some non-NE neurons. Is DBH promoter widely used? The authors should list references. Additionally, they should quantify the labeling efficiency of both DBH and TH-cre throughout the paper.

      In Figure 5, we found that the VLPO receives noradrenergic projections from the LC, indicating that the recorded Ca2+ activity may come from the axon fibers of this projection. Similarly, Gunaydin et al. (2014) demonstrated that fiber photometry can be used to selectively record from neuronal projections.

      We appreciate the reviewer's insightful suggestion to elaborate on the DBH promoter. We have now expanded our discussion to address DBH (pg. 18): “DBH (dopamine-beta-hydroxylase), located in the inner membrane of noradrenergic and adrenergic neurons, is an enzyme that catalyzes the conversion of dopamine to norepinephrine, and therefore plays an important role in noradrenergic neurotransmission. DBH is a marker of noradrenergic neurons. Zhou et al. (2020) confirmed that their probe specifically labeled noradrenergic neurons by immunolabeling for DBH. Recently, the DBH promoter has been used in several studies (e.g., Han et al., 2024; Lian et al., 2023). DBH-Cre mice are widely used to specifically label noradrenergic neurons (e.g., Li et al., 2023; Breton-Provencher et al., 2022; Liu et al., 2024). It is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO. Therefore, we used the DBH promoter, which labels more specifically. The LC is the main noradrenergic nucleus of the central nervous system. In our study, we injected rAAV-DBH-GCaMP6m-WPRE (Figure 2 and 8) and rAAV-DBH-EGFP-S'miR-30a-shRNA GABAA receptor)-3’-miR30a-WPRES (Figure 9) into the LC. The results showed that the DBH promoter could specifically label noradrenergic neurons in the LC, while non-specific labeling outside the LC was almost absent.”

      As suggested, we have quantified the labeling efficiency of both DBH and TH-cre throughout the revised manuscript (Fig.2D; Fig.3D, N-O; Fig.4E-F, J, L; Fig.5E, L; Fig.6L, S, X; Fig.7G).

      A similar issue arises with chemogenetic activation in Fig. 5 L-R, the authors used TH-cre and DIO-Gq virus to label VLPO neurons. Were they labelling VLPO NE or DA neurons for recording? The authors have to clarify this.

      As previously addressed in response to Comment #1, we agree that it is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO. Therefore, we injected a mixture of DBH-Cre-AAV and AAV-EF1a-DIO-hChR2(H134R)-eYFP/AAV-Ef1a-DIO-hM3Dq-mCherry viruses into the bilateral LC and AAV-EF1a-DIO-hChR2(H134R)-eYFP/AAV-Ef1a-DIO-hM3Dq-mCherry virus into the bilateral VLPO. Moreover, we quantified the labeling efficiency of DBH in the LC to demonstrate that this promoter can specifically label NE neurons (Fig. 5). Importantly, these corrections did not alter our results. Both optogenetic and chemogenetic activation of LC-NE terminals in the VLPO can effectively promote midazolam recovery (Fig. 5G, N).

      Another related question pertains to the specificity of LC NE downstream neurons in the VLPO. For example, do they preferentially modulate GABAergic or glutamatergic neurons?

      Our study primarily aimed to explore the role of the LC-VLPO NEergic neural circuit in modulating midazolam recovery. We acknowledge that our evidence for the role of LC NE downstream neurons in the VLPO, derived from activation of LC-NE terminals and pharmacological intervention in the VLPO (Fig.5, Fig.6, Fig.8, Fig.9), is limited. Accordingly, we now present the VLPO’s role as a promising direction for future research in the limitation section of our revised manuscript: “This study shows that the LC-VLPO NEergic neural circuit plays an important role in modulating midazolam recovery. However, the specificity of LC NE downstream neurons in the VLPO is not explained in this paper and is our next research direction. VLPO neurons and their downstream regulatory mechanisms may involve neurotransmitter systems other than the NE system, and the deeper and more complex mechanisms need to be further investigated.”

      In Figure 1A-D, in the measurement of the dosage-dependent effect of Mida in LORR, were they only performed one batch of testing? If more than one batch of mice were used, error bar should be presented in 1B. Also, the rationale of testing TH expression levels after Mid is not clear. Is TH expression level change related to NE activation specifically? If so, they should cite references.

      As recommended, we have added error bars and revised the graph of the LORR rate in the revised manuscript (Fig. 1A-B; Fig. 9G-H).

      We agree that the use of TH as a marker of NE activation is controversial, so in the revised manuscript, we directly determined central norepinephrine content to reflect the change of NE activity after midazolam administration (Fig. 1D).

      Regarding the photometry recording of LC NE neurons during the entire process of midazolam injection in Fig. 2 and Fig. 4, it is unclear what time=0 stands for. If I understand correctly, the authors were comparing spontaneous activity during the four phases. Additionally, they only show traces lasting for 20s in Fig. 2F and Fig. 4L. How did the authors select data for analysis, and what criteria were used? The authors should also quantify the average Ca2+ activity and Ca2+ transient frequency during each stage instead of only quantifying Ca2+ peaks. In line 919, the legend for Figure 2D, they stated that it is the signal at the BLA; were they also recorded from the BLA?

      In this study, we used fiber photometry to record calcium signals, a fluorescence-based readout of changes in calcium. The fluorescence signal is usually divided into segments according to behavior, and each segment is aligned to a specific behavioral event, which is defined as time=0. The mean calcium fluorescence signal in a time window of 1.5 s or 1 s before the event is taken as the baseline fluorescence intensity (F0). The difference between the fluorescence intensity at the time of the behavior and the baseline fluorescence intensity is then divided by the difference between the baseline fluorescence intensity and the offset value. The resulting value, ΔF/F0, represents the change in calcium fluorescence intensity when the event occurs. The results of the analysis are commonly presented as two kinds of graphs, namely heat maps and peri-event plots (e.g., Cheng et al., 2022; Gan-Or et al., 2023; Wei et al., 2018). In Fig. 2, the time points for awake, midazolam injection, LORR and RORR in mice were each selected as time=0, while in Fig. 4, RORR was selected as time=0. The 20 s duration of the selected traces was based on the length of a complete Ca2+ signal. We have explained the Ca2+ recording experiment more specifically in the figure legends and methods sections of our revised manuscript.
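
      The ΔF/F0 normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name, the argument names, the default 1.5 s baseline window and the default offset of 0 are all hypothetical choices made for the example.

```python
from statistics import mean


def delta_f_over_f0(trace, times, baseline_window=(-1.5, 0.0), offset=0.0):
    """Compute dF/F0 for one peri-event fluorescence trace.

    trace  : fluorescence samples (hypothetical input format)
    times  : sample times in seconds, relative to the behavioral event (time=0)
    baseline_window : (start, end) in seconds; samples with start <= t < end
                      define the baseline F0 (assumed default: 1.5 s pre-event)
    offset : recording offset value subtracted from F0 in the denominator
    """
    if len(trace) != len(times):
        raise ValueError("trace and times must have the same length")

    start, end = baseline_window
    baseline = [f for f, t in zip(trace, times) if start <= t < end]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")

    f0 = mean(baseline)          # baseline fluorescence intensity F0
    denominator = f0 - offset    # baseline minus offset, as described in the text
    if denominator == 0:
        raise ValueError("baseline equals offset; dF/F0 is undefined")

    return [(f - f0) / denominator for f in trace]


if __name__ == "__main__":
    # Toy trace sampled every 0.5 s around an event at time=0 (made-up values).
    times = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
    trace = [10.0, 10.0, 10.0, 12.0, 15.0, 11.0]
    print(delta_f_over_f0(trace, times))
```

      Keeping the baseline window and the offset as explicit parameters makes it easy to switch between the 1.5 s and 1 s baselines mentioned in the response.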

      Regarding the BLA, we sincerely apologize for our carelessness; the signals we recorded were from the LC rather than the BLA. We have carefully checked for and corrected similar problems in the revised manuscript.

      Reviewer 2:

      In figure legends, abbreviations in figure should be supplemented as much as possible. For example, "LORR" in Figure 1.

      As suggested, we have defined the abbreviations used in the figures wherever possible in the revised manuscript.

      Additional recommendations:

      The main conceptual issue in the paper is the inflation of the conclusion regarding the mechanism of sedation induced by midazolam. The authors did not reveal the full mechanism of this but rather the relative contribution of the NE system. Several conclusions in the text should be edited to take this into account, starting from the title. I think the following examples are more appropriate: 'NE contribution to rebooting unconsciousness caused by midazolam' or 'NE contribution to reverse the sedation induced by midazolam'.

      As suggested, we have moderated the assertions about the mechanism of sedation induced by midazolam in several conclusions, starting from the title (Lines 1, 125, 150, 169, 202, 237, 482), to present a more measured interpretation in the manuscript.

      Line 178-179, the authors state 'these suggest that intranuclear ... suppresses recovery from midazolam administration'. In fact, this intervention prolonged or postponed recovery from midazolam.

      In our revised manuscript, we have corrected this inappropriate term (Line 178).

      Pharmacology part (page 12) that aimed to pinpoint which NE receptor is implicated would suffer from specificity issues.

      In relation to the specificity issue, the focus on VLPO might be rational but again other areas are most likely involved given the pharmacological actions of midazolam.

      In the revised manuscript, we have discussed those specificity issues of NE receptor and areas involved throughout the midazolam-induced altered consciousness: “In addition, given the pharmacological actions of midazolam, other areas may also be involved. Current studies suggest that the neural network involved in the recovery of consciousness consists of the prefrontal cortex, basal forebrain, brain stem, hypothalamus and thalamus. The role of these regions in midazolam recovery remains to be further investigated. Therefore, we will apply more specific experimental methods to determine the importance of LC-VLPO NEergic neural circuit and related NE receptors in the midazolam recovery, and conduct further studies on other relevant brain neural regions, hoping to more fully elucidate the mechanism of midazolam recovery in the future”.

      Line 274, the authors used 'inhibitory EEG activity'. what does it mean? a description of which rhythm-related power density is affected would be more objective.

      Example of conclusion inflation: in line 477, the word 'contributes' is better than 'mediates' if the specificity issue is taken into account.

      As suggested, we have revised the wording in our revised manuscript (pg. 13-14).

      References

      Gunaydin LA, Grosenick L, Finkelstein JC, et al. Natural neural projection dynamics underlying social behavior. Cell. 2014;157(7):1535-1551. doi:10.1016/j.cell.2014.05.017

      Zhou N, Huo F, Yue Y, Yin C. Specific Fluorescent Probe Based on "Protect-Deprotect" To Visualize the Norepinephrine Signaling Pathway and Drug Intervention Tracers. J Am Chem Soc. 2020;142(41):17751-17755. doi:10.1021/jacs.0c08956

      Han S, Jiang B, Ren J, et al. Impaired Lactate Release in Dorsal CA1 Astrocytes Contributed to Nociceptive Sensitization and Comorbid Memory Deficits in Rodents. Anesthesiology. 2024;140(3):538-557. doi:10.1097/ALN.0000000000004756

      Lian X, Xu Q, Wang Y, et al. Noradrenergic pathway from the locus coeruleus to heart is implicated in modulating SUDEP. iScience. 2023;26(4):106284. Published 2023 Feb 27. doi:10.1016/j.isci.2023.106284

      Li C, Sun T, Zhang Y, et al. A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice. Neuron. 2023;111(17):2727-2741.e7. doi:10.1016/j.neuron.2023.05.023

      Breton-Provencher V, Drummond GT, Feng J, Li Y, Sur M. Spatiotemporal dynamics of noradrenaline during learned behaviour. Nature. 2022;606(7915):732-738. doi:10.1038/s41586-022-04782-2

      Liu Q, Luo X, Liang Z, et al. Coordination between circadian neural circuit and intracellular molecular clock ensures rhythmic activation of adult neural stem cells. Proc Natl Acad Sci U S A. 2024;121(8):e2318030121. doi:10.1073/pnas.2318030121

      Cheng J, Ma X, Li C, et al. Diet-induced inflammation in the anterior paraventricular thalamus induces compulsive sucrose-seeking. Nat Neurosci. 2022;25(8):1009-1013. doi:10.1038/s41593-022-01129-y

      Gan-Or B, London M. Cortical circuits modulate mouse social vocalizations. Sci Adv. 2023;9(39):eade6992. doi:10.1126/sciadv.ade6992

      Wei YC, Wang SR, Jiao ZL, et al. Medial preoptic area in mice is capable of mediating sexually dimorphic behaviors regardless of gender. Nat Commun. 2018;9(1):279. Published 2018 Jan 18. doi:10.1038/s41467-017-02648-0

    1. Author response:

      We thank the reviewers for their thoughtful and critical comments. We will revise and improve the manuscript according to the public reviews. In particular, we will:

      (1) provide a broader perspective on the potential clinical implications of our experiments regarding the mechanisms and the treatment of coma and disorders of consciousness. In particular, we will address how the reported increase in dynamical features associated with consciousness, even without behavioral signs, might be relevant to characterize patients with a motor-cognitive dissociation.

      (2) use the term "tDCS" instead of "HD-tDCS" to describe the technique we used in the paper, to avoid any potential confusion. We understand that "HD-tDCS", which we used in our paper to refer to high-density tDCS (small size electrodes), may cause some confusion with high-definition tDCS, which is more commonly used in the literature to refer to a 4x1 tDCS montage with smaller high-definition electrodes. We will also provide the full characteristics of the carbon electrodes we used for stimulation.

      (3) clarify the location sites of stimulation and provide structural MRI images with the accurate localization of the stimulating electrodes.

      (4) clarify the fMRI data analyses we performed and provide a schematic illustration of the analysis process.

    1. Author response:

      We are pleased that the reviewers found our study thought-provoking and appreciate the care they have taken in providing constructive feedback. Focusing on the main issues raised by the reviewers, we provide here a provisional response to the Public Comments and outline our revision plan.

      A) Reviewers 1 and 2 were concerned that our task and analyses were limited by the fact that we only tested the model based on biases in movement direction (angular biases) and did not examine biases in movement extent (radial biases).

      While we think the angular biases provide a sufficient test to compare the set of models presented in the paper, we appreciate that there was a missed opportunity to also look at movement extent.  Looking at predictions concerning both movement direction and extent would provide a stronger basis for model comparison. To this end, we will take a two-step approach:

      (1) Re-analysis of existing datasets from experiments that involve a pointing task (movements terminate at the target position) rather than a shooting task (movements terminate further than the target distance).  We will conduct a model comparison using these data. 

      (2) If we are unable to obtain a suitable dataset or datasets because we cannot access individual data or there are too few participants, we will conduct a new experiment using a pointing task.  We will use these new data to evaluate whether the transformation model can accurately predict biases in both movement direction and extent.

      We will incorporate those new results in our revision.

      B) Reviewer 3 noted that model fitting was based on group average data. They questioned if this was representative across individuals and how well the model would account for individual patterns of reach biases.

      To address this issue, we propose to do the following:

      (1) We will first fit the model to individual data in Exp 1 and assess whether a two-peak function, the signature of the transformation model, is characteristic of most of the fits. We recognize that the results at the individual level may not support the model. This could occur because the model is not correct. Alternatively, the model could be correct but difficult to evaluate at the individual level for several reasons. First, the data set may be underpowered at the individual level. Second, motor biases can be idiosyncratic (e.g., within-subject correlation is greater than between-subject correlation), a point we noted in the original submission. Third, as observed in previous studies, transformation biases also show considerable individual variability (Wang et al, 2020); as such, even if the model is correct, a two-peaked function may not hold for all individuals.

      (2) If the individual variability is too large to draw meaningful conclusions, we will conduct a new experiment in which we measure motor and proprioceptive biases. Our plan would be to collect a large data set from a limited number of participants.  These data should allow us to evaluate the models on an individual basis, including using each participant’s own transformation/proprioceptive bias function to predict their motor biases.

      C) The reviewers have comments regarding the assumptions and form of the different models. Reviewer 3 questioned the visual bias model presented in the paper, and Reviewers 2 and 3 suggested additional visual bias/ biomechanical models to consider.

      We agree that what we call a visual bias effect is not confined to the visual modality: It is observed when the target is presented visually or proprioceptively, and is manifest in reaching movements, saccades, and key presses used to adjust a dot to match the remembered target (Kosovicheva & Whitney, 2017; Yousif et al. 2023). As such, the bias may reflect a domain-general distortion in the representation of goals within polar space. We refer to this component as a "visual bias" because it is associated with the representation of the visual target in the reaching task.

      We do think the version of the visual bias model in the original submission is reasonable given that the bias pattern has been observed in perceptual tasks with stimuli that were very similar to ours (e.g., Kosovicheva & Whitney, 2017). We have explored other perceptual models in evaluating the motor biases observed in Experiment 1. For example, several models discuss how visual biases may depend on the direction of a moving object or the orientation of an object (Wei & Stocker, 2015; Patten, Mannion & Clifford, 2017). However, these models failed to account for the motor biases observed in our experiments, an unsurprising outcome since the models were not designed to capture biases in perceived location. There are also models of visual bias associated with viewing angle (e.g., based on retina/head position). Since we allow free viewing, these biases are unlikely to make substantive contributions to the biases observed in our reaching tasks.

      Given that some readers are likely to share the reviewers’ concerns on this issue, we will extend our discussion to describe alternative visual models and provide our arguments about why these do not seem relevant/appropriate for our study.

      In terms of biomechanical models, we plan to explore at least one alternative model, the MotorNet Model (https://elifesciences.org/articles/88591). This recently published model combines a six-muscle planar arm model with artificial neural networks (ANNs) to generate a control policy. The model has been used to predict movement curvature in various contexts.  We will focus on its utility to predict biases in reaching to visual targets.

      D) Reviewer 1 had concerns with how we measured the transformation bias. In particular, they asked why the data from Wang et al (2020) are used as an estimate of transformation biases, and not as the joint effects of visual and proprioceptive biases in the sensed target and hand location, respectively.

      We define transformation error as the misalignment between the visual target and the hand position. We quantify this transformation bias by referencing studies that used a matching task in which participants match their unseen hand to a visual target, or vice versa. Errors observed in these tasks are commonly attributed to proprioceptive bias, although they could also reflect a contribution from visual bias. We utilized the same data set to simulate both the transformation bias model and the proprioceptive bias model.

      Although it may seem that we are simply renaming concepts, the concept of transformation error addresses biases that arise during motor planning. For the proprioceptive bias model, the bias only influences the perceived start position but not the goal since proprioception will influence the perceived position of the target before the movement begins. In contrast, the transformation bias model proposes that movements are planned toward a target whose location is biased due to discrepancies between visual and proprioceptive representations.

      The question then arises whether measurements of proprioceptive bias also reflect a transformation bias. We believe that the transformation bias is influenced by proprioceptive feedback, or at the very least, proprioceptive and transformation bias share a common source of error and thus, are highly correlated. We will revise the Introduction and Results sections to more clearly articulate these relationships and assumptions.

      E) Reviewer 3 asked whether the oblique effect in visual perception could account for our motor bias.

      The potential link between the oblique effect and the observed motor bias is an intriguing idea, one that we had not considered. However, after giving this some thought, we see several arguments against the idea that the oblique effect accounts for the pattern of motor biases.

      First, by the oblique effect, variance is greater for diagonal orientations compared to Cartesian orientations. These differences in perceptual variability can explain the bias pattern in visual perception through a Bayesian efficient coding model (Wei & Stocker, 2015). We note that even though participants showed large variability for stimuli at diagonal orientations, the bias for these stimuli was close to zero. As such, we do not think it can explain the motor bias function given the large bias for targets at locations along the diagonal axes.

      Second, the reviewer suggested an "oblique effect" within the motor system, proposing that motor variability is greater for diagonal directions due to increased visual bias. If this hypothesis is correct, a visual bias model should account for the motor bias observed, particularly for diagonal targets. In other words, when estimating the visual bias from a reaching task, a similar bias pattern should emerge in tasks that do not involve movement. However, this prediction is not supported in previous studies. For example, in a position judgment task that is similar to our task but without the reaching response, participants exhibited minimal bias along the diagonals (Kosovicheva & Whitney, 2017).

      Despite our skepticism, we will keep this idea in mind during the revision, investigating variability in movement across the workspace.

    1. Author response:

      Reviewer #1 (Public Review):

      In this work, the authors aimed to understand how titins derived from different nuclei within the syncytium are organized and integrated after cell fusion during skeletal muscle development and remodeling. The authors developed mCherry titin knock-in mice with the fluorophore mCherry inserted into titin's Z-disk region to track the titin during cell fusion. The results suggested that titin exhibited homogenous distribution after cell fusion. The authors also probed on how titin behaves during muscle injury by implantation of titin-eGFP myoblasts into adult mCherry-titin mice. Interestingly, titin is retained at the proximal nucleus and does not diffuse across the whole syncytium in this system. The findings of the study are novel and interesting. The experimental approaches are appropriate. The results are described well. However, the manuscript needs revisions to enhance its clarity.

      (1) In this work, the authors have not described the statistical analysis appropriately. In most of the figures, significance levels are not described. The information on the biological and technical replicates is missing in almost all the figures. This information is critical for understanding the strength of the experimentation.

      Thank you for this feedback. We have added the missing information to the figure legends.

      (2) The in vivo experiments are underpowered. The authors have used only 3 animals in the cardiotoxin injury experiment and eliminated another 3 animals from the analysis. How did they determine insufficient myoblast integration?

      The experimental design was targeted at using transplantation of myoblasts into skeletal muscle to obtain information on the ability of transplanted cells to fuse with cells in the injured area – and if those myoblasts could provide titin protein beyond the confinement of the transplanted cells (as would be expected after cell fusion). The goal was not to optimize cell transplantation with improved force generation of lesioned muscle. For this, we agree, the experiments would be underpowered.

      Here, we use a different approach and successfully demonstrate the integration of titin protein from transplanted cells into sarcomeres of host muscle fibers. Only 5 animals per group were approved by the local authorities, in agreement with the scope of our proposed hypothesis on cell fusion contributing titin beyond the transplanted cell, with the 3R guidelines, and with the necessity to address our research question in as few animals as possible. We proposed the need for at least 3 animals per implantation group and included 2 additional animals to compensate in case there was insufficient myoblast integration (no detection of GFP+ cells). The resulting n=3 and n=4 animals provided enough fusion events to address our hypothesis: if titin were homogeneously distributed after cell fusion, we would have expected red and green striation throughout the fiber. This was not the case. Even after 3 weeks, green and red titin molecules were segregated in 8 out of 8 fused cells, as depicted in Figure 6 and S5.

      (3) Similarly, the in vitro imaging experiments, especially the in vitro titin mobility assays used only 3 cells (Fig 2b) or 6-9 cells (Fig 2c-2e). The number of cells imaged is insufficient to derive a valid conclusion. What is the variability in the results between cells? Whether all the cells behave similarly in titin mobility assays?

      For Figure 2, we had described our replicates insufficiently. Quantification in 2b-e comprises a total of 9 cells from 3 independent experiments (3 per experiment). For 2d, one outlier (Grubbs test) was excluded for the GFP signal. For 2e, we only included cells that could be fitted with a two-phase association curve. That resulted in 6 cells for the GFP signal and 7 cells for the mCherry signal.
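
      For readers unfamiliar with the term, a two-phase association curve can be sketched as below. This is an illustration only: it uses one common parameterization (a fast and a slow exponential phase sharing a plateau), and the function names, parameter names and example values are assumptions, not the authors' fitting procedure or fitted values.

```python
import math


def two_phase_association(t, y0, plateau, percent_fast, k_fast, k_slow):
    """Recovery value at time t for a two-phase association model.

    y0           : signal at t = 0 (e.g. directly after bleaching)
    plateau      : signal approached at long times
    percent_fast : share (0-100) of the total span carried by the fast phase
    k_fast, k_slow : rate constants of the fast and slow phases (1/time)
    """
    span = plateau - y0
    span_fast = span * percent_fast / 100.0
    span_slow = span - span_fast
    fast = span_fast * (1.0 - math.exp(-k_fast * t))
    slow = span_slow * (1.0 - math.exp(-k_slow * t))
    return y0 + fast + slow


def half_life(k):
    """Half-life of a single exponential phase with rate constant k."""
    return math.log(2.0) / k


if __name__ == "__main__":
    # Made-up parameters, chosen only to show the shape of the curve.
    for t in (0.0, 1.0, 5.0, 20.0, 100.0):
        y = two_phase_association(t, y0=0.2, plateau=1.0,
                                  percent_fast=40.0, k_fast=0.5, k_slow=0.05)
        print(f"t={t:6.1f}  recovery={y:.3f}")
    print("fast-phase half-life:", half_life(0.5))
```

      A cell whose recovery is well described by a single exponential gives no meaningful separation between the two phases, which is one plausible reason a cell might not be fittable with this model.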

      (4) Figure 1c-e, Figure 2a, Figure 3, Figure 4, Figure 5, Figure 6- please describe the replicates and also if possible, quantify the data and present them as separate figures.

      1) Figure 1d (former 1c) is the validation that titin is properly integrated into the sarcomere and that the mCherry signal localizes to the Z-disk, overlapping with actinin. This is qualitative, not quantitative, information and is replicated and confirmed in Figure 2. 1e (former 1d) is a representative image for the quantification in 1f (former 1e) with 3 biological replicates (=cells) and 3 technical replicates (=Z-disks) each; every time point is significantly different with p<0.001, tested by 2-way ANOVA.

      2) 2a: representative image (with the corresponding profile) for the quantification in 2b (9 biological replicates (=cells) measured on 3 different experiment days) (see answer to 1-3).

      3) Representative images: Cells were seeded on several cover slips and fusion was started. This was done on 4 occasions (=technical replicates) with different stainings (see supplement), and 30+ images were taken in total with at least 5 images per staining. The images of different fusion stages were later classified based on the distribution of the differentially labeled titin.

      4) a-c: representative image that shows two independent fusion events; fusion experiments were performed on 4 days with a total of 13 fusion events captured (6 with only immature cells, 7 with one mature cell). For quantification in d+e, very small (< 1000 μm2) and very large (> 10,000 μm2) syncytia were excluded to minimize the effect of large size differences between syncytia, so that 5 immature and 4 mature fusion events remained for comparative analysis.

      5) The smFISH experiment was repeated on 2 days, and 6 images of fusion events were taken. Since the events were in different stages of fusion and 4 elements contributed to the images (mCherry-RNA, GFP-RNA, mCh-Titin protein, GFP-Titin protein), they were difficult to compare. However, we added the quantification to Fig. S4 (b and c) and added a corresponding paragraph to the results. There seems to be a smaller overlap region for the RNA than for the protein signal.

      6) Representative images with n=6 (but 3 excluded due to insufficient myoblast integration) biological replicates (mice) for the CTX+cells group (main experiment group), n=4 for the cells-only control, and n=1 for the CTX-only control, based on the 3R regulation of animal experiments. From each mouse (n=11), the contralateral TA muscle was harvested as well to serve as an uninjured control without cell transplantation.

      (5) Figure 2- the authors excluded samples with an obvious decrease in cell quality during imaging from the analysis. How do the authors assess the cell quality? Simply by visual examination? Or were the samples that did not show fluorescence recovery eliminated? I am wondering what percentage of cells showed poor cell quality. How do they avoid the bias? I recommend that the authors include these cells also for the analysis of data presented in Figures 2b, 2c, and 2f.

      Cells were not excluded for their recovery status, but only if they showed signs of cell death (collapse of sarcomere structures, membrane bubbling, etc.). All cells that stayed alive during the imaging showed a fluorescence recovery. Cells that had only a slower or incomplete recovery were not excluded from the complete analysis. One cell was excluded from the comparison of exchange half-life (Fig. 2d), since it was a significant outlier. For Figure 2e (fast phase), only cells for which we were able to fit a two-phase association curve could be included.

      (6) It is unclear how the authors identified the different stages of cell fusion in the microscopy images i.e. early fusion, distribution, and complete distribution.

      Early fusion was defined as the stage at which two cells had made contact through their membranes but the differentially labeled titin had not yet mixed. Distribution was defined as the stage at which titin mixing had started but was not yet complete.

    1. Author response:

      Reviewer #1 (Public Review):

      Lactobacillus plantarum is a beneficial bacterium renowned for its positive physiological effects and probiotic functions. Fu et al. conducted an investigation into the involvement of this bacterium in host purine metabolism. Initially, they employed microbiomics to analyze changes in L. plantarum within a hyperuricemia model, followed by isolation of the bacterium from this model. The gene map associated with purine nucleoside metabolism was determined through whole-genome analysis. Metabolic shifts in L. plantarum under nucleoside-enriched conditions were assessed using HPLC and metabolomics, while underlying mechanisms were explored through gene knockout experiments. Finally, the efficacy of L. plantarum was validated in hyperuricemia models involving goslings and mice. The authors presented their findings coherently and logically, addressing key questions using appropriate methodologies and yielding significant and innovative results. The authors demonstrated that host-derived Lactobacillus plantarum alleviates host hyperuricemia by influencing purine metabolism. However, their study primarily focused on this bacterium without delving deeper into the mechanisms underlying hyperuricemia beyond verification through two models. Nevertheless, these findings are sufficient to support their conclusion effectively. Additionally, further research is warranted to investigate the metabolites of Lactobacillus plantarum.

      We appreciate the reviewers' suggestions. We have studied Lactobacillus plantarum in detail, focusing specifically on its role in the purine nucleoside metabolism of the host, confirmed through in vitro and in vivo experiments. Our key finding demonstrates how Lactobacillus plantarum contributes to this process. We also examined the expression of hepatic uric acid synthesis proteins and renal uric acid excretion proteins related to alleviating host hyperuricaemia (Figure 9). While discussing the metabolites of Lactobacillus plantarum may fall outside the scope of this article, we plan to investigate this further. Our goal is to identify a signature metabolite via in vitro and in vivo studies and explore how it may help reduce hyperuricaemia in the host.

      Reviewer #2 (Public Review):

      Summary:

      Purine nucleoside metabolism in intestinal flora is integral to the purine nucleoside metabolism in the host. This study identified the iunH gene in Lactobacillus plantarum that regulates its purine nucleoside metabolism. Oral gavage of Lactobacillus plantarum and subsequent analysis showed it maintains homeostasis of purine nucleoside metabolism in the host.

      Strengths:

      This study presents sufficient evidence for the role of Lactobacillus plantarum in alleviating hyperuricaemia, combining microbiomics, whole genomics, in vitro bacterial culture, and metabolomics. These results suggest the iunH gene of Lactobacillus plantarum is crucial in host purine nucleoside metabolism. The experimental design is robust, and the data are of high quality. This study makes significant contributions to the fields of hyperuricaemia, purine nucleoside metabolism, and Lactobacillus plantarum investigation.

      We appreciate the reviewers' encouraging feedback.

      Weaknesses:

      A key limitation of this manuscript is the absence of an in-depth study on the alleviation metabolism of Lactobacillus plantarum. Notable questions include: What overall metabolic changes occur in a purine nucleoside-enriched environment? How do the metabolites of Lactobacillus plantarum vary? Do these metabolites influence host purine nucleoside metabolism? These areas merit further investigation.

      Thank you! The Supplementary Material link includes intracellular and extracellular metabolomics data for Lactobacillus plantarum, detailing the overall metabolic changes. We agree with the reviewer that the effect of metabolites on host purine nucleoside metabolism is worth investigating, but it is not explored in depth in this paper, which focuses more on the changes in the metabolites of the purine nucleosides themselves. We plan to explore this topic further in future research.

      Reviewer #3 (Public Review):

      Fu et al. present a multi-model study using goose and mouse that investigates the protective effects of Lactobacillus plantarum against hyperuricaemia. They highlight this strain's significance and clarify its role in responding to intestinal nucleoside levels and affecting uric acid metabolism through modulation of host signaling pathways.

      Strengths:

      (1) Fu et al. created two animal models for validation, yielding more reliable and extensive data. In addition, the in vitro tests were repeatedly tested by a multitude of methods, proving to be convincing.

      (2) This study integrates microbiomics, whole genomics, in vitro bacterial culture, and metabolomics, providing a wealth of data and valuable insights for future research.

      We thank the reviewer for their encouraging assessment.

      Weakness:

      Fu et al. clearly described the role of Lactobacillus plantarum, but it is also important to explore its other mechanisms influencing uric acid metabolism in the host. While changes in hepatic and renal uric acid metabolism were confirmed, the gut's role in this process deserves investigation, particularly regarding whether Lactobacillus plantarum or its metabolites act within the gut. The authors have effectively conveyed the story outlined in the article's title, and the remainder can be explored later. In addition, further discussion is needed to highlight how this strain of Lactobacillus plantarum differs from other Lactobacillus strains and how its innovative functions differ from those reported in the literature.

      Thank you! We fully acknowledge the importance of investigating the role of the gut in this process, especially whether Lactobacillus plantarum or its metabolites have an effect within the gut, which would be an interesting topic for a follow-up study. We fully agree that it is crucial to highlight how this Lactobacillus plantarum differs from other strains and those reported in the literature regarding its innovative functions, as discussed in detail in lines 343 to 376. Previous studies indicate that Lactobacillus plantarum can reduce hyperuricaemia, but its specific uric acid-lowering mechanism and the process of nucleoside degradation remain unclear. We investigated the nucleoside hydrolysis function of Lactobacillus plantarum, identified key genes, and validated them by gene knockout. Our findings suggest that host-derived Lactobacillus plantarum plays an antagonistic role against hyperuricaemia.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Theoretical principles of viscous fluid mechanics are used here to assess likely mechanisms of transport in the ER. A set of candidate mechanisms is evaluated, making good use of imaging to represent ER network geometries. Evidence is provided that the contraction of peripheral sheets provides a much more credible mechanism than the contraction of individual tubules, junctions, or perinuclear sheets.

      The work has been conducted carefully and comprehensively, making good use of underlying physical principles. There is a good discussion of the role of slip; sensible approximations (low volume fraction, small particle size, slender geometries, pragmatic treatment of boundary conditions) allow tractable and transparent calculations; clear physical arguments provide useful bounds; stochastic and deterministic features of the problem are well integrated.

      We thank the reviewer for their positive assessment of our work.

      There are just a couple of areas where more discussion might be warranted, in my view.

      (1) The energetic cost of tubule contraction is estimated, but I did not see an equivalent estimate for the contraction of peripheral sheets. It might be helpful to estimate the energetic cost of viscous dissipation in generated flows at higher frequencies.

      This is a good point. We have now included an energetic cost estimate for the contractions of peripheral sheets in the revised manuscript.

      The mechanism of peripheral sheet contraction is unclear: do ATP-driven mechanisms somehow interact with thermal fluctuations of membranes?

      The new energetic estimates in the revision might help constrain possible hypotheses for the mechanism(s) driving peripheral sheet contraction, and suggest if a dedicated ATP-driven mechanism is required.

      (2) Mutations are mentioned in the abstract but not (as far as I could see) later in the manuscript. It would be helpful if any consequences for pathologies could be developed in the text.

      We are grateful for this suggestion. The need to rationalise pathology associated with the subtle effects of mutations of ER-morphogens is indeed pointed out as one factor motivating the study of the interplay between ER structure and performance. In the revised manuscript, we have included a brief discussion potentially linking the malfunction of ER morphogens to luminal transport, referencing freshly published findings.

      Reviewer #2 (Public Review):

      Summary:

      This study explores theoretically the consequences of structural fluctuations of the endoplasmic reticulum (ER) morphology called contractions on molecular transport. Most of the manuscript consists of the construction of an interesting theoretical flow field (physical model) under various hypothetical assumptions. The computational modeling is followed by some simulations.

      Strengths:

      The authors are focusing their attention on testing the hypothesis that a local flow in the tubule could be driven by tubular pinching. We recall that trafficking in the ER is considered to be mostly driven by diffusion at least at a spatial scale that is large enough to account for averaging of any random flow occurring from multiple directions [note that this is not the case for plants].

      We thank the reviewer. Indeed, the trafficking in the ER was historically presumed to be driven by passive diffusion but this has been challenged by recent findings suggesting that the transport may also involve an active super-diffusional component (the short-lasting flows). These findings include: the dependence of ER luminal transport on ATP-derived energy observed in the historical and recent publications cited here; fast and directional single-particle motion; and a linear scaling of photoactivated signal arrival times with distance. On a larger scale, indeed, the motion can be seen as a faster effective diffusion, as there is no persistent circulatory directionality of the currents.

      Weaknesses:

      The manuscript extensively details the construction of the theoretical model, occupying a significant portion of the manuscript. While this section contains interesting computations, its relevance and utility could be better emphasized, perhaps warranting a reorganization of the manuscript to foreground this critical aspect.

      Overall, the manuscript appears highly technical with limited conclusive insights, particularly lacking predictions confirmed by experimental validation. There is an absence of substantial conclusions regarding molecular trafficking within the ER.

      We sought to balance the theoretical/computational details of our model with the biophysical conclusions drawn from its predictions. Given the model's complexity and novelty, it was essential to elucidate the theoretical underpinnings comprehensively, in order to allow others to implement it in the future with additional, or different, parameters. To maintain clarity and focus in the main text, we have judiciously relegated extensive technical details to the methods section or supplementary materials, and divided the text into stand-alone section headings allowing the reader to skip through to conclusions.

      The primary focus of our manuscript is to introduce and explore, via our theoretical model, the interplay between ER structure dynamics and molecular transport. Our approach, while in silico, generates concrete predictions about the physical processes underpinning luminal motion within the ER. For instance, our findings challenge the previously postulated role of small tubular contractions in driving luminal flow, instead highlighting the potential significance of local flat ER areas—empirically documented entities—for facilitating such motion.

      Furthermore, by deducing what type of transport may or may not occur within the range of possible ER structural fluctuations, our model offers detailed predictions designed to bridge the gap between theoretical insight and experimental verification. These predictions detail the spatial and temporal parameters essential for effective transport, delineating plausible values for these parameters. We hope that the model’s predictions will invite experimentalists to devise innovative methodologies to test them. We have introduced text edits to the revised version to clarify the reviewer’s point as per the detailed comments below.

      Recommendations for the authors:

      Editor comments (Recommendations For The Authors):

      The two reviewers have different opinions about the strengths and weaknesses of this work. The editors do believe that this work is a valuable contribution to the field of ER dynamics and transport, and could stimulate experiments.

      We thank both reviewers and the editors for the time and care they have invested in reviewing our manuscript.

      Nevertheless, discussing further the role of diffusion vs. advection in ER luminal transport, including conflicting values of measured diffusion coefficients, would be valuable. For instance, it is possible that the active contraction-driven mechanism results in an effective diffusion over a long time, which could be quantified and compared to experiments.

      In our study we focus on tubule-scale transport because the statistics of transport at this scale have been measured and the origin of the observed transport is an outstanding problem. We already know from Holcman et al. (2018) that transport at the tubule scale involves an active, possibly advective, component beyond passive molecular diffusion. Although we do touch briefly on a network-scale phenomenon in our section on mixing/content homogenisation, our main focus is on trying to understand tubule-scale transport. While we agree that a substantial exploration of effective diffusion over a network scale would be of value and would increase the breadth of our paper, we feel that this is beyond the scope of the current paper. We believe the “conflicting” diffusion coefficients, in fact, characterise motion at different time and length scales: the global diffusion coefficient pointed out to us in the reviews may pertain to network-scale effective diffusion over long time scales, but this is different to the Brownian motion on the scale of tubules/tubular junctions relevant to our in silico model.

      Reviewer #1 (Recommendations For The Authors):

      I congratulate the authors on their work and do not have any substantial further recommendations, beyond two minor points.

      (1) Before (13), say "Using the expression (7) for Q_2, ..."

      (2) Typo on p.25: "principal" rather than "principle" (two instances)

      We thank the reviewer for spotting these and have addressed both points.

      Reviewer #2 (Recommendations For The Authors):

      Here are some specific comments:

      (1) Insufficient Influence of ER Tubule Contraction:

      The conclusion regarding weak fluid flows generated by ER tubule contractions may seem obvious. It would be more intriguing if the authors explored conditions necessary to achieve faster flows, such as those around 20 µm/s, within tubules.

      We agree these are important conditions to explore and it is extensively covered in Fig. 4e-f, which show that tubule contraction sites the length of entire tubules and occurring at 5 and 10 times the experimentally measured rates produce mean average edge traversal speeds of ~30 and 60 µm/s respectively. These pinch parameters seemed unlikely and motivated exploring other conceivable scenarios.

      (2) Limited Impact of ER Network Geometry:

      The comparison across different ER network structures seems insufficiently documented. A comparison between distal and proximal ER from the nucleus could provide deeper insights.

      We have added text in the new paragraph 4 of the introduction to better articulate the core principles of the ER’s structural elements. As established by historical EM and light microscopy, the ER is universally composed of tubules, with 3-way junctions, and small (peripheral) or large perinuclear sheets. We establish that the specific shaping of these elements influences the nanofluidics we investigate here. While the proportion of these elements may vary across different cell types and cellular regions, the fundamental structure, and therefore the impact on local mobility remains consistent. Our categorisation of the ER into its elements reflects these ubiquitous components, allowing us to analyse the impact of shaping at the relevant scale, covering the perinuclear and peripheral ER.

      (3) Ineffectiveness of Tubule Junction Contraction:

      The study's negative result on ER tubule junction contraction's impact on molecular exchange may not capture broad interest without experimental validation. Conducting experiments to test this hypothesis could strengthen the study.

      We agree that experimental testing of this prediction in the future, when appropriate tools become available to correlate molecular motion speed and fast contractions of nanoscopic tubular junctions, will be needed for its validation.

      (4) Potential Role of Peripheral Sheets:

      While the speculation on the contraction of peripheral ER sheets is intriguing, further experimental investigation is warranted to validate this hypothesis, especially considering the observed slow diffusion in ER sheets.

      We agree with the reviewer that our study is theoretical in nature and that further experimental investigation is necessary before a definitive conclusion on peripheral sheets can be made.

      In summary, while the study underscores the complexity of ER morphology dynamics and its implications for molecular transport, its novelty and broad implications seem limited. Given its reliance on computational simulations and dense theoretical language, submission to a computational journal could be more appropriate. In addition, given there is an absence of substantial conclusions regarding molecular trafficking within the ER, publication in a specialized journal of fluid mechanics or physics may be appropriate.

      Comments:

      - The manuscript is hard to read. There is no smooth transition from Figure 1 to Figure 2.

      To smooth the transition, we edited the text at the beginning of the results and added a reference there to the introductory Fig. 1.

      - Figure 8 serves no purpose. To make the text easier, C0, C1, C2... should be presented in Figure 2 and merged with Figure 10 with a table summarizing the information of these networks. It is not clear why 5 networks are needed. They look similar. Could you add the number of nodes per network?

      We have now merged Fig 8 and Fig 10 from the previous version into one figure (which is now Fig 9). We have also added information about the number of nodes and added a sentence in the manuscript to clarify that it showcases the source data used to model/reconstruct realistic ER structures.

      - Figure 13: seems out of context. What is the message? The ER does not show any large flow--from early FRAP and recent photoactivation - the material seems to diffuse at long distances made by few tubules.

      Fig 13 (now Fig 12 in the revised version) does not illustrate any flow. Its purpose is to illustrate the computational methodology used to simulate flows and transport due to contraction of perinuclear sheets. (Note that we have spotted and fixed a small but important typo in the caption: “peripheral” → “perinuclear”.) It is worth noting that FRAP provides a relative estimate of mobility but contains no information as to the mode of motion. Whether the motion is diffusive or otherwise must be presumed in FRAP analyses, and this presumption can then be used to extract metrics such as the diffusion coefficient. Photoactivation analyses suffer from the same limitation, but analysis of how photoactivated signal arrival times scale with distance was recently suggested as a workaround. These measurements suggest a superdiffusive ER transport (https://doi.org/10.1016/j.celrep.2024.114357). Although a different approach to photoactivation signal analysis, used in a recent preprint, suggests that at long distances transport can be approximated as diffusion (https://doi.org/10.1101/2023.04.23.537908), improved measurements will be needed in the future to address the seeming discrepancies.
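
To illustrate the scaling argument, the sketch below estimates the exponent relating arrival time to distance from a log-log fit. It is an illustration under stated assumptions and not the analysis used in the cited works: the function name is hypothetical and the data are synthetic, generated with the diffusion coefficient (0.6 µm^2/s) and speed (20 µm/s) quoted elsewhere in this response. An exponent near 2 corresponds to diffusive spread (time proportional to distance squared), whereas an exponent near 1 corresponds to directed, flow-like transport (time proportional to distance).

```python
import math


def scaling_exponent(distances_um, arrival_times_s):
    """Least-squares slope of log(arrival time) versus log(distance)."""
    xs = [math.log(d) for d in distances_um]
    ys = [math.log(t) for t in arrival_times_s]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free data (assumed values, for illustration only).
distances = [1.0, 2.0, 4.0, 8.0, 16.0]            # micrometres
diffusive = [d ** 2 / (2 * 0.6) for d in distances]  # t = L^2 / (2 D)
advective = [d / 20.0 for d in distances]            # t = L / v
```

Calling `scaling_exponent(distances, diffusive)` and `scaling_exponent(distances, advective)` recovers the two limiting exponents. Real arrival-time data are noisy and cover a limited range of distances, so an intermediate exponent does not by itself identify the mode of transport.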

      - Figure 1: what is the difference between a and b? How do you do your cross-section? This probability needs a drawing at least to understand how you define it.

      We expanded the explanation in the third last paragraph of Section I.

      - Figure 2: this manuscript is not a review. It is not clear why part of a figure is copied and pasted from another manuscript. It should be removed. Are the authors using the quantification [peaks of different color]? Where? The title should be given to explain each panel.

      We have chosen to keep the inset, which was not in the main text of the cited paper but its supplementary information, and provides a direct benchmark for our work.

      Why is the mean flow in panel a stochastic, with large excursions to large values? Could you plot the Fourier spectrum or spectrogram so we can understand the frequencies? Are there regular patterns of bursts?

      The mean (i.e. cross-sectionally averaged) flow is stochastic because the pinching events are random (more precisely, they follow a Poisson distribution, as explained in the paper). Large excursions are rare and caused by interactions between pinches. We have prescribed the distributions of pinch durations and frequencies as per experimentally measured distributions and we do not expect to recover from a Fourier analysis more information than we have prescribed.
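
As a minimal sketch of what random pinching means here, the code below draws pinch start times on a single tubule from a homogeneous Poisson process, i.e. with independent, exponentially distributed waiting times. This is an illustration only, not our simulation code: the function names are hypothetical, the rate is a placeholder rather than the measured value, and the measured distribution of pinch durations is not modelled.

```python
import random


def sample_pinch_times(rate_per_s, duration_s, rng):
    """Start times of pinch events on one tubule, as a Poisson process."""
    times = []
    t = rng.expovariate(rate_per_s)  # waiting time to the first event
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)  # independent exponential gaps
    return times


def counts_per_window(times, duration_s, window_s):
    """Number of events falling into consecutive windows of equal length."""
    n_windows = int(duration_s // window_s)
    counts = [0] * n_windows
    for t in times:
        idx = int(t // window_s)
        if idx < n_windows:
            counts[idx] += 1
    return counts
```

For example, `sample_pinch_times(2.0, 5000.0, random.Random(0))` returns a sorted list of event times, and `counts_per_window` applied to it gives counts whose mean and variance are approximately equal, as expected for Poisson statistics. Because successive events are independent, such a process has no preferred frequency, which is consistent with the statement above that a Fourier analysis would not reveal structure beyond the prescribed distributions.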

      What do we learn from the fit of Fig 2b-c? Is it a constant flow?

      The conclusion of Fig 2b-c is that the in silico simulation model based on the pinching tubule hypothesis produces solute transport, as quantified by the instantaneous particle speeds (Fig 2b) and the average edge traversal speeds (Fig 2c), that is much weaker than experimentally measured. This is one of the main results of our paper and explained in Section IIA, paragraph 3. Fig 2a tells us that the flow is not constant (flows in this system can only be generated transiently, with directionality persistence considered unlikely).

      Figure 14: The estimation of the area is unclear. The legend is largely insufficient. Why did the authors report only nine regions of contraction? Is it so rare? How many samples have they used? Nine among how many?

      Thank you; the details of area estimation are included in the main text, in Section I.4. The nine regions are an arbitrary selection of a sample we deemed representative of this phenomenon.

      - Abstract: this is misleading, it should start by explaining that diffusion is the consensus of trafficking in the ER.

      - "the content motion in actively contracting nanoscopic tubular networks" is misleading. We should recall that this is an assumption that has not been proven.

      The current abstract is a succinct summary of the question in scope and results. The sentence highlighted by the referee specifically refers to the model we study in the paper; we modified it in order to remove any ambiguity and to make clear that we are testing a proposed mechanism. We also point out that although the biological origins of the tubule contractions or their effects on solute transport have not been established, these contractions have been documented.

      Minor comments:

      Introduction: "Thus past measurements indicate that the transport of proteins in ER is not consistent with Brownian motion" is misleading. You should explain that this depends on the time scale. At large timescale, diffusion is a coarse-grained description and is actually accurate from FRAP and photoactivation data [see J. Lippincott-Schwartz publications over the past 20 years]. The super-diffusion [9] "A photoactivation chase technique also measured a superdiffusive behaviour of luminal material spread through the ER network [9]." This is not clear and is probably due to an artifact of measurements or interpretation.

      We thank the reviewer for this comment. We expanded in paragraph 2 of the introduction to better reflect the state of knowledge around this point.

      Page 2: "Strocytes" does not exist; you may have meant "Astrocytes".

      Thank you; typo fixed.

      Page 5: The value of the flow seems incompatible with previous literature ~ 20 mu m/s. Again where 0.6 is found? Where in [7]: if there is no diffusion in the tubule, why compare with 0.6 mu m^2/s? The global diffusion coefficient is much higher ~ 5 mu m^2/s.

      The value of 0.6 µm^2/s is the intranodal diffusion coefficient reported empirically in Supplementary Figure 3b of Holcman et al., Nat Cell Biol, 2018 (Ref. [7] in the unrevised version of our article), for ER in COS-7 cells. Motion inside the tubule would in general consist of a combination of advection and diffusion; since the same fluid occupies the tubules and the junctions, and the junction sizes are similar to the tubule diameters, it is reasonable to take 0.6 µm^2/s as the diffusion coefficient in tubules as well. Holcman et al. (2018) does not mention diffusion inside tubules because (i) the study reports a dominantly advective (or at least active) transport across tubules (the driving mechanism of which remains unknown) but this does not mean diffusion is not there as well; and (ii) the time resolution in these experiments is too low to capture the fine details of solute motion inside tubules, and the transport is captured only as “jumps” between junctions. We point out also that the higher global diffusion coefficient may pertain to network-scale effective diffusion over long time scales, which is different to the Brownian motion at the scale of tubules/tubular junctions relevant to our in silico model.

      Page 5: "The distributions of the average edge traversal speeds appeared insensitive to ER structure variations for both pinching-induced and exclusively diffusion transport." is rather trivial. Similar to "the presumed pinching parameters would be inadequate to facilitate ER luminal material exchange."

      These sentences, and the surrounding text, report the observed outcome of our numerical simulations: pinch-induced transport statistics has little variation across different ER geometries, and pinching does not facilitate luminal content mixing. This conclusion was not clear to us without running the simulation, and hence we deemed it nontrivial and relevant to comment on.

      Page 7: The authors mention that they could measure "typical edge traversal speed of 45 µm/s".

      I am not aware of such a measurement. Could they explain where this number comes from?

      These measurements were reported in Supplementary Figure 3b (bottom right) of Holcman et al. (2018) and are for the ER in a COS-7 cell. The main figure Fig. 2g reports analogous measurements (mean speed 20 µm/s) for a HEK-293 cell. We have worked with the speed measurements for COS-7 cells because the tubule contraction data reported in this work pertain to COS-7 cells.

      A contraction that leads to 3.9 mu m/s over a distance of a few microns would be interesting. Is this a prediction of the present model?

      Yes, as stated in Section II, C paragraph 2. The present model with the experimentally measured averages for the tubule contraction parameters does indeed predict that a particle, in the absence of diffusion, is transported by a single tubule contraction at a maximum speed of 3.9 µm/s over 0.19 µm.
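
To put these two numbers in context, the rough order-of-magnitude sketch below compares advection and diffusion over the same 0.19 µm displacement. It is an illustration only: the function names are hypothetical, the diffusion coefficient of 0.6 µm^2/s is the intranodal value discussed above and is assumed to apply inside tubules, and the maximum speed is treated as if it were sustained over the whole displacement, which overstates the contribution of advection.

```python
def advection_time(distance_um, speed_um_per_s):
    """Time to cover a distance at constant speed, t = L / U."""
    return distance_um / speed_um_per_s


def diffusion_time_1d(distance_um, diff_coeff_um2_per_s):
    """Characteristic one-dimensional diffusion time, t = L^2 / (2 D)."""
    return distance_um ** 2 / (2.0 * diff_coeff_um2_per_s)


def peclet_number(speed_um_per_s, distance_um, diff_coeff_um2_per_s):
    """Ratio of advective to diffusive transport, Pe = U L / D."""
    return speed_um_per_s * distance_um / diff_coeff_um2_per_s


# Values quoted in this response; D inside tubules is an assumption.
U_MAX = 3.9      # um/s, maximum speed from a single contraction
DISTANCE = 0.19  # um, displacement from a single contraction
D_NODE = 0.6     # um^2/s, intranodal diffusion coefficient

t_adv = advection_time(DISTANCE, U_MAX)
t_diff = diffusion_time_1d(DISTANCE, D_NODE)
pe = peclet_number(U_MAX, DISTANCE, D_NODE)
```

Under these assumptions the Péclet number is of order one, so even at its peak a single contraction moves a particle no more effectively than diffusion does over the same short distance.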

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The overall writing is very difficult to follow and the authors need to work on significant re-writing. 

      Thank you for your comment. We have rewritten the text and asked an immunology expert, who is also a native English speaking editor, to review it.

      (2) The paper in its current form really lacks detail and it is NOT possible for readers to repeat or follow their methods. For example: a) It is not clear whether the authors checked the serum to see if the mice were producing antibodies before they sacrificed them to harvest spleen/blood i.e. using ELISA? b) How long after administration of the second dose were the mice sacrificed? c) What cell types are taken for single B cell sorting? Splenocytes or PBMC?

      Thank you for your comment. We have revised the methodology section thoroughly to ensure that the readers can follow and replicate the method. Our responses to the specific examples raised are as follows:

      a) We did not examine the serum titer after immunization. An increased serum titer, as determined by ELISA, does not always reflect the number of cross-reactive B cells because we expected the serum titer to consist of polyclonal antibodies, which are a mixture of PR8-reactive, H2-reactive, and cross-reactive clones. We thus anticipated that we would not obtain enough cross-reactive B cells after a series of immunizations. After comparing various immunization methods, including different adjuvants and immunization sites, using the readout of the number of cross-reactive B cells, we decided to adopt the immunization protocol presented in this paper.

      b) We sacrificed the mice two weeks after the second immunization (see Supplementary Figure 5).

      c) For this experiment, we used CD43 MACS B cells from the spleen purified with negatively charged beads (see Supplementary Figure 6).

      (3) According to the authors, 77 clones were sorted from the PR8+ and H2+ double positive quadrant. It is surprising that after transfection and re-analyzing of bulk antibody presenting EXPI cells on FACS, only 13 clones (or 8 clones? - unclear) seemed to be truly cross-reactive. If that is the case, the approach is not as efficient as the authors claimed.

      Thank you for your comment. To isolate high affinity antibodies, we gated the high fluorescent intensity population of cross-reactive B cells during Ig-expressing 293 cell sorting, as shown in Fig 2B, while we collected a wide intensity population of cross-reactive cells during splenocyte sorting. The narrow gating reduced the number of clones. We cannot quantify how many clones were lost in the process; however, we achieved a cloning efficiency exceeding 75%. To avoid any confusion, we have clarified this point by attaching additional supplementary figures (Supplementary Figures 5 and 6).

      Reviewer #2 (Public Review):

      (4) A His tagged antigen was used for immunization and H1-his was used in all assays. Either the removal of His specific clones needs to be done before selection, or a different tag needs to be used in the subsequent assays.

      Thank you for your comment. As pointed out, the possibility of antibody generation in regions other than HA cannot be ruled out since the immunized antigen and the detection antigen were the same. However, as shown in Table 1, the cross-reactive antibodies obtained in this study exhibited characteristic binding abilities to each of the six types of HA. If these were antibodies recognizing His, they would bind to all six types of HA. This indicates that these cross-reactive antibodies were not His-specific clones.

      We have incorporated information on this potential caveat into the discussion (page 12, lines 4-9).

      (5) This assay doesn't directly test the neutralization of influenza but rather equates viral clearance to competitive inhibition. The results would be strengthened with the demonstration of a functional antibody in vivo with viral clearance.

      Thank you for your constructive comment. While we agree that demonstrating a functional antibody in vivo with viral clearance would strengthen our results, this is outside the scope of the current study and will be the subject of future research.

      (6) Limitations of this new technique are as follows: there is a significant loss of cells during FACS, transfection and cloning efficiency are critical to success, and well-based systems limit the number of possible clones (as the author discussed in the conclusions). Early enrichment of the B cells could improve efficiency, such as selection for memory B cells.

      Thank you for your comment. Our cloning efficiency for sorted B cells exceeded 75%. However, we selected high binders of cross-reactive B cells during Ig-expressing 293 functional screening on purpose, as shown in Figure 2B, while we collected all cross-reactive B cells during B cell sorting (see attached Supplementary Figure 5). This functional selection step reduced the number of clones. We clarified this point by attaching additional supplementary figures (Supplementary Figures 5 and 6).

      Our sorted cross-reactive B cells are most likely CD38+ memory B cells, as shown in Supplementary Figure 6.

      Reviewer #1 (Recommendations For The Authors):

      a) It is advised for the authors to provide a flow chart with time stamps to prove the many statements made in the paper. For example, it is stated that "we demonstrated efficient isolation of influenza cross-reactive antibodies with high affinity from mouse germinal B cells over 4 days". It is not clear how this was calculated.

      Thank you for your comment. We have prepared a time-stamped flow chart (Supplementary Figure 5).

      b) The papers cited by the authors are relatively old if not outdated. There are many papers published focusing on efficient isolation of mAbs for SARS-CoV-2 research. For example, the paper by Lima et al (Nat Comm 2022, 13:7733) used a very similar strategy for rapid isolation of cross-reactive mAbs by FACS sorting followed by cloning of paired heavy and light chains from single B cells. The authors need to incorporate citations from the latest publications in this field.

      Thank you for your comment. The paper by Lima et al. (Nat Comm 2022, 13:7733) has been cited in the Discussion as ref 28.

      c) Figure 2 needs much more detail for readers to follow.

      Thank you for your comment. We have revised the legend of Figure 2 accordingly and added additional supplementary figures (Supplementary Figures 5 and 6) to increase clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work sought to demonstrate that gut microbiota dysbiosis may promote the colonization of mycobacteria, and they tried to prove that Nos2 down-regulation was a key mediator of such gut-lung pathogenesis transition.

      Strengths:

      They did large-scale analysis of RNAs in lungs to analyze the gene expression of mice upon gut dysbiosis in MS-infected mice. This might help provide an overview of gene pathways and critical genes for lung pathology in gut dysbiosis. This data is somewhat useful and important for the TB field.

      Weaknesses:

      (1) They did not use a wild-type Mtb strain (e.g. H37Rv) to develop mouse TB infection models, and this may lead to the failure of the establishment of TB granuloma and other TB pathology icons.

      The colonization of M.tb in the lungs, and the extent of that colonization, are the first and primary conditions for the occurrence of TB. Our aim in this study is to explore the impact of gut microbiota dysbiosis on the colonization of M.tb in the lungs. However, because our laboratory lacks the necessary biosafety conditions, highly infectious bacteria such as M.tb may not be cultured there, and establishing an M.tb infection animal model in our laboratory would not meet biosafety requirements. Hence, we used M. smegmatis (MS), a model strain for M.tb, and established an animal infection model to explore the effect of gut microbiota dysbiosis on MS colonization in mouse lungs. However, the MS infection model may not produce typical TB granulomas and other signs of TB pathology. We have discussed the limitations of the current study in the Discussion section of the manuscript. The revisions are shown in lines 21-39 of page 15. In future studies, we plan to adopt the reviewer's suggestion and use a wild-type M.tb strain to establish a TB infection model in a laboratory that meets biosafety standards, to further verify the results of the current study.

      (2) The usage of in vitro assays based on A549 to examine the regulation function of Nos2 expression on NO and ROS may not be enough. A549 is not the primary Mtb infection target in the lungs.

      Thanks for the reviewer’s comments. Although alveolar epithelial cells (AECs) are not the main target cells of M.tb infection, they are among the first cells contacted during infection. Early M.tb invasion of AECs is essential for the establishment of infection (PMID 11479618). AECs are usually the initial site of the lung’s response against M.tb. Available literature suggests that freshly isolated AECs are more permissive to M.tb growth than macrophages (PMID 33228849). As a cellular reservoir for M.tb, AECs are capable of facilitating rapid bacterial growth while potentially escaping recognition by phagocytes in the alveolus. Immune cells such as macrophages are the primary targets of M.tb infection, in which M.tb survives and proliferates, leading to the formation and maintenance of granulomas. However, AECs are subjected to the same density of infection, and the bacteria invade and replicate in these cells and induce cell apoptosis and necrosis, which is considered a major mechanism implicated in extra-pulmonary dissemination (PMID 12925134, PMID 32849525). Besides their direct barrier role, AECs also directly respond to M.tb infection by producing mediators such as cytokines, chemokines, and antimicrobial agents (PMID 35017314). Therefore, it is feasible to use the alveolar epithelial cell line A549 to explore in vitro the mechanism by which the intestinal microbiota affects M.tb colonization.

      (3) They did not examine the lung pathology upon gut dysbiosis to examine the true significance of increased colonization of Mtb.

      We have added the results of the lung pathological section in the revised manuscript. The results of lung pathological sections are shown in lines 11-13 of page 4, and Figure S2 of supplement information.

      (4) Most of the studies are based on MS-infected mouse models with a lack of clinical significance.

      The first and primary condition of any pathogen infection is that the bacteria must invade the host through colonization and multiply in the target organ. This study aimed to investigate the effect of intestinal microbial dysbiosis on the colonization of mycobacteria in mouse lungs. Our laboratory does not meet the biosafety standard for culturing highly infectious bacteria such as Mycobacterium tuberculosis, so we used Mycobacterium smegmatis as a model strain for M.tb to establish the infected mouse model in the current research. Although M. smegmatis is generally considered nonpathogenic, it is closely related to M.tb in biochemical characteristics, genetic information, cell structure, and metabolism (PMID 32674978). M. smegmatis is regarded as a valuable model organism in the study of M.tb and has been widely used to explore the biological characteristics of M.tb, such as physiological state, stress response, non-culture state reactivation, antimicrobial activity, and biochemical protection (PMID 32674978). It has also been reported that M. smegmatis can be used as a model strain to study the molecular mechanism of interaction between M.tb and its host (PMID 30546046, PMID 25970481, PMID 29568875). In this preclinical study, rather than focusing on the pathological changes caused by M. smegmatis in the host lungs, we mainly focused on the influence of the intestinal microbiota on the colonization of mycobacteria in the lungs and its possible mechanism. This provides a reliable model for studying the prevention of early infection and spread of M.tb through regulation of the intestinal microbiota, which has important clinical significance for the further development of new measures for the prevention and control of tuberculosis. If experimental conditions permit, an infection model with wild-type M.tb can be established to verify the findings of the present study, which may provide important clinical guidance.

      Reviewer #2 (Public Review):

      The manuscript entitled "Intestinal microbiome dysbiosis increases Mycobacteria pulmonary colonization in mice by regulating the Nos2-associated pathways" by Han et al reported that using clindamycin, an antibiotic to selectively disorder anaerobic Bacteroidetes, intestinal microbiome dysbiosis resulted in Mycobacterium smegmatis (MS) colonization in the mouse lungs. The authors found that clindamycin induced damage of the enterocytes and gut permeability and also enhanced the fermentation of cecum contents, which finally increased MS colonization in the mouse lungs. The study showed that gut microbiota dysbiosis up-regulated the Nos2 gene-associated pathways, leading to increased nitric oxide (NO) levels and decreased reactive oxygen species (ROS) and β-defensin 1 (Defb1) levels. These changes in the host's immune response created an antimicrobial and anti-inflammatory environment that favored MS colonization in the lungs. The findings suggest that gut microbiota dysbiosis can modulate the host's immune response and increase susceptibility to pulmonary infections by altering the expression of key genes and pathways involved in innate immunity. The authors reasonably provided experimental data and subsequent gene profiles to support their conclusion. Although the overall outcomes are convincing, there are several issues that need to be addressed:

      (1) In Figure S1, the reviewer suggests checking the image sizes of the pathological sections of intestinal tissue from the control group and the CL-treatment group. When compared to the same intestinal tissue images in Figure S4, they do not appear to be consistently magnified at 40x. The numerical scale bars should be presented instead of just magnification such as "40x".

      Thanks for the precise comments. We have carefully checked the pathological section in Figure S1 and Figure S5 and added the numerical scale bars to the figure. The revised sections are added in the supplementary materials.

      (2) In Figure 4d, the ratio of Firmicutes in the CL-FMT group decreased compared to the CON-FMT group, whereas the CL-treatment group showed an increase in Firmicutes compared to the Control group in Figure 3b. The author should explain this discrepancy and discuss its potential implications on the study's findings.

      The success of fecal microbial transfer (FMT) is influenced by many factors, such as host intestinal microbiota, immunity, and genetic factors (PMID 37167953). During the FMT procedure, not all microbiota in the donor feces have the same colonization ability in the recipients. Some research has revealed that the colonization success rate of Bacteroidetes is higher than that of Firmicutes (PMID 24637796). In this study, the difference between Figure 4D and Figure 3B arose because, during FMT, the colonization of Firmicutes decreased in the CL-FMT recipients after transplantation while the colonization of Bacteroidetes increased, resulting in a lower Firmicutes/Bacteroidetes ratio in the CL-FMT group. However, we considered the gut microbiota as a whole in the present study. After FMT, we found that 85.11% of bacterial genera and 52.38% of fungal genera present in the CL inocula were successfully transferred to the CL-recipient mice, and 91.45% of bacterial genera and 56.36% of fungal genera in the CON inocula were successfully transferred to the CON-recipient mice (Figure 4g). The trans-kingdom network analyses between bacteria and fungi showed that the trends of the gut microbiome in recipient mice were consistent with those in the donor mice. Therefore, we consider the FMT model established in this study to be successful. For the reviewer's clarification, we have added explanations in the Discussion section of the manuscript. See lines 8-29 of page 12 for details.

      (3) In Figure 6, did the authors have a specific reason for selecting Nos2 but not Tnf for further investigation? The expression level of the Tnf gene appears to be the most significant in both RT-qPCR and RNA-sequencing results in Figure 5f. Tnf is an important cytokine involved in immune responses to bacterial infections, so it is also a factor that can influence NO, ROS, and Defb1 levels.

      Thanks for the reviewer’s valuable comment. By analyzing the transcriptome data, we found 8 genes strongly associated with TB infection in the KEGG pathway: Nos2, Cd14, Tnf, Cd74, Clec4e, Ctsd, Cd209a, and Il6. We then performed KO pathway analysis and found that the Nos2 gene was strongly associated with multiple pathways, including "cytokine activity", "chemokine activity", and "nitric oxide synthase binding". Moreover, in a clinical study on tuberculosis, the expression level of Nos2 in the plasma of patients with newly diagnosed tuberculosis was significantly higher than that of healthy people, indicating that Nos2 is associated with the occurrence of tuberculosis (PMID 34847295). Therefore, we selected Nos2 as the main target gene in the current study to conduct the correlation pathway analysis. As an important cytokine involved in the immune response to bacterial infection, Tnf, as the reviewer notes, may also be a factor affecting the levels of NO, ROS, and Defb1, which provides a new direction for our future research.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      First, they need to use a true Mtb-infected mouse model to determine the relationship between gut dysbiosis and increased lung infection of Mtb.

      Second, the mechanism of Nos2-mediated NO and ROS production needs to be further analyzed in the real Mtb infection process (either in vivo or in vitro).

      Third, lung pathology should be included in addressing the increased colonization of mycobacteria. Addressing these problems may help improve this work.

      (1) Our laboratory does not meet the biosafety standard for culturing highly infectious bacteria such as Mycobacterium tuberculosis, so we used Mycobacterium smegmatis as a model strain for M.tb to establish the infected mouse model in the current research. Although M. smegmatis is generally considered nonpathogenic, it is closely related to M.tb in biochemical characteristics, genetic information, cell structure, and metabolism (PMID 32674978). M. smegmatis is regarded as a valuable model organism in the study of M.tb and has been widely used to explore the biological characteristics of M.tb, such as physiological state, stress response, non-culture state reactivation, antimicrobial activity, and biochemical protection (PMID 32674978). It has also been reported that M. smegmatis was used as a model strain to study the molecular mechanism of interaction between M.tb and its host (PMID 30546046, PMID 25970481, PMID 29568875). In this preclinical study, we mainly focused on the influence of the intestinal microbiota on the colonization of mycobacteria in the lungs and its possible mechanism, which provides a reliable model for studying the prevention of early infection and spread of M.tb through regulation of the intestinal microbiota.

      (2) In the future, we will establish an infection model with wild-type M.tb to verify the mechanism by which Nos2-mediated NO and ROS production promotes M.tb colonization.

      (3) We have added the results in the lung pathological section in the revised manuscript. The results of lung pathological sections are shown in lines 11-13 of page 4, and Figure S2 of supplement information.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The exclusive use of males is a major concern lacking adequate justification and should be disclosed in the title and abstract to ensure readers are aware of this limitation. With several reported sex differences in rat vocal behaviors this means caution should be exercised when generalizing from these findings. The occurrence of an estrus cycle in typical female rats is not justification for their exclusion. Note also that male rodents experience great variability in hormonal states as well, distinguishing between individuals and within individuals across time. The study of endocrinological influences on behavior can be separated from the study of said behavior itself, across all sexes. Similarly, concerns about needing to increase the number of animals when including all sexes are usually unwarranted (see Shansky [2019] and Phillips et al. [2023]).

      As suggested by the Reviewer, we have disclosed the use of males in the title and the abstract. Also, we have added the statement that research on female rat subjects is required: “Here we are showing introductory evidence that 44-kHz vocalizations are a separate and behaviorally-relevant group of rat ultrasonic calls. These results require further confirmations and additional experiments, also in form of repetition, including research on female rat subjects.”

      Regarding the analysis where calls were sorted using DBSCAN based on peak frequency and duration, my comment on the originally reviewed version stands. It seems that the calls are sorted by an (unbiased) algorithm into categories based on their frequency and duration, and because 44kHz calls differ by definition on frequency and duration the fact that the algorithm sorts them as a distinct category is not evidence that they are "new calls [that] form a separate, distinct group". I appreciate that the authors have softened their language regarding the novelty and distinctness of these calls, but the manuscript contains several instances where claims of novelty and specificity (e.g. the subtitle on line 193) is emphasized beyond what the data justifies.

      We further softened our language regarding the novelty and distinctness of 44-kHz vocalizations, including the aforementioned subtitle. In response, however, we would like to bring to the readers’ attention that all major groups of calls, i.e., long 22-kHz calls, short 22-kHz calls, and 50-kHz vocalizations, are also defined in our manuscript and in the literature by their frequency and duration. Yet none of these groups except the 44-kHz vocalizations was identified separately by DBSCAN clustering. If the 44-kHz calls were not a distinct group, we would expect the 44-kHz and 50-kHz vocalizations to blend first (because of their similar frequencies) or the 44-kHz and 22-kHz calls to merge first (because of their similar durations), but they do not in this unbiased examination.
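
      The clustering argument above can be illustrated with a minimal, self-contained sketch of density-based clustering on two features (peak frequency and duration). Everything in it is an assumption made for illustration: the call values are synthetic toy numbers rather than data from the study, and the min-max scaling, `eps`, and `min_pts` are hypothetical choices, not the parameters used in the manuscript.

```python
import math

def min_max_scale(points):
    """Rescale every feature to [0, 1] so kHz and ms contribute comparably."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [tuple((v - lo) / sp for v, lo, sp in zip(p, lows, spans))
            for p in points]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it.
    """
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Synthetic toy calls as (peak frequency in kHz, duration in ms).
calls = [
    (22, 1000), (23, 1100), (21, 900), (24, 1200), (22, 800),  # long, low
    (50, 30), (55, 40), (60, 25), (52, 50), (58, 35),          # short, high
    (44, 900), (43, 1000), (45, 1100), (44, 800), (42, 950),   # long, mid
]

labels = dbscan(min_max_scale(calls), eps=0.2, min_pts=3)
for call, label in zip(calls, labels):
    print(call, "-> cluster", label)
```

      In this toy setting the third group forms its own cluster because it is far from the first group in frequency and far from the second in duration. Whether real recordings behave this way depends on the data and on the chosen parameters.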

      The behavioral response to call playback is intriguing, although again more in line with the hypothesis that these are not a distinct type of call but merely represent expected variation in vocalization parameters. Across the board animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls. This does raise interesting questions about how, ethologically, animals may interpret such variation and integrate this interpretation in their responses. However, the categorical approach employed here does not address these questions fully.

      This paragraph is exactly the same as in the previous review. There was no comment regarding our previous answer. Here is the previous answer:

      “We are unsure of the Reviewer’s critique in this paragraph and will attempt to address it to the best of our understanding. Our finding of up to >19% of long, seemingly aversive 44-kHz calls, at a frequency in the defined appetitive ultrasonic range (usually >32 kHz), is unexpected rather than “expected”. We would agree that aversive call variation is expected, but not in the appetitive frequency range.

      Kindly note the findings by Saito et al. (2019), which claim that frequency band plays the main role in rat ultrasonic perception. It is possible that the higher peak frequency of 44-kHz calls may be a strong factor in their perception by rats, which is, however, modified by the longer duration and the lack of modulation. 

      Also, from our experience, it is quite challenging to demonstrate different behavioral responses of naïve rats to pre-recorded 22-kHz (aversive) vs. 50-kHz (appetitive) vocalizations. Therefore, we would expect demonstrating a difference in response to two distinct, potentially aversive calls, i.e., 22-kHz vs. 44-kHz calls, to be even more difficult (to our knowledge, a comparable experiment with short vs. long 22-kHz ultrasonic vocalizations has not been done before).

      Therefore, we do not take lightly the surprising and interesting finding that “animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls”. We would rather put this description in analogous words: “the rats responded similarly to hearing 44-kHz calls as they did to hearing aversive 22-kHz calls, especially regarding heart-rate change, despite the 44-kHz calls occupying the frequency band of appetitive 50-kHz vocalizations” and “other responses to 44-kHz calls were intermediate, they fell between response levels to appetitive vs. aversive playback” – which we added to the Discussion.

      Finally, we acknowledge that our findings do not present a finite and complete picture of the discussed aspects of behavioral responses to the presented ultrasonic stimuli (44-kHz vocalizations). Therefore, we have incorporated the Reviewer’s suggestion in the discussion. The added sentence reads: “Overall, these initial results raise further questions about how, ethologically, animals may interpret the variation in hearing 22-kHz vs. 44-kHz calls and integrate this interpretation in their responses.”

      I appreciate the amendment in discussing the idea of arousal being the key determinant for the increased emission of 44kHz, and the addition of other factors. Some of the items in this list, such as annoyance/anger and disgust/boredom, don't really seem to fit the data. I'm not sure I find the idea that rats become annoyed or disgusted during fear conditioning to be a particularly compelling argument. As such the list appears to be a collection of emotion-related words, with unclear potential associations with the 44kHz calls.

      We agree that most of the factors listed are not supported by the data. These are hypotheses and speculations only – hence, an assumption / tentative statement, i.e., “It could also be argued that…”. We have changed it into “It could also be speculated that…”.

      Later in the Discussion the authors argue that the 44kHz aversive calls signal an increased intensity of a negative valence emotional state. It is not clear how the presented arguments actually support this. For example, what does the elongation of fear conditioning to 10 trials have to do with increased negative emotionality? Is there data supporting this relationship between duration and emotion, outside anthropomorphism? Each of the 6 arguments presented seems quite distant from being able to support this conclusion.

      We have added a description summarizing the literature that expounds the differences in employing one-two vs. five-ten foot-shocks during fear-conditioning training. It says:

      “Importantly, it has been demonstrated multiple times that training rats with several electric foot-shocks (i.e., 5-10 shocks) produces a qualitatively different kind of fear-memory compared to training with only 1-2 shocks. Training with more numerous shocks has been shown to result in augmented freezing (e.g., Fanselow and Bolles, 1979, Haubrich et al., 2020, Haubrich and Nader, 2023, Poulos et al., 2016, Wang et al., 2009) which reflects a more intense fear-memory that is resistant to extinction (Haubrich et al., 2020, Haubrich and Nader, 2023), resistant to reconsolidation blockade (Haubrich et al., 2020, Wang et al., 2009, Finnie and Nader, 2020), associated with downregulation of NR2B NMDA-receptor subunits as well as elevated amyloid-beta concentrations in the lateral and basal amygdala (Finnie and Nader, 2020, Wang et al., 2009). Additionally, it involves activation of the noradrenaline-locus coeruleus system (Haubrich et al., 2020) and collective changes in connectivity across multiple brain regions within the neural network (Haubrich and Nader, 2023). 

      Notably, it has also been shown that higher freezing as a result of fear-conditioning training correlates with increased concentrations of the stress hormone corticosterone in the blood (Dos Santos Correa et al., 2019). The rats subjected to 6- and 10-trial fear conditioning, whose results are reported herein (Tab. 1/Exp. 2/#7,8,11,12; n = 73), also demonstrated higher freezing than rats subjected to 1-trial conditioning (Tab. 1/Exp. 2/#6,10; n = 33), which is reported elsewhere (Olszynski et al., 2021, Fig. S1C-E; Olszynski et al., 2022, Fig. S1D-G). Therefore, we postulate that emission of 44-kHz calls is associated with increased stress and the training regime forming robust memories.”

      In sum, rather than describing the 44kHz long calls as a new call type, it may be more accurate to say that sometimes aversive calls can occur at frequencies above 22 kHz. Individual and situational variability in vocalization parameters seems to be expected, much more so than all members of a species strictly adhering to extremely non-variable behavioral outputs.

      This paragraph is exactly the same as in the previous review. There was no comment regarding our previous answer. Here is the previous answer:

      “The surprising fact that there are presumably aversive calls that are beyond the commonly applied thresholds, i.e., >32 kHz, while sharing some characteristics with 22-kHz calls, is the main finding of the current publication. Whether they be finally assigned as a new type, subtype, i.e. a separate category or become a supergroup of aversive calls with 22-kHz vocalizations is of secondary importance to be discussed with other researchers of the field of study. 

      However, we would argue – by showing a comparison – that 22-kHz calls occur at durations of <300 ms and also >300 ms, and are, usually, referred to in literature as short and long 22-kHz vocalizations, respectively (not introduced with a description that “sometimes 22-kHz calls can occur at durations below 300 ms”). These are then regarded and investigated as separate groups or classes usually referred to as two different “types” (e.g., Barker et al., 2010) or “subtypes” (e.g., Brudzynski, 2015). Analogously, 44-kHz vocalizations can also be regarded as a separate type or a subtype of 22-kHz calls. The problem with the latter is that 22-kHz vocalizations are traditionally and predominantly defined by 18–32 kHz frequency bandwidth (Araya et al., 2020; Barroso et al., 2019; Browning et al., 2011; Brudzynski et al., 1993; Hinchcliffe et al., 2022; Willey & Spear, 2013).”

      Reviewer #1 (Recommendations For The Authors):

      Additional considerations:

      Abstract: The 19.4% seems to be the percentage of 44 kHz calls observed during the 9th trial of the 10-trial experiment, not the percentage of calls that were 44kHz during bouts of freezing.

      We clarified the sentence. It now says:

      “We observed 44-kHz calls to be associated with freezing behavior during fear conditioning training, during which they constituted up to 19.4% of all calls and most of them appeared next to each other forming groups of vocalizations (bouts).”

      Abstract: "We hope that future investigations of 44-kHz calls in rat models of human diseases will contribute to expanding our understanding and therapeutic strategies related to human psychiatric conditions." This sounds like a far too strong of an implication provided the link between these calls and models of human psychiatric conditions is not clear.

      We agree, the link is not clear. Therefore we only express our hope. We hope “the link” is there. While other ultrasonic calls are already being investigated in such animal models, training regimes employing numerous electric shocks are used as models of PTSD, helplessness etc.

      Line 101: Seems a strong assumption to state the authors of the other publication were inspired by this paper, unless there is personal communication corroborating this.

      The wording of the sentence has been changed.

      It is still not clear why both Friedman and Wilcoxon tests were used, especially in situations where only one result seems to be referenced (for example on line 108-109).

      We added the explanation within Methods: “In particular, the Friedman test was used to assess the presence of change within the sequence of several ITI, while the Wilcoxon test was used for the difference between the first and the last ITI analyzed.”
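
      The division of labor between the two tests can be sketched as follows. This is a minimal illustration under stated assumptions: the call counts are synthetic toy values rather than data from the study, the function names are hypothetical, and only the test statistics are computed (without tie correction or p-values). In practice a statistics package, for example `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon`, would supply the p-values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def friedman_statistic(rows):
    """Friedman chi-square statistic (no tie correction).

    rows[i][j] is the measurement for subject i in condition j; values are
    ranked within each subject and rank sums are compared across conditions.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return (12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums)
            - 3.0 * n * (k + 1))

def wilcoxon_statistic(first, last):
    """Wilcoxon signed-rank statistic: the smaller of the two signed rank sums.

    Pairs with a zero difference are dropped before ranking.
    """
    diffs = [b - a for a, b in zip(first, last) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)

# Synthetic toy data: one row per rat, one column per ITI.
calls_per_iti = [
    [1, 2, 3],
    [2, 3, 5],
    [0, 4, 6],
    [1, 3, 4],
]

# Friedman: is there any change across the whole sequence of ITIs?
q = friedman_statistic(calls_per_iti)

# Wilcoxon: do the first and the last ITI differ?
first_iti = [row[0] for row in calls_per_iti]
last_iti = [row[-1] for row in calls_per_iti]
w = wilcoxon_statistic(first_iti, last_iti)

print("Friedman statistic:", q)
print("Wilcoxon statistic:", w)
```

      The Friedman statistic uses every ITI in the sequence, whereas the Wilcoxon statistic only compares the two endpoints, which is why the two tests answer different questions.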

    1. Author response:

      We sincerely appreciate the positive assessment regarding the significance of our study, as well as the valuable suggestions provided by the editor and the reviewers.

      In response to the reviewers’ comments, we will modify the manuscript to include co-staining of CD66b and GSDMD in the whole skin samples of clinical patients, which will further clarify the expression of GSDMD in neutrophils.

      Additionally, we plan to conduct further analyses using publicly available data to elucidate the changes in neutrophil pyroptotic signaling in tissue from IMQ-induced psoriatic mice, thereby strengthening our conclusions about the role of neutrophil pyroptosis in the progression of psoriasis.

      Moreover, while our research primarily focuses on the role of neutrophil pyroptosis in psoriasis, this does not conflict with existing reports indicating that KC cell pyroptosis also contributes to disease progression. Both studies underscore the significant role of GSDMD-mediated pyroptotic signaling in psoriasis, and the consistent involvement of KC cells and neutrophils further emphasizes the potential therapeutic value of targeting GSDMD signaling in psoriasis treatment. We will expand upon this discussion in the revised manuscript.

      In our model, to accurately assess the disease condition in mice, we standardized the drug treatment area on the dorsal side (2*3 cm). Therefore, the area was not factored into the scoring process, and we will include a detailed description of this in the revised manuscript.

      Regarding the downregulation of CASP in GSDMD KO mouse skin tissue, existing studies indicate that GSDMD generates a feed-forward amplification cascade via the mitochondria-STING-Caspase axis (PMID: 36065823, DOI: 10.1161/HYPERTENSIONAHA.122.20004). We hypothesize that the absence of GSDMD attenuates STING signaling’s activation of Caspase.

      Furthermore, in the revised manuscript, we will address the reviewers’ other comments to enhance the manuscript quality, such as providing further clarification on relevant issues in the discussion section, refining the key experiments in the methods section, and adding details about the antibodies used, including their associated clones and catalog numbers, as well as including sample sizes (n numbers) in the figure legends.

      We believe that the new data and further discussions and clarifications included in the revised manuscript will adequately address all the concerns raised by the reviewers and better support our conclusions.

      Finally, we would like to express our gratitude once again to the editor and reviewers for their invaluable feedback on this work!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Comment 1. Clinical Data on Patient Brain Samples: The inclusion of specific details such as postmortem intervals and the age at disease onset for patient brain samples would be valuable. These factors could significantly affect the quality of the tissues and their relevance to the study. Moreover, given the large variation in disease duration between PD and PDD, it’s important to consider disease duration as a potential confounding factor, especially when concluding that PDD patients have a more severe form of synucleinopathy compared to PD.

      We thank the reviewer for this valuable comment. We have included the post-mortem interval (PMI) and age of death in Table S1, showing the clinicopathological information. Changes on page 16. As suggested by the reviewer, we included the discussion on the large variation in disease duration between PD and PDD cases. We noted that DLB cases also have shorter disease durations but still demonstrate seeding kinetics similar to PDD. Therefore, we hypothesise that the molecular differences we observed between different diseases were due to the strain properties or higher pathological load (seen in both PDD and DLB) and are unlikely due to the disease duration. Changes on pages 9-11, lines 204-212.

      Comment 2. Inclusion of Healthy Controls in Multiple Tests: Given the importance of healthy controls in scientific studies, especially those involving human brain samples, the authors could consider using healthy controls in more tests to strengthen the robustness of the findings. Expanding the use of healthy controls in biochemical profiling and phosphorylation profiles would provide a better basis for comparison and clarify the significance of results in a disease context. This will help the authors to elaborate on the interpretation of results, for example, in Figure 3, where the authors claim that PD brains show mostly monomeric αSyn forms (line 119 and 120, and also in 222 and 223). Whether it implies the absence of alpha-syn pathology in PD brains? If there are differences from healthy controls? What are these low molecular weight bands (<15 kD) (line 125-126) and whether they are also present in healthy controls? Also, we do not have a perfect pS129-specific (anti-pαSyn) antibody. They are known for non-specific labeling. Investigating the phosphorylation levels in healthy controls and comparing them to PD brains, especially considering the predominance of monomeric (healthy αSyn?) in PD brains, would help clarify the observed changes.

      We agree with the reviewer’s assessment and consider this an important suggestion. We performed biochemical profiling and immunogold imaging with the three HC cases and presented the results in Figure 4. aSyn in healthy controls was completely digested by PK. The low MW bands were absent in PD and HC, and there was no difference in the PK profiles. However, this may be due to the low pathology load and amount of pathological aSyn in the selected PD brains. Additional comments were added to the results. Changes are on pages 4 (lines 136-137) and page 7 (Figure 4).

      Comment 3. Age of Healthy Controls: Providing information about the age at death for healthy controls is crucial, as age can impact the accumulation of aSyn. Also include if the brain samples were age-matched, or analyses were age-adjusted.

      We have described the age of each patient, and the analyses were age-adjusted. Changes on page 16 (Table S1).

      Comment 4. Braak Staging Discrepancy: The study reports the same Braak staging for both PD and PDD, despite the significant difference in disease duration. Maybe other reviewers with clinical experience might have a better take on this. This observation merits discussion in the paper, allowing readers to better understand the implications of this finding.

      Addressed: Our PD and PDD cases are Braak stage 6, indicating that the LB pathology had progressed to the neocortex. It’s important to note that Braak stage represents only where the LB pathology has spread and does not indicate anything about the load of LBs. However, our immunohistochemistry results (page 20) show that PDD demonstrates a higher LB load than PD cases in the entorhinal cortex. As the reviewer has suggested, this comment has been amended in the manuscript. Changes on pages 9-11, lines 204-212.

      Comment 5. Citation of Relevant Studies: The paper should consider citing and discussing a recent celebrated study on PD biomarkers that used thousands of cerebrospinal fluid (CSF) samples from different PD patient cohorts to demonstrate the effectiveness of SAA as a biochemical assay for diagnosing PD and its subtypes.

      As suggested by the reviewer, we included this study in the discussion. Changes on page 12, lines 275-278.

      Reviewer 3 (Public Review):

      The experiments are missing two important controls. 1) what do fibrils generated by different in vitro fibril preparations made from recombinant synuclein protein look like; and 2) the use of CSF from the same patients whose brain tissue was used to assess whether CSF and brain seeds look and behave identically. The latter is perhaps the most important question of all - namely how representative are CSF seeds of what is going on in patients’ brains?

      We thank the reviewers for this valuable comment. Although in vitro preformed fibrils (PFFs) made from recombinant aSyn are still important sources for cellular and animal studies to generate disease models and investigate mechanisms, many studies have now turned to human brain-amplified fibrils, considering them to represent the human structure more closely. Therefore, our study was designed to specifically address this hypothesis by comparing human-derived and SAA-amplified fibrils. It would be interesting to compare these structures also to PFFs, but this was beyond the scope of our study. Comparing the CSF and brain seeds from the same patients would be very interesting indeed but also difficult, as this would require biosample collection during life followed by brain donation. The SAA cannot be done from the PM CSF due to contamination with blood. However, we are in a privileged position to examine such a comparison soon with our longitudinal Discovery cohort, where some participants have donated their brains. These future studies will address the critical question of whether the CSF seeds reflect those in the brain.

      In their discussion the authors do not comment on the obvious differences in the conditions leading to the formation of seeds in the brain and in the artificial conditions of the seeding assay. Why should the two sets of conditions be expected to yield similar morphologies, especially since the extracted fibrils are subjected to harsh conditions for solubilization and re-suspension.

      We agree with the reviewer that the formation of seeds in the brain and the SAA reaction conditions are very different, and one would not expect similar fibrillar morphologies. However, the theory is that pathological seeds are known to amplify through templated seeding, where seeds copy their intrinsic properties to the growing SAA fibrils. Thus, numerous studies use the SAA fibrils as model fibrils to investigate the different aSyn strains. Our study aimed to test whether the SAA fibrils are representative models of the brain fibrils. We included a more explicit comment on this discussion. Changes on page 3, lines 78-83.

      Finally, the key experiment was not performed - would the resultant seeds from SAA preparations from the different nosological entities produce different pathologies when injected into animal brains? But perhaps this is the subject of a future manuscript.

      We agree this is an essential experiment to build on our conclusion. Animal studies would be imperative to assess whether the SAA fibrils reflect the brain fibrils’ toxicity. However, these were beyond the scope of the present study but are being performed in collaboration with some expert groups.

      Furthermore, the authors comment on phosphorylation patterns, stating that the resultant seeds are less heavily phosphorylated than the original material. Again, this should not be surprising, since the SAA assay conditions are not known to contain the enzymes necessary to phosphorylate synuclein. The discussion of PTMs is limited to pS-129 phosphorylation. What about other PTMs? How does the pattern of PTMs affect the seeding pattern?

      We agree with the reviewer that other PTMs should be explored, but this was beyond the scope of this study. Here, we could focus on pS129, which has multiple reliable antibodies that also work with immunogold-TEM.

      Lastly, the manuscript contains no data on how the diagnostic categories were assigned at autopsy. This information should be included in the supplementary material.

      Clinical and neuropathological diagnostic criteria are now included in Table S1. Changes on page 16, lines 448-461.

      Reviewer 1 (Recommendations for the authors):

      (1) Remove a duplicate sentence in line 94-96.

      Addressed: Thank you for pointing this out. The duplicated sentence has been corrected. Changes are on page 4, lines 105-106.

      (2) Figure 1 Placement of Healthy Controls: Moving the graph representing healthy controls from the supplementary materials to the main figures could help readers better appreciate the results of diseased states.

      The healthy control SAA curves were moved to the main figure. Changes are on page 5, Figure 2.

      (3) Commenting on Case 2 Healthy Control: In the discussion section, you may comment on the case of the healthy control that showed amplification towards the end. While definitive conclusions may be challenging, acknowledging the possibility of incidental Lewy bodies or the prodromal phase of the disease would add depth to the analysis? But make sure to include the age information for healthy controls.

      We believe this is an important point to discuss in the manuscript. We have referenced other studies with similar observations and stated that it is currently unknown what this phenomenon reflects (page 11, lines 221-226). The age information of the healthy control subjects was added to Table S1.

      (4) Figure S3 Clarity: To enhance the clarity of Figure S3, consider adding a reference marker or arrow in the low-magnification image that points to the region being magnified in the insets. This visual cue will make it easier for readers to connect the detailed insets with the corresponding area in the broader image.

      In Figure S3, we included a reference arrow in the low-magnification images to clarify where the higher-magnification images are taken. Changes are on page 19, Figure S3.

      Reviewer 2 (Recommendations for the authors):

      (1) A major issue confronting the field is the conflation of the PMCA and RT-QuIC assays (the latter of which was used here). The decision to rename and combine the two under the umbrella of SAAs does a major disservice to the field for many reasons. Recognizing that the push for this did not come from the authors, clarifying the differences in their Introduction would be very useful. I suggest this, in large part, because in the prion field, PMCA is known to amplify prion strains with high fidelity whereas the product from RT-QuIC does not. In fact, the RT-QuIC product for PrP is not even infectious, while the synuclein field uses it as a means to generate material for subsequent studies. Highlighting these differences would certainly strengthen the arguments the authors are making about the inadequacy of the synuclein RT-QuIC approach in research.

      We thank the reviewers for these very valuable comments. We have included a further introduction on PMCA and RT-QuIC, explaining the differences and clearly stating our selection of the RT-QuIC method in this paper (page 3, lines 55-68). In addition, we have highlighted that, unlike PMCA, the RT-QuIC end-products are non-infectious and biologically dissimilar to the seed protein. Combined with our results, the findings demonstrate the methodological limitation of RT-QuIC in reproducing the seed fibrils and replicating their intrinsic biophysical information.

      (2) On page 4, sentences starting on lines 94 and 95 are a duplication.

      The duplicated sentence has been corrected. Changes are on page 4, lines 105-106.

      (3) In the Results, noting that the pSyn staining on the RT-QuIC fibrils is coming from the human patient sample used to seed the reaction would be useful. This is mentioned in the Discussion, but the lack of mention in the Results made me pause reading to double check the methods. I think this could also be addressed a bit more clearly in the Abstract.

      We have clarified this in the Results and Abstract. Changes on page 1 (lines 21-22) and page 9 (lines 192-194).

      (4) On page 8 line 188, change was to were in the sentence, “First, faster seeding kinetics was...”

      This grammar error has been corrected. Changes are on page 9, line 200.

      (5) The authors may want to comment on the unexpected finding that despite the RT-QuIC fibrils having a difference in twisted vs straight filaments, all 4 seeded reactions gave identical results in the conformational stability assay.

      Addressed: We want to thank the reviewer for this comment and have highlighted the unexpected finding with a comment on what could be causing the identical results in the conformational stability assay. Changes are on page 12, lines 297-303.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The study "Endogenous oligomer formation underlies DVL2 condensates and promotes Wnt/β-catenin signaling" by Senem Ntourmas et al. contributes to the understanding of phase separation in Dishevelled (DVL) proteins, specifically focusing on DVL2. It builds upon existing research by investigating the endogenous complexes of DVL2 using ultracentrifugation and contrasting them with DVL1 and DVL3 behavior. The study identifies a DVL2-specific region involved in condensate formation and introduces the "two-step" concept of DVL2 condensate formation, enriching the field's knowledge. 

      Strengths: 

      A notable strength of this study is the validation of endogenous DVL2 complexes, providing insights into its behavior compared to DVL1 and DVL3. The functional validation of the DVL C-terminus (here termed conserved domain 2 (CD2)) and the identification of DVL2-specific regions (here termed LCR4) involved in condensate formation are significant contributions that complement the current knowledge on the importance of the DVL DIX domain, DEP domain and intrinsically disordered regions between the DIX and PDZ domains. Additionally, the introduction of the concept where oligomerization (step 1) precedes condensate formation (step 2) is an interesting hypothesis, which can be further experimentally challenged in the future.

      We thank the reviewer for her/his interest in our work and for acknowledging our significant contributions to the understanding of DVL2 phase separation.   

      Weaknesses: 

      However, the applicability of the findings to full-length DVL2 protein, hence the physiological relevance, is limited. This is mostly due to the fact that the authors almost completely depend on the set of DVL2 mutants, which lack the (i) DEP domain and (ii) nuclear export signal (NES). These variants fail to establish DEP domain-mediated interactions, including those with FZD receptors. Of note, the DEP domain itself represents a dimerization/tetramerization interface, which could affect the protein condensate formation of these mutants. Possibly even more importantly, the used mutants localize into the nucleus, which has different biochemical & biophysical properties than a cytoplasm, where DVL typically reside, which in turn affects the condensate formation. On top, in the nucleus, most of the DVL binding partners, including relevant kinases, which were reported to affect protein condensate formation, are missing.

      The most convincing way to address this valid concern and to support a physiologically relevant role of our findings is to extend our experiments with full-length DVL2, which we did in line with the suggestion in point two (please see below). In addition, we address the specific issues as follows:

      We completely agree that interaction through the DEP domain contributes to condensate formation, which was thoroughly demonstrated in great studies by Melissa Gammons and Mariann Bienz, and complex formation (Fig. 2B, C). We deleted this domain on purpose for our mapping experiments, since we obtained more consistent results without any additional contribution of the DEP domain. Once we mapped CFR and identified crucial amino acids within CFR (VV, FF), we demonstrated that CFR-mediated interaction contributes to complex formation, condensate formation and pathway activation in the context of full-length DVL2 (Fig. 7A-G). 

      We also agree that the nuclear localization may affect condensate formation because of the reasons mentioned by the reviewer or others, such as differences in DVL2 protein concentration. However, later proof-of-concept experiments in full-length DVL2 confirmed that CFR and its identified crucial amino acids (VV, FF), which were mapped in this rather artificial nuclear context, contribute to the typical cytosolic condensate formation of DVL2 (Fig. 7C, D). Moreover, we also observed cells with cytosolic condensates for the NES-lacking DVL2 constructs, although to a lower extent as compared to cells with nuclear condensates. A new analysis of NES-lacking key constructs focusing exclusively on cells with cytosolic condensates revealed similar differences between the DVL2 mutants as were observed before when investigating cells with nuclear (and cytosolic) condensates (new Fig. S3E, F), suggesting that the detected differences are not due to nuclear localization but reflect the overall condensation capacity. 

      In addition, our condensate-challenging experiments (osmotic shock, 1,6-hexanediol) suggested that cytosolic condensates of full-length DVL2 and nuclear CFR-mediated condensates of deletion proteins lacking the DEP domain behave quite similarly (Fig. 6A-C).

      Second, the use of an overexpression system, while suitable for comparing DVL2 protein condensate features, falls short in functional assays. The study could benefit from employing established "rescue systems" using DVL1/2/3 knockout cells and re-expression of DVL variants for more robust functional assessments. 

      We used the suggested established rescue system of DVL1/2/3 knockout cells (T-REx DVL1/2/3 triple knockout cells and T-REx DVL1/2/3 RNF43 ZNRF3 penta knockout cells, which are even more sensitive towards DVL re-expression as they lack RNF43/ZNRF3-mediated degradation of DVL activating receptors; both cell lines from the Bryja lab). Upon overexpression, our key mutants DVL2 VV-AA FF-AA and ∆CFR showed markedly reduced pathway activation compared to WT DVL2 (new Figs. 7F and S5J), as we observed before. Especially in the DVL1/2/3 triple knockout cells, DVL2 VV-AA FF-AA hardly activated the pathway and was as inactive as the established M2 mutant (new Fig. 7F). Most importantly, while re-expression of WT DVL2 at close to endogenous expression levels fully rescued Wnt3a-induced pathway activation in DVL1/2/3 knockout cells, DVL2 VV-AA FF-AA revealed significantly reduced rescue capacity and was almost as inactive as DVL2 M2 (new Figs. 7G and S5K). 

      Furthermore, the discussion and introduction overlook some essential aspects of DVL biology. One such example is the importance of the open/close conformation of DVL and its effects on DVL phase separation and activity. In the context of this study, it is important to say that this conformational plasticity is mediated by DVL C-terminus (CD2 in this study). The second example is the reported roles of DVL1 and DVL3, which can both mediate the Wnt3a signal. How this can be interpreted when DVL1 and DVL3 lack LCR4 and still form condensates? 

      We included the open/close conformation of DVL in our manuscript (introduction p. 3 and new discussion paragraph p. 10) and discussed it in the context of our findings. It is intriguing to speculate that Wnt-induced opening of DVL2 increases the accessibility of LCR4 and CD2, thereby triggering pre-oligomerization and subsequent phase separation of DVL2 (see discussion).

      We extended the last paragraph of the discussion to interpret the roles of DVL1 and DVL3 lacking LCR4 (see p. 10). In short, the general ability of DVL1 and DVL3 to form condensates and to activate the Wnt pathway can be potentially explained through the other interaction sites (DIX, DEP, intrinsically disordered region). However, previous studies suggest that the DVL paralogs exhibit (quantitative) differences in Wnt pathway activation and that all three paralogs have to interact at a certain ratio for optimal pathway activation. In this context, a physiologic role for DVL2 LCR4 may be to promote the formation of these DVL1/2/3 assemblies and/or to enhance the stability of these assemblies.

      In order to increase the physiological relevance of the study, I would recommend analyzing several key mutants in the context of the full-length DVL2 protein using the rescue/complementation system. Further, a more thorough discussion and connections with the existing literature on DVL protein condensates/puncta/LLPS can improve the impact of the study. 

      We thank the reviewer for her/his suggestions to improve our study, which we addressed as detailed above.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to identify which regions of DVL2 contribute to its endogenous/basal clustering, as well as the relevance of such domains to condensate/phase separation and WNT activation. 

      Strengths: 

      A strength of the study is the focus on endogenous DVL2 to set up the research questions, as well as the incorporation of various techniques to tackle it. I found also quite interesting that DVL2-CFR addition to DVL1 increased its MW in density gradients. 

      We thank the reviewer for her/his interest in our work and the constructive suggestions to improve our study.

      Weaknesses: 

      I think that several of the approaches of the manuscript are subpar to achieve the goals and/or support several of the conclusions. For example: 

      (1) Although endogenous DVL2 indeed seems to form complexes (Figure 1A), neither the number of proteins involved nor whether those are homo-complexes can be determined with a density gradient. Super-resolution imaging or structural analyses are needed to support these claims. 

      We agree that it will be very interesting to study the nature of the detected endogenous complexes in detail and we will consider this for any follow-up study, as structural analyses were out of scope for the revision of the presented manuscript. To address the issue, we mentioned that the calculation of about eight DVL2 molecules per complex is based on the assumption of homotypic complexes (results p. 4) and we discussed, why we think that homotypic complexes are the most likely assumption based on the currently available (limited) data (discussion p. 8).

      (2) Follow-up analyses of the relevance of the DVL2 domains solely rely on overexpressed proteins. However, there were previous questions arising from o/e studies that prompted the focus on endogenous, physiologically relevant DVL interactions, clustering, and condensate formation.

      Although the title, conclusions, and relevance all point to the importance of this study for understanding endogenous complexes, only Figures 1A and B deal with endogenous DVL2. 

      We think that the biochemical detection of endogenous DVL2 complexes itself represents a valuable contribution to the understanding of endogenous DVL clustering, especially (i) since it is still lively discussed in the field whether and to which extent endogenous DVL assemblies exist (see introduction) and (ii) since recent studies addressing this issue rely on fluorescent tagging of the endogenous protein, which, among all benefits, harbors the risk to artificially affect DVL assembly. The follow-up analysis predominantly strengthens this key finding through (i) associating the detected complexes with established (DEP domain) and newly mapped (LCR4) DVL2 interaction sites, which we think is crucial to validate our biochemical approach, and (ii) linking the complexes with condensate formation and pathway activation for functional insights.

      In addition, we performed new experiments with re-expression of DVL2 and our key mutants at close to endogenous expression levels in DVL1/2/3 knockout cells, supporting a physiologically relevant role of our findings (new Figs. 7G and S5K, please also see point (5) below).

      (3) Mutants lacking activity/complex formation, e.g. DVL2_1-418, may need further validation. For instance, DVL2_1-506 (same mutant but with DEP) seems to form condensates and it is functional in WNT signalling (King et al., 20223). These differences could be caused by the lack of DEP domain in this particular construct and/or folding differences. 

      We would definitely expect that DVL2 1-506 exhibits increased condensate formation and pathway activation as compared to DVL2 1-418, since the DEP domain was thoroughly characterized as interaction domain in the Bienz lab and the Gammons lab (see references), which we confirmed in our assays (Fig. 2B-D). However, as the DEP domain is an established DVL2 interaction site, we were not interested to further characterize the DEP domain but to explain the marked difference in complex formation between DVL2 ∆DEP and 1-418 (Fig. 2A-C), which could not be associated with any known DVL2 interaction site and which we finally mapped to CFR (Fig. 4A-D). 

      Since fusion of the newly characterized interaction site CFR to DVL2 1-418 (1-418+CFR) rescued complex formation, condensate formation and signaling activity (Fig. 3B-E and Fig. 4C, D), we think that the lacking activity/complex formation of DVL2 1-418 is more likely due to missing interaction sites than due to folding problems. However, as it is hard to exclude folding differences of deletion mutants, we confirmed the CFR activity through loss-of-function experiments in the context of full-length DVL2 with minimal point mutations (Fig. 7A-G, VV, FF). 

      (4) The key mutants, DeltaCFR and VV/FF, only show mild phenotypes. The authors' results suggest that these regions contribute but are not necessary for 1) complex formation (Density gradient Figures 7A and B), condensate formation (Figures 7C and D), and WNT activity (Figure 7E). Of note, Figure 7C shows examples for the mutants with no condensates while the quantification indicates that 50% of the cells do have condensates. 

      Condensate formation and Wnt pathway activation by DVL VV-AA FF-AA were reduced by more than 50% as compared to WT (Fig. 7D, E). We consider these marked differences, since loss of function always ranges between 0% and 100%. In newly performed experiments in DVL1/2/3 knockout cells, the differences were even more pronounced, see point (5) below.

      Yes, Fig. 7C shows an example to qualitatively visualize the change in condensate formation, while Fig. 7D provides the corresponding quantification allowing quantitative assessment of the differences.

      (5) Most of the o/e analyses (including all reporter assays) should be performed in DVL1-3 KO cells in order to explore specifically the behaviour of the investigated mutants. 

      As suggested, we employed DVL1/2/3 knockout cells for performing reporter assays (T-REx DVL1/2/3 triple knockout cells and T-REx DVL1/2/3 RNF43 ZNRF3 penta knockout cells, which are even more sensitive towards DVL re-expression as they lack RNF43/ZNRF3-mediated degradation of DVL activating receptors; both cell lines from the Bryja lab). Here, we focused on key mutants in the context of full-length DVL2, as they are closest to the physiologic situation. Upon overexpression, DVL2 VV-AA FF-AA and DVL2 ∆CFR showed markedly reduced pathway activation as compared to WT DVL2 (new Figs. 7F and S5J). Especially in the DVL1/2/3 triple knockout cells, DVL2 VV-AA FF-AA hardly activated the pathway and was as inactive as the established M2 mutant (new Fig. 7F). Moreover, re-expression at close to endogenous expression levels revealed that DVL2 VV-AA FF-AA less efficiently rescued Wnt3a-induced pathway activation as compared to WT (Figs. 7G and S5K).

      (6) How comparable are condensates found in the cytoplasm (usually for wt DVL) with those located in the nucleus (DEP mutants)? 

      In principle, cytosolic condensates could differ from nuclear condensates for various reasons, such as different protein concentration, different availability of interaction partners or different biochemical/biophysical properties (please also see point 1 of reviewer 1). In our condensate-challenging experiments (osmotic shock, 1,6-hexanediol), cytosolic condensates of full-length DVL2 and nuclear condensates of DVL2 mutants behaved quite similarly (Fig. 6A-C).

      We are confident that the differences between different DEP mutants in our mapping experiments are not due to nuclear localization but reflect the overall condensation capacity because later proof-of-concept experiments demonstrated that CFR, which was identified in these mapping experiments, contributes to cytosolic condensate formation in the context of full-length DVL2 (Fig. 7C, D). Moreover, a new analysis focusing only on cells with cytosolic condensates, which can also be observed for DEP mutants to a low extent, revealed similar differences between key DEP mutants as observed before (Fig. S3E, F; for details please also see point 1 of reviewer 1).

      Several studies in the last two decades have analysed the relevance of DVL homo- and hetero-clustering by relying on overexpressed proteins. Recent studies also explored the possibility of DVL undergoing liquid-liquid phase separation following similar principles. As highlighted by the authors in the introduction, there is a need to understand DVL dynamics under endogenous/physiological conditions. Recent super-resolution studies aimed at that question by characterising endogenously edited DVL2. The authors seemed to aim in the same direction with their initial findings (Figure 1A) but quickly moved to o/e proteins without going back to the initial question. This reviewer thinks that to support their conclusions and advance in this important question, the authors should introduce the relevant mutations in the endogenous locus (e.g. by Cas9+ donor template encoding the required 3' exons, as done by others before for WNT components, including DVL2) and determine their impact in the above-indicated processes.

      We agree that genomic editing of the DVL2 locus would be the cleanest system to study the relevance of CFR at endogenous expression levels. As we did not have the resources to generate the suggested cells, we instead transiently re-expressed DVL2 and the respective mutants in DVL1/2/3 triple knockout cells at low levels that were close to endogenous expression levels (Fig. S5K). These experiments revealed that DVL2 VV-AA FF-AA rescued Wnt3a-induced pathway activation less efficiently than DVL2 WT (Fig. 7G).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This is a detailed description of the role of PKCδ in Drosophila learning and memory. The work is based on a previous study (Plaçais et al. 2017) that has already shown that for the establishment of long-term memory, the repetitive activity of MP1 dopaminergic neurons via the dopamine receptor DAMB is essential to increase mitochondrial energy flux in the mushroom body. 

      In this paper, the role of PKCδ is now introduced. PKCδ is a molecular link between the dopaminergic system and the mitochondrial pyruvate metabolism of mushroom body Kenyon cells. For this purpose, the authors establish a genetically encoded FRET-based fluorescent reporter of PKCδ-specific activity, δCKAR. 

      Strengths: 

      This is a thorough study of the long-term memory of Drosophila. The work is based on the extensive, high-quality experience of the senior authors. This is particularly evident in the convincing use of behavioral assays and imaging techniques to differentiate and explore various memory phases in Drosophila. The study also establishes a new reporter to measure the activity of PKCδ - the focus of this study - in behaving animals. The authors also elucidate how recurrent spaced training sessions initiate a molecular gating mechanism, linking a dopaminergic punishment signal with the regulation of mitochondrial pyruvate metabolism. This advancement will enable a more precise molecular distinction of various memory phases and a deeper comprehension of their formation in the future. 

      Weaknesses: 

      Apart from a few minor technical issues, such as the not entirely convincing visualisation of the localisation of a PKCδ reporter in the mitochondria, there are no major weaknesses. Likewise, the scientific classification of the results seems appropriate, although a somewhat more extensive discussion in relation to Drosophila would have been desirable.

      We are very grateful for this very positive appreciation of our work. Following this comment, we have revised our manuscript to bring more compelling evidence of the mitochondrial localization of the PKCδ reporter. We also developed the discussion of our results with respect to the Drosophila learning and memory literature.

      Reviewer #2 (Public Review):

      Summary 

      This study deepens the authors' former investigations of the mechanisms involved in gating the long-term consolidation of an associative memory (LTM) in Drosophila melanogaster. After having previously found that LTM consolidation 1. costs energy (Plaçais and Préat, Science 2013) provided through pyruvate metabolism (Plaçais et al., Nature Comm 2017) and 2. is gated by the increased tonic activity in a type of dopaminergic neurons ('MP1 neurons') following only the training protocol relevant for LTM, i.e. interspaced in time (Plaçais et al., Nature Neuro 2012), they here dig into the intracellular signalling triggered by dopamine input and eventually responsible for the increased mitochondrial activity in Kenyon Cells. They identify a particular PKC, PKCδ, as a major molecular interface in this process and describe its translocation to mitochondria to promote pyruvate metabolism, specifically after spaced training. 

      Methodological approach 

      To that end, they use RNA interference against the isozyme PKCδ, in a time-controlled way and in the whole Kenyon cell population or in the subpopulation forming the α/β lobe. This knock-down decreased the total PKCδ mRNA level in the brain by ca. 30%, which is enough to decrease the flies' performance for LTM consolidation. Using Pyronic, a sensor for pyruvate for in vivo imaging, and pharmacological disruption of mitochondrial function, the authors then show that PKCδ knockdown prevents a high level of pyruvate from accumulating in the Kenyon cells at the time of LTM consolidation, pointing towards a role of PKCδ in promoting pyruvate metabolism. They further identify the PDH kinase PDK as a likely target for PKCδ since knocking down both PKCδ and PDK led to normal LTM performances, likely counterbalancing PKCδ knock-down alone. 

      To understand the timeline of PKCδ activation and to visualise its mitochondrial translocation in a subpart of Mushroom body lobes they imported in fruitfly the genetically-encoded FRET reporters of PKCδ, δCKAR, and mitochondria-δCKAR (Kajimoto et al 2010). They show that PKCδ is activated to the sensor's saturation only after spaced training, and not other types of training that are 'irrelevant' for LTM. Further, adding thermogenetic activation of dopaminergic neurons and RNA interference against Gq-coupled dopamine receptor to FRET imaging, they identify that a dopamine-triggered cascade is sufficient for the elevated PKCδ-activation. 

      Strengths and weaknesses 

      The authors use a combination of new fluorescent sensors and behavioral, imaging, and pharmacological protocols they already established to successfully identify the molecular players that bridge the requirement for spaced training/dopaminergic neurons MP1 oscillatory activity and the increased metabolic activity observed during long-term memory consolidation. 

      The study is dense in new exciting findings and each methodological step is carefully designed. Almost all possible experiments one could think of to make this link have been done in this study, with a few exceptions that do not prevent the essential conclusions from being drawn. 

      The discussion is well conducted, with interesting parallels with mammals, where the possibility that this process takes place as well is yet unknown. 

      Impact 

      Their findings should interest a large audience: 

      They discover and investigate a new function for PKCδ in regulating memory processes in neurons in conjunction with other physiological functions, making this molecule a potentially valid target for neuropathological conditions. They also provide new tools in Drosophila to measure PKCδ activation in cells. They identify the major players for lifting the energetic limitations preventing the formation of a long-term memory. 

      We warmly thank Reviewer #2 for the enthusiastic assessment of our work. There was no specific point to address in the Public Review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments that could help improve the paper and help the reader navigate the detailed analysis.

      (1) Perhaps the authors could add a sentence or two in the intro about the different PKC genes in Drosophila and whether they are expressed in the MB.

      We thank Reviewer #1 for this suggestion. We now describe in the introduction the various subfamilies of PKCs downstream of Gq signaling, the Drosophila members of those different PKC subfamilies, and their expression in the brain. 

      (2) Italicise Drosophila throughout the text.

      We have made this correction.

      (3) In Figure 1, you could change the scheme in Figure F-H and have the timeline always start after training. Then you could see that the training varies in time (perhaps provide the exact duration for each training protocol) and the test interval is constant. Why is it actually measured in a time window and not at an exact time?

      This is indeed a good suggestion to clarify the presentation of our results. We changed the timeline schemes in all the figures so that t=0 starts at the end of the conditioning. Indeed, each conditioning protocol has a different duration as represented on these timelines: one-cycle training lasts 5 min, 5x massed training has a duration of 20 min, and 5x spaced training takes 1 hour and 30 min to be completed, with its 15 min intertrial intervals. In vivo imaging experiments are performed during a certain time window after conditioning during which, according to our previous experience, the activity of MP1 dopamine neurons after spaced training remains constant (Plaçais et al., 2012). This offers the practical advantage that we can image several flies after a given training session, instead of having to perform many consecutive conditioning protocols.  

      (4) In Figure 2 you could show the massed training data from the supplement. This is very similar to what is shown in Figure 1. Are there also imaging experiments on massed training?

      The reason why massed training data was initially displayed in the supplementary data is that α/β neurons are known to be crucial for LTM formation but are not required for memory formed after massed training, so that the absence of effect was somewhat expected. Nonetheless, we performed δCKAR imaging in α/β neurons after 5x massed training and found that, as expected, PKCδ activity was not increased post-conditioning (Figure 2C). This experiment was performed in parallel with additional δCKAR imaging in α/β neurons after 5x spaced conditioning, as a positive control (these new data were added to Figure 2B). Following Reviewer #1’s suggestion, all data investigating the effect of PKCδ in α/β neurons are now displayed in Figure 2.

      (5) Figure 3: I am not sure if the blue curve in Figure A really represents an upregulated pyruvate flux compared to the control (mentioned in line 210). It may be the case initially, but it is clearly below the control after 40s. Why is that?

      This visual effect is due to the fact that PDBu injection in itself increases the pyruvate level in MB neurons (independently of its effect on PKCδ), before sodium azide injection. As a result, the baseline of the PDBu-treated flies is above that of the DMSO control flies when sodium azide is injected. The Pyronic sensor therefore saturates more quickly and reaches its plateau before the control when traces are normalized right before sodium azide injection. 

      That being said, the measure of the slope in itself following sodium azide injection is not affected by these differences, and is always measured between 10 and 70% of the plateau. 

      Given this remark, and another comment from Reviewer #2 about this experiment, we removed panel 3A and present only the complete recording of this experiment, which is now displayed in Figure 3 – figure supplement 1C.
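      As an illustration of the slope measurement described in this response (a fit restricted to the part of the trace between 10 and 70% of the plateau), the following is a minimal sketch only. The function name, the way the plateau is estimated (mean of the last few samples), and the assumption that the trace is already normalized to zero at the time of sodium azide injection are all hypothetical choices made for the example; they are not taken from the manuscript.

```python
def slope_between_fractions(times, signal, low=0.10, high=0.70, plateau_points=5):
    """Least-squares slope of the rising phase of a trace.

    Hypothetical helper: assumes `signal` is normalized to 0 at injection
    and rises towards a plateau. Only the samples between the first
    crossing of `low` * plateau and the first crossing of `high` * plateau
    are used for the fit.
    """
    # Assumption: the plateau is the mean of the last few samples.
    plateau = sum(signal[-plateau_points:]) / plateau_points
    start = next(i for i, v in enumerate(signal) if v >= low * plateau)
    stop = next(i for i, v in enumerate(signal) if v >= high * plateau)
    t = times[start:stop + 1]
    y = signal[start:stop + 1]
    if len(t) < 2:
        raise ValueError("not enough samples between the two thresholds")
    # Ordinary least-squares slope of y against t.
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


# Synthetic trace: linear rise (2 units per time step), then a flat plateau.
times = list(range(21))
signal = [min(2.0 * t, 20.0) for t in times]
slope = slope_between_fractions(times, signal)
```

      Because the fit window is defined relative to each trace's own plateau, the slope estimate in this sketch does not depend on when the plateau is reached, only on how steeply the signal rises within the window.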

      (6) For me, the localisation of the mitochondrial reporter in the mitochondria is not clear. The image in the supplement is not sufficient to show this clearly. What is missing here is a co-staining in the same brain of UAS-mito-δCKAR and a mitochondrial marker to label the mitochondria and the reporter at the same time in the same animal.

      We agree with Reviewer #1’s remark and added new data to make this point more convincing. As suggested, we co-expressed mito-δCKAR with the mitochondrial reporter mito-DsRed in MB neurons (Lutas et al., 2012). We observed a clear colocalization of both signals by performing confocal imaging in the MB neuron somas, indicating that mito-δCKAR is indeed targeted to mitochondria (Figure 4 – figure supplement 1B and 2). 

      (7) Are there controls that the MB expression of the reporters in the flies does not influence the learning ability? In order to make statements about the physiology of the cells, it must also be shown that the cells still have normal activity and allow learning behaviour comparable to wild-type flies.

      This is indeed an important control that we added in the revised version. We tested the memory after 5x spaced, 5x massed and 1x training of flies expressing in the MB the various imaging probes used in our study (cyto-δCKAR, mito-δCKAR and Pyronic). Memory performance was similar to controls in all cases (Figure 1 – figure supplement 1E).  

      (8) Perhaps the authors could go into more detail on two points in the discussion and shorten the comprehensive comparison to the vertebrate system somewhat. It would be nice to know how the local transfer from the peduncle to the vertical lobus is supposed to take place. What is the mechanism here? Any suggestions from the literature? It would also be useful to mention the compartmentalisation of the MB and how the information can overcome these boundaries from the peduncle to the vertical lobe.

      We now elaborate on this question in the discussion (lines 368-386). To sum up, given that the compartmentalization of the MBs is anatomically defined by the presence of specific subsets of MBON and DAN cell types (forming different information-processing units), rather than by physical boundaries per se, we can consider two main hypotheses to explain PKCδ activation transfer from the peduncle to the lobes: passive diffusion of activated PKCδ, or mitochondrial motility that would displace PKCδ from its place of first activation. We indeed found that mitochondrial motility was occurring upon 5x spaced conditioning for LTM formation (Pavlowsky et al. 2024).

      In principle, one could also consider that PKCδ could be activated in the lobes by a relaying neuron. The MVP2 neuron (aka MBON-γ1>pedc) presents dendrites facing MP1 and makes synapses with the α/β neurons at the level of the α and β lobes, which makes it a good candidate. Furthermore, as we show that PKCδ activation in the lobes requires DAMB (Figure 4C, Figure 5A-B, Figure 5 – figure supplement 1), one could imagine the following activation loop: MP1 activates the MB neurons via DAMB, that activate MVP2 at the level of the peduncle, which activates in turn the MB neurons at the level of the lobes. However, we did not retain this hypothesis, because MVP2 is GABAergic, which makes it highly unlikely to be able to activate a kinase like PKCδ.

      Regarding the comparative discussion with mammalian systems, we appreciate Reviewer #1’s remark that it may appear too detailed, but given that Reviewer #2 (public comment) highlighted the ‘interesting parallel with mammals’ in our discussion, we finally chose not to reduce this part in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Fig 1G: is there a decrease in PKCδ activation after mass training as compared to the control, indicating an inhibitory mechanism onto PKCδ following mass training? Or is this an artifact of the PDBu application procedure in the control group? 

      We thank Reviewer #2 for this careful comment. The dent in the time trace following PDBu application after massed training (Figure 1G) is indeed an artifact due to the manual injection of the drug. But we would like to emphasize that what matters in the determination of PKCδ activity is the level of the baseline before PDBu application after normalization to the final plateau, so that variations around the injection time do not impact the result of the analysis. Moreover, in the revised version, we performed a similar series of experiments, using an α/β neuron-specific driver (Figure 2C). In this series of experiments, there were limited injection artifacts, and we reached the same conclusion as in Figure 1G: PKCδ activity is left unchanged by 5x massed conditioning. 
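      The reasoning in this response, that a transient around the injection time does not affect a readout based on the pre-injection baseline normalized to the final plateau, can be sketched as follows. This is an illustrative example only: the function name, the window sizes, and the synthetic traces are assumptions, and the sketch makes no claim about which direction of the sensor signal corresponds to higher PKCδ activity.

```python
from statistics import mean


def baseline_relative_to_plateau(trace, injection_index, plateau_points=5):
    """Mean pre-injection level after normalizing the trace to its final plateau.

    Hypothetical helper: `injection_index` is the first sample acquired
    after drug application, and the plateau is assumed to be the mean of
    the last `plateau_points` samples.
    """
    plateau = mean(trace[-plateau_points:])
    if plateau == 0:
        raise ValueError("plateau is zero, cannot normalize")
    normalized = [value / plateau for value in trace]
    # Only samples strictly before the injection contribute to the readout.
    return mean(normalized[:injection_index])


# Two synthetic traces with the same baseline and plateau; the second one
# has a transient dip at the time of injection.
trace_clean = [0.8] * 10 + [0.9] + [1.0] * 5
trace_dip = [0.8] * 10 + [0.6] + [1.0] * 5

readout_clean = baseline_relative_to_plateau(trace_clean, injection_index=10)
readout_dip = baseline_relative_to_plateau(trace_dip, injection_index=10)
```

      In this sketch both traces give the same readout, because the samples around the injection fall in neither the baseline window nor the plateau window.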

      Fig 3A: I suggest moving this panel in the supplement: I found it difficult to process the effect of PDBu that is unspecific to PKCδ and that leads to a different plateau because of a different baseline. It would be better explained in more detail in the supplement, especially given that the 3B panel can lead to a similar conclusion and does not have this specificity problem. Up to the authors.

      We thank Reviewer #2 for this feedback. We followed the suggestion and now only display the full recording of this experiment on Figure 3 – figure supplement 1C.

      Fig 3C: To go further, one wonders if knocking-down PDK would act as a switch for gating LTM formation, i.e. if done during a 1x training or a 5x massed training would it gate long-term consolidation?

      This is indeed an excellent suggestion. We performed this experiment and showed that in flies expressing the PDK RNAi in adult MB neurons, only one cycle of training was sufficient to induce long-term memory formation (Figure 3A), instead of the 5 spaced cycles normally required. This confirms the model we previously established in Plaçais et al. 2017, where long-term memory formation was observed upon PDK MB knock-down after 2 cycles of spaced training. This new result goes further in characterizing this facilitation effect, now showing that even a single cycle is sufficient. Altogether these data show that mitochondrial metabolic activation is the critical gating step in long-term memory formation. Spaced training achieves this activation through PDK inhibition, mediated by PKCδ.

      What is the level of mRNA in this construct? I don't see a quantification, can you justify it?

      We thank Reviewer #2 for this remark. This PDK RNAi had been used in a previous work in pyruvate imaging experiments, where it successfully boosted mitochondrial pyruvate uptake. But indeed we had not validated it at the mRNA level. In the revised version of the present manuscript, we now confirm by RT-qPCR that the PDK RNAi efficiently downregulates PDK expression in neurons (Figure 3 – figure supplement 1A).

      Fig. 4C: Is PKCδ activation increase in Vertical lobe DAMB-dependent? One wonders, because MP1 may somehow activate other neurons that could reach this part of the Kenyon Cells. I do not see in the results what could disprove this possibility. The mechanism linking DAMB activation in the peduncle and PKCδ activation in the VL is mysterious, see also Fig. 5.

      This is a very sound remark. In the revised version we have checked whether PKCδ activation in the vertical lobes is also dependent on DAMB.  We performed thermogenetic activation of MP1 neurons and imaged mito-δCKAR signal in the vertical lobes upon DAMB MB knock-down. We found that as for the peduncle, DAMB was required for PKCδ mitochondrial activation (Figure 4C, right panel). This experiment was performed in parallel with similar measurements in flies that did not express DAMB RNAi, as a positive control (these new control data were added to the Figure 4C, left panel).

      This result supports a model where dopamine from MP1 neurons directly acts on Kenyon cells, even for PKCδ activation in the vertical lobes. Thus, this advocates for a diffusion of DAMB-activated PKCδ from the peduncle to the vertical lobes, either by passive diffusion or by mitochondrial motility - two hypotheses that we added in the discussion. 

      Fig. 5: If MP1 neurons release dopamine only to the peduncle, how do you expect PKCδ to be translocated to mitochondria all the way to the vertical lobe? Also is it specific to the vertical lobe and not found in the medial lobe?

      Investigating the spatial distribution of PKCδ is, once again, a very sound suggestion. We re-analyzed our dataset of the mito-δCKAR signal after spaced training for peduncle measurement, as the imaging plane also included the β lobe. We found that PKCδ is also activated at that level, and that its activation also depends on DAMB (Figure 5 – figure supplement 1). We also performed additional pyruvate measurements in the medial lobes, and observed that mitochondrial pyruvate uptake presents the same extension in time in the medial lobes as in the vertical lobes when comparing spaced training (Figure 6 E-F and Figure 6 – figure supplement 1E-F) to 1x training (Figure 6A-B and Figure 6 – figure supplement 1C-D). Therefore, the metabolic action of PKCδ seems not to be restricted to the vertical lobes, but spreads across the whole axonal compartment.

      Altogether, these data point toward the fact that activated PKCδ diffuses from its point of activation, the peduncle, where dopamine is released by MP1 and DAMB is activated, to both the vertical and medial lobes, either by passive diffusion, or by taking advantage of the mitochondrial movement from the MB neuron somas to the axons that was shown to be triggered by spaced training (Pavlowsky et al. 2024). To further characterize the kinetics of PKCδ activation, we measured its activity using the mito-δCKAR sensor at 3 and 8 hours following spaced training. We found that while PKCδ was still active at 3 hours, it was back to its baseline activity level at 8 hours, both at the level of the peduncle and the vertical lobes (Figure 5 C-F). However, at 8 hours, pyruvate metabolism is still upregulated in the lobes, which indicates that an additional mechanism is relaying PKCδ action to maintain the high energy state of the MBs at later time points. As we propose in the revised discussion, the mitochondrial motility hypothesis makes sense here (Pavlowsky et al. 2024), as the progressive increase in the number of mitochondria in the lobes would be able to sustain high mitochondrial metabolism beyond PKCδ activation at 8 hours post-conditioning. This new result and its implications open exciting perspectives for future research about the different mitochondrial regulations occurring after spaced training, their organization over time and their interactions.

      Fig.7:  PDK written in yellow is almost invisible

      This has been changed.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors use the teleost medaka as an animal model to study the effect of seasonal changes in day-length on feeding behaviour and oocyte production. They report a careful analysis of how day-length affects female medakas and a thorough molecular genetic analysis of genes potentially involved in this process. They show a detailed analysis of two genes and include a mutant analysis of one gene to support their conclusions.

      Strengths:

      The authors pick their animal model well and exploit the possibilities to examine in this laboratory model the effect of a key environmental influence, namely the seasonal changes of day-length. The phenotypic changes are carefully analysed and well-controlled. The mutational analysis of the agrp1 by a ko-mutant provides important evidence to support the conclusions. Thus this report exceeds previous findings on the function of agrp1 and npyb as regulators of food-intake and shows how in medaka these genes are involved in regulating the organismal response to an environmental change. It thus furthers our understanding of how animals react to key exogenous stimuli for adaptation.

      Weaknesses:

      The authors are too modest when it comes to underscoring the importance of their findings. Previous animal models used to study the effect of these neuropeptides on feeding behaviour have either lost or were most likely never sensitive to seasonal changes of day length. Considering the key importance of this parameter on many aspects of plant and animal life it could be better emphasised that a suitable animal model is at hand that permits this. The molecular characterization of the agrp1 ko-mutant that the authors have generated lacks some details that would help to appreciate the validity of the mutant phenotype. Additional data would help in this respect.

      We would like to thank Reviewer #1 for the really constructive advice. In the revised manuscript, we will try to provide more information on the molecular characterization of the agrp1 KO-mutant and to emphasize the importance of our present animal model that permits the analysis of neuropeptide effects on feeding behavior in response to seasonal changes of day length.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the mechanisms behind breeding season-dependent feeding behavior using medaka, a well-known photoperiodic species, as a model. Through a combination of molecular, cellular, and behavioral analyses, including tests with mutants, they concluded that AgRP1 plays a central role in feeding behavior, mediated by ovarian estrogenic signals.

      Strengths:

      This study offers valuable insights into the neuroendocrine mechanisms that govern breeding season-dependent feeding behavior in medaka. The multidisciplinary approach, which includes molecular and physiological analyses, enhances the scientific contribution of the research.

      Weaknesses:

      While medaka is an appropriate model for studying seasonal breeding, the results presented are insufficient to fully support the authors' conclusions.

      Specifically, methods and data analyses are incomplete in justifying the primary claims:

      - the procedure for the food intake assay is unclear;

      - the sample size is very small;

      - the statistical analysis is not always adequate.

      Additionally, the discussion fails to consider the possible role of other hormones that may be involved in the feeding mechanism.

      We would like to thank Reviewer #2 for the helpful comments. As the reviewer suggested, in the revised manuscript we will try to edit the paragraph describing the procedure for the food intake assay to make it much easier for the readers to understand. In Figure 1-Supplementary figure 2, RNAseq was performed to search for the candidate neuropeptides, and that is why the sample size was the minimum. On the other hand, each group in the other experiments consists of n ≥ 5 samples, which is usually accepted to be an adequate sample size in various studies (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017). As for the statistical analyses, we will revise our manuscript so that the readers may be convinced of their validity.

      Reviewer #3 (Public review):

      Summary:

      Understanding the mechanisms whereby animals restrict the timing of their reproduction according to day length is a critical challenge given that many of the most relevant species for agriculture are strongly photoperiodic. However, the principal animal models capable of detailed genetic analysis do not respond to photoperiod so this has inevitably limited progress in this field. The fish model medaka occupies a uniquely powerful position since its reproduction is strictly restricted to long days and it also offers a wide range of genetic tools for exploring, in depth, various molecular and cellular control mechanisms.

      For these reasons, this manuscript by Tagui and colleagues is particularly valuable. It uses the medaka to explore links bridging photoperiod, feeding behaviour, and reproduction. The authors demonstrate that in female, but not male medaka, photoperiod-induced reproduction is associated with an increase in feeding, presumably explained by the high metabolic cost of producing eggs on a daily basis during the reproductive period. Using RNAseq analysis of the brain, they reveal that the expression of the neuropeptides agrp and npy that have been previously implicated in the regulation of feeding behaviour in mice are upregulated in the medaka brain during exposure to long photoperiod conditions. Unlike the situation in mice, these two neuropeptides are not co-expressed in medaka neurons, and food deprivation in medaka led to increases in agrp but also a decrease in npy expression. Furthermore, the situation in fish may be more complicated than in mice due to the presence of multiple gene paralogs for each neuropeptide. Exposure to long-day conditions increases agrp1 expression in medaka as the result of increases in the number of neurons expressing this neuropeptide, while the increase in npyb levels results from increased levels of expression in the same population of cells. Using ovariectomized medaka and in situ hybridization assays, the authors reveal that the regulation of agrp1 involves estrogen acting via the estrogen receptor esr2a. Finally, a loss of agrp1 function mutant is generated where the female mutants fail to show the characteristic increase in feeding associated with long-day enhanced reproduction as well as yielding reduced numbers of eggs during spawning.

      Strengths:

      This manuscript provides important foundational work for future investigations aiming to elucidate the coordination of photoperiod sensing, feeding activity, and reproduction function. The authors have used a combination of approaches with a genetic model that is particularly well suited to studying photoperiodic-dependent physiology and behaviour. The data are clear and the results are convincing and support the main conclusions drawn. The findings are relevant not only for understanding photoperiodic responses but also provide more general insight into links between reproduction and feeding behaviour control.

      Weaknesses:

      Some experimental models used in this study, namely ovariectomized female fish and juvenile fish have not been analysed in terms of their feeding behaviour and so do not give a complete view of the position of this feeding regulatory mechanism in the context of reproduction status. Furthermore, the scope of the discussion section should be expanded to speculate on the functional significance of linking feeding behaviour control with reproductive function.

      We would like to thank Reviewer #3 for the insightful advice. We will try to revise several pertinent sentences describing the ovariectomized female fish and juvenile fish so that our present experimental results will give a more complete view of their feeding regulatory mechanism in the context of reproduction status. We will also try to expand and revise the discussion section to incorporate the valuable suggestion of Reviewer #3.

    1. Author response:

      I thank the Senior Editor and the three reviewers for their consideration and careful assessments, which I find fair and justified. I agree the evidence is inadequate that single cell fluctuating CpG DNA methylation allows for human neuron lineage tracing. I agree with Reviewer #1 that fCpGs essentially function as “a cellular division clock that diverges over time”, but that fCpG methylation also records ancestry because cells with more similar patterns should be more related than cells with different patterns. However, as noted, there are alternative explanations that could explain fCpG DNA methylation pattern neuronal differences, or potentially obscure ancestry recorded by replication errors. Lineage tracing with fCpG methylation previously appeared possible in human intestines, endometrium, and blood, and potentially a similar approach could be used to reconstruct human brain cell ancestries.

      I intend to revise the manuscript in a few weeks to address points raised by reviewers. These include a) editing to improve clarity and correct neurodevelopmental concepts, and b) adding a supplement that explains in much more detail how fCpG methylation may record cell divisions and ancestries. As recommended, additional “experiments” will be added including a) an analysis of single cell zygote to inner cell mass data to illustrate how fCpG brain barcode methylation changes between cell divisions very early in development before neurogenesis, and b) an analysis of newly released single cell brain aging data (Chien et al., 2024, Neuron 112, 2524–2539, August 7, 2024) that should help address issues of reproducibility and barcode stability over time. The evidence for lineage tracing will still be incomplete, but the modifications should help support the idea that fCpG methylation can record somatic cell ancestries.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The overall analysis and discovery of the common motif are important and exciting. Very few human/primate ribozymes have been published and this manuscript presents a relatively detailed analysis of two of them. The minimized domains appear to be some of the smallest known self-cleaving ribozymes.

      Strengths:

      The manuscript is rooted in deep mutational analysis of the OR4K15 and LINE1 and subsequently in modeling of a huge active site based on the closely-related core of the TS ribozyme. The experiments support the HTS findings and provide convincing evidence that the ribozymes are structurally related to the core of the TS ribozyme, which has not been found in primates prior to this work.

      Weaknesses:

      (1) Given that these two ribozymes have not been described outside of a single figure in a Science Supplement, it is important to show their locations in the human genome, present their sequence and structure conservation among various species, particularly primates, and test and discuss the activity of variants found in non-human organisms. Furthermore, OR4K15 exists in three copies on three separate chromosomes in the human genome, with slight variations in the ribozyme sequence. All three of these variants should be tested experimentally and their activity should be presented. A similar analysis should be presented for the naturally-occurring variants of the LINE1 ribozyme. These data are a rich source for comparison with the deep mutagenesis presented here. Inserting a figure (1) that would show the genomic locations, directions, and conservation of these ribozymes and discussing them in light of this new presentation would greatly improve the manuscript. As for the biological roles of known self-cleaving ribozymes in humans, there is a bioRxiv manuscript on the role of the CPEB3 ribozyme in mammalian memory formation (doi.org/10.1101/2023.06.07.543953), and an analysis of the CPEB3 functional conservation throughout mammals (Bendixsen et al. MBE 2021). Furthermore, the authors missed two papers that presented the discovery of human hammerhead ribozymes that reside in introns (by de la Peña and Breaker), which should also be cited. On the other hand, the Clec ribozyme was only found in rodents and not primates and is thus not a human ribozyme and should be noted as such.

      We thank this Reviewer for his/her input and acknowledgment of this work. To improve the manuscript, we have included the genomic locations in Figure 1A, Figure 6A and Figure 6C. We have also tested the activity of representative variants found in the human genome and discussed the activity of the variants in other primates. All suggested publications are now properly cited.

      Line 62-66: It has been shown that single nucleotide polymorphism (SNP) in CPEB3 ribozyme was associated with an enhanced self-cleavage activity along with a poorer episodic memory (14). Inhibition of the highly conserved CPEB3 ribozyme could strengthen hippocampal-dependent long-term memory (15, 16). However, little is known about the other human self-cleaving ribozymes.

      Line 474-501: Homology search of two TS-like ribozymes. To locate close homologs of the two TS-like ribozymes, we performed cmsearch based on a covariance model (38) built on the sequence and secondary structural profiles. In the human genome, we obtained 1154 and 4 homologs for LINE-1-rbz and OR4K15-rbz, respectively. For OR4K15-rbz, there was an exact match located on the reverse strand of the exon of the OR4K15 gene (Figure 6A). The other 3 homologs of OR4K15-rbz belong to the same olfactory receptor family 4 subfamily K (Figure 6C). However, there was no exact match for LINE-1-rbz (Figure 6A). Interestingly, a total of 1154 LINE-1-rbz homologs were mapped to the LINE-1 retrotransposon according to the RepeatMasker (http://www.repeatmasker.org) annotation. Figure 6B shows the distribution of LINE-1-rbz homologs in different LINE-1 subfamilies in the human genome. Only three subfamilies L1PA7, L1PA8 and L1P3 (L1PA7-9) can be considered as abundant with LINE-1-rbz homologs (>100 homologs per family). The consensus sequences of all homologs obtained are shown in Figure 6D. In order to investigate the self-cleavage activity of these homologs, we mainly focused on the mismatches in the more conserved internal loops. The major differences between the 5 consensus sequences are the mismatches in the first internal loop. The widespread A12C substitution is found in the majority of LINE-1-rbz homologs; this substitution leads to a one-base pair extension of the second stem (P2) but almost no activity (RA’: 0.03) based on our deep mutational scanning result. We then selected 3 LINE-1-rbz homologs without the A12C substitution for in vitro cleavage assays (Figure 6E). However, we did not observe significant cleavage activity, which might be caused by GU substitutions in the stem region. Of the 3 homologs of OR4K15-rbz, only one showed pronounced self-cleavage activity (Figure 6F).

      In addition, we performed a similar bioinformatic search of the TS-like ribozymes in other primate genomes. Similarly, the majority (15 out of 18) of primate genomes have a large number of LINE-1 homologs (>500) and the remaining three have essentially none. However, there was no exact match. Only one homolog has a single mutation (U38C) in the genome assembly of Gibbon (Figure S15). The majority of these homologs have 3 or more mismatches (Figure S15). For OR4K15-rbz, all representative primate genomes contain at least one exact match of the OR4K15-rbz sequence.

      Line 598-602: According to the bioinformatic analysis result, there are some TS-like ribozymes (one LINE-1-rbz homolog in the Gibbon genome, and some OR4K15-rbz homologs) with in vitro cleavage activity in primate genomes. Unlike the more conserved CPEB3 ribozyme, which has a clear function, the function of the TS-like ribozymes is not clear, as they are not conserved and either belong to a pseudogene or are located on the reverse strand.
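
      As an illustration only (not the authors' pipeline), the substitution labels used in the homology search above, such as A12C or U38C, can be derived from a pairwise comparison of a reference sequence with a homolog. The sketch below assumes the two sequences are already aligned and of equal length with no insertions or deletions; the function name `substitutions` and the toy sequences are hypothetical and are not the real ribozyme sequences.

```python
def substitutions(reference, homolog):
    """List point substitutions in 'A12C' style.

    Each label is: reference base, 1-based position, base in the homolog.
    Assumes both sequences are pre-aligned and equal in length (no indels).
    """
    if len(reference) != len(homolog):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, homolog), start=1)
        if ref != alt
    ]


# Toy sequences for illustration only (hypothetical, 12 nt each).
reference = "GGAUCACGAGUA"
homolog = "GGAUCACGAGUC"
labels = substitutions(reference, homolog)
print(labels)
```

      Counting the labels returned for each homolog gives the number of mismatches relative to the reference, which is the quantity the response refers to when it describes homologs with "3 or more mismatches".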

      (2) The authors present the story as a discovery of a new RNA catalytic motif. This is unfounded. As the authors point out, the catalytic domain is very similar to the Twister Sister (or "TS") ribozyme. In fact, there is no appreciable difference between these and TS ribozymes, except for the missing peripheral domains. For example, the env33 sequence in the Weinberg et al. 2015 NCB paper shows the same sequences in the catalytic core as the LINE1 ribozyme, making the LINE1 ribozyme a TS-like ribozyme in every way, except for the missing peripheral domains. Thus these are not new ribozymes and should not have a new name. A more appropriate name should be TS-like or TS-min ribozymes. Renaming the ribozymes to lanterns is misleading.

      Although we observed some differences in mutational effects, we agree with the reviewer that it is more appropriate to call them TS-like ribozymes. We have replaced all “lantern ribozyme” with “TS-like ribozyme” as suggested.

      (3) In light of 2) the story should be refocused on the fact the authors discovered that the OR4K15 and LINE1 are both TS-like ribozymes. That is very exciting and is the real contribution of this work to the field.

      We thank this Reviewer for their acknowledgement of this work. To improve the manuscript, we have re-named the ribozymes as suggested.

      (4) Given the slow self-scission of the OR4K15 and LINE1 ribozymes, the discussion of the minimal domains should be focused on the role of peripheral domains in full-length TS ribozymes. Peripheral domains have been shown to greatly speed up hammerhead, HDV, and hairpin ribozymes. This is an opportunity to show that the TS ribozymes can do the same and the authors should discuss the contribution of peripheral domains to the ribozyme structure and activity. There is extensive literature on the contribution of a tertiary contact on the speed of self-scission in hammerhead ribozymes, in hairpin ribozyme it's centered on the 4-way junction vs 2-way junction structure, and in HDVs the contribution is through the stability of the J1/2 region, where the stability of the peripheral domain can be directly translated to the catalytic enhancement of the ribozymes.

      We appreciate your question and the valuable suggestions provided. We have included the citations and discussion about the peripheral domains in other ribozymes.

      Line 570-576: Thus, a more sophisticated structure along with long-range interactions involving the SL4 region in the twister sister ribozyme must have helped to stabilize the catalytic region for the improved catalytic activity. Similarly, previous studies have demonstrated that peripheral regions of hammerhead (49), hairpin (50) and HDV (51, 52) ribozymes could greatly increase their self-cleavage activity. Given the importance of the peripheral regions, absence of this tertiary interaction in the TS-like ribozyme may not be able to fully stabilize the structural form generated from homology modelling.

      (5) The argument that these are the smallest self-cleaving ribozymes is debatable. Lünse et al (NAR 2017) found some very small hammerhead ribozymes that are smaller than those presented here, but the authors suggest only working as dimers. The human ribozymes described here should be analyzed for dimerization as well (e.g., by native gel analysis) particularly because the authors suggest that there are no peripheral domains that stabilize the fold. Furthermore, Riccitelli et al. (Biochemistry) minimized the HDV-like ribozymes and found some in metagenomic sequences that are about the same size as the ones presented here. Both of these papers should be cited and discussed.

      We apologize for any confusion caused by our previous statement. To clarify, we highlighted “35 and 31 nucleotides only” because the 46- and 47-nt sequences contain the variable hairpin loops, which are not important for the catalytic activity. When the conserved segments are compared, the TS-like ribozyme discussed in this paper is the shortest, with the simplest secondary structure. We have also replaced the terms “smallest” and “shortest” with “simplest” in our manuscript. The title has been changed to “Minimal twister sister (TS)-like self-cleaving ribozymes in the human genome revealed by deep mutational scanning”. All the publications mentioned have been cited and discussed. Regarding possible dimerization, we did not find any evidence but would defer it to future detailed structural analysis to be sure.

      Line 605-608: Previous studies also have revealed some minimized forms of self-cleaving ribozymes, including hammerhead (19, 53) and HDV-like (54) ribozymes. However, when comparing the conserved segments, they (>= 36 nt) are not as short as the TS-like ribozymes (31 nt) found here.

      (6) The authors present homology modeling of the OR4K15 and LINE1 ribozymes based on the crystal structures of the TS ribozymes. This is another point that supports the fact that these are not new ribozyme motifs. Furthermore, the homology model should be carefully discussed as a model and not a structure. In many places in the text and the supplement, the models are presented as real structures. The wording should be changed to carefully state that these are models based on sequence similarity to TS ribozymes. Fig 3 would benefit from showing the corresponding structures of the TS ribozymes.

      We thank the reviewer for pointing these out and we have fixed them. We have replaced all “lantern ribozyme” with “TS-like ribozyme” as suggested. The term “modelled structures” is now used to refer to the homology models. We have also included the TS ribozyme structure in Fig 3.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript applies a mutational scanning analysis to identify the secondary structure of two previously suggested self-cleaving ribozyme candidates in the human genome. Through this analysis, minimal structured and conserved regions with imminent importance for the ribozyme's activity are suggested and further biochemical evidence for cleavage activity are presented. Additionally, the study reveals a close resemblance of these human ribozyme candidates to the known self-cleaving ribozyme class of twister sister RNAs. Despite the high conservation of the catalytic core between these RNAs, it is suggested that the human ribozyme examples constitute a new ribozyme class. Evidence for this however is not conclusive.

      Strengths:

      The deep mutational scanning performed in this study allowed the elucidation of important regions within the proposed LINE-1 and OR4K15 ribozyme sequences. Part of the ribozyme sequences could be assigned a secondary structure supported by covariation and highly conserved nucleotides were uncovered. This enabled the identification of LINE-1 and OR4K15 core regions that are in essence identical to previously described twister sister self-cleaving RNAs.

      Weaknesses:

      I am skeptical of the claim that the described catalytic RNAs are indeed a new ribozyme class. The studied LINE-1 and OR4K15 ribozymes share striking features with the known twister sister ribozyme class (e.g. Figure 3A) and where there are differences they could be explained by having tested only a partial sequence of the full RNA motif. It appears plausible, that not the entire "functional region" was captured and experimentally assessed by the authors.

      We thank this Reviewer for his/her input and acknowledgment of this work. Because a similar question was raised by Reviewer 1, we decided to name the ribozymes TS-like ribozymes. Regarding the entire regions, we conducted mutational scanning experiments at the beginning of this study. The relative activity distributions (Figure 1B, 1C) have shown that only parts of the sequence contribute to the self-cleavage activity. That is why we decided to focus on those parts of the sequence afterwards.

      They identify three twister sister ribozymes by pattern-based similarity searches using RNA-Bob. Also comparing the consensus sequence of the relevant region in twister sister and the two ribozymes in this paper underlines the striking similarity between these RNAs. Given that the authors only assessed partial sequences of LINE-1 and OR4K15, I find it highly plausible that further accessory sequences have been missed that would clearly reveal that "lantern ribozymes" actually belong to the twister sister ribozyme class. This is also the reason I do not find the modeled structural data and biochemical data results convincing, as the differences observed could always be due to some accessory sequences and parts of the ribozyme structure that are missing.

      We thank the reviewer for raising this question. As explained in our response to the previous comment, we now call the ribozymes TS-like ribozymes. We also emphasize that the relative activity data of the original sequences indicate that the other parts did not contribute to the activity of the ribozyme. The original sequences provided in the Science paper (Salehi-Ashtiani et al. Science 2006) were generated from biochemical selection of the genomic library. That study did not investigate the contribution of each position to the self-cleavage activity.

      Highly conserved nucleotides in the catalytic core, the need for direct contacts to divalent metal ions for catalysis, the preference of Mn2+ or Mg2+ for cleavage, the plateau in observed rate constants at ~100mM Mg2+, are all characteristics that are identical between the proposed lantern ribozymes and the known twister sister class.

      The difference in cleavage speed between twister sister (~5 min-1) and proposed lantern ribozymes could be due to experimental set-up (true single-turnover kinetics?) or could be explained by testing LINE-1 or OR4K15 ribozymes without needed accessory sequences. In the case of the minimal hammerhead ribozyme, it has been previously observed that missing important tertiary contacts can lead to drastically reduced cleavage speeds.

      We thank the reviewer for this question. We now call the ribozymes TS-like ribozymes. As explained in our response to the previous comment, the relative activity data of the original sequences indicate that the other parts did not contribute to the activity of the ribozyme. Moreover, we have tested different enzyme to substrate ratios to achieve single-turnover kinetics (Figure S13). The difference in cleavage speed should be related to the absence of peripheral regions, which do not exist in the original sequences of the LINE-1 and OR4K15 ribozymes. We have included the publications and discussion about the peripheral domains in other ribozymes.

      Line 458-463: The kobs of LINE-1-core was ~0.05 min-1 when measured in 10mM MgCl2 and 100mM KCl at pH 7.5 (Figure S13). Furthermore, the single-stranded ribozymes exhibited lower kobs (~0.03 min-1 for LINE-1-rbz) (Figure S14) when comparing with the bimolecular constructs. This confirms that the stem loop region SL2 does not contribute much to the cleavage activity of the TS-like ribozymes.

      Line 570-576: Thus, a more sophisticated structure along with long-range interactions involving the SL4 region in the twister sister ribozyme must have helped to stabilize the catalytic region for the improved catalytic activity. Similarly, previous studies have demonstrated that peripheral regions of hammerhead (49), hairpin (50) and HDV (51, 52) ribozymes could greatly increase their self-cleavage activity. Given the importance of the peripheral regions, absence of this tertiary interaction in the TS-like ribozyme may not be able to fully stabilize the structural form generated from homology modelling.

      Reviewer 2: ( Recommendations For The Authors):

      Major points

      It would have made it easier to connect the comments to text passages if the submitted manuscript had page numbers or even line numbers.

      We thank the reviewer for pointing this out and we have already fixed it.

      In the introduction: "...using the same technique, we located the functional and base-pairing regions of..." The use of the adjective functional is imprecise. Base-paired regions are also important for the function, so what type of region is meant here? Conserved nucleotides?

      We thank the reviewer for pointing this out. We were describing the regions that are essential for the ribozyme activity. We have now defined the term “functional region” in the introduction.

      Line 95: we located the regions essential for the catalytic activities (the functional regions) of LINE-1 and OR4K15 ribozymes in their original sequences.

      In their discussion, the authors mention the possible flaws in their 3D-modelling in the absence of Mg2+. Is it possible to include this divalent metal ion in the calculations?

      We thank the reviewer for this question. BriQ (Xiong et al. Nature Communications 2021), the program we used for modeling, does not currently include divalent metal ions.

      Xiong, Peng, Ruibo Wu, Jian Zhan, and Yaoqi Zhou. 2021. “Pairing a High-Resolution Statistical Potential with a Nucleobase-Centric Sampling Algorithm for Improving RNA Model Refinement.” Nature Communications 12: 2777. doi:10.1038/s41467-021-23100-4.

      Abstract:

      It is claimed that ribozyme regions of 46 and 47 nt described in the manuscript resemble the shortest known self-cleaving ribozymes. This is not correct. In 1988, hammerhead ribozymes in newts were first discovered that are only 40 nt long.

      We apologize for any confusion caused by our previous statement. To clarify, we highlighted “35 and 31 nucleotides only” because the 46- and 47-nt sequences contain the variable hairpin loops, which are not important for the catalytic activity. When the conserved segments are compared, the TS-like ribozyme discussed in this paper is the shortest, with the simplest secondary structure. We have also replaced the terms “smallest” and “shortest” with “simplest” in our manuscript. The title has been changed to “Minimal TS-like self-cleaving ribozyme revealed by deep mutational scanning”.

      The term "functional region" is, to my knowledge, not a set term when discussing ribozymes. Does it refer to the catalytic core, the cleavage site, the acid and base involved in cleavage, or all, or something else? Therefore, the term should be 1) defined upon its first use in the manuscript and 2) probably not be used in the abstract to avoid confusion to the reader.

      We apologize for any confusion caused by our previous statement. To clarify, we have replaced the term “functional region” in the abstract and defined it in the introduction.

      Line 34-37: We found that the regions essential for ribozyme activities are made of two short segments, with a total of 35 and 31 nucleotides only. The discovery makes them the simplest known self-cleaving ribozymes. Moreover, the essential regions are circular permutated with two nearly identical catalytic internal loops, supported by two stems of different lengths.

      Line 95: we located the regions essential for the catalytic activities (the functional regions) of LINE-1 and OR4K15 ribozymes in their original sequences.

      The choice of the term "non-functional loop" in the abstract is a bit unfortunate. The loop might not be important for promoting ribozyme catalysis by directly providing, e.g. the acid or base, but it has important structural functions in the natural RNA as part of a hairpin structure.

      We thank the reviewer for pointing this out and we have re-phrased the sentences.

      Line 33-34: We found that the regions essential for ribozyme activities are made of two short segments, with a total of 35 and 31 nucleotides only.

      Line 283: Removing the peripheral loop regions (Figures 1B and 1C) allows us to recognize that the secondary structure of OR4K15-rbz is a circular permutated version of LINE-1-rbz.

      Results:

      Please briefly explain CODA and MC analysis when first mentioned in the results (Figure 1). The more detailed explanation of these terms for Figure 2 could be moved to this part of the results section (including explanations in the figure legend).

      We thank the reviewer for pointing this out and we included a brief explanation.

      Line 150-154: CODA employed Support Vector Regression (SVR) to establish an independent-mutation model and a naive Bayes classifier to separate bases paired from unpaired (26). Moreover, incorporating Monte-Carlo simulated annealing with an energy model and a CODA scoring term (CODA+MC) could further improve the coverage of the regions under-sampled by deep mutations.

      Please indicate the source of the human genomic DNA. Is it a patient sample, what type of tissue, or is it an immortalized cell line? It is not stated in the methods I believe.

      We thank the reviewer for pointing this out. According to the original Science paper (Salehi-Ashtiani et al. Science 2006), the human genomic DNA (isolated from whole blood) was purchased from Clontech (Cat. 6550-1). In our study, we directly employed the sequences provided in Figure S2 of the Science paper for gene synthesis. Thus, we think it is unnecessary to mention the source of genomic DNA in the methods section of our paper.  

      Please also refer to the methods section when the calculation of RA and RA' values is explained in the main text to avoid confusion.

      We thank the reviewer for pointing this out and we have fixed it.

      Line 207-208: Figure 2A shows the distribution of relative activity (RA’, measured in the second round of mutational scanning) (See Methods) of all single mutations

      For OR4K15 it is stated that the deep mutational scanning only revealed two short regions as important. However, there is another region between approx. 124-131 nt and possibly even at positions 47 and 52 (to ~55), that could contribute to effective RNA cleavage, especially given the library design flaws (see below) and the lower mutational coverage for OR4K15. A possible correlation of the mutations in these regions is even visible in the CODA+MC analysis shown in Figure 1D on the left. Why are these regions ignored in ongoing experiments?

      We thank the reviewer for this question. As shown in Table S1, although the double mutation coverage of OR4K15-ori was low (16.2 %), we got 97.6 % coverage of single mutations. The relative activity of these single mutations was enough to identify the conserved regions in this ribozyme. Mutations at the positions mentioned by the reviewer did not lead to large reductions in relative activity. Since the relative activity of the original sequence is 1, we presumed that only positions with average relative activity much lower than 1 might contribute to effective cleavage.

      Regarding the corresponding correlation of mutations in CODA+MC, we consider them false positives generated by Monte Carlo simulated annealing (MC), because they are not supported by the relative activity results.
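
      To make the single-mutation coverage figure quoted above concrete, the following is a minimal sketch of how such a percentage could be computed: the number of distinct single-nucleotide substitutions observed in a library, divided by the 3 × L substitutions possible for a sequence of length L. This is an assumption about the definition, not the authors' code; the function names and the toy library are hypothetical.

```python
def single_mutants(sequence, alphabet="ACGU"):
    """Yield every single-nucleotide substitution of `sequence`."""
    for i, ref in enumerate(sequence):
        for alt in alphabet:
            if alt != ref:
                yield sequence[:i] + alt + sequence[i + 1:]


def single_mutation_coverage(sequence, observed):
    """Fraction of the 3 * len(sequence) possible single mutants found in `observed`."""
    possible = set(single_mutants(sequence))
    return len(possible & set(observed)) / len(possible)


# Toy example (hypothetical): a 4-nt "wild type" and a small observed library.
wild_type = "GAUC"
observed = {
    "AAUC", "CAUC", "UAUC",  # all three substitutions at position 1
    "GCUC",                  # one substitution at position 2
    "GAUC",                  # wild type, not counted as a mutant
    "GAAA",                  # double mutant, not counted here
}
coverage = single_mutation_coverage(wild_type, observed)
print(f"single-mutation coverage: {coverage:.1%}")
```

      Under this definition, double mutants and the wild-type sequence do not contribute to single-mutation coverage, which is why a library can have low double-mutation coverage and still cover nearly all single mutations.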

      Have the authors performed experiments with their "functional regions" in comparison to the full-length RNA or partial truncations of the full-length RNA that included, in the case of OR4K15, nt 47-131? Also for LINE-1 another stem region was mentioned (positions 14-18 with 30-34) and two additional base pairs. Were they included in experiments not shown as part of this manuscript?

      We appreciate the reviewer for raising this question. We only compared the full-length or partial truncations of the LINE-1 ribozyme. Since the secondary structure predicted from OR4K15-ori data was almost the same as LINE-1, we didn’t perform deep mutagenesis on the partial truncation of the OR4K15. However, the secondary structure of OR4K15 was confirmed by further biochemical experiments.   

      Regarding the second question, the additional base pairs were generated by Monte Carlo simulated annealing (MC). They are considered as false positives because of low probabilities and lack of support from the deep mutational scanning results. The appearance of false positives is likely due to the imperfection of the experiment-based energy function employed in current MC simulated annealing. 

      Are there other examples in the literature, where error-prone PCR generates biases towards A/T nucleotides as observed here? Please cite!

      We thank the reviewer for pointing this out and we have included the corresponding citation.

      Line 161-162: The low mutation coverage for OR4K15-ori was due to the mutational bias (27, 28) of error-prone PCR (Supplementary Figures S1, S2, S3 and S4).

      Line 170-171: whose covariations are difficult to capture by error-prone PCR because of mutational biases (27, 28).

      The authors mention that their CODA analysis was based on the relative activities of 45,925 and 72,875 mutation variants. I cannot find these numbers in the supplementary tables. They are far fewer than the read numbers mentioned in Supplementary Table 2. How do these numbers (45,925 and 72,875) arise? Could the authors please briefly explain their selection process?

      We apologize for any confusion caused by our previous statement. Our CODA analysis only utilized variants with no more than 3 mutations. The number listed in the supplementary tables is the total number of the variants. To clarify, we have included a brief explanation for these numbers.

      Line 203-204: We performed the CODA analysis (26) based on the relative activities of 45,925 and 72,875 mutation variants (no more than 3 mutations) obtained for the original sequence and functional region of the LINE-1 ribozyme, respectively.
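
      The selection step described in this response (keeping only variants with no more than 3 mutations) can be sketched as a Hamming-distance filter. The example below is illustrative and is not taken from the CODA implementation; it assumes substitution-only variants of the same length as the wild type, and the names `hamming`, `filter_variants` and the toy relative-activity values are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def filter_variants(wild_type, variants, max_mutations=3):
    """Keep variants that differ from the wild type by at most `max_mutations` substitutions.

    `variants` maps sequence -> relative activity. Variants whose length differs
    from the wild type (indels) are dropped in this sketch.
    """
    return {
        seq: activity
        for seq, activity in variants.items()
        if len(seq) == len(wild_type) and hamming(seq, wild_type) <= max_mutations
    }


# Toy data (hypothetical sequences and relative activities).
wild_type = "GGAUCC"
variants = {
    "GGAUCC": 1.00,  # wild type, 0 mutations
    "GGAUCA": 0.80,  # 1 mutation
    "AGAUCA": 0.40,  # 2 mutations
    "ACAUCA": 0.10,  # 3 mutations
    "ACUUCA": 0.05,  # 4 mutations, excluded
    "GGAUC": 0.50,   # length differs, excluded
}
kept = filter_variants(wild_type, variants)
print(len(kept), "of", len(variants), "variants kept")
```

      A filter of this kind would explain why the variant counts used for the analysis are smaller than the totals listed in the supplementary tables.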

      What are the reasons the authors assume their findings from LINE-1 can be used to directly infer the structure for OR4K15? (Third section in results, last paragraph)

      We apologize for any confusion caused by our previous statement. We meant to say that the consistency between LINE-1-rbz and LINE-1-ori results suggested that our method for inferring ribozyme structure was reliable. Thus, we employed the same method to infer the structure of the functional region of OR4K15. To clarify, we have re-phrased the sentence.   

      Line 259-261: The consistent result between LINE-1-rbz and LINE-1-ori suggested that reliable ribozyme structures could be inferred by deep mutational scanning. This allowed us to use OR4K15-ori to directly infer the final inferred secondary structure for the functional region of OR4K15.

      There are several occasions where the authors use the differences between the proposed lantern ribozymes and twister sister data as reasons to declare LINE-1 and OR4K15 a new ribozyme class. As mentioned previously, I am not convinced these differences in structure and biochemical results could not simply result from testing incomplete LINE-1 and OR4K15 sequences.

      We apologize for any confusion caused by our previous statement. Although we observed some differences in mutational effects, we agree with the reviewer that it is not convincing to claim them as a new ribozyme class. We have replaced all “lantern ribozyme” with “TS-like ribozyme” as Reviewer 1 suggested.

      The authors state, that "the result confirmed that the stem loop SL2 region in LINE-1 and OR4K15 did not participate in the catalytic activity". To draw such a conclusion a kinetic comparison between a construct that contains SL2 and does not contain SL2 would be necessary. The given data does not suffice to come to this conclusion.

      We appreciate the reviewer for raising this question. To address this, we performed gel-based kinetic analysis of these two ribozymes (Figure S14).

      Line 458-462: The kobs of LINE-1-core under single-turnover condition was ~0.05 min-1 when measured in 10mM MgCl2 and 100mM KCl at pH 7.5 (Figure S13). Only a slightly lower value of  kobs (~0.03 min-1) was observed for LINE-1-rbz (Figure S14). This confirms that the stem loop region SL2 does not contribute to the cleavage activity of the TS-like ribozymes.

      Construct/Library design:

      The last 31 bp in the OR4K15 ribozyme template sequence are duplicated (Supplementary Table 4). Therefore, there are 2 M13 fwd binding sites and several possible primer annealing sites present in this template. This could explain the lower yield for the mutational analysis experiments. Did the authors observe double bands in their PCR and subsequent analysis? The experiments should probably be repeated with a template that does not contain this duplication. Alternatively, the authors should explain, why this template design was chosen for OR4K15.

      We apologize for this mistake during writing. Our construct design for OR4K15 contains only one M13F binding site. We thank the reviewer for pointing this out and we have fixed the error.

      Figure 5B: Where are the bands for the OR4K15 dC-substrate? They are not visible on the gel, so one has to assume there was no substrate added, although the legend indicates otherwise.

      Also this figure, please indicate here or in the methods section what kind of marker was used. In panels A and B, please label the marker lanes.

      We apologize for this mistake and we have repeated the experiment. The marker lane was removed to avoid confusion caused by the inappropriate DNA marker. 

      The authors investigated ribozyme cleavage speeds by measuring the observed rate constants under single-turnover conditions. To achieve single-turnover conditions enzyme has to be used in excess over substrate. Usually, the ratios reported in the literature range between 20:1 (from the authors citation list e.g.: for twister sister (Roth et al 2014) and hatchet (Li et al. 2015)) or even ~100:1 (for pistol: Harris et al 2015, or others https://www.sciencedirect.com/science/article/pii/S0014579305002061). Can the authors please share their experimental evidence that only 5:1 excess of enzyme over the substrate as used in their experiments truly creates single-turnover conditions?

      We thank the Reviewer for raising this question. To address it, we performed kinetic analysis at different enzyme-to-substrate ratios (Figure S13). The kobs values differed little across ratios; the highest value, 0.048 min-1, was obtained with a 100:1 excess of enzyme over substrate.

      Line 458-460: The kobs of LINE-1-core under single-turnover condition was ~0.05 min-1 when measured in 10mM MgCl2 and 100mM KCl at pH 7.5 (Figure S13).
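
      As a rough illustration of how an observed rate constant of this size can be recovered from a cleavage time course, the sketch below assumes a single-exponential model, F(t) = Fmax(1 - exp(-kobs t)), and fits kobs by a least-squares line through the origin on the linearized form. The model choice, the plateau value, and the time points are assumptions made for illustration only; the time course is synthetic and is not data from the study.

```python
import math


def first_order_fraction(t_min, k_obs, f_max):
    """Fraction cleaved at time t (minutes) for a single-exponential process."""
    return f_max * (1.0 - math.exp(-k_obs * t_min))


def estimate_k_obs(times, fractions, f_max):
    """Estimate k_obs from the linearized form -ln(1 - F/f_max) = k_obs * t.

    Uses least squares through the origin. Points at t = 0 or at/above the
    plateau are skipped because they carry no rate information or would
    make the logarithm undefined.
    """
    xs, ys = [], []
    for t, f in zip(times, fractions):
        if t > 0 and 0 <= f < f_max:
            xs.append(t)
            ys.append(-math.log(1.0 - f / f_max))
    if not xs:
        raise ValueError("no usable time points")
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic, noise-free time course (hypothetical values, not study data).
times = [0, 5, 10, 20, 40, 80]   # minutes
f_max = 0.8                      # hypothetical plateau fraction cleaved
fractions = [first_order_fraction(t, 0.05, f_max) for t in times]

k_fit = estimate_k_obs(times, fractions, f_max)
half_life = math.log(2) / k_fit  # minutes
print(f"k_obs = {k_fit:.3f} min^-1, half-life = {half_life:.1f} min")
```

      With real gel data the plateau is usually fitted together with kobs by nonlinear regression; the linearized form is used here only to keep the sketch free of dependencies.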

      Citations:

      In the introduction citation number 12 (Roth et al 2014) is mentioned with the CPEB3 ribozyme introduction. This is the wrong citation. Please also insert citations for OR4K15 and IGF1R and LINE-1 ribozyme in this sentence.

      We thank the reviewer for pointing this out and we now have fixed it.

      Also in the introduction, a hammerhead ribozyme in the 3' UTR of Clec2 genes is mentioned and reference 16 (Cervera et al 2014) is given, I think it should be reference 9 (Martick et al 2008)

      We thank the reviewer for pointing this out and we now have fixed it.

      In the results section it is stated that, "original sequences were generated from a randomly fragmented human genomic DNA selection based biochemical experiment" citing reference 12. This is the wrong reference, as I could not find that Roth et al 2014 describe the use of such a technique. The same sentence occurs in the introduction almost verbatim (see also minor points).

      We thank the reviewer for pointing this out and we now have fixed it.

      Minor points

      Headline:

      Either use caps for all nouns in the headline or write "self-cleaving ribozyme" uncapitalized

      We thank the reviewer for pointing this out and we now have fixed it.

      Abstract:

      1st sentence: in "the" human genome

      "Moreover, the above functional regions are..." - the word "above" could be deleted here

      "named as lantern for their shape"- it should be "its shape"

      "in term of sequence and secondary structure"- "in terms"

      "the nucleotides at the cleavage sites" - use singular, each ribozyme of this class has only one cleavage site

      We thank the reviewer for pointing these out and we now have fixed them.

      Introduction:

      Change to "to have dominated early life forms"

      Change to "found in the human genome"

      Please write species names in italics (D. melanogaster, B. mori)

      Please delete "hosting" from "...are in noncoding regions of the hosting genome"

      Please delete the sentence fragment/or turn it into a meaningful sentence: "Selection-based biochemical experiments (12).

      Change to "in terms of sequence and secondary structure, suggesting a more"

      Please reword the last sentence in the introduction to make clear what is referred to by "its", e.g. probably the homology model of lantern ribozyme generated from twister sister ribozymes?

      Please refer to the appropriate methods section when explaining the calculation of RA and RA'.

      We thank the reviewer for pointing these out and we now have fixed them.

      The last sentence of the second paragraph in the second section of the results states that the authors confirmed functional regions for LINE-1 and OR4K15, however, until that point the section only presents data on LINE-1. Therefore, OR4K15 should not be mentioned at the end of this paragraph.

      In response to the reviewer's suggestions, we have removed OR4K15 from this paragraph.

      Line 225-228: The consistency between base pairs inferred from deep mutational scanning of the original sequences and that of the identified functional regions confirmed the correct identification of functional regions for LINE-1 ribozyme.

      Change to "Both ribozymes have two stems (P1, P2), to internal loops ..."

      We thank the reviewer for pointing this out and we now have fixed it.

      The section naming the "functional regions" of LINE-1 and OR4K15 lantern ribozymes should be moved after the section in which the circular permutation is shown and explained. Therefore, the headline of section three should read "Consensus sequence of LINE-1 and OR4K15 ribozymes" or something along these lines.

      We thank the reviewer for pointing this out and we now have fixed it.

      Line 308-309: Given the identical lantern-shaped regions of the LINE-1-rbz and OR4K15-rbz ribozyme, we named them twister sister-like (TS-like) ribozymes.

      The statement on the difference between C8 in OR4K15 and U38 in LINE-1 should be further classified. As U38 is only 95% conserved. Is it a C in those other instances or do all other nucleotide possibilities occur? Is the high conservation in OR4K15 an "artifact" of the low mutation rate for this RNA in the deep mutational scanning?

      We thank the reviewer for this question. Yes, the high conservation in OR4K15 is an "artifact" of the low mutation rate for this RNA in the deep mutational scanning. That is why the RA’ value is more appropriate for describing the conservation level of each position. We also mentioned this in the manuscript:

      Line 287-288: The only mismatch U38C in L1 has the RA’ of 0.6, suggesting that the mismatch is not disruptive to the functional structure of the ribozyme.

      Section five, first paragraph: instead of "two-stranded LINE-1 core" use the term "bimolecular", as it is more commonly used.

      We thank the reviewer for pointing this out and we now have changed it.

      Figure caption 3 headline states "Homology modelled 3D structure..."but it also shows the secondary structures of LINE1, OR4K15 and twister sister examples.

      We thank the reviewer for pointing this out and we now have removed “3D”.

      In Figure 3C, we see a nucleobase labeled G37, however in the secondary structure and sequence and 3D structural model there is a C37 at this position. Please correct the labeling.

      We thank the reviewer for pointing this out and we now have fixed it.

      Section 7 "To address the above question..." please just repeat the question you want to address to avoid any confusion to the reader.

      We thank the reviewer for pointing these out and we have re-phrased this sentence.

      Line 364: Considering the high similarity of the internal loops, we further investigated the mutational effects on the internal loop L1s.

      Please rephrase the sentence "By comparison, mutations of C62 (...) at the cleavage site did not make a major change on the cleavage activity...", e.g. "did not lead to a major change" etc.

      Section 8, first paragraph: This result further confirms that the RNA cleavage in lantern...", please delete "further"

      Change to "analogous RNAs that lacked the 2' oxygen atom in the -1 nucleotide"

      Methods

      Change to "We counted the number of reads of the cleaved and uncleaved..."

      Change to "...to produce enough DNA template for in vitro transcription."

      Change to "The DNA template used for transcription was used..." (delete while)

      We thank the reviewer for pointing these out and we now have fixed them.

      Supplement

      All supplementary figures could use more detailed Figure legends. They should be self-explanatory.

      Fig S1/S2: how is "mutation rate" defined/calculated?

      We thank the reviewer for pointing this out and we now have added a short explanation. The mutation rate was calculated as the proportion of mutations observed at each position for the DNA-seq library.
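
      The definition given above (proportion of mutations observed at each position in the DNA-seq library) can be sketched as follows. The function name, the toy reference, and the reads are hypothetical, and the sketch assumes reads that are already aligned to the reference, have the same length, and contain substitutions only.

```python
def per_position_mutation_rate(reference, reads):
    """Return, for each position, the fraction of reads that differ from the reference.

    Assumes pre-aligned reads of the same length as the reference
    (substitutions only, no insertions or deletions).
    """
    if not reads:
        raise ValueError("at least one read is required")
    mismatches = [0] * len(reference)
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("read length does not match reference length")
        for i, (ref_base, base) in enumerate(zip(reference, read)):
            if base != ref_base:
                mismatches[i] += 1
    return [count / len(reads) for count in mismatches]


# Toy example with hypothetical sequences.
reference = "GACT"
reads = ["GACT", "GACT", "GGCT", "AACT"]
rates = per_position_mutation_rate(reference, reads)
print(rates)
```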

      Fig S3/S4: axis label "fraction", fraction of what? How calculated?

      We thank the reviewer for pointing this out and we now have added a short explanation. The Y axis “fraction” represents the ratio of each mutation type observed in all variants.

      Fig S5: RA and RA' are mentioned in the main text and methods, but should be briefly explained again here, or it should be clearly referred to the methods. Also, the axis label could be read as average RA' divided by average RA. I assume that is not the case. I assume I am looking at RA' values for LINE-1 rbz and RA values for LINE-1-ori? Also, mention that only part of the full LINE-1-ori sequence is shown...

      We thank the reviewer for pointing this out and we have now added a short explanation. The Y axis represents RA’ for LINE-1-rbz, or RA for LINE-1-ori. The part shown is the overlap region between LINE-1-rbz and LINE-1-ori. We apologize for any confusion caused by our previous statement.

      Fig S9 the magenta for coloring of the scissile phosphate is hard to see and immediately make out.

      We thank the reviewer for pointing this out and we now have added a label to the scissile phosphate.

      Fig S10: Why do the authors only show one product band here? Instead of both cleavage fragments as in Figure 5?

      We thank the reviewer for this question. We purposely used two fluorophores (5’ 6-FAM, 3’ TAMRA) to show the two product bands in Figure 5. In Fig S10, a long incubation was used to distinguish catalysis-based self-cleavage from RNA degradation. This figure was prepared before we purchased the substrate used in Figure 5. The substrate strand used in Fig S10 has only one fluorophore modification (5’ 6-FAM), and the other product was too short to be visualized by SYBR Gold staining.

      Fig S13: please indicate meaning of colors in the legend (what is pink, blue, grey etc.)

      Please change to "RtcB ligase was used to capture the 3' fragment after cleavage...."

      We thank the reviewer for pointing this out and we now have fixed it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Materials and Methods section:

      Cell gating and FACS sorting strategies need to be explained. There is no figure legend of supplementary figure 4 which is supposed to explain the gating strategy. Please detail the strategy for each cell types.

      Thank you for your suggestion. We have given a detailed description of the gating and FACS sorting strategies for the different liver cell types in supplementary figure 1. In addition, flow cytometry plots of CD45+Ly6C-CD64+F4/80+ KCs from Bmp9fl/flBmp10fl/flLrat Cre mice are also presented in supplementary figure 1.

      The genetic background of the different mouse strains and the age of the mice should be noted on each figure.

      All the mice used in our study are on a C57BL/6 background (method section). The age of the mice is stated on each figure.

      The Mann Whitney test instead of the two-tailed Student's t-test should be used for the different statistical analyses. Why are the expression counts statistically analyzed by 2-tailed Student's t test as they were already identified as DE in RNAseq statistical analysis?

      Thank you for your suggestion. Statistical methods have been corrected in the revised manuscript.

      What is the age of the mice and how many are used for each bulk RNAseq?

      This information has been added on the corresponding figure legends.

      Figure 1:

      Figure 1a and c: The qPCR data would be much more interesting if presented as DDct and not as relative value as we do not see the mRNA levels of BMP9 and BMP10 in each Bmp9fl/flBmp10fl/flCre mouse. This would allow to compare the mRNA level of BMP9 versus BMP10. This should be changed in all figures.

      The presentation of qPCR data in Figure 1a has been changed, which allows the abundance of BMP9 and BMP10 mRNA to be compared. Figure 1c shows only the expression of BMP10, so it is unnecessary to present those qPCR data as DDct. In our bulk RNA sequencing data of liver tissues, we found that BMP9 expression counts are higher than those of BMP10, in line with the data from BioGPS.
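
      For readers unfamiliar with the calculation the reviewer refers to, the sketch below shows one common way to express qPCR data so that two transcripts can be compared: abundance relative to a reference gene as 2^-dCt, and fold change between conditions as 2^-ddCt. It assumes roughly 100% amplification efficiency for every primer pair, and all Ct values and the choice of reference gene are hypothetical; it is not a description of how the revised Figure 1a was computed.

```python
def relative_abundance(ct_target, ct_reference):
    """Abundance of a target transcript relative to a reference gene (2^-dCt).

    Assumes ~100% amplification efficiency for both primer pairs.
    """
    return 2.0 ** -(ct_target - ct_reference)


def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Fold change of a target in a sample versus a control (2^-ddCt)."""
    dct_sample = ct_target_sample - ct_ref_sample
    dct_control = ct_target_control - ct_ref_control
    return 2.0 ** -(dct_sample - dct_control)


# Hypothetical Ct values; the reference gene Ct is 18 in every sample.
bmp9_abundance = relative_abundance(ct_target=24.0, ct_reference=18.0)
bmp10_abundance = relative_abundance(ct_target=27.0, ct_reference=18.0)
bmp9_over_bmp10 = bmp9_abundance / bmp10_abundance

# Hypothetical knockout sample in which the target Ct rises from 24 to 29.
knockout_fold = fold_change_ddct(29.0, 18.0, 24.0, 18.0)
print(bmp9_over_bmp10, knockout_fold)
```

      Comparing two different transcripts this way is only meaningful when the primer efficiencies are similar, which is why the assumption is stated explicitly.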

      Figure 1e (IF) and f (FACS), the quantification of these data should be added as shown in Fig2d. What is the difference between Fig1e and Fig2d as they both seem to show the quantification of F4/80 in CTL versus Bmp9fl/flBmp10fl/flLratCre mice. Are the cells sorted in Fig1f and 1e and suppl Fig1b? if yes please precise the strategy. If they are not gated how can the authors obtain 93% of KC? The reference Tillet et al., JBC 2018 should be added in the discussion of figure 1 as it is the first description of BMP10 in HSC.

      The quantitative data for Figure 1e and 1f have been added in our revised manuscript. Unlike markers shared with other tissue-resident macrophages, CLEC4F is a KC-specific marker expressed exclusively on KCs. In our previous report (PMID: 34874921), we demonstrated that the BMP9/10-ALK1 signal induces the expression of CLEC4F. The data shown in Figure 1e reproduce this phenotype: upon loss of the BMP9/10-ALK1 signal, liver macrophages did not express CLEC4F. F4/80 in Figure 1e was used as an internal positive control. Fig2d shows the quantification of F4/80 and CD64, two pan-macrophage markers, which is a more accurate measure of the number of liver macrophages, especially given that F4/80 mean fluorescence intensity was reduced in liver macrophages of Bmp9fl/flBmp10fl/flLrat Cre mice. Cells in Fig1f, 1e and suppl Fig1b were not sorted, and the flow cytometry plots of these cells were pre-gated on live CD45+Ly6C-CD64+F4/80+ liver macrophages. The reference Tillet et al., JBC 2018 has been added in our revised manuscript.

      Supplementary 4 should have a detailed figure legend and should appear before gating experiments. What cell subtype is used for each cell type gating. Please add the exact references of all the antibodies used and if they are fluorescently labeled antibodies. Why is the number of lymphocytes noted and how is it calculated? The gating strategy for the Bmp9fl/flBmp10fl/flLratCre mice should also be shown as the number of F4/80+ and Tim4+ cells are decreased.

      A detailed figure legend has been added to the original supplementary figure 4, which has been moved to supplementary figure 1 in our revised manuscript. The antibodies used in our study were also used in our previous report (PMID: 34874921) and by others (PMID: 31561945; PMID: 26813785). The "lymphocytes" label and count appear automatically on the flow cytometry plots when we analyze the data, so they do not mean that the selected cells are lymphocytes. To avoid misunderstanding, these words have been deleted. The gating strategy for CD45+Ly6C-CD64+F4/80+ liver macrophages from the Bmp9fl/flBmp10fl/flLrat Cre mice is shown in our revised manuscript (Supplementary Figure 1).

      Figure 2:

      Figure 2a: How many mice were used for bulk RNAseq at what age? Please describe the gating strategy for sorting liver macrophages. The PCA should be shown. The genes represented in Fig2c and cited in the text should be shown on the volcano plot and the heatmap (Timd4, Cdh5, Cd5l). A reference for these KC and monocytic markers should be added in the text.

      Control and Bmp9fl/flBmp10fl/flLrat Cre mice at the age of 8-10 weeks (n=3/group) were used for bulk RNAseq. This information has been added in Figure 2a legend. The PCA, Timd4 gene and references for these KC and monocytic markers have been shown in our revised manuscript according to your suggestion.

      Figure 2b: How are selected the genes represented in the heatmap? The top ones? If it is a KC signature the authors should give a reference for this signature.

      These genes were KC signature genes. The reference (PMID: 30076102) has been given in our revised manuscript.

      Fig2e: Please explain what is the Vav1 promoter and in which cells it will delete Alk1and Smad4? The authors also need to show that Alk1 and Smad4 are indeed deleted in these mice and in which cell subtype (EC and KC?). This is an important point as the authors conclude that other molecular mechanisms than Smad4 signaling may affect the phenotypes of liver macrophages in Bmp9fl/flBmp10fl/flLratCre.

      Cre recombinase of Vav1Cre mice is expressed at high levels in hematopoietic stem cells (PMID: 27185381). This strain is widely used to target all hematopoietic cells with high efficiency (PMID: 24857755). In our previous report (PMID: 34874921), we demonstrated that Alk1 (Supplemental Figure 6A) and Smad4 (Supplemental Figure 6G) were efficiently deleted in KCs from Alk1fl/flVav1Cre and Smad4fl/flVav1Cre mice, respectively. This sentence and reference have been added in our revised manuscript. Homozygous loss of ALK-1 causes embryonic lethality due to aberrant angiogenesis (PMID: 28213819). EC-specific ALK1 knockout in the mouse through deletion of the ALK1 gene from an Acvrl12loxP allele with the EC-specific L1-Cre line results in postnatal lethality at P5, with mice exhibiting hemorrhaging in the brain, lung, and gastrointestinal tract (PMID: 19805914). In contrast, Alk1fl/flVav1Cre mice generated in our lab did not show this phenotype or body weight loss, and were still alive at 16 weeks of age. Thus, we do not think that ECs are targeted by the Vav1Cre strain, at least in our experimental system.

      Supl Figure 3 (revised Supl Figure 4): The authors need to explain what cell types are affected by Csf1r-Cre and Clec4fDTR. Have the authors tried to perform a similar experiment in Bmp9fl/flBmp10fl/flLratCre? The legend of the Y axis is not clear, why is CD45+ used in the first bar graph while the other two graphs use F4/80+?

      We (PMID: 34874921) and others (PMID: 31587991; PMID: 31561945; PMID: 26813785) have demonstrated that Clec4f is specifically expressed on KCs, so only KCs are depleted in Clec4fDTR mice after DT injection. CSF1R, also known as macrophage colony-stimulating factor receptor (M-CSFR), is the receptor for the major monocyte/macrophage lineage differentiation factor CSF1. Thus, the Csf1r-Cre strain can target monocytes, monocyte-derived macrophages, and tissue-resident macrophages, including those of the liver, spleen, intestine, heart, kidney, and muscle, with high efficiency (PMID: 29761406). We did not perform a similar experiment in Bmp9fl/flBmp10fl/flLrat Cre mice because we have demonstrated that the differentiation of liver macrophages in these mice is inhibited. The other two graphs in Supl Figure 4C were obtained from Supl Figure 4B. Flow cytometry plots in Supl Figure 4B are pre-gated on CD45+Ly6C-CD64+F4/80+ liver macrophages, so it is appropriate to use F4/80+ as an internal control.

      Figure 3: Same remarks as in Figure 2. How many mice were used for bulk RNAseq, at what age? The PCA should be shown. How were selected the genes represented in the heatmap? The top ones? A reference should be given for the sinusoidal EC and the continuous EC signatures and large artery signature. Maf and Gata4 should be shown on the volcano plot. A quantification for CD34 IF (Fig3e) as well as for the quantification of the FACS data (Fig 3f) should be added.

      Control and Bmp9fl/flBmp10fl/flLrat Cre mice at the age of 8-10 weeks (n=3/group) were used for bulk RNAseq. According to your suggestion, other revisions have been made.

      Figure 4: A quantification and statistical analysis of Prussian staining area and GS IF should be added not just number of mice which were affected.

      A quantification and statistical analysis of Prussian staining area and GS IF has been added.

      Minor points:

      Few spelling mistakes that should be checked.

      Figure 5a, some bar graphs are missing.

      Spelling mistakes and missing bar graphs in Figure 5a have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      The authors should provide some additional information:

      - Did the single HSC-KO mice for either BMP9 or BMP10 already show partial phenotypes?

      We think that, at steady state, the KC and EC phenotypes described in our manuscript are not altered in the livers of single HSC-KO mice for either BMP9 or BMP10. However, we do not know whether BMP9 and BMP10 remain redundant in liver disease or inflammation, which is worth further study.

      - The authors should also stain Endomucin, Lyve1, CD32b on liver tissue to assess endothelial zonation/differentiation in addition to FACS analysis.

      In our revised manuscript, we performed immunostaining for Endomucin and Lyve1 and found increased expression of Endomucin and decreased expression of Lyve1 (Figure 3g), suggesting that endothelial zonation/differentiation was disrupted in the livers of Bmp9fl/flBmp10fl/flLrat Cre mice compared to their littermates. We did not stain for CD32b in liver sections because there is no good antibody against mouse CD32b for frozen sections.

      - Did the authors assess BMP9/BMP10 effects individually and combined in vitro on KC and EC? Are these likely only direct effects or may they also involve each other (i.e. also cross talk between KC and EC in response to BMP9/10?). This could be assessed in co-culture models.

      Using ALK1 reporter mice, we demonstrated that KCs and liver ECs express ALK1. We and others have shown that in vitro stimulation with BMP9/BMP10 can induce the expression of ID1/ID3 and GATA4/Maf in KCs and ECs, respectively (PMID: 34874921; PMID: 35364013; PMID: 30964206). These results suggest that BMP9/BMP10 can act directly on KCs and ECs. We are also interested in the crosstalk between KCs and ECs. However, an in vitro coculture system cannot mimic the interaction between KCs and ECs in the liver, as these cells lose their identity once isolated from the liver environment. Nevertheless, Bonnardel et al. applied NicheNet bioinformatic analysis to predict that liver ECs provide an anchoring site and Notch and CSF1 signals for KCs (PMID: 31561945). Of course, this prediction still needs experimental validation.

      - The abstract should be rephrased and more specific focus on BMP related intercellular crosstalk in the liver and its implications for liver health and disease. At the end of the abstract they should also emphasize for which specific fields/topics/diseases these findings are important.

      Thank you for your suggestion. The abstract has been rephrased, and we hope the new version addresses your concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Response to Reviewer #1 (Public Review):

      The reviewer is correct that the previous explanation of the fitness calculation could be considered insufficient as it was only briefly described in Results. In the revised manuscript, in the "Supplementary Materials" section and then in "Supplementary Text 1", we provide a full definition of the fitness of strains carrying single or multiple mutations and thus show how epistasis was calculated.
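
      The authors' own definition is given in Supplementary Text 1 and is not reproduced here. Purely as orientation, the sketch below uses one widely used convention, which is an assumption on our part: the expected fitness of a multiply mutated strain is the product of the single-mutant relative fitnesses, and epistasis is the observed fitness minus that expectation. All fitness values are hypothetical.

```python
import math


def expected_multiplicative_fitness(single_fitnesses):
    """Expected relative fitness of a multi-mutant under a multiplicative null model."""
    return math.prod(single_fitnesses)


def epistasis(observed_fitness, single_fitnesses):
    """Deviation of observed fitness from the multiplicative expectation.

    Positive values mean the combined mutations are less harmful than
    expected from their individual effects (positive epistasis).
    """
    return observed_fitness - expected_multiplicative_fitness(single_fitnesses)


# Hypothetical relative fitnesses (wild type = 1.0), not values from the study.
single_mutants = [0.5, 0.5, 0.5]
observed_combined = 0.5

expected = expected_multiplicative_fitness(single_mutants)
eps = epistasis(observed_combined, single_mutants)
print(expected, eps)
```

      Other null models (for example additive on a log scale) give the same sign of epistasis in this toy case but different magnitudes, so the definition in Supplementary Text 1 should be taken as authoritative.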

      Response to Reviewer #2 (Public Review):

      In our opinion, the reviewer's comments relate to three issues. First, our finding that the level of transcription of the monosomic chromosomes is not upregulated was not compared with the results of other studies, including those in other organisms. Indeed, we did not mention that the gene dosage distortions introduced by aneuploidy are frequently and profoundly compensated in multicellular organisms. We cite the suggested broad and recent review paper in the revised manuscript (line 247). We also removed the somewhat provocative sentence: “The relationship between transcriptome and proteome is generally fixed in yeast”. Regarding this organism, both data and opinions indeed remain conflicting when work with many different yeast strains is considered. But the standard laboratory strains stand out as those where dosage compensation is absent or weak. A paper published a year ago states flatly: "... at least in the strain background used here (authors: BY, the same we use), aneuploidies are transmitted to transcriptome and proteome with a minimum of gene-dosage buffering, rendering aneuploidies discoverable by proteomics" (Messner et al. 2023). A more recent paper reports: "In lab-generated aneuploids, some proteins - especially subunits of protein complexes - show reduced expression, but the overall protein levels correspond to the aneuploid gene dosage" (Muenzner et al. 2024). This "reduced expression" was seen in disomics and was achieved by upregulated proteolysis, whereas we have monosomics and downregulated proteolysis. In summary, we cannot back away from our claim that the biases introduced by monosomy were not compensated. (It is not critical to our paper; we could do so and still leave our main claim about extraordinarily high positive epistasis intact.) Muenzner and colleagues do report compensation, but in "wild" strains. Our explanation would be that the existing yeast aneuploids are not a random sample of aneuploid mutations. In particular, they could be strains, perhaps relatively rare, in which the genetic background was permissive for aneuploidy from the start or allowed rapid evolution toward tolerance of aneuploidy. Strains with rigid gene-mRNA-protein relationships suffer so much that they perish unless they are shielded from selection, as is possible in the laboratory. The reviewer will know better whether this might also apply to multicellular organisms.

      The second concern is that we did not sufficiently report "... the trends of up- and downregulation across the genome and whether there are any genes on the varied chromosome that are dosage compensated". We believe we have indeed done this, albeit mostly in a simple graphical fashion. For the whole genomes, Datasheet 2 reports the extent of down- or up-regulation for each gene in each strain and highlights those that are statistically significant. We do not analyze the distributions of these deviations because they are relative. They represent individual gene down- and up-regulations within a monosomic transcriptome compared to the corresponding genes in the diploid transcriptome, with the total size of the transcriptomes set equal. Thus, the downs and ups cancel each other out, the left and right sides of the distribution would be equal in their totals, and we have no meaningful expectations about the possible variation in the shapes of the overall distributions or their opposite sides. As for the "varied chromosome", we show that there were extensive down- and up-regulations on the monosomic chromosomes, even though the mean expression for them was half that of the diploid chromosomes. This can be seen in Figure 3B as blue and red colored bars that are present on each monosomic chromosome and intermingled along its length. The purpose of these graphs is to show that even the genes in which the halving of the dose was most damaging to fitness (most negative values of rDR-1) did not tend to be upregulated on average (both blue and red colors are found among them). We consider this an important and original part of our data.

      Finally, the reviewer is concerned that we are only dealing with the relative abundance of mRNA species. He/she suggests that "... an experiment that would clarify the results would be to perform estimates of the total transcriptome size. If the general transcriptome size is indeed increased, the claims of reduced proteosome expression may need to be revised". We followed this advice and extracted transcriptomes from known amounts of yeast cells with known amounts of standard mRNA or "spike" added. We thus seriously considered the reviewer's suggestion, even though it was contrary to our intuition and, we believe, was not confirmed in the additional experiment. The results are reported in the last paragraph of Results and shown in Supplementary Figure S3. Our arguments are listed in that paragraph, so we will not repeat them here.
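
      The logic of a spike-in experiment of this kind can be sketched as follows. The sketch assumes that read counts are proportional to the number of mRNA molecules and that the spike is recovered with the same efficiency as endogenous mRNA; the function name and all numbers are hypothetical and are not results from the study.

```python
def mrna_per_cell(endogenous_reads, spike_reads, spike_amount, n_cells):
    """Estimate total endogenous mRNA per cell, in the units of spike_amount.

    The spike-in converts relative read counts into absolute amounts:
    endogenous amount = (endogenous reads / spike reads) * spike amount.
    """
    if spike_reads <= 0 or n_cells <= 0:
        raise ValueError("spike_reads and n_cells must be positive")
    return (endogenous_reads / spike_reads) * spike_amount / n_cells


# Hypothetical libraries prepared from the same number of cells and spike amount.
diploid = mrna_per_cell(endogenous_reads=9_000_000, spike_reads=1_000_000,
                        spike_amount=1.0, n_cells=1e7)
monosomic = mrna_per_cell(endogenous_reads=8_000_000, spike_reads=2_000_000,
                          spike_amount=1.0, n_cells=1e7)

# Values below 1 would indicate a smaller transcriptome than in the diploid.
relative_size = monosomic / diploid
print(round(relative_size, 3))
```

      A larger share of spike reads in a library implies less endogenous mRNA per cell, which is why the ratio, not the raw endogenous read count, carries the information about transcriptome size.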

      Response to Reviewer #3 (Public Review):

      (1) Figure 3b – both its legend and reference to it in the main text are corrected in line with suggestions made by Reviewers #1 and #3.  

      (2) We had to restrict our mRNA analysis to about half of the strains. We chose purely random selection. This left out M4 but nevertheless included M2, M10, and M16, which represent the strains with especially high levels of epistasis. See manuscript lines 161-162.

      (3) We agree, and say so in the article, that both the loss and gain of a copy of a chromosome most likely result in errors in mitosis. By "endoreduplication" we mean any event resulting in two chromosomes instead of one, not necessarily additional DNA replication as we now clarify. We also suggest that both loss and endoreduplication occurred in all strains, but in M7 and M13 they happened so close together that we could not isolate the rare monosomic cells from the rapidly spreading revertants (lines 86-91).

      Recommendations for the authors:

      Reply to Reviewer #1 (Recommendations for The Authors):

      The legend to Fig. 3b is hopefully clearer now.

      Reply to Reviewer #2 (Recommendations for The Authors):

      We understand that these points were raised also in the public review so the answer to the latter is also relevant to the recommendations for authors.

      Reply to Reviewer #3 (Recommendations for The Authors):

      (1) The first sentence of this comment may be based on a misinterpretation of our main argument. We believe that the upregulation of ribosomal protein (RP) coding genes was not helpful, but harmful. It was costly because RPs are a large part of the proteome, but it did not help translation because it did not restore the stoichiometry of RPs. This unproductive investment reduced the rate of remaining metabolism, so that other impairments introduced by halving the doses of other genes were no longer critical, and this made them unobservable at the level of phenotype, i.e., produced epistasis. However, both this Reviewer and Reviewer 2 seem to suggest that an entire translational apparatus may have been expanded, compensating for its reduced efficiency (per transcript). Reviewer 2 suggested an mRNA spike as a standard, and we followed this approach as more accessible to us. (We reiterate our claim of good agreement between mRNAs and proteins in the BY strain, supported by two new important papers, line 256-257). The results are reported in the last paragraph of Results. We believe that they indicate a reduction, not an increase, in the translational apparatus (including its parts encoded on the monosomal chromosomes), so that our explanation of positive epistasis remains unchallenged.

      (2) We re-examined the sequences and found that there were heterozygous SNPs in the same gene, RSP5, in several strains. One was a loss of a START codon (M3, M4, M6, M8, M9, M10, M14, M16), always the same. The other was a substitution, always the same, in M5, M11 and M15. There were no mutations in this gene in M1 and M2. We tested our stock haploid strains BY4741 and 4742 and found that they were not mutated. However, we also recovered the specific haploid strains used in the final crosses to construct the diploid strains used to obtain monosomics. Some had one of the two mutations, some were clean. All grew normally, the mutants were similar to the wild types, indicating that the fitness effect of the mutations, even in haploids, was at most partial, since the expected severe effects of RSP5 inactivation were not visible.

      Where do the mutations come from? In previous experiments, we subjected some BY strains to severe selection regimes. As we can now surmise, mutations in RSP5 helped to resist some of them, especially those involving overexpression of selected genes. (We do not summarize here the results of the lengthy review of our notes and the literature that led us to consider this explanation the most plausible.) Unfortunately, we used strains that went through that harsh selection in crosses serving to derive another collection of strains, those used here.

      How critical is it? First, the mutations were heterozygous, which further reduced their apparently weak effects. Second, M1 and M2 were free of them. Third, we tried to get clean monosomics, i.e., homozygous wild type for RSP5. We obtained such strains with monosomy as the only change for M9, M10 and M16. The other three attempts did not yield correct M3, M5 and M6, but complex aneuploids. This is normal, as we explain (complain) in Results. We would have to isolate a large number of potential monosomies and then sequence them to show that all exact monosomies can be derived in the absence of mutations in RSP5. We believe that we could complete this after an effort comparable to that required to obtain the first set of monosomics. For financial and organizational reasons, this is not possible at this time, and we do not consider it necessary for completing the revision. Note that of the five mutation-free straight monosomics, M2, M10 and M16 are among the most affected and thus have the highest positive epistasis. Yes, the role of point mutations cannot be excluded for other monosomics, although we strongly believe it is unlikely. But we have removed all our previous claims that our monosomies are certainly not supported by other genetic changes. Most importantly, our main claim of positive epistasis in its purely descriptive genetic sense remains unaffected. The main functional argument also holds: the indiscriminate overproduction of unbalanced RP proteins was so costly that inefficiencies introduced in functional modules other than biosynthesis become much less relevant. Thus, the main message of our work does not depend on the conceivable, but in our view unlikely, role of mutations in RSP5.

      We provided this lengthy explanation to show that we cared about the reviewer's comment and tried to deal with it in an honest way. It was a lot of pain and no gain for us, but we are still grateful for the opportunity to re-examine our main claims.  

      (3) The 16 (non-essential) plus 16 (essential) strains were replicated 3 times each. In preliminary experiments, we tested that they were not statistically different (using one-way ANOVA). We considered these 32 strains to have the same genetic background, and thus we considered the 96 estimates homogeneous, except for being influenced only by environmental variation or random error.
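
      For readers unfamiliar with the homogeneity check mentioned in point (3), the following is a minimal sketch of how a one-way ANOVA F statistic is computed for several groups of replicate estimates. It is not the analysis code used in the study; the function name and the example values are hypothetical, and obtaining a p-value would additionally require the F distribution (for instance from a statistics library).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual estimates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical example: three strains, three replicate estimates each.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```

      A small F relative to the critical value for (df_between, df_within) is what would justify pooling the replicate estimates as a single homogeneous set.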

      (4) We changed the description of Figure 3b to explain that a particular color shows a range (not its boundary) of log2 fold change (FC) relative to the control.
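
      To illustrate the distinction in point (4) between a color marking a range and a color marking a boundary, here is a minimal sketch that maps a log2 fold change to the half-open interval it falls in. The bin edges, function names and example values are assumptions made for illustration only; they are not taken from Figure 3b.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges on the log2 FC scale (not the ones used in the figure).
EDGES = [-2.0, -1.0, 0.0, 1.0, 2.0]


def log2_fold_change(sample, control):
    """log2 of the ratio of a sample value to the control value."""
    return math.log2(sample / control)


def fc_range(value, edges=EDGES):
    """Return the half-open interval [low, high) that contains value."""
    i = bisect_right(edges, value)
    low = edges[i - 1] if i > 0 else -math.inf
    high = edges[i] if i < len(edges) else math.inf
    return (low, high)


# A value of 0.5 belongs to the range [0.0, 1.0); one color covers that whole range.
example_range = fc_range(0.5)
```

      Each color in such a scheme then stands for every value inside one interval, rather than for the single value at its edge.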

      (5) Corrected.

      (6) Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this important study, Huffer et al. posit that non-cold-sensing members of the TRPM subfamily of ion channels (e.g., TRPM2, TRPM4, TRPM5) contain a binding pocket for icilin which overlaps with the one found in the cold-activated TRPM8 channel.

      The authors identify the residues involved in icilin binding by analyzing the existing TRPM8-icilin complex structures and then use their previously published approach of structure-based sequence comparison to compare the icilin binding residues in TRPM8 to other TRPM channels. This approach uncovered that the residues are conserved in a number of TRPM members: TRPM2, TRPM4, and TRPM5. The authors focus on TRPM4, with the rationale that it has the simplest activation properties (a single Ca2+-binding site). Electrophysiological studies show that icilin by itself does not activate TRPM4, but it strongly potentiates the Ca2+ activation of TRPM4, and introducing the A867G mutation (the mutation that renders avian TRPM8 sensitive to icilin) further increases the potentiating effects of the compound. Conversely, the mutation of a residue that likely directly interacts with icilin in the binding pocket, R901H, results in channels whose Ca2+ sensitivity is not potentiated by icilin.

      The data indicate that, just like in TRPV channels, the binding pockets and allosteric networks might be conserved in the TRPM subfamily.

      The data are convincing, and the authors employ good experimental controls.

      We appreciate the supportive feedback of this reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to study whether the cooling agent binding site in TRPM8, which is located between the S1-S4 and the TRP domain, is conserved within the TRPM family of ion channels. They specifically chose the TRPM4 channel as the model system, which is directly activated by intracellular Ca2+. Using electrophysiology, the authors characterized and compared the Ca2+ sensitivity and the voltage dependence of TRPM4 channels in the absence and presence of the synthetic cooling agonist icilin. They also analyzed the mutational effects of residues (A867G and R901H; equivalent mutations in TRPM8 were shown to be involved in icilin sensitivity) on Ca2+ sensitivity and voltage-dependence of TRPM4 in the absence and presence of Ca2+. Based on the results as well as structure/sequence alignment, the authors concluded that icilin likely binds to the same pocket in TRPM4 and suggested that this cooling agonist binding pocket is conserved in TRPM channels.

      Strengths:

      The authors gave a very thorough introduction to the TRPM channels. They have nicely characterized the Ca2+ sensitivity and the voltage-dependence of TRPM4 channels and demonstrated icilin potentiates the Ca2+ sensitivity and diminishes the outward rectification of TRPM4. These results indicate icilin modulates TRPM4 activation by Ca2+.

      We appreciate the supportive feedback of this reviewer.

      Weaknesses:

      The reviewer has a few concerns. First, icilin alone (at 25µM) and in the absence of Ca2+ does not activate the TRPM4 channel. Have the authors titrated a wide range of icilin concentrations (without Ca2+ present) for TRPM4 activation? This raises the question of whether icilin is indeed an agonist for the TRPM4 channel. This has not been tested, so it is unclear. One may argue that icilin needs Ca2+ as a co-factor for channel activation, just as in the TRPM8 channel. This leads to the second concern, which is a complication in the experimental design and data interpretation. TRPM4 itself requires Ca2+ for activation to begin with, so it is hard to dissect whether the current observed here for TRPM4 is activated by Ca2+ or by icilin plus its cofactor Ca2+. This is the difference between TRPM8 and TRPM4: TRPM8 itself is not activated by Ca2+, so TRPM8 activation is through icilin, and Ca2+ acts as a prerequisite for icilin activation.

      We agree that the comparison between TRPM8 and TRPM4 is not perfect because TRPM4 requires Ca2+ for activation, but it is clear that the current activated by Ca2+ in the presence of icilin also involves icilin because it activates at lower Ca2+ concentrations and lower voltages. We have tested icilin at concentrations between 12.5 and 25 µM and at these concentrations icilin does not activate TRPM4 when applied alone, so we have no evidence that it is an agonist. Both of these concentrations are higher than those reported by Chuang et al. to be saturating for TRPM8 in the presence of Ca2+. We haven’t tested icilin at higher concentrations because we wanted to keep the final concentration of DMSO low enough to avoid any effects of the vehicle. We now emphasize this even more clearly in the revised manuscript.

      The results presented in this study are only sufficient to show that icilin modulates the Ca2+-dependent activation of TRPM4 and icilin at best may act as an allosteric modulator for TRPM4 function. One cannot conclude from the current work that icilin is an agonist or even specifically a cooling agonist for TRPM4. Icilin is a cooling agonist for TRPM8, but it does not mean that if icilin modulates TRPM4 activity then it serves as a cooling agonist for TRPM4.

      We agree with these comments, and we believe that the intent of our statements in the manuscript is completely in line with this perspective. We never refer to icilin as a cooling agent for TRPM4 but rather refer to the cooling agent binding pocket in TRPM8 and how that appears to be conserved and functions in TRPM4 to modulate opening of the channel. We have carefully gone through the manuscript to refer directly to icilin by name (rather than as a cooling agent) when referring to its actions on TRPM4 to make sure there is no confusion.

      For the mutation data on A867G, Figure 4A-B, left panels, it looks like A867G has stronger Ca2+ sensitivity compared to the WT in the absence of icilin and the onset of current activation is faster than in the WT, or is this simply because the scales of the figures differ between A867G and the WT? Overall, the mutagenesis data are too weak to support the conclusion that icilin binds to the S1-S4 pocket. The authors need to mutate more residues that are involved in direct interaction with icilin based on the available structural information, including but not limited to residues equivalent to Y745 and H845 in human TRPM8.

      The A867G mutant does seem to promote opening by Ca2+ in the absence of icilin, and we now comment on this in the manuscript. Having said that, we have not carefully studied the concentration-dependence for activation by Ca2+ because at higher concentrations we see evidence of desensitization. We think Ca2+, icilin and depolarized voltages promote an open state of TRPM4 and the A867G does so as well.

      We respectfully disagree about the strength of the mutagenesis results presented in our manuscript. We present clear gain and loss of function for two mutants corresponding to influential residues within the cooling agent binding pocket of TRPM8. We agree that Y786 mutations would have been a valuable addition, and our plan was to include mutations of this residue. Unfortunately, both the Y786A and Y786H mutants exhibited rundown upon repeated stimulation by Ca2+, making it challenging to obtain reliable results on their effects on modulation by icilin.

      The authors set out to study the conservation of the cooling agonist binding site in the TRPM family, but only tested the synthetic cooling agonist icilin on TRPM4. In order to draw a conclusion as broad as the title and the discussion claim, the authors need to examine more cooling compounds, including the most well-known natural cooling agonist menthol and other cooling agonists such as WS-12 and/or C3, and test their effects on several TRPM channels, not just TRPM4. With the current data, the authors need to significantly tone down the claim of a conserved cooling agonist binding pocket in the TRPM family.

      We would have liked to broaden the scope to other ligands that modulate TRPM8 and we agree that including those data would certainly reinforce our conclusions. However, the first author recently moved on to a new faculty position and extending our findings would require enlisting another member of the lab and taking away from their independent projects. We also do not agree that this is essential to support any of our conclusions. It is also important to keep in mind that icilin is a high-affinity ligand for TRPM8, such that weaker interactions with TRPM4 can still be readily observed. We think it is likely that lower affinity agonists like menthol might not have sufficient affinity to see activity in TRPM4. This scenario is not unlike our earlier experience with TRPV channels where we succeeded in engineering vanilloid sensitivity into TRPV2 and TRPV3 using the high affinity agonist resiniferatoxin (Zhang et al., 2016, eLife). In the case of TRPV2, another group had made the same quadruple mutant and failed to see activation by capsaicin even though resiniferatoxin also worked in their hands (see Fig. 2 in Yang et al., 2016, PNAS).

      On page 11, the authors suggest based on the current data, that TRPM2 and TRPM5 may also be sensitive to cooling agonists because the key residues are conserved. TRPM2 is the closest homolog to TRPM8 but is menthol-insensitive. There are studies that attempted to convert menthol sensitivity to TRPM2, for example, Bandell 2006 attempted to introduce S2 and TRP domains from TRPM8 into TRPM2 but failed to make TRPM2 a menthol-sensitive channel. The sequence conservation or structural similarity is not sufficient for the authors to suggest a shared cooling agonist sensitivity or even a common binding site in the TRPM2 and TRPM5 channels. Again, as pointed out above, the authors need to establish the actual activation of other TRPM channels by these agonists first, before proceeding to functionally probe whether other TRPM channels adopt a conserved agonist binding site.

      We are somewhat confused by these comments because we do not comment about whether cooling agents can activate TRPM2 or TRPM5. We simply analyzed the structures to make the point that the key residues in the cooling agent binding pocket of TRPM8 are conserved in these other TRPM channels. The Bandell paper is relevant, but it is also possible that they failed to uncover a relationship because they only used an agonist that has relatively low affinity for TRPM8. It would have been interesting to see what they might have found if they had used a high-affinity ligand like icilin instead of a low affinity ligand like menthol.

      Taken together, this current work presents data to show the modulatory effects of icilin on the Ca2+ dependent activation and voltage dependence of the TRPM4 channel.

      We agree.

      Reviewer #3 (Public Review):

      Summary:

      The family of transient receptor potential (TRP) channels are tetrameric cation selective channels that are modulated by a variety of stimuli, most notably temperature. In particular, the Transient receptor potential Melastatin subfamily member 8 (TRPM8) is activated by noxious cold and other cooling agents such as menthol and icilin and participates in cold somatosensation in humans. The abundance of TRP channel structural data that has been published in the past decade demonstrates clear architectural conservation within the ion channel family. This suggests the potential for unifying mechanisms of gating despite their varied modes of regulation, which are not yet understood. To address this question, the authors examine the 264 structures of TRP channels determined to date and observe a potential binding pocket for icilin in multiple members of the Melastatin subfamily, TRPM2, TRPM4, and TRPM5. Interestingly, none of the other Melastatin subfamily members had been shown to be sensitive to icilin apart from TRPM8. Each of these channels is activated by intracellular calcium (Ca2+) and a Ca2+ binding site neighbors the predicted pocket for icilin binding in all cryo-EM structures. The authors examined whether icilin could modulate the activation of TRPM4 in the presence of intracellular Ca2+. The addition of icilin enhances Ca2+-dependent activation of TRPM4, promotes channel opening at negative membrane potentials, and improves the kinetics of opening. Furthermore, mutagenesis of TRPM4 residues within the putative icilin binding pocket predicted to enhance or diminish TRPM4 activity elicit these behaviors. Overall, this study furthers our understanding of the Melastatin subfamily of TRP channel gating and demonstrates that a conserved binding pocket observed between TRPM4 and TRPM8 channel structures can function similarly to regulate channel gating.

      Strengths:

      This is a simple and elegant study capitalizing on a vast amount of high-resolution structural information from the TRP family of ion channels to identify a conserved binding pocket that was previously unknown in the Melastatin subfamily, which is interrogated by the authors through careful electrophysiology and mutagenesis studies.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      We appreciate the supportive comments of the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I don't have any major asks, but a few questions did arise while reading your work.

      (1) You refer multiple times to the VSLD pocket as being "open to the cytoplasm". It is not clear if you are implying that compounds such as icilin access the pocket via the cytoplasm (e.g., permeate the membrane to the cytosol, and then enter the binding site?) Is there data to support this? Some clarification here would be helpful, and perhaps explain if there is any distinction between how calcium might enter the VSLD binding site vs hydrophobic compounds like icilin.

      This is an excellent point. Our reference to “open to the cytoplasm” was for Ca2+ ions and we have no evidence for how icilin enters the cooling agent binding pocket. We had tried to look for evidence that Ca2+ might trap icilin in TRPM4 but at the end of the day the results were not convincing enough to include in the manuscript. We have added data showing that icilin slows deactivation of TRPM4 after removing Ca2+, which is particularly evident in the A867G mutant, but this doesn’t inform on whether Ca2+ can trap icilin. We have added a statement about not knowing how icilin enters or leaves the cooling agent binding pocket in TRPM channels.

      (2) Icilin is referred to as a "cooling compound", but its cooling effects are dependent on its interactions with TRPM8. This might be something to clarify, as it might otherwise be understood that other TRPM channels that interact with icilin also mediate the sensing of cool temperatures.

      This is another excellent point and we have no reason to believe that icilin interacting with any TRPM channel other than TRPM8 mediates cooling sensations. We have added a statement to this effect in the discussion when considering actions of icilin that might be mediated by TRPM4 channels.

      Reviewer #2 (Recommendations For The Authors):

      (1) The title and statements in the results/discussion refer to icilin as a cooling agonist of TRPM4 that binds to a conserved "cooling agonist binding pocket", and the authors suggest a similar role and binding site for icilin in the TRPM2 and TRPM5 channels. This conclusion is too broad and is not fully supported by the current experimental data, which only show that icilin works as a modulator, not an agonist, of the TRPM4 channel. The authors should change the usage of cooling agonist or conserved cooling agonist binding pocket and significantly tone down the conclusion of a conserved cooling agonist binding pocket, which is potentially misleading. Alternatively, if the authors insist on using cooling agonist in this context, they should establish the activation of TRPM4, TRPM2, and TRPM5 by icilin as the first step, because the current data only support icilin as a TRPM4 modulator but not an agonist.

      We respectfully don’t agree with this opinion. We show broad conservation of the cooling agent binding pocket in structures of many TRPM channels, and we chose one of them to test for a functional relationship. We think that the title accurately reflects the topic of the paper and does not specify the extent to which functional conservation has been demonstrated and we would like to keep it. The distinction between agonist and modulator is not even germane because icilin is not an agonist of TRPM8 either.

      (2) The manuscript will be strengthened if the authors test additional cooling compounds of TRPM8, including menthol, the menthol analog WS-12, and C3. More importantly, distinct from icilin, these three compounds do not depend on Ca2+ to activate the TRPM8 channel. Thus when testing these compounds on TRPM4, it may reduce the complication of the role of Ca2+, as TRPM4 channel itself requires Ca2+ for activation.

      We restate our response to this point in the public review…

      We would have liked to broaden the scope to other ligands that modulate TRPM8 and we agree that including those data would certainly reinforce our conclusions. However, the first author recently moved on to a new faculty position and extending our findings would require enlisting another member of the lab and taking away from their independent projects. We also do not agree that this is essential to support any of our conclusions. It is also important to keep in mind that icilin is a high-affinity ligand for TRPM8, such that weaker interactions with TRPM4 can still be readily observed. We think it is likely that lower affinity agonists like menthol might not have sufficient affinity to see activity in TRPM4. This scenario is not unlike our earlier experience with TRPV channels where we succeeded in engineering vanilloid sensitivity into TRPV2 and TRPV3 using the high affinity agonist resiniferatoxin (Zhang et al., 2016, eLife). In the case of TRPV2, another group had made the same quadruple mutant and failed to see activation by capsaicin even though resiniferatoxin also worked in their hands (see Fig. 2 in Yang et al., 2016, PNAS).

      (3) The manuscript will be strengthened if the authors test additional residues in the S1-S4 pocket that form direct interactions or are within interacting distances with icilin based on the cryo-EM structures.

      We restate our response to this point in the public review…

      We present clear gain and loss of function for two mutants corresponding to influential residues within the cooling agent binding pocket of TRPM8. We agree that Y786 mutations would have been a valuable addition and our plan was to include mutations of this residue. Unfortunately, both the Y786A and Y786H mutants exhibited rundown, making it challenging to obtain reliable results on their effects on modulation by icilin.

      Furthermore, the ambiguity in the icilin binding pose based on available TRPM8 structures complicates structure-based identification of the most important interacting residues in TRPM8, and we would have needed to functionally validate the effects of any novel mutations we identified in TRPM8 prior to testing them in TRPM4. Instead, we have based our mutagenesis on constructs that have been previously characterized to affect the sensitivity of TRPM8 to cooling agents. A systematic mutagenesis scan of TRPM8 residues predicted to interact differentially with icilin in the two different available binding poses would likely help clarify the true binding pose of icilin and would be an interesting future study.

      Reviewer #3 (Recommendations For The Authors):

      I enjoyed reading this manuscript. It was well-executed and written. It will be interesting to corroborate these findings with a cryo-EM structure of TRPM2, TRPM4, or TRPM5 in the presence of icilin.

      We agree and may pursue these in future studies. This would be particularly interesting given ambiguities in how icilin docks into TRPM8 in previously published structures.

      Minor comments/questions:

      Have the authors considered icilin accessibility to its binding pocket? In other words, could the presence of intracellular Ca2+ inhibit the accessibility of icilin to its binding pocket in TRPM4? It should be a straightforward experiment, I think it would be informative, and could further support the authors' conclusion of the location of the TRPM4 icilin binding pocket.

      We completely agree and we had tried to look for evidence that Ca2+ might trap icilin in TRPM4 but at the end of the day the results were not convincing enough to include in the manuscript. We have added data showing that icilin slows deactivation of TRPM4 after removing Ca2+, which is particularly evident in the A867G mutant, but this doesn’t inform on whether Ca2+ can trap icilin. We have added a statement about not knowing how icilin enters or leaves the cooling agent binding pocket in TRPM channels.

      Figures 7 and 8 are missing the 0 µM Ca2+ control trace in the presence of 25 µM icilin.

      All sample traces from Figures 7 and 8 are shown from a single cell for the sake of comparison (Likewise, the sample traces from Figures 3 and 4 come from a single cell, and the sample traces from Figures 5 and 6 come from a single cell). Unfortunately, we were unable to obtain data from an R901H mutant cell that contained all six conditions we wished to show, and there is no representative trace for 0 µM Ca2+ in the presence of 25 µM icilin for that cell.

      This is up to the discretion of the authors, but perhaps a better way to arrange the paper Figures would be to combine Figures 5-6 and Figures 7-8 and rearrange the data to place some in a supplementary figure (e.g. Figure 5-6 = Figure 5 and Figure 5 - Figure Supplement 1, Figure 7-8 = Figure 6 and Figure 6 - Figure Supplement 1).

      We carefully considered these suggestions and we appreciate the reviewers’ flexibility but would prefer to retain the original arrangement of data in the figures.

      Are there any mutations in the icilin binding pocket in TRPM4, and presumably TRPM2 and TRPM5, that are associated with human disease? This is a question that came to my mind and not one that needs to be addressed in the manuscript.

      This is an interesting point. There are quite a few disease-associated mutants within TRPM4 at positions corresponding to the cooling agent binding pocket in TRPM8. We could not see an appropriate place in the discussion where we could concisely bring this information in so we decided against commenting.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors want to determine the role of the sperm hook of the house mouse sperm in movement through the uterus. They use transgenic lines with fluorescent labels to sperm proteins, and they cross these males to C57BL/6 females in pathogen-free conditions. They use 2-photon microscopy on ex vivo uteri within 3 hours of mating and the appearance of a copulation plug. There are a total of 10 post-mating uteri that were imaged with 3 different males. They provide 10 supplementary movies that form the basis for some of the quantitative analysis in the main body figures. Their data suggest that the role of the sperm hook is to facilitate movement along the uterine wall.

      Strengths:

      Ex vivo live imaging of fluorescently labeled sperm with 2-photon microscopy is a powerful tool for studying the behavior of sperm.

      Weaknesses:

      The paper is descriptive and the data are correlations.

      The authors cannot directly test their proposed function of the sperm hook in sliding and preventing backward slipping.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors clearly state and explain in the manuscript that this study is limited with respect to the ability to "directly test the role of the sperm hook in facilitating movement along the uterine wall". I think that if they make this statement in the manuscript, perhaps at the end of the abstract, then the strength of evidence for their claims could be deemed as solid after re-review.

      We thank the reviewer again for the review process. We believe that our manuscript has improved considerably during the review process. Regarding the limitations and future work, we have added the following to the discussion section.

      “Further investigation of sperm behaviour inside the female reproductive tract or tissue mimicking microfluidic devices with real-time deep tissue imaging as in the current study, will provide valuable opportunities for a more comprehensive examination of both sperm-sperm and sperm-epithelium interactions in the female reproductive tract. While we have focused on observing sperm interactions for only natural healthy mice in this study, future works employing specifically targeted genetically modified knockout animal models will further elucidate and confirm the exact genetic and functional mechanisms that guide these interactions.”

      The revised manuscript is an improvement over the initial submission. I suggest that the authors mark the oviduct explicitly in Fig. 1A.

      The oviduct includes the ampulla, isthmus, and UTJ. We have additionally marked the oviduct in Fig. 1A, with corresponding arrows and a box.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a well-written and detailed manuscript showing important results on the molecular profile of 4 different cohorts of female patients with lung cancer.

      The authors conducted comprehensive multi-omic profiling of air-pollution-associated LUAD to study the roles of the air pollutant BaP. Utilizing multi-omic clustering and mutation-informed interface analysis, potential novel therapeutic strategies were identified.

      Strengths:

      The authors used several different methods to identify potential novel targets for therapeutic interventions.

      Weaknesses:

      Statistical test results need to be provided in comparisons between cohorts.

      We appreciate your recognition and valuable suggestions. We have revised the statistical test results in the following panels: Fig. 3b, e and g.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. performed a proteogenomic analysis of lung adenocarcinoma (LUAD) in 169 female never-smokers from the Xuanwei area (XWLC) in China. These analyses reveal that XWLC is a distinct subtype of LUAD and that BaP is a major risk factor associated with EGFR G719X mutations found in the XWLC cohort. Four subtypes of XWLC were classified with unique features based on multi-omics data clustering.

      Strengths:

      The authors made great efforts in performing several large-scale proteogenomic analyses and characterizing molecular features of XWLCs. Datasets from this study will be a valuable resource to further explore the etiology and therapeutic strategies of air-pollution-associated lung cancers, particularly for XWLC.

      Weaknesses:

      (1) Regarding the analysis and interpretation of the datasets, however, this reviewer thinks that the authors should provide in the main text (i) more detailed procedures of data processing, (ii) justification for the choice of methods in the various analyses, and (iii) justification for focusing on a few target genes/proteins in the datasets for further validation.

      We appreciate your valuable feedback. In response to the suggestions for enhancing the manuscript's clarity, we have provided more detailed procedures in the main text and methods sections.

      (2) Importantly, while providing the large datasets, validating key findings is minimally performed, and surprisingly there is no interrogation of XWLC drug response/efficacy based on their findings, which makes this manuscript descriptive and incomplete rather than conclusive. For example, testing the efficacy of XWLC response to afatinib combined with other drugs targeting activated kinases in EGFR G719X mutated XWLC tumors would be one way to validate their datasets and new therapeutic options.

      We appreciate your suggestion. In reference to testing the efficacy of XWLC response to afatinib combined with drugs targeting kinases, we have planned to establish PDX and organoid models to validate the effectiveness of our therapeutic approach. Due to the extended timeframe required, we intend to present these results in a subsequent study.

      (3) The authors found that MAD1 and TPRN are novel therapeutic targets in XWLC. Are these two genes more frequently mutated in one subtype than in the other 3 XWLC subtypes? How could these mutations be targeted in patients?

      Thank you for your question. We have investigated the TPRN and MAD1 mutations in our dataset, identifying five TPRN mutations and eight MAD1 mutations. Among the TPRN mutations, XWLC_0046 and XWLC_0017 belong to the MCII subtype, XWLC_0012 belongs to the MCI subtype, and the subtype of the other three samples is undetermined, resulting in mutation frequencies of 1/16, 2/24, 0/15, and 0/13, respectively. Similarly, for the MAD1 mutations, XWLC_0115, XWLC_0021, and XWLC_0047 belong to the MCII subtype, XWLC_0055 containing two mutations belongs to the MCI subtype, and the subtype of the other three samples is undetermined, resulting in mutation frequencies of 1/16, 3/24, 0/15, and 0/13 across subtypes, respectively. Fisher’s test did not reveal significant differences between the subtypes.
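
      As a minimal sketch of the kind of comparison described above, the code below computes a two-sided Fisher's exact test for a single 2x2 table using only the standard library. It is an illustration under stated assumptions, not the authors' analysis: the response does not say how the test across the four subtypes was run, the function name is hypothetical, and the example simply compares the first two reported TPRN frequencies (1/16 and 2/24) with each other.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "mutated" samples in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# TPRN, first two reported frequencies: 1 mutated of 16 versus 2 mutated of 24.
p_tprn = fisher_exact_2x2(1, 15, 2, 22)
```

      With counts this small, the observed table is the most probable one under the null hypothesis, so the pairwise p-value is far from any conventional significance threshold, which is in line with the reported lack of significant differences.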

      For targeting novel therapeutic targets such as MAD1 and TPRN, we propose a multi-step approach. Firstly, we advocate for conducting functional in vivo and in vitro experiments to verify their roles during cancer progression. Secondly, we suggest conducting small molecule drug screening based on the pharmacophore of these proteins, which may lead to the identification of potential therapeutic drugs. Lastly, we recommend testing the efficacy of these drugs to further validate their potential as effective treatments.

      (4) In Figures 2a and b: while Figure 2a shows distinct genomic mutations among each LC cohort, Figure 2b shows similarity in affected oncogenic pathways (cell cycle, Hippo, NOTCH, PI3K, RTK-RAS, and WNT) between XWLC and TNLC/CNLC. Considering that different genomic mutations could converge into common pathways and biological processes, wouldn't these results indicate commonalities among XWLC, TNLC, and CNLC? How about other oncogenic pathways not shown in Figure 2b?

      Thank you for your question. Based on the data presented in Fig. 2a, which encompasses all genomic mutations, it appears that the mutation landscape of XWLC bears the closest resemblance to TSLC (Fig. 2a). However, when considering oncogenic pathways (Fig. 2b) and genes (Fig. 2c), there is a notable disparity between the two cohorts. These findings suggest that while XWLC and TSLC exhibit similarities in terms of genomic mutations, they possess distinct characteristics in terms of oncogenic pathways and genes.

      Regarding the oncogenic signaling pathways, we referred to ten well-established pathways identified from TCGA cohorts. Members of these pathways are likely to serve as cancer drivers (functional contributors) or therapeutic targets, as highlighted by Sanchez-Vega et al. (2018).

      (5) In Figure 2c, how and why were the four genes (EGFR, TP53, RBM10, KRAS) selected? What about other genes? In this regard, given tumor genome sequencing was done, it would be more informative to provide the oncoprints of XWLC, TSLC, TNLC, and CNLC for complete genomic alteration comparison.

      Thank you for your question and good suggestion. Building upon our previous study (Zhang et al., 2021), we found that EGFR, TP53, RBM10, and KRAS were the top mutated genes in Xuanwei lung cancer cohorts. Furthermore, we have included the mutation frequency of cancer driver genes (Bailey et al., 2018) across XWLC, TSLC, TNLC, and CNLC in Supplementary Table 2b.

      (6) Supplementary Table 11 shows the number of mutations at the interface and the length of the interface for a given protein-protein interaction pair. As such, it does not indicate which mutation(s) in a given PPI interface are found in each LC cohort. For example, it fails to show whether the MAD1 R558H and TPRN H550Q mutations are found significantly in each LC cohort.

      We appreciate your careful review. In Supplementary Table 11, we have provided significant onco_PPI data for each LC cohort, focusing on enriched mutations at the interface of two proteins. Our emphasis lies on onco_PPI rather than individual mutations, as any mutation occurring at the interface could potentially influence the function of the protein complex. Thus, our Supplementary Table 11 exclusively displays the onco_PPI rather than mutations. MAD1 R558H and TPRN H550Q were identified through onco_PPI analysis, and subsequent extensive literature research led us to focus specifically on these mutations.

      (7) Figure 7c and d are simulation data not from an actual binding assay. The authors should perform a biochemical binding assay with proteins or show that the mutation significantly alters the interaction to support the conclusion.

      We appreciate your suggestion. The relevant experiments are currently in progress, and we anticipate presenting the corresponding data in a subsequent study.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript from Zhang et al. utilizes a multi-omics approach to analyze lung adenocarcinoma cases in female never smokers from the Xuanwei area (XWLC cohort) compared with cases associated with smoking or other endogenous factors to identify mutational signatures and proteome changes in lung cancers associated with air pollution. Mutational signature analysis revealed a mutation hotspot, EGFR-G719X, potentially associated with BaP exposure, in 20% of the XWLC cohort. This correlated with predicted MAPK pathway activations and worse outcomes relative to other EGFR mutations. Multi-omics clustering, including RNA-seq, proteomics, and phosphoproteomics identified 4 clusters with the XWLC cohort, with additional feature analysis pathway activation, genetic differences, and radiomic features to investigate clinical diagnostic and therapeutic strategy potential for each subgroup. The study, which nicely combines multi-modal omics, presents potentially important findings, that could inform clinicians with enhanced diagnosis and therapeutic strategies for more personalized or targeted treatments in lung adenocarcinoma associated with air pollution. The authors successfully identify four distinct clusters with the XWLC cohort, with distinct diagnostic characteristics and potential targets. However, many validating experiments must be performed, and data supporting BaP exposure linkage to XWLC subtypes is suggestive but incomplete to conclusively support this claim. Thus, while the manuscript presents important findings with the potential for significant clinical impact, the data presented are incomplete in supporting some of the claims and would benefit from validation experiments.

      Strengths:

      Integration of omics data from multimodalities is a tremendous strength of the manuscript, allowing for cross-modal comparison/validation of results, functional pathway analysis, and a wealth of data to identify clinically relevant case clusters at the transcriptomic, translational, and post-translational levels. The inclusion of phosphoproteomics is an additional strength, as many pathways are functional and therefore biologically relevant actions center around activation of proteins and effectors via kinase and phosphatase activity without necessarily altering the expression of the genes or proteins.

      Clustering analysis provides clinically relevant information with strong therapeutic potential both from a diagnostic and treatment perspective. This is bolstered by the individual microbiota, radiographic, wound healing, outcomes, and other functional analyses to further characterize these distinct subtypes.

      Visually the figures are well-designed and presented and for the most part easy to follow. Summary figures/histograms of proteogenomic data, and specifically highlighted genes/proteins are well presented.

      Molecular dynamics simulations and 3D binding analysis are nice additions.

      While I don't necessarily agree with the authors' interpretation of the microbiota data, the experiment and results are very interesting, and clustering information can be gleaned from this data.

      Weaknesses:

      (1) Statistical methods for assessing significance may not always be appropriate.

      We appreciate your suggestion. We have revised the statistical test results in several panels, including Fig. 3b, 3e, and 3g.

      (2) Necessary validating experiments are lacking for some of the major conclusions of the paper.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      (3) Many of the conclusions are based on correlative or suggestive results, and the data is not always substantive to support them.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      (4) Experimental design is not always appropriate, sometimes lacking necessary controls or large disparity in sample sizes.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      (5) Conclusions are sometimes overstated without validating measures, such as in BaP exposure association with the identified hotspot, kinase activation analysis, or the EMT function.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      Reviewer #1 (Recommendations For The Authors):

      (1) Please provide a justification for why only females were included in the study. I am concerned that the results obtained in this study can not be generalized as only females were included.

      We appreciate your suggestion. Lung cancer in never smokers (LCINS) accounts for approximately 25% of lung cancer cases (15% of lung cancer in men and 53% in women) (Parkin et al., 2005). Currently, the etiology and mechanisms of LCINS are not clear. Globally, LCINS shows remarkable gender and geographic variations, occurring more frequently among Asian women (Bray et al., 2018). Indoor coal burning for heating and cooking has been implicated as a risk factor for Chinese women, as they spend more time indoors (Mumford et al., 1987). Among men, the proportion of never smokers is lower, with less regional variation, and lung cancer in males is frequently caused by smoking. Thus, to better reveal the etiology and molecular mechanisms of LCINS, we collected data exclusively from female LCINS patients in the Xuanwei area, excluding potential confounding factors such as hormonal or smoking status. Our study specifically aims to uncover the etiology and mechanisms of LCINS in female patients, with future research planned to verify whether our conclusions can be generalized to LCINS in male patients.

      (2) "Therefore, the XWLC and TSLC cohorts are more explicitly influenced by environmental carcinogens, while the TNLC and CNLC cohorts may be more affected by age or endogenous risk factors." This statement in the results (starting line 142) does not have adequate support from the results. First, the average age in the 4 cohorts does not seem to be very different to me based on Figure 1b. If they are different, please provide statistical test results. Please make sure this statement is supported by other results; otherwise, I would recommend excluding it from the manuscript.

      We appreciate your suggestion. To gain biological insights, we frequently associate mutational signatures with factors such as age, defective DNA mismatch repair, or environmental exposures. These remain associations rather than causation. Thus, we agree with the suggestion to weaken the conclusion as follows:

      “Generally, exposure to tobacco smoking carcinogens (COSMIC signature 4) and chemicals such as BaP (Kucab signatures 49 and 20) were identified as the most significant contributing factors in both the XWLC and TSLC cohorts (Fig. 1f and 1g). In contrast, defective DNA mismatch repair (COSMIC signature ID: SBS6) was identified as the major contributor in both the TNLC and CNLC cohorts (Fig. 1h and 1i), with no potential chemicals identified based on signature similarities. Therefore, the XWLC and TSLC cohorts appear to be more explicitly associated with environmental carcinogens, while the TNLC and CNLC cohorts may be more associated with defective DNA mismatch repair processes.”

      (3) Please provide statistical test results in this subsection "The EGFR-G719X mutation, which is a hotspot associated with BaP exposure, possesses distinctive biological features " (Line 203) showing that the number of G719X is significantly different in XWLC.

      We appreciate your suggestion. A two-sided Fisher’s exact test was used to calculate the p-values, which are labeled in Figure 3b.

      (4) "Analysis of overall survival and progression-free interval (PFI) revealed that patients with the G719X mutation had worse outcomes compared to other EGFR mutation subtypes " This statement (starting Line 232) should be supported by literature data.

      We appreciate your suggestion.

      In the Watanabe et al. post-hoc analysis, patients with the G719X mutation had significantly shorter OS with gefitinib compared to patients with the common mutations (Watanabe et al., 2014). We revised the sentence as follows:

      “Analysis of overall survival and progression-free interval (PFI) revealed that patients with the G719X mutation had worse outcomes compared to other EGFR mutation subtypes (Fig. 3j and 3k) which was consistent with a previous study(Watanabe et al., 2014).”

      (5) I would suggest changing this statement to a "suggestion" as there is no experimental support for this, and mentioning that this requires further experimental validation with the suggested drugs "Therefore, a promising approach to overcome resistance in tumors with this mutation could involve combining afatinib, which targets activated EGFR, with FDA-approved drugs that specifically target the activated kinases associated with G719X. " (Line 260).

      We appreciate your suggestion. We have changed the sentence as follows:

      "Therefore, we propose a potential approach to overcoming resistance in tumors with this mutation, which could involve combining afatinib, targeting activated EGFR, with FDA-approved drugs that specifically target the activated kinases associated with G719X. "

      (6) It is not clear to me how PPIs were integrated with missense. Please clarify the method.

      We appreciate your suggestion. To identify interactions enriched with missense mutations, we constructed mutation-associated protein–protein interactomes (PPIs). Initially, we downloaded protein-protein interactomes from Interactome INSIDER (v.2018.2) (Meyer et al., 2018). Subsequently, we identified interfaces carrying missense mutations by mapping mutation sites to PPI interface genomic coordinates using bedtools (v2.25.0)(Quinlan and Hall, 2010). Finally, we defined oncoPPI as those PPIs significantly enriched in interface mutations in either of the two protein-binding partners across individuals. For more details, please refer to the methods sections “Building mutation-associated protein–protein interactomes” and “Significance test of PPI interface mutations.”
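
      The mapping and enrichment steps described above can be sketched as follows. This is a hedged illustration only: the coordinates are invented, the helper names are hypothetical, and the one-sided binomial test is an assumption standing in for the procedure in the authors' methods section “Significance test of PPI interface mutations”, which is not restated in this response.

```python
from math import comb


def count_interface_hits(mutations, interfaces):
    """Count mutations that fall inside any interface interval.

    mutations:  iterable of (chrom, pos) with 0-based positions.
    interfaces: iterable of (chrom, start, end), half-open [start, end)
                as in BED files, the format bedtools operates on.
    """
    hits = 0
    for chrom, pos in mutations:
        if any(c == chrom and s <= pos < e for c, s, e in interfaces):
            hits += 1
    return hits


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: one 30-bp interface within 300 bp of coding sequence.
interfaces = [("chr1", 100, 130)]
mutations = [("chr1", 105), ("chr1", 120), ("chr1", 500), ("chr2", 110)]

hits = count_interface_hits(mutations, interfaces)
n_gene = sum(1 for chrom, _ in mutations if chrom == "chr1")  # mutations in this gene
expected_rate = 30 / 300  # interface length / coding length (assumed null model)
p_value = binomial_upper_tail(hits, n_gene, expected_rate)
print(hits, round(p_value, 3))
```

      In practice the overlap step would be done with bedtools on the full mutation and interface files; the linear scan here only makes the logic explicit.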

      Reviewer #2 (Recommendations For The Authors):

      Regarding the tumor microbiota composition, it is not clear what the significance of these results would be. Are the specific microbiota associated with MC-IV more pathogenic than other species found in other subtypes? What are the unique features of these MC-IV microbiota? If these are difficult to address, this section could be removed from the manuscript.

      We appreciate your suggestion. This section is removed from the manuscript.

      Regarding the radiomic data section (Figure 6d and Extended Figure 6d), more description about the eight and five features (that are different between MC-II and others) would be helpful to better understand the importance and significance of these data.

      We appreciate your suggestion. We have added the following description: “Features such as median and mean reflect average gray level intensity and Idmn and Gray Level Non-Uniformity measure the variability of gray-level intensity values in the image, with a higher value indicating greater heterogeneity in intensity values. These results suggest a denser and more heterogeneous image in the MC-II subtype.”

      Other minor comments:

      (1) If EGFR G719X is a known hotspot mutation associated with BaP, please cite previous literature.

      We appreciate your suggestion. Upon careful retrieval using "G719X" and "BaP" as keywords, we did not find previous literature discussing G719X as a known hotspot mutation associated with BaP.

      (2) In Figure 1d, it should be clearly written in the legend that tumor (T) and normal (N) tissue were analyzed.

      We appreciate your suggestion. We have clarified the figure legend of Figure 1d.

      (3) In Figure 1m, it is not obvious that EGFR pY1173 and pY1068 are more abundant in the Bap+S9 sample. Total EGFR bands are very faint. These western blots should be repeated and quantified.

      We appreciate your suggestion. We have removed Fig. 1m. After identifying the antibody with satisfactory performance, we will provide the revised results.

      (4) In Figure 2d, aren't the EGFR E746_A750del mutations more frequently found in CNLC, TSLC, and TNLC? (which is opposite to what the authors wrote in the text).

      We appreciate your suggestion. This mistake has been corrected.

      (5) In Figure 7f-i and Ext Figure 8, does "CK" mean empty vector control? If so, it should be changed to "EV".

      We appreciate your suggestion. This mistake has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      Methods:

      While previous work was referenced, a description of proteomics methods should still include: instrumentation, acquisition method, all software packages used, method for protein identification, method for protein quantification, how FDR was maintained for identification/quantification, definition of differentially expressed proteins, whether multiple testing correction was performed and if so what method.

      We appreciate your suggestion. We revised the description of label-free mass spectrometry methods accordingly.

      The paper would greatly benefit from brief methodological explanations throughout, as all methods are currently exclusively found in the supplementary information. This severely hampers the readability of the manuscript.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      Suggestions Throughout

      The paper would greatly benefit from proofreading/editing

      Line 157-158/Figure 1J for CYP1A1 displays protein concentrations while Figure 1K for AhR shows mRNA. Why this discrepancy? It would be preferable to show both mRNA and protein levels for both CYP1A1 and AhR. Also, there is a large discrepancy in the "n" between the normal and tumor groups, which makes the statistical comparison challenging. The AhR data is therefore unconvincing, and additional protein data is suggested. Thus the claim of significantly elevated AhR and CYP1A1 levels in tumors is not sufficiently supported and requires further investigation, both mRNA and protein, and with similarly sized sample groups.

      We appreciate your suggestion. We have thoroughly edited the revised manuscript, with all changes marked accordingly. Compared to mRNA level assessment, protein abundance is a better indicator of gene expression. Therefore, we reanalyzed the protein level of AhR for comparison and found no significant differences (Figure 1k). Additionally, the samples sequenced by mRNA-seq were not entirely consistent with those sequenced by label-free proteomics. The samples analyzed by different methods are shown in Figure 1d.

      Line 159 Figure 1I There is no control for the serum data presented here. What are the serum levels for individuals not residing in Xuanwei? It is unclear whether this represents elevated BPDE serum levels without appropriate controls. Thus nothing insightful can be derived from this data.

      We appreciate your suggestion. We have deleted the results concerning BPDE serum detection in the revised manuscript.

      Line 164 The statement "sites such as Y1173 and Y1068 of EGFR were more phosphorylated in BaP treated cells" is not sufficiently supported by the presented data and cannot be made. Figure 1M has no quantitation, no indication of "n" or whether this represents a single experiment or one validated with repeating. The western blot is also cropped with no indication of molecular weight or antibody specificity. This data is NOT convincing. The antibody signal is very weak, and not convincing with cropped blots. An updated figure, with an uncropped blot, and quantitation with multiple n's and statistical comparison is required. I am not sure the Wilcoxon rank sum test is appropriate to test significance in j-l. The null hypothesis should not be equal medians but equal means based on the experimental design.

      We appreciate your suggestion. We have removed Fig. 1m. After identifying the antibody with satisfactory performance, we will provide the revised results.

      Line 181 phrase "significant differences" should not be used unless making a claim about statistical significance.

      We appreciate your suggestion. We have changed “significant differences” to “noticeable differences”.

      Line 197: "The blood serum assay provided support..." As noted above this claim is not sufficiently supported by the presented data and requires more complete investigation.

      We appreciate your suggestion. This conclusion has been deleted in the revised manuscript.

      Line 219: Requires proofreading/editing.

      We appreciate your suggestion. We have thoroughly edited the revised manuscript, with all changes marked accordingly.

      Line 220: appears to have a typo and should read GGGC>GTGC

      We appreciate your suggestion. This mistake has been corrected in the revised manuscript.

      Line 223/224 Figure 3e-h. Again there is a large disparity between the n's of each group. Despite the WT having the highest frequency in the XWLC study population, it has only n=5 when comparing the protein and phosphosite for MAPKs. There is also no explanation for what the graph symbols indicate, what statistical test was performed to determine the statistical significance of the presented differences, and between which specific groups that significance exists. Thus, it is challenging to ascertain whether there are relevant differences in the MAPK signaling components.

      We appreciate your suggestion. We added the description of “N, number of tumor samples containing corresponding EGFR mutation” to the figure legend. p-values were calculated with a two-tailed Wilcoxon rank sum test, and p<0.05 was labeled on Figures 3e-i.
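
      For readers unfamiliar with the test named above, a two-sided Wilcoxon rank sum test can be sketched as below. This is a simplified version under stated assumptions: it uses midranks for ties and a normal approximation without continuity correction, so for groups as small as n = 5 an exact test from a statistics package would be preferable. The function name and example values are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Return (U statistic of x, two-sided p-value).

    Uses midranks for tied values and a tie-corrected normal approximation.
    Both groups are assumed to be non-empty.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))

    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j

    w = sum(rank_of[v] for v in x)           # rank sum of group x
    u = w - n1 * (n1 + 1) / 2                # Mann-Whitney U for group x
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:                             # all observations identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var)
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical abundances for two EGFR genotype groups.
u_stat, p_value = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u_stat, p_value)
```

      The test compares the rank distributions of the two groups rather than their means, which is why it is often chosen for small, unbalanced, or non-normal samples.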

      Figure 3I Good figure. However, it would be beneficial to provide validation with Western Blotting for a few of these substrates using phospho-specific antibodies. It is suggested that this experiment be added.

      We appreciate your suggestion. Figure 3I shows the comparison of patients’ ages among subtypes, so we believe you are referring to Figures 3g and 3h. The relevant experiments are currently underway, and we will provide the corresponding data in the next revised version.

      Figure 4b. Very compelling figure.

      We appreciate your suggestion.

      Line 276: The AhR and CYP1A1 data presented earlier was not convincing, and CYP1A1 and AhR cannot be responsibly used as indicators of BaP activity based on potential. This is not an appropriate application.

      We appreciate your suggestion. CYP1A1 and AhR are two key regulators involved in BaP metabolism and signaling transduction, respectively. However, after examining the protein expression of AhR between tumor and normal tissues, we found no significant differences (Fig. 1k) and CYP1A1 has been proven to be highly expressed in tumor samples (Fig. 1j). Thus, we mainly examined the expression of CYP1A1 among the four subgroups. We changed our description as follows:

      “As CYP1A1 is a key regulator involved in BaP metabolism and has been proven to be highly expressed in tumor samples (Fig. 1j), we next examined the expression of CYP1A1 among the four subgroups to evaluate their associations with air pollution.”

      Figure 4d. Here it is AhR protein used rather than mRNA measured earlier. What is the explanation for this change?

      We appreciate your suggestion. As there were no significant differences in AhR protein expression between tumor and normal tissues (Fig. 1k), we deleted the comparison of AhR expression among subtypes.

      Line 281 "Moderately elevated expression level of AhR" is not supported by the presented data and should be removed.

      We appreciate your suggestion. We have deleted the comparison of AhR expression among subtypes.

      Figure 4: There is no indication or explanation of how the protein abundance is being measured. Is this from the proteomics (MS) approaches, by ELISA or by Western? If it is simply by MS then validation by another method is preferable. The data presented in Figure 4 do not adequately support the claim that MC-II subtype is more strongly associated with BaP exposure. What statistical test is used in 4F? Why is the n in the MC-II group, which is the highlighted group of interest nearly double the other groups?

      We appreciate your suggestion. Fig. 4e is derived from the proteomics data. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels c and e.

      Figure 4g: At least one or two of these should be validated by Western Blot or targeted MS.

      We appreciate your suggestion. The relevant experiments are currently underway, and we will provide the corresponding data in the next revised version.

      Figure 5a: Assuming these were also measured via proteomic analysis, how do their expression patterns compare across the different omics modes?

      Thank you for your suggestion. Figure 5 integrates transcriptomics (19182 genes), proteomics (9152 genes), and phosphoproteomics (5733 genes) data. In general, we utilized transcriptomics data to identify unique or distinct pathways among subgroups. Furthermore, proteomics and phosphoproteomics data were employed to validate key gene expressions, as they encompass fewer genes compared to transcriptomics data.

      For instance, in Fig. 5a-d, we observed higher expression levels of mesenchymal markers such as VIM, FN1, TWIST2, SNAI2, ZEB1, ZEB2, and others in the MC-IV subtype using transcriptomics data (Fig. 5a). Additionally, we calculated epithelial-mesenchymal transition (EMT) scores using the ssGSEA enrichment method based on protein levels and conducted GSEA analysis using transcriptomics data (Fig. 5b). Furthermore, using proteomics data, we evaluated Fibronectin (FN1), an EMT marker that promotes the dissociation, migration, and invasion of epithelial cells, at the protein level (Fig. 5c), and β-Catenin, a key regulator in initiating EMT, also at the protein level (Fig. 5d). Overall, our findings indicate that the MC-IV subtype exhibits an enhanced EMT capability, which may contribute to the high malignancy observed in this subtype.

      Line 314: Not compared with MCI, which appeared to be much lower at the mRNA level. Is there an explanation for this difference?

      We appreciate your suggestion. FN1 expression is lowest in MCI at the protein level (Fig. 5c). However, at the transcriptome level, FN1 expression is lowest in the MCIII subtype (Fig. 5a). You may wonder why these results are inconsistent. Discrepancies between mRNA and protein expression levels are common, and a previous study showed that about 20% of genes had a statistically significant correlation between protein and mRNA expression in lung adenocarcinomas (Chen et al., 2002). Post-transcriptional mechanisms, including protein translation, post-translational modification, and degradation, may influence the level of a protein present in a given cell or tissue. Given this, we focused on identifying distinct biological pathways in each subgroup, supported by multi-omics data.
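
      One way to quantify mRNA-protein concordance for a single gene across samples is a rank correlation. The sketch below assumes a Spearman correlation as the measure; the cited study's exact procedure is not restated here, and the sample values are invented for illustration.

```python
from math import sqrt


def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the midranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Hypothetical per-sample values for one gene (e.g. FN1).
mrna = [5.1, 7.3, 6.0, 8.2, 4.4]
protein = [0.9, 1.1, 1.4, 1.8, 0.7]
print(round(spearman(mrna, protein), 3))
```

      A coefficient near zero for a given gene would indicate that transcript abundance is a poor proxy for protein abundance in that cohort, which is the situation the response describes.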

      Line 321: MC-IV *potentially* possesses an enhanced EMT capability. This statement cannot be conclusively made.

      We appreciate your suggestion. We changed our description as: “Collectively, our findings demonstrate that the MC-IV subtype is associated with enhanced EMT capability, which may contribute to the high malignancy observed in this subtype.”

      Lines 325 and 327 indicated dysregulation of cell cycle processes and activation of CDK1 and CDK2 pathways based on KSEA analysis which is closely linked to cell cycle regulation as two separate pieces of evidence. However, these are both drawn from the phosphoproteomics, and likely indicate conclusions drawn from the same phosphosite data. Said another way, if phosphosite data indicates differences in kinases linked to cell cycle regulation then you would also expect phosphosite data to indicate dysregulation of cell cycle.

      We appreciate your suggestion. You mentioned that Fig. 4f and Fig. 5e redundantly prove that the CDK1 and CDK2 pathways are dysregulated. However, KSEA analysis in Fig. 4f estimates changes in kinase activity based on the collective phosphorylation changes of its identified substrates (Wiredja et al., 2017). In contrast, Fig. 5e directly evaluates the abundance of protein and phosphosite levels of CDK1 and CDK2 across subtypes. These analyses mutually confirm each other rather than being redundant.
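
      The distinction drawn above can be made concrete with the z-score that KSEA-style methods compute for each kinase: the mean log2 fold change of the kinase's known substrate sites is compared with the mean over all quantified sites and scaled by the number of substrates. The sketch below is an illustration under stated assumptions, not the KSEA App's code; the function name and input values are hypothetical, and the sample standard deviation is assumed for the denominator.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def ksea_z_score(all_log2fc, substrate_log2fc):
    """Kinase score z = (mean_substrates - mean_all) * sqrt(m) / sd_all.

    all_log2fc:       log2 fold changes of every quantified phosphosite.
    substrate_log2fc: log2 fold changes of the kinase's known substrate sites.
    """
    m = len(substrate_log2fc)
    return (mean(substrate_log2fc) - mean(all_log2fc)) * sqrt(m) / stdev(all_log2fc)


def one_sided_p(z):
    """Upper-tail p-value of a standard normal for a positive z."""
    return 0.5 * erfc(z / sqrt(2))


# Hypothetical phosphosite fold changes (one subtype versus the others).
all_sites = [-1.0, 0.0, 1.0, 2.0, -2.0]
cdk1_substrates = [1.0, 2.0]  # subset annotated as CDK1 substrates (assumed)

z = ksea_z_score(all_sites, cdk1_substrates)
print(round(z, 3), round(one_sided_p(z), 3))
```

      Because the score depends only on substrate phosphorylation, it is computed from different measurements than the abundance of the CDK1 and CDK2 proteins and their own phosphosites, which is the basis for treating the two analyses as complementary.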

      Line 413/Figure 6b: While there may be a trend displayed by the figure, it is not convincing enough to state that MC-IV shows a conclusively distinguishable bacterial composition. Too much variability exists within groups MC-II and MC-III. However, it does show that MC-IV and MC-II have consistent composition within their groups, and that is interesting.

      We appreciate your suggestion. We have deleted the analysis of bacterial composition across subtypes.

      Figure 6: Overall very nice figure, with intriguing diagnostic potential. See the above note on 6a-b interpretation.

      We appreciate your suggestion. We have deleted the analysis of bacterial composition across subtypes, including Fig. 6a-6c.

      Figure 7c-f better labeling of the panels will aid reader comprehension.

      We appreciate your suggestion. Necessary labeling has been added to Fig. 7c-f to enhance comprehension.

      Figure 7 panel order is confusing, switching from right to left to vertical. Rearranging to either left to right or vertical would help orient readers.

      We appreciate your suggestion. We have adjusted the order of Fig. 7 and extended Fig. 8 panel.

      Figure 7 legend i: should read Cell colony* assay

      We appreciate your suggestion. We have corrected this mistake in the revised manuscript.

      The Discussion is very brief. While it includes a discussion of the potential impact of the study, it does not include an analysis of the caveats/drawbacks of the study. A more thorough discussion of other studies focusing on the impacts of BaP exposure is also suggested as this was a highlighted point by the authors.

      We appreciate your suggestion. We have added a discussion of the associations between BaP exposure and lung cancer, and of the shortcomings of our study, as follows:

      “Mechanistically, Wang et al. showed that BaP induces lung carcinogenesis, characterized by increased inflammatory cytokines and cell proliferative markers, and by decreased antioxidant levels and apoptotic protein expression (Wang et al., 2020). In our study, we used clinical samples and linked the mutational signatures of XWLC to the chemical compound BaP, which advanced the understanding of the etiology and mechanism of air-pollution-induced lung cancer. Several limitations of our study must be acknowledged. Firstly, although our multi-omics approach provided a comprehensive analysis of the subtypes and their unique biological pathways, the sample size for each subtype was relatively small. This limitation may affect the robustness of the clustering results and the identified subtype-specific pathways. Larger cohort studies are necessary to confirm these findings and refine the subtype classifications. Secondly, although our study advanced the understanding of air-pollution-induced lung cancer by using clinical samples, the reliance on epidemiological data in previous studies introduces potential confounding factors. Our findings should be interpreted with caution, and further mechanistic studies are warranted to establish causal relationships more definitively. Thirdly, our in silico analysis suggested a potential approach to overcoming drug resistance in tumors with G719X mutations. However, these predictions need to be validated through extensive in vitro and in vivo experiments. The reliance on computational models without experimental confirmation may limit the clinical applicability of these findings.”

      References:

      Bailey, M. H., Tokheim, C., Porta-Pardo, E., Sengupta, S., Bertrand, D., Weerasinghe, A., Colaprico, A., Wendl, M. C., Kim, J., Reardon, B., et al. (2018). Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371-385 e318.

      Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., and Jemal, A. (2018). Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68, 394-424.

      Chen, G., Gharib, T. G., Huang, C. C., Taylor, J. M., Misek, D. E., Kardia, S. L., Giordano, T. J., Iannettoni, M. D., Orringer, M. B., Hanash, S. M., and Beer, D. G. (2002). Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 1, 304-313.

      Meyer, M. J., Beltran, J. F., Liang, S., Fragoza, R., Rumack, A., Liang, J., Wei, X., and Yu, H. (2018). Interactome INSIDER: a structural interactome browser for genomic studies. Nat Methods 15, 107-114.

      Mumford, J. L., He, X. Z., Chapman, R. S., Cao, S. R., Harris, D. B., Li, X. M., Xian, Y. L., Jiang, W. Z., Xu, C. W., Chuang, J. C., et al. (1987). Lung cancer and indoor air pollution in Xuan Wei, China. Science 235, 217-220.

      Parkin, D. M., Bray, F., Ferlay, J., and Pisani, P. (2005). Global cancer statistics, 2002. CA Cancer J Clin 55, 74-108.

      Quinlan, A. R., and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.

      Sanchez-Vega, F., Mina, M., Armenia, J., Chatila, W. K., Luna, A., La, K. C., Dimitriadoy, S., Liu, D. L., Kantheti, H. S., Saghafinia, S., et al. (2018). Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 173, 321-337 e310.

      Wang, Q., Zhang, L., Huang, M., Zheng, Y., and Zheng, K. (2020). Immunomodulatory Effect of Eriocitrin in Experimental Animals with Benzo(a)Pyrene-induced Lung Carcinogenesis. J Environ Pathol Toxicol Oncol 39, 137-147.

      Watanabe, S., Minegishi, Y., Yoshizawa, H., Maemondo, M., Inoue, A., Sugawara, S., Isobe, H., Harada, M., Ishii, Y., Gemma, A., et al. (2014). Effectiveness of gefitinib against non-small-cell lung cancer with the uncommon EGFR mutations G719X and L861Q. J Thorac Oncol 9, 189-194.

      Wiredja, D. D., Koyuturk, M., and Chance, M. R. (2017). The KSEA App: a web-based tool for kinase activity inference from quantitative phosphoproteomics. Bioinformatics 33, 3489-3491.

      Zhang, H., Liu, C., Li, L., Feng, X., Wang, Q., Li, J., Xu, S., Wang, S., Yang, Q., Shen, Z., et al. (2021). Genomic evidence of lung carcinogenesis associated with coal smoke in Xuanwei area, China. Natl Sci Rev 8, nwab152.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reports:

      In the public reports there is only one point we would like to discuss. It concerns our use of a computational model to analyse spatial tumour growth. Citing from the eLife assessment, which reflects several comments of the referees:

      The paper uses published data and a proposed cell-based model to understand how growth and death mechanisms lead to the observed data. This work provides an important insight into the early stages of tumour development. From the work provided here, the results are solid, showing a thorough analysis. However, the work has not fully specified the model, which can lead to some questions around the model’s suitability.

      The observables we use to determine the (i) growth mode and the (ii) dispersion of cells are model-independent. The method to determine the (iii) rate of cell death does not use a spatial model. Throughout, our computational model of spatial growth is not used to analyze data. Instead, it is used to check that the observables we use can actually discriminate between different growth modes given the limitations of the data. We have expanded the description of the computational model in the revised version, and have released our code on GitHub. However, the conclusions we reach do not rely on a computational model. Instead, where we estimate parameters, we use population dynamics as described in section S5. The other observables are parameter-free and model-independent. We view this as a strength of our approach.

      Recommendations for the authors:

      Reviewer #1:

      (1.1) In Figure 1, the data presented by Ling et al. demonstrate a distinctive “comb” pattern. While this pattern diverges from the conventional observations associated with simulated surface growth, it also differs from the simulated volume growth pattern. Is this discrepancy attributable to insufficient data? Alternatively, could the emergence of such a comb-like structure be feasible in scenarios featuring multiple growth centers, wherein clones congregate into spatial clusters?

      We are unsure what you are referring to. One possibility is that you refer to the honey-comb structure formed by the samples of the Ling et al. data shown in Fig. 1A of the main text. This is an artefact arising from the division of the histological cut into four quadrants, see Fig. S1 in the SI of Ling et al. The perceived horizontal and vertical “white lines” in our Fig. 1A stem from the lack of samples near the edges of these quadrants. We have added this information to the figure caption.

      An alternative is that you are referring to the peaks in Fig. 2A of the main text. The three of these peaks indeed stem from individual clones. We have placed additional figures in the SI (S2 B and S2 C) to disentangle the contribution from different clones. The peaks have a simple explanation: each clone contributes the same weight to the histogram. If a clone has only a few offspring, this statistical weight is concentrated on only a few angles, see SI Figure S2 B.

      (1.2) I am not sure why there are two sections about “Methods” in the main text: Line 50 as well as Line 293. Furthermore, the methods outlined in the main paper lack the essential details necessary for readers to navigate through the critical aspects of their analysis. While these details are provided in the Supplementary Information, they are not adequately referenced within the methods section of the main text. I would recommend that the authors revise the method sections of the main text to include pertinent descriptions of key concepts or innovations, while also directing readers to the corresponding supplementary method section for further elucidation.

      We have merged the Section “Materials and Methods” at the end of the main text with the SI description of the data in SI 4.2 and placed a reference to this material in the main body.

      (1.3) The impact of the particular push method (proposed in the model) on the resultant spatial arrangement of clones remains unclear. For instance, it’s conceivable that employing a different pushing method (for example, with more strict constraints on direction) could yield a varied pattern of spatial diversity. Furthermore, there is ambiguity regarding the criteria for determining the sequence of the queue housing overlapping cells.

      Regarding the off-lattice dynamics we use, there are indeed many variants one could use. In non-exhaustive trials, we found that the details of the off-lattice dynamics did not affect the results. The reason may be that at each computational step, each cell only moves a very small amount, and differences in the dynamics tend to average out over time.

      We deliberately do not give constraints on the direction. Such constraints emerge in lattice-based models (when preferred directions arise from the lattice symmetry), but these are artifacts of the lattice.

      At cell division the offspring is placed in a random direction next to the parent regardless of whether this introduces an overlap. Cells then push each other along the axis connecting their two centers of mass – unlike in lattice-based models, a sequence of pushes does not propagate through the tumor straight away but sets off a cascade of pushes. Equal pushing of two cells (i.e. two initial displacements as opposed to pushing one of the two) results in the same patterns of directed, low dispersion surface and undirected, high dispersion volume growth but is much harder computationally as it reintroduces overlaps that have been resolved in the previous step.

      We have rewritten the description of the pushing queue in the SI Section 1. The choice of the pushing sequence is somewhat arbitrary but we found that it also has no noticeable effect on the growth mode. Maybe putting it in contrast to depth-first approaches helps to illustrate this: we tried two queueing schemes for iterating through overlapping cells, width-first and depth-first. In both cases, we begin by scanning a given cell’s (the root’s) neighborhood for overlaps and shuffle the list of overlapping neighbours. In a width-first approach we then add this list to the queue. Subsequent iterations append their lists of overlapping cells to the queue, such that we always resolve overlaps within the neighborhood of the root first. A depth-first approach follows a sequence of pushes by immediately checking a pushed cell’s neighborhood for new overlaps and adding these to the front of the queue (which works more like a stack then). This can be efficiently implemented by recursion but has no noticeable performance advantage and results in the same patterns of directed, low dispersion surface and undirected, high dispersion volume growth. In our opinion the width-first approach of first resolving overlaps in the immediate neighborhood is more intuitive, which is why we adopted it for our simulation model.
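
      The width-first (breadth-first) scheme described above can be sketched as follows. This is a minimal illustration and not the released simulation code: the two-dimensional geometry, the function name `resolve_overlaps`, the linear scan over all cells, the rule that the overlapping neighbour (rather than the cell being processed) is displaced, and the `max_pushes` safeguard are all assumptions made for the example.

```python
import math
import random
from collections import deque


def resolve_overlaps(cells, root, radius=0.5, seed=0, max_pushes=100_000):
    """Resolve overlaps breadth-first, starting from the cell with index `root`.

    cells: list of [x, y] cell centres, modified in place.
    Returns the number of pushes performed.
    """
    rng = random.Random(seed)
    min_dist = 2.0 * radius
    tol = 1e-12  # cells in exact contact are not counted as overlapping
    queue = deque([root])
    pushes = 0
    while queue and pushes < max_pushes:
        i = queue.popleft()
        xi, yi = cells[i]
        # Scan the neighbourhood of cell i for overlaps and shuffle the list.
        overlapping = [
            j for j, (xj, yj) in enumerate(cells)
            if j != i and math.hypot(xj - xi, yj - yi) < min_dist - tol
        ]
        rng.shuffle(overlapping)
        for j in overlapping:
            dx, dy = cells[j][0] - xi, cells[j][1] - yi
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                # Coincident centres: pick a random push direction.
                angle = rng.uniform(0.0, 2.0 * math.pi)
                dx, dy, dist = math.cos(angle), math.sin(angle), 1.0
            # Push cell j along the axis connecting the two centres until contact.
            cells[j] = [xi + dx / dist * min_dist, yi + dy / dist * min_dist]
            queue.append(j)  # breadth-first: newly pushed cells go to the back
            pushes += 1
    return pushes
```

      In this sketch, a depth-first order would correspond to placing newly pushed cells at the front of the queue (`queue.appendleft(j)`) instead of the back.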

      (1.4) For the example presented in S5.1, how can the author identify from genomic data that mutation 3 does not replace its ancestral clade mutation 2? In other words, if mutations 2, 3 and 4 are linked, meaning clone 4 survives but 2 and 3 die, how does one know if clone 3 dies before clone 2? I understand that this is a conceptual example, but if one cannot identify this situation from the real data, how can the clade turnover be computed?

      Thank you for this comment, which points to an error of ours in the turnover example of the SI: Clade 3 does in fact replace 2 and contributes to the turnover! (The algorithm correctly annotated clade 3 as orphaned and computes a turnover of 3/15 for this example). We have corrected this.

      In this example, it does not matter for the clade turnover whether clone 3 dies before clone 2. As long as its ancestor (clone 2) becomes extinct it adds to the clade turnover. The term “replaces” applies to the clade of 3 which has a surviving subclone and thereby eventually replaces clade 2. The clade turnover is solely based on the presence of the mutations (which define their clade) and not on the individual clones.

      (1.5) After reviewing reference 24 (Li et al.), I noticed that the assertions made therein contradict the findings presented in S3 (Mutation Density on Rings). Specifically, Li et al. state that “peripheral regions not only accumulated more mutations, but also contained more changes in genes related to cell proliferation and cell cycle function” (Page 6) and “Phylogenetic trees show that branch lengths vary greatly with the long-branched subclones tending to occur in peripheral regions” (Page 4). However, upon re-analysis of their data, the authors demonstrated a decrease in mutation density near the surface. It is crucial to comprehend the underlying cause of such a disparity.

      The reason for this disparity is the way Li et al. labelled samples as belonging to peripheral or central regions of the tumour. We have added a new figure in the SI to show this: Fig. S14 shows the number of mutations found in samples of Li et al. against their distances from the centre, along with the classification of samples as center/periphery given in Li et al. In the case of tumor T1, the classification of a sample in reference Li et al. does not agree with the distance from the center: samples classified as core are often more distant from the center than those classified as peripheral. Furthermore, Lewinsohn et al. (see below) show in their Fig. 5 that samples classified as ‘center’ by Li all fall into a single clade, and we believe this affects all results derived from this classification. For this reason, we do not consider the classification in reference 24 (Li et al.) further. We now briefly discuss this in Section S3.3.

      (1.6) The authors consider coinciding mutations to occur when offspring clades align with an ancestral clade. Nevertheless, since multiple mutations can arise simultaneously in a single generation (such as kataegis), it becomes essential to discern its impact on clade turnover and, consequently, the estimation of d/b.

      The mutational signatures found here show no sign of kataegis. Also, the number of polymorphic sites in the whole-exome data is small and the mutations are uniformly spread across the exome. The point is well taken, however: the method requires single mutations per generation. In practice, this can be achieved by subsampling a random part of the genome or exome (see [45]). We tested this point by processing the data from only a fraction of the exome; this did not change the results. In particular, Figure S30 shows the turnover-based inference for different subsampling rates L of the Ling et al. data. Subsampling of sites reduces the exome-wide mutation rate; the inferred rate scales linearly with L, as expected.
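
      As an illustration of the subsampling step, the sketch below keeps only those mutations that fall within a randomly chosen fraction L of sites. It is not the pipeline behind Figure S30 or the method of [45]; the function name `subsample_sites`, the integer site coordinates and the fixed seed are assumptions made for the example.

```python
import random


def subsample_sites(mutation_sites, genome_length, fraction, seed=0):
    """Keep only the mutations located in a random subset of sites.

    mutation_sites: iterable of integer positions in [0, genome_length).
    fraction: subsampling rate L, the share of sites that is retained.
    """
    rng = random.Random(seed)
    n_keep = round(fraction * genome_length)
    # Draw the retained sites uniformly at random, without replacement.
    kept_sites = set(rng.sample(range(genome_length), n_keep))
    return [site for site in mutation_sites if site in kept_sites]
```

      Because each site is retained with probability L, the expected number of retained mutations is L times the original count, which is the linear scaling referred to above.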

      (1.7) I could not understand Step 2 in Section S2.1, an illustration may be helpful.

      We have added figure S2 explaining the directional angle algorithm to Section S2.1 in the supplementary information.

      (1.8) Figure S2, does a large rhoc lead to volume growth rather than surface growth, not the other way around?

      Thank you for catching this mix-up!

      Reviewer #2

      I do have a few minor comments/questions, but I am confident the authors will be able to address them appropriately.

      (2.1) Line 56: I am not sure what the units of “average read depth 74X” are in terms of SI units?

      This number gives the number of sequence reads covering a particular nucleotide and is dimensionless. We have added this information.
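
      A short worked example may make the definition concrete. The sketch below counts, for each position of a region, how many reads cover it and averages the counts. The function name `average_read_depth` and the representation of reads as half-open `(start, end)` intervals are assumptions made for the illustration, not part of the sequencing pipeline.

```python
def average_read_depth(reads, region_length):
    """Mean number of reads covering each position of a region.

    reads: iterable of (start, end) half-open intervals, 0-based.
    """
    depth = [0] * region_length
    for start, end in reads:
        # Clip each read to the region and count it at every covered position.
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return sum(depth) / region_length
```

      A region in which every nucleotide is covered by 74 reads has a depth of 74X under this definition; the value is a count and carries no physical unit.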

      (2.2) Lines 63 - 68: I am unsure what is meant by the terms “T1 of ca.” and “T2 of ca.”. Can these also be explained/defined please?

      These refer to the approximate (circa) diameters of tumor 1 and tumor 2 in the data by Li et al. We have expanded the abbreviations.

      (2.3) Line 69: I would like to see a more extensive description of the cell-based model here in the main text, such as how do the cells move. Moreover, do cells have a finite reach in space, do they have a volume/area?

      We have expanded the model description in the main body of the paper and placed information there that previously was only in the SI.

      (2.4) Line 76: You have said cells can “push” one another in your model. Do they also “pull” one another? Cell adhesion is known to contribute to tumour integrity - so this seems important for a model of this nature.

      We have not implemented adhesive forces between pairs of cells so far. This would cause a higher pressure under cell growth (which can have important physiological consequences). However, the hard potential enforcing a distance between adjacent cells would still lead to cells pushing each other apart under population growth, so we expect to see the dispersion effect we discuss even when there is adhesion.

      (2.5) Line 80-81: “due to lack of nutrient”. Is nutrient included in this work? It is my understanding it is not. No problem if so, it is just that this line makes it seem like it is included and important. If it is not, the authors should mention this in the same sentence.

      Thank you for pointing out this source of misunderstanding. Your understanding is correct, and we have modified the text to remove the ambiguity.

      (2.6) Line 94-95: Since you are interested in tissue growth, recent work has indicated how the cell boundary (and therefore tissue boundary) description influences growth. Please also be sure to indicate this when you describe the model.

      We presume you refer to the recent paper by Lewinsohn et al. (Nature Ecology and Evolution, 2023), which reports a phylogenetic analysis based on the Li et al. data. Lewinsohn et al. find that cells near the tumour boundary grow significantly faster than those in the tumour’s core. This is at variance with what we find; we were not aware of this paper at the time of submission. We now refer to this paper in the main text, and also have included a new section S3.4 in the SI accounting for this discrepancy. If you refer to a different paper, please let us know.

      Briefly, we repeat the analysis of Lewinsohn et al., using their algorithm on artificial data generated by our model under volume growth. Samples were placed precisely like they were placed in the tumor analyzed by Li et al. We find that, even though the data was generated by volume growth, the algorithm of Lewinsohn et al. finds a signal of surface growth, in many cases even stronger compared to the signal which Lewinsohn et al. find in the empirical data. We have added subsection S3.4 with new figure S15 in the Supplementary Information.

      (2.7) Line 107: “thus no evidence for enhanced cell growth near the edge of the tumour”. It is unclear to me how this tells us information relative to the tumour edge. It seems to me this is an artifact that at the edge of the tumour, there are fewer cells to compare with? Could you please expand on this a bit?

      The direction angles tell us if new mutations arise predominantly radially outwards. With this observable, surface growth would lead to a non-uniform distribution of these angles even if we restrict the analysis to samples from the interior of the tumor (which, under surface growth, was once near the surface). So the effect is not linked to fewer cells for comparison. Also, we have checked the direction angles in simulations under different growth modes with the samples placed in the same way as in the data (see Figs. S3 and S4 right panels). We have expanded the text in the main text, section Results accordingly.

      (2.8) I really enjoyed the clear explanation between lines 119 and 122 regarding cell dispersion!

      Thank you!

      (2.9) Figure 2B: Since you are looking at a periodic feature in theta, I would have expected the distribution to be periodic too, and therefore equal at theta=-180=180. Can you explain why it is different, please? Interestingly, your simulated data does seem to obey this!

      The distribution of theta is periodic but the binning and midpoints of bins were chosen badly. We have replotted the diagram with bin boundaries that handle the edge-points -180/180 correctly. Thank you for pointing this out.
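
      One way to bin a periodic angle so that -180 and 180 fall into the same bin is sketched below. This is an illustration only and not the plotting code used for Figure 2B; the function names `wrap_angle` and `periodic_histogram` and the default of 12 bins are assumptions made for the example.

```python
def wrap_angle(theta_deg):
    """Map an angle in degrees onto the half-open interval [-180, 180)."""
    return (theta_deg + 180.0) % 360.0 - 180.0


def periodic_histogram(angles_deg, n_bins=12):
    """Count angles in n_bins equal-width bins covering [-180, 180)."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for theta in angles_deg:
        k = int((wrap_angle(theta) + 180.0) // width)
        counts[min(k, n_bins - 1)] += 1  # guard against rounding at the upper edge
    # Bin midpoints lie strictly inside the interval, so no bin straddles the seam.
    centres = [-180.0 + (k + 0.5) * width for k in range(n_bins)]
    return centres, counts
```

      With this convention the angles -180 and 180 are counted once, in the first bin, instead of being split between the two ends of the axis.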

      (2.10) Figure 3B: This plot does not have a title. Also, what do the red vertical lines in plots 3B, 3C and 3D indicate?

      We have added the title. The red lines indicate the expectation values of the distributions.

      (2.11) Figure 4: I am unsure how to read the plot in 4B. Also, what does the y-axis represent in 4C and 4D?

      We have added explanations for 4B and have placed the labels for 4C and 4D in the correct position on the y-axes.

      (2.12) Lines 194-199: you discuss your inferred parameters here, but you do not indicate how you inferred these parameters. Could you please briefly mention how you inferred these?

      These were inferred using the turnover method explained in the paragraph above; we have expanded the information. A full account is given in the SI Section S5.

      (2.13) Line 258-260: “... mutagen (aristolochic acid) found in herbal traditional Chinese medicine and thought to cause liver cancer.” I do not see what this sentence adds to the work. Could you please be clearer with the claim you are making here?

      Mutational signatures allow one to infer the underlying mutational processes. The strongest signature found in the data is associated with a mutagen that has in the past been used in traditional Chinese medicines. The patients from whom the tumours were biopsied were from China, so past exposure to this potent mutagen is possible. We are not making a big claim here; the mutational signature of aristolochic acid and its carcinogenic nature have been well studied and are referenced here. The result is interesting in our context because in one of the datasets (Li et al.) the signature is present in early (clonal) mutations but absent in later ones, allowing inferences about the past to be made from present data. We have added the information that the patients were from China.

      (2.14) In your Supplementary Information, S1, I believe your summation should not be over i, as you state in the following it is over cells within 7 cell radii. Please fix this by possibly defining a set which are those within 7 cell radii.

      We have done this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study provides an incremental advance to the scavenger receptor field by reporting the crystal structures of the domains of SCARF1 that bind modified LDL such as oxidized LDL and acylated LDL. The crystal packing reveals a new interface for the homodimerization of SCARF1. The authors characterize SCARF1 binding to modified LDL using flow cytometry, ELISA, and fluorescent microscopy. They identify a positively charged surface on the structure that they predict will bind the LDLs, and they support this hypothesis with a number of mutant constructs in binding experiments.

      Strengths:

      The authors have crystallized domains of an understudied scavenger receptor and used the structure to identify a putative binding site for modified LDL particles. An especially interesting set of experiments is the SCARF1 and SCARF2 chimeras, where they confer binding of modified LDLs to SCARF2, a related protein that does not bind modified LDLs, and show that the key residues in SCARF1 are not conserved in SCARF2.

      Weaknesses:

      While the data largely support the conclusions, the figures describing the structure are cursory and do not provide enough detail to interpret the model or quality of the experimental X-ray structure data. Additionally, many of the flow cytometry experiments lack negative controls for non-specific LDL staining and controls for cell surface expression of the SCARF constructs. In several cases, the authors interpret single data points as increased or decreased affinity, but these statements need dose-response analysis to support them. These deficiencies should be readily addressable by the authors in the revision.

      The paper is a straightforward set of experiments that identify the likely binding site of modified LDL on SCARF1 but adds little in the way of explaining or predicting other binding interactions. That a positively charged surface on the protein could mediate binding to LDL particles is not particularly surprising. This paper would be of greater importance if the authors could explain the specificity of the binding of SCARF1 to the various lipoparticles that it does or does not bind. Incorporating these mutants into an assay for the biological role of SCARF1 would be powerful.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wang and colleagues provided mechanistic insights into SCARF1 and its interactions with the lipoprotein ligands. The authors reported two crystal structures of the N-terminal fragments of SCARF1 ectodomain (ECD). On the basis of the structural analysis, the authors further investigated the interactions between SCARF1 and modified LDLs using cell-based assays and biochemical experiments. Together with the two structures and supporting data, this work provided new insights into the diverse mechanisms of scavenger receptors and especially the crucial role of SCARF1 in lipid metabolism.

      Strengths:

      The authors started by determining the crystal structures of two fragments of SCARF1 ECD. The superposition of the two high-resolution structures, together with the predicted model by AlphaFold, revealed that the ECD of SCARF1 adopts a long-curved conformation with multiple EGF-like domains arranged in tandem. Non-crystallographic and crystallographic two-fold symmetries were observed in crystals of f1 and f2 respectively, indicating the formation of SCARF1 homodimers. Structural analysis identified critical residues involved in dimerization, which were validated through mutational experiments. In addition, the authors conducted flow cytometry and confocal experiments to characterize cellular interactions of SCARF1 with lipoproteins. The results revealed the vital role of the 133-221aa region in the binding between SCARF1 and modified LDLs. Moreover, four arginine residues were identified as crucial for modified LDL recognition, highlighting the contribution of charge interactions in SCARF1-lipoprotein binding. The lipoprotein binding region is further validated by designing SCARF1/SCARF2 chimeric molecules. Interestingly, the interaction between SCARF1 and modified LDLs could be inhibited by teichoic acid, indicating potential overlap in or sharing of binding sites on SCARF1 ECD.

      The authors employed a nice collection of techniques, namely crystallography, SEC, DLS, flow cytometry, ELISA, and confocal imaging. The experiments are technically sound and the results are clearly written, with a few concerns as outlined below. Overall, this research represents an advancement in the mechanistic investigation of SCARF1 and its interaction with ligands. The role of scavenger receptors is critical in lipid homeostasis, making this work of interest to the eLife readership.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. described the crystal structures of the N-terminal fragments of Scavenger receptor class F member 1 (SCARF1) ectodomains. SCARF1 recognizes modified LDLs, including acetylated LDL and oxidized LDL, and it plays an important role in both innate and adaptive immune responses. They characterized the dimerization of SCARF1 and the interaction of SCARF1 with modified lipoproteins by mutational and biochemical studies. The authors identified the critical residues for dimerization and demonstrated that SCARF1 may function as homodimers. They further characterized the interaction between SCARF1 and LDLs and identified the lipoprotein ligand recognition sites, the highly positively charged areas. Their data suggested that the teichoic acid inhibitors may interact with SCARF1 in the same areas as LDLs.

      Strengths:

      The crystal structures of SCARF1 were high quality. The authors performed extensive site-specific mutagenesis studies using soluble proteins for ELISA assays and surface-expressed proteins for flow cytometry.

      Weaknesses:

      (1) The schematic drawing of human SCARF1 and SCARF2 in Fig 1A did not show the differences between them. It would be useful to have a sequence alignment showing the polymorphic regions.

      The schematic drawing in Fig. 1A is intended to give a brief idea of the two molecules; a sequence alignment would take too much space in the figure. A careful alignment between SCARF1 and SCARF2 can be found in Ref. 24 (Ishii, et al., J Biol Chem, 2002. 277, 39696-702) and is also mentioned on p. 4.

      (2) The description of structure determination was confusing. The f1 crystal structure was determined by SAD with Pt derivatives. Why did they need molecular replacement with a native data set? The f2 crystal structure was solved by molecular replacement using the structure of the f1 fragment. Why did they need to use EGF-like fragments predicted by AlphaFold as search models?

      The crystal structure of f1 was first determined by SAD using Pt derivatives, but soaking of Pt reduced the resolution of the crystals; therefore we used this structure as a search model for a native data set that had higher resolution for further refinement. For the structural determination of f2, the molecular replacement using the f1 structure was not able to show the initial density of the extra region in f2 (residues 133-209), which was missing in f1. Therefore, the EGF-like domains of SCARF1 modeled by AlphaFold were applied as search models for this region (p.18).

      (3) It's interesting to observe that SCARA1 binds modified LDLs in a Ca2+-independent manner. The authors performed the binding assays between SCARF1 and modified LDLs in the presence of Ca2+ or EDTA on Page 9. However, EDTA is not an efficient Ca2+ chelator. The authors should have performed the binding assays in the presence of EGTA instead.

      The binding assays in the presence of EGTA are included in the revised manuscript (Fig. S7) (p.9), which also suggest that SCARA1 binds OxLDL in a Ca2+-independent manner.

      (4) The authors claimed that SCARF1Δ353-415, the deletion of a C-terminal region of the ectodomain, might change the conformation of the molecule and generate hindrance for the C-terminal regions. Why didn't SCARF1Δ222-353 have a similar effect? Could the deletion change the interaction between SCARF1 and the membrane? Is SCARF1Δ353-415 region hydrophobic?

      The truncation mutants were constructed to roughly locate the binding region of lipoproteins on SCARF1, and the overall results showed that the sites might be located in the region of 133-221. Mutant Δ222-353 may also affect the conformation, but it still bound OxLDL like the wild type, suggesting that the binding sites were retained in this mutant. Mutant Δ353-415 showed a reduction of binding, implying that the binding sites might be retained but binding was affected; we think this might be due to a conformational change that could reduce the binding or accessibility of lipoproteins. Since this region is located closer to the membrane, it is possible that the deletion may change the interaction with the membrane. In the AF model, the Δ353-415 region does not seem to be more hydrophobic than other regions (Fig. S2C).

      (5) What was the point of having Figure 8? Showing the SCARF1 homodimers could form two types of dimers on the membrane surface proposed? The authors didn't have any data to support that.

      Fig. 8 shows a potential model of the SCARF1 dimers on the cell surface by combining the structural information from crystals and AF predictions. The two dimers in the figure are identical but with different viewing angles. The lipoprotein binding sites are also indicated (Fig. 8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors need to show examples of the electron density for both structures.

      Electron density examples of the two structures are shown in Fig. S2A.

      Figure 1)

      The figure does not show enough details of the structure. The text mentions hydrogen bonds and disulfide bonds that stabilize the loops; these should be shown.

      Disulfide bonds of the two structures are shown in Fig. 1.

      Figure 2)

      D) The full gel should be shown.

      E) Rather than just relying on changes in gel filtration elution volumes, the authors do the appropriate experiment and measure the hydrodynamic radius of the WT and mutant ectodomains by DLS. However, they need to show plots of the size distribution, not just mean radial values, in order to show if the sample is monodisperse.

      The full gel and plots of DLS are shown in Fig. S3A-B.

      Figure 3)

      I have concerns about the rigor of the experiments in panels A-D. The authors include a non-transfected control but do not appear to have treated non-transfected cells with the lipoproteins to evaluate the specificity of binding. Every cell binding assay (flow  or confocal) must show the data from non-transfected cells treated with each lipoprotein, as each lipoprotein species could have a unique non-specific binding pattern. The authors show these controls in Figure 6, but these controls are necessary in every experiment.

      In Fig. 3A, since several lipoproteins were included in the figure, we use non-transfected cells without lipoprotein treatment as a negative control. The OxLDL or AcLDL treated non-transfected cells were also used as negative controls and shown in Fig. 3B-C. LDL, HDL or OxHDL may have their own non-specific binding patterns, the treatment of LDL, HDL or OxHDL with the transfected cells all gave negative results (Fig. 3A and D).

      Cell-surface expression of the SCARF1 variants is a major concern. The constructs the authors use are tagged with a GFP on the cytosolic side. However, the Methods do not indicate whether they gate on GFP+ transfected cells for analytical flow. Such gating may have been used because the staining experiments in Figures 3 and 4 show uniform cell populations, whereas the staining done with an anti-SCARF1 Ab in S4 shows most of the cells not expressing the protein on the surface. Please clarify.

      Data for the anti-SCARF1 Ab assay are gated for GFP in the revised Fig. S4, and the non-transfected cells are included as a control.

      The authors must demonstrate cell-surface staining with an epitope tag on the extracellular side and clarify if the analyzed cells are gated for surface expression. The anti-SCARF antibody used in S4 may not recognize the truncated or mutant SCARFs equally. Cell-surface expression in the flow experiments cannot be inferred from confocal experiments because the flow experiments have a larger quantitative range.

      The anti-SCARF1 antibody assay provides an estimate of the surface expression of the proteins. If the epitope of the antibody were mutated or removed in the mutants, the antibody would most likely lose binding activity. Including an epitope tag on the ectodomain could be an option, but if truncation or mutation changes the conformation of the ectodomain, the accessibility of the epitope may also be affected. In addition, an extra sequence or domain, such as an epitope tag, may itself sometimes affect the surface expression of proteins.

      In several places, the authors infer increased or decreased affinity from mean fluorescent intensity values of a single concentration point without doing appropriate dose-curves. These experiments need to be done or else the mentions of changes in apparent affinities should be removed.

      We added a concentration curve for the WT interaction with OxLDL (Fig. S6, p.9), and the manuscript has been modified accordingly.

      Figure 7

      The concentration of teichoic acid used to inhibit modified LDL binding should be indicated and a dose-curve analysis should be done comparing teichoic acid to some non-inhibitory bacterial polymer.

      The concentration of teichoic acids used in the inhibition assays is 100 mg/ml (p.21). Unfortunately, we do not have other bacterial polymers in the lab and are not sure about their potential inhibitory effects.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      (1) The SCARF1 ECD contains three N-linked glycosylation sites (N289, N382, N393). It remains unclear whether these modifications are involved in SCARF1 binding to modified LDLs. Is it possible to design some experiments to investigate the effect of N-glycans on the recognition of modified LDLs? In particular, N382 and N393 are included in 353-415aa, and the truncation mutant SCARF1Δ353-415aa showed reduced binding to OxLDL in Fig. 3G. Or is the reduced binding only due to the potential conformational changes caused by the deletion of the C-terminal region of the ECD?

      A previous study of the N-glycans (N289, N382, N393) of SCARF1 (ref.17) has shown that they may affect the proteolytic resistance, ligand-binding affinity and subcellular localization of SCARF1. This is not surprising: because lipoproteins are large particles, the N-glycans on the surface of SCARF1 could affect accessibility or affinity for lipoproteins. However, the exact role of each glycan could be difficult to clarify, as the glycans might also be involved in protein folding and trafficking.

      The reduction of the binding of OxLDL for the mutant SCARF1 Δ353-415aa may be due to the conformational change or the loss of the glycans or both.

      (2) The authors speculated that the dimeric form of SCARF1 may be more efficient in recognizing lipoproteins on the cell surface. Please highlight the critical region/sites for ligand binding in Figure 8 and discuss the structural basis of dimerization improving the binding.

      The binding sites for lipoproteins on SCARF1 are indicated in Fig. 8. According to our data, the conformation of the dimeric form of SCARF1 may make it more accessible to the ligands on the cell surface, as implied by flow cytometry (p.14-15), but this still needs further evidence.

      (3) Could the two salt bridges (D61-K71, R76-D98) observed in f1 crystals be found in f2 crystals? They seemed to be a little far from the defined dimeric interface (F82, S88, Y94) and how important are these to SCARF1 dimerization?

      The two salt bridges observed in f1 crystal are not found in f2 crystal (distances are larger than 5.0 Å), suggesting they are not required for dimerization (p. 7-8), but may be helpful in some cases.

      (4) The monomeric mutants (S88A/Y94A, F82A/S88A/Y94A) exhibited opposite affinity trends to OxLDL in ELISA and flow cytometry. The authors proposed steric hindrance of the dimers coated onto the plates as the potential explanation for this observation. However, the ELISA method states that OxLDLs, instead of SCARF1 ECD, were coated onto the plates. So what's the underlying reason for the inconsistency between the assays?

      Thanks. ELISA was done by coating OxLDLs on the plates as described in the Methods. Still, a dimeric form of SCARF1 may bind only one OxLDL coated on the plates due to steric hindrance. We have corrected this on p.12.

      Minor points:

      (1) Figure 2D and Figure S3 - please label the molecular weight marker on the SEC traces to indicate the native size of various purified proteins.

      The elution volume in SEC reflects not only the molecular weight but also the conformation or shape of the protein. Because the ectodomain of SCARF1 has a long, curved conformation, the monomeric and dimeric forms of SCARF1 elute much earlier in SEC than expected and do not align well with the standard molecular weight markers. We include the standard molecular weight markers in Fig. S3C-D.

      (2) Could the authors provide SEC profiles of f1 and f2 that were used in crystallographic study?

      The SEC profiles of f1 and f2 for crystallization are shown in Fig. S5 (p.6).

      (3) The legend of Figure 3A states that the NC in flow cytometry assay represents the non-transfected cells, but please confirm whether the NC in Fig. 3A-C corresponds to non-transfected cells or no lipoprotein.

      NC in Fig. 3A represents the non-transfected cells, and no lipoproteins were added in this case as several lipoproteins are included in Fig. 3A. The lipoprotein (OxLDL or AcLDL) treated non-transfected cells (NC) were shown in Fig. 3B-C as negative controls.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues delves into the molecular simulations of Na+ binding pathway and the ionic interactions at the two known sodium binding sites site 1 and site 2. They further identify a patch of two acidic residues in TM6 that seemingly populate the Na+ ions prior to entry into the vestibule. These results highlight the importance of studying the ion-entry pathways through computational approaches and the authors also validate some of their findings through experimental work. They observe that sodium site 1 binding is stabilized by the presence of the substrate in the s1 site and this is particularly vital as the GABA carboxylate is involved in coordinating the Na+ ion unlike other monoamine transporters and binding of sodium to the Na2 site stabilizes the conformation of the GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available information from experiments on SLC6 transporters particularly GAT1 and puts forth the importance of this added patch of residues in the extracellular vestibule that could be of importance to the ion permeation in SLC6 transporters. This is a nicely performed study and could be improved if the authors could comment on and fix the following queries.

      We thank the reviewer for the overall positive assessment of our work.

      Comments on revised version:

      The authors have satisfactorily addressed my comments and this has significantly improved the clarity of the manuscript.

      The only point that I would like to inquire about is the role of EL4 in modulating Na+ entry.

      In the simulations, do the authors see no role of EL4 in controlling Na+ entry? This is particularly intriguing, as some recent studies showed that charged mutations in EL4 of dDAT, SERT and GAT1 are detrimental to substrate entry/uptake. It would therefore be nice to add a short discussion of whether EL4 has any role in Na+ entry.

      In this study we focused on sodium binding to the sodium binding sites NA1 and NA2 and discovered that negatively charged residues at the beginning of TM6 contribute to sodium binding. Our data show less-than-average interactions of sodium ions with EL4. In particular, we also do not observe any prominent role for D355, which is the only negatively charged residue in EL4a. We attribute this effect to the presence of four positively charged residues (R69, Y76, K350, R351) surrounding D355 and to electrostatic repulsion by a local positive field, which is also visible in Figure 1k. Following the suggestion of the reviewer, we added a short statement to the last paragraph of the discussion.

      Reviewer #2 (Public Review):

      Summary

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be co-transported with the substrate, GABA (and a cotransported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281AE283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths

      - MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      - A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      - The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      We thank the reviewer for the overall positive assessment of our work.

      Weaknesses

      - Assessing the effects of Na+ binding on the large scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear if movements of the inner gate are due to an AF2 model that's not well packed or really a feature of the open outward conformation.

      We do not think that the results of the manuscript, and in particular the large scale motions, are speculative or depend too much on the limitations of PCA. We only use PCA for Figure 6a-d,6g,h. Motions of SLC6 transporters (and of any other transporter) are much more complex than a single 2D PCA plot could ever capture. We therefore used PCA here only to identify the two motions with the largest amplitude, shown in Figure 6a-d, 6g,h.

      Given that all the ~13000 degrees of freedom of GAT1 contribute to conformational differences, a dimensionality reduction method like PCA can be very helpful for extracting dominant motions. Structure comparison showed that the motions observed in PC1 captured a large portion of the motions of occlusion (Figure 6c,d) when compared to the full transition observed in the unfiltered trajectories (see Figure 6e,f). PCA therefore helps to extract these main motions.
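
      The idea of extracting a dominant motion with PCA can be illustrated with a minimal sketch. This is not the analysis code used for the manuscript: the trajectory below is synthetic, and the function name `pca_of_trajectory`, the array sizes, and the noise level are assumptions chosen only for illustration. One large-amplitude collective motion is hidden among many noisy degrees of freedom, and PCA recovers it as PC1.

```python
import numpy as np

def pca_of_trajectory(coords):
    """PCA of an aligned trajectory.

    coords: array of shape (n_frames, n_dof), one row per frame.
    Returns eigenvalues (descending), eigenvectors (as columns, same order),
    and the projection of every frame onto the principal components.
    """
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # (n_dof, n_dof) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projections = centered @ eigvecs
    return eigvals, eigvecs, projections

# Synthetic "trajectory" (an assumption, not simulation data):
# one collective motion along `direction` plus small isotropic noise.
rng = np.random.default_rng(0)
n_frames, n_dof = 500, 30
direction = np.zeros(n_dof)
direction[:3] = 1.0 / np.sqrt(3.0)
amplitude = 5.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, n_frames))
coords = np.outer(amplitude, direction) + 0.1 * rng.standard_normal((n_frames, n_dof))

eigvals, eigvecs, proj = pca_of_trajectory(coords)
explained = eigvals / eigvals.sum()            # fraction of variance per component
print("variance explained by PC1:", round(float(explained[0]), 3))
print("overlap of PC1 with the hidden motion:", round(abs(float(eigvecs[:, 0] @ direction)), 3))
```

      In this toy case PC1 carries almost all of the variance, whereas in a real transporter trajectory the dominant components capture only part of the motion, which is why direct measures on the raw trajectories remain necessary.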

      For completeness, we show a series of structures from the unfiltered trajectories in Figure 6e,f. In the overlay, the motion of occlusion is more difficult to observe because it is convoluted with all other degrees of freedom. In Figure 6e,f, the structures are aligned with the maximum likelihood method THESEUS, while the coloring is based on the amplitudes measured by PCA to visualize the regions moving relative to each other with the largest amplitude. All other structural measures, including the opening of the inner gate (Figure 6i-k), are direct measures of the raw trajectories.

      With respect to the question of the instability of the inner gate, we made similar observations for hSERT (please see DOI: 10.1038/s41467-023-44637-6) using the experimentally determined structure as the starting point. We find a weakening of the inner gate for sodium-free SERT and at intermediate or full occlusion of sodium- and serotonin-bound SERT. These previous data on SERT corroborate our finding and indicate that the effect could be a general feature of the SLC6 transporter family.

      Unfortunately, no outward-open structure of GAT1 was available for this study. AlphaFold2 models have limitations, and we are well aware of them, but AlphaFold2 can also produce high-quality models, including small adjustments of backbone positions, if the sequence identity is high, as in the current project (43% sequence identity for the transmembrane region). For GAT1 (as described in the manuscript) we initially tested an hSERT-based model created with MODELLER. MODELLER assumes that the protein backbone changes little or not at all between the template protein and the target protein. These MODELLER-created models did not perform well because of a slight shift in the position of the backbone, which is a consequence of consistently smaller side chains in the bundle domain-scaffold domain interface of GAT1 as compared to SERT.

      In the simulations described in the manuscript (using the AlphaFold-created model) we observed that the overall structural and dynamic parameters, and in particular the observations at the inner gate, are very similar to the results described in our papers on sodium binding to SERT using experimental SERT structures. The differences in Na1 binding are explained in the manuscript and are contingent on the residue difference between D98 in SERT and the corresponding residue G65 in GAT1. This makes us confident about the quality of the obtained data. Please see DOI: 10.3390/cells11020255; DOI: 10.3389/fncel.2021.673782.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

      The tICA analysis is a Markov State Model approach, which relies on the convergence of transitions between a large number of microstates. Trajectories showing full sodium unbinding are not obligatory for a converged dataset, but the transitions between the microstates must be converged. For transitions within the S1, we have many transitions and very good convergence of the transition probabilities. We limit interpretation of free energy data and discussion to this part of the free energy surface. The supporting information (Figure S5) reports on the quality of the tICA analysis. Flat lines at time lags larger than 40 ns are consistent with a converged model based on the data of the trajectories used for the analysis and, consistently, the Chapman-Kolmogorov tests show minimal differences between estimates and predictions.
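
      The two convergence checks mentioned here (flat implied timescales and the Chapman-Kolmogorov test) can be sketched on a toy example. The code below is an illustration under stated assumptions, not the workflow used for the manuscript: the two-state chain `T_true`, the trajectory length, and the helper names are hypothetical, and time is counted in frames rather than nanoseconds. For a Markovian process, the implied timescale should not depend on the lag time, and the transition matrix estimated at lag kτ should match the k-th power of the matrix estimated at lag τ.

```python
import numpy as np

def simulate_chain(T, n_steps, rng):
    """Sample a discrete trajectory from a row-stochastic matrix T."""
    cum = np.cumsum(T, axis=1)
    u = rng.random(n_steps)
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        nxt = np.searchsorted(cum[states[t - 1]], u[t])
        states[t] = min(nxt, len(T) - 1)       # guard against round-off in cumsum
    return states

def transition_matrix(dtraj, n_states, lag):
    """Row-normalized count matrix at the given lag (sliding window)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """t_i = -lag / ln(lambda_i) for the non-stationary eigenvalues."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1:])

# Hypothetical two-state model (assumption for illustration only).
T_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
rng = np.random.default_rng(1)
dtraj = simulate_chain(T_true, 100_000, rng)

T1 = transition_matrix(dtraj, 2, lag=1)
T5 = transition_matrix(dtraj, 2, lag=5)

its1 = implied_timescales(T1, 1)[0]
its5 = implied_timescales(T5, 5)[0]
print("implied timescale at lag 1 and lag 5 (frames):", its1, its5)

# Chapman-Kolmogorov test: prediction T(1)^5 versus direct estimate T(5).
ck_error = np.max(np.abs(np.linalg.matrix_power(T1, 5) - T5))
print("max |T(1)^5 - T(5)|:", ck_error)
```

      A small Chapman-Kolmogorov error and lag-independent implied timescales indicate that the transitions sampled in the data are described consistently by the model; they say nothing about transitions that were never sampled.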

      We see about 40 binding events from the extracellular side to the S1, which seems insufficient for a converged quantification of sodium transiting from the extracellular side to the S1. We state this limitation of the dataset in the results section of the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors show that the Gαs-stimulated activity of human membrane adenylyl cyclases (mAC) can be enhanced or inhibited by certain unsaturated fatty acids (FA) in an isoform-specific fashion. Thus, with IC50s in the 10-20 micromolar range, oleic acid effects a 3-fold stimulation of membrane preparations of mAC isoform 3 (mAC3) but does not act on mAC5. Gαs-stimulated activities of isoforms 2, 7, and 9 were enhanced, mAC1 was slightly attenuated, and isoforms 4, 5, 6, and 8 were unaffected. Certain other unsaturated octadecanoic FAs act similarly. FA effects were not observed in AC catalytic domain constructs in which TM domains are not present. Oleic acid also enhances the AC activity of isoproterenol-stimulated HEK293 cells stably transfected with mAC3, although with lower efficacy but much higher potency. Gαs-stimulated mAC1 and 4 cyclase activities were significantly attenuated in the 20-40 micromolar range by arachidonic acid, with similar effects in transfected HEK cells, again with higher potency but lower efficacy. While the activity of mAC5 was not affected by unsaturated FAs, neutral anandamide attenuated Gαs-stimulation of mAC5 and 6 by about 50%. In HEK cells, inhibition by anandamide is low in potency and efficacy. To demonstrate isoform specificity, the authors were able to show that membrane preparations of a domain-swapped AC bearing the catalytic domains of mAC3 and the TM regions of mAC5 are unaffected by oleic acid but inhibited by anandamide. To verify in vivo activity, in mouse brain cortical membranes 20 μM oleic acid enhanced Gαs-stimulated cAMP formation 1.5-fold with an EC50 in the low micromolar range.

      Strengths:

      (1) A convincing demonstration that certain unsaturated FAs are capable of regulating membrane adenylyl cyclases in an isoform-specific manner, and the demonstration that these act at the AC transmembrane domains.

      (2) Confirmation of activity in HEK293 cell models and towards endogenous AC activity in mouse cortical membranes.

      (3) Opens up a new direction of research to investigate the physiological significance of FA regulation of mACs and investigate their mechanisms as tonic or regulated enhancers or inhibitors of catalytic activity.

      (4) Suggests a novel scheme for the classification of mAC isoforms.

      Weaknesses:

      (1) Important methodological details regarding the treatment of mAC membrane preps with fatty acids are missing.

      We will address this issue in more detail.

      (2) It is not evident that fatty acid regulators can be considered as "signaling molecules" since it is not clear (at least to this reviewer) how concentrations of free fatty acids in plasma or endocytic membranes are hormonally or otherwise regulated.

      Although this question is not the subject of this manuscript, we will address it in more detail in the discussion of the revision.

      Reviewer #2 (Public review):

      Summary:

      The authors extend their earlier findings with bacterial adenylyl cyclases to mammalian enzymes. They show that certain aliphatic lipids activate adenylyl cyclases in the absence of stimulatory G proteins and that lipids can modulate activation by G proteins. Adding lipids to cells expressing specific isoforms of adenylyl cyclases could regulate cAMP production, suggesting that adenylyl cyclases could serve as 'receptors'.

      Strengths:

      This is the first report of lipids regulating mammalian adenylyl cyclases directly. The evidence is based on biochemical assays with purified proteins, or in cells expressing specific isoforms of adenylyl cyclases.

      Weaknesses:

      It is not clear if the concentrations of lipids used in assays are physiologically relevant. Nor is there evidence to show that the specific lipids that activate or inhibit adenylyl cyclases are present at the concentrations required in cell membranes. Nor is there any evidence to indicate that this method of regulation is seen in cells under relevant stimuli.

      Although this question is not the subject of this manuscript, we will address this question in more detail in the discussion of the revision.

      Reviewer #3 (Public review):

      Summary:

      Landau et al. have submitted a manuscript describing for the first time that mammalian adenylyl cyclases can serve as membrane receptors. They have also identified the respective endogenous ligands, which act via AC membrane linkers to modify and control Gs-stimulated AC activity, either enhancing or inhibiting ACs in a family- and ligand-specific manner. Overall, they have used classical assays such as adenylyl cyclase and cAMP accumulation assays combined with molecular cloning and mutagenesis to provide exceptionally strong biochemical evidence for the mechanism of the involved pathway regulation.

      Strengths:

      The authors have gone the whole long classical way from having a hypothesis that ACs could be receptors, to a series of MS studies aimed at ligand identification, to functional studies of how these candidate substances affect the activity of various AC families in intact cells. They have used a large array of techniques, and the paper has a clear conceptual story and several strong lines of evidence.

      Weaknesses:

      (1) At the beginning of the results section, the authors say "We have expected lipids as ligands". It is not quite clear why these could not have been other substances. Is it because they were expected to bind in the lipophilic membrane anchors? Various lipophilic and hydrophilic ligands are known for GPCRs, which also have transmembrane domains. Maybe 1-2 additional sentences could be helpful here.

      Will be done as suggested.

      (2) In stably transfected HEK cells expressing mAC3 or mAC5, they have used only one dose of isoproterenol (2.5 uM) for submaximal AC activation. The reference 28 provided here (PMID: 33208818) did not specifically look at Iso and endogenous beta2 adrenergic receptors expressed in HEK cells. As far as I remember from the old pharmacological literature, this concentration is indeed submaximal in receptor binding assays but regarding AC activity and cAMP generation (which happen after signal amplification with a so-called receptor reserve), lower Iso amounts would be submaximal. When we measure cAMP, these are rather 10 to 100 nM but no more than 1 uM at which concentration response dependencies usually saturate. Have the authors tried lower Iso concentrations to prestimulate intracellular cAMP formation? I am asking this because, with lower Iso prestimulation, the subsequent stimulatory effects of AC ligands could be even greater.

      The best way to address this issue is to establish a concentration-response curve for Iso-stimulated cAMP formation using the permanently transfected cells. We note that in the past isoproterenol concentrations used in biochemical or electrophysiological experiments differed substantially.
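
      How an EC50 is read from such a concentration-response curve can be sketched as follows. This is an illustration only: the data points are synthetic (generated from an assumed Hill curve with added noise, not measured cAMP values), and the function names, the concentration range, and the parameter values are assumptions. The fit uses a grid search over EC50 and Hill slope, with the bottom and top plateaus obtained by linear least squares at each grid point.

```python
import numpy as np

def hill(conc, bottom, top, ec50, n):
    """Four-parameter Hill (logistic) concentration-response function."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** n)

def fit_hill(conc, response, ec50_grid, n_grid=(1.0,)):
    """Grid search over (ec50, n); bottom and top follow from linear least squares."""
    best = None
    for n in n_grid:
        for ec50 in ec50_grid:
            f = 1.0 / (1.0 + (ec50 / conc) ** n)          # normalized response, 0..1
            A = np.column_stack([np.ones_like(f), f])      # response = bottom + span * f
            coef, *_ = np.linalg.lstsq(A, response, rcond=None)
            sse = float(np.sum((A @ coef - response) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[0] + coef[1], ec50, n)
    _, bottom, top, ec50, n = best
    return bottom, top, ec50, n

# Synthetic example (all values are assumptions, in molar units).
rng = np.random.default_rng(2)
true_ec50 = 3e-8
conc = np.logspace(-10, -5, 11)                            # 0.1 nM to 10 uM
response = hill(conc, 1.0, 10.0, true_ec50, 1.0) + 0.1 * rng.standard_normal(conc.size)

bottom_fit, top_fit, ec50_fit, n_fit = fit_hill(
    conc, response, ec50_grid=np.logspace(-10, -5, 201), n_grid=(0.8, 1.0, 1.2)
)
print("fitted EC50 (M):", ec50_fit, "Hill slope:", n_fit)
```

      Once such a curve is available for the transfected cells, a submaximal agonist concentration can be chosen relative to the fitted EC50 rather than taken from the literature.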

      (3) The authors refer to HEK cell models as "in vivo". I agree that these are intact cells and an important model to start with. It would be very nice to see the effects of the new ligands in other physiologically relevant types of cells, and how they modulate cAMP production under even more physiological conditions. Probably, this is a topic for follow-up studies.

      The last sentence is correct.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors have achieved their aims to a very high degree, their results do nicely support their conclusions. There is only one point (various classical GPCR concentrations, please see above) that would be beneficial to address.

      Without any doubt, this is a groundbreaking study that will have profound implications in the field for the next years/decades. Since it is now clear that mammalian adenylyl cyclases are receptors for aliphatic fatty acids and anandamide, this will change our view on the whole signaling pathway and initiate many new studies looking at the biological function and pathophysiological implications of this mechanism. The manuscript is outstanding.

    1. Author response:

      eLife Assessment

      This important study reports the transcriptomic and proteomic landscape of the oviducts at four different preimplantation periods during natural fertilization, pseudopregnancy, and superovulation. The data presented convincingly supported the conclusion in general, although more analyses would strengthen the conclusions drawn. This work will interest reproductive biologists and clinicians practicing reproductive medicine. 

      We appreciate the concise summary and agree that additional experiments can reinforce the fidelity of predictions made by our robust bioinformatic characterization of the oviduct. Our robust bioinformatic model appears reproducible as similar pathway trends have been produced in all three datasets, lending confidence for future researchers to establish testable hypotheses more effectively.  

      Reviewer #1 (Public review):

      The paper demonstrated through a comprehensive multi-omics study of the oviduct that the transcriptomic and proteomic landscape of the oviduct at 4 different preimplantation periods was dynamic during natural fertilization, pseudopregnancy, and superovulation using three independent cell/tissue isolation and analytical techniques. This work is very important for understanding oviductal biology and physiology. In addition, the authors have made all the results available in a web search format, which will maximize the public's access and foster and accelerate research in the field.

      Strengths:

      (1) The manuscript addresses an important and interesting question in the field of reproduction: how does the oviduct at different regions adapt to the sperm and embryos for facilitating fertilization and preimplantation embryo development and transport?

      (2) Authors used cutting-edge techniques: Integrated multi-modal datasets followed by in vivo confirmation and machine learning prediction.

      (3) RNA-seq, scRNA-seq, and proteomic results are immediately available to the scientific community in a web search format.

      (4) Substantiated results indicate the source of inflammatory responses was the secretory cell population in the IU region when compared to other cell types; sperm modulate inflammatory responses in the oviduct; the oviduct displays immuno-dynamism.

      We sincerely thank you for your thorough and insightful review of our manuscript. Your comprehensive summary accurately captures the essence of our multi-omics study on oviductal biology, highlighting its importance in understanding reproductive physiology. We are particularly grateful for your recognition of our study's strengths. In the revised manuscript, we plan to add another searchable scRNA-seq dataset to our public website: https://genesearch.org/winuthayanon/Oviduct_pregnancy/. We will also address the weaknesses listed below in our revised manuscript.

      Weaknesses:  

      (1) The rationale for using the superovulation model is not clear. The oviductal response to sperm and embryos can be studied by comparing mating with normal and vasectomized mice and comparing pregnancy vs pseudopregnancy (induced by mating with vasectomized males). Superovulation causes supraphysiological hormone levels and other confounding conditions.

      We agree with this assessment that superovulation changes the hormonal levels and could have a confounding impact on the oviduct function. As such, for all experiments involving pseudopregnant datasets, pseudopregnancy was induced by mating females with vasectomized males without superovulation. In our oviductal luminal protein content analysis, oviductal fluid was collected from pregnant females with and without superovulation. This allowed us to directly compare the impact of superovulation on protein abundance and profile. In the revised manuscript, we will provide clarifying statements on using superovulation in our experimental design. 

      One exception, in which superovulation was used without a “natural mating” group for comparison, is the scRNA-seq dataset. As single-cell libraries should be prepared in a single run to avoid batch effects, we needed to ensure that a sufficient number of females were pregnant for single-cell isolation (we used ~4 mice/timepoint). Therefore, superovulation was used to synchronize the females and ensure that they were receptive to mating. At the time of our sample collection, single-nuclei isolation methods (freeze tissue now, isolate nuclei later) were not yet reliable or standardized. We tried to synchronize females using male bedding without superovulation. However, we would still have needed to set up at least 12-15 females per pregnancy timepoint to mate with male mice, which totals ~48-60 mice each night. Due to budget and vivarium space limitations, we were not able to do so. We will include a similar statement to explain and clarify these limitations in the revised manuscript.

      (2) This study involves a very complex dataset with three different models at four time points. If possible, it would be very informative to generate a graphical abstract/summary of the major findings on oviductal responses in the different models and time points.

      Thank you for this suggestion. We will include the graphical abstract to accompany our final version of the manuscript.

      (3) The resolution of Figures 3A-3C in the submitted file was not high enough to assess the authors' conclusion.

      We plan to provide a higher magnification of images in Figures 3A-C in the revised version.

      (4) The authors need to double-check influential transcription factors identified by machine learning. Apparently, some of them (such as Anxa2, Ift88, Ccdc40) are not transcription factors at all.

      We appreciate the recognition of this oversight. We will clearly state the distinction between ‘influential TFs’ and ‘significant proteins’ in the revised manuscript. We will ensure that all TFs are stated correctly. 

      Reviewer #2 (Public review):

      The manuscript investigates oviductal responses to the presence of gametes and embryos using a multi-omics and machine learning-based approach. By applying RNA sequencing (RNAseq), single-cell RNA sequencing (sc-RNA-seq), and proteomics, the authors identified distinct molecular signatures in different regions of the oviduct, proximal versus distal. The study revealed that sperm presence triggers an inflammatory response in the proximal oviduct, while embryo presence activates metabolic genes essential for providing nutrients to the developing embryos. Overall, this study offers valuable insights and is likely to be of great interest to reproductive biologists and researchers in the field of oviduct biology. However, further investigation into the impact of sperm on the immune cell population in the oviduct is necessary to strengthen the overall findings.

      We appreciate the concise summary and the strengths and weaknesses highlighted. We plan to address comments made by the reviewer concerning superovulation, figure recommendations, and additional analysis in our revised manuscript. We plan to include a comparison of our data with findings from the scRNA-seq analysis of fallopian tube tissues collected from hydrosalpinx patients by Ulrich et al. (PMID: 35320732). Evaluating the data from Ulrich et al. will help distinguish between inflammatory pathways stimulated by sperm and those of general inflammation. We will follow up with a detailed description of immune cell types present at 0.5 dpc using FACS analysis in future studies. This is mainly due to a lack of expertise and technical limitations in our lab on immune cell investigation. Nevertheless, we have made collaborative efforts and recruited two immunologists to facilitate our future immune cell studies. We will also provide a clear justification for using superovulation, especially in the scRNA-seq analysis, in the revised manuscript (please see response to Reviewer 1 above).

    1. Author response:

      Thank you for your thoughtful and constructive feedback on our manuscript. We greatly appreciate your recognition of the strengths in our work, particularly regarding the genetic evidence demonstrating the role of beta-adrenergic receptors in cavernous malformation (CCM) development and the therapeutic potential of beta-blocker drugs.

      We acknowledge your concerns and are addressing them to improve the clarity and rigor of our study. Specifically:

      For Reviewer 1:

      (1) Figure Annotation: We will enhance the annotation of the figure panels to provide clearer and more detailed descriptions, ensuring that each panel is easily interpretable and contributes effectively to the overall narrative of the study.

      (2) Baseline Control for Klf2 Expression: We will include this control (klf2 expression in control Morpholino-injected embryos) in our revised figures and text to provide a more complete context for the changes in klf2 expression observed in response to genetic loss of adrb1 in ccm2 morphants and ccm2-CRISPR embryos.

      For Reviewer 2:

      (3) Sample Sizes in Figures 1, 2, and 3: We agree that reporting sample sizes is crucial for the transparency and reproducibility of our findings. For Figures 1B and D, as well as Figures 2G and 3B, we will update the figure legends to include the number of embryos and adult fish in which lesions were scored.

      (4) Figure 4 Concerns: Use of adrb1 Morphants: We will note that the adrb1 morphant shows hemodynamic changes similar to those of the adrb1 mutant and that the morphants did not prevent the mosaic klf2 expression in ccm2 CRISPR embryos. This supports our conclusion that protection from CVP cavernomas in the adrb1 mutant is not due to altered hemodynamics blocking mosaic klf2 expression.

  2. Sep 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies. 

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful, and the data generally support the conclusions. 

      Strengths: 

      HiExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions. 

      Weaknesses: 

      (1) Based on the exceedingly small volume of solution used to form the hydrogel in the well, there may be many unexpanded cells in the well and possibly underneath the expanded hydrogel at the end of this. How would this affect the image acquisition, analysis, and interpretation of HiExM data? 

      The hydrogel footprint covers approximately 5% of the surface within an individual well and only cells within this area are embedded in the polymerized hydrogel for subsequent processing steps. Cells that are outside of this footprint are not incorporated into the gel because these cells are digested by Proteinase K and washed away by the excess water exchange in the gel swelling step. Note that different cell types may require higher or lower concentrations of Proteinase K to adequately digest cells for expansion while maintaining fluorescence signal. Given the compatibility of HiExM with 96-well plates, this titration can be performed rapidly in a single experiment. Although cells outside of the hydrogel footprint are removed prior to imaging, we do occasionally observe Hoechst signal that appears to be underneath the gels. We believe this signal is likely from excess DNA from digested cells that was not fully washed out in the gel swelling step. This signal is both spatially and morphologically distinct from the nuclear signal of intact cells and it does not affect image acquisition, analysis, or data interpretation. 

      (2) It is unclear why the expansion factor is so variable between plates (e.g., Figure 2H). This should be discussed in more detail. 

      The variability in expansion factor across plates can likely be attributed to the small volume of gel solution (~250 nL) required for expansion within 96 well plates. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, gels in HiExM are more sensitive to evaporation because of the ~1000x reduced volume compared to standard expansion gel preparations, resulting in an increased air-liquid-interface. Evaporation in HiExM gels would increase monomer and cross linker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that variance is slightly increased between plates. These considerations are discussed in the revised manuscript.
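
      The evaporation argument above can be illustrated with a simple mass balance. The sketch below assumes that only water is lost, so the amount of monomer and crosslinker is conserved and their concentration rises as V0 / (V0 - ΔV). The 250 nL volume comes from the response; the evaporated volumes are hypothetical, and the sketch does not model how concentration maps onto expansion factor.

```python
def concentration_fold_change(initial_volume_nl: float, evaporated_nl: float) -> float:
    """Fold change in solute concentration when only solvent evaporates.

    Solute amount is conserved, so c_new / c_old = V0 / (V0 - dV).
    """
    if not 0 <= evaporated_nl < initial_volume_nl:
        raise ValueError("evaporated volume must be >= 0 and < initial volume")
    return initial_volume_nl / (initial_volume_nl - evaporated_nl)


GEL_VOLUME_NL = 250.0  # approximate HiExM gel volume quoted in the response

# Hypothetical evaporation losses, chosen only for illustration.
for lost_nl in (0.0, 12.5, 25.0, 50.0):
    fold = concentration_fold_change(GEL_VOLUME_NL, lost_nl)
    print(f"{lost_nl:5.1f} nL evaporated -> {fold:.3f}x concentration")
```

      Under these assumptions, the same absolute loss of liquid matters far more for a ~250 nL droplet than for a gel roughly 1000x larger, which is consistent with the sensitivity described above.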

      (3) The authors claim that CF dyes are more resistant to bleaching than other dyes. However, in Figure S3, it appears that half of the CF dyes tested still show bleaching, and no data is shown supporting the claim that Alexa dyes bleach. It would be helpful to include data supporting the claim that Alexa dyes bleach more than CF dyes and the claim that CF dyes in general are resistant to bleaching should be modified to more accurately reflect the data shown. 

      We did not show data using Alexa dyes because these fluorophores are highly sensitive to photobleaching when Irgacure is used, and thus we could not obtain images. In contrast, some CF dyes are more robust to bleaching in HiExM, including CF488A, CF568, and CF633. We have recently adapted our protocol to PhotoExM chemistry, which is compatible with a wider range of fluorophores, as described by Günay et al. (2023) and as shown in Fig. S16.

      (4) Related to the above point, it appears that Figure S11 may be missing the figure legend. This makes it hard to understand how HiExM can use other photo-inducible polymerization methods and dyes other than CF dyes.

      We revised the legend for revised Fig. S11 (now Fig. S16) as follows: Example of a cell expanded in HiExM using Photo-ExM gel chemistry. Photo-ExM does not require an anoxic environment for gel deposition and polymerization, improving ease of use of HiExM. Mitochondria were stained with an Alexa 647 conjugated secondary antibody, demonstrating that HiExM is compatible with additional fluorophores when combined with Photo-ExM.

      (5) The use of automated high-content imaging is impressive. However, it is unclear to me how the increased search space across the extended planar area and focal depths in expanded samples is overcome. It would be helpful to explain this automated imaging strategy in more detail. 

      We imaged plates on the Opera Phenix using the PreciScan Acquisition Software in Harmony. In brief, each well is imaged at 5x magnification in the Hoechst channel to capture the full well at low resolution. Hoechst is used for this step given its signal brightness, ubiquity across established staining protocols, and spectral independence from most fluorophores commonly conjugated to secondary antibodies. Using this information, the microscope detects regions of interest (nuclei) based on criteria including size, brightness, circularity, etc. Finally, the positional information for each region is stored, and the microscope automatically images those regions at 63x magnification. The working distance for the objective used in this study is 600 µm, which is sufficient to capture the entirety of expanded cells in the Z direction. This strategy minimizes off-target imaging and allows robust image acquisition even in cultures with lower seeding density. A detailed description of the automated imaging strategy is included in the methods section of the revised manuscript.
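
      The pre-scan selection step described above can be sketched in a few lines. This is an illustration only, not the Harmony or PreciScan API: the `Candidate` fields, the `select_nuclei` helper, and every threshold are hypothetical, and circularity is taken as the common 4πA/P² measure, which equals 1 for a perfect circle.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One object segmented from a low-magnification Hoechst pre-scan (hypothetical)."""
    x_um: float
    y_um: float
    area_um2: float
    perimeter_um: float
    mean_intensity: float


def circularity(area: float, perimeter: float) -> float:
    """4*pi*A / P^2: 1.0 for a perfect circle, smaller for irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def select_nuclei(candidates, min_area, max_area, min_intensity, min_circularity):
    """Return stored (x, y) positions of objects that pass all three criteria."""
    positions = []
    for c in candidates:
        if not min_area <= c.area_um2 <= max_area:
            continue  # too small (debris) or too large (clumps)
        if c.mean_intensity < min_intensity:
            continue  # too dim to be an intact nucleus
        if circularity(c.area_um2, c.perimeter_um) < min_circularity:
            continue  # too irregular
        positions.append((c.x_um, c.y_um))
    return positions


# Hypothetical pre-scan objects: a round bright nucleus, a dim one, a streak.
candidates = [
    Candidate(100.0, 200.0, math.pi * 10**2, 2 * math.pi * 10, 900.0),
    Candidate(300.0, 150.0, math.pi * 10**2, 2 * math.pi * 10, 50.0),
    Candidate(500.0, 400.0, 100.0, 100.0, 800.0),
]
targets = select_nuclei(candidates, min_area=50.0, max_area=2000.0,
                        min_intensity=200.0, min_circularity=0.8)
print(targets)  # positions that would be revisited at high magnification
```

      Storing only positions from the pre-scan keeps the high-magnification pass limited to fields that contain a usable nucleus, which is the point made about sparse cultures.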

      (6) The general method of imaging pre- and post-expansion is not entirely clear to me. For example, on page 5 the authors state that pre-expansion imaging was done at the center of each gel. Is pre-expansion imaging done after the initial gel polymerization? If so, this would assume that the gelation itself has no effect on cell size and shape if these gelled but not yet expanded cells are used as the reference for calculating expansion factor and isotropy. 

      Pre-expansion imaging is performed after staining is complete, but prior to the application of AcX, which is the first step of the HiExM protocol. Following staining and imaging, plates can be sealed with parafilm and stored at 4˚C for up to a week prior to starting the expansion protocol. We typically image 61 fields of view at the center of each well (where the gel will be deposited) to obtain sufficient pre-expansion images as shown in Figure 2b (left). After pre-expansion imaging, we perform the HiExM protocol followed by image acquisition. We then tile all the images, as shown in Figure 2b, and compare tiled images from the same well pre- and post-expansion to manually identify the same cells. Comparisons of the pre- and post-expansion images of the same cell are used to calculate expansion factor and isotropy measurements as described. A detailed description of this process is included in the revised manuscript.
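
      As a minimal sketch of how a linear expansion factor could be derived from matched cells, the example below divides a length measured post-expansion by the same length measured pre-expansion and averages the ratios. This is not the analysis code from the manuscript: the `expansion_factor` helper and the example lengths are hypothetical, and isotropy (distortion) measurement is not covered here.

```python
from statistics import mean


def expansion_factor(pre_lengths_um, post_lengths_um):
    """Mean ratio of post- to pre-expansion lengths for manually matched cells.

    Each index must refer to the same feature (for example, a nuclear long
    axis) measured in the same cell before and after expansion.
    """
    if not pre_lengths_um or len(pre_lengths_um) != len(post_lengths_um):
        raise ValueError("need equal, non-empty lists of matched measurements")
    ratios = [post / pre for pre, post in zip(pre_lengths_um, post_lengths_um)]
    return mean(ratios)


# Hypothetical matched measurements from three cells in one well.
pre = [10.0, 12.0, 8.0]
post = [42.0, 50.4, 33.6]
print(f"expansion factor: {expansion_factor(pre, post):.2f}x")
```

      Because a length ratio is unaffected by rotation, this kind of measurement still works when the gel has turned within the well between the two imaging rounds.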

      (7) In the dox experiments, are only 4 expanded nuclei analyzed? It is unclear in the Figure 3 legend what the replicates are because for the unexpanded cells, it says the number of nuclei but for expanded it only says n=4. If only 4 nuclei are analyzed, this does not play to the strengths of HiExM by having high throughput.

      We performed the doxorubicin titration assay across four different well plates (n=4). For each condition, the total number of expanded nuclei measured was 118, 111, 110, 113, and 77 for DMSO, 1nM, 10nM, 100nM, and 1µM, respectively. For SEM calculations, we included the number of independent experiments to avoid underestimating error. We revised the Fig. 3 legend to include these experimental details.
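
      The choice of n for the SEM can be sketched as follows. The example treats each plate as one independent experiment: nuclei are first averaged within a plate, and the SEM is then computed across plate means with n equal to the number of plates. The values are hypothetical and the helper name is ours, so this is only an illustration of the principle, not the authors' analysis code.

```python
from math import sqrt
from statistics import mean, stdev


def sem_across_experiments(per_plate_values):
    """Return (grand mean, SEM) using the number of plates as n.

    per_plate_values: one list of per-nucleus measurements per plate.
    """
    plate_means = [mean(values) for values in per_plate_values]
    n = len(plate_means)
    if n < 2:
        raise ValueError("need at least two independent experiments")
    return mean(plate_means), stdev(plate_means) / sqrt(n)


# Hypothetical per-nucleus measurements from four plates (n = 4).
plates = [[1.0, 1.2], [1.1, 1.3], [0.9, 1.1], [1.2, 1.4]]
grand_mean, sem = sem_across_experiments(plates)
print(f"mean = {grand_mean:.3f}, SEM = {sem:.4f} (n = {len(plates)} plates)")
```

      Dividing by the square root of the total number of nuclei instead would shrink the error bar, which is the underestimation the response refers to.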

      (8) I am not sure if the analysis of dox-treated cells is accurate for the overall phenotype because only a single slice at the midplane is analyzed. It would be helpful to show, at least in one or two example cases, that this trend of changing edge intensity occurs across the whole 3D nucleus.  

      For this analysis, the result is heavily dependent on the angle at which the edge of the nucleus intersects the image plane in the orthogonal view. For this reason, we opted to only use the optimal image plane for each nucleus. We repeated our analysis on an image using multiple optical sections to demonstrate this point. These new data are included as Fig. S11 of the revised manuscript.

      (9) It would be helpful to provide an actual benchmark of imaging speed or throughput to support the claims on page 8 that HiExM can be combined with autonomous imaging to capture thousands of cells a day. What is the highest throughput you have achieved so far?  

      The parameters that dictate imaging speed in HiExM include exposure time, z-stack height, and number of fluorophore channels. Depending on the signal intensity for a given channel, exposure times vary from 200 ms to 1000 ms. For z-stack height, we found that imaging 65 sections with 1 µm spacing allowed for robust capture of each region of interest identified in the 5x pre-scan. As an example, collecting images for a full well plate (e.g., 20 images per well with 4 channels) requires approximately 24 hours of autonomous image acquisition using the Opera Phenix. Depending on cell size, this process yields imaging data for 1200 cells (1 cell per field of view) to 6000 cells (5 cells per field of view). Different autonomous imagers, as well as improved staining techniques that increase signal-to-noise, can be expected to significantly decrease the exposure time and the number of z-sections needed for each region.
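
      The throughput figures above reduce to simple arithmetic. The sketch below assumes 1,200 fields of view per plate, which is the number implied by the quoted yield of 1200 cells at 1 cell per field, and the quoted ~24 hours of acquisition; the function names are hypothetical.

```python
FIELDS_PER_PLATE = 1200   # implied by 1200 cells at 1 cell per field of view
ACQUISITION_HOURS = 24.0  # approximate time quoted for a full plate


def cells_per_plate(cells_per_field: int) -> int:
    """Total cells imaged in one plate for a given cell density per field."""
    return FIELDS_PER_PLATE * cells_per_field


def cells_per_hour(cells_per_field: int) -> float:
    """Average imaging rate over the whole acquisition run."""
    return cells_per_plate(cells_per_field) / ACQUISITION_HOURS


for density in (1, 5):
    print(f"{density} cell(s)/field: {cells_per_plate(density)} cells per plate, "
          f"{cells_per_hour(density):.0f} cells per hour")
```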

      Reviewer #2 (Public Review): 

      Summary: 

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit the expansion of the gel. A device was engineered that can spot a small droplet of hydrogel solution and keep it in place as it polymerizes. It occupies only a small portion of space at the center of each well, the gel can expand into all directions, and imaging and staining can proceed by liquid handling robots and an automated microscope. 

      Strengths: 

      In contrast to Reference 8, the authors' system is compatible with standard 96 well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal. 

      Weaknesses: 

      The assay they chose to demonstrate what high-throughput ExM could be useful for, is not very convincing. But for this reviewer that is not important. 

      We believe the data provide an example of the utility of HiExM that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.) by enabling easier sample processing and autonomous acquisition of thousands of nanoscale images in parallel. The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this work is to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM.

      Reviewer #3 (Public Review):

      Summary: 

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand the toroidal gel within each well.  

      Strengths: 

      This configuration eliminates the need for transferring gels to other dishes or wells, thereby enhancing the throughput and reproducibility of parallel expansion microscopy. This methodological uniqueness indicates the applicability of HiExM in detecting subtle cellular changes on a large scale. 

      Weaknesses: 

      To demonstrate the potential utility of HiExM in cell phenotyping, drug studies, and toxicology investigations, the authors treated hiPS-derived cardiomyocytes with a low dose of doxorubicin (dox) and quantitatively assessed changes in nuclear morphology. However, this reviewer is not fully convinced of the validity of this specific application. Furthermore, some data about the effect of expansion require reconsideration. 

      The application we chose was intended as a methods proof-of-concept that could enable future deep biological investigations using HiExM. We believe the data provide an example of the utility of HiExM for collecting thousands of nanoscale images that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.). The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this experiment was to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM. 

      The variability in expansion factor across plates can likely be attributed to the small volume (~250 nL) deposited by the device posts. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, HiExM gels are more sensitive to evaporation due to an increased air-liquid-interface because they are ~1000x smaller than standard expansion gel preparations. Evaporation in HiExM gels likely increases monomer and cross linker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that the expansion factor can be more variable between plates, likely due to differences in gel volumes and evaporation. Future iterations of the platform are expected to control for these environmental conditions. These differences are discussed in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please include a scale bar in Figure 3a.

      A scale bar has been added to Figure 3a.

      (2) Please show the data related to nuclear volume after dox treatment.

      We have added a supplementary figure (Fig. S10) showing nuclear volume and sphericity for post-expansion nuclei as well as nuclear area and circularity for pre-expansion nuclei.

      (3) I think it would be extremely helpful for the method as a whole if analysis code and files for device fabrication were made publicly available rather than upon request.

      The analysis code has been included in the supplementary files as CM_Hoechst_Analysis_for publication.ipynb. Device design files are also available at the supplementary files link as hiExM_device.SLDPRT (96-well plate device) and MultiExM_24_July28_2022.SLDPRT (24-well plate device).

      (4) Some details are missing from the methods, such as the concentration of AcX used for HiExM, the concentration of antibodies, etc. Related, how long does the photopolymerization take? Just the 60 seconds that the UVA light is on?

      Additional protocol details are included in the methods section of the revised manuscript. Photopolymerization takes only the 60 seconds during which the UVA light is on.

      Reviewer #2 (Recommendations For The Authors):

      (1) The first three references are chosen a little strangely here. I suggest citing STED, SIM, and PALM/STORM from the original manuscripts here. Also, EM is technically not a super-resolution technique as it is within the resolution of electron beams. This reviewer would stay with light microscopy methods when discussing "super-resolution".

      We removed the reference to EM and added citations to the original publications for SIM, STED, and STORM.

      (2) The sentence after citation 4 is a little off in its meaning.

      We have edited the sentence to improve clarity.

      (3) It is highly useful and great that the authors include the observations on the effect of photopolymerization with Irgacure 2959 on dyes.

      (4) In the discussion, the authors could mention new high NA silicone oil objectives that may further optimise the resolution in their scheme.

      We added a sentence in the discussion to reflect this important point.

      (5) The files for the manufacture of the HiExM devices must be in the supplementary data rather than available on request.

      The Solidworks designs for the 96 and 24 well plate devices are included in the supplementary files as hiExM_device.SLDPRT and MultiExM_24_July28_2022.SLDPRT, respectively.

      (6) It would be useful if the authors could discuss their thoughts on the high throughput processing of expansion factors in the data analysis routine.

      We added details to the methods section describing how images are processed and analyzed.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) In the experiments depicted in Figure 3, the authors attempted cellular phenotyping using hiPSC-derived cardiomyocytes treated with doxorubicin (dox). They reported that the relative intensity of Hoechst at the nuclear periphery increased solely in post-expansion images, although this trend is not clearly evidenced in the provided data (e.g., DMSO control vs. 1 nM dox, Figure 3b). Moreover, this observed phenomenon lacks clear biological significance and may not be suitable as a demonstration for proof-of-concept (POC) acquisition. It is crucial to delineate the biological processes linked with the specific enhancement of DNA binding dye signals in the nuclear periphery and how to rule out the possibility of heterogeneous redistribution of nuclear components rather than enhancing resolution. For instance, if this change can be associated with a biological process such as DNA damage, quantitative detection of the accumulated proteins related to DNA repair, or the specific histone marks, may be more suitable and less susceptible to heterogeneous expansion factors. Additionally, the authors noted the absence of significant changes in nuclear volume, yet the corresponding data was not presented. Moreover, the application insufficiently demonstrated the HiExM's scalable feature employing various well plates. If only acquiring images of dozens of nuclei (Figure 3 legend, p15), a single well per condition would suffice. Therefore, it is necessary to elucidate why this application necessitates a 96-well format for demonstration purposes. The potential experimental design should also incorporate the requirement for well-to-well replication and the acquisition of features at the individual well level, rather than at the single-cell level. Also, related to Figure S10, whether outer gradient slope, but not inner gradient slope, is linked to apoptosis (Page 8, Line 2-4) remains unclear in the H2O2-treated cells.

      We believe the data provide an example of the utility of HiExM that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.) by enabling easier sample processing and autonomous acquisition of thousands of nanoscale images in parallel. The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this work is to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of the HiExM method. As discussed in the manuscript, dox treatment is associated with DNA damage, cellular stress, and apoptosis, and commonly observed at high dox concentrations (>200 nM) in in vitro studies using conventional microscopy. Our data suggest that cardiomyocytes exhibit sensitivity to lower concentrations of dox than previously anticipated. Although direct evidence specifically linking dox to increased DNA condensation at the nuclear periphery is limited, the known proapoptotic effects of dox strongly suggest that our observations correlate with these changes. We have now included the data analysis on nuclear morphology in revised Fig. S10. We agree that deeper biological interpretation of the observed changes in Hoechst signal upon dox treatment (or other cellular stressors such as H2O2) using HiExM and whether these changes are correlated with DNA damage or other cellular alterations remains an exciting future direction to develop a more sensitive platform for assessing drug responses.

      For expanded samples, we performed the doxorubicin titration assay across four different well plates (n=4). For each condition, the total number of nuclei measured was 118, 111, 110, 113, and 77 for DMSO, 1nM, 10nM, 100nM, and 1µM, respectively. We apologize for the confusion with respect to the number of replicates and cells analyzed. For SEM calculations, we used the number of independent experiments to avoid underestimating error. 

      (2) In Figure 2b, do the orange arrows indicate the same cell with a unique shape in both the pre- and post-expansion images? Additionally, in Figure 3b, why do the pre- and post-expansion nuclei exhibit such different global shapes? Considering that the gel may freely rotate within the well during expansion, it raises doubts about whether one can identify cells with consistent shapes in both the pre- and post-expansion images. Furthermore, this reviewer observed a similar issue regarding reproducibility among different well plates, as shown in Figure 2h. The panel illustrates that different plates yielded distinct populations of gel sizes. The expansion factors provided in the figure legend (page 13) ranged from 3.5x to 5.1x across gels, indicating a relatively large variation in expansion size. What is the reason behind these variations, and how can they be minimized? These variations could become critical when considering large-scale screening across multiple plates.

      The orange arrow is intended to indicate the same cell with a unique shape in both the pre- and post-expansion images, albeit at a different orientation given that the gel is not fixed within the well. We agree that improved methods to identify the same cells pre- and post-expansion could facilitate error measurements. In the discussion of the revised manuscript, we now reference recent methods that could be combined with HiExM to automate and improve error and distortion detection.

      Fig. 2 illustrates the ability of HiExM to achieve reproducible gel formation with minimal error within gels, wells, and across plates, with measurements consistent with proExM. While uniform within gels, the expansion factor is somewhat variable between gels and plates. We attribute these differences primarily to the small size of the gels, which makes them vulnerable to the effects of evaporation between experiments. We note this variability should be taken into consideration for studies where absolute length measurements between plates are important for biological interpretation. Future iterations of the platform that allow precise delivery of gel volumes and that minimize environmental exposure are expected to improve the expansion factor reproducibility across plates to further enable the use of HiExM as a tool for high-throughput nanoscale imaging.

      Minor:

      (1) Considering the signal loss due to photobleaching and fluorophore dilution during expansion, protein imaging may occasionally lack the sensitivity required to detect subtle morphological changes in cellular machinery. This potential limitation should be addressed or discussed in the text.

      A sentence reflecting this point has been added to the manuscript.

      (2) On page 15, the figure legend for panel d states, "Heatmaps of nuclei in b showing..." However, it appears that the panel referred to in this sentence corresponds to panel c.

      The typo has been fixed.

      (3) The type of glass 96-well plate utilized in this study should be specified, as the quality of the product could impact the expansion results.

      The supplier and product number of the well plate used in our study have been added to the methods section.

      (4) In Figure S3, the raw pixel values of CF305 dye are exceptionally low. Is there a specific reason for the very low signals observed when using this dye?

      CF® 350 (305 was a typo) does not excite well at 405 nm, which is the excitation wavelength for the channel we used.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In their manuscript, Gan and colleagues identified a functional critical residue, Tyr404, which when mutated to W or A results in GOF and LOF of TRPML1 activity, respectively. In addition, the authors provide a high-resolution structure of TRPML1 with PI(4,5)P2 inhibitor. This high-resolution structure also revealed a bound phospholipid likely sphingomyelin at the agonist/antagonist site, providing a plausible explanation for sphingomyelin inhibition of TRPML1.

      This is an interesting study, revealing valuable additional information on TRPML1 gating mechanisms including effects on endogenous phospholipids on channel activity. The provided data are convincing. Some major open questions remain. The work will be of interest to a wide audience including industry researchers occupied with TRPML1 exploration as a drug target.

      We appreciate reviewer #1’s positive comments. The specific points raised by this reviewer are addressed in our response to Recommendations For The Authors.

      Reviewer #2 (Public Review):

      The transient receptor potential mucolipin 1 (TRPML1) functions as a lysosomal organelle ion channel whose variants are associated with lysosomal storage disorder mucolipidosis type IV. Understanding sites that allosterically control the TRPML1 channel function may provide new molecular moieties to target with prototypic drugs.

      Gan et al provide the first high-resolution cryo-EM structures of the TRPML1 channel (Y404W) in the open state without any activating ligands. This new structure demonstrates how a mutation at a site some distance away from the pore can influence the channel's conducting state. However, the authors do not provide a structural analysis of the Y404W pore which would validate their open-state claims. Nonetheless, Gan et al provide compelling electrophysiology evidence which supports the proposed Y404W gain of function effect. The authors propose an allosteric mechanism with the following molecular details: the Y404 to W sidechain substitution provides extra van der Waals contacts within the pocket surrounded by helices of the VSD-like domain and causes S4 bending, which in turn opens the pore through the S4-S5 linker. Conversely, the authors functionally demonstrate that an alanine mutation at this site causes a loss of function. Although the authors do not provide a structure of the Y404A mutation, they propose that the alanine substitution disrupts the sidechain packing and likely destabilizes the open conformation. TRPML1 channels are regulated by PIP2 species, which is related to their cell function. In the membrane of lysosomes, PI(3,5)P2 activates the channel, whereas PI(4,5)P2 found in the plasma membrane has inhibitory effects. To understand its lipid regulation, the authors solved a cryo-EM structure of TRPML1 bound to PI(4,5)P2 in its presumed closed state. Again, while the provided functional evidence suggests that PI(4,5)P2 occupancy inhibits TRPML1 current, the authors do not provide analysis of the pore which would support their closed state assertion. Within this same structure, the authors observe a density that may be attributed to sphingomyelin (or possibly phosphocholine). Using electrophysiology of WT and the Y404W channels, the authors report sphingomyelin's antagonistic effect on TRPML1 currents under low luminal (external) pH. Taken together, the results described in Gan et al provide compelling evidence for a gating (open, closed) mechanism of the TRPML1 pore which can be allosterically regulated by altered packing and lipid interactions within the VSDL.

      We appreciate reviewer #2’s positive comments and constructive suggestions. We functionally demonstrated that the Y404A mutant is more stable in the closed state. We did not pursue the structure of this mutant as we expect its structure will be the same as the apo closed TRPML1. To verify the open conformation of the Y404W mutant and the closed conformation of PI(4,5)P2-bound TRPML1, we analyzed the pore radii of our structures in the revision as suggested by the reviewer and compared them with open and closed pores from previously determined TRPML1 structures. Some specific points raised by this reviewer are addressed in our response to Recommendations For The Authors.

      Reviewer #1 (Recommendations For The Authors):

      (1) Mutations in TRPML1 cause Mucolipidosis type IV. One patient mutation reported earlier (Chen et al., 2014 https://pubmed.ncbi.nlm.nih.gov/25119295/) to be a LOF mutation is R403C. This mutation resides just next to the here-identified Y404 position which can be converted in either LOF or GOF. Another patient mutation, F408del (also reported previously: (Chen et al., 2014 https://pubmed.ncbi.nlm.nih.gov/25119295/)) results in a mild activity reduction, in particular of the PI(3,5)P2 effect. Can the authors please discuss their findings in the context of the reported literature on these patient mutations and provide explanations as to why this part of the TRPML1 protein seemingly is such a hotspot for mutations affecting channel activity and how they explain this based on their structural evidence? What characteristics would be required for a small molecule agonist of TRPML1 in order to elicit larger activation in these patient LOF mutations if possible?

      We thank the reviewer for highlighting these mutations identified in human patients. R403 appears to play two key roles. Firstly, its side chain participates in stabilizing Y404 in the open state. Secondly, as demonstrated in our previous study on TRPML1 (PMID: 35131932), the R403 side chain points towards the PI(3,5)P2 binding pocket, where it forms a critical salt bridge with the C3 phosphate group in the open state. Therefore, the R403C mutation likely abolishes PI(3,5)P2 activation and also destabilizes the open state, resulting in the loss of function of the channel. We have expanded our discussion on this mutation in the revision. F408 is positioned at the junction between S4 and the S4-S5 linker. Its deletion could change the stability or the folding of the protein. It is difficult to speculate on the exact cause of the F408Δ LOF based on the TRPML1 structure. We don’t feel the effect of this mutation is relevant to the findings of this study.

      (2) The authors used ML-SA1 only as a basis for their claims. Could they possibly provide some key data also on alternative small molecule agonists such as SF-51 and/or MK6-83?

      We thank the reviewer for this suggestion. TRPML1 agonists such as ML-SA1 (derived from SF-51) and MK6-83 have been well characterized in previous studies. In our study of the sphingomyelin effect on TRPML1 activity, we used SF-51 to activate the channel (Figure 4b). The goal of our study is to demonstrate that agonist and antagonist can still allosterically regulate the LOF Y404A and GOF Y404W mutant channels, respectively, and their competition with sphingomyelin. We chose ML-SA1 in our experiment simply because it has been a commonly used TRPML1 agonist and its binding has been structurally defined, allowing us to compare various TRPML1 structures with different ligands. We don’t feel the use of other agonists would add extra information to our findings.

      (3) Sphingomyelin effects on TRPML1 have been confirmed by other groups as well (see e.g. Prat Castro et al., 2022 https://pubmed.ncbi.nlm.nih.gov/36139381/ Fig.3). Interestingly, TPC2 seems unaffected by sphingomyelin although it is also activated by PI(3,5)P2. Can the authors possibly provide some modeling and/or cryoEM data on TPC2 with sphingomyelin to potentially explain why TPC2 is seemingly unaffected by sphingomyelin?

      We appreciate the reviewer for providing additional evidence of sphingomyelin's effects on TRPML1 and have included the reference in revision. The binding site and activation mechanism of PI(3,5)P2 are different between TPC2 and TRPML1. It is beyond the scope of this study and also too speculative to model sphingomyelin binding (if any) in TPC2 to explain its lack of effect on TPC2 activity.

      Reviewer #2 (Recommendations For The Authors):

      The findings from Gan et al provide structural insights into the allosteric regulation of TRPML1 channel gating. The authors have provided compelling and hard-won cryo-EM structural evidence of the channel regulation by PI(4,5)P2 and at sites that pack the gating pore of the VSDL (S4). However, as noted in the public review, the analysis of the cryo-EM structures that would support claims of open and closed channel states is woefully lacking. Additional information related to the functional results is required to evaluate the activation and inhibition kinetic effect of lipids and pharmacological agents used to support their allosteric mechanism of TRPML1 gating.

      Major concerns:

      (a) At the very least, the pore domains of the new channels (PDB 9CBZ and 9CC2) should be analyzed using the HOLE (or other) programs to estimate the distances along the ion-conducting pathway - are the structures wide enough to support the passage of hydrated or partially hydrated cations? Additional figure panels should provide this comparative analysis.

      We thank the reviewer for this valuable suggestion. We have added a figure (Figure Supplement 3b) of pore domain radius analysis using the HOLE program in the revision and have also included the radius comparison with previously determined open and closed TRPML1 structures.

      (b) At the very least, all current traces (Figures 1C, 1D, 1E, 4B, 4C) should be accompanied by time course plots of current amplitudes. It is impossible to evaluate the authors' claims of lipid and drug effects on TRPML1 channels without this information.

      The corresponding time course plots of current amplitudes have now been included in the revision as Figure Supplement 1 and 7.

      (c) Regarding the gain of function Y404W mutation structure, the authors' allosteric mechanistic hypothesis centers on side chain packing details within the VSD-like domain S4 (which in turn opens the pore through the S4-S5 linker). However, the local resolution within the structures at this site is not described. To assess the veracity of these claims, at the very least, authors should provide electron density maps of this region, either in Figure 2 or in Figure Supplement 4.

      We have included the electron density map and local resolution information surrounding the W404 residue from the Y404W GOF mutant structure in the revision as Figure Supplement 3c.

      Minor concerns:

      (d) Additional evidence related to the identity of the pore domain-associated lipid density (PC or sphingomyelin), and its channel regulation would improve the manuscript. The authors examine sphingomyelin, but what is the functional impact of PC on TRPML1 currents? While this is suggested, it is at the authors' discretion whether or not to carry out this analysis.

      We thank the reviewer for raising this question. Adding extra PC has no effect on TRPML1 activity. This is expected since PC is the major lipid component of the membrane.

      (e) The manuscript is well written. However, a few errors were noted while reviewing this draft.

      i. Line 138, (Figure F&G).

      ii. Line 66, "signaling transduction".

      These errors have now been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Meisner and colleagues describes a novel take on a classic social cognition paradigm developed for marmosets. The classic pull task is a powerful paradigm that has been used for many years across numerous species, but its analog approach has several key limitations. As such, it has not been feasible to adopt the task for neuroscience experiments. Here the authors capture the spirit of the classic task but provide several fundamental innovations that modernize the paradigm, both technically and conceptually. By developing the paradigm for marmosets, the authors leverage the many advantages of this primate model for studies of social brain functions and their particular amenability to freely-moving naturalistic approaches.

      Strengths:

      The current manuscript describes one of the most exciting paradigms in primate social cognition to be developed in many years. By allowing for freely-moving marmosets to engage in high numbers of trials, while precisely quantifying their visual behavior (e.g. gaze) and recording neural activity this paradigm has the potential to usher in a new wave of research on the cognitive and neural mechanisms underlying primate social cognition and decision-making. This paradigm is an elegant illustration of how naturalistic questions can be adapted to more rigorous experimental paradigms. Overall, I thought the manuscript was well written and provided sufficient details for others to adopt this paradigm. I did have a handful of questions and requests about topics and information that could help to further accelerate its adoption across the field.

      Weaknesses:

      LN 107 - Otters have also been successful at the classic pull task (https://link.springer.com/article/10.1007/s10071-017-1126-2)

      We have added this reference to the manuscript.

      LN 151 - Can you provide a more precise quantification of timing accuracy than the 'sub-second level'. This helps determine synchronization with other devices.

      We have included more precise timing details, noting that data is stored at the millisecond level.

      Using this paradigm, the marmosets achieved more trials than in the conventional task (146 vs 10). While this is impressive, given that only ~50 are successful Mutual Cooperation trials it does present some challenges for potential neurophysiology experiments and particular cognitive questions. The marmosets are only performing the task for 20 minutes, presumably because they become sated and are no longer motivated. This seems a limitation of the task and is something worth discussing in the manuscript. Did the authors try other food rewards, reduce the amount of reward, food/water restrict the animals for more than the stated 1-3 hours? How might this paradigm be incorporated into in-cage approaches that have been successful in marmosets? Any details on this would help guide others seeking to extend the number of trials performed each day.

      We have added a discussion addressing the use of liquid rewards, minimal food and water restriction, and the potential for further optimization to increase task engagement and trial numbers. This is now reflected in the revised manuscript.

      Can you provide more details on the DLC/Anipose procedure? How were the cameras synchronized? What percentage of trials needed to be annotated before the model could be generalized? Did each monkey require its own model, or was a single one applied to all animals?

      We have added more detailed information on the DLC and Anipose tracking which can be found in the Multi-animal 3D tracking section under Materials & Methods.

      Will the schematics and more instructions on building this system be made publicly available? A number of the components listed in Table 1 are custom-designed. Although it is stated that CAD files will be made available upon request, sharing a link to these files in an accessible folder would significantly add to the potential impact of this paradigm by making it easier for others to adopt.

      We have made the SolidWorks CAD files publicly available. They can now be found in the Github repository alongside the apparatus and task code.

      In the Discussion, it would be helpful to have some discussion of how this paradigm might be used more broadly. The classic pulling paradigm typically allows one to ask a specific question about social cognition, but this task has the potential to be more widely applied to other social decision-making questions. For example, how might this task be adopted to ask some of the game-theory-type approaches common in this literature? Given the authors' expertise in this area, this discussion could serve to provide a roadmap for the broader field to adopt.

      Although this paradigm was developed specifically for marmosets, it seems to me that it could readily be adopted in other species with some modifications. Could the authors speak to this and their thoughts on what may need to be changed to be used in other species? This is particularly important because one of the advantages of the classic paradigm is that it has been used in so many species, providing the opportunity to compare how different species approach the same challenge. For example, though both chimps and bonobos are successful, their differences are notably illuminating about the nuances of their respective social cognitive faculties.

      We have expanded the discussion for the broader applications of this apparatus both for other decision-making research questions as well as its adaptability for use in other species.

      Reviewer #2 (Public Review):

      Summary:

      This important work by Meisner et al. developed an automated apparatus (MarmoAAP) to collect a wide array of behavioral data (lever pulling, gaze direction, vocalizations) in marmoset monkeys, with the goal of modernizing collection of behavioral data to coincide with the investigation of neurological mechanisms governing behavioral decision making in an important primate neuroscience model. The authors show a variety of "proof-of-principle" concepts that this apparatus can collect a wide range of behavioral data, with higher behavioral resolution than traditional methods. For example, the authors highlight that typical behavioral experiments on primate cooperation provide around 10 trials per session, while using their approach the authors were able to collect over 100 trials per 20-minute session with the MarmoAAP.

      Overall the authors argue that this approach has a few notable advantages:

      (1) it enhances behavioral output which is important for measuring small or nuanced effects/changes in behavior;

      (2) allows for more advanced analyses given the higher number of trials per session;

      (3) significantly reduces the human labor of manually coding behavioral outcomes and experimenter interventions such as reloading apparatuses for food or position;

      (4) allows for more flexibility and experimental rigor in measuring behavior and neural activity simultaneously.

      Strengths:

      The paper is well-written and the MarmoAAP appears to be highly successful at integrating behavioral data across many important contexts (cooperation, gaze, vocalizations), with the ability to measure many more behavioral contexts (many of which the authors make suggestions for).

      The authors provide substantive information about the design of the apparatus, explain how the apparatus can be obtained via a long list of apparatus parts and information, and provide data from a wide number of behavioral and neurological outcomes. The significance of the findings is important for the field of social neuroscience and the strength of evidence is solid in terms of the ability of the apparatus to perform as described, at least in marmoset monkeys. The ability to collect neural and freely-moving behavioral data concurrently is a significant advantage.

      Weaknesses:

      While this paper has many significant strengths, there are a few notable weaknesses in that many of the advantages are not explicitly demonstrated within the evidence presented in the paper. There are data reported (as shown in Figures 2 and 3), but in many cases, it is unclear if the data is referenced in other published work, as the data analysis is not described and/or self-contained within the manuscript, which it should be for readers to understand the nature of the data shown in Figures 2 and 3.

      (1) There is no data in the paper or reference demonstrating training performance in the marmosets. For example, how many sessions are required to reach a pre-determined criterion of acceptable demonstration of task competence? The authors reference reliably performing the self-reward task, but this was not objectively stated in terms of what level of reliability was used. Moreover, in the Mutual Cooperation paradigm, while there is data reported on performance between self-reward vs mutual cooperation tasks, it is unclear how the authors measured individual understanding of mutual cooperation in this paradigm (cooperation performance in the mutual cooperation paradigm in the presence or absence of a partner; and how, if at all, this performance varied across social context). What positive or negative control is used to discern gained advantages between deliberate cooperation vs two individuals succeeding at self-reward simultaneously?

      Thank you for your comment. This Tools & Resources paper is focused solely on the development of the apparatus and methods. Future publications will provide more details on training performance, learning behaviors, and include appropriate controls to distinguish deliberate cooperation from simultaneous success in self-reward tasks.

      (2) One of the notable strengths of this approach argued by the authors is the improved ability to utilize trials for data analysis, but this is not presented or supported in the manuscript. For example, the paper would be improved by explicitly showing a significant improvement in the analytical outcome associated with a comparison of cooperation performance in the context of ~150 trials using MarmoAAP vs 10-12 trials using conventional behavioral approaches beyond the general principle of sample size. The authors highlight the dissection of intricacies of behavioral dynamics, but more could be demonstrated to specifically show these intricacies compared to conventional approaches. Given the cost and expertise required to build and operate the MarmoAAP, it is critical to provide an important advantage gained on this front. The addition of data analysis and explicit description(s) of other analytical advantages would likely strengthen this paper and the advantages of MarmoAAP over other behavioral techniques.

      Thank you for the suggestion. While this manuscript focuses on the apparatus and methods, the increase in trial numbers itself provides clear advantages, including greater statistical power and more robust analyses of behavioral dynamics. Future publications will offer more in-depth analyses comparing the performance and cooperation behavior observed with MarmoAAP, further demonstrating these analytical benefits.

      Reviewer #3 (Public Review):

      Summary:

      The authors set out to devise a system for the neural and behavioral study of socially cooperative behaviors in nonhuman primates (common marmosets). They describe instrumentation to allow for a "cooperative pulling" paradigm, the training process, and how both behavioral and neural data can be collected and analyzed. This is a valuable approach to an important topic, as the marmoset stands as a great platform to study primate social cognition. Given that the goals of such a methods paper are to (a) describe the approach and instrumentation, (b) show the feasibility of use, and (c) quantitatively compare to related approaches, the work is easily able to meet those criteria. My specific feedback on both strengths and weaknesses is therefore relatively limited in scope and depth.

      Strengths:

      The device is well-described, and the authors should be commended for their efforts in both designing this system but also in "writing it up" so that others can benefit from their R&D.

      The device appears to generate more repetitions of key behavior than other approaches used in prior work (with other species).

      The device allows for quantitative control and adjustment to control behavior.

      The approach also supports the integration of markerless behavioral analysis as well as neurophysiological data.

      Weaknesses:

      A few ambiguities in the descriptions are flagged below in the "Recommendations for authors".

      The system is well-suited to marmosets, but it is less clear whether it could be generalized for use in other species (in which similar behaviors have been studied with far less elegant approaches). If the system could impact work in other species, the scope of impact would be significantly increased, and would also allow for more direct cross-species comparisons. Regardless, the future work that this system will allow in the marmoset will itself be novel, unique, and likely to support major insights into primate social cognition.

      Thank you for this feedback. We have expanded the discussion to include how the apparatus could be adapted for use in other species, highlighting the potential modifications required, such as adjusting the size and strength of the servo motor and components. These changes would enable broader applications and facilitate cross-species comparisons.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Understanding large-scale neural activity remains a formidable challenge in neuroscience. While several methods have been proposed to discover the assemblies from such large-scale recordings, most previous studies do not explicitly model the temporal dynamics. This study is an attempt to uncover the temporal dynamics of assemblies using a tool that has been established in other domains.

      The authors previously introduced the compositional Restricted Boltzmann Machine (cRBM) to identify neuron assemblies in zebrafish brain activity. Building upon this, they now employ the Recurrent Temporal Restricted Boltzmann Machine (RTRBM) to elucidate the temporal dynamics within these assemblies. By introducing recurrent connections between hidden units, RTRBM could retrieve neural assemblies and their temporal dynamics from simulated and zebrafish brain data.

      Strengths:

      The RTRBM has been previously used in other domains. Training of the model has already been established. This study is an application of such a model to neuroscience. Overall, the paper is well-structured, the methodology is robust, and the analysis is solid enough to support the authors' claim.

      Weaknesses:

      The overall degree of advance is very limited. The performance improvement by RTRBM compared to their cRBM is marginal, and insights into assembly dynamics are limited.

      (1) The biological insights from this method are constrained. Though the aim is to unravel neural ensemble dynamics, the paper lacks in-depth discussion on how this method enhances our understanding of zebrafish neural dynamics. For example, the dynamics of assemblies can be analyzed using various tools such as dimensionality reduction methods once we have identified them using cRBM. What information can we gain by knowing the effective recurrent connection between them? It would be more convincing to show this in real data.

      See below in the recommendations section.

      (2) Despite the increased complexity of RTRBM over cRBM, performance improvement is minimal. Accuracy enhancements, less than 1% in synthetic and zebrafish data, are underwhelming (Figure 2G and Figure 4B). Predictive performance evaluation on real neural activity would enhance model assessment. Including predicted and measured neural activity traces could aid readers in evaluating model efficacy.

      See below in the recommendations section.

      Recommendations:

      (1) The biological insights from this method are constrained. Though the aim is to unravel neural ensemble dynamics, the paper lacks in-depth discussion on how this method enhances our understanding of zebrafish neural dynamics. For example, the dynamics of assemblies can be analyzed using various tools such as dimensionality reduction methods once we have identified them using cRBM. What information can we gain by knowing the effective recurrent connection between them? It would be more convincing to show this in real data.

      We agree with the reviewer that our analysis does not explore the data far enough to reach the level of new biological insights. For practical reasons unrelated to the science, we cannot further explore the data in this direction at this point; however, funding permitting, we will pick up this question at a later stage. The only change we have made to the corresponding figure at the current stage was to adapt the thresholds, which better emphasizes the locality of the resulting clusters.

      (2) Despite the increased complexity of RTRBM over cRBM, performance improvement is minimal. Accuracy enhancements, less than 1% in synthetic and zebrafish data, are underwhelming (Figure 2G and Figure 4B). Predictive performance evaluation on real neural activity would enhance model assessment. Including predicted and measured neural activity traces could aid readers in evaluating model efficacy.

      We thank the reviewer kindly for the comments on the performance comparison between the two models. We would like to highlight that the small range of accuracy values for the predictive performance is due to both the sparsity and stochasticity of the simulated data, and is not reflective of the actual percentage in performance improvement. To this end, we have opted to use a rescaled metric that we call the normalised Mean Squared Error (nMSE), where the MSE is equal to 1 minus the accuracy, as the visible units take on binary values. This metric is also more in line with the normalised Log-Likelihood (nLLH) metric used in the cRBM paper in terms of interpretability. The figure shows that the RTRBM can significantly predict the state of the visible units in subsequent time-steps, whereas the cRBM captures the correct time-independent statistics but has no predictive power over time.
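
      The statement that the MSE equals 1 minus the accuracy for binary visible units can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up sparse binary arrays; the function names and data are hypothetical, and the normalisation step that turns the MSE into the nMSE is not specified in this response, so it is deliberately left out.

```python
import numpy as np

def accuracy(v_true, v_pred):
    """Fraction of binary entries that are predicted correctly."""
    return float(np.mean(v_true == v_pred))

def mse(v_true, v_pred):
    """Mean squared error between two arrays of equal shape."""
    diff = v_true.astype(float) - v_pred.astype(float)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
# Hypothetical sparse binary data: 50 units x 200 time steps, about 10% active.
v_true = (rng.random((50, 200)) < 0.1).astype(int)
v_pred = (rng.random((50, 200)) < 0.1).astype(int)

acc = accuracy(v_true, v_pred)
err = mse(v_true, v_pred)

# For binary values each squared difference is 1 on a mismatch and 0 on a
# match, so the mean squared error is exactly the mismatch rate.
assert np.isclose(err, 1.0 - acc)
```

      Because the identity holds entry by entry, it also shows why sparse data compresses the accuracy range: a predictor that always outputs 0 already scores an accuracy close to 1 minus the mean activity.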

      We also thank the reviewer for pointing out that there is no predictive performance evaluation on the neural data. We chose to omit this for two reasons. First, it is clear from Fig. 2 that the (c)RBM has no temporal dependencies, meaning that the predictive performance is determined mostly by the average activity of the visible units. If this corresponds well with the actual mean activity per neuron, the nMSE will be around 0. This correspondence is already evaluated in the first panel of 3F. Second, as this is real data, we cannot estimate a lower bound on the MSE that is due to neural noise. Because of this, the scale of the predictive performance score will be arbitrary, making it difficult to quantitatively assess the difference in performance between both models.

      (3) The interpretation of the hidden real variable $r_t$ lacks clarity. Initially interpreted as the expectation of $\mathbf{h}_t$, its interpretation in Eq (8) appears different. Clarification on this link is warranted.

      We thank the reviewer kindly for the suggested clarification. However, we think the link between both values should already be sufficiently clear from the text in lines 469-470:

      “Importantly, instead of using binary hidden unit states 𝐡[𝑡−1], sampled from the expected real valued hidden states 𝐫[𝑡−1], the RTRBM propagates these real-valued hidden unit states directly.”

      In other words, both are indeed the same: one could sample a binary-valued 𝐡[𝑡-1] from the real-valued 𝐫[𝑡-1] through, e.g., a Bernoulli distribution, in which case 𝐫[𝑡-1] would indeed act as the expectation of 𝐡[𝑡−1]. However, the RTRBM formulation keeps the real-valued 𝐫[𝑡-1] to propagate the hidden-unit states to the next time-step. The motivation for this choice is further discussed in the original RTRBM paper (Sutskever et al. 2008).
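
      The relation between the real-valued and binary hidden states can be illustrated with a short simulation. This is a minimal sketch, assuming NumPy, a sigmoid activation, and made-up pre-activation values; none of the names below come from the authors' code. It only shows that Bernoulli samples drawn with probabilities 𝐫 average back to 𝐫, which is the sense in which 𝐫 acts as an expectation.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, mapping pre-activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(r, rng):
    """Draw binary hidden states h with P(h_i = 1) = r_i."""
    return (rng.random(r.shape) < r).astype(int)

rng = np.random.default_rng(0)
# Hypothetical pre-activations for 5 hidden units at time t-1.
r = sigmoid(np.array([-2.0, -0.5, 0.0, 0.5, 2.0]))

# Averaging many binary samples recovers the real-valued r.
samples = np.stack([sample_hidden(r, rng) for _ in range(20000)])
empirical_mean = samples.mean(axis=0)

print(np.round(r, 3))
print(np.round(empirical_mean, 3))
```

      In the formulation described above, the recurrent step would receive `r` itself rather than one draw of `sample_hidden(r, rng)`, which avoids injecting sampling noise into the propagated state.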

      (4) In Figure 3 panel F, the discrepancy in x-axis scales between upper and lower panels requires clarification. Explanation regarding the difference and interpretation guidelines would enhance understanding.

      Thank you for pointing out the discrepancy in x-axis scales between the upper and lower panels of Figure 3F. The reason why these scales are different is that the activation functions in the two models differ in their range, and showing them on the same scale would not do justice to this difference. But we agree that this could be unclear for readers. Therefore we added an additional clarification for this discrepancy in line 215:

      “While a direct comparison of the hidden unit activations between the cRBM and the RTRBM is hindered by the inherent discrepancy in their activation functions (unbounded and bounded, respectively), the analysis of time-shifted moments reveals a stronger correlation for the RTRBM hidden units ($r_s = 0.92$, $p<\epsilon$) compared to the cRBM ($r_s = 0.88$, $p<\epsilon$)”

      (5) Assessing model performance at various down-sampling rates in zebrafish data analysis would provide insights into model robustness.

      We agree that we would have liked to assess this point in real data, to verify that this holds as well in the case of the zebrafish whole-brain data. The main reason why we did not choose to do this is that we would only be able to further downsample the data. Current whole-brain data sets are collected at a few Hz (here 4 Hz, only 2 Hz in other datasets), which we consider likely to be slower than the actual interaction speed in neural systems, which is on the order of milliseconds between neurons, and on the order of ~100 ms (~10 Hz) between assemblies. Therefore, when reducing the rate further, we expect to see only a reduction in quality, which we considered less interesting than finding an optimum. Higher rates in light-sheet imaging are currently achievable only by imaging single planes (which defeats the goal of whole-brain recordings), but may be possible in the future when the limiting factors (focal plane stepping and imaging) are addressed. For completeness, we have now performed the downsampling for the experimental data, which showed the expected decrease in performance. The results have been integrated into Figure 4.
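
      As an illustration of what further downsampling a 4 Hz recording means in practice, the sketch below keeps every k-th time step of a units-by-time array. It assumes NumPy and a toy array; the response does not state how the downsampling was actually implemented (frame subsampling versus temporal binning), so plain subsampling is used here purely as an assumption.

```python
import numpy as np

def downsample(data, factor):
    """Keep every `factor`-th time step of a (units x time) array."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return data[:, ::factor]

rate_hz = 4.0  # acquisition rate quoted in the response
# Toy recording: 3 units x 8 time steps, filled with the values 0..23.
data = np.arange(3 * 8).reshape(3, 8)

half_rate = downsample(data, 2)
effective_rate_hz = rate_hz / 2  # 2 Hz after keeping every second frame

print(half_rate.shape, effective_rate_hz)
```

      Subsampling can only lower the effective rate, which matches the point made above that the experimental data can be degraded in this way but not brought closer to the faster timescale of assembly interactions.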

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors propose an extension to some of the last author's previous work, where a compositional restricted Boltzmann machine was considered as a generative model of neuron-assembly interaction. They augment this model by recurrent connections between the Boltzmann machine's hidden units, which allow them to explicitly account for temporal dynamics of the assembly activity. Since their model formulation does not allow the training towards a compositional phase (as in the previous model), they employ a transfer learning approach according to which they initialise their model with a weight matrix that was pre-trained using the earlier model so as to essentially start the actual training in a compositional phase. Finally, they test this model on synthetic and actual data of whole-brain light-sheet-microscopy recordings of spontaneous activity from the brain of larval zebrafish.

      Strengths:

      This work introduces a new model for neural assembly activity. Importantly, being able to capture temporal assembly dynamics is an interesting feature that goes beyond many existing models. While this work clearly focuses on the method (or the model) itself, it opens up an avenue for experimental research where it will be interesting to see if one can obtain any biologically meaningful insights considering these temporal dynamics when one is able to, for instance, relate them to development or behaviour.

      Weaknesses:

      For most of the work, the authors present their RTRBM model as an improvement over the earlier cRBM model. Yet, when considering synthetic data, they actually seem to compare with a "standard" RBM model. This seems odd considering the overall narrative, and it is not clear why they chose to do that. Also, in that case, was the RTRBM model initialised with the cRBM weight matrix?

      Thank you for raising the important point regarding the RTRBM comparison in the synthetic data section. Initially, we aimed to compare the performance of the cRBM with the cRTRBM. However, we encountered significant challenges in getting the RTRBM to reach the compositional phase. To ensure a fair and robust comparison, we opted to compare the RBM with the RTRBM.

      A few claims made throughout the work are slightly too enthusiastic and not really supported by the data shown. For instance, when the authors refer to the clusters shown in Figure 3D as "spatially localized", this seems like a stretch, specifically in view of clusters 1, 3, and 4.

      Thanks for pointing out this inaccuracy. When going back to the data/analyses to address the question about locality, we stumbled upon a minor bug in the implementation of the proportional thresholding, causing the threshold to be too low and therefore too many neurons to be considered.

      Fixing this bug reduces the number of neurons, thereby better showing the local structure of the clusters. Furthermore, if one were to lower the threshold within the hierarchical clustering, smaller and more localized clusters would appear. We deliberately chose to keep this threshold high so as not to overwhelm the reader with the number of identified clusters. We hope the reviewer agrees with these changes and that the spatial structure in the clusters presented is indeed rather localized.

      Moreover, when they describe the predictive performance of their model as "close to optimal" when the down-sampling factor coincided with the interaction time scale, it seems a bit exaggerated given that it was more or less as close to the upper bound as it was to the lower bound.

      We thank the reviewer for catching this error. Indeed, the best-performing model does not lie very close to the estimated performance of an optimal model. The text has been updated to reflect this.

      When discussing the data statistics, the authors quote correlation values in the main text. However, these do not match the correlation values in the figure to which they seem to belong. Now, it seems that in the main text, they consider the Pearson correlation, whereas in the corresponding figure, it is the Spearman correlation. This is very confusing, and it is not really clear as to why the authors chose to do so.

      Thank you for identifying the discrepancy between the correlation values mentioned in the text and those presented in the figure. We updated the manuscript to match the correlation coefficient values in the figure with the correct values denoted in the text.
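The mismatch the reviewer noticed is expected whenever the two coefficients are mixed, because Pearson correlation measures linear association while Spearman correlation is the Pearson correlation of the ranks. The sketch below illustrates this on made-up numbers; the data and function names are purely illustrative and are not taken from the manuscript.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative data: a monotonic but nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)    # below 1, because the relation is not linear
r_spearman = spearman(x, y)  # 1 up to rounding, because the relation is monotonic
```

Reporting one coefficient in the text and the other in the figure therefore yields different numbers for the same data, even when both are computed correctly.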

      Finally, when discussing the fact that the RTRBM model outperforms the cRBM model, the authors state it does so for different moments and in different numbers of cases (fish). It would be very interesting to know whether these are the same fish or always different fish.

      Thank you for pointing this out. Keeping track of the same fish across the different metrics makes sense. We updated the figure to include a color code for each individual fish. As it turns out, it is the same fish that perform significantly better each time.

      Recommendations:

      Figure 1: While the schematic in A and D only shows 11 visible units ("neurons"), the weight matrices and the activity rasters in B and C and E and F suggest that there should be, in fact, 12 visible units. While not essential, I think it would be nice if these numbers would match up.

      Thank you for pointing out the inconsistency in the number of visible units depicted in Figure 1. We agree that this could have been confusing for readers. The figure has been updated accordingly. As you suggested, the schematic representation now accurately reflects the presence of 12 visible units in both the RBM and RTRBM models.

      Figure 3: Panel G is not referenced in the main text. Yet, I believe it should be somewhere in lines 225ff.

      Thank you for mentioning this. We added in line 233 a reference to figure 3 panel G to refer to the performance of the cRBM and RTRBM on the different fish.

      Line 637ff: The authors consider moments <v_i h_μ> and <v_i h_j>, and from the context, it seems they are not the same. However, it is not clear as to why because, judging from the notation, they should be the same.

      The second-order statistic <v_i h_j> on line 639 was indeed already mentioned and denoted as <v_i h_μ> on line 638. It has now been removed accordingly in the updated manuscript.

      I found the usage of U^ and U throughout the manuscript a bit confusing. As far as I understand, U^ is a learned representation of U. However, maybe the authors could make the distinction clearer.

      We understand the usage of Û and U throughout the text may be confusing for the reader. However, we would like to point out to the reviewer that the distinction between these two variables is explained in line 142: “in addition to providing a close estimate (Û) to the true assembly connectivity matrix U”. Nevertheless, for added clarity, we added further mentions of the estimated nature of Û throughout the text in the updated manuscript.

      Equation 3: It would be great if the authors could provide some more explanation of how they arrived at the identities.

      These identities have previously been widely described in literature. For this reason, we decided not to include their derivation in our manuscript. However, for completeness, we kindly refer to:

      Goodfellow, I., Bengio, Y., & Courville, A. (2016). Chapter 20: Deep generative models [In Deep Learning]. MIT Press. https://www.deeplearningbook.org/contents/generative_models.html

      Typos:

      -  L. 196: "connectiivty" -> "connectivity"

      -  L. 197: Does it mean to say "very strong stronger"?

      -  L. 339: The reference to Dunn et al. (2016) should appear in parentheses.

      -  L. 504f: The colon should probably be followed by a full sentence.

      -  Eq. 2: In the first line, the potential V still appears, which should probably be changed to show the concrete form (-b * h) as in the second line.

      -  L. 351: Is there maybe a comma missing after "cRBM"?

      -  L. 271: Instead of "correlation", shouldn't it rather be "similarity"?

      -  L. 218: "Figure 3D" -> "Figure 3F"

      We thank the reviewer for pointing out these typos, which have all (except one) been fixed in the text. We do emphasize the potential V to show that there are alternative hidden unit potentials that can be chosen. For instance, the cRBM utilizes dReLu hidden unit potentials.

      Reviewer #3 (Public Review):

      With ever-growing datasets, it becomes more challenging to extract useful information from such a large amount of data. For that, developing better dimensionality reduction/clustering methods can be very important to make sense of analyzed data. This is especially true for neuroscience where new experimental advances allow the recording of an unprecedented number of neurons. Here the authors make a step to help with neuronal analyses by proposing a new method to identify groups of neurons with similar activity dynamics. I did not notice any obvious problems with data analyses here, however, the presented manuscript has a few weaknesses:

      (1) Because this manuscript is written as an extension of previous work by the same authors (van der Plas et al., eLife, 2023), fully understanding it requires first reading the previous paper, as the authors often refer to their previous work for details. Similarly, to understand the functional significance of the neuronal assemblies identified here, one needs to consult the previous paper.

      We agree that the present Research Advance has been written in a way that builds on our previous publication. It was our impression that this was the intention of the Research Advance format, as spelled out in its announcement "eLife has introduced an innovative new type of article – the Research Advance – that invites the authors of any eLife paper to present significant additions to their original research". In the previous formatting guidelines from eLife this was more evident, with a strong limitation on the number of figures and words; however, the present, more liberal guidelines also place an emphasis on the relation to the previous article. We have nonetheless tried in several places to fill in details that might simplify the reading experience.

      (2) The problem of discovering clusters in data with temporal dynamics is not unique to neuroscience. Therefore, the authors should also discuss other previously proposed methods and how they compare to the presented here RTRBM method. Similarly, there are other methods using neural networks for discovering clusters (assemblies) (e.g. t-SNE: van der Maaten & Hinton 2008, Hippocluster: Chalmers et al. 2023, etc), which should be discussed to give better background information for the readers.

      The clustering methods suggested by the reviewer do not model any time dependence, which is the crucial advance presented here by the introduction of the RTRBM as an extension of the (c)RBM. In our previous publication on the cRBM (van der Plas et al., eLife, 2023), this comparison was part of the discussion, although it focussed on a different set of methods. While clustering methods like t-SNE, UMAP and others certainly have their value in scientific analysis, we think it might mislead the reader to suggest that they achieve the same task as an RTRBM, which adds the crucial dimension of temporal dependence.

      (3) The above point to better describe other methods is especially important because the performance of the presented here method is not that much better than previous work. For example, RTRBM outperforms the cRBM only on ~4 out of 8 fish datasets. Moreover, as the authors nicely described in the Limitations section this method currently can only work on a single time scale and clusters have to be estimated first with the previous cRBM method. Thus, having an overview of other methods which could be used for similar analyses would be helpful.

      We think that the perception that the RTRBM performs only slightly better is based on a misinterpretation of the performance measure, which we have tried to address (see comments above) in this rebuttal and the manuscript. In addition, we would like to emphasize that the RTRBM's structural estimation (which is only seeded by the cRBM's output and is then further modified by the RTRBM) yields improved structural estimates, as shown on the simulated data. This is important even in cases where the performance is comparable (which can be the case if the RBM absorbs temporal dependencies of assemblies into a modified structure of assemblies). We have now clarified this in the discussion.

      Recommendations:

      (1) Line 181: it is not explained how a reconstruction error is defined.

      Dear reviewer, thanks for pointing this out. A definition of the (mean square) reconstruction error is added in this line.
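For readers without access to the revised manuscript, a mean square reconstruction error can be sketched as follows. This is a minimal illustration assuming the error is the squared difference between the data and the model's reconstruction, averaged over all neurons and time points; the exact definition added in the manuscript is not reproduced in this response, and the function name and toy values are hypothetical.

```python
def mean_square_reconstruction_error(data, reconstruction):
    """Average of (v - v_hat)^2 over all entries of two equally shaped 2-D lists."""
    if len(data) != len(reconstruction):
        raise ValueError("data and reconstruction must have the same number of rows")
    total = 0.0
    count = 0
    for row_v, row_v_hat in zip(data, reconstruction):
        if len(row_v) != len(row_v_hat):
            raise ValueError("rows must have the same length")
        for v, v_hat in zip(row_v, row_v_hat):
            total += (v - v_hat) ** 2
            count += 1
    if count == 0:
        raise ValueError("inputs must not be empty")
    return total / count


# Hypothetical binarized activity (rows: time points, columns: neurons).
data = [[1, 0, 1],
        [0, 0, 1]]
reconstruction = [[1, 0, 0],
                  [0, 1, 1]]

error = mean_square_reconstruction_error(data, reconstruction)  # 2 mismatches out of 6 entries
```

For binary data, this quantity equals the fraction of entries that the reconstruction gets wrong.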

      (2) How was the number of hidden neurons chosen and how does it affect performance?

      Thank you for pointing this out. Because we use transfer learning, the number of hidden units used for the RTRBM is given by the number of hidden units used for training the cRBM. In future research, when the RTRBM operates in the compositional phase, a grid search over a set of hyperparameters could be used to determine the optimal number of hidden units and other parameters.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewing Editor (Recommendations For The Authors):

      The revised manuscripts and rebuttal sufficiently answered all the questions raised by three Reviewers. Overall, the manuscript is well written and the results are clear based on a straightforward experiment in a pursuit of comparing dimeric PTH analog to 1-34 PTH analog, which has established clinical efficacy. The study's results are valuable as it utilized large animal models, specifically examined the local bone integration effects, and demonstrated the comparable therapeutic efficacy of the new PTH analog to 1-34 PTH. However, the data did not convincingly show how the dimeric PTH analog overcomes the limitations of 1-34 PTH. I suggest that the discussion should focus more on the differences between the two analogs.

      We sincerely appreciate your thorough review and valuable feedback. We have carefully considered your comments and would like to address them as follows:

      “Regarding the results on the effect of dimeric R25CPTH(1-34) in the OVX mouse model (Noh et al., 2024), bone formation markers were increased in the dimeric R25CPTH(1-34) group compared to the rhPTH (1-34) group. Additionally, bone resorption markers were decreased in the rhPTH (1-34) group compared to the control group. However, no significant differences were observed in the dimeric R25CPTH(1-34) group. This suggests that the mechanism of action of the dimeric peptide differs from that of the wildtype peptide. Furthermore, based on unpublished data comparing mRNA expression in bone and kidney tissues between the dimeric R25CPTH(1-34) and rhPTH (1-34) treated groups, we strongly believe that dimeric R25CPTH(1-34) exhibits distinct biological activity from rhPTH (1-34). These differences may arise from variations in PTH receptor binding, involvement of different G protein subtypes, or downstream intracellular signaling pathways.

      The distinct effects of dimeric R25CPTH(1-34) and rhPTH (1-34) on osteoblasts and osteoclasts could indicate that while remodeling-based osteogenesis has a limited clinical use period, the dimeric form might promote sustained bone formation and increased bone density over a longer duration. Given that patients with this mutation, who have been exposed to the mutant dimer throughout their lives, exhibit high bone density, this suggests significant potential for dimeric R25CPTH(1-34) as a novel therapeutic option alongside wildtype PTH.” (Discussion section 2nd paragraph)

      A few minor points I'd like to point out. The line numbers are based on the Word file.

      Line 146-148 - However, both were insufficient compared to the control group and did not illustrate any bone filling. The measured bone-implant contact ratio was 18.32 ± 16.19% for the control group, 48.13 ± 29.81% for the group, and 39.53 ± 26.17% (P < 0.05).

      - Does this mean that bone generation in both treatment groups is inferior to that in the control group? Please specify which groups the values belong to and which groups the P-value compares.

      Thank you very much for your suggestion to improve the manuscript. We have recognized the previous omission and have revised the sentence clearly as follows.

      "The measured bone–implant contact ratio was 18.32 ± 16.19% for the control group, 48.13 ± 29.81% for the rhPTH(1-34) group, and 39.53 ± 26.17% for the dimeric R25CPTH(1-34) group, illustrating the significant improvement in osseointegration. (P < 0.05 for the control group compared to both PTH groups; however, the difference between the PTH groups was not significant.)"

      Line 157 - incompleteness over the same period. The rhPTH(1-34) group exhibited a mature trabecularcfghnc

      - Please correct misspellings.

      As the reviewer mentioned, I have corrected "trabecularcfghnc" to "trabecular." Thank you.

      Line 165-168 and Figure 4 M-N - Both the rhPTH(1-34) and dimeric R25CPTH(1-34) groups showed a significantly higher number of TRAP+ cells at both bone defects, with and without a xenograft, compared to the control group (Figure 4M,N). (P < 0.05) In addition, the number of TRAP+ cells in the dimeric R25CPTH(1-34)group was significantly higher than in the vehicle, yet lower than in the rhPTH(1-34) group (Figure 4M,N).)

      - I believe the heading of figure 4M-N should be changed to with or without xenograft. And maybe you want to explain the significant difference of TRAP positive cells between two groups (with vs. without xenograft). Minor point: was - were

      We fully agree with the reviewer's comment and have changed Figure 4. Based on the revised figure, the legend for Figure 4 was also revised as follows: “The number of TRAP-positive cells in the mandible with and without xenograft in the rhPTH(1-34) and dimeric R25CPTH(1-34)-treated beagle groups.” Following the reviewer's comment, the verb in the corresponding sentence of the Results section was changed from ‘was’ to ‘were’: “The capability of rhPTH(1-34) and dimeric R25CPTH(1-34) in bone remodeling were evaluated by tartrate-resistant acid phosphatase (TRAP) immunohistochemical staining.”

      Line 182-186 - This study investigated the therapeutic effects of rhPTH(1-34) and dimeric R25CPTH(1-34) on bone regeneration and osseointegration in a large animal model with postmenopausal osteoporosis. rhPTH(1-34) and dimeric R25CPTH(1-34) have shown significant clinical efficacy, and although there have been a few studies investigating their effects on bone regeneration in rodents (Garcia et al., 2013), the authors in this study aimed to investigate the effects using a large animal model that more accurately mimics osteoporotic humans (Cortet, 2011).

      - Please split the sentences for better clarity. In last sentence, I'm unsure what Cortet 2011 citation here is for. The statement should be written in the first person not the third person.

      We appreciate your attention to detail, which has helped improve the clarity and accuracy of this manuscript. As per the reviewer's suggestion, I have reordered and changed the references to fit the content and revised the sentences to the first person.

      “rhPTH(1-34) and dimeric R25CPTH(1-34) have shown significant clinical efficacy. Although there have been a few studies investigating their effects on bone regeneration in rodents (Garcia et al., 2013), we aimed to investigate these effects using a large animal model. We chose this model because it more accurately mimics osteoporotic humans (Jee and Yao, 2001).”

      Line 196-197 - Furthermore, by demonstrating that dimeric R25CPTH(1-34) exhibits a distinct pharmacological profile different from rhPTH(1-34) but still provides a clear anabolic effect in the localized jaw region, the authors have shown that it may possess different potential therapeutic indications from rhPTH(1-34).

      - This study does not include any pharmacological data. (Please cite reference). Again, I would suggest writing it in the first person. It sounds like you are reviewing someone else's work

      Thank you for your insightful comments. We acknowledge that our study did not include pharmacological data. We have changed the sentence to clarify that the pharmacological profile information is derived from previous studies. A suitable citation was included to substantiate this assertion. As suggested, we have revised the statement in the first person to more accurately represent our own research and discoveries.

      “Furthermore, we have shown that dimeric R25CPTH(1-34) has a distinct anabolic effect in the localized mandible region, which is comparable to that of rhPTH(1-34). Our findings indicate that dimeric R25CPTH(1-34) may have distinct potential therapeutic indications, as demonstrated by prior pharmacological studies  (Bae et al., 2016), which demonstrated that it possesses a distinct pharmacological profile from rhPTH(1-34).”

      Line 201 - One of the potential clinical advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics.

      - it needs reference

      Thank you for your insightful advice. As the reviewer suggested, we have included a reference as follows.

      “Additionally, the potency of cAMP production in cells was lower for dimeric R25CPTH compared to monomeric R25CPTH, consistent with its lower PTH1R-binding affinity (Noh et al., 2024).”

      Line 206-207 - Also, the effects of dimer were prominent, as we mentioned better bone formation than the control group

      - But not compared with monomeric 1-34 PTH

      We have revised the statement to more accurately reflect our findings.

      “Also, the impact of dimeric R25CPTH(1-34) was notable, as we observed a noticeable improvement in bone formation when compared to the control group. However, these effects were not as strong as those of rhPTH(1-34). Both PTH analogs demonstrated enhanced anabolic effects around the titanium implants, promoting bone regeneration and remodeling.”

      Line 224 - The authors have attributed this phenomenon to the unique anatomical characteristics observed in the jawbone.

      - I would suggest writing it in the first person

      We totally understood the reviewer’s comment. We have corrected the sentences as follows.

      “The anabolic effects of both PTH analogs in this specific region may have been enhanced by the unique anatomical characteristics of the mandible, which we attribute to these improvements.”

      Line 236 - The authors have attributed this phenomenon to the unique anatomical characteristics observed in the jawbone.

      - This is outdated as the label of two year limit of Forteo use was lifted by FDA in 2021

      Thank you for your valuable comments regarding the FDA’s decision to lift the two-year limit on Forteo (teriparatide) use in 2021. We have revised sentences to reflect this recent information in FDA guidelines as follows.

      “Despite the FDA's decision to remove the two-year treatment limit in 2021, which opens possibilities for broader clinical applications, there are still numerous challenges that need to be addressed. There are ongoing concerns about the potential long-term effects of extended use, including accelerated bone remodeling, possible hypercalcemic conditions, and heightened bone resorption”

      Line 380-382 - bone volume (TV; mm3), trabecular number (Tb.N; 1/mm), trabecular thickness (Tb. Th; um), trabecular separation (Tb.sp; µm).

      - minor points- please superscript mm3, and change u -> µ

      We appreciate the reviewer's detailed comments. We have corrected the unit notation in the figure legend.

      Line 405-406 - following treatment with dimeric dimeric R25CPTH(1-34)

      - please remove redundancy.

      We removed the duplicated "dimeric" in the legend for Figure 5 as follows.

      “Figure 5. Measurement of biochemical Marker Dynamics in serum. The serum levels of calcium, phosphorus, P1NP, and CTX across three time points (T0, T1, T2) following treatment with dimeric R25CPTH(1-34), rhPTH(1-34) and control.”

      Line 409-410 - CTX levels, associated with bone resorption, show no significant differences between groups.

      - there is a missing figure identification. please specify relevant figure - I guess (E)

      We appreciate the reviewer's insightful comment regarding the missing figure identification in the sentence about CTX levels. After reviewing Figure 5, we have specified the relevant figure panel as follows:

      “Figure 5. (A) The study timeline. (B-C) Calcium and phosphorus levels show an upward trend in response to both PTH treatments compared to control, indicating enhanced bone mineralization. (D) P1NP levels, indicative of bone formation, remain relatively stable across time and treatments. (E) CTX levels, associated with bone resorption, show no significant differences between groups.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study is a detailed investigation of how chromatin structure influences replication origin function in yeast ribosomal DNA, with focus on the role of the histone deacetylase Sir2 and the chromatin remodeler Fun30. Convincing evidence shows that Sir2 does not affect origin licensing but rather affects local transcription and nucleosome positioning which correlates with increased origin firing. However, the evidence remains incomplete as the methods employed do not rigorously establish a key aspect of the mechanism, fully address some alternative models, or sufficiently relate to prior results. Overall, this is a valuable advance for the field that could be improved to establish a more robust paradigm. 

      We have added extensive new results to the manuscript that, we believe, address all three criticisms above, namely that the methods employed do not (1) rigorously establish a key aspect of the mechanism; (2) fully address some alternative models; or (3) sufficiently relate to prior results.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This paper presents a mechanistic study of rDNA origin regulation in yeast by SIR2. Each of the ~180 tandemly repeated rDNA gene copies contains a potential replication origin. Early-efficient initiation of these origins is suppressed by Sir2, reducing competition with origins distributed throughout the genome for rate-limiting initiation factors. Previous studies by these authors showed that SIR2 deletion advances replication timing of rDNA origins by a complex mechanism of transcriptional de-repression of a local PolII promoter causing licensed origin proteins (MCM complexes) to re-localize (slide along the DNA) to a different (and altered) chromatin environment. In this study, they identify a chromatin remodeler, FUN30, that suppresses the sir2∆ effect, and remarkably, results in a contraction of the rDNA to about one-quarter its normal length/number of repeats, implicating replication defects of the rDNA. Through examination of replication timing, MCM occupancy and nucleosome occupancy on the chromatin in sir2, fun30, and double mutants, they propose a model where nucleosome position relative to the licensed origin (MCM complexes) intrinsically determines origin timing/efficiency. While their interpretations of the data are largely reasonable and can be interpreted to support their model, a key weakness is the connection between Mcm ChEC signal disappearance and origin firing.

      Criticism: The reviewer expressed concern about the connection between Mcm ChEC signal disappearance and origin firing.

      To further support our claim that the disappearance of the MCM signal in our ChEC datasets reflects origin firing, we now present additional data using the well-established method of MCM Chromatin IP (ChIP).

      (1) New Supporting Evidence:  ChIP at genome-wide origins. In Figure 5 figure supplement 2, we demonstrate that the Mcm2 ChIP signal in cells released into hydroxyurea (HU) is significantly reduced at early origins compared to late origins, which mirrors the pattern observed with the MCM2 ChEC signal. This reduction in the ChIP signal at early origins supports the interpretation that the MCM signal disappearance is associated with origin firing.

      (2) New supporting evidence: ChIP at rDNA origins. Our ChIP analysis also shows that the disappearance of the MCM signal at rDNA origins in sir2Δ cells released into HU is accompanied by signal accumulation at the replication fork barrier (RFB), indicative of stalled replication forks at this location (Figure 5 figure supplement 3). This pattern is consistent with the initiation of replication at these origins and fork stalling at the RFB.

      (3) New supporting evidence:  2D gels with quantification. Furthermore, additional 2D gel electrophoresis results provide ample independent evidence of rDNA origin firing in HU in sir2Δ mutants and suppression of origin firing in sir2 fun30 cells. These new data include 1) quantification of 2D gels in Figure 4D and 2) new 2D gels presented in Figure 4C as described below in greater detail. Collectively, these results demonstrate that rDNA origins fire prematurely in HU in sir2 cells and that firing is suppressed by FUN30 deletion. These additional data reinforce our model and support the association between MCM signal disappearance and replication initiation.

      While the cyclical chromatin association-dissociation of MCM proteins with potential origin sequences may be generally interpreted as licensing followed by firing, dissociation may also result from passive replication and as shown here, displacement by transcription and/or chromatin remodeling.

      The reviewer raised a concern that the cyclical chromatin association-dissociation of MCM proteins could be interpreted as licensing followed by firing, but might also result from passive replication or displacement by transcription and chromatin remodeling.

      Addressing Alternative Explanations:

      (1) Selective Disappearance of MCM Complexes: While transcription and passive replication can indeed cause the MCM-ChEC signal to disappear, these processes cannot selectively cause the disappearance of the displaced MCM complex without also affecting the non-displaced MCM complex. Specifically, RNA polymerase transcribing C-pro would first need to dislodge the normally positioned MCM complex before reaching the displaced complex, which is not observed in our data.

      (2) Role of FUN30 Deletion:  FUN30 deletion results in increased C-pro transcription and reduced disappearance of the displaced MCM complex. This observation supports our model, as transcription alone would not selectively affect the displaced MCM complex while leaving the normally positioned MCM complex unaffected.

      (3) Licensing Restrictions: It is crucial to note that continuous replenishment of displaced MCMs with newly loaded MCMs is not possible in our experimental conditions, as the cells are in S phase and licensing is restricted to G1. This temporal restriction further supports our interpretation that the disappearance of the MCM signal reflects origin firing rather than alternative processes.

      In summary, while alternative explanations such as transcription and passive replication could potentially account for MCM signal disappearance, our data indicate that these processes cannot selectively affect the displaced MCM complex without impacting the non-displaced complex. The selective disappearance observed in our experiments, along with the effects of FUN30 deletion and the temporal constraints on MCM loading, strongly support our interpretation that the disappearance of the MCM signal reflects origin firing.

      Moreover, linking its disappearance from chromatin in the ChEC method with such precise resolution needs to be validated against an independent method to determine the initiation site(s). Differences in rDNA copy number and relative transcription levels also are not directly accounted for, obscuring a clearer interpretation of the results. 

      The reviewer raised concerns about the need to validate the disappearance of MCM from chromatin observed using the ChEC method against an independent method to determine initiation sites. Additionally, they pointed out that differences in rDNA copy number and relative transcription levels are not directly accounted for, which may obscure the interpretation of the results.

      (1) Reduced rDNA Copy Number promotes Early Replication: Copy number reduction of the magnitude caused by deletion of both SIR2 and FUN30 is not expected to suppress early rDNA replication in sir2, but rather to exacerbate it. Specifically, deletion of SIR2 and FUN30 causes the rDNA to shrink to approximately 35 copies. Kwan et al., 2023 (PMID: 36842087) have shown that a reduction in rDNA copy number to 35 copies results in a dramatic acceleration of rDNA replication in a SIR2+ strain. Therefore, the effect of rDNA size on replication timing reinforces our conclusion that deletion of FUN30 suppresses rDNA replication.

      (2) New 2D Gels in sir2 and sir2 fun30 strains with equal number of rDNA repeats: To directly address the concern regarding differences in the number of rDNA repeats, we have included new 2D gel analyses in the revised manuscript. By using a fob1 background, we were able to equalize the repeat number between the sir2 and sir2 fun30 strains (Figure 4E). The 2D gels conclusively show that the suppression of rDNA origin firing upon FUN30 deletion is independent of both rDNA size and FOB1.

      Nevertheless, this paper makes a valuable advance with the finding of Fun30 involvement, which substantially reduces rDNA repeat number in sir2∆ background. The model they develop is compelling and I am inclined to agree, but I think the evidence on this specific point is purely correlative and a better method is needed to address the initiation site question. The authors deserve credit for their efforts to elucidate our obscure understanding of the intricacies of chromatin regulation. At a minimum, I suggest their conclusions on these points of concern should be softened and caveats discussed. Statistical analysis is lacking for some claims. 

      Strengths are the identification of FUN30 as suppressor, examination of specific mutants of FUN30 to distinguish likely functional involvement. Use of multiple methods to analyze replication and protein occupancies on chromatin. Development of a coherent model. 

      Weaknesses are failure to address copy number as a variable; insufficient validation of ChEC method relationship to exact initiation locus; lack of statistical analysis in some cases. 

      With regard to "insufficient validation of ChEC method relationship to exact initiation locus": The two potential initiation sites that one would monitor (non-displaced and displaced) are separated by less than 150 base pairs, and other techniques simply do not have the resolution necessary to distinguish such differences. Indeed, our new ChIP results presented in Figure 5 figure supplement 3 clearly demonstrate that while the resolution of ChIP is adequate to detect the reduction of MCM signal at the replication initiation site and its relocation to the RFB (~2 kb away), it lacks the resolution required to differentiate closely spaced MCM complexes.

      Furthermore, as we suggest in the manuscript, our results are consistent with a model in which it is only the displaced MCM complex that is activated, whether in sir2 or WT. If no genotype-dependent difference in initiation sites is even expected, it would be hard to interpret even the most precise replication-based assays.

      We appreciate the reviewer pointing out that some statistical analyses were lacking: we have added statistical analysis for 2D gels (Figures 4D and 4E),  EdU incorporation experiments in Figure 4F and disappearance of MCM ChEC and ChIP signal upon release of cells into HU (Figure 5 supplement 1 and Supplement 2).  

      Additional background and discussion for public review: 

      This paper broadly addresses the mechanism(s) that regulate replication origin firing in different chromatin contexts. The rDNA origin is present in each of ~180 tandem repeats of the rDNA sequence, representing a high potential origin density per length of DNA (9.1kb repeat unit). However, the average origin efficiency of rDNA origins is relatively low (~20% in wild-type cells), which reduces the replication load on the overall genome by reducing competition with origins throughout the genome for limiting replication initiation factors. Deletion of histone deacetylase SIR2, which silences PolII transcription within the rDNA, results in increased early activation of the rDNA origins (and reduced rate of overall genome replication). Previous work by the authors showed that MCM complexes loaded onto the rDNA origins (origin licensing) were laterally displaced (sliding) along the rDNA, away from a well-positioned nucleosome on one side. The authors' major hypothesis throughout this work is that the new MCM location(s) are intrinsically more efficient configurations for origin firing. The authors identify a chromatin remodeling enzyme, FUN30, whose deletion appears to suppress the earlier activation of rDNA origins in sir2∆ cells. Indeed, it appears that the reduction of rDNA origin activity in sir2∆ fun30∆ cells is severe enough to result in a substantial reduction in the rDNA array repeat length (number of repeats); the reduced rDNA length presumably facilitates its more stable replication and maintenance.

      Analysis of replication by 2D gels is marginally convincing, using 2D gels for this purpose is very challenging and tricky to quantify. 

      We address this criticism by carefully quantifying the 2D gel results, using the single-rARS signal to normalize the bubble arc, as discussed below.

      The more quantitative analysis by EdU incorporation is more convincing of the suppression of the earlier replication caused by SIR2 deletion. 

      We have also added quantification of EdU results to strengthen our arguments.  

      To address the mechanism of suppression, they analyze MCM positioning using ChEC, which in G1 cells shows partial displacement of MCM from normal position A to positions B and C in sir2∆ cells and similar but more complete displacement away from A to positions B and C in sir2fun30 cells. During S-phase in the presence of hydroxyurea, which slows replication progression considerably (and blocks later origin firing) MCM signals redistribute, which is interpreted to represent origin firing and bidirectional movement of MCMs (only one direction is shown), some of which accumulate near the replication fork barrier, consistent with their interpretation. They observe that MCMs displaced (in G1) to sites B or C in sir2∆ cells, disappear more rapidly during S-phase, whereas the similar dynamic is not observed in sir2∆fun30∆. This is the main basis for their conclusion that the B and C sites are more permissive than A. While this may be the simplest interpretation, there are limitations with this assay that undermine a rigorous conclusion (additional points below). The main problem is that we know the MCM complexes are mobile so disappearance may reflect displacement by other means including transcription which is high in the sir2∆ background. Indeed, the double mutant has a greater level of transcription per repeat unit which might explain more displacement from A in G1. Thus, displacement might not always represent origin firing. Because the sir2 background profoundly changes transcription, and the double mutant has a much smaller array length associated with higher transcription, how can we rule out greater accessibility at site A, for example in sir2∆, leading to more firing, which is suppressed in sir2 fun30 due to greater MCM displacement away from A?

      I think the critical missing data to solidly support their conclusions is a definitive determination of the site(s) of initiation using a more direct method, such as strand specific sequencing of EdU or nascent strand analysis. More direct comparisons of the strains with lower copy number to rule out this facet. As discussed in detail below, copy number reduction is known to suppress at least part of the sir2∆ effect so this looms over the interpretations. I think they are probably correct in their overall model based on the simplest interpretation of the data but I think it remains to be rigorously established. I think they should soften their conclusions in this respect. 

      Please see discussion below about these issues.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors follow up on their previous work showing that in the absence of the Sir2 deacetylase the MCM replicative helicase at the rDNA spacer region is repositioned to a region of low nucleosome occupancy. Here they show that the repositioned displaced MCMs have increased firing propensity relative to non-displaced MCMs. In addition, they show that activation of the repositioned MCMs and low nucleosome occupancy in the adjacent region depend on the chromatin remodeling activity of Fun30. 

      Strengths: 

      The paper provides new information on the role of a conserved chromatin remodeling protein in the regulation of origin firing and in addition provides evidence that not all loaded MCMs fire and that origin firing is regulated at a step downstream of MCM loading. 

      Weaknesses: 

      The relationship between the authors' results and prior work on the role of Sir2 (and Fob1) in regulation of rDNA recombination and copy number maintenance is not explored, making it difficult to place the results in a broader context. Sir2 has previously been shown to be recruited by Fob1, which is also required for DSB formation and recombination-mediated changes in rDNA copy number. Are the changes that the authors observe specifically in fun30 sir2 cells related to this pathway? Is Fob1 required for the reduced rDNA copy number in fun30 sir2 double mutant cells?

      We have conducted additional studies in the fob1 background to address how FOB1 and the replication fork barrier (RFB) influence the kinetics of rDNA size reduction upon FUN30 deletion (Figure 2 - figure supplement 2), rDNA replication timing (Figure 2 - figure supplement 3), and rDNA origin firing using 2D gels (Figure 4C).

      Strains lacking SIR2 exhibit unstable rDNA size, and FOB1 deletion stabilizes rDNA size in a sir2 background (and otherwise). Similarly, we found that FOB1 deletion influences the kinetics of rDNA size reduction in sir2 fun30 cells. Specifically, we were able to generate a fob1 sir2 fun30 strain with more than 150 copies. Nonetheless, and consistent with our model, this strain still exhibited delayed rDNA replication timing (Figure 2 - figure supplement 3), and its rDNA still shrank upon continuous culture (Figure 2 - figure supplement 2). These results demonstrate that, although FOB1 affects the kinetics of rDNA size reduction in sir2 fun30 strains, neither the reduced rDNA array size nor the delayed replication timing upon FUN30 deletion depends on FOB1.

      The use of the fob1 background allowed us to compare the activation of rDNA origins in sir2 and sir2 fun30 strains with equally short rDNA sizes. 2D gels demonstrate robust and reproducible suppression of rDNA origin activity upon deletion of FUN30 in sir2 fob1 strains with 35 rDNA copies (Figure 4C). These results indicate that the main effect we are interested in—FUN30-induced reduction in origin firing—is independent of both FOB1 and rDNA size.

      Our additional studies conclusively show that the FUN30-induced reduction in rDNA origin firing is independent of both FOB1 and rDNA size. These findings provide important insights into the mechanisms regulating rDNA copy number maintenance, placing our results within the broader context of existing knowledge on Sir2 and Fob1 functions.

      Reviewer #3 (Public Review): 

      Summary: 

      Heterochromatin is characterized by low transcription activity and late replication timing, both dependent on the NAD-dependent protein deacetylase Sir2, the founding member of the sirtuins. This manuscript addresses the mechanism by which Sir2 delays replication timing at the rDNA in budding yeast. Previous work from the same laboratory (Foss et al. PLoS Genetics 15, e1008138) showed that Sir2 represses transcription-dependent displacement of the Mcm helicase in the rDNA. In this manuscript, the authors show convincingly that the repositioned Mcms fire earlier and that this early firing partly depends on the ATPase activity of the nucleosome remodeler Fun30. Using read-depth analysis of sorted G1/S cells, fun30 was the only chromatin remodeler mutant that somewhat delayed replication timing in sir2 mutants, while nhp10, chd1, isw1, htl1, swr1, isw2, and irc3 had no effect. The conclusion was corroborated with orthogonal assays including two-dimensional gel electrophoresis and analysis of EdU incorporation at early origins. Using an insightful analysis with an Mcm-MNase fusion (Mcm-ChEC), the authors show that the repositioned Mcms in sir2 mutants fire earlier than the Mcm at the normal position in wild type. This early firing at the repositioned Mcms is partially suppressed by Fun30. In addition, the authors show Fun30 affects nucleosome occupancy at the sites of the repositioned Mcm, providing a plausible mechanism for the effect of Fun30 on Mcm firing at that position. However, the results from the MNase-seq and ChEC-seq assays are not fully congruent for the fun30 single mutant. Overall, the results support the conclusions, providing a much better mechanistic understanding of how Sir2 affects replication timing at rDNA.

      The observation that the MNase-seq plot in the fun30 mutant shows a large signal at the +3 nucleosome and a somewhat smaller one at position +2, while the ChEC-seq plot exhibits negligible signals, is indeed an important point of consideration. This discrepancy arises because most of the MCM in the fun30 mutant remains at its original site, where it abuts the +1 nucleosome. As a result, the MCM-MNase fusion protein fails to reach and “light up” the +3 nucleosome, which is, nonetheless, well-visualized with exogenous MNase. The paucity of displaced MCMs, which are responsible for cutting the +2 nucleosome, explains the discrepancy in the +2 nucleosome signal between the exogenous MNase and ChEC datasets in the fun30 mutant.

      Despite this apparent discrepancy, the overall results support our conclusions and provide a much better mechanistic understanding of how Sir2 affects replication timing at rDNA. The MNase-seq data reflect nucleosome positioning and chromatin structure, while the ChEC-seq data specifically highlight the locations where MCM is bound and active.

      Strengths 

      (1) The data clearly show that the repositioned Mcm helicase fires earlier than the Mcm in the wild type position. 

      (2) The study identifies a specific role for Fun30 in replication timing and an effect on nucleosome occupancy around the newly positioned Mcm helicase in sir2 cells. 

      Weaknesses 

      (1) It is unclear which strains were used in each experiment. 

      (2) The relevance of the fun30 phospho-site mutant (S20AS28A) is unclear. 

      We appreciate the reviewer pointing out places in which our manuscript omitted key pieces of information (items 1 and 3); we have included the strain numbers in our revision. With regard to point 2, we had written:

      Fun30 is also known to play a role in the DNA damage response; specifically, phosphorylation of Fun30 on S20 and S28 by CDK1 targets Fun30 to sites of DNA damage, where it promotes DNA resection (Chen et al. 2016; Bantele et al. 2017). To determine whether the replication phenotype that we observed might be a consequence of Fun30's role in the DNA damage response, we tested non-phosphorylatable mutants for the ability to suppress early replication of the rDNA in sir2; these mutations had no effect on the replication phenotype (Figure 2B), arguing against a primary role for Fun30 in DNA damage repair that somehow manifests itself in replication. 

      (3) For some experiments (Figs. 3, 4, 6) it is unclear whether the data are reproducible and the differences significant. Information about the number of independent experiments and quantitation is lacking. This affects the interpretation, as fun30 seems to affect the +3 nucleosome much more than let on in the description. 

      We have provided replicas and quantitation for the results in these figures.

      (Replica ChEC Southern blot with quantification (Figure 3 figure supplement 1), quantification and replicas for 2D gels in Figure 4, and replicas for nucleosome occupancy (Figure 6 supplement 1).)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Fig. 3-Examination of MCM occupancy at the rDNA ARS region using a variation of ChEC.

      Presumably these are G1-arrested cells, but this does not seem to be stated. Please confirm.

      The 2D gel results are not very convincing of their conclusions. We are asked to compare bubble to fork arcs at 30 minutes, but this is not feasible. It is the authors' job to quantify the data from multiple replicates, but none is given. After much careful examination, comparing the relative intensities of ascending bubble and Y-arcs, I think I can accept that 4A shows highest early efficiency for sir2 over WT and fun30, which are similar to each other, and lowest for sir2 fun30, at 60 and 90 min.

      In the revision we provide a careful quantification of the 2D gels in Figure 4. For assessing rDNA origin activity, we normalized the bubble arc during the HU time course to a single-rARS signal, which appears as a large 24.4 kb NheI fragment originating from the rightmost rDNA repeat (see Figures 4A and 4B). The description of the quantification in the text is provided below.

      “Prior to separation on 2D gels, DNA was digested with NheI, which releases a 4.7 kb rARS-containing linear DNA fragment at the internal rDNA repeats (1N) and a much larger, 24.5 kb single-rARS-containing fragment originating from the rightmost repeat. In 2D gels, active origins generate replication bubble arc signals, whereas passive replication of an origin appears as a y-arc. Having a signal emanating from a single ARS-containing fragment simplifies the comparison of rDNA origin activity in strains with different numbers of rDNA repeats, such as in sir2 vs sir2 fun30 mutants. Origin activity is expressed as a ratio of the bubble to the single-ARS signal, effectively measuring the number of active rDNA origins per cell at a given time point.

      As seen previously (Foss et al. 2019), deletion of SIR2 increased the number of activated rDNA origins, while deletion of FUN30 suppressed this effect. When analyzed in aggregate at 20, 30, 60 and 90 minutes following release into HU, the average number of activated rDNA origins in the sir2 mutant was increased 6.3-fold compared to that in WT (5.0±2.3 in sir2 vs 0.8±0.4 in WT, p<0.05 by two-tailed t-test), and the increased number was reduced upon FUN30 deletion (1.3±0.7 in sir2 fun30, p<0.05 by two-tailed t-test vs sir2, NS for comparison to WT).”
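
      The normalization described in the quoted passage can be illustrated with a minimal Python sketch. It assumes that each time point yields one densitometry reading for the bubble arc and one for the single-rARS fragment; the function names and the example readings are hypothetical and are not taken from the manuscript. The last line only re-derives the fold change from the means reported above (5.0 / 0.8 = 6.25, reported as 6.3-fold).

```python
from statistics import mean, stdev


def origin_activity(bubble_signal, single_ars_signal):
    """Bubble-arc signal normalized to the single-rARS fragment signal."""
    if single_ars_signal <= 0:
        raise ValueError("single-rARS signal must be positive")
    return bubble_signal / single_ars_signal


def summarize(ratios):
    """Mean and sample standard deviation across time points."""
    return mean(ratios), stdev(ratios)


def fold_change(mutant_mean, reference_mean):
    """Ratio of two mean origin-activity values."""
    return mutant_mean / reference_mean


# Hypothetical readings (arbitrary units), one pair per time point
# (20, 30, 60 and 90 min in HU). These are NOT the authors' data.
example_bubble = [6.0, 10.0, 12.0, 8.0]
example_single_ars = [3.0, 4.0, 4.0, 2.0]

ratios = [origin_activity(b, s) for b, s in zip(example_bubble, example_single_ars)]
avg, sd = summarize(ratios)

# Fold change recomputed from the means quoted in the response (sir2 vs WT).
reported_fold = fold_change(5.0, 0.8)
```

      Because the denominator comes from a fragment present once per array, the ratio does not need a separate correction for repeat number, which is the property the response relies on when comparing strains with different rDNA sizes.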

      However, for part 4B, they state (p. 11) that deletion of FUN30 in a SIR2 background had no perceptible effect (on ARS305) but I think the data appear otherwise: the FUN30 cells show more Y-arc than WT.

      We now provide the assessment of ARS305 activity in HU cells as a ratio of bubble-arc to 1N signal. The reviewer is right that the fun30 mutant has a more robust bubble arc signal compared to WT. However, after normalization to 1N, this difference did not appear significant (3.7 vs 5.1). Overall, the analysis of the activity of ARS305 origins demonstrates a reciprocity with the activity of rDNA origins in each of the four genotypes. Furthermore, this observation is confirmed in our EdU-based analysis of 111 genomic origins, with statistical analysis showing a very high level of significance (see below).

      Ultimately, analysis of unsynchronized cells would give unambiguous results about origin efficiency. In this regard I note that analysis of rDNA origin firing by 2D gels with HU versus asynchronous gives different results in WT versus sir2∆, with no difference in unsynchronized cells (He et al. 2022). It would be interesting to test the strains here unsynchronized, though copy number size would still be a variable to address.

      Origin activity in log cultures is typically assessed by comparing replication initiation within an origin, presenting as a bubble arc, to passively replicated DNA (Y-arc). However, such an analysis at tandemly arrayed origins, such as rDNA, is not feasible, as both active and passive replication are the result of activation of the same origins. This explains the lack of difference between WT and sir2 cells previously reported (He et al. 2022), which we have also observed. Differences in activation of rDNA origins in WT vs sir2 cells is clearly reflected in HU experiments, as was the case in the earlier report (He et al. 2022). 

      To address the issue of differences in copy number between sir2 and sir2 fun30 cells we have now done experiments in a fob1 background where we can equalize the copy number among the two genotypes. These 2D gels are presented in Figure 4C. We address this issue in the revised manuscript as follows:

      “The overall impact of FUN30 deletion on rDNA origin activity in a sir2 background is expected to be a composite of two opposing effects: a suppression of rDNA origin activation and increased rDNA origin activation due to reduced rDNA size (Kwan et al. 2023). To evaluate the effect of FUN30 on rDNA origin activation independently of rDNA size, we generated an isogenic set of strains in a fob1 background, all of which contain 35 copies of the rDNA repeat. (Deletion of FOB1 is necessary to stabilize rDNA copy number.) Comparing rDNA origin activity in sir2 versus sir2 fun30 genotypes, we observed a robust and reproducible reduction in rDNA origin activity upon FUN30 deletion. This finding confirms that FUN30 deletion suppresses rDNA origin firing in a sir2 background independently of both rDNA size and FOB1 status.”

      -EdU analysis is more convincing regarding relative effects on genome versus rDNA, however, again, the effect of reduced rDNA array size in the sir2 fun30 cells may also be the proximal cause of the reduced effect on genome (early origins) replication rather than a direct effect on origin efficiency. No statistic provided to support that fun30 suppresses sir2 for rDNA activity. 

      This comment raises three distinct, but related, issues: 

      First, the reviewer is asking whether the reduced rDNA size, of the magnitude we observed in sir2 fun30 cells, could by itself be responsible for increased origin activity elsewhere in the genome, just because there is less rDNA that needs to be replicated. As noted earlier, Kwan et al. (2023) examined the effect of rDNA size reduction and observed: 1) a marked increase in rDNA origin activity and 2) a reciprocal reduction in origin activity elsewhere in the genome. This counterintuitive finding suggests that a smaller rDNA array exerts more competition for limited replication resources than a larger one. In light of this, our findings with FUN30 deletion become even more compelling. The suppression of rDNA firing upon FUN30 deletion is so significant that it overrides the expected effects of rDNA size reduction.

      Second, the reviewer points out our lack of statistical analysis to support our contention that fun30 suppresses sir2 with regard to rDNA origin activity. We have now addressed this issue as well, by quantifying 2D gel signals, as described above in the text that begins with "Prior to separation on 2D gels, DNA was digested with NheI ...". 

      Third, we have now provided a statistical analysis to support our conclusion that EdU-based analysis of activity of 111 early origins shows suppression upon deletion of SIR2 that is largely reversed by additional deletion of FUN30. 

      "Deletion of FUN30 in a sir2 background partially restored EdU incorporation at early origins, concomitant with reduced EdU incorporation at rDNA origins. In particular, the median value of log10 of read depths at 111 early origins, as the data are shown in Figure 4F, dropped from 6.5 for wild type to 6.2 for sir2 but then returned almost to wild type levels (6.4) in sir2 fun30.  The p value obtained by Student's t test, comparing the drop in 111 origins from wild type to sir2 with that from wild type to sir2 fun30 was highly significant (<< 10-16)  In contrast, FUN30 deletion in the WT background did not reduce EdU incorporation at genomic origins (median 6.6). These findings highlight that FUN30 deletion-induced suppression of rDNA origins in sir2 is accompanied by the activation of genomic origins."

      Use loss of Mcm-ChEC signal as proxy for origin firing. Reasonably convincing that decrease correlates with origin firing on a one-to-one basis (Fig. 5B), though no statistic given. 

      We provide the statistical analysis in Figure 5-figure supplement 1.

      However, there is no demonstration of ability to observe this correlation with fine resolution as needed for the claims here. It seems equally possible that sir2 deletion causes more firing by repositioning MCMs to a better location or that the prior location, which still contains substantial MCM, becomes more permissive. The MCM signal appears to be mobile, so perhaps the role of FUN30 is to prevent to mobility of MCM away from the original site in WT cells; note that significantly less Mcm signal is at the original position in sir2 fun30. No accumulation of MCM occurs near the RFB in WT (and fun30) cells. I understand that origin firing is lower in WT but raises concerns about sensitivity and dynamic range of this assay and that MCM positions may reflect transcription versus replication. 

      Please see the section above labeled "Addressing Alternative Explanations".  

      Is Fig 6A Y-axis correctly labeled? I understand this figure to represent MNase-seq reads; is there any Mcm2-ChEC-seq in part A? 

      We have corrected the labeling. 6A represent MNase-seq reads. Thank you for pointing this out.

      I understand part B to represent nucleosome-sized fragments released by Mcm2-ChEC interpreted to be nucleosomes. But could they be large fragments potentially containing adjacent MCM-double hexamers?  

      Our representation of ChEC-seq data in Figure 1 supplement 1, where we can see the entire spectrum of fragment sizes, demonstrates two distinct populations of fragments: nucleosome size and MCM-size fragments.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions for the authors to consider: 

      (1) The authors make a good case for the importance of replication balance between rDNA and euchromatin in ensuring that the genome is replicated in a timely fashion. This seems to be clearly regulated by Sir2. However, Sir2 also affects rDNA copy number and suppresses unequal cross over events, which are stimulated by Fob1. Does Fun30 suppress Fob1-dependent recombination events in sir2D cells?

      It is unclear why FUN30 only affects rDNA repeat copy number in sir2 cells. Why doesn't Fun30 reduce copy number in wild-type cells? 

      Deletion of SIR2 causes rightward repositioning of MCMs to a position where they are more prone to fire, as shown by our HU ChEC datasets, in which the repositioned MCMs are more prone to activation than the non-repositioned ones. FUN30 deletion suppresses activation of these activation-prone repositioned MCMs, as shown by HU ChEC. This suppression of rDNA origin activation in sir2 cells causes the rDNA to shrink. In fun30 single mutants, due to the paucity of repositioned MCMs, we do not observe significant suppression of rDNA origin firing, and consequently, there is no reduction in rDNA size in fun30 cells.

      (2) The authors use Mcm-MNase to map the location of the MCM helicase. Can these results be confirmed using the more standard and direct ChIP assay to examine changes in MCM localization?

      We carried out suggested MCM ChIP experiments and present these results in Figure 5 supplement 2 and supplement 3. These ChIP data demonstrate that: 

      (1) MCM signal disappears preferentially at early origins compared to late origins, as seen in our ChEC results.

      (2) The disappearance of ChEC signal at rDNA origins in the sir2 mutant is accompanied by signal accumulation at the RFB, consistent with fork stalling at the RFB and mirroring the results we obtained by ChEC. While these results indicate that ChIP has adequate resolution to detect MCM repositioning at the 2 kb scale, its resolution was insufficient for fine-scale discrimination of repositioned and non-repositioned MCMs.

      In this regard, the specific role of Fun30 in regulation of MCM firing at rDNA is interesting. 

      Does Fun30 localize to the ARS region of rDNA? How is Fun30 specifically recruited to rDNA?  

      We carried out ChIP for Fun30 and observed, similarly to previous reports (Durand-Dubief et al. 2012), a wide distribution of Fun30 throughout the genome and at rDNA. We have elected not to include these results in the current manuscript.

      (3) The 2D gels in Figure 4 are difficult to interpret. The bubble to arc ratios in fun30D seem different from both wild-type and sir2D. It may be helpful to the reader to quantify the bubble to arc ratios. fun30D also seems to be affecting ARS305 by itself.

      We provide quantification of 2D gels in Figure 4.

      (4) Figure 5. 

      (4.1) For examining origin firing based on the disappearance of the Mcm-MNase reads, is HU arrest necessary? HU may be causing indirect effects due to replication fork stalling. In principle, the authors should be able to perform this analysis without HU, since their cells are released from synchronized arrest in G1 (and at least for the first cell cycle should proceed synchronously on to S phase). In addition, validation of Mcm-ChEC results using ChIP for one of the subunits of the MCM complex would increase confidence in the results. 

      The HU arrest allows us to examine early events in DNA replication at much finer spatial and temporal resolution than it would be possible without it.

      We have now used Mcm2 ChIP to confirm that the signal disappears at the MCM loading site in HU in sir2 cells as discussed above (Figure 5 figure supplement 3). However, the resolution is inadequate to discriminate non-repositioned vs repositioned MCMs.

      (4.2) The non-displaced Mcm-ChEC signal in sir2D seems like it's decreasing more than in wildtype cells. Explain. It would be helpful to quantify these results by integrating the area under each peak (or based on read numbers). It looks like one of the displaced Mcm signals (the one more distal from the non-displaced) is changing at a similar rate to the non-displaced.

      Integrating the area under each Mcm-ChEC peak or using read numbers is superfluous for the following reasons:  (1) The rectangular appearance of the peaks in Figure 5 clearly reflects signal intensity, making additional numerical integration redundant. (2) The visual differences between wild-type and sir2D cells are distinct and sufficient for drawing conclusions without further quantification.  (3) Keeping the analysis straightforward avoids unnecessary complexity and maintains clarity.

      (4.3) Can the authors explain why fun30D seems to be suppressing only one of the 2 displaced Mcms from firing? 

      We speculate that the local environment is more conducive to firing for one of the two displaced MCMs, but we do not understand why.

      (5) Figure 6. Why would the deletion of SIR2, a silencing factor, result in increased nucleosome occupancy at rDNA?

      If we understand correctly, the reviewer is referring to a small increase in +2 and +3 signal in sir2 compared to the WT. In WT G1 cells, there is a single MCM between the +1 and +3 nucleosomes. This space cannot accommodate a +2 nucleosome in G1 cells because MCM is loaded at that position in most cells (in G2 cells, however, this space is occupied by a nucleosome; Foss et al., 2019). MCM repositioning in the sir2 mutant would displace MCM from this location, making it possible for this space to be occupied by a nucleosome.

      The changes in nucleosome density seem modest. Also, nucleosome density is similarly increased in sir2D and fun30D cells, yet sir2 has a dramatic effect on origin firing whereas fun30D does not. Explain.

      We believe that the FUN30 status makes most of the difference for firing of displaced MCMs.

      Since there are few displaced MCMs in SIR2 cells, there is no large impact on origin firing. Furthermore, the rDNA already fires late in WT cells, so a further delay upon FUN30 deletion could be more difficult to detect.

      (6) Discussion. At rDNA Sir2 may simply act by deacetylating nucleosomes and decreasing their mobility. This is unrelated to compaction which is usually only invoked regarding the activities of the full SIR complex (Sir2/3/4) at telomeres and the mating type locus. The arguments regarding polymerase size, compaction etc may not be relevant to the main point since although the budding yeast Sir2 participates in heterochromatin formation at the mating type loci and telomeres, at rDNA it may act locally near its recruitment site at the RFB. 

      This is a valid point. We have added the following sentences to the discussion to highlight the differences between silencing at rDNA and silencing at the silent mating loci and telomeres, which is SIR-complex dependent.

      “Steric arguments such as these are even less compelling when made for rDNA than for the silent mating type loci and telomeres, because chromatin compaction has been studied mostly in the context of the complete Sir complex (Sir1-4). In contrast, Sir1, 3, and 4 are not present at the rDNA.”

      Minor 

      It would be interesting to see if deletion of any histone acetyltransferases acts in a similar way to Fun30 to reduce rDNA copy number in sir2D cells.

      Thank you for this suggestion.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The design of Figure 3 could be improved. A scheme could help understand the assay without flipping back to Figure 1. The numbers below the gel bands need definition. 

      We have included the scheme describing the restriction and MCM-MNase cut sites and the location of the probe for the Southern blot.

      (2) The design of Figure 4 could be improved by adding a scheme to help interpret the 2d gel picture. The figure also lacks quantitation. Are the results reproducible and the differences significant? 

      We have added the scheme, quantification and statistics in Figure 4.

      (3) Please list in each figure legend the exact strains from Table S1 which were used. 

      We have included the strain numbers in the figure legends.

      Durand-Dubief M, Will WR, Petrini E, Theodorou D, Harris RR, Crawford MR, Paszkiewicz K, Krueger F, Correra RM, Vetter AT et al. 2012. SWI/SNF-like chromatin remodeling factor Fun30 supports point centromere function in S. cerevisiae. PLoS Genet 8: e1002974.

      Foss EJ, Gatbonton-Schwager T, Thiesen AH, Taylor E, Soriano R, Lao U, MacAlpine DM, Bedalov A. 2019. Sir2 suppresses transcription-mediated displacement of Mcm2-7 replicative helicases at the ribosomal DNA repeats. PLoS Genet 15: e1008138.

      He Y, Petrie MV, Zhang H, Peace JM, Aparicio OM. 2022. Rpd3 regulates single-copy origins independently of the rDNA array by opposing Fkh1-mediated origin stimulation. Proc Natl Acad Sci U S A 119: e2212134119.

      Kwan EX, Alvino GM, Lynch KL, Levan PF, Amemiya HM, Wang XS, Johnson SA, Sanchez JC, Miller MA, Croy M et al. 2023. Ribosomal DNA replication time coordinates completion of genome replication and anaphase in yeast. Cell Rep 42: 112161.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes soluble Uric Acid (sUA) as an endogenous inhibitor of CD38, affecting CD38 activity and NAD+ levels both in vitro and in vivo. Importantly, the inhibition constants calculated support the claim that sUA inhibits CD38 under physiological conditions. These findings are of extreme importance to understanding the regulation of an enzyme that has been shown to be the main NAD+/NMN-degrading enzyme in mammals, which impacts several metabolic processes and has major implications for understanding aging diseases. The manuscript is well written, the figures are self-explanatory, and in the experiments presented, the data is very solid. The authors discuss the main limitations of the study, especially in regard to the in vivo results. As a whole, I believe that this is a very interesting manuscript that will be appreciated by the scientific community and that opens a lot of new questions in the field of metabolism and aging. I found some issues that I believe constitute a weakness in the manuscript, and although they do not require new experiments, they may be considered by the authors for discussion in the final version of the manuscript.

      We greatly appreciate the reviewer’s thoughtful comments and favorable review of our work.

      The authors acknowledge the existence of several previous papers involving pharmacological inhibition of CD38 and their impact on several models of metabolism and aging. However, they only cite reviews. Given the focus of the manuscript, I believe that the seminal original papers should be cited.

      Yes, we agree with the reviewer. Two representative papers reporting pioneering findings on pharmacological inhibition of CD38 [Ref 1, 2] are now cited in the discussion of the current manuscript.

      (1) Tarragó, M. G., et al. (2018). A Potent and Specific CD38 Inhibitor Ameliorates Age-Related Metabolic Dysfunction by Reversing Tissue NAD+ Decline. Cell Metab 27(5): 1081-1095.e10.

      (2) Escande, C., et al. (2013). Flavonoid apigenin is an inhibitor of the NAD+ ase CD38: implications for cellular NAD+ metabolism, protein acetylation, and treatment of metabolic syndrome. Diabetes 62(4): 1084-1093.

      Related to the previous comment, the authors show that they have identified the functional group on sUA that inhibits CD38, 1,3-dihydroimidazol-2-one. How does this group relate with previous structures that were shown to inhibit CD38 and do not have this chemical structure? Is sUA inhibiting CD38 in a different site? A crystallographic structure of CD38-78c is available in PDB that could be used to study or model these interactions.

      Currently, there are several kinds of CD38 inhibitors, including NAD+/NMN analogs, flavonoids, 4-quinolines, etc. [Ref 1], but they do not have 1,3-dihydroimidazol-2-one or similar groups. We also noticed that sUA and its analogs have no remarkable structural similarity with these inhibitors. We attempted to identify the binding sites of sUA on CD38 by NMR. Since our NMR method required a large amount of sample, we had to prepare recombinant human CD38 using a cell-free protein synthesis system. However, the resulting CD38 protein showed a lower Vmax than commercial recombinant CD38 expressed in HEK293 cells, raising the concern that the synthesized CD38 differs in spatial conformation. Thus, we were unable to obtain convincing data to confirm whether sUA has different binding sites. Given the differences in structural features and inhibition type, we did not use the PDB data on the 78c-CD38 interaction for analysis in this study.

      (1) Chini, E. N., et al. (2018). The Pharmacology of CD38/NADase: An Emerging Target in Cancer and Diseases of Aging. Trends Pharmacol Sci 39(4): 424-436.

      Although the mouse model used to manipulate sUA levels is not ideal, the authors discuss its limitations, and importantly, they have CD38 KO mice as control. However, all the experiments were performed in very young mice, where CD38 expression is low in most tissues (10.1016/j.cmet.2016.05.006). This point should be mentioned in the discussion and maybe put in the context of variations of sUA levels during aging.

      We appreciate the reviewer’s kind suggestions. Yes, CD38 expression in young mice is relatively low, and we used young mice in this study; thus, aged mice would be a promising model in which to further evaluate the interaction between CD38 and sUA. Regarding the changes in sUA levels during aging, previous reports indicate that sUA levels seem to increase with age in mice and humans [Ref 1, 2]. We speculate that this increase is a physiologically compensatory response to aging in organisms. Accordingly, we added more details in the discussion (second paragraph).

      (1) Iwama, M., et al. (2012). Uric acid levels in tissues and plasma of mice during aging. Biol Pharm Bull 35(8): 1367-1370.

      (2) Kuzuya, M., et al. (2002). Effect of aging on serum uric acid levels: longitudinal changes in a large Japanese population group. J Gerontol A Biol Sci Med Sci 57(10): M660-664.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting work where Wen et al. aimed to shed light on the mechanisms driving the protective role of soluble uric acid (sUA) toward avoiding excessive inflammation. They present biochemical data to support that sUA inhibits the enzymatic activity of CD38 (Figures 1 and 2). In a mouse model of acute response to sUA and using mice deficient in CD38, they find evidence that sUA increases the plasma levels of nicotinamide nucleotides (NAD+ and NMN) (Figure 3) and that sUA reduces the plasma levels of inflammasome-driven cytokines IL-1b and IL-18 in response to endotoxin, both dependent on CD38 (Figure 4). Their work is an important advance in the understanding of the physiological role of sUA, with mechanistic insight that can have important clinical implications.

      Strengths:

      The authors present evidence from different approaches to support that sUA inhibits CD38, impacts NAD+ levels, and regulates inflammatory responses through CD38.

      We deeply thank the reviewer for the thoughtful comments and appreciation of our findings.

      Weaknesses:

      The authors investigate macrophages as the cells impacted by sUA to promote immunoregulation, proposing that inflammasome inhibition occurs through NAD+ accumulation and sirtuin activity due to sUA inhibition of CD38. Unfortunately, the study still lacks data to support this model, as they could not replicate their in vivo findings using murine bone marrow-derived macrophages, a standard model to assess inflammasome activation. Without an alternative approach, the study lacks data to establish in vitro that sUA inhibition of CD38 reduces inflammasome activation in macrophages - consequently, they cannot determine yet if both NAD+ accumulation and sirtuin activity in macrophages is a mechanism leading to sUA role in vivo.

      We deeply thank the reviewer for pointing out this weakness in our work. In fact, we tried to prepare stable CD38 KD/KO THP-1 cells in mid-2021; however, we faced some technical problems due to the limitations of our instruments. Thus, we used CD38 KO mice to prepare CD38 KO BMDMs; as shown in the first version of the manuscript, we failed to replicate the results in BMDMs because of the low uptake of sUA. To address the reviewer’s concern regarding the lack of an in vitro link between CD38 and sUA immunosuppression, we used 78c, a highly specific and potent inhibitor of CD38, to block CD38 in primed THP-1 cells. We then evaluated the effect of sUA pre-incubation on MSU crystal-induced IL-1β release in primed THP-1 cells (vehicle and CD38 blockade). The added results in Figure 4-figure supplement 2B and 2C indicated that CD38 blockade largely impaired the immunosuppressive effect of sUA without reducing sUA uptake. In addition, we found that sUA at physiological levels boosted NAD+ levels in THP-1 cells (Figure 3-figure supplement 1B) without affecting the activities of other key enzymes involved in NAD+ synthesis and degradation, including NAMPT and PARP (Figure 3-figure supplement 2). All these results support the conclusion that CD38 is a key mediator through which sUA at physiological levels regulates inflammasome activation in vitro.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors propose that soluble Uric acid (sUA) is an enzymatic inhibitor of the NADase CD38 and that it controls levels of NAD, modulating the inflammatory response. Although interesting, the studies are at this stage preliminary and validation is needed.

      Strengths:

      The study characterizes the potential relevance of sUA in NAD metabolism.

      We greatly appreciate the reviewer’s thoughtful comments and valuable suggestions.

      Weaknesses:

      (1) A full characterization of the effect of sUA in other NAD-consuming and synthesizing enzymes is needed to validate the statement that the mechanism of regulation of NAD by sUA is mediated by CD38. The CD38 KO may not serve as the ideal control since it may saturate NAD levels already. Analysis of multiple tissues is needed.

      Yes, it is necessary to confirm whether sUA affects other NAD+-consuming and synthesizing enzymes. To address the concern and to provide additional validation, we tested the direct effects of sUA and other purine derivatives on the activities of two other key enzymes in the metabolic network of NAD+: PARP (an NAD+-consuming enzyme) and NAMPT (an NAD+-synthesizing enzyme). The added results in Figure 3-figure supplement 2 showed that sUA has no effect on PARP and NAMPT activity, suggesting that CD38 is a main target for sUA in regulating NAD+ availability. In addition, we confirmed that neither PARP nor NAMPT was affected by purine metabolism under physiological conditions. Although hypoxanthine and xanthine, at 500 μM (supraphysiological levels), slightly inhibited PARP activity, this has no physiological significance because their physiological concentrations are low (generally below 20 μM). Further evaluation of these inhibitory effects under pathological conditions would be of interest but was beyond the focus of this study.

      Given that tissue sUA uptake is saturated under physiological conditions (tissue sUA did not increase in our models, Figure 3-figure supplement 5A and 5B), CD38 and other potential targets in tissues may not be affected by sUA in our models. We used CD38 KO mice to confirm whether sUA interacts with other targets to regulate NAD+ degradation and inflammatory responses. A previous study [Ref 1] revealed that inhibition of other enzymes involved in NAD+ metabolism, such as PARP, resulted in a significant increase of NAD+ availability in CD38 KO mice, which indicates that CD38 KO mice can be used to exclude the potential interaction between sUA and other targets. In fact, we did not observe significant effects of sUA in CD38 KO mice. More importantly, we added the additional validation regarding PARP and NAMPT activity according to the reviewer’s kind suggestion, which further confirmed that CD38 is the main target for sUA in our models.

      (1) Tarragó, M. G., et al. (2018). A Potent and Specific CD38 Inhibitor Ameliorates Age-Related Metabolic Dysfunction by Reversing Tissue NAD+ Decline. Cell Metab 27(5): 1081-1095.e10.

      (2) The physiological role of sUA as an endogenous inhibitor of CD38 needs stronger validation (sUA deficient model?).

      We thank the reviewer for the insightful suggestions. Yes, an sUA depletion model is ideal for further validation, as we discussed in the limitations of this study. Given that introducing exogenous recombinant uricase to deplete sUA may affect immunometabolism and is therefore not ideal for evaluation under physiological conditions, uricase-transgenic mice would be a promising model. However, we currently have no uricase-transgenic mice, and we are unable to prepare CD38 KO/uricase-transgenic mice for additional validation within a reasonable time. In the first version of the manuscript, therefore, we used an sUA-release model in sUA-supplementation mice as a further validation in Figure 3E.

      (3) Flux studies would also be necessary to make the conclusion stronger.

      We highly appreciate the reviewer’s suggestion regarding metabolic flux analysis. Yes, flux analysis using specifically designed isotope-labeled NAD+ is an ideal validation in mice, as it can track any sUA-induced changes in NAD+ metabolism. However, we are unable to synthesize or obtain suitable isotope-labeled substrates for in vivo validation due to technical and financial constraints.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is very solid and very well-presented and discussed. In my opinion, the only weakness in the writing is that the message about finding an endogenous regulator of CD38 activity and NAD levels gets blurred by the temptation of jumping into the potential of developing new pharmacological CD38 inhibitors based on sUA structure. My recommendation would be to focus on delivering a clear message about sUA as a physiological inhibitor of CD38, and the possible implications for understanding the onset and evolution of metabolic diseases and aging. Maybe leave the potential of developing novel sUA-based CD38 inhibitors for a final comment. I understand this last point is very attractive, but there are very potent pharmacological CD38 inhibitors already available with promising results.

      We greatly appreciate the reviewer’s valuable suggestions. Yes, there are some promising CD38 inhibitors with nanomolar Ki, such as 78c. To focus clearly on sUA as a physiological inhibitor of CD38, we simplified the description in the manuscript and kept only the discussion of the functional group. Since the data regarding the development of CD38 inhibitors in our manuscript remain limited, we did not put them in a separate section. We believe the simplified information is still sufficient for medicinal chemists who are interested in the development of CD38 inhibitors.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) The authors present several pieces of data to explain why there is no impact of sUA in inflammasome-mediated cytokine secretion, which is convincing and supports that BMDM will not be a model of choice to establish macrophages as the target of sUA. However, their data with THP-1 cells is compelling and could be further explored using shRNA, or CRISPR approaches to deplete CD38, thus establishing a mechanistic link in vitro.

      We greatly appreciate the reviewer’s thoughtful comments. We added more results as mentioned before (please see our response to public review).

      (2) There is a severe lack of linearity in how the figures are presented in the main text, making it difficult to read through the manuscript. The figures should be presented in the order they appear in the panels.

      We thank the reviewer for pointing out this issue. We improved the figure assembly according to eLife guidelines.

      Minor comments:

      (1) The authors do not appropriately address the finding that CD38 impacts the secretion of IL-1b in BMDM (Figure S6B) and in vivo (Figure 4), possibly independently of sUA.

      Yes, the regulatory effect of CD38 on cytokine release seems independent of sUA. In fact, we used CD38 KO BMDMs to validate the role of CD38 in inflammasome activation. We showed that sUA levels are comparable between WT and KO mice, suggesting that CD38 KO does not affect the baseline or boosted levels of sUA in our models. In this situation, we were able to evaluate the immunosuppressive effects of sUA at the same physiological levels in WT and CD38 KO mice, thus providing evidence that sUA at physiological levels limits excessive inflammation via CD38.

      (2) Figure 3F: the legend on the x-axis lacks the indication of which groups were treated with recombinant hCD38.

      We appreciate the reviewer’s comments. We improved Figure 3 by adding more information.

      (3) While the results on panels 3 and 4 provide robust evidence that sUA is anti-inflammatory through CD38, the title of the figures extrapolates their findings (i.e., no data shows CD38-sUA, either in vitro or in vivo).

      We appreciate the reviewer’s kind suggestion. We provided data to support the direct interaction between CD38 and sUA in Figure 3F; we acknowledge that Figure 4 does not show data on the direct interaction in mice. To help readers easily track the results, however, we used a conclusion-like title.

      (4) The introduction could briefly mention NMN and CD38 activity as an ecto-enzyme to facilitate the understanding of their findings by a general audience, especially the dosing of NMN and their data on BMDM.

      We added more description of CD38 and NMN to the introduction. Once again, we deeply thank the reviewer for the valuable suggestions.

      Reviewer #3 (Recommendations For The Authors):

      (1) A full characterization of the effect of sUA in other NAD-consuming and synthesizing enzymes is needed to validate the statement that the mechanism of regulation of NAD by sUA is mediated by CD38. The CD38 KO may not serve as the ideal control since it may saturate NAD levels already. Analysis of multiple tissues is needed.

      We greatly appreciate the reviewer’s valuable suggestions. We added more results to validate CD38 as the main target of sUA in our models. Please see our response to public review.

      (2) The physiological role of sUA as an endogenous inhibitor of CD38 needs stronger validation (sUA deficient model?).

      We greatly appreciate the reviewer’s valuable suggestions. Please see our response to public review.

      (3) Flux studies would also be necessary to make the conclusion stronger.

      We greatly appreciate the reviewer’s valuable suggestions. Please see our response to public review.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In previous work, the authors described necrosis-induced apoptosis (NiA) as a consequence of induced necrosis. Specifically, experimentally induced necrosis in the distal pouch of larval wing imaginal discs triggers NiA in the lateral pouch. In this manuscript, the authors confirmed this observation and found that while necrosis can kill all areas of the disc, NiA is limited to the pouch and to some extent to the notum, but is excluded from the hinge region. Interestingly and unexpectedly, signaling by the Jak/Stat and Wg pathways inhibits NiA. Further characterization of NiA by the authors reveals that NiA also triggers regenerative proliferation which can last up to 64 hours following necrosis induction. This regenerative response to necrosis is significantly stronger compared to discs ablated by apoptosis. Furthermore, the regenerative proliferation induced by necrosis is dependent on the apoptotic pathway because RNAi targeting the RHG genes is sufficient to block proliferation. However, NiA does not promote proliferation through the previously described apoptosis-induced proliferation (AiP) pathway, although cells at the wound edge undergo AiP. Further examination of the caspase levels in NiA cells allowed the authors to group these cells into two clusters: some cells (NiA) undergo apoptosis and are removed, while others referred to as Necrosis-induced Caspase Positive (NiCP) cells survive despite caspase activity. It is the NiCP cells that repair cellular damage including DNA damage and that promote regenerative proliferation. Caspase sensors demonstrate that both groups of cells have initiator caspase activity, while only the NiA cells contain effector caspase activity. Under certain conditions, the authors were also able to visualize effector caspase activity in NiCP cells, but the level was low, likely below the threshold for apoptosis. 
      Finally, the authors found that loss of the initiator caspase Dronc blocks regenerative proliferation, while inhibiting effector caspases by expression of p35 does not, suggesting that Dronc can induce regenerative proliferation following necrosis in a non-apoptotic manner. This last finding is very interesting as it implies that Dronc can induce proliferation in at least two ways in addition to its requirement in AiP.

      Strengths:

      This is a very interesting manuscript. The authors demonstrate that epithelial tissue that contains a significant number of necrotic cells is able to regenerate. This regenerative response is dependent on the apoptotic pathway which is induced at a distance from the necrotic cells. Although regenerative proliferation following necrosis requires the initiator caspase Dronc, Dronc does not induce a classical AiP response for this type of regenerative response. In future work, it will be very interesting to dissect this regenerative response pathway genetically.

      Weaknesses:

      No weaknesses were identified.

      We thank the reviewer for their positive evaluation and kind words.

      Reviewer #2 (Public Review):

      Summary / Strengths:

      In this manuscript, Klemm et al., build on past published findings (Klemm et al., 2021) to characterize caspase activation in distal cells following necrotic tissue damage within the Drosophila wing imaginal disc. Previously in Klemm et al., 2021, the authors describe necrosis-induced-apoptosis (NiA) following the development of a genetic system to study necrosis that is caused by the expression of a constitutively active GluR1 (Glutamate/Ca2+ channel), and they discovered that the appearance of NiA cells was important for promoting regeneration.

      In this manuscript, the authors aim to investigate how tissues regenerate following necrotic cell death. They find that (1) the cells of the wing pouch are more likely to have non-autonomous caspase activation than other regions within the wing imaginal disc (hinge and notum); (2) two signaling pathways that are known to be upregulated during regeneration, Wnt (wingless) and JAK/Stat signaling, act to prevent additional NiA in pouch cells, and may explain the region specificity; (3) the presence of NiA cells promotes regenerative proliferation in late stages of regeneration; (4) not all caspase-positive cells are cleared from the epithelium (these cells are then referred to as Necrosis-induced Caspase Positive (NiCP) cells); (5) these NiCP cells continue to live and promote proliferation in adjacent cells; and (6) the caspase Dronc is important for creating NiA/NiCP cells and for these cells to promote proliferation. Animals heterozygous for a Dronc null allele show a decrease in regeneration following necrotic tissue damage.

      The study has the potential to be broadly interesting due to the insights into how tissues differentially respond to necrosis as compared to apoptosis to promote regeneration.

      Weaknesses:

      However, here are some of my current concerns for the manuscript in its current version:

      The presence of cells with activated caspase that don't die (NiCP cells) is an interesting biological phenomenon but is not described until Figure 5. How does the existence of NiCP cells impact the earlier findings presented? Is late proliferation due to NiA, NiCP, or both? Do Wg and JAK/STAT signaling act to prevent the formation of both NiA and NiCP cells or only NiA cells? Moreover, the authors are able to specifically manipulate the wound edge (WE) and lateral pouch cells (LP), but don't show how these manipulations within these distinct populations impact regeneration. The authors provide evidence that driving UAS-mir(RHG) throughout the pouch, in the LP or the WE all decrease the amount of NiA/NiCP in Figure 3G-O, but no data on final regenerative outcomes for these manipulations are presented (such as those presented for Dronc-/+ in Fig 7M). The manuscript would be greatly enhanced by quantification of more of the findings, especially in describing if the specific manipulations that impacted NiA/NiCP cells disrupt end-point regeneration phenotypes.

      We thank the reviewer for their assessment and helpful suggestions to improve the manuscript. Regarding the presence of NiA and NiCP cells, and the proportion of each within a regenerating wing disc, unfortunately we are currently limited in our ability to distinguish each type of cell using available tools. This applies to both visualizing these cells via anti-cDcp-1 staining or the activity of GC3Ai, DBS-GFP and CasExpress, and detecting their function via their influence on proliferation. As such, although the identification of NiCP does not change any of the conclusions prior to Figure 5 in which NiCP are described, we are currently unable to comment on the contribution of NiA versus NiCP to late proliferation, or whether they are differently affected by Wg and JAK/STAT signaling. This issue is touched on in the discussion, but we will expand upon our commentary to better highlight these issues.

      With respect to the reviewer’s suggestion to include evidence on whether blocking NiA/NiCP influences final regenerative outcomes, these data were published in our first paper on this work (Klemm et al., 2021, PMID: 34740246), which we will gladly reiterate in this work.

      Finally, we agree that further quantification of our findings will strengthen the work, which is also suggested by Reviewer 3, and plan to add it where possible in a revised manuscript.

      How long does apoptosis take within the wing disc epithelium? How many of the caspase(+) cells are present for the whole 48 hours of regeneration? Are new cells also induced to activate caspase during this time window? The authors presented a number of interesting experiments characterizing the NiCP cells. For the caspase sensor GC3Ai experiments in Figure 5, is there a way to differentiate between cells that have maintained fluorescent GC3Ai from cells that have newly activated caspase? What is the timeline for when NiA and NiCP are specified? In addition, what fraction of NiCP cells contribute to the regenerated epithelium? Additional information about the temporal dynamics of NiA and NiCP specification/commitment would be greatly appreciated.

      Regarding the timing of apoptosis, Schott et al., 2017 (PMID:28870988) demonstrated that apoptotic GC3Ai-labeled cells in imaginal discs are extruded within 1 hr of labeling, the kinetics of which agree with previously published work on the temporal dynamics of apoptotic cell extrusion by Monier et al., 2015 (PMID:25607361). This is much faster than the continued labeling that we observe up to 64 hr post necrosis. We will include this information alongside a quantification of the percent of the wing pouch with GC3Ai-positive cells over time to better address whether the GC3Ai signal is maintained over time or if newly activated caspases account for the signal in late regenerating discs. We plan to include PH3 staining to distinguish between cells that have activated GC3Ai and are proliferating versus new caspase activity. Additionally, we plan to include new experimental evidence to evaluate the timing of GC3Ai-labelled apoptotic cell loss in our system.

      The question of when NiA/NiCP are specified is difficult to address because we cannot easily distinguish between these cell types. We previously attempted to answer this particular question, and also to determine what fraction of these cells contribute to the regenerated epithelium, using caspase-based lineage tracing with CasExpress. However, as shown in the paper, we are unable to label NiA/NiCP with CasExpress, owing either to an insufficient level of caspase activity or to its subcellular localization. We are currently attempting to combine other caspase reporters with lineage tracing tools and examine late-stage wing discs to address these questions.

      The notum also does not express developmental JAK/STAT, yet little NiA was observed within the notum. Do the authors have any additional insights into the differential response between the pouch and notum? What makes the pouch unique? Are NiA/NiCP cells created within other imaginal discs and other tissues? Are they similarly important for regenerative responses in other contexts?

      As noted by Martin et al., 2017 (PMID:28935711), Martin & Morata, 2018 (PMID:29938762), and our own observations in Harris et al., 2016 (PMID:26840050), the notum does not respond to damage in a way that leads to regeneration, while the pouch does. As NiA/NiCP are a pro-regenerative response, we speculate that this intrinsic difference in regenerative capacity that is potentially caused by a different proliferative and genetic response to injury may account for the disparity in NiA/NiCP occurrence in the pouch vs the notum. A difference in the presence of the (currently unidentified) DAMPs or PRRs in notum vs pouch cells may also be responsible. Alternatively, since the hinge tissue is also refractory to NiA/NiCP due to the presence of genetic factors such as Wg and JAK/STAT, there may be an analogous pathway present in notum cells that acts to protect against the induction of pro-apoptotic factors. Indeed, caspase 3 activation does not seem to occur upon ablation of the notum (Bergantinos et al. 2010, PMID:20215351). We plan to add these points to the discussion.

      Regarding the existence of NiA/NiCP in other contexts, we have additional data stemming from our clonal patch experiments (Figure S1) that demonstrates this phenomenon occurs in other imaginal discs, which we plan to include in the revised manuscript.

      Reviewer #3 (Public Review):

      The manuscript "Regeneration following tissue necrosis is mediated by non-apoptotic caspase activity" by Klemm et al. is an exploration of what happens to a group of cells that experience caspase activation after necrosis occurs some distance away from the cells of interest. These experiments have been conducted in the Drosophila wing imaginal disc, which has been used extensively to study the response of a developing epithelium to damage and stress. The authors revise and refine their earlier discovery of apoptosis initiated by necrosis, here showing that many of those presumed apoptotic cells do not complete apoptosis. Thus, the most interesting aspect of the paper is the characterization of a group of cells that experience mild caspase activation in response to an unknown signal, followed by some effector caspase activation and DNA damage, but that then recover from the DNA damage, avoid apoptosis, and proliferate instead. Many questions remain unanswered, including the signal that stimulates the mild caspase activation, and the mechanism through which this activation stimulates enhanced proliferation.

      The authors should consider answering additional questions, clarifying some points, and making some minor corrections:

      Major concerns affecting the interpretation of experimental results:

      Expression of STAT92E RNAi had no apparent effect on the ability of hinge cells to undergo NiA, leading the authors to conclude that other protective signals must exist. However, the authors have not shown that this STAT92E RNAi is capable of eliminating JAK/STAT signaling in the hinge under these experimental conditions. Using a reporter for JAK/STAT signaling, such as the STAT-GFP, as a readout would confirm the reduction or elimination of signaling. This confirmation would be necessary to support the negative result as presented.

      We thank the reviewer for their assessment and helpful suggestions to improve the manuscript. Although the knockdown of Stat92E using this RNAi line has been shown to produce phenotypes associated with loss of JAK/STAT signaling in previous papers (Monahan and Starz-Gaiano, 2014, 2016 PMID:26277564, 26993259), we agree it would be useful to demonstrate this in our hands and therefore plan to include these data.

      Similarly, the authors should confirm that the Zfh2 RNAi is reducing or eliminating Zfh2 levels in the hinge under these experimental conditions, before concluding that Zfh2 does not play a role in stopping hinge cells from undergoing NiA.

      We attempted to demonstrate the loss of Zfh2 using this RNAi line, but as noted by the reviewer the antibody staining appears mostly unchanged. A reduction in Zfh2 protein levels by this RNAi has previously been demonstrated in cells of the gut (Rojas Villa et al., 2019, PMID: 31841513), suggesting that the persistent Zfh2 staining we see could be due to perdurance of the Zfh2 protein, high levels of expression or high sensitivity of the Zfh2 antibody (or a combination of these factors). We plan to repeat the experiment using a longer knockdown duration prior to ablation to show a change in Zfh2, and/or test alternative RNAi lines. In the absence of these data, we will alter our conclusions to state that Zfh2 cannot be ruled out as playing a role in preventing NiA/NiCP formation in the hinge.

      EdU incorporation was quantified by measuring the fluorescence intensity of the pouch and normalizing it to the fluorescence intensity of the whole disc. However, the images show that EdU fluorescence intensity of other regions of the disc, especially the notum, varied substantially when comparing the different genetic backgrounds (for example, note the substantially reduced EdU in the notum of Figure 3 B' and B'). Indeed, it has been shown that tissue damage can lead to suppression of proliferation in the notum and elsewhere in the disc, unless the signaling that induces the suppression is altered. Therefore, the normalization may be skewing the results because the notum EdU is not consistent across samples, possibly because the damage-induced suppression of proliferation in the notum is different across the different genetic backgrounds.

      We agree with the reviewer that the use of EdU cannot distinguish between an increase in proliferation in the pouch versus a decrease in proliferation of the notum (or a combination of the two), since EdU incorporation by its nature is a relative rather than absolute measure of proliferation. However, we believe that the important finding is that a localized change in proliferation is observed late in necrosis, which is dependent on NiA/NiCP since blocking the formation of these cells prevents this change. While it is possible that this observed change is caused by a reduction in proliferation of the notum, with little or even no alteration in the pouch, this would imply that NiA/NiCP act to non-autonomously limit the proliferation of cells far from where they appear in the pouch, rather than causing localized proliferation in the immediately surrounding tissue that is representative of a blastema. Although we cannot rule this possibility out, our use of a different marker for proliferation in this work (fluorescent E2F) and a more objective proliferation marker, PH3, (Klemm et al., 2021, PMID: 34740246) agree with our observations made using EdU and suggest the formation of a localized blastema in the pouch. To attempt to address this issue, due to the variability of EdU staining between samples, we aim to quantify changes in EdU that are normalized to undamaged discs stained and mounted in the same sample, thus allowing a more objective background level of proliferation to be used for comparison.
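
      The normalization concern discussed above can be illustrated with a small arithmetic sketch. All intensity values and function names below are hypothetical and chosen only for illustration; they are not measurements from the study. The sketch shows that a pouch-to-whole-disc ratio rises both when pouch signal increases and when signal elsewhere in the disc decreases, whereas a comparison against an undamaged control disc separates the two cases.

```python
def pouch_to_disc_ratio(disc):
    """Pouch intensity divided by whole-disc intensity (pouch + notum + rest)."""
    return disc["pouch"] / (disc["pouch"] + disc["notum"] + disc["rest"])


def pouch_fold_change(disc, control):
    """Pouch intensity relative to the pouch of an undamaged control disc."""
    return disc["pouch"] / control["pouch"]


# Hypothetical fluorescence intensities in arbitrary units (illustration only).
undamaged = {"pouch": 100.0, "notum": 100.0, "rest": 100.0}
pouch_up = {"pouch": 150.0, "notum": 100.0, "rest": 100.0}   # more proliferation in the pouch
notum_down = {"pouch": 100.0, "notum": 50.0, "rest": 50.0}   # less proliferation outside the pouch

ratio_control = pouch_to_disc_ratio(undamaged)
ratio_pouch_up = pouch_to_disc_ratio(pouch_up)
ratio_notum_down = pouch_to_disc_ratio(notum_down)

# Both scenarios raise the within-disc ratio above the control value.
print(ratio_control, ratio_pouch_up, ratio_notum_down)

# Normalizing to the undamaged control distinguishes the two scenarios.
fold_pouch_up = pouch_fold_change(pouch_up, undamaged)
fold_notum_down = pouch_fold_change(notum_down, undamaged)
print(fold_pouch_up, fold_notum_down)
```

      In this toy case the within-disc ratio alone cannot tell the two scenarios apart, which is the motivation for normalizing to undamaged discs processed in the same sample.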

      The authors expressed p35 to attempt to generate "undead cells". They take an absence of mitogen secretion or increased proliferation as evidence that undead cells were not generated. However, there could be undead cells that do not stimulate proliferation non-autonomously, which could be detected by the persistence of caspase activity in cells that do not complete apoptosis. Indeed, expressing p35 and observing sustained effector caspase activation could help answer the later question of what percentage of this cell population would otherwise complete apoptosis (NiA, rescued by p35) vs reverse course and proliferate (NiCP, unaffected by p35).

      While it is very possible that expression of P35 in NiA/NiCP could induce a previously uncharacterized type of undead cell that persists but does not secrete known AiP-related factors, the way in which P35 blocks activity of effector caspases (Drice and Dcp-1) precludes our ability to reliably detect and assay NiA/NiCP over time: P35 inactivates caspases by binding to their catalytic site, which causes cDcp-1 labeling to become weak and diffuse (Klemm et al 2021, PMID: 34740246), likely because the detectable epitope is in the catalytic site (Florentin & Arama, 2012. PMID: 22351928). Similarly, the GC3Ai reporter acts as a substrate for caspases and must be cleaved for fluorescence to occur (Zhang et al., 2013 PMID: 23857461). Thus, co-expressing P35 with GC3Ai actually reduces the number of NiA/NiCP cells labeled by GC3Ai and weakens cDcp-1 staining, preventing us from assaying their persistence or association with proliferative markers.

      It is unclear if the authors' model is that the NiCP cells lead to autonomous or non-autonomous cell proliferation, or both. Could the lineage-tracing experiments and/or the experiments marking mitosis relative to caspase activity answer this question?

      While we see GC3Ai-labeled NiA/NiCP in the same area of the pouch with high levels of proliferation (PH3), we observe a mixture of GC3Ai cells that overlapped the PH3 cells and GC3Ai cells that were adjacent to PH3(+) cells. Thus, we are unable to conclusively say whether proliferation is induced autonomously or non-autonomously. We have attempted to answer this question with lineage tracing, however NiA/NiCP are not labeled by the CasExpress tool, and we were unable to define a relationship between NiA/NiCP and proliferation through lineage tracing. However, we add further explanation of our findings to better clarify the proposed model of NiA/NiCP-induced proliferation.

      Many of the conclusions rely on single images. Quantification of many samples should be included wherever possible.

      As suggested by Reviewers 2 and 3 we plan to strengthen our findings by adding quantification of phenotypes where possible, in particular in Figure 2 as mentioned in the “Recommendations for the authors”.

      Why does the reduction of Dronc appear to affect regenerative growth in females but not males?

      We note that the effect on regenerative growth does appear to be present in males, but that the effect is not significant. We suspect that the lower n for this experiment is the cause, and are addressing this by repeating the experiment to increase the n.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer 1 (Public Review):

      Multiple sclerosis (MS) is a debilitating autoimmune disease that causes loss of myelin in neurons of the central nervous system. MS is characterized by the presence of inflammatory immune cells in several brain regions as well as the brain barriers (meninges). This study aims to understand the local immune hallmarks in regions of the brain parenchyma that are adjacent to the leptomeninges in a mouse model of MS. The leptomeninges are known to be a focus of inflammation in MS and perhaps "bleed" inflammatory cells and molecules to adjacent brain parenchyma regions. To do so, they use novel technology called spatial transcriptomics so that the spatial relationships between the two regions remain intact. The study identifies canonical inflammatory genes and gene sets such as complement and B cells enriched in the parenchyma in close proximity to the leptomeninges in the mouse model of MS but not control. The manuscript is very well written and easy to follow. The results will become a useful resource to others working in the field and can be followed by time series experiments where the same technology can be applied to the different stages of the disease.

      Comments on revised version:

      I agree that the authors successfully addressed most of my comments/critiques. However, the fact that the control mice were not injected with CFA and pertussis toxin is somewhat concerning, because it will be hard to interpret the cause of the transcriptomic readouts described in this study. Some of the described effects might be due to CFA or pertussis (which was used in the EAE but not the "naive" group), and not necessarily to the relapsing-remitting EAE immune features recapitulated in this mouse model. Moreover, this caveat associated with the "naive" control group is not being clearly stated throughout the manuscript and might go unnoticed to readers.

      The authors should clearly state, in the methods section (in the section "Induction of SJL EAE"), that the naive control group was not injected with CFA or pertussis toxin.

      Additionally, this potential confounder, of not using a control group injected with the same CFA and pertussis toxin regimen of the EAE group, should be mentioned in paragraph two of the discussion alongside the other limitations of the study already highlighted by the authors (or in another section of the discussion).

      We thank the reviewer for highlighting this point. Our choice of healthy/naïve, rather than CFA only, controls was intentional, given our desire to sensitively measure genes changing during neuroinflammation. Ultimately, however, we believe the choice of control group had little effect on our conclusions. We would like to note that SJL-EAE does not require pertussis toxin, so the only difference between naïve and CFA only groups is a single injection of CFA 11 weeks prior to experiment endpoint. We have performed additional IHC imaging of naïve and CFA only groups, finding no difference in glial reactivity by MFI measurement of GFAP, IBA1, or CD68 (updated Supplementary Figure 1C–E).

      We have also added sections to the Results and Discussion section to clearly address this point. In the Results: “Since naïve animals were used as controls, we confirmed that CFA alone does not produce lasting glial reactivity or LMI formation. Groups of animals were given CFA only or left naïve. Neither group developed neurologic signs, and after 11 weeks the brains were processed for IHC analysis. There was no evidence of LMI development, and no difference in glial reactivity as measured by GFAP, IBA1, or CD68 intensity (Supplemental Figure 1C–E).” In the Discussion: “Another important consideration in these experiments is our choice of naïve, rather than CFA only, controls. While often used as the control in EAE studies focused on mechanisms of autoimmunity, CFA only can independently induce systemic inflammation. Since this study seeks to describe transcriptomic changes in neuroinflammation more broadly, we chose to use a healthy comparison group to maximize our ability to find genes enriched in neuroinflammation. Ultimately, however, the choice of naïve or CFA only controls is unlikely to have affected our conclusions. SJL-EAE, unlike the more common C57Bl6-EAE, does not require pertussis toxin during the induction. The only difference between naïve and CFA only controls is the subcutaneous CFA delivered at time of immunization (11 weeks prior to experiment endpoint). Indeed, when we compared CFA only and healthy animals at 11 weeks there was no difference in glial reactivity by GFAP, IBA1, or CD68 MFI. There was also no evidence of neurologic symptoms or LMI development in CFA only controls.”

      Reviewer 2 (Public Review):

      Accumulating data suggests that the presence of immune cell infiltrates in the meninges of the multiple sclerosis brain contributes to the tissue damage in the underlying cortical grey matter by the release of inflammatory and cytotoxic factors that diffuse into the brain parenchyma. However, little is known about the identity and direct and indirect effects of these mediators at a molecular level. This study addresses the vital link between an adaptive immune response in the CSF space and the molecular mechanisms of tissue damage that drive clinical progression. In this short report the authors use a spatial transcriptomics approach using Visium Gene Expression technology from 10x Genomics, to identify gene expression signatures in the meninges and the underlying brain parenchyma, and their interrelationship, in the PLP-induced EAE model of MS in the SJL mouse. MRI imaging using a high field strength (11.7T) scanner was used to identify areas of meningeal infiltration for further study. They report, as might be expected, the upregulation of genes associated with the complement cascade, immune cell infiltration, antigen presentation, and astrocyte activation. Pathway analysis revealed the presence of TNF, JAK-STAT and NFkB signaling, amongst others, close to sites of meningeal inflammation in the EAE animals, although the spatial resolution is insufficient to indicate whether this is in the meninges, grey matter, or both.

      UMAP clustering illuminated a major distinct cluster of upregulated genes in the meninges and smaller clusters associated with the grey matter parenchyma underlying the infiltrates. The meningeal cluster contained genes associated with immune cell functions and interactions, cytokine production, and action. The parenchymal clusters included genes and pathways related to glial activation, but also adaptive/B-cell mediated immunity and antigen presentation. This again suggests a technical inability to resolve fully between the compartments as immune cells do not penetrate the pial surface in this model or in MS. Finally, a trajectory analysis based on distance from the meningeal gene cluster successfully demonstrated descending and ascending gradients of gene expression, in particular a decline in pathway enrichment for immune processes with distance from the meninges.

      Comments on revised version:

      The authors have addressed all of my comments regarding the lack of spatial resolution between the grey matter and the overlying meninges and also concerning the difficulties in extrapolating from this mouse model to MS itself.

      I am however very concerned about the lack of the correct control group. Immunization of rodents with complete Freund's adjuvant and pertussis alone gives rise to widespread microglial activation, some immune cell infiltration and also structural changes to axons, particularly at nodes of Ranvier (https://doi.org/10.1097/NEN.0b013e3181f3a5b1). This will inevitably make it difficult to interpret the transcriptomics results, depending on whether these changes are reversible or not and the time frame of the reversal. In the C57Bl6 EAE models adjuvant induced microglial activation becomes chronic, whereas the axonal changes do reverse by 10 weeks. Whether this is the same in SJL EAE model is not clear.

      We thank the reviewer for bringing up this concern regarding control group, which we discussed above in point 1.1. To specifically address reviewer 2’s point regarding microglial activation, we performed IHC analysis comparing naïve and CFA only groups of SJL animals. We found no substantial difference in astrocyte or microglial activation in these animals after 11 weeks, as measured by GFAP, IBA1, and CD68. This new data appears in updated Supplementary Figure 1C–D.

      Recommendations for the authors:

      Both reviewers agree that the revised version has improved and some of their major concerns were adequately addressed. However, both reviewers also agree that critical experimental controls are missing, including the FCA and pertussis toxin injected mice which likely show some degree of inflammation in their brain and are needed to compare your experimental MS group and interpret the transcriptomics data.

      We appreciate both reviewers’ important comments on the control group used in this study. In this revised manuscript we have described our rationale for choosing naïve controls, rather than CFA only, and believe they are the most appropriate comparison group. Additionally, we believe that both CFA only and naïve will have similar degrees of baseline neuroinflammation at the 11-week time point. We apologize for not clarifying before, but pertussis toxin is not used in the SJL-EAE, and therefore the “CFA only” control is much milder in SJL-EAE compared to C57Bl6-EAE. Given that many signs of inflammation resolve by 10 weeks in CFA only with pertussis controls (https://academic.oup.com/jnen/article/69/10/1017/2917071; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10902151/), CFA only without pertussis controls are unlikely to have any substantial remaining neuroinflammation at 11 weeks. To test this, we performed an additional experiment directly comparing naïve and CFA only without pertussis.

      These groups showed similar degrees of glial reactivity.

      Given the costs of repeating a spatial transcriptomic experiment and the inevitable batch effects should we add a group at this point, we have chosen not to add a CFA only control condition to our transcriptomics analysis. However, we believe our added text clarifying the rationale behind control choice and added immunofluorescence data gives readers the appropriate context to accurately interpret our results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their positive and constructive comments on the manuscript.

      We committed in our original rebuttal letter to implement the following revisions to both DGRPool and the corresponding manuscript to address the reviewers’ comments:

      (1) We agree with reviewer #1 that normalizing the data could potentially improve the GWAS results. Thus, for computing the GWAS results, we are now using these two additional options in PLINK2: “--quantile-normalize --variance-standardize”. We assessed the impact of these options on the overall results, which revealed only minor improvements of the results, globally being a bit more stringent. In this direction, we also now filter the top results with a nominal p-value of 0.001 instead of 0.01, also because it provided better results for the new gene set enrichment step.

      (2) We added a Kruskal-Wallis test next to the ANOVA test for assessing the links between the phenotypes and the 6 known covariates, as well as a Shapiro-Wilk test of normality.

      (3) We agree with both reviewers that gene expression information is of interest. As mentioned before, adding gene expression data to the portal would have required extensive work, beyond the current scope of this paper, which primarily focuses on phenotypes and genotype-phenotype associations. Nonetheless, we included more gene-level outlinks to Flybase. Additionally, we now link variants and genes to Flybase's online genome browser, JBrowse. By following the reviewers' suggestions, we aim to guide DGRPool users to potentially informative genes.

      (4) Consistent with the latter point, and in agreement with reviewer #2, we acknowledge that additional tools could enhance DGRPool's functionality and facilitate meta-analyses for users. Therefore, we developed a gene-centric tool that now allows users to query the database based on gene names. Moreover, we integrated ortholog databases into the GWAS results. This feature will enable users to extend Drosophila gene associations to other species if necessary.

      (5) We amended the manuscript to describe all the new tools and features that were developed and implemented. In short, the new features include a new gene-centric page with diverse links (Phenotypes, Genome Browser JBrowse, Orthologs …), a variant-centric page (variant details, and PheWAS), an API for programmatic access to the database, and other statistical outputs and filtering options.

      We will detail these advances in the point-by-point response below and in the revised manuscript.

      Reviewer #1 (Public Review):

      This is a technically sound paper focused on a useful resource around the DGRP phenotypes, which the authors have curated, pooled, and made available through a user-friendly website. This is aimed to be a crowd-sourced resource for this in the future.

      The authors should make sure they coordinate as well as possible with the NC datasets and community and broader fly community. It looks reasonable to me but I am not from that community.

      We thank the reviewer for the positive comments. We will leverage our connections to the fly and DGRP communities to make the resource as valuable as possible. DGRPool in fact already reflects the input of many potential users and was also inspired by key tools on the DGRP2 website. Furthermore, it also rationalizes why we are bridging our results with other resources, such as linking out to Flybase, which is the main resource for the Drosophila community at large.

      I have only one major concern which in a more traditional review setting I would be flagging to the editor to insist the authors did on resubmission. I also have some scene setting and coordination suggestions and some minor textual / analysis considerations.

      The major concern is that the authors do not comment on the distribution of the phenotypes; it is assumed it is a continuous metric and well-behaved - broad gaussian. This is likely to be more true of means and medians per line than individual measurements, but not guaranteed, and there could easily be categorical data in the future. The application of ANOVA tests (of the "covariates") is for example fragile for this.

      The simplest recommendation is in the interface to ensure there is an inverse normalisation (rank and then project on a gaussian) function, and also to comment on this for the existing phenotypes in the analysis (presumably the authors are happy). An alternative is to offer a kruskal test (almost the same thing) on covariates, but note PLINK will also work most robustly on a normalised dataset.

      We thank the reviewer for raising this interesting point. Indeed, we did not comment on the distribution of individual phenotypes due to the underlying variability from one phenotype to another, as suggested by the reviewer. Some distributions appear normal, while others are clearly not normally distributed. This information is 'visible' to users by clicking on any phenotype; DGRPool automatically displays its global distribution if the values are continuous/quantitative. Now, we also provide a Shapiro-Wilk test to assess the normality of the distribution.

      We acknowledge the reviewer's concerns regarding the use of ANOVA tests. However, we want to point out that the ANOVA test is solely conducted to assess whether any of the well-established inversions or symbiont infection status (that, for simplification, we call “covariates” or “known covariates”) are associated with the phenotype of interest. This is merely informational, to help the user understand if their phenotype of interest is associated with a known covariate. But all of these known covariates are put in the model in any case, so PLINK2 will automatically correct for them, whatever is the output of the ANOVA test.

      Still, we amended the manuscript to better explain this, and we added a Kruskal-Wallis test (in addition to the ANOVA test) in the results, so the users can have a better overview of potentially associated known covariates. We added this text on p. 10 of the revised manuscript:

      “The tool further runs a gene set enrichment analysis of the results filtered at p<0.001 to enrich the associated genes to gene ontology terms, and Flybase phenotypes. We also provide an ANOVA and a Kruskal-Wallis test between the phenotype and the six known covariates to uncover potential confounder effects (prior correction), which is displayed as a “warning” table to inform the user about potential associations of the phenotype and any of the six known covariates. It is important to note that these ANOVA and Kruskal tests are conducted for informational purposes only, to assess potential associations between well-established inversions or symbiont infection status and the phenotype of interest. However, all known covariates are included in the model regardless, and PLINK2 will automatically correct for them, irrespective of the results from the ANOVA or Kruskal tests.”

      We also acknowledge in the manuscript (Methods section) that the Kruskal-Wallis test is used for a single factor (independent variables) at a time. This is unlike the ANOVA test that we initially performed, which was handling multiple factors simultaneously (given that it was performed in a multifactorial design). For a more direct comparison with our ANOVA model, we ran separate Kruskal-Wallis tests for each factor, but then we acknowledged its potential limitations compared to our multifactorial ANOVA, since each of these tests treats the factor in question as the only source of variation, not considering other factors. But since the test is not intended for interactions or combined effects of these factors, we deem it to be sufficient.
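
      The one-factor-at-a-time use of the Kruskal-Wallis test described above can be sketched as follows. This is a minimal illustration, not the DGRPool implementation: the covariate names, factor levels, and phenotype values are hypothetical, only the H statistic is computed (no p-value), and no tie correction is applied.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a single factor.

    `groups` maps each factor level to the phenotype values observed at that level.
    Ties receive average ranks; the tie correction factor is omitted for brevity.
    """
    pooled = [v for vals in groups.values() for v in vals]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_sum, start = 0.0, 0
    for vals in groups.values():
        rank_sum = sum(ranks[start:start + len(vals)])
        weighted_sum += rank_sum ** 2 / len(vals)
        start += len(vals)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


# Hypothetical covariates and phenotype values (illustration only).
covariates = {
    "symbiont_status": {"infected": [1.0, 2.0, 3.0], "uninfected": [4.0, 5.0, 6.0]},
    "inversion_A": {"standard": [1.0, 4.0, 5.0], "inverted": [2.0, 3.0, 6.0]},
}

# One separate test per factor: each ignores all other sources of variation.
h_by_factor = {name: kruskal_wallis_h(levels) for name, levels in covariates.items()}
print(h_by_factor)
```

      Because each call sees only one factor, the statistics are not adjusted for the other covariates, which is the limitation relative to the multifactorial ANOVA noted above.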

      Nevertheless, we concur with the reviewer that normalizing the data could potentially enhance GWAS results. Consequently, we have rerun the GWAS analyses using the PLINK2 --quantile-normalize and --variance-standardize options. We have updated all results on the website and also updated the plots in the manuscript, accordingly.
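
      For readers unfamiliar with the "rank and then project on a gaussian" idea raised by the reviewer, the following is a minimal sketch of a rank-based inverse normal transform. It is an illustration under stated assumptions, not PLINK2's exact implementation: the 0.5 rank offset and the handling of ties by average rank are choices made here, and the phenotype values are hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to standard-normal quantiles while preserving their rank order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    std_normal = NormalDist()
    # The 0.5 offset (an assumption of this sketch) keeps quantiles inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical, strongly skewed line means (illustration only).
skewed = [0.1, 0.2, 0.4, 0.9, 25.0]
transformed = inverse_normal_transform(skewed)
print(transformed)
```

      The outlying value 25.0 keeps its position as the largest observation but no longer dominates the scale, which is why such a transform makes downstream association tests less sensitive to non-normal phenotype distributions.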

      Minor points:

      On the introduction, I think the authors would find the extensive set of human GWAS/PheWAS resources useful; widespread examples include the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal. The GWAS Catalog also has summary statistics submission guidelines, and I think where possible meta-data harmonisation should be similar (not a big thing). Of course, DGRP has a very different structure (line and individuals) and of course, raw data can be freely shown, so this is not a one-to-one mapping.

      Thank you for the suggestion. We cited these resources in the Introduction.

      “This aligns with the harmonization effort undertaken by other human GWAS/PheWAS resources, such as the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal, which provide extensive examples of effective data use and accessibility. Although the structure of DGRPool differs from these human databases, we acknowledge the importance of similar meta-data harmonization guidelines. Inspired by the GWAS Catalog's summary statistics submission guidelines, we propose submission guidelines for DGRP phenotyping data in this paper.”

      For some authors coming from a human genetics background, they will be interpreting correlations of phenotypes more in the genetic variant space (eg LD score regression), rather than a more straightforward correlation between DGRP lines of different individuals. I would encourage explaining this difference somewhere.

      We understand that this is a potential issue and we made the distinction clearer in the manuscript to avoid any confusion. We added this text on p.7, at the beginning of the correlation results section:

      “Of note, by “phenotype correlations”, we mean direct phenotype-phenotype correlations, i.e. a straightforward Spearman’s correlation of two phenotypes between common DGRP lines, and we repeated this process for each pair of phenotypes.”
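
      The phenotype-phenotype correlation described in the quoted text can be sketched as below. This is a minimal illustration, not the DGRPool code: the line identifiers and phenotype values are hypothetical, and the function name is introduced here for the example. Each phenotype is represented as a mapping from line identifier to a per-line value, and only lines measured for both phenotypes enter the correlation.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_common_lines(pheno_a, pheno_b):
    """Spearman's rho over the lines present in both phenotypes.

    Returns (rho, number of common lines). Spearman's rho is computed as the
    Pearson correlation of the ranks.
    """
    common = sorted(set(pheno_a) & set(pheno_b))
    if len(common) < 3:
        raise ValueError("need at least 3 common lines")
    ranks_a = average_ranks([pheno_a[line] for line in common])
    ranks_b = average_ranks([pheno_b[line] for line in common])
    mean_a = sum(ranks_a) / len(common)
    mean_b = sum(ranks_b) / len(common)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a phenotype is constant across the common lines")
    return cov / (var_a * var_b) ** 0.5, len(common)


# Hypothetical per-line phenotype means (illustration only).
phenotype_a = {"line_01": 10.0, "line_02": 12.5, "line_03": 9.0, "line_04": 15.0}
phenotype_b = {"line_02": 0.8, "line_03": 0.3, "line_04": 0.9, "line_05": 0.1}

rho, n_common = spearman_common_lines(phenotype_a, phenotype_b)
print(rho, n_common)
```

      Note that this correlates per-line values directly and involves no genotype information, which is the distinction from variant-based approaches such as LD score regression.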

      This leads to an interesting point that the inbred nature of the DGRP allows for both traditional genetic approaches and leveraging the inbred replication; there is something about looking at phenotype correlations through both these lenses, but this is for another paper I suspect that this harmonised pool of data can help.

      We agree with the reviewer and hope that more meta-analyses will be made possible by leveraging the harmonized data that are made available through DGRPool.

      I was surprised the authors did not crunch the number of transcript/gene expression phenotypes and have them in. Is this because this was better done in other datasets? Or too big and annoying on normalisation? I'd explain the rationale to leave these out.

      This is a very good point and is in fact something that we initially wanted to do. However, to render the analysis fair and robust, it would require processing all datasets in the same way. This implies cataloging all existing datasets and processing them through the same pipeline. In addition, it would require adding a “cell type” or “tissue” layer, because gene expression data from whole flies is obviously not directly comparable to gene expression data from specific tissues or even specific conditions. This would be key information as phenotypes are often tissue-dependent. Consequently, and as implied by the reviewer, we deemed this too big of a challenge beyond the scope of the current paper. Nevertheless, we plan to continue investigating this avenue in a potential follow-up paper.

      We still added a gene-centric tool to be able to query the GWAS results by gene. We also added orthologs and Flybase gene-phenotype information, both in this new gene-centric tool and also in all GWAS results.

      I think 25% FDR is dangerously close to "random chance of being wrong". I'd just redo this section at a higher FDR, even if it makes the results less 'exciting'. This is not the point of the paper anyway.

      We agree with the reviewer that this threshold implies a higher risk of false positive results. However, this threshold is not uncommon (Li et al., PLoS Biology, 2008; Bevers et al., Nature Metabolism, 2019; Hwangbo et al., eLife, 2023), and it seems robust enough in our analysis since similar phenotypes are significant in different studies at different FDR thresholds.

      Nevertheless, we revisited these results with a more stringent threshold of 5% FDR in the main Figure 3C. Most of the conclusions were maintained, except for the relation between longevity and “food intake”, as well as “sleep duration”. We modified the manuscript accordingly, notably removing these points from the abstract, and toning down the results section. We kept the 25% FDR results as supplemental information.
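
      To illustrate how the choice between a 5% and a 25% FDR threshold changes which results are called significant, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure, which the response does not name, and the function names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at(pvalues, fdr):
    """Indices of tests whose adjusted p-value is at or below the FDR level."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


# Hypothetical p-values for four phenotype pairs.
example_pvalues = [0.001, 0.02, 0.04, 0.3]
hits_strict = significant_at(example_pvalues, 0.05)   # stricter threshold
hits_lenient = significant_at(example_pvalues, 0.25)  # more permissive threshold
```

      In this toy example the third test is retained only at the 25% level, which mirrors how some associations in the response no longer passed once the threshold was tightened to 5%.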

      I didn't buy the extreme line piece as being informative. Something has to be on the top and bottom of the ranks; the phenotypes are an opportunity for collection and probably have known (as you show) and cryptic correlations. I think you don't need this section at all for the paper and worry it gives an idea of "super normals" or "true wild types" which ... I just don't think is helpful.

      We appreciate the reviewer’s feedback on the section regarding extreme DGRP lines and understand the concern about potential implications of “super normals” or “true wild types.” This section aimed to explore whether specific DGRP lines consistently rank in the extremes of phenotypic measures, particularly those tied to viability-related traits. Our hypothesis was that if particular lines consistently appear at the top or bottom, this might suggest some inherent bias or inbreeding-related weakness that could influence genetic association studies.

      However, in the analyses presented, we did not find support for this phenomenon. Importantly, the observed mild correlation in extremeness across sexes, while not profound, further suggested that this phenomenon is not a consistent population-wide feature.

      Nevertheless, we consider that this message is still important to convey. In response to the reviewer's feedback, we have provided a clearer conclusion of this paper section by adding the following paragraph:

      “In conclusion, this analysis showed that while certain lines exhibit lower longevity or outlier behavior for specific traits, we found no evidence of a general pattern of extremeness across all traits. Therefore, the data do not support the idea of 'super normals' or any other inherently biased lines that could significantly affect genetic studies. “

      I'd say "well-established inversion genotypes and symbiont levels" rather than generic covariates. Covariates could mean anything. You have specific "covariates" which might actually be the causal thing.

      We thank the reviewer for the suggestion. We agree and modified the manuscript accordingly.

      I wouldn't use the adjective tedious about curation. It's a bit of a value judgement and probably places the role of curation in the wrong way. Time-consuming due to lack of standards and best practice?

      We thank the reviewer for the suggestion. We agree and modified the manuscript accordingly, replacing the occurrences with “thorough” and “rigorous”, which better match the originally intended meaning.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, Gardeux et al. provide a web-based tool for curated association mapping results from DGRP studies. The tool lets users view association results for phenotypes and compare mean phenotype ~ phenotype correlations between studies. In the manuscript, the authors provide several example utilities associated with this new resource, including pan-study summary statistics for sex, traits, and loci. They highlight cross-trait correlations by comparing studies focused on longevity with phenotypes such as oxphos and activity.

      Strengths:

      -Considerable efforts were dedicated toward curating the many DGRP studies provided.

      -Available tools to query large DGRP studies are sparse, so new tools are appealing.

      Weaknesses:

      The creation of a tool to query these studies for a more detailed understanding of physiologic outcomes seems underdeveloped. This could be improved by enabling usages such as more comprehensive queries of meta-analyses, molecular information to investigate given genes or pathways, and links to other information such as mouse, rat, or human associations.

      We appreciate the reviewer's kind comments.

      Regarding the tools, we concur with the reviewer that incorporating additional tools could enhance DGRPool and facilitate users in conducting meta-analyses. Therefore, we developed two new tools: a gene-centric tool that enables users to query the database based on gene names, and a variant-centric tool mostly for studying the impact of specific genomic loci on phenotypes. Additionally, in all GWAS results, we added links to ortholog databases, thereby allowing users to extend fly gene associations to other species, if required.

      Furthermore, we added links to the Flybase database, for variants, phenotypes, and genes that are already present in Flybase. We also link out to a 'genome browser-like' view (Flybase’s JBrowse tool) of the GWAS results centered around the affected variants/genes.

      Finally, we now also perform a gene-set enrichment analysis for each GWAS result, both in the Flybase gene-phenotype database and the Gene Ontology (GO) database.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors discuss how currently available DGRP databases are basically data-dump sites and there is a need for integrative queries. Clearly, they spent (and are spending) considerable efforts into curating associations from available studies, so the current resource seems to contain several areas of missed opportunities. The clearest addition would be to integrate gene-level queries. For example, which genes underlie associations to given traits, what other traits map to a specific gene, or multiple genes which map to traits. This absence of integration is somewhat surprising given the lab's previous analyses of eQTL data in the DGRP (https://doi.org/10.1371/journal.pgen.1003055) and readily available additional data (ex. 10.1101/gr.257592.119, Flybase); simple intersections between these at the locus level would provide much deeper molecular support for searching this database.

      The point raised by the reviewer concerning eQTL / transcriptomic data is in fact similar to the one raised by reviewer #1. We strongly agree with both reviewers that incorporating eQTL results in the tool would be very valuable, and this is in fact something that we initially wanted to do. However, to render the analysis fair and robust, it would require re-processing multiple public datasets in the same way. This would imply cataloging all existing datasets and processing them through the same pipeline. In addition, it would require adding a “cell type” or “tissue” layer, because gene expression data from whole flies is obviously not directly comparable to gene expression data from specific tissues or even specific conditions. This would be key information as phenotypes are often tissue-dependent. Consequently, we deemed implementing all these layers too big of a challenge beyond the scope of the current paper, but we plan to continue investigating this avenue in a potential follow-up paper.

      As mentioned before, we still integrated gene-level queries in a new tool, querying genes in the context of GWAS results. We acknowledge that this is not directly related to gene expression, and thus does not yet involve eQTL datasets, but we think that it is a good alternative for now, reinforcing the interpretation of the GWAS results.

      Since this point was raised by both reviewers, we added a discussion about this in the manuscript.

      “We recognize certain limitations of the current web tool, particularly the lack of eQTL or gene expression data integration. Properly integrating DGRP GWAS results with gene expression data in a fair and robust manner would require uniform processing of multiple public datasets, necessitating the cataloging and standardization of all available datasets through a consistent pipeline. Moreover, incorporating a “cell type” or “tissue” layer would be essential, as gene expression data from whole flies is not directly comparable to data from specific tissues or even specific conditions. Since phenotypes are often tissue-dependent, this information is vital. However, implementing these layers presented too big of a challenge and was beyond the scope of this paper. “

      (2) Another area that would help to improve is to provide either a subset or the ability to perform a meta-analysis of the studies proposed to see where phenotype intersections occur, as opposed to examining their correlation structure. For any given trait the PLINK data or association results seem already generated so running together and making them available seems fairly straightforward. This can be done in several ways to highlight the utility (for example w/wo specific covariates from Huang et al., 2014 and/or comparing associations that occur similarly or differently between sexes).

      We are not 100% sure what the reviewer refers to when mentioning “phenotype intersection”, but we interpreted it as a “PheWAS capability”. Currently, in DGRPool, for every variant, there is a PheWAS option, which scans all phenotypes across all studies to see if several phenotypes are impacted by this same variant.

      We tried to make this tool more visible, both in the GWAS section of the website and in the “Check your phenotype” tool, when users are uploading their own data to perform a GWAS. We have also created a “Variants” page, accessible from the top menu, where users can view particular variants and explore the list of phenotypes they are significantly associated with.

      From both result pages, users can download the data table as .tsv files.

      (3) As pointed out by the authors, an advantage of the DGRP is the ease of testing on homozygous backgrounds. For each phenotype queried (or groups of related phenotypes would be of interest too), I imagine that subsetting strains by the response would help to prioritize lines used for follow-up studies. For example, resistant or sensitive lines to a given trait. This is already done in Fig 4C and 4E but should be an available analysis for all traits.

      For all quantitative phenotypes, we show the global distribution by sex, followed by the sorted distribution by DGRP line. Since the data can be directly downloaded from the corresponding plots, resistant and sensitive lines can then be readily identified for all phenotypes.

      (4) To researchers beyond the DGRP community, one feature to consider would be seeing which other associations are conserved across species. While doing this at the phenotype level might be tricky to rename, assigning gene-level associations would make this streamlined. For example, a user could query longevity, subset by candidate gene associations then examine outputs for what is associated with orthologue genes in humans (ex. https://www.ebi.ac.uk/gwas/docs/file-downloads) or other reference panels such as mice and rats.

      In all GWAS results, and in the gene-centric tool, we have added links to ortholog databases. In short, when clicking on a variant, users can see which gene is potentially impacted by this variant (gene-level variant annotation). When clicking on these genes, the user can then open the corresponding, detailed gene page.

      To address the reviewer’s comment, in the gene page, we have added links to two ortholog databases (Flybase and OrthoDB), which enable cross-species association analyses.

      (5) Related to enabling a meta-data analysis, it would be helpful to let users download all PLINK or DGRP tables in one query. This would help others to query all data simultaneously.

      We would like to kindly point out that all phenotyping data can already be downloaded from the front page, which includes the phenotypes, the DGRP lines and the studies’ data and metadata. However, we did not provide the global GWAS results through a single file, because the data is too large. Instead, we provide each GWAS dataset via a unique file, available per phenotype, on the corresponding GWAS result page of this phenotype. This file is filtered for p<0.001, and contains GWAS results (PLINK beta, p and FDR) as well as gene and regulatory annotations.
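
      As a rough sketch of how a user might post-process one of these per-phenotype GWAS files, the snippet below filters a tab-separated table on its p-value column. The column names ("variant", "beta", "p", "fdr", "gene") and the variant identifiers are hypothetical placeholders, not the actual DGRPool file layout, which should be checked against a downloaded file.

```python
import csv
import io


def filter_gwas_rows(tsv_text, p_column="p", max_p=0.001):
    """Keep rows of a tab-separated GWAS table whose p-value is below max_p.

    tsv_text is the file content as a string; p_column is the (assumed)
    name of the p-value column in the header row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if float(row[p_column]) < max_p]


# Hypothetical three-row table with an assumed header.
example_tsv = (
    "variant\tbeta\tp\tfdr\tgene\n"
    "2L_1000_SNP\t0.42\t0.0005\t0.04\tgeneA\n"
    "2R_2000_SNP\t-0.10\t0.02\t0.60\tgeneB\n"
    "3L_3000_SNP\t0.77\t0.00001\t0.002\tgeneC\n"
)
kept = filter_gwas_rows(example_tsv)
kept_variants = [row["variant"] for row in kept]
```

      The same function can apply a stricter cut-off by passing a smaller `max_p`, since the downloadable files are described as already filtered at p<0.001.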

      (6) Following analysis of association data an interesting feature would be to enable users to subset strains for putative LOF variants at a given significant locus. This is commonly done for mouse strains (ex. via MGI).

      The GWAS result table available for each phenotype can be filtered for any variant of interest. We added the capability to filter by variant impact; LOF variants being usually referred to as HIGH impact variants.

      (7) Viewing the locus underlying annotation can also provide helpful information. For example, several nice fly track views are shown in 10.1534/g3.115.018929, which would help users to interpret molecular mechanisms.

      We now link the GWAS results out to Flybase’s JBrowse genome browser.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We are grateful to all three reviewers and editors for their critical comments and suggestions.

      Reviewer #2 (Recommendations For The Authors):

      The authors responded satisfactorily to all my comments and suggestions.

      We thank the reviewer for his time and feedback.

      Reviewer #3 (Recommendations For The Authors):

      Comments for authors:

      The authors have addressed most of the reviewer's concerns. Although no additional data were included to strengthen the manuscript, they have clarified some relevant points, and the manuscript has been updated accordingly. In my view, the current manuscript is well-written and mostly straightforward.

      We thank the reviewer for his time and suggestions. Addressing them has improved the quality of our manuscript.

      After a second revision, I just have a few minor comments (mostly editorial) that should be easy to address.

      (1) Page 16: "The dominant presence of the GRIK1-1 gene was also reported in retinal Off bipolar cells..." Please include reference(s).

      We have now cited the following reference:

      Lindstrom, S.H., Ryan, D.G., Shi, J., DeVries, S.H., 2014. Kainate receptor subunit diversity underlying response diversity in retinal Off bipolar cells. J. Physiol. 592, 1457–1477. https://doi.org/10.1113/jphysiol.2013.265033

      (2) Page 18: "Based on our functional assays, the splice seems to affect the interaction between the receptor and auxiliary proteins". Please remove or tone down this statement; the current data do not support this claim.

      We have revised the sentence as follows: “Based on our functional assays, the splice may possibly affect the interaction between the receptor and auxiliary proteins.”

      (3) Page 24: "cultures ... at 0.5 µg/mL were transfected". In the current context, it is not clear what you mean with 0.5 µg/mL. Please check and correct.

      Thanks for pointing out this error. We have corrected it.

      (4) Page 30. He et al. reference is repeated.

      Thanks. We have fixed it now.

      (5) Figure 3, Panel C: Please incorporate the EC50 value for the red trace into the figure; it appears to be a different data set and, consequently, a different fitting compared with Figure 2C.

      The GluK1-1a data set (red trace) is identical to that in Figure 2c, though it may appear different due to the scale of the X and Y axis. As suggested, we have now included the EC50 value for this data set in Figure 3, panel C.

      (6) Figure legend 4: Please check two minor issues here:

      (a) "Bar graphs... with or without Neto1 protein..." This statement is apparently wrong; Figure 4 does not show the effect of Neto1.

      (b) "The wild type GluK1 splice variant data is the same as from Figure 1.." I think the authors mean Figure 2A instead of Fig. 1. Please check.

      Thanks for pointing out these errors. We have fixed both in the revised manuscript.

      (7) Please check and correct spelling/wording issues in the text. Here are some examples:

      (a) Page 9 " Figure 3G - I, Table2.." (There is no Panel I). 

      Fixed.

      (b) Page 16 "... and is involved in various pathophysiology..." 

      We have revised the sentence as “… and is involved in various pathophysiological conditions”

      (c) Page 19 "The constructs used for this study were HEK293 WT mammalian cells were seeded on..." 

      Fixed. Thanks.

      (d) Page 23 "The immunoblots were probed..." Please check the whole paragraph and correct the issues.

      Fixed. Thanks.

      (e) Page 27 "initially, 1,97,908 particles were picked". Check the value; the same issue occurs in Fig.6 table supplement 1. 

      Thanks. We have now modified the sentence to clarify that for GluK1-1aEM ND-SYM, initially, 1,97,908 particles were picked and subjected to multiple rounds of clean-up using 2D and 3D classification. Finally, 24,531 particles were used for the final 3D reconstruction and refinement.

      (f) Legend Figure 2: Remove "(F)" from the legend. 

      Thanks. Fixed.

      (g) Legend Figure 2-Sup.1: Check/correct spelling issues. 

      Thanks. Fixed.

      (h) Figure 5-figure supplement 1: There is a mistake in panel B: "GFP" label is shown for Gluk1 and Neto2, but the authors mention that the pull-down was done with Anti-His antibodies. Please correct.

      Thanks. The pull-down experiments were done with anti-His for both the blots presented in panels A and B as mentioned in both the figures (right side panels of both A and B). However, for the GluK1 and Neto2 pull downs (panel B), the blots were probed with anti-GFP antibody which would detect both the receptor (as the receptor has both GFP-His8) and Neto2-GFP at their respective sizes. This has been indicated in the figure panel B.

      (8) Related to the point-by-point document:

      Major concern 2: Interpreting the effect of mutants on the regulation by Neto proteins requires knowing how the mutant is affecting the channel properties without Neto. In my view, if the data showing the K368/375/379/382H376-E mutant without Neto is missing (in this case due to low current amplitude), then, the pink bars in Fig. 5 should be removed from the figure. 

      We thank the reviewer for raising this interesting point and agree that it would be valuable to characterize the channel properties of all the mutants individually. However, as mentioned earlier, the functions of some mutant receptors are only rescued, or reliable, measurable currents are detected, when they are co-expressed with Neto proteins. We still believe that comparing wild-type and mutant receptors co-expressed with Neto proteins provides important insights, and therefore, we would like to retain the K368/375/379/382H376-E mutant data in the figure.

      Major concern 4: Figure 6-figure Supplement 8 is not mentioned in the manuscript. It would help to include a proper description in the Results section similar to the answer included in the point-by-point document.

      Figure 6-figure Supplement 8 has already been cited on page 15. We have also cited Figure 6-figure Supplement 9 on the same page and have added the following sentences in the text:

      “A superimposition of GluK1-1aEM (detergent-solubilized or reconstituted in nanodiscs) and GluK1-2a (PDB:7LVT) showed an overall conservation of the structures in the desensitized state. No significant movements were observed at both the ATD and LBD layers of GluK1-1a with respect to GluK1-2a (Figure 6; Figure 6-figure supplement 9).”

      Major concern 5: The ramp/recovery protocol was not included properly in the manuscript; please include the time of the ramp pulse and the time used for the recovery period.

      Detailed ramp and recovery protocols are included in the methods section. The time used for the recovery period was variable and was tuned according to the recovery kinetics. All figures in which representative traces are shown include a scale bar showing the time period of agonist application.

      Minor concern 1: The proposed change was not included in the manuscript; check page 7.

      Thanks for highlighting this error. We have now changed it in the revised manuscript.

      Minor concern 10: The manuscript was not corrected as indicated. Please check.

      Thanks. We have now modified the sentence as follows: “…a reduction was observed for K375/379/382H376-E receptors (1.17 ± 0.28 P=0.3733) compared to wild-type although differences do not reach statistical significance.”

      Minor concern 14: The figure was not corrected as indicated. Please check.

      Thanks for highlighting this error. We have now changed it in the revised manuscript.

      Minor concern 19: I suggest including this briefly in the Discussion section.

      Thanks for the suggestion. We have included the following sentence in the discussion:

      “The differences in observations could be due to variations in experimental conditions, such as the constructs and recording conditions used.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      Given that all mutants tested showed the same degree of activation by PEG400, it seemed possible that PEG400 might be an allosteric activator of WNK1/3 through direct binding interactions. Perhaps PEG400 eliminates CWN1/2 waters by inducing conformational changes so that water loss is an effect not a cause of activation. To address this it would be helpful to comment on whether new electron densities appeared in the X-ray structure of WNK1/SA/PEG400 that might reflect PEG400 interactions with chains A or B.

      We re-evaluated the WNK1/SA/PEG400 electron density looking for non-protein densities larger than water. No new densities were found. However, we do observe a PEG400-destabilizing effect using differential scanning fluorimetry, and have included these data in Figure 2. We conclude that the effects on the water structure and destabilization are due to demands on solvent.

      We have included in the second paragraph of the introduction references to primary literature that advance similar arguments to explain osmolyte induced effects on activity.

      Specifically, Colombo MF, Rau DC, Parsegian VA (1992) Protein solvation in allosteric regulation: a water effect on hemoglobin. Science 256: 655-659 and LiCata VJ, Allewell NM (1997) Functionally linked hydration changes in Escherichia coli aspartate transcarbamylase and its catalytic subunit. Biochemistry 36: 10161-10167.

      It would also be helpful to discuss any experiments that might have been done in previous work to examine the direct binding of glycerol and other osmolytes to WNKs.

      We did not observe PEG400 in WNK1/SA/PEG400 despite effects on the space group and subunit packing. On the other hand, glycerol was observed in WNK1/SA, which was cryoprotected in glycerol (PDB file 6CN9). We have highlighted these differences in the second section of the results. A thorough analysis on the effects of various osmolytes on WNK structure, stability, and activity is a potential future direction.

      The study would benefit from a deeper discussion about how to reconcile the different effects of mutations. For example, wouldn't most or all of the mutations be expected to disrupt the water network, and relieve the proposed autoinhibition? This seemed especially true for some of the residues, like Y420(Y346), D353(D279), and K310(K236), which based on Fig 3 appeared to interact with waters that were removed by PEG400.

      The manuscript has been updated with new data and better discussion of this point. Given the inconsistencies on the effects of mutation in static light scattering (SLS), we addressed the possibility that the reducing agent was not constant across experiments. In a repeated study, including reducing agent (1 mM TCEP), we obtained results on mutant mass more similar to wild-type than in the original experiment. An exception was that two of the mutants were much more monomeric than wild-type. It follows that the network CWN1 stabilizes the inactive dimer. The reduced activity of some of the mutants probably reflects the position of CWN1 and the AL-CL Cluster in the active site, such that mutants can affect substrate binding or catalysis. This is now better discussed both in the data and discussion sections.

      Mutants have a tendency to have complex effects on activity and structure. It was satisfying to find any activating mutants. We point out that we have been careful to present all of our data including mutants that are not easily explained by our models.

      Alternatively, perhaps the waters in CWN2 are more important for maintaining the autoinhibited structure. This possibility would be useful to discuss, and perhaps comment on what may be known about the energetic contributions of bound water towards stabilizing dimers.

      This research focused on the most salient unique feature of WNK1, CWN1. We also identified CWN2. Mutational analysis of CWN2 can’t be done without disrupting the dimer interface, greatly complicating data interpretation.

      It would also be useful to comment on why aggregation of E319Q/A (E314) shouldn't inhibit kinase activity instead of activating it.

      On re-collection of the SLS data in the presence of reducing agent, we saw reduced aggregation. WNK3/D279N and WNK3/E314Q were more monomeric, especially at the higher protein concentration used. WNK3/E314Q is one of the more active mutants.

      The X-ray work was done entirely with WNK1 while the mutational work was done entirely with WNK3. Therefore, a simple explanation for the disconnect between structure and mutations might be that WNK1 and WNK3 differ enough that predictions from the structure of one are not applicable to mutations of the other. It would be helpful to describe past work comparing the structure and regulation of WNK1 and WNK3 that support the assumption of their interchangeability.

      We have responded directly to this concern. We introduced our most interesting amino acid replacement, WNK3/E314A, into WNK1, making WNK1/E388A. Similar trends in chloride inhibition and mutational activation were observed in WNK1 as in WNK3. This supports the assumption of interchangeability of WNK1 and WNK3 that we invoked for practical reasons. As expected, the overall activity of WNK1 is lower than that of WNK3. Overall, the lower activity limited data collection. However, the lower activity did allow us to fit the chloride inhibition data to a kinetic model for WNK1. Panels on WNK1 activity, mutation, and chloride inhibition were added to Figure 5 and to Supplemental data (Table S6).

      Reviewer #2 (Public Review):

      Strengths:

      The most interesting result presented here is that P1 crystals of WNK1 convert to P21 in the presence of PEG400 and still diffract (rather than being destroyed as the crystal contacts change, as one would expect). All of the assays for activity and osmolyte sensing are carried out well.

      Thank you. We have emphasized this point in the Results section with the word “remarkably”.

      Weaknesses:

      The rationale for using WNK3 for the mutagenesis study is that it is more sensitive to osmotic pressure than WNK1. I think that WNK1 would have been a better platform because of the direct correlation to the structural work leading to the hypothesis being tested. All of the crystallographic work is WNK1; it is not logical to jump to WNK3 without other practical considerations.

      This point is addressed in the last comment to Reviewer 1. We added autophosphorylation assay data on our most interesting mutant (WNK3/E314A) in WNK1 (WNK1/E388A). Conversely, we have crystallographic data on uWNK3 (on uWNK3/E314A collected to 3.3 Å). These new data justify the assumption of interchangeability of results obtained for uWNK1 and uWNK3.

      Osmolyte sensing was tested by measuring ATP consumption as a function of PEG400 (Figure 6). Data for the subset of mutants analyzed by this assay showed increasing activity. It is not clear why the same collection of mutant proteins analyzed in the experiments of Figure 5 was not also measured for osmolyte sensing in Figure 6.

      These data are now more complete, having been now collected for all of the WNK3 mutants (now Figure 7).

      The last set of data presented uses light scattering to test whether the WNK3 mutant proteins exhibit quaternary structural changes consistent with the monomer/dimer hypothesis. If they did, one would expect a higher degree of monomer for those that are activated by mutation, and a lower amount of monomer (like wt) for those that are not. Instead, one of the mutant proteins that showed the most chloride inhibition (Y346F) had a quaternary structure similar to the wt protein, and others have similar monomer/dimer mixtures but distinct chloride inhibition profiles (K307A and M301A). I don't see how the light scattering data contribute to this story other than to refute the hypothesis by showing a lack of correlation between quaternary structure, water binding, and activity. This is another reason why the disconnect between WNK1 and WNK3 could be a problem. All of the detailed structural work with WNK1 must be assumed with WNK3; perhaps the light scattering data are contradicting this assumption?

      As noted above, on re-collection of the SLS data in the presence of reducing agent, we saw reduced aggregation and more consistency with our model. Thus, we now feel it is a useful contribution to the manuscript. The table in Supplemental data has been updated.

      Reviewer #1 (Recommendations For The Authors):

      Fig 3D in the PDF manuscript seemed distorted - waters were cut off. Also Fig 2D would benefit from showing the whole molecule, instead of cutting off the top and bottom of the kinase domain.

      We suspect this is a data transfer problem, since we don’t see these truncations.

      Both Figures 2 and 3 have been changed, addressing these concerns and adding new differential scanning fluorimetry data as discussed in reply to Reviewer 1. Figure 2 was simplified by eliminating Figures 2A-2C, and replacing them with a new Figure 2B, the superposition of WNK1/SA/PEG400 (PDB 9D3F) and WNK1/SA (PDB 6CN9).

      In Figure 3, we added a panel highlighting the volume change around CWN1 in the presence of PEG400 (Figure 3C). Hopefully, inappropriate cropping has been eliminated.

      Line 162: Y314F should be Y346F.

      This has been corrected. Thank you.

      Lines 211-213 - these two sentences do not seem to logically go together: "Two hyper-active mutants were discovered, WNK3/E314A, and WNK3/E314Q. These mutants are straightforward to interpret based on our model: the mutated residues support and stabilize inactive dimeric WNK."

      An extensive rewrite has been conducted to address the difference in activity between the more active and the less active mutants, now discussed in two paragraphs and two figures, Figures 5 and 6. The SLS data, re-collected with more reducing agent, have given more consistent results (Supplemental), making the discussion more straightforward (discussed above).

      Reviewer #2 (Recommendations For The Authors)

      I think WNK1 would be a better platform for mutagenesis than WNK3. Or minimally the authors should better justify the switch to WNK3 from WNK1. Analyze the same set of mutants in Figure 5 into Figure 6.

      Again, we have added assay data on uWNK1/E388A, and structural data on uWNK3/E314A.

      I would analyze the same set of mutants in Figures 5 and 6.

      We have analyzed all of the WNK3 mutants in the ADP-Glo assays (Figure 7).

      Will the P21 crystal form grow independently in PEG400?

      Attempts to crystallize WNK1/SA or WNK3/SA or other constructs in PEG400 have been unsuccessful.

      I would also add some context about the role of water in allosteric mechanisms. I know there is a long history in hemoglobin in which specific waters have been associated with the T and R states such as that by Marcio Colombo. There is a relatively recent article in J. Chem. Phys. that would provide good context: Leitner et al., J. Chem. Phys. 152, 240901 (2020).

      Thank you. Good call.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study uses a creative experimental system to directly test Ohno's hypothesis, which describes how and why new genes might evolve by duplication of existing ones. In agreement with existing criticism of Ohno's original idea, the authors present compelling evidence that having two gene copies does not speed up the evolution of a new function as posited by Ohno, but instead leads to the rapid inactivation of one of the copies through the accumulation of mostly deleterious mutations. These findings will be of broad interest to evolutionary biologists and geneticists.

      We thank the editors and the reviewers for their positive feedback concerning our experimental system and for the constructive feedback on how to further improve the manuscript. We have now addressed the reviewer’s comments in a revised version.

      Reviewer #1 (Public Review):

      Overview:

      The authors construct a pair of E. coli populations that differ by a single gene duplication in a selectable fluorescent protein. They then evolve the two populations under differing selective regimes to assess whether the end result of the selective process is a "better" phenotype when starting with duplicated copies. Importantly, their starting duplicated population is structured to avoid the duplication-amplification process often seen in bacterial artificial evolution experiments. They find that while duplication increases robustness and speed of adaptation, it does not result in more highly adapted final states, in contrast to Ohno's hypothesis.

      Major comments:

      This is an excellent study with a very elegant experimental setup that allows a precise examination of the role of duplication in functional evolution, exclusive of other potential mechanisms. My main concern is to clarify some of the arguments relating to Ohno's hypothesis.

      I think my main confusion on first reading the manuscript was in the precise definition of Ohno's hypothesis. I think this confusion was mine and not the authors, but it is likely common and could be addressed.

      Most evolutionary biologists think of gene duplication as making neofunctionalization "easier" by providing functional redundancy and a larger mutational target, such that the evolutionary process of neofunctionalization is faster (as the authors observed). In this framework, the final evolved state might not differ when selection is applied to duplicated copies or a single-copy gene. Ohno's hypothesis, by contrast, argues that there generally exist adaptive conflicts between the ancestral function and the "desired" novel function, such that strong selection on a single-copy gene cannot produce the evolutionary optima that selection on two copies would. This idea is hinted at in the quotation from Ohno in paragraph 2 of the introduction. However, the sentences that follow I don't think reinforce this concept well enough and lead to some confusion.

      With that definition in mind, I agree with the authors' conclusion that these data do not support Ohno's hypothesis. My quibble would be that what is actually shown here is that adaptive conflict in function is not universal: there are cases where a single gene can be optimized for multiple functions just as well as duplicated copies. I do not think the authors have, however, refuted the possibility that such adaptive conflicts are nonetheless a significant barrier to evolutionary innovation in the absence of gene duplication generally. Perhaps just a sentence or two to this effect might be appropriate.

      We fully agree with the reviewer that trade-offs might play an important role in the evolution of single-copy and of duplicated genes, depending on the gene and on the selection regime. And while trade-offs are not likely to play a big role in the selection regime we discuss in detail in the main text (evolution towards more green), they probably are important for at least one of our selection regimes. In fact, we state so in the following passage of the discussion. In addition, we have now added a sentence that acknowledges the importance of trade-offs for evolution in the absence of gene duplication:

      “A single gene encoding such a protein suffers from an adaptive conflict between the two activities. Gene duplication may provide an escape from this adaptive conflict, because each duplicate may specialize on one activity14, 15. For coGFP, a trade-off likely exists for fluorescence in these two colors, because improvement of green fluorescence entails a loss of blue fluorescence during evolution (Figure S8 and Figure S16). We therefore expected that during selection for both green and blue fluorescence, one cogfp copy in double-copy populations would “specialize” on green fluorescence whereas the other copy would specialize on blue fluorescence. However, when we analyzed individual population members with two active gene copies we could not find any such specialization (Figure S21). Moreover, the identified key mutations at positions 147 and 162 have a very low frequency (<1%) in these populations (Figure S15). Future experiments with different selection strategies might reveal the reasons for this observation and the conditions under which such a specialization can occur.”

      I also think the authors need to clarify their approach to normalizing fluorescence between the two populations to control for the higher relative protein expression of the population with a duplicated gene. Since each population was independently selected with the highest fluorescing 60% (or less) of the cells selected, I think this normalization is appropriate. Of course, if the two populations were to compete against each other, this dosage advantage of the duplicates would itself be a selective benefit. Even as it is, the dosage advantage should be a source of purifying selection on the duplication, and perhaps this should be noted.

      The reviewer is correct. To be able to follow the evolutionary trajectories of the different constructs, the populations were treated separately. The gates were adjusted for each library separately to select for the top 60, 1 or 0.01% of cells and the gates for the double-copy populations were set to slightly higher fluorescence, reflected in the higher fluorescence of these populations in Figure 3A. Indeed, if individuals in these populations were to compete against each other, the double-copy populations would have a benefit due to the dosage advantage. However, as we already pointed out in the manuscript, we did not see any additional advantage beyond the increased gene dosage provided by the second copy (Figure 3B). To discuss this issue in more detail, we have now added the following text to the discussion:

      “It is worth noting that we evolved each of our single- and double-copy populations separately and in parallel to follow their individual evolutionary trajectories. In a natural population, individuals with one or two copies might occur in the same population and compete against each other. In this situation any dosage advantage of a duplicate gene would itself entail selective benefit. Our approach allowed us to find out if gene duplication facilitates phenotypic evolution beyond any such gene dosage effect. At least for the specific genes, selection pressures, and mutation rates we used, the data suggest that it does not.”
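      The per-population gating described in this reply can be sketched as follows. This is only an illustration under stated assumptions: the function names `gate_threshold` and `sort_cells` and the fluorescence values are hypothetical, and the actual gates were set on the cell sorter rather than computed this way. The sketch shows why a gate that keeps the same top fraction in each population separately sits at a higher fluorescence for a brighter (for example, double-copy) population.

```python
def gate_threshold(fluorescence, top_fraction):
    """Lowest fluorescence value still inside the top `top_fraction` of cells."""
    ranked = sorted(fluorescence, reverse=True)
    # Keep at least one cell even for very small fractions.
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[n_keep - 1]


def sort_cells(fluorescence, top_fraction):
    """Cells that pass a gate set for this population alone."""
    threshold = gate_threshold(fluorescence, top_fraction)
    return [f for f in fluorescence if f >= threshold]


# Hypothetical fluorescence values (arbitrary units, not study data).
single_copy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
double_copy = [2 * f for f in single_copy]  # assumed dosage effect, for illustration

single_gate = gate_threshold(single_copy, 0.60)
double_gate = gate_threshold(double_copy, 0.60)
print(single_gate, double_gate)
print(len(sort_cells(single_copy, 0.60)), len(sort_cells(double_copy, 0.60)))
```

      Because each population is gated on its own distribution, the same fraction of cells passes in both cases, so a pure dosage difference does not change how stringent the selection is within a population.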

      Finally, I am slightly curious about the nature of the adaptations that are evolving. The authors primarily discuss a few amino-acid changing mutations that seem to fix early in the experiment. Looking at Figure 3, it however, appears that the populations are still evolving late in the experiment, and so presumably other changes are occurring later on. Do the authors believe that perhaps expression changes to increase protein levels are driving these later changes?

      Figure S15 shows that some mutations are indeed still increasing in frequency during late evolutionary rounds, in particular S2L, V141L and V205L. We have measured the emission spectra of these mutants (Figure S16), and these mutations increase fluorescence both in green and blue. It is therefore likely that these mutations, similar to L98M, increase protein expression, solubility, or thermal stability, as suggested by the reviewer. We now clarify this matter in a new passage of the results:

      “Like L98M, the additional mutations S2I, V141I and V25L also occurred in all selection regimes, but they reached lower frequencies than L98M during the 5 generations of the experiment. We hypothesized that mutations observed in all selection regimes do not derive their benefit from increasing the intensity of any one fluorescent color. Instead, they may increase protein expression, solubility, or thermal stability.”

      Reviewer #2 (Public Review):

      Summary:

      Drawing from tools of synthetic biology, Mihajlovic et al. use a cleverly designed experimental system to dissect Ohno's hypothesis, which describes the evolution of functional novelty on the gene-level through the process of duplication & divergence.

      Ohno's original idea posits that the redundancy gained from having two copies of the same gene allows one of them to freely evolve a new function. To directly test this, the authors make use of a fluorescent protein with two emission maxima, which allows them to apply different selection regimes (e.g. selection for green AND blue, or, for green NOT blue). To achieve this feat without being distracted by more complex evolutionary dynamics caused by the frequent recombination between duplicates, the authors employ a well-controlled synthetic system to prevent recombination: Duplicates are placed on a plasmid as indirect repeats in a recombination-deficient strain of E. coli. The authors implement their directed evolution approach through in vitro mutagenesis and selection using fluorescent-activated cell sorting. Their in-depth analysis of evolved mutants in single-copy versus double-copy genotypes provides clear evidence for Ohno's postulate that redundant copies experience relaxed purifying selection. In contrast to Ohno's original postulate, however, the authors go on to show that this does not in fact lead to more rapid phenotypic evolution, but rather, the rapid inactivation of one of the copies.

      Strengths:

      This paper contributes with great experimental detail to an area where the literature predominantly leans on genomics data. Through the use of a carefully designed, well-controlled synthetic system the authors are able to directly determine the phenotype & genotype of all individuals in their evolving populations and compare differences between genotypes with a single or double copy of coGFP. With it they find clear evidence for what critics of Ohno's original model have termed "Ohno's dilemma", the rapid non-functionalization by predominantly deleterious mutations.

      Including an expressed but non-functional coGFP in (phenotypically) single copy genotypes provides an especially thoughtful control that allows determining a baseline dN/dS ratio in the absence of selection. All in all the study is an exciting example of how the clever use of synthetic biology can lead to new insights.

      Weaknesses:

      The major weakness of the study is tied to its biggest strength (as often in experimental biology there is a trade-off between 'resolution' and 'realism').

      The paper ignores an important component of the evolutionary process in favour of an in-depth characterization of how two vs one copy evolve. Specifically, by employing a recombination-deficient strain and constructing their duplicates as inverted repeats their experimental design completely abolishes recombination between the two copies.

      This is problematic for two reasons:

      i)  In nature, new duplicates do not arise as inverted, but rather as direct (tandem) repeats and - as the authors correctly point out - these are very unstable, due to the fact that repeated DNA is prone to recA-dependent homologous recombination (which arises orders of magnitude more frequently than point mutations).

      ii)  This instability often leads to further amplification of the duplicates under dosage selection both in the lab and in the wild (e.g. Andersson & Hughes, Annu. Rev. Genet. 2009), and would presumably also be an outcome under the current experimental set-up if it was not prevented from happening?

      So in sum, recombination between duplicate genes is not merely a nuisance in experiments, but occurring at extremely high frequencies in nature (such that the authors needed to devise a clever engineering solution to abolish it), and is often observed in evolving populations, be it in the laboratory or the wild.

      The manuscript sells controlling of copy number as a strength. And clearly, without it, the same insights could not be gained. However, if the basis for the very process of what Ohno's model describes is prevented from happening for the process to be studied, then, for reasons of clarity and context this needs pointing out, especially, to readers less familiar with the principles of molecular evolution.

      Connected to this, there are several places in the introduction and the discussion where I feel that the existing literature, in particular models put forward since Ohno that invoke dosage selection (such as IAD) end up being slightly misrepresented.

      My point is best exemplified in line 1 of Discussion: "To test Ohno's hypothesis and to distinguish its predictions from those of competing hypotheses, it is necessary to maintain a constant and stable copy number of duplicated genes during experimental evolution."

      We understand the reviewer’s position and fully agree that we needed to clarify better what our experiments aimed to achieve. To this end, we rewrote the beginning of the discussion to read:

      “Our aim was to study whether gene duplication can affect mutational robustness and phenotypic evolution beyond any effect of increased gene dosage provided by multiple gene copies. To this end, we needed to maintain a constant and stable copy number of duplicated genes during experimental evolution.”

      I think this statement is simply not true and might be misleading. To take the exaggerated position of a devil's advocate, the goal of evolutionary biology should be to find out how evolution actually proceeds in nature most of the time, rather than creating laboratory systems that manage to recapitulate influential ideas.

      On this point, we respectfully disagree. To ask questions like ours, laboratory experiments that are highly controlled albeit possibly “unnatural” can be essential. And we would argue that our experiments do not merely aim to “recapitulate” an influential idea but to validate it and potentially refute it, as we did for our study system. Validating theory is an essential aspect of experimental science. Textbooks in biology and beyond are rife with examples.

      While fixing copy number may be a necessary step to understand how one copy evolves if a second one is present, it seems that if Ohno's hypothesis only works out in recA-deficient bacterial strains and on engineered inverted repeats, that Ohno might have missed one crucial aspect of how paralogs evolve. The mentioned competing hypotheses have been put forward to (a) address Ohno's dilemma (which the present study beautifully demonstrates exists under their experimental conditions) and (b) to reflect a commonly observed evolutionary process in bacteria (dosage gain in response to selection, e.g. a classic way of gaining antibiotic resistance). Fixing the copy number allowed the authors to show which predictions of Ohno's model hold up and which don't (under these specific conditions). But they do so without even preventing the processes described by alternative models from happening, so the experimental system is hardly appropriate to distinguish between Ohno & alternatives. Therefore, I think it could be made clearer that the experimental system is great to look at certain aspects of Ohno's hypothesis in detail, but it can only inform us about a universe without recombination.

      (1)  Citing the works by ref 8, 26, 27 to merely state that "in some copies were gained and some were lost (ref 6, ref 25)" makes it seem as if fixing at 2 copies is some sort of sensible average. Yet ref 6 (Dhar et al.) specifically states that dosage is the most important response. Moreover, in this study gene copies are lost, but plasmid copies are gained instead. In Holloway et al. 2007 (ref 25), the 2 copies resided on different plasmids, so entirely different underlying molecular genetics might be at work (high cost of plasmid maintenance, and competitive binding on both proteins onto the respective (off)-target, where either way selection favored a single copy, so a different situation altogether). In both cited studies, fixing the copy would have prohibited learning something about the process of duplication & divergence.

      Hence this statement seems to distract the readers from the main message, which seems to be that preventing recombination experimentally allows one to follow the divergence of each copy and to study a response that does not involve dosage increase.

      (2)  "These studies highlighted the importance of gene duplication in providing fast adaptation under changing environmental conditions but they focused on the importance of gene dosage." I think this constructs a false dichotomy. Instead, these studies pointed out that dosage (and with it, selection for dosage) is an important part of the equation that might have been missed by Ohno.

      Your points are well taken. To clarify the insights from previous experiments and the aims of our experiments we rewrote this passage in question as follows.

      “These studies underline the importance of gene duplication in providing fast adaptation under changing environmental conditions. In some studies one copy was lost6, 25, while in others, additional copies were gained8, 26, 27. Together these studies highlight that gene dosage and selection for dosage can play an important role during the evolution of duplicated genes6, 8, 25-28.

      These studies also raise the question whether gene duplication can provide an advantage beyond its effects on gene dosage. To find out it is necessary to study the evolution of gene duplicates while keeping the copy number of the duplicated gene exactly at two. This is challenging because gene duplication causes recombinational instability and high variability in copy number. No previous experimental studies were designed to control copy number. Here, we present an experimental system that allowed us to keep the copy number fixed at one or two genes, and to follow the evolution of each gene copy in the absence of any dosage increase.”

      (3)  "Such models are also easier to test experimentally, because they do not require precise control of gene copy number. The necessary tests can even benefit from massive gene amplifications8. Although Ohno's hypothesis is more difficult to test experimentally (...)" - again, I feel the wording is slightly misleading. The point is not that IAD is easier to test and Ohno's model is harder to test in laboratory experiments, rather, experiments (and some more limited observations of naturally evolving populations) seem to suggest that in reality evolution proceeds (more often?) according to IAD rather than Ohno's neofunctionalization hypothesis. However, as the authors point out, it will be exciting to see their clever experimental system used to test other genes and conditions to get a more comprehensive understanding of what gene- and selection-parameter values would overcome Ohno's dilemma.

      We agree and in response rewrote the paragraph in question to read:

      “The challenge that a duplicated gene copy must remain free of frequent deleterious mutations long enough to acquire beneficial mutations that provide a new selectable phenotype is known as Ohno’s dilemma13. Our experiments confirm that this challenge is highly relevant for post-duplication evolution. Other models such as the innovation-amplification-divergence (IAD) model8, 13 postulate that this dilemma can be resolved through an increase in gene dosage that allows latent pre-duplication phenotypes to come under the influence of selection. To distinguish between the effects of gene dosage and other benefits of gene duplication, we prevented recombination and gene amplification to prevent copy number increases beyond two copies. We are aware that our experimental design does not reflect how evolution may occur in the wild. However, this design allowed us to study evolutionary forces separately that are otherwise difficult to disentangle.”

      Finally, we also made two changes in the abstract (highlighted in red) to take your feedback into account.

      Reviewer #2 (Recommendations For The Authors):

      The paper is very well written, with a lot of emphasis put on explaining every step and every finding. It was a joy to read.

      Thanks!

      Full stop missing in line 5 of abstract.

      Corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors):

      Comments to the authors:

      R1. The authors show a similar reduction in ECAR as a measure of glycolytic inhibition upon treating H37Rv-infected unprimed MDMs with 5 mM 2-DG at 1 h and 24 h. However, the data pertaining to the extent of glycolytic inhibition upon 2-DG treatment in IFN-γ or IL-4 primed AMs or MDMs is not included.

      We acknowledge that we have not checked the ECAR of every dataset herein treated with 2DG. However, we have provided evidence that 2DG reduced ECAR in the control datasets, and moreover, 2DG is functionally affecting the cells (e.g. the presence of 2DG altered cytokine production in both AM and MDM, even in the presence of IFN-γ or IL-4).

      R2. The authors have replotted the same data as percent change and fold difference with different normalizing samples. While they have corrected one of the highlighted discrepancies in the data plotting of Fig. 1A and 1C, similar discrepancies are still evident in many other instances. Based on my understanding of the data and normalization methodology stated by the authors in response to comment (#5) by reviewer 1, the authors are plotting fold changes across all samples with respect to unstimulated and unprimed macrophages, whereas percent changes are plotted for stimulated (LPS or dead H37Rv) samples with respect to baseline measurements for each unstimulated sample under differently primed macrophages. I believe the slope of lines connecting unstimulated and LPS-stimulated/H37Rv-infected samples upon percent increase or decrease (from the baseline of unstimulated samples) will still maintain their trend in fold changes (relative to unstimulated and unprimed macrophages) irrespective of changes in absolute values. For instance, in Fig. 1F, there are at least 3 samples that show an increase in fold change in OCR upon H37Rv infection in IFN-γ primed MDMs. However, Fig. 1H, plotted from the same data, shows a decrease in OCR across all IFN-γ primed MDMs upon H37Rv infection. The authors have also highlighted that this decrease in OCR upon H37Rv infection in IFN-γ primed MDMs is highly significant (P < 0.01). The same data is again plotted as a bar plot in Fig. 1J as fold change relative to unstimulated and unprimed macrophages (mislabeled as percent change to unstimulated), showing no difference upon H37Rv infection of IFN-γ primed MDMs.

      We have amended the axis in Figure 1 and Supplemental Figure 1 to more accurately describe the two different forms of analysis. We have fixed the errors outlined. We have also amended the methods in the text to clarify the two analyses carried out on the metabolic data. Lines 113-122 as follows:

      “Fold change in ECAR and OCR was calculated compared to unstimulated unprimed controls at 150 minutes, where unstimulated unprimed macrophages were set to 1. This allows for analysis of the effects of both priming and subsequent stimulation and accounts for the variation in the raw ECAR and OCR reading between runs, thereby making each donor its own control.

      Percent change in ECAR and OCR was also calculated to equalise groups to the same point prior to stimulation. Each condition was compared to its own respective primed or unprimed baseline at 30 minutes, prior to stimulation, and this was set to 100%. This was carried out to examine the capacity of cells to increase metabolic parameters in response to stimulation. Post stimulation percent change data was then extracted and analysed at 150 minutes. This controls for the priming effect and enables the analysis of metabolic capacity in each dataset.”
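      The two normalizations described in the quoted methods can be written out as a minimal sketch. All readings below are invented for illustration and are not study data; the function and variable names are assumptions. The point is only that the two measures use different reference values: fold change divides by the unstimulated, unprimed control at 150 minutes, whereas percent change divides by the same condition's own baseline at 30 minutes.

```python
def fold_change(reading, unprimed_unstimulated_control):
    """Reading relative to the unstimulated, unprimed control (control = 1)."""
    return reading / unprimed_unstimulated_control


def percent_change(post_stimulation, own_baseline):
    """Post-stimulation reading as a percentage of the same condition's
    pre-stimulation baseline (baseline = 100%)."""
    return 100.0 * post_stimulation / own_baseline


# Hypothetical OCR readings for one donor (arbitrary units, not study data).
unprimed_unstimulated_150 = 100.0  # reference for fold change
primed_baseline_30 = 200.0         # primed cells, 30 min, before stimulation
primed_unstimulated_150 = 150.0    # primed cells, 150 min, no stimulation
primed_stimulated_150 = 160.0      # primed cells, 150 min, after stimulation

fc_unstimulated = fold_change(primed_unstimulated_150, unprimed_unstimulated_150)
fc_stimulated = fold_change(primed_stimulated_150, unprimed_unstimulated_150)
pc_stimulated = percent_change(primed_stimulated_150, primed_baseline_30)

print(fc_unstimulated, fc_stimulated, pc_stimulated)
```

      With these made-up numbers the stimulated sample has a higher fold change than its unstimulated counterpart but sits below 100% of its own 30-minute baseline, which shows that the two measures answer different questions and need not move in the same direction.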

      For figure 1J, the data is replotted from fold change datasets (not percentage change where the decrease in OCR is significant). The axis label has been revised for accuracy.

    1. Author response:

      The following is the authors’ response to the original reviews.

      This study investigates the role of Caspar (Casp), an orthologue of human Fas-associated factor-1, in regulating the number of primordial germ cells that form during Drosophila embryogenesis. The findings are important in that they reveal an additional pathway involved in germ cell specification and maintenance. The evidence supporting the conclusions is solid, as the authors identify Casp and its binding partner Transitional endoplasmic reticulum 94 (TER94) as factors that influence germ cell numbers. Minor changes to the title, text, and experimental design are recommended.

      We thank the Editors and Reviewers for their overall positive and thoughtful feedback. Based on these comments, we have revised our manuscript. The changes in the manuscript have been highlighted in ‘blue’ font for easy visualisation.

      Reviewer #1 (Public Review):

      Summary:

      The authors were seeking to define the roles of the Drosophila caspar gene in embryonic development and primordial germ cell (PGC) formation. They demonstrate that PGC number, and the distribution of the germ cell determinant Oskar, change as a result of changes in caspar expression; reduction of caspar reduces PGC number and the domain of Oskar protein expression, while overexpression of caspar does the reverse. They also observe defects in syncytial nuclear divisions in embryos produced from caspar mutant mothers. Previous work from the same group demonstrated that Caspar protein interacts with two partners, TER94 and Vap33. In this paper, they show that maternal knockdown of TER94 results in embryonic lethality and some overlap of phenotypes with reduction of caspar, supporting the idea that the two may work together in their developmental roles. The authors propose models for how Caspar might carry out its developmental functions. The most specific of these is that Caspar and its partners might regulate oskar mRNA stability by recruiting ubiquitin to the translational regulator Smaug.

      Strengths:

      The work identifies a new factor that is involved in PGC specification and points toward an additional pathway that may be involved in establishing and maintaining an appropriate distribution of Oskar at the posterior pole of the embryo. It also ties together earlier observations about the presence of TER94 in the pole plasm that have not heretofore been linked to a function.

      Weaknesses:

      (1)  A PiggyBac insertion allele casp[c04227] is used throughout the paper and referred to as a loss-of-function allele (casp[lof]). However, this allele does not appear to act strictly as a loss-of-function. Figure 1E shows that some residual Casp protein is present in early embryos produced by casp[lof]/Df females, and this protein is presumably functional as the PiggyBac insertion does not affect the coding region. Also, Figures 1B and 1C show that the phenotypes of casp[lof] homozygotes and casp[lof]/Df are not the same; surprisingly, the homozygous phenotypes are more severe. These observations are unexplained and inconsistent with the insertion being simply a loss-of-function allele. Might there be a second-site mutation in casp[c04227]?

      The term loss-of-function (lof) is used rather than null or amorph. casp[lof] is a strong hypomorph, with residual (and functional) protein estimated in the range 5-10% when compared to the wild type. The casp[c04227] line was procured from BDSC, and based on the decrease in lethality of casp[lof]/casp(Df) compared to casp[lof], we assume that second-site hits in the casp[lof] line are the reason for the enhanced lethality. For this very reason, we have used casp[lof]/casp(Df) for all subsequent experiments. We also conducted rescue experiments wherever possible to confirm the specificity with casp[WT] and various deletion variants of casp.

      (2)  TER94 knockdown phenotypes have been previously published (Zhang et al 2018 PMID 30012668), and their effects on embryonic viability and syncytial mitotic divisions were described there. This paper is inappropriately not cited, and the data in Figure 4 should be presented in the context of what has been published before.

      We apologize for the oversight. Indeed, Zhang et al. (2018) highlighted TER94 as one of the loci uncovered in their screen and some of the relevant phenotypes are described there. We have referred to their findings at the appropriate junctures as suggested (pp. 11, 13, 15).

      (3)  The peptide counts in the mass spectrometry experiment aimed at finding protein partners for Casp are extremely low, except for Casp itself and TER94. Peptide counts of 1-2 seem to me to be of questionable significance.

      Peptide counts are indeed low, but the fact that they are enriched at all, in comparison to controls, considering that we are using whole embryo lysates rather than isolated PGC lysates, suggests interaction with Casp could be biologically/ functionally meaningful. The data is restricted to the supplementary material and is not analyzed in isolation; we have combined data from multiple mass spectrometry experiments by other researchers to link Casp to pole plasm components.

      (4)  The pole bud phenotypes from TER94 knockdown and casp mutant shown in Fig 5 appear to be quite different. These differences are unexplained and seem inconsistent with the model proposed that the two proteins work in a common pathway. Whole embryos should also be shown, as the TER94 KD phenotype could result from a more general dysmorphism.

      We agree that TER94 KD is a stronger phenotype, with TER94 having essential cell division and patterning roles. In fact, the TER94 RNAi embryos, unlike casplof, stall in terms of their developmental program before Stage 4. This has been noted in the earlier study (Zhang et al., 2018). As a result, we focused on pole bud stage embryos that were rare - but present in the collections. We report that PGC from very early TER94 RNAi embryos have fewer pole buds.

      The rationale behind the presumption that these two proteins may work in a common pathway is clear-cut. We have validated the physical interaction using protein lysates from two developmental time points. Satisfyingly, an affinity purification using antibodies against TER94 or Casp invariably enriches the other protein as the primary interacting partner. Our model integrates data from mammalian and fly systems to support the idea that there must be an overlap between TER94/Casp function, with these two proteins working together to engineer the degradation of ubiquitinated Smaug. Future experiments are necessary to confirm and extend this claim.

      (5)  Figure 6 is not quantitative, lacking even a second control staining to check for intensity variation artifacts. Therefore, it shows that the distribution of Oskar protein changes in the various genotypes, but not convincingly that the level of Oskar changes as the paper claims.

      We appreciate that oskar RNA localization is also somewhat altered due to changes in casp levels. We have acknowledged the variability in the various phenotypes, and as such, it is unsurprising that it is also reflected in the Oskar levels. However, it is evident that a statistically significant number of mutant embryos show a decrease in Oskar levels.

      (6)  The error bars are huge in the graphs in Figure 7H, I, and J, leading me to question whether these changes are statistically significant. Calculations of statistical significance are missing from these graphs and need to be added.

      The data in the Western blots represents the whole embryo, as the lysates used are from embryos 0-1, 1-2, 2-3 hrs. We have averaged and plotted data from 5 Western blots. The changes are not statistically significant. Even without the statistical significance, the data for Fig. 7I led us to examine Smaug in the pole cells, rather than in the whole embryo. The pole cell data (Fig8-D3) is striking and led to the conclusion – that Smaug protein perdures in the pole cells during the stages of syncytial/cellular blastoderm.

      (7)  There are many instances of fuzzy and confusing language when describing casp phenotypes. For example, on lines 211-212, it is stated that 'casp[lof] adults are only partially homozygous viable as ~70% embryos laid by the homozygous mutant females failed to hatch into larvae'. Isn't this more accurately described as 'casp[c04227] is a maternal-effect lethal allele with incomplete penetrance'? Another example is on line 1165, what exactly is a 'semi-vital function'?

      We thank the reviewer for reading the manuscript in detail. We have tried to pay attention to reduce the ambiguity and fixed the text accordingly (pg 7, line 214; pg 33, line 1169, word semi-vital is deleted).

      Reviewer #2 (Public Review):

      Summary:

      This study investigated the role of the Caspar (Casp) gene, a Drosophila homolog of human Fas-associated factor-1. It revealed that maternal loss of Casp led to centrosomal and cytoskeletal abnormalities during nuclear cycles in Drosophila early embryogenesis, resulting in defective gastrulation. Moreover, Casp regulates PGC numbers, likely by regulating the levels of Smaug and then Oskar. They demonstrate that Casp protein levels are linearly correlated to the PGC number. The partner protein TER94, an ER protein, shows similar but slightly distinct phenotypes. Based on the deletion mutant analysis, TER94 seems functionally relevant for the observed Casp phenotype. Additionally, it is likely involved in regulating protein degradation during PGC specification.

      Strengths:

      The paper reveals an unexpected function of the maternally produced Casp gene, previously implicated in immune response regulation and NF-kB signaling inhibition, in nuclear division and PGC formation in early fly embryos. Experiments are properly conducted and strongly support the conclusion. The rescue experiment using deletion mutant form is particularly informative as it suggests the requirement of each domain function.

      Weaknesses:

      Functional relationships among molecules shown here (and other genes known to regulate these processes) are still unclear.

      We completely agree with this assessment. In our view this is an interesting, albeit initial, report. We also appreciate that understanding the mechanistic underpinnings of these results will be critical. We have ensured that our present claims are backed up by data; however, we are fully aware that newer observations will refine or even alter these claims. We are continuing to work on the problem and will hopefully make further inroads into the mechanism in the coming years.

      Reviewer #3 (Public Review):

      Summary:

      Das et al. discovered a maternal role for Caspar (Casp), the Drosophila orthologue of human Fas-associated factor-1 (FAF1), in embryonic development and germ cell formation. They find that Casp interacts with Transitional endoplasmic reticulum 94 (TER94). Loss of Casp or TER94 leads to partial embryonic lethality, correlated with aberrant centrosome behavior and cytoskeletal abnormalities. This suggests that Casp, along with TER94, promotes embryonic development through a still unidentified mechanism. They also find that Casp regulates germ cell number by controlling a key determinant of germ cell formation, Oskar, through its negative regulator, Smaug.

      Strengths:

      Overall, the experiments are well-conducted, and the conclusions of this paper are mostly well-supported by data.

      Weaknesses:

      Some additional controls could be included, and the language could be clarified for accuracy.

      Reviewer #1 (Recommendations For The Authors):

      (1)  The paper is inconsistent in using standard Drosophila nomenclature. Often the name of the mammalian counterpart is used instead. This needs to be cleaned up as it is very confusing to the reader.

      The names of the mammalian counterparts are used explicitly, and only where intended, to underscore the parallels between mammalian and Drosophila function, specifically in the context of the major players in this study: TER94 vs VCP and Caspar vs FAF1. Since we do not have direct biochemical data indicating that TER94/Casp degrades Smaug, we use published mammalian literature to draw parallels. At no point have we swapped terminology casually.

      (2)  The Discussion is far too long and in my view extends too far beyond the experimental data in the paper. As a start for editing, its first two paragraphs (lines 1138-1164) include mostly general statements and could be greatly reduced or eliminated.

      Our aim was to emphasize the repurposing of factors between early development and later/adult stages for different functional contexts. Our laboratory (Ratnaparkhi) works on Casp in terms of its roles in NF-kappa B signalling. We serendipitously stumbled on the embryonic lethality while characterizing the casplof allele, which, later, led us to examine the function of Casp during embryonic germ cell development.

      (3)  The Introduction is weak in its description of the developmental function of Toll and Dorsal. This could be summarized in a sentence or two.

      As suggested, a few sentences that highlight the developmental function of Toll/Dorsal signalling have been added to the text (pg 3, line 90-92).

      (4)  Even if correctly cited, it is not appropriate to simply reproduce an image from a public database, as was done in Figure S1C. This should be removed.

      Figure S1C has been deleted.

      (5)  The Materials and Methods section should be moved to after the Discussion so it does not interrupt the flow of the Results.

      The Section has been moved as suggested.

      Reviewer #2 (Recommendations For The Authors):

      For general readers, more detailed information about the PGC specification will be helpful in the Introduction or Results section.

      PGC specification is introduced in the text as the story transitions from the global embryonic effects of casp knockdown to specific effects on PGCs. A few additional sentences have been added to bolster the text (pg 11, first paragraph).

      The Methods section talks about live imaging, but I could not find the experiments in the figures. Are the data available for asynchronous nuclear divisions in the live imaging?

      The live imaging relates to DIC movies that are part of Suppl. Fig 2A. The movies are embedded in an MS PowerPoint slide, which has been uploaded as a PowerPoint (and not a PDF).

      To ensure that the mutant changes the Osk translation rate, showing the Osk RNA level may be helpful.

      oskar RNA localization is quite distinct as compared to Oskar and Vasa protein. It has been shown that oskar RNA is localized to the founder granules and is, in fact, excluded from the germ granules that contain Vasa, Oskar and nos RNA etc. Gavis lab recently reported (Eichler et al., 2020) that ectopic localization of osk RNA in the germ granules is toxic to pole cells. Thus, it will be of interest to analyze whether and how oskar RNA is localized in casp embryos.

      More discussion about the difference between Casp and ter94 phenotypes and potential reasons would be informative.

      TER94 appears to be an essential maternal gene. Hypomorphic knockdown of TER94 using RNAi is sufficient to induce early embryonic lethality. In fact, Zhang et al. (2018), using stronger/earlier maternal drivers, highlighted the lethality and somatic cell division defects caused by the severe loss of TER94. The UBX domain is present in multiple proteins, in addition to Casp. TER94 possibly plays a vital role in the degradation of critical cell cycle proteins, such as cyclins that need to be degraded for efficient genomic duplications in the 10’ nuclear division cycles that predominate the first few hours of embryogenesis.

      N=3 (Fig1 legend) and N =15 (Fig2). What are those numbers?

      N=3 indicated the number of repeats of the western blot. This reference has been deleted. N=15 represents the number of embryos imaged for data in panels G and H.

      Reviewer #3 (Recommendations For The Authors):

      Major Suggestion:

      (1)  Oskar (Osk) mRNA Localization: Does Osk mRNA localization change upon overexpression or LOF of Casp? Since TER94 has been implicated in Osk mRNA localization (Ruden et al., 2000), this would be a good control to include.

      As mentioned earlier, in the response to editors, data presented in our manuscript indicates that Caspar is unique in its ability to regulate both Oskar levels and centrosome dynamics. As the reviewer pointed out, we are in the process of analysing the possible localization defects in oskar mRNA in the embryos. Since the preliminary data are promising, we are pursuing this carefully to better understand the involvement of Caspar. We are focusing on the ability of Caspar to regulate early nuclear divisions prior to pole cell formation. It is possible that in casp mutant embryos the nuclei/centrosomes that enter the pole plasm are already defective and thus can influence release of the pole plasm components. This needs to be examined carefully, and we are conducting these experiments.

      (2)  Western Blot for Osk Protein: It would also be beneficial to perform a western blot for Osk protein to demonstrate that it is indeed increased upon Casp overexpression.

      This is a good suggestion. However, Oskar antibodies are not readily available, and we have a very limited supply, which has been used for embryo staining experiments. We considered these more useful because, in addition to the absolute levels, staining experiments can reveal the localization pattern. It was thus possible to correlate Oskar function with the pole cell counts in the respective genetic backgrounds.

      (3)  Title Clarification: The title states, "Caspar determines primordial germ cell identity in Drosophila melanogaster." The current experiments do not show that Casp determines germ cell identity. It would be more accurate to conclude that Casp regulates germ cell numbers.

      Please refer to the introductory paragraphs where we explain our views in this regard. We have modified our title to “Caspar specifies primordial germ cell count and identity in Drosophila melanogaster.”

      Minor Suggestions:

      (1)  Line 69: Delete the use of "recent" for papers published in 2001 and 2007. These papers are around 20 years old.

      The word has been deleted.

      (2)  Paragraph from Line 110: Consider splitting this paragraph into two for better readability and clarity.

      Paragraph has been split into two; this has improved readability.

      (3)  Line 266: Check and correct the formatting issues in this line.

      Edited, based on suggestion. A line break was added after the title.

      (4)  Line 328: Adding references to earlier studies here will be useful for providing context and supporting information.

      References that introduce Centrosomes and their roles as organizing centres have been added in line 336.

      (5)  Line 564: It is best to avoid using the word "master." Please consider using other terms such as "key" or "principal."

      Edited, based on suggestion.

      (6)  Citations: The authors should also cite Cinalli et al., 2013 for the Gcl reference to ensure comprehensive citation of relevant literature.

      Thank you for the suggestion. The reference has been added on pages 16 and 29.

      (7)  Overall Length: The paper is quite long. If it can be shortened, it will be easier to read. Consider condensing sections where possible without losing essential information.

      The paper is indeed longer than average, but the choice of eLife as the home for this study was, in part, determined by the platform's flexibility regarding length/ word count. It seemed worthwhile to elaborate the text in places to accentuate the novelty of the findings.

      These additions and adjustments would help to further substantiate the claims and improve the clarity of the paper.

      We hope that the claims made in our manuscript are substantiated by the data that are presented. Wherever possible, we have tried to modify the text suitably to improve clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Summary: Wilmes and colleagues present a computational model of a cortical circuit for predictive processing which tackles the issue of how to learn predictions when different levels of uncertainty are present for the predicted sensory stimulus. When a predicted sensory outcome is highly variable, deviations from the average expected stimulus should evoke prediction errors that have less impact on updating the prediction of the mean stimulus. In the presented model, layer 2/3 pyramidal neurons represent either positive or negative prediction errors, SST neurons mediate the subtractive comparison between prediction and sensory input, and PV neurons represent the expected variance of sensory outcomes. PVs therefore can control the learning rate by divisively inhibiting prediction error neurons such that they are activated less, and exert less influence on updating predictions, under conditions of high uncertainty.
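
The learning principle summarised above can be sketched in a few lines. This is only an illustrative reduction, not the authors' circuit model: the names `upe` and `simulate_upe_learning`, the learning rate, and the Gaussian stimulus are assumptions made here. A running mean stands in for the SST-mediated prediction, a running variance stands in for the PV activity, and the prediction error is divided by the variance estimate before it updates the mean.

```python
import random


def upe(stimulus, mean_estimate, variance_estimate):
    """Uncertainty-modulated prediction error: error divided by expected variance."""
    return (stimulus - mean_estimate) / variance_estimate


def simulate_upe_learning(true_mean, true_sd, n_trials=20000, lr=0.01, seed=0):
    """Learn mean and variance of a Gaussian stimulus with variance-scaled updates.

    Hypothetical toy setup: `mu` plays the role of the predicted mean and
    `var` the role of the estimated variance in the description above.
    """
    rng = random.Random(seed)
    mu = 0.0   # initial estimate of the stimulus mean
    var = 1.0  # initial estimate of the stimulus variance
    for _ in range(n_trials):
        s = rng.gauss(true_mean, true_sd)
        error = s - mu
        # Mean update is scaled down when the expected variance is high.
        mu += lr * upe(s, mu, var)
        # Variance estimate tracks the running average of the squared error.
        var += lr * (error ** 2 - var)
        var = max(var, 1e-3)  # keep the divisor strictly positive
    return mu, var


if __name__ == "__main__":
    for sd in (0.5, 2.0):
        mu, var = simulate_upe_learning(true_mean=5.0, true_sd=sd)
        print(f"sd={sd}: learned mean={mu:.2f}, learned variance={var:.2f}")
```

In this sketch the same raw error of 1.0 yields a larger update when the variance estimate is 0.25 than when it is 4.0, which is the sense in which uncertainty controls the effective learning rate.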

      Strengths: The presented model is a very nice solution to altering the learning rate in a modality and context-specific way according to expected uncertainty and, importantly, the model makes clear, experimentally testable predictions for interneuron and pyramidal neuron activity. This is therefore an important piece of modelling work for those working on cortical and/or predictive processing and learning. The model is largely well-grounded in what we know of the cortical circuit.

      Weaknesses: Currently, the model has not been challenged with experimental data, presumably because data from an adequate paradigm is not yet available. I therefore only have minor comments regarding the biological plausibility of the model:

      Beyond the fact that some papers show SSTs mediate subtractive inhibition and PVs mediate divisive inhibition, the selection of interneuron types for the different roles could be argued further, given existing knowledge of their properties. For instance, is a high PV baseline firing rate, or broad sensory tuning that is often interpreted as a ’pooling’ of pyramidal inputs, compatible with or predicted by the model?

      Thank you for this nice suggestion. We added a section to the discussion expanding on this: “The model predicts that the divisive interneuron type, which we here suggest to be the PVs, receive a representation of the stimulus as an input. PVs could be pooling the inputs from stimulus-responsive layer 2/3 neurons to estimate uncertainty. The more the stimulus varies, the larger the variability of the pyramidal neuron responses and, hence, the variability of the PV activity. The broader sensory tuning of PVs (Cottam et al. 2013) is in line with the model insofar as uncertainty modulation could be more general than the specific feature, which is more likely for low-level features processed in primary sensory cortices. PVs were shown to connect more to pyramidal cells with similar feature-tuning (Znamenskyiy et al. 2024); this would be in line with the model, as uncertainty modulation should be feature-related. In our model, some SSTs deliver the prediction to the positive prediction error neurons. SSTs are already known to be involved in spatial prediction, as they underlie the effect of surround suppression (Adesnik et al. 2012), in which SSTs suppress the local activity dependent on a predictive surround.”

      On a related note, SSTs are thought to primarily target the apical dendrite, while PVs mediate perisomatic inhibition, so the different roles of the interneurons in the model make sense, particularly for negative PE neurons, where a top-down excitatory predicted mean is first subtractively compared with the sensory input, s, prior to division by the variance. However, sensory input is typically thought of as arising ’bottom-up’, via layer 4, so the model may match the circuit anatomy less in the case of positive PE neurons, where the diagram shows ’s’ arising in a top-down manner. Do the authors have a justification for this choice?

      We agree that ‘s’ is a bottom-up input and should have made clearer that we do not consider ‘s’ to be a top-down input like the prediction. We hence adjusted the figure correspondingly and added a few clarifying sentences to the manuscript. The reviewer, however, raises an important point that is not discussed enough: if the bottom-up input ‘s’ comes from L4, how can it be compared in a subtractive manner with the top-down prediction arriving in the superficial layers? In Attinger et al. it was shown that the visual stimulus had subtractive effects on SST neurons. The axonal fibers delivering the stimulus information are hence likely to arrive in the vicinity of the apical dendrites, where SSTs target pyramidal cells. Hence, those axons delivering stimulus information could also target the apical dendrites of pyramidal cells. As the reviewer probably had in mind, L4 input tends to arrive in the somatic layer. However, there are also stimulus-responsive cells in layer 2/3, such that the stimulus information does not need to come directly from L4; it could be relayed via those stimulus-responsive layer 2/3 cells. It has been shown that L2/3→L3 axons are mostly located in the upper basal dendrites and the apical oblique dendrites, above the input from L4 (Petreanu et al. The subcellular organization of neocortical excitatory connections). Hence, stimulus information could arrive on the apical dendrites, and be subtractively modulated by SSTs. We would also like to note that the model does not take into account the precise dendritic location of the inputs. The model only assumes that the difference between stimulus and prediction is calculated before the divisive modulation by the variance.

      In cortical circuits, assuming a 2:8 ratio of inhibitory to excitatory neurons, there are at least 10 pyramidal neurons to each SST and PV neuron. Pyramidal neurons are also typically much more selective about the type of sensory stimuli they respond to compared to these interneuron classes (e.g., Kerlin et al., 2012, Neuron). A nice feature of the proposed model is that the same interneurons can provide predictions of the mean and variance of the stimulus in a predictor-dependent manner. However, in a scenario where you have two types of sensory stimulus to predict (e.g., two different whiskers stimulated), with pyramidal neurons selective for prediction errors in one or the other, what does the model predict? Would you need specific SST and PV circuits for each type of predicted stimulus?

      If we understand correctly, this would be a scenario in which the same context (e.g., sound) is predicting two types of sensory stimulus. In that case, one may need specific SST and PV circuits for the different error neurons selective for prediction errors in these stimuli, depending on how different the predictions are for the two stimuli as we elaborate in the following. The reviewer is raising an important point here and that is why we added a section to the discussion elaborating on it.

      We think that there is a reason why interneurons are less selective than pyramidal cells and that this is also a feature in prediction error circuits. Similarly-tuned cells are more connected to each other, because they tend to be activated together as the stimuli they encode tend to be present in the environment together. Also, error neurons selective to nearby whiskers are more likely to receive similar stimulus information, and hence similar predictions. Hence, because nearby whiskers are more likely to be deflected similarly, a circuit structure may have emerged during development such that neurons selective for prediction errors of nearby whiskers may receive inputs from the same inhibitory interneurons. In that case, the same SST and PV cells could innervate those different neurons. If, however, the sensory stimuli to be predicted are very different, such that their representations are likely to be located far away from each other, then it also makes sense that the predictions for those stimuli are more diverse, and hence the error neurons selective to these are unlikely to be innervated by the same interneurons.

      We added a shorter version of this to the discussion: “The lower selectivity of interneurons in comparison to pyramidal cells could be a feature in prediction error circuits. Error neurons selective to similar stimuli are more likely to receive similar stimulus information, and hence similar predictions. Therefore, a circuit structure may have developed such that prediction error neurons with similar selectivity may receive inputs from the same inhibitory interneurons.”

      Reviewer 2 (Public Review):

      Summary: This computational modeling study addresses the observation that variable observations are interpreted differently depending on how much uncertainty an agent expects from its environment. That is, the same mismatch between a stimulus and an expected stimulus would be less significant, and specifically would represent a smaller prediction error, in an environment with a high degree of variability than in one where observations have historically been similar to each other. The authors show that if two different classes of inhibitory interneurons, the PV and SST cells, (1) encode different aspects of a stimulus distribution and (2) act in different (divisive vs. subtractive) ways, and if (3) synaptic weights evolve in a way that causes the impact of certain inputs to balance the firing rates of the targets of those inputs, then pyramidal neurons in layer 2/3 of canonical cortical circuits can indeed encode uncertainty-modulated prediction errors. To achieve this result, SST neurons learn to represent the mean of a stimulus distribution and PV neurons its variance.

      The impact of uncertainty on prediction errors is an understudied topic, and this study provides an intriguing and elegant new framework for how this impact could be achieved and what effects it could produce. The ideas here differ from past proposals about how neuronal firing represents uncertainty. The developed theory is accompanied by several predictions for future experimental testing, including the existence of different forms of coding by different subclasses of PV interneurons, which target different sets of SST interneurons (as well as pyramidal cells). The authors are able to point to some experimental observations that are at least consistent with their computational results. The simulations shown demonstrate that if we accept its assumptions, then the authors’ theory works very well: SSTs learn to represent the mean of a stimulus distribution, PVs learn to estimate its variance, firing rates of other model neurons scale as they should, and the level of uncertainty automatically tunes the learning rate, so that variable observations are less impactful in a high uncertainty setting.

      Strengths: The ideas in this work are novel and elegant, and they are instantiated in a progression of simulations that demonstrate the behavior of the circuit. The framework used by the authors is biologically plausible and matches some known biological data. The results attained, as well as the assumptions that go into the theory, provide several predictions for future experimental testing.

      Weaknesses: Overall, I found this manuscript to be frustrating to read and to try to understand in detail, especially the Results section from the UPE/Figure 4 part to the end and parts of the Methods section. I don’t think the main ideas are so complicated, and it should be possible to provide a much clearer presentation.

      For me, one source of confusion is the comparison across Figure 1EF, Figure 2A, Figure 3A, Figure 4AB, and Figure 5A. All of these are meant to be schematics of the same circuit (although with an extra neuron in Figure 5), yet other than Figures 1EF and 4AB, no two are the same! There should be a clear, consistent schematic used, with identical labeling of input sources, neuron types, etc. across all of these panels.

      We changed all figures to make them more consistent and pointed out that we consider subparts of the circuit.

      The flow of the Results section overall is clear until the “Calculation of the UPE in Layer 2/3 error neurons” and Figure 4, where I find that things become significantly more confusing. The mention of NMDA and calcium spikes comes out of the blue, and it’s not clear to me how this fits into the authors’ theory. Moreover: Why would this property of pyramidal cells cause the PV firing rate to increase as stated? The authors refer to one set of weights (from SSTs to UPE) needing to match two targets (weights from s to UPE and weights from mean representation to UPE); how can one set of weights match two targets? Why do the authors mention “out-of-distribution detection” here when that property is not explored later in the paper? (see also below for other comments on Figure 4)

      We agree that the introduction of NMDA and calcium spikes was too short and understand that it was confusing. We therefore modified and expanded the section. To answer the two specific questions: First, Why would this property of pyramidal cells cause the PV firing rate to increase as stated? This property of pyramidal cells does not cause the PV firing rate to increase. When for example in positive error neurons, the mean input increases, then the PVs receive higher stimulus input on average, which is not compensated by the inhibitory prediction (which is still at the old mean), such that the PV firing rate increases. Due to the nonlinear integration in PVs, the firing rate can increase a lot and inhibit the error neurons strongly. If the error neurons integrate the difference nonlinearly, they compensate for the increased inhibition by PVs. In Figure 5, we show that a circuit in which error neurons exhibit a dendritic nonlinearity matches an idealised circuit in which the PVs perfectly represent the variance. We modified the text to clarify this.

      Second, how can one set of weights match two targets? In our model, one set of weights does not need to match two targets. We apologise that this was written in such a confusing way. In positive error neurons, the inhibitory weights from the SSTs need to match the excitatory weights from the stimulus, and in negative error neurons, the inhibitory weights from the SSTs need to match the excitatory weights from the prediction. The weights in positive and negative circuits do not need to be the same. So, on a particular error neuron, the inhibition needs to match the excitation to maintain EI balance. Given experimental evidence for EI balance and heterosynaptic plasticity, we think that this constraint is biologically achievable. The inhibitory and excitatory synapses that need to match are targeting the same postsynaptic neuron and could hence have access to their postsynaptic effect. We modified the text to be clearer. Finally, we removed the mention of out-of-distribution detection; see our reply below.
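
The matching constraint described in this reply can be illustrated with a rectified rate sketch. It is a simplification under stated assumptions, not the manuscript's equations: the function names, the rectification, and the default weights are hypothetical, and the dendritic nonlinearity discussed above is left out. Each error neuron subtracts its SST-mediated inhibition from its excitation and is then divided by the variance signal, so only the two weights converging on the same neuron have to match.

```python
def relu(x):
    """Rectification: firing rates cannot be negative."""
    return max(0.0, x)


def positive_error_rate(stimulus, prediction, variance, w_exc=1.0, w_sst=1.0):
    """Positive error neuron: excited by the stimulus, inhibited via SSTs by the prediction."""
    return relu(w_exc * stimulus - w_sst * prediction) / variance


def negative_error_rate(stimulus, prediction, variance, w_exc=1.0, w_sst=1.0):
    """Negative error neuron: excited by the prediction, inhibited via SSTs by the stimulus."""
    return relu(w_exc * prediction - w_sst * stimulus) / variance


if __name__ == "__main__":
    s, mu, var = 5.0, 5.0, 2.0
    # Matched excitation and inhibition: no response when stimulus equals prediction.
    print(positive_error_rate(s, mu, var), negative_error_rate(s, mu, var))
    # Mismatched weights on one neuron: a spurious error response appears.
    print(positive_error_rate(s, mu, var, w_exc=1.2, w_sst=1.0))
    # The positive and negative circuits may use different weight values,
    # provided excitation and inhibition match within each neuron.
    print(negative_error_rate(s, mu, var, w_exc=0.7, w_sst=0.7))
```

With matched weights both neurons stay silent when the stimulus equals the prediction, whereas a mismatch on a single neuron produces a response even though no prediction error is present.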

      Coming back to one of the points in the previous paragraph: How realistic is this exact matching of weights, as well as the weight matching that the theory requires in terms of the weights from the SSTs to the PVs and the weights from the stimuli to the PVs? This point should receive significant elaboration in the discussion, with biological evidence provided. I would not advocate for the authors’ uncertainty prediction theory, despite its elegant aspects, without some evidence that this weight matching occurs in the brain. Also, the authors point out on page 3 that unlike their theory, “...SSTs can also have divisive effects, and PVs can have subtractive effects, dependent on circuit and postsynaptic properties”. This should be revisited in the Discussion, and the authors should explain why these effects are not problematic for their theory. In a similar vein, this work assumes the existence of two different populations of SST neurons with distinct UPE (pyramidal) targets. The Discussion doesn’t say much about any evidence for this assumption, which should be more thoroughly discussed and justified.

      These are very important points, we agree that the biological plausibility of the model’s predictions should be discussed and hence expanded the discussion with three new paragraphs:

      To enable the comparison between predictions and sensory information via subtractive inhibition, we pointed out that the weights of those inputs on the postsynaptic neuron need to match. This essentially means that there needs to be a balance of excitatory and inhibitory inputs. Such an EI balance has been observed experimentally (Tan and Wehr, 2009), and it has previously been suggested that error responses are the result of breaking this EI balance (Hertäg and Sprekeler, 2020, Barry and Gerstner, 2024). Heterosynaptic plasticity is a possible mechanism to achieve EI balance (Field et al. 2020). For example, spike pairing in pre- and postsynaptic neurons induces long-term potentiation at co-activated excitatory and inhibitory synapses with the degree of inhibitory potentiation depending on the evoked excitation (D’Amour and Froemke, 2015), which can normalise EI balance (Field et al. 2020).

      In the model we propose, SSTs should be subtractive and PVs divisive. However, SSTs can also be divisive, and PVs subtractive dependent on circuit and postsynaptic properties (Seybold et al. 2015, Lee et al. 2012, Dorsett et al. 2021). This does not necessarily contradict our model, as circuits in which SSTs are divisive and PVs subtractive could implement a different function, as not all pyramidal cells are error neurons. Hence, our model suggests that error neurons which can calculate UPEs should have similar physiological properties to the layer 2/3 cells observed in the study by Wilson et al. 2012.

      Our model further posits the existence of two distinct subtypes of SSTs in positive and negative error circuits. Indeed, there are many different subtypes of SSTs. SST is expressed by a large population of interneurons, which can be further subdivided. There is e.g. a type called SST44, which was shown to specifically respond when the animal corrects a movement (Green et al. 2023). Our proposal is hence aligned with the observation of functionally specialised subtypes of SSTs.

      Finally, I think this is a paper that would have been clearer if the equations had been interspersed within the results. Within the given format, I think the authors should include many more references to the Methods section, with specific equation numbers, where they are relevant throughout the Results section. The lack of clarity is certainly made worse by the current state of the Methods section, where there is far too much repetition and poor ordering of material throughout.

      We implemented the reviewer’s detailed and helpful suggestions on how to improve the ordering and other aspects of the methods section and now either intersperse the equations within the results or refer to the relevant equation number from the Methods section within the Results section.

      Reviewer 3 (Public Review):

      Summary: The authors proposed a normative principle for how the brain’s internal estimate of an observed sensory variable should be updated during each individual observation. In particular, they propose that the update size should be inversely proportional to the variance of the variable. They then proposed a microcircuit model of how such an update can be implemented, in particularly incorporating two types of interneurons and their subtractive and divisive inhibition onto pyramidal neurons. One type should represent the estimated mean while another represents the estimated variance. The authors used simulations to show that the model works as expected.

      Strengths: The paper addresses two important issues: how uncertainty is represented and used in the brain, and the role of inhibitory neurons in neural computation. The proposed circuit and learning rules are simple enough to be plausible. They also work well for the designated purposes. The paper is also well-written and easy to follow.

      Weaknesses: I have concerns with two aspects of this work.

      (1) The optimality analysis leading to Eq (1) appears simplistic. The learning setting the authors describe (estimating the mean of a stationary Gaussian variable from a stream of observations) is a very basic problem in online learning/streaming algorithm literature. In this setting, the real “optimal” estimate is simply the arithmetic average of all samples seen so far. This can be implemented in an online manner with $\hat{\mu}_t = \hat{\mu}_{t-1} + (s_t - \hat{\mu}_{t-1})/t$. This is optimal in the sense that the estimator is always the maximum likelihood estimator given the samples seen up to time t. On the other hand, doing gradient descent only converges towards the MLE estimator after a large number of updates. Another critique is that while Eq (1) assumes an estimator of the mean ($\hat{\mu}$), it assumes that the variance is already known. However, in the actual model, the variance also needs to be estimated, and a more sophisticated analysis thus needs to take into account the uncertainty of the variance estimate and so on. Finally, the idea that the update should be inverse to the variance is connected to the well-established idea in neuroscience that more evidence should be integrated over when uncertainty is high. For example, in models of two-alternative forced choices it is known to be optimal to have a longer reaction time when the evidence is noisier.
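      The online form of the arithmetic mean mentioned in this comment can be sketched in a few lines of Python. The function name `running_mean` and the sample values are illustrative assumptions, not part of the manuscript; the sketch only shows that the incremental update with step size 1/t reproduces the batch average.

```python
def running_mean(samples):
    """Online arithmetic mean: mu_t = mu_{t-1} + (s_t - mu_{t-1}) / t."""
    mu = 0.0
    for t, s in enumerate(samples, start=1):
        mu += (s - mu) / t  # step size 1/t shrinks as more samples arrive
    return mu


# Hypothetical observations, used only to compare against the batch mean.
samples = [2.0, 4.0, 9.0, 1.0]
batch_mean = sum(samples) / len(samples)
online_mean = running_mean(samples)
```

      Because the step size decays as 1/t, every sample ends up with equal weight, which is what makes this estimator the maximum likelihood estimate for a stationary mean.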

      We agree with the reviewer that the simple example we gave was not ideal, as it could have been solved much more elegantly without gradient descent. And the reviewer correctly pointed out that our solution was not even optimal. We now present a better example in Figure 7, where the mean of the Gaussian variable is not stationary. Indeed, we did not intend to assume that the Gaussian variable is stationary, as we had in mind that the environment can change and hence also the Gaussian variable. If the mean is constant over time, it is indeed optimal to use the arithmetic mean. However, if the mean changes after many samples, then the maximum likelihood estimator model would be very slow to adapt to the new mean, because t is large and each new stimulus only has a small impact on the estimate. If the mean changes, uncertainty modulation may be useful: if the variance was small before, and the mean changes, then the resulting big error will influence the change in the estimate much more, such that we can more quickly learn the new mean. A combination of the two mechanisms would probably be ideal. We use gradient descent here, because not all optimisation problems the brain needs to solve are that simple. The problem with converging only after a large number of updates is a general problem of the algorithm. Here, we propose how the brain could estimate uncertainty to achieve the uncertainty modulation of inference and learning observed in behavioural studies. To give a more complex example, we present in a new Figure 8 how a hierarchy of UPE circuits can be used for uncertainty-based integration of prior and sensory information, similar to Bayes-optimal integration.
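      The argument about a changing mean can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the model from the manuscript: the variance `sigma2` is taken as already known, and the learning rate `eta`, the sample counts, and the two means are hypothetical values chosen for illustration.

```python
import random


def simulate(n_before=1000, n_after=50, old_mean=0.0, new_mean=5.0,
             sigma2=0.1, eta=0.01, seed=0):
    """Compare a 1/t running average with a variance-scaled update
    after the mean of a Gaussian stimulus jumps from old_mean to new_mean."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    stream = ([rng.gauss(old_mean, sd) for _ in range(n_before)]
              + [rng.gauss(new_mean, sd) for _ in range(n_after)])

    mu_avg = 0.0  # arithmetic running mean (optimal only if the mean is constant)
    mu_mod = 0.0  # update scaled by the inverse of the (assumed known) variance
    for t, s in enumerate(stream, start=1):
        mu_avg += (s - mu_avg) / t
        mu_mod += eta * (s - mu_mod) / sigma2  # effective rate eta / sigma2 = 0.1
    return mu_avg, mu_mod


mu_avg, mu_mod = simulate()
```

      With these illustrative settings the running average stays close to the old mean, because each of the 50 new samples carries a weight of roughly 1/1000, while the variance-scaled estimate moves most of the way to the new mean. The effective rate `eta / sigma2` must stay below 1 for the update to remain stable.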

      Yes, indeed, there is well-known behavioural evidence, we would like to thank the reviewer for pointing out this connection to two-alternative forced choice tasks. We now cite this work. Our contribution is not on the already established computational or algorithmic level, but the proposal of a neural implementation of how uncertainty could modulate learning. The variance indeed needs to be estimated for optimal mean updating. That means that in the beginning, there will be non-optimal updating until the variance is learned. However, once the variance is learned, mean-updating can use the learned variance. There may be few variance contexts but many means to be learned, such that variance contexts can be reused. In any case, this is a problem on the algorithmic level, and not so much on the implementational level we are concerned with.

      (2) While the incorporation of different inhibitory cell types into the model is appreciated, it appears to me that the computation performed by the circuit is not novel. Essentially the model implements a running average of the mean and a running average of the variance, and gates updates to the mean with the inverse variance estimate. I am not sure about how much new insight the proposed model adds to our understanding of cortical microcircuits.

      We here suggest an implementation for how uncertainty could modulate learning via influencing prediction error computation. Our model can explain how humans could estimate uncertainty and weight prior versus sensory information accordingly. The focus of our work was not to design a better algorithm for mean and variance estimation, but rather to investigate how specialised prediction error circuits in the brain can implement these operations to provide new experimental hypotheses and predictions.

      Reviewer 1 (Recommendations For The Authors):

      Clarity and conciseness are a strength of this manuscript, but a more comprehensive explanation could improve the reader’s understanding in some instances. This includes the NMDA-based nonlinearity of pyramidal neuron activation - I am a little unclear exactly what problem this solves and how (alongside the significance of 5D and E).

      We agree that the introduction of the NMDA-based nonlinearity was too short and understand that it was confusing. We therefore modified and expanded the section, where we introduce the dendritic nonlinearity of the error neurons.

      Page 5: I think there is a ’positive’ and ’negative’ missing from the following sentence: ’the weights from the SSTs to the UPE neurons need to match the weights from the stimulus s to the UPE neuron and from the mean representation to the UPE neuron, respectively.’

      Thanks for pointing that out! We changed the sentence to be more clear to the following: “To ensure a comparison between the stimulus and the prediction, the inhibition from the SSTs needs to match the excitation it is compared to in the UPE neurons: In the positive PE circuit, the weights from the SSTs representing the prediction to the UPE neurons need to match the weights from the stimulus s to the UPE neurons. In the negative PE circuit, the weights from SSTs representing the stimulus to the negative UPE neurons need to match the weights from the mean representation to the UPE neurons, respectively.”

      Reviewer 2 (Recommendations For The Authors):

      Related to the first point above: I don’t feel that the authors adequately explained what the “s” and “a” information (e.g., in Figures 2A, 3A) represent, where they are coming from, what neurons they impact and in what way (and I believe Fig. 3A is missing one “a” label). I think they should elaborate more fully on these key, foundational details for their theory. To me, the idea of starting from the PV, SST, and pyramidal circuit, and then suddenly introducing the extra R neuron in Figure 5, just adds confusion. If the R neuron is meant to be the source, in practice, of certain inputs to some of the other cell types, then I think that should be included in the circuit from the start. Perhaps a good idea would be to start with two schematics, one in the form of Figure 5A (but with additional labeling for PV, SST) and one like Figure 1EF (but with auditory inputs as well), with a clear indication that the latter is meant to represent a preliminary, reduced form of the former that will be used in some initial tests of the performance of the PV, SST, UPE part of the circuit. Related to the Methods, I also can give a list of some specific complaints (in latex):

      (1) $\phi$, $\phi_{PV}$ are used in equations (10), (11), so they should be defined there, not many equations later.

      Thank you, we changed that.

      (2) β, 1 − β appear without justification or explanation in (11). That is finally defined and derived several pages later.

      Thank you, we now define it right at the beginning.

      (3) Equations (10)-(12) should be immediately followed by information about plasticity, rather than deferring that.

      That’s a great idea. We changed it. Now the synaptic dynamics are explained together with the firing rate dynamics.

      (4) After the rate equations (10)-(12) and weight change equations (23)-(25) are presented, the same equations are simply repeated in the “Explanation of the synaptic dynamics” subsection.

      We agree that this was suboptimal. We moved the explanation of the synaptic dynamics up and removed the repetition.

      (5) In the circuit model (13)-(19), it’s not clear why $r_R$ shows up in the SST+ and PV− equations vs. $r_s$ in PV+ and SST−. Moreover, $r_s$ is not even defined! Also, I don’t see why $w_{PV^+,R}$ shows up in the equation for $r_{PV^-}$.

      We added more explanation to the Methods section as to why the neurons receive these inputs and renamed $r_s$ to $s$, which is defined. The “+” in $w_{PV^+,R}$ was a typo. Thank you for spotting that.

      (6) The authors should only number those equations that they will reference by number. Even more importantly, there are many numbers such as (20), (26), (32), (39) that are just floating there without referring to an equation at all.

      Thank you for spotting that. We corrected this.

      (7) The authors fail to specify what is $r_a$ in Figure 8. Moreover, it seems strange to me that $w_{PV,a}$ approaches $\sigma$ rather than $w_{PV,a} r_a$ approaching $\sigma$, since $\phi_{PV}$ is a function of $w_{PV,a} r_a$.

      You are right, $w_{PV,a} r_a$ should approach $\sigma$, but since $r_a$ is either 1 or 0 to indicate the presence or absence of the cue, and only $w_{PV,a}$ is plastic and changing, $w_{PV,a}$ approaches $\sigma$.

      (8) I don’t understand the rationale for the authors to introduce equation (30) when they already had plasticity equations earlier. What is the relation of (30), (31) to (24)?

      It is the same equation. In (30) we introduce simpler symbols for a better overview of the equations. (31) is equal to (30), with $r_{PV}$ replaced by its steady state.

      (9) η is omitted from (33) - it won’t affect the final result but should be there.

      We fixed this.

      I have many additional specific comments and suggestions, some related to errors that really should have been caught before manuscript submission. I will present these based on the order in which they arise in the manuscript.

      (1) In the abstract, the mention of layer 2/3 comes out of nowhere. Why this layer specifically? Is this meant to be an abstract/general cortical circuit model or to relate to a specific brain area? (Also watch for several minor grammatical issues in the abstract and later.)

      Thank you for pointing this out. We now mention that the observed error neurons can be found in layer 2/3 of diverse brain areas. It is meant to be a general cortical circuit model independent of brain area.

      (2) In par. 2 of the introduction, I find sentences 3-4 to be confusing and vague. Please rewrite what is meant more directly and clearly.

      We tried to improve those sentences.

      (3) Results subtitle 1: “suggests” → “suggest”

      Thank you.

      (4) Be careful to use math font whenever variables, such as $a$ and $N$, are referenced (e.g., use of a instead of $a$ at the bottom of pg. 2).

      We agree and checked the entire manuscript.

      (5) Ref. to Fig. 1B bottom pg. 2 should be Fig. 1CD. The panel order in the figure should then be changed to match how it is referenced.

      We fixed it and matched the ordering of the text with the ordering of the figure.

      (6) Fig. 2C and 3E captions mention std but this is not shown in the figures - should be added.

      It is there, it is just very small.

      (7) Please clarify the relation of Figure 2C to 2F, and Figure 3F to 3H.

      We colour-coded the points in 2F that correspond to the bars in 2C. We did the same for 3F and 3H.

      (8) Figures 3E,3F appear to be identical except for the y-axis label and inclusion of std in 3F. Either more explanation is needed of how these relate or one should be cut.

      The difference is that 3E shows the activity of PVs based on only the sound cue in the absence of a whisker stimulus. And 3F shows the activity of PVs based on both the sound cue and whisker stimuli. We state this more clearly now.

      (9) Bottom of pg. 4: clarify that a quadratic $\phi_{PV}$ is a model assumption, not derived from results in the figure.

      We added that we assume this.

      (10) When k is referenced in the caption of Figure 4, the reader has no idea what it is. More substantially, most panels of Figure 4 are not referenced in the paper. I don’t understand what point the authors are trying to make here with much of this figure. Indeed, since the claim is that the uncertainty prediction should be based on division by $\sigma^2$, why aren’t the numerical values for UPE rates much larger, since $\sigma$ gets so small? The authors also fail to give enough details about the simulations done to obtain these plots; presumably these are after some sort of (unspecified) convergence, and in response to some sort of (unspecified) stimulus? Coming back to k, I don’t understand why k > 2 is used in addition to k = 2. The text mentions – even italicizes – “out-of-distribution detection”, but this is never mentioned elsewhere in the paper and seems to be outside the true scope of the work (and not demonstrated in Figure 4). Sticking with k = 2 would also allow authors to simply use $(\cdot)^k$ below (10), rather than the awkward positive part function that they have used now.

      We now introduce the equation for the error neurons in Eq. 3 within the text, such that k is introduced before the caption. It also explains why the numerical values do not become much larger. Divisive inhibition, unlike mathematical division, cannot lead to multiplication in neurons. To ensure this, we add 1 to the denominator.

      We show the error neuron responses to stimuli deviating from the learned mean after learning the mean and variance. The deviation is indicated either on the x-axis or in the legend depending on the plot. We now more explicitly state that these plots are obtained after learning the mean and the variance.

      We removed the mention of “out-of-distribution detection”, as a detailed treatment would indeed be outside of the scope.
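      The effect of adding 1 to the denominator can be sketched as follows. The exact form of Eq. 3 is not reproduced in this response, so the function below is an assumed, simplified form (a rectified error raised to the power k and divided by 1 + $\sigma^2$); the name `upe_rate` and all numerical values are hypothetical.

```python
def upe_rate(s, mu, sigma2, k=2.0):
    """Assumed positive-UPE response: rectified error to the power k,
    divisively inhibited by (1 + sigma2) rather than by sigma2 alone."""
    error = max(s - mu, 0.0)  # a positive-error neuron responds only when s > mu
    return error ** k / (1.0 + sigma2)


def naive_rate(s, mu, sigma2, k=2.0):
    """Plain mathematical division, shown only for contrast."""
    error = max(s - mu, 0.0)
    return error ** k / sigma2


small_var = 1e-6
bounded = upe_rate(2.0, 1.0, small_var)      # stays at or below the undivided error
unbounded = naive_rate(2.0, 1.0, small_var)  # grows without limit as sigma2 shrinks
```

      In this assumed form the denominator is never smaller than 1, so the inhibition can only reduce the response and cannot amplify it, which is the property the response describes.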

      (11) Page 5, please clarify what is meant by “weights from the sound...”. You have introduced mathematical notation - use it so that you can be precise.

      We added the mathematical notation, thank you!

      (12) Figure 5D: legend has 5 entries but the figure panel only plots 4 quantities.

      The SST firing rate was below the R firing rate. We hence omitted the SST firing rate and its legend.

      (13) Figure 5: I don’t understand what point is being made about NMDA spikes. The text for Figure 5 refers to NMDA spikes in Figure 4, but nothing was said about NMDA spikes in the text for Figure 4 nor shown in Figure 4 itself.

      We were referring to the nonlinearity in the activation function of UPEs in Figure 4. We changed the text to clarify this point.

      (14) Figure 6: It is too difficult to distinguish the black and purple curves even on a large monitor. Also, the authors fail to define what they mean by “MM” and also do not define the quantities Y+ and Y− that they show. Another confusing aspect is that the model has PV+ and PV− neurons, so why doesn’t the figure?

      Thank you for the comment. We changed the colour for better visibility, replaced the Upsilons with UPE (we changed the notation at some point and forgot to change it in the figure), and defined MM, which is the mismatch stimulus that causes error activity. We did not distinguish between PV+ and PV− in the plot as their activity is the same on average. We plotted the activity of the PV+. We now mention that we show the activity of PV+ as the representative.

      (15) Also Figure 6: The authors do not make it clear in the text whether these are simulation results or cartoons. If the latter, please replace this with actual simulation results.

      They are actual simulation results. We clarified this in the text.

      (16) This work assumes the existence of two different populations of SST neurons with distinct UPE (pyramidal) targets. The Discussion doesn’t say much about any evidence for this assumption, which should be more thoroughly discussed and justified.

      We now discuss this in more detail in the discussion as mentioned in our response to the public review.

      (17) Par. 2 of the discussion refers to “Bayesian” and “Bayes-optimal” several times. Nothing was said earlier in the paper about a Bayesian framework for these results and it’s not clear what the authors mean by referring to Bayes here. This paragraph needs editing so that it clearly relates to the material of the results section and its implications.

      We added an additional results section (the last section with Figure 8) on integrating prior and sensory information based on their uncertainties, which is also the case for Bayes-optimal integration, and show that our model can reproduce the central tendency effect, which is a hallmark of Bayes-optimal behaviour.

      Reviewer 3 (Recommendations For The Authors):

      See public review. I think the gradient-descent type of update the authors do in Equation (1) could be more useful in a more complicated learning scenario where the MLE has no closed form and has to be computed with gradient-based algorithms.

      We responded in detail to your points in our point-by-point response to the public review.

    1. Author response:

      Joint Public Review:

      Summary:

      This study presents a strategy to efficiently isolate PcrV-specific BCRs from human donors with cystic fibrosis who have/had Pseudomonas aeruginosa (PA) infection. Isolation of mAbs that provide protection against PA may be a key to developing a new strategy to treat PA infection, as PA has intrinsic and acquired resistance to most antibiotic drug classes. Hale et al. developed a fluorescently labeled antigen hook and isolated mAbs with anti-PA activity. Overall, the authors' conclusion is supported by solid data analysis presented in the paper. Four of five recombinantly expressed PcrV-specific mAbs exhibited anti-PA activity in a murine pneumonia challenge model that was as potent as that of the V2L2MD mAb (equivalent to gremubamab). However, the therapeutic potency of these isolated mAbs is uncertain, as gremubamab has failed in Phase 2 trials. Clarification of this point would greatly benefit this paper.

      Strengths:

      (1) High efficiency of isolating antigen-specific BCRs using an antigenic hook.

      (2) The authors' conclusion is supported by data.

      Weaknesses:

      Although the authors state that the goal of this study was to generate novel protective mAbs for therapeutic use (P12; Para. 2), it is unclear whether the PcrV-specific mAbs isolated in this study have better therapeutic potential than gremubamab, which has failed in Phase 2 trials. Four of five PcrV-specific mAbs isolated in this study reduced bacterial burdens in mice as potently as, but not more potently than, the gremubamab-equivalent mAb. Clarification of this concern by revising the text or providing experimental results that show better potential than gremubamab would greatly benefit this paper.

      The authors thank the reviewer for their thoughtful positive assessment. As noted by the reviewer, the studies described here, which were performed in mice, show that our MBC-derived mAbs are as effective as V2L2MD, a mAb that is one component of the gremubamab bi-specific. However, key theoretical strengths of MBC-derived mAbs (reduced immunogenicity, full participation in effector functions) are not easily tested in mice. We have clarified and expanded our discussion of these points in our revised manuscript, particularly in the Discussion paragraph 4.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We greatly appreciate all the reviewers’ constructive comments on our previously revised manuscript. In the current revision, we added new experimental data to address the reviewers’ comments. Below we describe our point-by-point responses to their comments:

      Reviewer #1 (Public Review):

      Unaddressed and additional concerns (re-submission)

      In this revised version of the manuscript, the authors have made important modifications in the text, inserted new references, and incorporated additional quantifications of cFos immunolabeling in three brain regions, as recommended by the reviewers. While these modifications have significantly improved the quality of the manuscript, other critical concerns raised during the initial submission of the manuscript (Major concerns 1, 2, and 4; some of them also raised by the other reviewers) were not properly addressed by the authors. On several occasions, the authors recognize the importance of clarifying the points for the correct interpretation of the results but opt for leaving the open questions to be addressed during future studies. Therefore, the authors might consider adding a new section at the end of the manuscript to include all the caveats and future directions.

      In the current revision, to address Reviewer #1’s original concerns 1, 2, and 4, we added new experimental data.

      Original major concerns 1) and 2): Regarding whether mice are detecting qualitative or quantitative differences between fresh and old cat saliva.

      To address these concerns, as shown in new Figure 1I and J, we measured volumes of saliva contained in individual swabs and total protein concentrations at the time of behavior tests: Fresh (15 minutes after collection) and Old (4 hours after collection). The saliva volumes at the time of behavioral testing were indistinguishable between fresh and old samples (Figure 1I). In addition, the concentrations of total proteins in both fresh and old saliva were also indiscernible (Figure 1J). Furthermore, we also examined the difference of the amount of Fel d 4 protein, one of the most abundant proteins in cat saliva, between fresh and old saliva by conducting western blotting analyses. As shown in new Supplemental Figure 2, the amount of Fel d 4 was nearly equivalent between fresh and old saliva. Indeed, our analyses using recombinant Fel d 4 protein showed that Fel d 4 does not induce freezing behavior (Supplemental Figure 5). Based on these findings, we believe that the difference between fresh and old cat saliva lies in specific components rather than the total or major saliva content. One possible explanation for this difference is the time-dependent reduction of specific freezing-inducing components in old saliva.

      To investigate such a possibility, we also examined mouse behavior directed toward swabs containing diluted fresh cat saliva. Indeed, exposure to diluted fresh saliva resulted in a shorter duration of freezing behavior. Fresh saliva diluted to 70% induced freezing behavior for a duration equivalent to that of undiluted fresh saliva, while freezing behavior in response to 50% and 30% fresh saliva was significantly reduced to the same duration as that observed with old saliva (Figure 1K). The duration of direct interaction with swabs containing 70% and 50–30% fresh saliva also exhibited a similar trend to that observed with fresh and old saliva swabs, respectively (Figure 1L).

      These new results provide compelling evidence that the differential freezing response of mice to fresh versus old cat saliva is not attributed to quantitative differences, such as total volume, total protein concentration, or the amount of major proteins like Fel d 4. However, when fresh saliva was diluted, we observed a corresponding reduction in freezing behavior, suggesting that specific components within the saliva—those responsible for inducing freezing—may decrease over time.

      Our findings indicate that while the overall content of saliva remains consistent over time, specific freezing-inducing components seem to degrade or reduce at a different rate than other components, which alters the composition of saliva over time. The speed of reduction of these freezing-inducing components appears to be different from more stable proteins such as Fel d 4. As a result, the composition of saliva changes over time, leading to a qualitative difference between fresh and old saliva that mice can detect. This ability to discern such subtle chemical changes likely reflects an adaptive sensory mechanism, allowing mice to respond to predator cues to induce optimal defensive behavior in a certain context. Identifying the specific freezing-inducing components through traditional purification processes, such as high-performance liquid chromatography followed by behavioral examination (Haga-Yamanaka et al., 2014; Kimoto et al., 2005), is crucial for a deeper understanding of the mechanisms underlying the observed behavior. Our research team is actively working to isolate these molecules, and we hope to report our findings in future studies.

      (4) The interpretation that fresh and old saliva activates different subpopulations of neurons in the VMH based on the observation that cFos positively correlates with freezing responses only with the fresh saliva lacks empirical evidence. To address this question, the authors should use two neuronal activity markers to track the response of the same population of VMH cells within the same animals during exposure to fresh vs. old saliva.

      To address this issue, as shown in the new Figure 7, we performed a double exposure experiment using Fos2A-iCreERT2; Ai9 (TRAP2) mice (Allen et al., 2017; DeNardo et al., 2019). In this experiment, mice were exposed to the first stimulus under the treatment of 4-hydroxytamoxifen (4-OHT). One week after the initial exposure, the same mice were subjected to a second stimulus exposure for one hour. Through this paradigm, neurons activated by the first stimulus were visualized by tdTomato, while those activated by the second stimulus were detected as cFos-IR (Figure 7A). Quantification of tdTomato and cFos-IR double-positive cells among tdTomato-labeled cells revealed that 43% (mean per animal: 61 / 143) of cells activated by fresh saliva during the first exposure were also activated by fresh saliva during the second exposure, whereas only 16% (17 / 106) of cells activated by old saliva during the first exposure were activated by fresh saliva during the second exposure (p = 7.5e-6, Chi-squared test). The difference in the fraction of overlapping cells between fresh and old saliva exposures was found to be significant when we compared the two groups of animals (Figure 7D, p = 0.0035, permutation test). Additionally, quantification of tdTomato and cFos-IR double-positive cells among cFos-IR cells indicated that over 27% (61 / 226) of cells activated by fresh saliva during the second exposure were previously activated by fresh saliva, whereas only 15% (17 / 112) of cells activated by fresh saliva during the second exposure were previously activated by old saliva (p = 0.015, Chi-squared test). The difference in the fraction of overlapping cells between fresh and old saliva exposures was also significant in this analysis (Figure 7E, p = 0.0060, permutation test). Together, these results demonstrate that fresh and old cat saliva activate largely different populations of neurons within the VMH. These new results were described on page 11 line 18 – page 12 line 8.
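      The Chi-squared comparisons of overlap fractions can be sketched with a 2×2 contingency table. This sketch assumes that the counts given in parentheses were entered directly into an uncorrected Pearson test (no continuity correction); the response does not state this, and the helper name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # survival function for 1 degree of freedom
    return chi2, p


# Rows: fresh vs. old saliva at first exposure.
# Columns: double-positive cells vs. remaining cells.
chi2_tom, p_tom = chi2_2x2(61, 143 - 61, 17, 106 - 17)  # among tdTomato-labeled cells
chi2_fos, p_fos = chi2_2x2(61, 226 - 61, 17, 112 - 17)  # among cFos-IR cells
```

      Under these assumptions both p-values come out close to the reported ones, which suggests, but does not confirm, that this is the form of the test that was used. The permutation tests across animals address a different question and are not reproduced here.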

      In addition to these unaddressed concerns, some new issues have emerged in the new version of the manuscript. For example, the following paragraph introduced in the discussion section is not supported by the experimental findings.

      "We assume that such differential activations of the mitral cells between fresh and old saliva result in the differential activation of targeting neural substrates, possibly MeApv, which results in differential activation of VMH neurons (Figure 7)."

      Although the authors did not observe statistical differences in cFos expression in the pvMeA among groups, they claim that the differences in cFos expression in the VMH between fresh vs. old saliva are mediated by differential activation of upstream neurons in the MeApv. The lack of statistical differences may be caused by the reduced number of subjects in each group, as recognized in the text by the authors.

      We appreciate the reviewer's thoughtful comment. We agree that the paragraph in the comment, which presented a working hypothesis regarding differential activations of mitral cells and the MeApv between fresh and old saliva exposures, was speculative and not fully supported by our experimental findings. To address this, we have removed the assumptions related to the differential responses of mitral cells and the MeApv from the discussion and have updated the figure accordingly (now presented as new Figure 8).

      Moreover, the authors propose that in addition to fel d 4, multiple molecules present in the cat saliva can be inducing distinct defensive responses in the animals, but they do not provide any reference to support their claim.

      We thank the reviewer for highlighting this point. Our claim regarding the presence of other molecules in cat saliva inducing freezing defensive responses is based on our observation, as shown in the new Supplemental Figure 5, that recombinant Fel d 4 protein alone does not induce freezing behavior. This suggests the existence of other unidentified components in cat saliva that may contribute to freezing behavior. As we agree that identifying these specific freezing-inducing components is important for a more comprehensive understanding of the underlying mechanisms, our research team is actively working to isolate these molecules, and we hope to report our findings in future studies.

      Reviewer #2 (Public Review):

      The findings are relatively preliminary. The identities of the receptor and the ligand in the cat saliva that induces the behavior remain unclear. The identity of VMH cells that are activated by the cat saliva remains unclear. There is a lack of targeted functional manipulation to demonstrate the role of V2R-A4 or VMH cells in the behavioral response to the cat saliva.

      We thank the reviewer for the important insight on the need for further investigation into the molecular and neural mechanisms underlying the behavioral response to cat saliva. We recognize the importance of conducting studies involving V2R-A4 receptor knockouts and targeted functional manipulations within the VMH using neural circuit perturbation approaches.

      However, the V2R-A4 subfamily consists of 25 Vmn2r genes, most of which are closely grouped together, forming a V2R-A4 gene cluster within a 2.5-megabase chromosomal region. As we described in our recent review article (Rocha et al., 2024), the Vmn2r genes within the V2R-A4 subfamily display a high degree of homology, with nucleotide and amino acid identities among the several Vmn2rs surpassing 97-99%, suggesting possible redundancy among these receptor genes. This is in stark contrast to the diversity typically observed within other V2R subfamilies. Consequently, knockout strategies targeting a single receptor gene, which have been successful for other vomeronasal receptors, may not be effective for V2R-A4 receptor genes. The most appropriate strategy for examining the necessity of V2R-A4 receptors would be knocking out the entire V2R-A4 gene cluster, spanning a 2.5-megabase chromosomal region. Due to the technical challenges involved, addressing this issue is not feasible in the foreseeable future. Moreover, in our current study, we aimed to establish the foundational relationship between predator cues in cat saliva and defensive behaviors. We view our findings as an important first step that sets the stage for these more targeted and mechanistic studies involving the neural circuit perturbation experiments, such as optogenetics and Designer Receptors Exclusively Activated by Designer Drugs (DREADDs), in the next step.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1)  It is unclear if fresh and old saliva indeed alter the perceived imminence of predation, as claimed by the authors. Prior work indicates that lower imminence induces anxiety-related actions, such as reorganization of meal patterns and avoidance of open spaces, while slightly higher imminence produces freezing. Here, the authors show that fresh and old predator saliva only provoke different amounts of freezing, rather than changing the topography of defensive behaviors, as explained above. Another prediction of predatory imminence theory would be that lower imminence induced by old saliva should produce stronger cortical activation, while fresh saliva would activate amygdala, if these stimuli indeed correspond to significantly different levels of predation imminence.

      We appreciate the reviewer’s insightful comments regarding the perceived imminence of predation and the behavioral responses to fresh and old saliva. Our study specifically focused on comparing the defensive behaviors of mice in response to 15-minute-old and 4-hour-old cat saliva, particularly within the context of freezing behavior in their home cages. We chose these specific time points to capture the potential variation in behavioral intensity rather than the full spectrum of defensive behaviors. While a more comprehensive analysis—including varying time points, different types of defensive behaviors, and broader neural activation patterns (e.g., cortical versus amygdala activation)—might provide further insights into predation imminence theory, these aspects were beyond the scope of our current study. Future research could certainly address these points by examining behavioral and neural responses across additional saliva aging intervals and in varied behavioral contexts. Such studies would complement and extend the findings presented here, further elucidating the relationship between predator cue characteristics and defensive behaviors.

      (2)  It is known that predator odors activate and require AOB, VNO and VMH, thus replications of these findings are not novel, decreasing the impact of this work.

      As the reviewer mentioned, the activation of the AOB, VNO, and VMH by predator odors has been established in prior studies. However, our study provides new insights by demonstrating that defensive freezing behavior in response to predator odors is mediated through the vomeronasal organ (VNO) sensory circuit, which has not been previously shown. The novelty of our work lies in two key findings: 1) the introduction of a new behavioral paradigm that assesses freezing responses to predator cues based on the freshness of chemosensory signals in cat saliva, and 2) the demonstration that the vomeronasal sensory circuit mediates defensive freezing behavior in response to cat saliva.

      Additionally, our results show that cat saliva of different freshness levels differentially activates VNO sensory neurons that express the same subfamily of sensory receptors. This differential activation subsequently modulates the downstream neural circuits, leading to varied freezing behavioral outcomes. We believe these findings provide a novel conceptual advance over previous studies by elucidating a more detailed mechanism of how predator-derived cues influence defensive behaviors through the accessory olfactory system.

      (3)  There is a lack of standard circuit dissection methods, such as characterizing the behavioral effects of increasing and decreasing neural activity of relevant cell bodies and axonal projections, significantly decreasing the mechanistic insights generated by this work

      We thank the reviewer for this valuable comment. Investigating the behavioral effects of manipulating specific cell types and axonal projections, as well as characterizing circuit connectivity, is essential for a more comprehensive understanding of the underlying neural circuits. These approaches, such as modulating neural activity in defined cell populations and dissecting circuit pathways, using optogenetics, DREADD, etc., would provide deeper mechanistic insights. In our current study, however, we aimed to establish the foundational relationship between predator cues in cat saliva and defensive behaviors. We view our findings as an important first step that sets the stage for these more targeted and mechanistic studies in the future.

      (4)  The correlation shown in Figure 5c may be spurious. It appears that the correlation is primarily driven by a single point (the green square point near the bottom left corner). All correlations should be calculated using Spearman correlation, which is non-parametric and less likely to show a large correlation due to a small number of outliers. Regardless of the correlation method used, there are too few points in Figure 5c to establish a reliable correlation. Please add more points to 5c.

      We appreciate the reviewer’s suggestion regarding the correlation analysis in Figure 5E. We assessed the normality of our data using both the Shapiro-Wilk and Kolmogorov-Smirnov tests, which indicated that the data are consistent with a normal distribution, justifying the use of a parametric correlation method in this context. However, we acknowledge the concern about the limited number of data points and the influence of potential outliers on the observed correlation. Increasing the sample size might provide a more robust assessment of correlation patterns and reduce the potential impact of any single data point. While repeating the analysis with a larger sample would be an important direction for future research, it is beyond the scope of the current study.
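      The reviewer's concern about a single influential point can be illustrated with a small sketch that compares Pearson and Spearman coefficients with and without one distant observation. The numbers below are hypothetical and are not data from Figure 5; the helper names (`pearson`, `ranks`, `spearman`) are ours. The sketch only shows why a leave-one-out comparison is informative when few points are available.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# HYPOTHETICAL values (not from the study): five clustered points plus one
# distant point that mimics a single influential observation.
x = [10.0, 11.0, 12.0, 13.0, 14.0, 1.0]
y = [52.0, 49.0, 51.0, 48.0, 50.0, 5.0]

r_all, rho_all = pearson(x, y), spearman(x, y)
# Leave-one-out: drop the distant point and recompute both coefficients.
r_loo, rho_loo = pearson(x[:-1], y[:-1]), spearman(x[:-1], y[:-1])

print(f"all points:      Pearson r = {r_all:.2f}, Spearman rho = {rho_all:.2f}")
print(f"without outlier: Pearson r = {r_loo:.2f}, Spearman rho = {rho_loo:.2f}")
```

      In this constructed example the Pearson coefficient changes sign once the distant point is removed, while the Spearman coefficient is far less affected when that point is included. Whether the same holds for the real data would have to be checked on the actual values.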

      (5)  Please cite recent relevant papers showing VMH activity induced by predators, such as https://pubmed.ncbi.nlm.nih.gov/33115925/ and https://pubmed.ncbi.nlm.nih.gov/36788059/

      We thank the reviewer for suggesting these important papers. https://pubmed.ncbi.nlm.nih.gov/33115925/ (Esteban Masferrer et al., 2020) and https://pubmed.ncbi.nlm.nih.gov/36788059/ (Tobias et al., 2023) are now cited at page 16 line 10 in Discussion under “Differential activation of VMH neurons potentially underlying distinct intensities of freezing behavior.”

      (6)  Add complete statistical information in the figure legends of all figures, which should include n, name of test used and exact p values.

      We included statistical analysis results in figure legends; for Figure 6B, we provided statistical analysis results in Supplemental Table 1.

      (7)  Some of the findings are disconnected from the story. For example, the authors show V2R-A4-expressing cells are activated by predator odors. Are these cells more likely to be connected to the rest of the predatory defense circuit than other VNO cells?

      Yes, our hypothesis posits that V2R-A4-expressing VNO sensory neurons serve as receptor neurons for predator cues present in cat saliva. Additionally, we assume that these specific sensory neurons have stronger anatomical connections with the defensive circuit compared to VNO sensory neurons expressing other receptor subfamilies. In our modified Discussion section, we discussed this point under “V2R-A4 subfamily as the receptor for predator cues in cat saliva.”

      (8)  Please paste all figure legends directly below their corresponding figure to make the manuscript easier to read

      We have added figure legends directly below their corresponding figures.

      (9)  Were there other behavioral differences induced by fresh compared to old saliva? Do they provoke differences in stretch-attend risk evaluation postures, number of approaches, average distance to odor stimulus, velocity of movements towards and away the odor stimulus, etc?

      We appreciate the reviewer's valuable comments. We have now incorporated an analysis of stretch-sniff risk assessment behavior, presented in new Figure 1F (graph) and Supplemental Figure 1B (raster plot). Mice exhibited stretch-sniff risk assessment behavior, which remained consistent across control, fresh saliva, and old saliva swabs. Additionally, we have also included a raster plot for direct investigation, previously noted as ‘interaction’ in the original manuscript (Supplemental Figure 1C). Mice exposed to a swab containing either fresh or old saliva significantly avoided directly investigating the swab. In contrast, mice exposed to a clean control swab spent a significant amount of time directly investigating the swab, engaging in behaviors such as sniffing and chewing (Figure 1G). A comparison of temporal behavioral patterns revealed a slightly higher frequency of direct investigation behavior toward old saliva compared to fresh saliva at the beginning of the exposure period (Supplemental Figure 1C).

      Reviewer #3 (Recommendations For The Authors):

      The authors have partially addressed several important points raised in the prior review, increasing the strength of the manuscript. However, two key questions raised previously were not addressed:

      (1)  Is old saliva qualitatively different from new saliva, or is it the same as a smaller amount of new saliva? As Reviewer 1 wrote: "An important point that the authors should clarify in this study is whether mice are detecting qualitative or quantitative differences between fresh and old cat saliva."

      Since one of the authors' main points is that fresh and old saliva elicit different perceived threat imminences, it is crucial to show that these two stimuli are somehow qualitatively different.

      One way to investigate this could be to show that animals perform different behaviors when exposed to a smaller amount of new saliva vs old saliva, or that the cfos activation patterns are different in these two conditions.

      The answers to these concerns are provided in the Public review Comment from Reviewer #1.

      (2)  The other key question is if different VMH populations are activated by new vs old saliva.

      The answer to this concern is provided in the Public Review comment from Reviewer #1.

      Lastly, although the new analysis and text changes improved the manuscript, many issues raised were addressed with some variation of 'future studies will be done', or 'we concur with the Reviewer'. However, the extra experiments required to answer these questions were not done. For this reason, even though the authors have numerous exciting pieces of data, overall the work is still incomplete. I highlight below some examples in which the authors agree with the Reviewer, but do not answer the question with the new work that would be required, or propose to do the work in future studies.

      In this revised manuscript, we have conducted several additional experiments to address key concerns raised by the reviewers that are directly relevant to our claims. Specifically, we have examined: 1) whether qualitative or quantitative differences between fresh and old cat saliva are detected by mice to modulate behavior (NEW Figure 1I, J, K, and L, and NEW Supplemental Figure 2); 2) the involvement of Fel d 4 in freezing behavior (NEW Supplemental Figure 5); and 3) whether different VMH populations are activated by fresh versus old saliva (NEW Figure 7). However, some concerns raised by the reviewers fall outside the scope of the current manuscript. These include: 1) identifying the specific components that induce freezing, 2) examining the necessity of V2R-A4 receptors, 3) conducting neural circuit perturbations, and 4) performing a comprehensive analysis of the mouse’s defensive response to different levels of predator threat imminence, including varying time points, different types of defensive behaviors, and broader neural activation patterns (e.g., cortical versus amygdala activation). As these aspects are beyond the focus of our current manuscript, we have noted them in our responses to the Public Review comments.

      References:

      Allen WE, DeNardo LA, Chen MZ, Liu CD, Loh KM, Fenno LE, Ramakrishnan C, Deisseroth K, Luo L. 2017. Thirst-associated preoptic neurons encode an aversive motivational drive. Science 357:1149–1155.

      DeNardo LA, Liu CD, Allen WE, Adams EL, Friedmann D, Fu L, Guenthner CJ, Tessier-Lavigne M, Luo L. 2019. Temporal evolution of cortical ensembles promoting remote memory retrieval. Nat Neurosci 22:460–469.

      Haga-Yamanaka S, Ma L, He J, Qiu Q, Lavis LD, Looger LL, Yu CR. 2014. Integrated action of pheromone signals in promoting courtship behavior in male mice. Elife 3:e03025.

      Kimoto H, Haga S, Sato K, Touhara K. 2005. Sex-specific peptides from exocrine glands stimulate mouse vomeronasal sensory neurons. Nature 437:898–901.

      Rocha A, Nguyen QAT, Haga-Yamanaka S. 2024. Type 2 vomeronasal receptor-A4 subfamily: Potential predator sensors in mice. Genesis 62:e23597.

    1. Author response:

      Reviewer #1 (Public review):

      This manuscript from Schwintek and coworkers describes a system in which gas flow across a small channel (10^-4-10^-3 m scale) enables the accumulation of reactants and convective flow. The authors go on to show that this can be used to perform PCR as a model of prebiotic replication.

      Strengths:

      The manuscript nicely extends the authors' prior work in thermophoresis and convection to gas flows. The demonstration of nucleic acid replication is an exciting one, and an enzyme-catalyzed proof-of-concept is a great first step towards a novel geochemical scenario for prebiotic replication reactions and other prebiotic chemistry.

      The manuscript nicely combines theory and experiment, which generally agree well with one another, and it convincingly shows that accumulation can be achieved with gas flows and that it can also be utilized in the same system for what one hopes is a precursor to a model prebiotic reaction. This continues efforts from Braun and Mast over the last 10-15 years extending a phenomenon that was appreciated by physicists and perhaps underappreciated in prebiotic chemistry to increasingly chemically relevant systems and, here, a pilot experiment with a simple biochemical system as a prebiotic model.

      I think this is exciting work and will be of broad interest to the prebiotic chemistry community.

      Weaknesses:

      The manuscript states: "The micro scale gas-water evaporation interface consisted of a 1.5 mm wide and 250 µm thick channel that carried an upward pure water flow of 4 nl/s ≈ 10 µm/s perpendicular to an air flow of about 250 ml/min ≈ 10 m/s." This was a bit confusing on first read because Figure 2 appears to show a larger channel - based on the scale bar, it appears to be about 2 mm across on the short axis and 5 mm across on the long axis. From reading the methods, one understands the thickness is associated with the Teflon, but the 1.5 mm dimension is still a bit confusing (and what is the dimension in the long axis?) It is a little hard to tell which portion (perhaps all?) of the image is the channel. This is because discontinuities are present on the left and right sides of the experimental panels (consistent with the image showing material beyond the channel), but not the simulated panels. Based on the authors' description of the apparatus (sapphire/CNC machined Teflon/sapphire) it sounds like the geometry is well-known to them. Clarifying what is going on here (and perhaps supplying the source images for the machined Teflon) would be helpful.

      We understand. We will update the figures to better show the dimensions of the experimental chamber. We will also add a more complete figure in the supplementary information. Part of the chamber's complexity, however, stems from the fact that the same design has also been used to create defined temperature gradients, which are not needed here; the chamber is therefore more complex than this experiment requires.
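      The unit conversions in the quoted sentence (4 nl/s ≈ 10 µm/s and 250 ml/min ≈ 10 m/s) can be checked with a few lines of arithmetic. The sketch below assumes that the water speed is the mean over the full 1.5 mm x 250 µm cross-section; the variable names are ours. For the air flow it only computes the cross-section implied by the quoted speed, since the gas channel geometry is not stated in the excerpt.

```python
# Dimensions and flow rates as quoted from the manuscript.
width_m = 1.5e-3                  # channel width: 1.5 mm
thickness_m = 250e-6              # channel thickness: 250 um
water_flow_m3_s = 4e-12           # 4 nl/s (1 nl = 1e-12 m^3)
air_flow_m3_s = 250e-6 / 60.0     # 250 ml/min (1 ml = 1e-6 m^3)
air_speed_m_s = 10.0              # quoted approximate air speed

# Assumption: water fills the whole cross-section and the speed is its mean value.
cross_section_m2 = width_m * thickness_m
water_speed_um_s = water_flow_m3_s / cross_section_m2 * 1e6

# Cross-section that the quoted air speed would imply for the gas path.
implied_air_area_mm2 = air_flow_m3_s / air_speed_m_s * 1e6

print(f"channel cross-section: {cross_section_m2 * 1e6:.3f} mm^2")
print(f"mean water speed:      {water_speed_um_s:.1f} um/s")
print(f"implied air-path area: {implied_air_area_mm2:.3f} mm^2")
```

      Under this assumption the water speed is close to the quoted 10 µm/s, and the air-path area implied by 10 m/s is of the same order as the channel cross-section, so the two quoted approximations appear mutually consistent.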

      The data shown in Figure 2d nicely shows nonrandom residuals (for experimental values vs. simulated) that are most pronounced at t~12 m and t~40-60m. It seems like this is (1) because some symmetry-breaking occurs that isn't accounted for by the model, and perhaps (2) because of the fact that these data are n=1. I think discussing what's going on with (1) would greatly improve the paper, and performing additional replicates to address (2) would be very informative and enhance the paper. Perhaps the negative and positive residuals would change sign in some, but not all, additional replicates?

      To address this, we will show two more replicates of the experiment and include them in Figure 2.

      We are seeing two effects when we compare fluorescence measurements of the experiments.

      Firstly, degassing of water causes the formation of air-bubbles, which are then transported upwards to the interface, disrupting fluorescence measurements. This, however, mostly occurs in experiments with elevated temperatures for PCR reactions, such as displayed in Figure 4.

      Secondly, due to the high surface tension of water, the interface is quite flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, leading to alterations in the circular flow fields below.

      Thus, although the conditions are in steady state overall, they show some fluctuations. The strong dependence on interface shape is also seen in the simulation. However, modeling a dynamic interface shape is difficult, so we had to restrict the simulation to a single geometry. Here again, the added movies of two more experiments should clarify this issue.

      The authors will most likely be familiar with the work of Victor Ugaz and colleagues, in which they demonstrated Rayleigh-Bénard-driven PCR in convection cells (10.1126/science.298.5594.793, 10.1002/anie.200700306). Not including some discussion of this work is an unfortunate oversight, and addressing it would significantly improve the manuscript and provide some valuable context to readers. Something of particular interest would be their observation that wide circular cells gave chaotic temperature profiles relative to narrow ones and that these improved PCR amplification (10.1002/anie.201004217). I think contextualizing the results shown here in light of this paper would be helpful.

      Thanks for pointing this out and reminding us. We apologize. We agree that the chaotic trajectories within Rayleigh-Bénard convection cells lead to temperature oscillations similar to the salt variations in our gas-flux system. Although the convection-driven PCR in Rayleigh-Bénard cells is not isothermal like our system, it provides a useful point of comparison and context for understanding environments that can support full replication cycles. We will add a section that compares the approaches, outlines the history of convective PCR, and explains how it relates to the new isothermal implementation.

      Again, it appears n=1 is shown for Figure 4a-c - the source of the title claim of the paper - and showing some replicates and perhaps discussing them in the context of prior work would enhance the manuscript.

      We appreciate the reviewer for bringing this to our attention. We will now include the two additional repeats for the data shown in Figure 4c, while the repeats of the PAGE measurements are already displayed in Supplementary Fig. IX.2. Initially, we chose not to show the repeats in Figure 4c due to the dynamic and variable nature of the system. These variations are primarily caused by differences at the water-air interface, attributed to the high surface tension of water. Additionally, the stochastic formation of air bubbles in the inflow—despite our best efforts to avoid them—led to fluctuations in the fluorescence measurements across experiments. These bubbles cause a significant drop in fluorescence in a region of interest (ROI) until the area is refilled with the sample.

      Unlike our RNA-focused experiments, PCR requires high temperatures and degassing a PCR master mix effectively is challenging in this context. While we believe our chamber design is sufficiently gas-tight to prevent air from diffusing in, the high surface-to-volume ratio in microfluidics makes degassing highly effective, particularly at elevated temperatures. We anticipate that switching to RNA experiments at lower temperatures will mitigate this issue, which is also relevant in a prebiotic context.

      The reviewer’s comments are valid and prompt us to fully display these aspects of the system. We will now include these repeats in Figure 4c to give readers a deeper understanding of the experiment's dynamics. Additionally, we will provide videos of all three repeats, allowing readers to better grasp the nature of the fluctuations in SYBR Green fluorescence depicted in Figure 4c.

      I think some caution is warranted in interpreting the PCR results because a primer-dimer would be of essentially the same length as the product. It appears as though the experiment has worked as described, but it's very difficult to be certain of this given this limitation. Doing the PCR with a significantly longer amplicon would be ideal, or alternately discussing this possible limitation would be helpful to the readers in managing expectations.

      This is a good point and should be discussed more in the manuscript. Our gel electrophoresis is capable of distinguishing between the replicated product and primer dimers. We know this because we optimized the primer and template sequences to minimize primer dimers, which makes any dimers distinguishable from the desired 61mer product. Moreover, none of the experiments performed without added template strand showed any band in the vicinity of the product band after 4h of reaction, in contrast to the experiments with template, which is a strong argument against the presence of primer dimers.

      Reviewer #2 (Public review):

      Schwintek et al. investigated whether a geological setting of a rock pore with water inflow on one end and gas passing over the opening of the pore on the other end could create a non-equilibrium system that sustains nucleic acid reactions under mild conditions. The evaporation of water as the gas passes over it concentrates the solutes at the boundary of evaporation, while the gas flux induces momentum transfer that creates currents in the water that push the concentrated molecules back into the bulk solution. This leads to the creation of steady-state regions of differential salt and macromolecule concentrations that can be used to manipulate nucleic acids. First, the authors showed that fluorescent bead behavior in this system closely matched their fluid dynamic simulations. With that validation in hand, the authors next showed that fluorescently labeled DNA behaved according to their theory as well. Using these insights, the authors performed a FRET experiment that clearly demonstrated the hybridization of two DNA strands as they passed through the high Mg++ concentration zone, and, conversely, the dissociation of the strands as they passed through the low Mg++ concentration zone. This isothermal hybridization and dissociation of DNA strands allowed the authors to perform an isothermal DNA amplification using a DNA polymerase enzyme. Crucially, the isothermal DNA amplification required the presence of the gas flux and could not be recapitulated using a system that was at equilibrium. These experiments advance our understanding of the geological settings that could support nucleic acid reactions that were key to the origin of life.

      The presented data compellingly supports the conclusions made by the authors. To increase the relevance of the work for the origin of life field, the following experiments are suggested:

      (1) While the central premise of this work is that RNA degradation presents a risk for strand separation strategies relying on elevated temperatures, all of the work is performed using DNA as the nucleic acid model. I understand the convenience of using DNA, especially in the latter replication experiment, but I think that at least the FRET experiments could be performed using RNA instead of DNA.

      We understand the request only partially. The modification introduced by the two dye molecules in the FRET probe, which allows salt concentrations to be probed by melting, is of course much larger than the change of the backbone from RNA to DNA. For this reason we used the much more stable DNA construct, which is also manufactured at lower cost and in much higher purity, including with the modifications. We think the melting temperature characteristics of RNA and DNA in this range are sufficiently well known that DNA can be used instead of RNA to probe the salt concentration in our flow cycling.

      Only under extreme conditions of pH and salt, especially alkaline conditions, is RNA degradation through transesterification at least several orders of magnitude faster than the spontaneous degradative mechanisms acting upon DNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. The work presented in this article is, however, focused on the hybridization dynamics of nucleic acids. Here, RNA and DNA share similar properties regarding the formation of double strands and their respective melting temperatures. While RNA has been shown to form more stable duplex structures exhibiting higher melting temperatures compared to DNA [Dimitrov, R. A., & Zuker, M. (2004). Prediction of hybridization and melting for double-stranded nucleic acids. Biophysical Journal, 87(1), 215-226.], the general impact of changes in salt, temperature and pH [Mariani, A., Bonfio, C., Johnson, C. M., & Sutherland, J. D. (2018). pH-Driven RNA strand separation under prebiotically plausible conditions. Biochemistry, 57(45), 6382-6386.] on respective melting temperatures follows the same trend for both nucleic acid types. The diffusive properties of RNA and DNA are also very similar [Baaske, P., Weinert, F. M., Duhr, S., Lemke, K. H., Russell, M. J., & Braun, D. (2007). Extreme accumulation of nucleotides in simulated hydrothermal pore systems. Proceedings of the National Academy of Sciences, 104(22), 9346-9351.].

      Since this work is a proof of principle that the discussed environment can host nucleic acid replication, we aimed to avoid second-order effects such as degradation by hydrolysis by using DNA as a proxy polymer. This enabled us to focus on the physical effects of the environment on local salt and nucleic acid concentration. The experiments performed with FRET are used to visualize local salt concentration changes and their impact on the melting temperature of dissolved nucleic acids. While performing these experiments with RNA would without doubt cover a broader application within the field of origin of life, we aimed at a step-by-step, proof-of-principle approach, especially since the environmental phenomena studied here have not been previously investigated in the origin-of-life context. Incorporating RNA-related complexity into this system should, however, be addressed in future studies. This will likely require modifications to the experimental boundary conditions, such as adjusting pH, temperature, and salt concentration, to account for the greater duplex stability of RNA. For instance, lowering the pH would reduce the RNA melting temperature [Ianeselli, A., Atienza, M., Kudella, P. W., Gerland, U., Mast, C. B., & Braun, D. (2022). Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA. Nature Physics, 18(5), 579-585.].

      (2) Additionally, showing that RNA does not degrade under the conditions employed by the authors (I am particularly worried about the high Mg++ zones created by the flux) would further strengthen the already very strong and compelling work.

      Based on literature values for hydrolysis rates of RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.], we estimate RNA to have a half-life of multiple months under the conditions of the FRET experiment (high-concentration zones contain <1 mM Mg2+). Additionally, dsRNA is multiple orders of magnitude more stable than ssRNA with regard to degradation through hydrolysis [Zhang, K., Hodge, J., Chatterjee, A., Moon, T. S., & Parker, K. M. (2021). Duplex structure of double-stranded RNA provides stability against hydrolysis relative to single-stranded RNA. Environmental Science & Technology, 55(12), 8045-8053.], improving RNA stability especially in zones of high FRET signal. Furthermore, at the neutral pH used in this work, RNA does not readily degrade. In previous work from our lab [Salditt, A., Karr, L., Salibi, E., Le Vay, K., Braun, D., & Mutschler, H. (2023). Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment. Nature Communications, 14(1), 1495.], we showed that the lifetime of RNA under conditions reaching 40 mM Mg2+ at the air-water interface at 45°C was sufficient to support ribozyme-mediated ligation reactions in experiments lasting multiple hours.

      With that in mind, gaining insight into the median Mg2+ concentration across multiple averaged nucleic acid trajectories in our system (see Fig. 3c&d) and numerically convoluting this with hydrolysis dynamics from literature would be highly valuable. We anticipate that longer residence times in trajectories distant from the interface will improve RNA stability compared to a system with uniformly high Mg2+ concentrations.
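
      A minimal sketch of such a convolution, assuming a first-order cleavage rate that rises linearly with Mg2+. The rate constants, the trajectory, and the function names below are hypothetical placeholders for illustration, not values taken from the cited literature or from Fig. 3:

```python
import math

def hydrolysis_rate(mg_mM, k0=1e-8, k_mg=1e-8):
    """Hypothetical first-order cleavage rate (1/s) per linkage:
    a baseline term plus a term linear in the Mg2+ concentration (mM)."""
    return k0 + k_mg * mg_mM

def survival(mg_trace_mM, dt_s):
    """Fraction of intact linkages after following a Mg2+ trajectory
    sampled every dt_s seconds: exp(-sum of k(Mg) * dt)."""
    exponent = sum(hydrolysis_rate(mg) * dt_s for mg in mg_trace_mM)
    return math.exp(-exponent)

steps = 1000
dt_s = 3600.0  # one hour per step (assumed)

# Assumed trajectory: 10% of the time near the interface (1 mM), 90% in the bulk (0.1 mM)
cycling = [1.0 if i % 10 == 0 else 0.1 for i in range(steps)]
# Reference system: uniformly high Mg2+ for the same duration
uniform_high = [1.0] * steps

s_cycling = survival(cycling, dt_s)
s_uniform = survival(uniform_high, dt_s)
print(f"cycling trajectory:  {s_cycling:.4f}")
print(f"uniformly high Mg2+: {s_uniform:.4f}")
```

      With any rate that increases with Mg2+, the trajectory that spends most of its time away from the interface retains more intact linkages than the uniformly high reference; replacing the placeholder rate with measured kinetics would turn this into the estimate described above.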

      (3) Finally, I am curious whether the authors have considered designing a simulation or experiment that uses the imidazole- or 2′,3′-cyclic phosphate-activated ribonucleotides. For instance, a fully paired RNA duplex and a fluorescently-labeled primer could be incubated in the presence of activated ribonucleotides +/- flux and subsequently analyzed by gel electrophoresis to determine how much primer extension has occurred. The reason for this suggestion is that, due to the slow kinetics of chemical primer extension, the reannealing of the fully complementary strands as they pass through the high Mg++ zone, which is required for primer extension, may outcompete the primer extension reaction. In the case of the DNA polymerase, the enzymatic catalysis likely outcompetes the reannealing, but this may not recapitulate the uncatalyzed chemical reaction.

      This is certainly on our to-do list. Our current focus is on templated ligation rather than templated polymerization, and we are working hard to implement an RNA-only, enzyme-free ligation chain reaction based on the more optimized parameters for templated ligation from 2′,3′-cyclic phosphate activation that were just published [High-Fidelity RNA Copying via 2′,3′-Cyclic Phosphate Ligation, Adriana C. Serrão, Sreekar Wunnava, Avinash V. Dass, Lennard Ufer, Philipp Schwintek, Christof B. Mast, and Dieter Braun, JACS doi.org/10.1021/jacs.3c10813 (2024)]. We would, however, first try this at an air-water interface in a temperature gradient, which was shown to work with RNA [Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment, Annalena Salditt, Leonie Karr, Elia Salibi, Kristian Le Vay, Dieter Braun & Hannes Mutschler, Nature Communications doi.org/10.1038/s41467-023-37206-4 (2023)], before making the jump to the isothermal setting we describe here. We therefore understand the question, but in the past it has proven good practice to first get to know a setting with PCR and then move to RNA.

      Reviewer #2 (Recommendations for the authors):

      (1) Could the authors comment on the likelihood of the geological environments where the water inflow velocity equals the evaporation velocity?

      This is an important point to mention in the manuscript, thank you for pointing that out. To produce a defined experiment, we pushed the water out with a syringe pump, regulated such that the flow rate matched the evaporation. We imagine that a real system would self-regulate the inflow of the water column, for instance through a more complex geometry of the gas flow, so that evaporation and the reflow of water match automatically. The interface would move closer to the gas flux or recede, depending on whether the inflow exceeds or falls short of the evaporation rate. As the interface moves closer, evaporation speeds up, while moving away slows it down. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface in place.
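
      The feedback argument can be illustrated with a toy model. This is only a sketch under stated assumptions: the exponential dependence of evaporation on the distance x between interface and gas flux, all parameter values, and the function names are hypothetical, and pinning by surface tension is not modeled:

```python
import math

def evaporation_velocity(x, e0=2.0, length=1.0):
    """Assumed evaporation speed, decaying with the distance x
    of the interface from the gas flux (arbitrary units)."""
    return e0 * math.exp(-x / length)

def relax_interface(x_start, inflow=1.0, dt=0.01, steps=5000):
    """Euler integration of dx/dt = evaporation(x) - inflow.
    Evaporation above the inflow makes the interface recede (x grows);
    inflow above evaporation pushes it toward the gas flux (x shrinks)."""
    x = x_start
    for _ in range(steps):
        x += dt * (evaporation_velocity(x) - inflow)
    return x

# Analytic fixed point for the default parameters: length * ln(e0 / inflow)
x_expected = math.log(2.0)
x_from_near = relax_interface(0.1)  # starts too close: evaporation exceeds inflow
x_from_far = relax_interface(3.0)   # starts too far: inflow exceeds evaporation
print(x_from_near, x_from_far, x_expected)
```

      Because evaporation decreases monotonically with distance in this toy model, both starting positions relax to the same interface position, which is the stabilizing behavior described above.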

      We have already seen a bit of this dynamic in the experiments, but so far we have not found a geometry within our two-dimensional, constant-thickness design that sustains it for longer times. A three-dimensional water reservoir with lower frictional forces would very likely be able to do this, but it would require a full redesign towards multi-thickness microfluidics. The more we think about it, the more we envisage implementing the next version of the experiment with a real porous volcanic rock inside a humidity chamber that simulates a full 6 h prebiotic day. We would then lose much of the reproducibility of the experiment, but would likely gain a setting in which recondensation of water as dew on a cold morning refills the water reservoirs in the rock. Apologies for digressing into future experiments.

      (2) Could the authors speculate on using gases other than ambient air to provide the flux and possibly even chemical energy? For example, using carbonyl sulfide or vaporized methyl isocyanide could drive amino acid and nucleotide activation, respectively, at the gas-water interface.

      This is an interesting prospect for future work with this system. We have also thought about introducing ammonia for pH control and possible reactions. We were amazed in the past that using CO2 instead of air had a profound impact on replication and strand separation [Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA, Alan Ianeselli, Miguel Atienza, Patrick Kudella, Ulrich Gerland, Christof Mast & Dieter Braun, Nature Physics doi.org/10.1038/s41567-022-01516-z (2022)]. Going further in this direction therefore absolutely makes sense. Because the gas acts mostly on the molecules that are length-selectively accumulated at the interface, only these selected molecules would be affected, which adds to the selection pressure of early evolutionary scenarios.

      Of course, in the manuscript, we use ambient air as a proxy for any gas, focusing primarily on the energy introduced through momentum transfer and evaporation. We speculate that soluble gases could establish chemical gradients, such as pH or redox potential, from the bulk solution to the interface, similar to the Mg2+ accumulation shown in Figure 3c. The nature of these gradients would depend on each gas's solubility and diffusivity. We have already observed such effects in thermal gradients [Keil, L. M., Möller, F. M., Kieß, M., Kudella, P. W., & Mast, C. B. (2017). Proton gradients and pH oscillations emerge from heat flow at the microscale. Nature Communications, 8(1), 1897.], and finding similar behavior in an isothermal environment would be a significant discovery.

      (3) Line 162: Instead of "risk," I suggest using "rate".

      Oh well - thanks for pointing this out! Will be changed.

      (4) Using FRET of a DNA duplex as an indicator of salt concentration is a decent proxy, but a more direct measurement of salt concentration would provide further merit to the explicit statement that it is the salt concentration that is changing in the system and not another hidden parameter.

      Directly observing salt concentration using microscopy is a difficult task. While there are dyes that change their fluorescence depending on the local Na+ or Mg2+ concentration, they do not operate differentially, i.e. by forming a ratio between two color channels. Only such a ratio avoids artifacts from the dye molecules themselves being accumulated by the non-equilibrium setting. We were able to do this for pH in the past, but did not find comparable optical salt sensors. This is the reason we ended up with a FRET pair, with the advantage that we directly probe the strand separation that we are interested in anyhow. Using such a dye in future work would, however, without a doubt enhance the understanding of not only this system, but also our thermal gradient environments.

      (5) Figure 3a: Could the authors add information on "Dried DNA" to the caption? I am assuming this is the DNA that dried off on the sides of the vessel but cannot be sure.

      Thanks to the reviewer for pointing this out. This is correct and we will describe this better in the revised manuscript.

      (6) Figure 4b and c: How reproducible is this data? Have the authors performed this reaction multiple independent times? If so, this data should be added to the manuscript.

      The gel electrophoresis was performed in triplicate, and the data are shown in full in the supplementary information. The data in c are hard to reproduce, as the interface is not static and ROI measurements are therefore difficult to average across repeats. Including the data from the independent repeats will, however, give the reader insight into some of the experimental difficulties, such as air bubbles that form from degassing as the liquid heats up and travel upwards to the interface, disrupting the ongoing fluorescence measurements.

      (7) Line 256: "shielding from harmful UV" statement only applies to RNA oligomers as UV light may actually be beneficial for earlier steps during ribonucleoside synthesis. I suggest rephrasing to "shielding nucleic acid oligomers from UV damage.".

      Will be adjusted as mentioned.

      (8) The final paragraph in the Results and Discussion section would flow better if placed in the Conclusion section.

      This is a good point and we will merge results and discussion closer together.

      (9) Line 262, "...of early Life" is slightly overstating the conclusions of the study. I suggest rephrasing to "...of nucleic acids that could have supported early life."

      This is a fair comment. We thank the reviewer for his detailed analysis of the manuscript!

      (10) In references, some of the journal names are in sentence case while others are in title case (see references 23 and 26 for example).

      Thanks - this will be fixed.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Lejeune et al. demonstrated sex-dependent differences in the susceptibility to MRSA infection. The authors demonstrated the role of the microbiota and sex hormones as potential determinants of susceptibility. Moreover, the authors showed that Th17 cells and neutrophils contribute to sex hormone-dependent protection in female mice.

      Strengths:

      The role of microbiota was examined in various models (gnotobiotic, co-housing, microbiota transplantation). The identification of responsible immune cells was achieved using several genetic knockouts and cell-specific depletion models. The involvement of sex hormones was clarified using ovariectomy and the FCG model.

      Weaknesses:

      The mechanisms by which specific microbiota confer female-specific protection remain unclear.

      We thank the reviewer for highlighting the strengths of the manuscript, including the models and techniques we employ. We agree that the relationship between the microbiota and sex-dependent protection is less developed compared with other aspects of the study. In preparing a revised manuscript, we intend to perform a more thorough comparison of male vs. female microbiota, along with quantification of sex hormones and downstream Th17 function (neutrophil recruitment and activation).

      Reviewer #2 (Public review):

      Overall, the paper nicely adds to the growing body of literature investigating how biological sex impacts the immune system and the burden of infectious disease. The conclusions are mostly supported by the data although there are some aspects of the data that could be better addressed and clarified.

      We thank the reviewer for appreciating our contribution. We intend to perform experiments to fill in gaps and to revise the text to increase clarity and acknowledge limitations.

      (1) There is something of a disconnect between the initial microbiome data and the later data that analyzes sex hormones and chromosomes. While there are clearly differences in microbial species across the two sites (NYU and JAX) how these bacterial species might directly interact with immune cells to induce female-specific responses is left unexplored. At the very least it would help to try and link these two distinct pieces of data to try and inform the reader how the microbiome is regulating the sex-specific response. Indeed, the reader is left with no clear exploration of the microbiota's role in the persistence of the infection and thus is left wanting.

      We agree. This comment is similar to Reviewer #1’s feedback. As mentioned above, we anticipate clarifying the association between sex differences and the microbiota. We will attempt to investigate specific bacteria, although some aspects of microbiota characterization may be outside the timeframe of the revision.

      (2) While the authors make a reasonable case that Th17 T cells are important for controlling infection (using RORgt knockout mice that cannot produce Th17 cells), it is not clear how these cells even arise during infection since the authors make most of the observations 2 days post-infection, which is long before a normal adaptive immune response would be expected to arise. The authors acknowledge this, but their explanation is incomplete. The increase in Th17 cells they observe is predicated on mitogenic stimulation, so they are not specific (at least in this study) for MRSA. It would be helpful to see a specific restimulation of these cells with MRSA antigens to determine if there are pre-existing, cross-reactive Th17 cells specific for MRSA and microbiota species which could then link these two as mentioned above.

      We acknowledge that this is a major limitation of our study. Although an experiment demonstrating pre-existing, cross-reactive T cells would help support our conclusion, aspects of MRSA biology may make the results of this experiment difficult to interpret. We have consulted with an expert on MRSA virulence factors, co-lead author Dr. Victor Torres, about the feasibility of this experiment. MRSA possesses superantigens, such as Staphylococcal enterotoxin B, which bind directly to specific Vβ regions of T-cell receptors (TCR) and major histocompatibility complex (MHC) class II on antigen-presenting cells, resulting in hyperactivation of T lymphocytes and monocytes/macrophages. Additionally, other MRSA virulence factors, such as α-hemolysin and LukED, can induce cell death of lymphocytes. MRSA’s enterotoxins are heat stable, so heat-inactivation of the bacterium may not help in this matter. For these reasons, restimulation of lymphocytes with MRSA antigens may be difficult to interpret. We humbly suggest that addressing this aspect of the mechanism is outside the scope of this manuscript.

      A study by Shao et al. provides an example of a host commensal species inducing Th17 cells with cross-reactivity against MRSA. Upon intestinal colonization, the intestinal fungus Candida albicans influences T cell polarization towards a Th17 phenotype in the spleen and peripheral lymph nodes which provided protection to the host against systemic candidemia. Interestingly, this induction of protective Th17 cells, increased IL-17 and responsiveness in circulating Ly6G+ neutrophils also protected mice from intravenous infection with MRSA, indicating that T cell activation and polarization by intestinal C. albicans leads to non-specific protective responses against extracellular pathogens.

      Shao TY, Ang WXG, Jiang TT, Huang FS, Andersen H, Kinder JM, Pham G, Burg AR, Ruff B, Gonzalez T, Khurana Hershey GK, Haslam DB, Way SS. Commensal Candida albicans Positively Calibrates Systemic Th17 Immunological Responses. Cell Host & Microbe. 2019 Mar 13;25(3):404-417.e6. doi: 10.1016/j.chom.2019.02.004. PMID: 30870622; PMCID: PMC6419754.

      Reviewer #3 (Public review):

      Strengths:

      A strength of the work is the rigorous experimental design. Appropriate controls were executed and, in most cases, multiple approaches were conducted to strengthen the authors' conclusions. The conclusions are supported by the data.

      The following suggestions are offered to improve an already strong piece of scholarship.

      Weaknesses:

      The correlation between female sex hormones and the elimination of S. aureus from the gut could be further validated by quantifying sex hormones produced in the four core genotype mice in response to colonization. Additionally, and this may not be feasible, but according to the proposed model administering female sex hormones to male mice should decrease colonization. Finally, knowing whether the quantity of IL-17a CD4+ cells changes in the OVX mice has the potential to discern whether abundance/migration of the cells or their activation is promoted by female sex hormones.

      In the Discussion, the authors highlight previous work establishing a link between immune cells and sex hormone receptors, but whether the estrogen (and progesterone) receptor is differentially expressed in response to S. aureus colonization could be assessed in the RNAseq dataset. Differential expression of known X and Y chromosome-linked genes was discussed but specific sex hormones or sex hormone receptors, like the estrogen receptor, were not. This potential result could be highlighted.

      We appreciate the comment on the scholarship and thank the Reviewer for the insightful suggestions to improve this manuscript. We intend to measure hormone levels and perform the recommended (or similar) experiments, depending on the availability of reagents and mice during the revision period. We also apologize for not including references that address some of the Reviewer’s questions. Other research groups have compared hormone levels between XX and XY males and females in the four core genotypes model and have found similar levels of circulating testosterone in adult XX and XY males. No difference was found in circulating estradiol levels in XX vs. XY females when tested at 4-6 or 7-9 months of age.

      Karen M. Palaszynski, Deborah L. Smith, Shana Kamrava, Paul S. Burgoyne, Arthur P. Arnold, Rhonda R. Voskuhl, A Yin-Yang Effect between Sex Chromosome Complement and Sex Hormones on the Immune Response. Endocrinology, Volume 146, Issue 8, 1 August 2005, Pages 3280–3285, https://doi.org/10.1210/en.2005-0284

      Sasidhar MV, Itoh N, Gold SM, Lawson GW, Voskuhl RR. The XX sex chromosome complement in mice is associated with increased spontaneous lupus compared with XY. Ann Rheum Dis. 2012 Aug;71(8):1418-22. doi: 10.1136/annrheumdis-2011-201246. Epub 2012 May 12. PMID: 22580585; PMCID: PMC4452281.

      Examination of the levels of estrogen, progesterone, and androgen receptors in our cecal-colonic lamina propria RNA-seq dataset is an excellent idea. We will add these analyses to the revised manuscript. We are planning additional experiments to better understand the contributions of hormones or their receptors and anticipate including such data in either a response letter or revised manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      Strengths:

      Overall there are some very interesting results that make an important contribution to the field. Notably, the results seem to point to differential recruitment of the PL-DMS pathway in goal-tracking vs sign-tracking behaviors.

      Thank you.

      Weaknesses:

      There is a lot of missing information and data that should be reported/presented to allow a complete understanding of the findings and what was done. The writing of the manuscript was mostly quite clear, however, there are some specific leaps in logic that require more elaboration, and the focus at the start and end on cholinergic neurons and Parkinson's disease are, at the moment, confusing and require more justification.

      In the revised paper, we provide additional information in support of results and clarify procedures and findings. Furthermore, we expand the discussion of the proposed interpretational framework that suggests that the contrasts between the cortical-striatal processing of movement cues in sign- versus goal trackers are related to previously established, parallel contrasts in the cortical cholinergic detection of attention-demanding cues.

      Reviewer #2 (Public review):

      Strengths:

      The power of the sign- and goal-tracking model to account for neurobiological and behavioral variability is critically important to the field's understanding of the heterogeneity of the brain in health and disease. The approach and methodology are sound in their contribution to this important effort.

      The authors establish behavioral differences, measure a neurobiological correlate of relevance, and then manipulate that correlate in a broader circuitry and show a causal role in behavior that is consistent with neurobiological measurements and phenotypic differences.

      Sophisticated analyses provide a compelling description of the authors' observations.

      Thank you.

      Weaknesses:

      It is challenging to assess what is considered the "n" in each analysis (trial, session, rat, trace (averaged across a session or single trial)). Representative glutamate traces (n = 5 traces (out of hundreds of recorded traces)) are used to illustrate a central finding, while more conventional trial-averaged population activity traces are not presented or analyzed. The latter would provide much-needed support for the reported findings and conclusions. Digging deeper into the methods, results, and figure legends provides some answers to the reader, but much can be done to clarify what each data point represents and, in particular, how each rat contributes to a reported finding (i.e., a single trial-averaged trace per session for multiple sessions, or dozens of single traces across multiple sessions).

      Representative traces should in theory be consistent with population averages within phenotype, and if not, discussion of such inconsistencies would enrich the conclusions drawn from the study. In particular, population traces of the phasic cue response in GT may resemble the representative peak examples, while smaller irregular peaks of ST may be missed in a population average (averaged prolonged elevation) and could serve as a rationale for more sophisticated analyses of peak probability presented subsequently.

      Figures 5c-f depict individual data from all rats and trials. For all major analyses, the revised manuscript consolidates information about the number of rats per phenotype and sex, and the number of trials contributed by individual rats, in the Results section.

      As detailed in the section on statistical methods, and as mentioned by the reviewer under Strengths, we used advanced statistical methods to ensure that data from individual animals contribute equally to the overall result, and to minimize the possibility that an inordinate number of trials obtained from just one or a couple of rats biased the overall analysis.

      As the reviewer correctly pointed out, we have chosen not to show trial- or subject-averaged traces to illustrate glutamate release dynamics across trials. The present analyses focus on peak glutamate concentrations, the number of peaks, and the timing of peaks relative to a task cue or a behavioral event. Within a response bin, such as the 2-s period following turn cues, glutamate peaks (as defined in Methods) occur at variable times relative to cue onset. Averaging traces over a population of rats or trials would “wash out” the phenotype- and task event-dependent patterns of glutamate peaks, yielding, for example, a single, nearly 2-s long plateau for cue-locked glutamate recordings from STs (Figure 5b). Thus, subject- or trial-averaged traces would not illustrate the major findings described in this paper and would be rather uninformative. As already mentioned, individual data from all subjects and trials are shown in Figs 5c-f.
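
      The wash-out effect can be reproduced with synthetic data. The sketch below is illustrative only and does not use recorded traces: the Gaussian peak shape, its width and amplitude, the number of trials, and the uniform distribution of peak times are all assumptions:

```python
import math
import random

N_SAMPLES = 200    # samples per trace (assumed)
DURATION_S = 2.0   # length of the response bin in seconds

def trace_with_peak(peak_time, amplitude=1.0, width_s=0.05):
    """One synthetic trace: a single Gaussian peak centered at peak_time (s)."""
    dt = DURATION_S / N_SAMPLES
    return [amplitude * math.exp(-0.5 * ((i * dt - peak_time) / width_s) ** 2)
            for i in range(N_SAMPLES)]

rng = random.Random(0)  # fixed seed so the example is repeatable
n_trials = 100

# Each trial has one peak at a variable time within the 2-s bin
traces = [trace_with_peak(rng.uniform(0.2, 1.8)) for _ in range(n_trials)]

# Peak amplitude measured per trial, then averaged
mean_single_peak = sum(max(t) for t in traces) / n_trials

# Trial-averaged trace, then its maximum
averaged = [sum(t[i] for t in traces) / n_trials for i in range(N_SAMPLES)]
peak_of_average = max(averaged)

print(f"mean of per-trial peak amplitudes: {mean_single_peak:.3f}")
print(f"peak of the trial-averaged trace:  {peak_of_average:.3f}")
```

      Although every synthetic trial contains a full-amplitude peak, the trial-averaged trace is a low, broad elevation, which is why peak-based measures per trial are more informative here than averaged traces.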

      Reviewer #3 (Public review):

      Strengths:

      Overall these studies are interesting and are of general relevance to a number of research questions in neurology and psychiatry. The assessment of the intersection of individual differences in cue-related learning strategies with movement-related questions - in this case, cued turning behavior - is an interesting and understudied question. The link between this work and growing notions of corticostriatal control of action selection makes it timely.

      Thank you.

      Weaknesses:

      The clarity of the manuscript could be improved in several places, including in the graphical visualization of data. It is sometimes difficult to interpret the glutamate results, as presented, in the context of specific behavior, for example.

      We appreciate the reviewer’s concerns about the complexity of some of the graphics, particularly the results from the arguably innovative analysis illustrated in Figure 6. Figure 6 illustrates that the likelihood of a cued turn can be predicted based on single and combined glutamate peak characteristics. The revised legend for this figure provides additional information and examples to ease the readers’ access to this figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors found that the loss of cell-ECM adhesion leads to the formation of giant unilocular vacuoles in mammary epithelial cells. This occurs through a macropinocytosis-like process and involves PI3 kinase. They further identified dynamin and septin as essential machinery for this process. Interestingly, this process is reversible and appears to protect cells from cell death.

      Strengths:

      The data are clean and convincing to support the conclusions. The analysis is comprehensive, using multiple approaches such as SIM and TEM. The discussion on lactation is plausible and interesting.

      We thank the reviewer for the summary of our study and the positive comment.

      Weaknesses:

      As the first paper describing this phenomenon, it is adequate. However, the elucidation of the molecular mechanisms is not as exciting as it does not describe anything new. It is hoped that novel mechanisms will be elucidated in the future. In particular, the molecules involved in the reversing process could be quite interesting.

      We agree with the reviewer’s comments and believe that investigating the molecular mechanisms involved in reversing GUVac formation, as illustrated in Figure 5J, would be valuable for future research.

      Additionally, the relationship to conventional endocytic compartments, such as early and late endosomes, is not analyzed.

      We thank the reviewer for the valuable comment. To determine whether GUVac displays markers of other endomembrane systems, we analyzed several markers, including EEA1, Rab5, LC3B, LAMP1, and Transferrin receptor (TfR). At early time points (1 h), we observed several large vesicles that had taken up 70 kDa Dextran and exhibited EEA1 or Rab5, markers of early endosomes. By 6 h, some of these large vesicles showed LysoTracker positivity, indicating a transition from early to late endosomal fate, similar to the maturation process of conventional macropinocytic vesicles (see new Figure 1-figure supplement 2A). However, once the vesicles fused, grew, and became GUVac, these markers did not consistently correspond with the GUVac membrane but were instead unevenly distributed around it (new Figure 1-figure supplement 2B, C). This made it difficult to determine whether they were localized to separate organelles or part of the GUVac membrane. Interestingly, we found that the Transferrin receptor (TfR), which also marks a general membrane population involved in the endocytic pathway (such as PM invagination), was evenly distributed within the GUVac membrane (new Figure 1-figure supplement 2B, D). Therefore, GUVac appears to possess heterogeneous characteristics of the endocytic membrane, mainly with the TfR marker (likely due to PM invagination) and some partial endomembrane system markers. However, further analysis would be required to confirm this.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "Formation of a giant unilocular vacuole via macropinocytosis-like process confers anoikis resistance" describes an interesting observation and provides initial steps towards understanding the underlying molecular mechanism.

      The manuscript describes that the majority of non-tumorigenic mammary gland epithelial cells (MCF-10A) in suspension initiate entosis. A smaller fraction of cells forms a single giant unilocular vacuole (hereafter referred to as a GUVac). GUVac appeared to be empty and did not contain invading (entotic) cells. The formation of GUVac could be promoted by disrupting actin polymerisation with LatB and CytoD. The formation of GUVacs correlated with resistance to anoikis. GUVac formation was detected in several other epithelial cells from secretory tissues.

      The authors then use electron microscopy and super-resolution imaging to describe the biogenesis of GUVac. They find that GUVac formation is initiated by a macropinocytosis-like phenomenon (that is independent of actin polymerisation). This process leads to the formation of large plasma membrane invaginations that pinch off from the PM to form larger vesicles, which fuse with each other into GUVacs.

      Inhibition of actin polymerisation in suspended MCF-10A leads to the recruitment of Septin 6 to the PM via its amphipathic helix. Treatment with FCF (a septin polymerisation inhibitor) blocked GUVac biogenesis, as did pharmacological inhibition of dynamin-mediated membrane fission. The fusion of these vesicles into GUVacs required (perhaps not surprisingly) PI3P.

      Strengths:

      The authors have made an interesting and potentially important observation. They describe the formation of an endo-lysosomal organelle (a giant unilocular vacuole - GUVac) in suspended epithelial cells and correlate the formation of GUVacs with resistance to anoikis.

      We thank the reviewer for the summary of our study and the positive comment.

      Weaknesses:

      My major concern is the experimental strategy that is used throughout the paper to induce and study the formation of GUVac. Almost every experiment is conducted in suspended cells that were treated with actin depolymerising drugs (e.g. LatB) and thus almost all key conclusions are based on the results of these experiments. I only have a few suggestions that would improve these experiments or change their outcome and interpretation. Yet, I believe it is essential to identify the endogenous pathway leading to the actin depolymerisation that drives the formation of GUVacs in detached epithelial cells (or alternatively to figure out how it is suppressed in most detached cells). A first step in that direction would be to investigate the polymerization status of actin in MCF-10A cells that 'spontaneously' form GUVacs and to test if these cells also become resistant to anoikis.

      We thank the reviewer for the valuable comments and fully acknowledge the limitations of our approach. Many detached cells likely tend to contact each other and aggregate, which suppresses GUVac formation. However, it is unclear whether cells that spontaneously form GUVac in suspension have a weakened F-actin structure, which would be valuable to investigate in future studies.

      Also, it would be great (and I believe reasonably easy) to better characterise molecular markers of GUVacs (LAMPs, Rabs, Cathepsins, etc.) to discriminate them from other endosomal organelles.

      In response to a similar comment from Reviewer 1, we analyzed markers of other endocytic compartments, including EEA1, Rab5, Transferrin receptor (TfR), LC3B, and LAMP1. At early time points (1 h), we observed several large vesicles that had taken up 70 kDa Dextran and exhibited EEA1 or Rab5, markers of early endosomes. By 6 h, some of these large vesicles showed LysoTracker positivity, indicating a transition from early to late endosomal fate, similar to the maturation process of conventional macropinocytic vesicles (see new Figure 1-figure supplement 2A). However, once the vesicles fused, grew, and became GUVac, these markers did not consistently correspond with the GUVac membrane but were instead unevenly distributed around it (new Figure 1-figure supplement 2B, C). This made it difficult to determine whether they were localized to separate organelles or part of the GUVac membrane. Interestingly, we found that the Transferrin receptor (TfR), which also marks a general membrane population involved in the endocytic pathway (such as PM invagination), was evenly distributed within the GUVac membrane (new Figure 1-figure supplement 2B, D). Therefore, GUVac appears to possess heterogeneous characteristics of the endocytic membrane, mainly with the TfR marker (likely due to PM invagination) and some partial endomembrane system markers. However, further analysis would be required to confirm this.

      Reviewer #3 (Public Review):

      Summary:

      Loss of cell attachment to extracellular matrix (ECM) triggers anoikis (a type of programmed cell death), and resistance to anoikis plays a role in cancer development. However, mechanisms underlying anoikis resistance, and the precise role of F-actin, are not fully known.

      Here the authors describe the formation of a new organelle, giant unilocular vacuole (GUVac), in cells whose F-actin is disrupted during loss of matrix attachment. GUVac formation (diameter >500 nm) resulted from a previously unrecognised macropinocytosis-like process, characterized by inwardly curved micron-sized plasma membrane invaginations, dependent on F-actin depolymerization, septin recruitment, and PI(3)P. Finally, the authors show GUVac formation after loss of matrix attachment promotes resistance to anoikis.

      From these results, the authors conclude that GUVac formation promotes cell survival in environments where F-actin is disrupted and conditions of cell stress.

      Strengths:

      The manuscript is clear and well written, and the figures are all presented at a very high level.

      A variety of cutting-edge cell biology techniques (e.g., time-lapse imaging, EM, super-resolution microscopy) are used to study the role of the cytoskeleton in GUVac formation. It is discovered that: (i) a macropinocytosis-like process dependent on F-actin depolymerisation, SEPT6 recruitment, and PI(3)P contributes to GUVac formation, and (ii) GUVac formation is associated with resistance to cell death.

      We thank the reviewer for the concise summary of our study and positive comments.

      Weaknesses:

      The manuscript is highly reliant on the use of drugs, or combinations of drugs, for long periods of time (6 h, 18 h, etc.). Wherever possible, the authors should also test conclusions drawn from experiments involving drugs using other canonical cell biology approaches (e.g., siRNA, CRISPR). Although suggestive as a first approach, it is not reliable to draw conclusions from experiments where only drug combinations are being advanced (e.g., LatB + FCF).

      We thank the reviewer for the comment and suggestion. As suggested, we employed siRNAs targeting Septin2 and Septin9 in cells treated with LatB as an alternative to the drug combination approach. This genetic approach, combined with chemical treatment, led to a consistent reduction in GUVac formation, similar to the results observed with LatB+FCF treatment (see new Figure 3D-WB and graph).

      F-actin is well known to play a wide variety of roles in cell death and other canonical cell death pathways (PMID: 26292640). The authors show using pharmacological inhibition that F-actin is key for GUVac formation. However, especially when testing for physiological relevance, how can these other roles for F-actin be ruled out?

      In Figure 5, we investigate the physiological relevance of GUVac, highlighting its role in suppressing apoptosis and enhancing anoikis resistance. As the reviewer correctly noted, F-actin inhibition is known to reduce apoptotic signaling (PMID: 16072039). However, we observed that anoikis resistance is lost when GUVac is suppressed through knockout of either PI3KC2alpha or VPS34 in cells with F-actin disrupted by LatB (Figure 5I). This suggests that GUVac plays a role in suppressing apoptosis independently of F-actin depolymerization-induced apoptosis resistance.

      To test the role of septins in GUVac formation, only recruitment studies and no direct functional work are performed. The drug forchlorfenuron (FCF) is used, but this is well known to have off-target effects (PMID: 27473917).

      We thank the reviewer for the valuable comments. To eliminate potential off-target effects of FCF, as described above, we employed siRNA targeting Septin 2 and Septin 9 and observed similar results (see new Figure 3D).

      Cells that possess GUVac are resistant to anoikis, but how are these cells resistant? This report is focused on mechanisms underlying GUVac formation and does not directly test for mechanisms underlying anoikis resistance.

      We fully agree with the reviewer’s comments and recognize the importance of uncovering the mechanism behind GUVac-mediated anoikis resistance for future research. It will likely be essential to investigate how prosurvival signaling pathways are activated, like the PI3K-AKT signaling (as shown in Figure 5-Supplement 1) or the YAP/TAZ pathway.

      Reviewer #1 (Recommendations For The Authors):

      Figure 4 Supplemental 1. What are the faint bands in clones 23, 26, and 29? Are they cross-reacting bands? Or Vps34?

      We apologize if the data in our original manuscript were misleading. To clarify the specificity of the VPS34 antibody in the Western blot analysis of VPS34 KO clones, we compared these samples with those from siRNA-mediated VPS34-depleted cells (see new Figure 4-Supplement 1E, which replaces the original Figure). Consistent with the known size of VPS34 at approximately 100 kDa, we observed a clear disappearance of the VPS34 band at around 100 kDa in the sgVPS34 clones, which was comparable to the size observed in siRNA-treated cells.

      Reviewer #2 (Recommendations For The Authors):

      Figure 2B: Only 4 cells were counted. Please comment.

      At the outset of this study, we faced technical difficulties in preparing TEM samples, which limited the number of samples included in Figure 2B. However, subsequent experiments that combined TEM with super-resolution microscopy, as shown in Figure 4D-F, produced similar data on plasma membrane invagination, as depicted in Figure 2B, which is the initial step in the formation of GUVac.

      Figure 2C: do cells shrink after treatment with EIPA or LatB? Please comment.

      We apologize if the data presented in our original manuscript were misleading. Control cells treated with DMSO display multiple cell-in-cell structures (known as 'entosis'), which typically results in a larger overall cell size compared to EIPA or LatB-treated non-entotic single cells. This might have created the impression that cells shrink relative to the control under EIPA or LatB treatment. We hope this explanation has answered the reviewer’s question.

      Figure 3A: The changes in the localization of mCherry-Septin6 appear to be very dramatic. Are these results properly reflected by the quantification in Figure 3B? Is indeed the entire mCherry-Septin6 pool recruited to the plasma membrane? Wouldn't that imply that all other Septin6-regulated processes are blocked?

      Again, we apologize if the data presented in our original manuscript caused any confusion. In Figure 3B, we quantified only the number of filament-like Septin6 structures predominantly observed in LatB-treated cells, rather than measuring changes in the relative fluorescence intensity of Septin6 between the plasma membrane and the cytosol. Although we could not estimate the proportion of total Septin6 recruited to the plasma membrane from the cytosol based solely on Figure 3A-B, conducting plasma membrane fractionation experiments with endogenous Septin6, followed by Western blot analysis, would be valuable for addressing this issue in future studies.

      Figure 3D: Please also provide data for the 6h time-point (as in all other experiments).

      We apologize for omitting the 6-hour time point, which may have caused confusion. The new Figure 3E (previously Figure 3D) shows that recruitment of wild-type Septin6, but not the amphipathic helix (AH) deletion mutant, occurs at the 6-hour time point.

      Figure 3E: Molecular weight for western blot is missing.

      We thank the reviewer for pointing this out and have revised the figure accordingly.

      Line 188 - Title of subchapter could include dynamin.

      We appreciate the reviewer’s helpful suggestion and have updated the revised manuscript to reflect this. The phrase "Recruitment of Septin to the Fluctuating Plasma Membrane Drives Macropinocytosis-like Process" has been revised to "Septin and Dynamin Drive Macropinocytosis-like Process".

      Line 450 - please describe how the genotyping of MCF10a gene-engineered cells was performed.

      We confirmed the knockout of MCF10A cell lines by Western blot analysis using specific antibodies against VPS34 and PI3KC2α, rather than through genotyping.

      Reviewer #3 (Recommendations For The Authors):

      (1) The manuscript is highly reliant on the use of drugs, or combinations of drugs, for long periods of time (6 h, 18 h, etc.). Wherever possible, the authors should also test conclusions drawn from experiments involving drugs using other canonical cell biology approaches (e.g., siRNA, CRISPR). Although suggestive as a first approach, it is not reliable to draw conclusions from experiments where only drug combinations are being advanced (e.g., LatB + FCF).

      We thank the reviewer for the comment. As suggested, we employed siRNAs targeting Septin2 and Septin9 in cells treated with LatB as an alternative to the drug combination approach. This genetic approach, combined with chemical treatment, led to a consistent reduction in GUVac formation, similar to the results observed with LatB+FCF treatment (see new Figure 3D-WB and graph).

      (2) SEPT6 is recruited at an inwardly curved plasma membrane. Can the authors better describe what type of structure is being recruited/quantified (filaments, collar-like structures, etc)?

      We apologize if the data presented was unclear. As outlined in the Methods section in the original manuscript, we detected puncta-like Septin6 structures using the Find Maxima tool in ImageJ, which could include both filamentous and collar-like structures that were less apparent in the DMSO control. We have added additional explanations in the revised manuscript in the legend of Figure 3B to clarify the recruitment of Septin6.

      Previous work has shown that octameric septin complexes link actin to the plasma membrane (PMID: 36562751). Please test the recruitment/function of other key septins, such as SEPT7 and SEPT9, to support the conclusions.

      As previously mentioned, to further explore the role of other septin family members in GUVac formation, we tested the roles of Septin9 and Septin2 using siRNAs and found that they are essential for this process (see new Figure 3D). Unfortunately, we were unable to assess the localization of Septin2 and Septin9 due to the lack of suitable antibodies for detecting endogenous proteins by immunofluorescence.

      (3) SEPT6 recruitment is impaired when cells are treated with FCF. FCF is well known to have off-target effects (PMID: 25217460, PMID: 27473917). siRNA for SEPT2, SEPT7 and/or SEPT9 can be used to test phenotypes obtained using FCF.

      We thank the reviewer for the comment. As also mentioned above, to eliminate potential off-target effects of FCF, we used siRNA to target Septin2 and Septin9, and obtained similar results (see new Figure 3D).

      (4) SEPT6 is recruited to the fluctuating cell membrane via the amphipathic helix (AH) domain (Figure 3D). Are these only representative images? It is not clear what readers should be looking at - can the authors provide arrows to highlight what is the difference +/- AH? Can something be quantified?

      We thank the reviewer for the suggestion and have added arrows to the inset of the merged panel in Figure 3E, along with line profile analysis, to emphasize the failure of the AH deletion mutant of Septin6 to be recruited to the plasma membrane.

      Throughout Figure 3, why use LatB treatment at different times?

      We apologize if this was not clearly addressed in our original manuscript. Throughout the study, we primarily used an 18-hour LatB treatment to evaluate GUVac formation, as this longer period allows for gradual vesicle fusion. In contrast, we utilized 6-hour treatments to demonstrate that Septin6 recruitment and subsequent plasma membrane invagination occur at earlier time points, as evidenced by the data in Figure 2G (super-resolution live imaging) and Figure 4D (electron microscopy analysis). This clarification has been incorporated into the revised manuscript.

      (5) F-actin is well known to play a wide variety of roles in cell death and other canonical cell death pathways (PMID: 26292640). The authors show using pharmacological inhibition that F-actin is key for GUVac formation. However, especially when testing for physiological relevance, how can these other roles for F-actin be ruled out?

      In Figure 5, we investigate the physiological relevance of GUVac, highlighting its role in suppressing apoptosis and enhancing anoikis resistance. As the reviewer correctly noted, F-actin inhibition is known to reduce apoptotic signaling (PMID: 16072039). However, when GUVac is suppressed through knockout of either PI3KC2alpha or VPS34 in cells with F-actin disrupted by LatB, anoikis resistance is lost (see Figure 5H, I). This suggests that GUVac plays a role in suppressing apoptosis independently of F-actin depolymerization-induced apoptosis resistance.

      (6) Cells that possess GUVac are resistant to anoikis, but how are these cells resistant? This report is focused on mechanisms underlying GUVac formation and does not directly test for mechanisms underlying anoikis resistance.

      We fully agree with the reviewer’s comments and recognize the importance of uncovering the mechanism behind GUVac-mediated anoikis resistance for future research. It will likely be essential to investigate how prosurvival signaling pathways are activated, like the PI3K-AKT signaling (as shown in Figure 5-Supplement 1) or the YAP/TAZ pathway.

      (7) In the Discussion, there is a lot of text on involution and speculative relevance of GUVac formation. I would focus the Discussion more on the clear results discovered here.

      We thank the reviewer for the feedback and have shortened the part of the discussion concerning involution.

      (8) Figure 5. GUVac formation promotes cell survival in altered actin and matrix environments. In Figure 5J, it will not be clear to readers outside the field what is being shown here.

      We appreciate the reviewer’s suggestion and have added two distinct dotted lines around the vacuole and cell area in the revised figure to emphasize the gradual reduction in its size over time.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      SUMO proteins are processed and then conjugated to other proteins via a C-terminal di-glycine motif. In contrast, the N-terminus of some SUMO proteins (SUMO2/3) contains lysine residues that are important for the formation of SUMO chains. Using NMR studies, the N-terminus of SUMO was previously reported to be flexible (Bayer et al., 1998). The authors are investigating the role of the flexible (referred to as intrinsically disordered) N-terminus of several SUMO proteins. They report their findings and modeling data that this intrinsically disordered N-terminus of SUMO1 (and the C. elegans Smo1) regulates the interaction of SUMO with SUMO interacting motifs (SIMs).

      Strengths:

      Among the strongest experimental data suggesting that the N-terminus plays an inhibitory function are their observations that

      (1) SUMO1∆N19 binds more efficiently to SIM-containing Usp25, Tdp2, and RanBp2,
      (2) SUMO1∆N19 shows improved sumoylation of Usp25,
      (3) changing negatively-charged residues, ED11,12KK in the SUMO1 N-terminus increased the interaction and sumoylation with/of USP25.

      The paper is very well-organized, clearly written, and the experimental data are of high quality. There is good evidence that the N-terminus of SUMO1 plays a role in regulating its binding and conjugation to SIM-containing proteins. Therefore, the authors are presenting a new twist in the ever-evolving saga of SUMO, SIMs, and sumoylation.

      Weaknesses:

      Much has been learned about SUMO through structure-function analyses and this study is another excellent example. I would like to suggest that the authors take some extra time to place their findings into the context of previous SUMO structure-function analyses. Furthermore, it would be fitting to place their finding of a potential role of N-terminally truncated Smo1 into the context of the many prior findings that have been made with regard to the C. elegans SUMO field. Finally, regarding their data modeling/simulation, there are questions regarding the data comparisons and whether manipulations of the N-terminus also have an effect on the 70/80 region of the core.

      We thank the reviewer for insightful and constructive comments to improve our manuscript. We have now placed our findings in the context of previous structure-function analyses at several occasions, details of which can be found in our replies to the detailed comments.

      We are also placing the C. elegans data into the context of previously published findings on the various functions of SMO-1 in controlling development and maintaining genomic stability (lines 510ff). Finally, we addressed all questions and suggestions regarding the comparison of MD simulation and NMR data, and addressed the question of whether mutations in the N-terminus affected the 70/80 region. We have now clarified in the manuscript that the sum of MD and NMR data does not allow a clear-cut conclusion on the 70/80 interactions.

      Reviewer #2 (Public Review):

      Summary:

      This very interesting study originated from a serendipitous observation that the deletion of the disordered N-terminal tail of human SUMO1 enhances its binding to its interaction partners. This suggested that the N terminus of SUMO1 might be an intrinsic competitive inhibitor of SUMO-interacting motif (SIM) binding to SUMO1. Subsequent experiments support this mechanism, showing that in humans it is specific to SUMO1 and does not extend to SUMO2 or SUMO3 (except, perhaps, when the N terminus of SUMO2 becomes phosphorylated, as the authors intriguingly suggest - and partially demonstrate). The auto-inhibition of SUMO1 via its N-terminal tail apparently explains the lower binding of SUMO1 compared to SUMO2 to some SIMs and lower SIM-dependent SUMOylation of some substrates with SUMO1 compared to SUMO2, thus adding an important element to the puzzle of SUMO paralogue preference. In line with this explanation, N-terminally truncated SUMO1 was equally efficient to SUMO2 in the studied cases. The inhibitory role of SUMO1's N terminus appears conserved in other species including S. cerevisiae and C. elegans, both of which contain only one SUMO. The study also elucidates the molecular mechanism by which the disordered N-terminal region of SUMO1 can exert this auto-inhibitory effect. This appears to depend on the transient, very highly dynamic physical interaction between the N terminus and the surroundings of the SIM-binding groove based mostly on electrostatic interactions between acidic residues in the N terminus and basic residues around the groove.

      Strengths:

      A key strength of this study is the interplay of different techniques, including biochemical experiments, NMR, molecular dynamics simulations, and, at the end, in vivo experiments. The experiments performed with these different techniques inform each other in a productive way and strengthen each other's conclusions. A further strength is the detailed and clear text, which patiently introduces, describes, and discusses the study. Finally, in terms of the message, the study has a clear, mechanistic message of fundamental importance for various aspects of the SUMO field, and also more generally for protein biochemists interested in the functional importance of intrinsically disordered regions.

      Weaknesses:

      Some of the authors' conclusions are similar to those from a recent study by Lussier-Price et al. (NAR, 2022), the two studies likely representing independent inquiries into a similar topic. I don't see it as a weakness by itself (on the contrary), but it seems like a lost opportunity not to discuss at more length the congruence between these two studies in the discussion (Lussier-Price is only very briefly cited). Another point that can be raised concerns the wording of conclusions from molecular dynamics. The use of molecular dynamics simulations in this study has been rigorous and fruitful - indeed, it can be a model for such studies. Nonetheless, parameters derived from molecular dynamics simulations, including kon and koff values, could be more clearly described as coming from simulations and not experiments. Lastly, some of the conclusions - such as enhanced binding to SIM-containing proteins upon N-terminal deletion - could be additionally addressed with a biophysical technique (e.g. ITC) that is more quantitative than gel-based pull-down assays - but I don't think it is a must.

      Thank you very much for pointing towards the study of Lussier-Price. We now point out congruent findings in more detail in the discussion.

      We also thank the reviewer for the advice to present and discuss the MD findings more clearly, and more explicitly specify which parameters were obtained from MD. We have made changes throughout the Results and Discussion sections.

      We agree that it would be a nice addition to use ITC measurements as a more quantitative method to assess differences in binding affinities upon deletion of the SUMO N-terminus. We tried to measure affinities between SUMO and SIM-containing binding partners by ITC, but in our hands this failed. In the study of Lussier-Price et al., the authors were able to measure differences in SIM binding upon deleting the N-terminus, but only when they used phosphorylated SIM peptides. Follow-up studies, e.g., on the effect of SUMO’s N-terminal modifications, should certainly include more quantitative measurements such as ITC; however, these studies will have to be picked up by others, as the main PI, Frauke Melchior, and most contributing authors have moved on to new challenges.

      Reviewing Editor (Recommendations For The Authors):

      Both reviewers agreed that your manuscript presents novel results and the key findings including the self-inhibitory role of the N-terminal tail of SUMO proteins in their interaction with SIM are overall well supported by the data. The reviewers also provided constructive suggestions. They pointed out that some simulation results are not clear, which could be strengthened by control analysis and by toning down the related descriptions. In addition, Reviewer 2 suggested that the conclusions from the current biochemical and simulation studies could be further reinforced by more quantitative binding measurements. We hope that these points can be addressed in the revision.

      We thank both reviewers for their insightful and constructive comments and the appreciative tone. In our replies above and below we address most of the raised concerns.

      We strongly recommend the change of the current title. eLife advises that the authors avoid unfamiliar abbreviations or acronyms, or spell out in full or provide a brief explanation for any acronyms in the title.

      We changed the title to “The intrinsically disordered N-terminus of SUMO1 is an intramolecular inhibitor of SUMO1 interactions” to avoid acronyms in the title.

      Reviewer #1 (Recommendations For The Authors):

      Major:

      Lines 190-262: The authors use NMR experiments and all-atom molecular dynamics (MD) simulations. They state that this approach reveals a highly dynamic interaction of the SUMO1 N-terminus with the core and that the SIM binding groove and the 70/80 region are temporarily occupied by the SUMO1 N-terminus (Fig. 3C). After comparing SUMO1, Smt3, SUMO2, and Smo1 by this approach they state that the most striking differences exist for the interaction with the SIM-binding groove, while interactions with the 70/80 region are rather comparable.

      The authors then compare the average binding time data of Figure 3C, D, E, F in Figure 3G.

      It is not clear which data points are included in the bar graphs of Figure 3G and how the individual data points (there are maybe 8 shown in each bar) correspond to the data shown in 3C, D, E, and F or if they are iterations (n?) of the modeled data. This should be clarified. Also, for comparison, the authors should also graph the average data of the 70/80 region.

      We clarified the data shown in Figure 3G and in Figures 3C-F, and how they relate to each other. Indeed, Figure 3G shows 8 data points for 8 trajectories, and their average. Figures 3C-F are based on the same 8 trajectories, in this case broken down per residue of the protein. The average data of the 70/80 region do not show any significant differences across the proteins, as is already clearly visible in panels 3C-F.

      Line 322: More concerning, in Figure 5, the authors model how the ED11,12KK mutations disrupt the interaction between the N-terminus and the SIM-binding groove and state that this mutation leaves interactions with the 70/80 region largely untouched. Again, it is not clear which data points are included in the bar graph 5D and 5G and how many iterations. Furthermore, data of 5B, C (SUMO1) and 5 E, F (smo1) do show clear differences between the WT and mutants affecting both the SIM binding groove and the 70/80 region. The double mutation clearly seems to affect the 70/80 region when comparing 5B, C (SUMO1) and 5 E, F (smo1), but this result is not mentioned. Indeed, the authors state that the double mutants leave the interactions with the 70/80 region largely untouched, but this is not borne out by the data presented.

      We improved the clarity of the legend of Figure 5 as suggested. We also thank the reviewer for the comment on the changes in the 70/80 region, to which we now explicitly point the reader in the corresponding Results section. We do, however, refrain from drawing conclusions from the MD in this case, as this change is not supported by the NMR measurements (Fig 5a). Charge-charge interactions in the charge-rich double mutants might be overstabilized in the MD simulations, a known problem for the canonical force fields used here, even though they were tailored for IDPs. We now cite a corresponding reference. Another potential explanation for why the CMPs do not pick up this change upon mutation could be a pronounced fuzziness in this region, which, however, is in turn not apparent from the simulations. We would therefore not overinterpret these differences in the 70/80 region. Our key conclusion is the loss of interactions with the SIM-binding groove, and thus of cis-inhibition, upon mutation, which is supported by both MD and NMR.

      341: In their N-termini substitution experiments, the authors show that the SUMO1 core that carries the SUMO2 N-terminus (S2N-S1C) binds USP25 more efficiently than wt SUMO1. However, the SUMO1 core that carries the SUMO2 N-terminus is also reduced in its interaction with Usp25. This is concerning as the SUMO2 N-terminus was not predicted to interfere with SIM binding.

      We were excited to see that the inhibitory potential could be partially transplanted by swapping the N-termini of SUMO1 and SUMO2, demonstrating that some important determinants are contained within the N-terminal tail of SUMO proteins. However, the observed effects were partial, indicating that other determinants also contribute and that we do not yet understand all aspects. Obviously, the SUMO1 and SUMO2 cores are similar (also in the area comprising the SIM binding groove) but not identical, and as the inhibition arises from dynamic interactions of the N-terminus with the SIM binding area, differences in the SUMO cores and in residues flanking SUMO’s N-terminus are likely to influence the inhibitory potential as well.

      Blue bars in 3G, 5D, and 6A look surprisingly similar down to the individual data points - does that mean that the same SUMO1 WT data was recycled for these different experiments? This is concerning to me.

      The data displayed in the figures listed above are derived from in silico simulations and indeed display the same data set for the case of SUMO1 WT repeatedly, as we also state in the figure legends (we had done so for 5D “(identical to Fig. 3C)”, and now added the same comment to 6A, thanks for pointing this out). We show the SUMO1 WT data again to facilitate comparing the different SUMO variants in MD simulations.

      Line 352 and 496: The authors used phosphomimetic mutants to assess the effect of SUMO2 N-term phosphorylation on interaction with Usp25. The data suggest a mild phenotype (6G) which is borne out by the quantification in 6H. In contrast, the effect of an array of modifications for SUMO1 (Figures 6A - C) was solely analyzed by MD simulation. If possible, this data should be confirmed, at least by using a phosphomimetic at the Ser9 position of SUMO1. Alternatively, a caveat explaining the need to confirm these predictions by actual experiments should be added to the text.

      We already state in “Limitations of the study” that “While our MD simulations and in vitro studies with selected mutants point in this direction, we have not been able to generate quantitatively acetylated and/or phosphorylated SUMO variants to test this hypothesis.”

      We agree that the hypothesis needs experimental validation. Phosphomimetic amino acids can be a useful tool in some cases but fail to mimic a phosphate group in others. In the past, we tested whether replacing Ser9 with a potentially phosphomimetic amino acid (Glu) would further diminish binding of SIM-containing proteins compared to the already strongly reduced binding to wt SUMO1, but the effect was too mild to yield a significant difference, at least in our assay. So far, we have not been able to clarify whether this is due to a failure of Glu to mimic phosphorylation of Ser9, to the limited sensitivity of our pulldown assay combined with the challenge of detecting inhibition relative to an already inhibited state, or to a flaw in our hypothesis. We have therefore also added a sentence to the paragraph introducing the phosphoSer9 MD simulations (now line 367) stating that this hypothesis needs to be tested experimentally.

      Minor:

      Line 110: the authors should include references for their summary statement that "A defining feature of SUMO proteins is the intrinsically disordered N-terminus, whose function is only partly understood." Also cite in line 119.

      Thank you, we now included some references.

      Line 75: Please indicate early on that the N-terminus of some SUMO proteins contains lysines for the formation of SUMO chains. Please list them.

      We now list which of the SUMO proteins used in this study contain lysine residues in their N-termini.

      Line 113: Please cite studies that elucidated the sumoylation of lysines in the N-terminus of SUMO2/3 proteins.

      Thank you, we now included some references.

      Line 153: The authors should include additional references on Smt3 structure function analyses to provide better context. One important detail, for example, is the important finding that Yeast SUMO (Smt3) deletion can be complemented by hsSUMO1 but not hsSUMO2 and hsSUMO3. Additionally, in yeast the entire Smt3 N-terminus can be deleted without detectable effects on growth, underscoring the enigmatic role of the N-terminus (Newman et al., 2017). Caveat also applies to line 266.

      Thank you, we now included some additional information and references around line 153 and below.

      164: The hypothesis that the SUMO1 N-terminus interferes with SIM binding groove ignores the previous observation that deletion of the SUMO2 N-terminus does not have an effect on binding (in vitro). While this is addressed later, the authors should clarify this e.g. by stating "a unique feature of the SUMO1 N-terminus".

      We now explicitly mention that this feature appears to be unique to SUMO1.

      374 and 499: The authors should discuss the caveat that the deletion of the N-terminus of Smt3 does not have a phenotype in yeast in vivo (Newman et al., 2017).

      We now discuss that Smt3’s N-terminus can be deleted without detectable phenotype, both in the results as well as in “Limitations of the study”.

      Line 367: I feel this is overstated and I do not see any evidence that post translation modifications of the SUMO core plays a role. Therefore, I suggest: Our data and modeling are consistent with an interpretation that the N-termini of human and C. elegans SUMO1 proteins are inhibitory and that other SUMO N-termini may acquire such a function upon posttranslational modification of the N-terminus.

      We agree that this is pure speculation and therefore restrict our hypothesis to modifications of the N-terminus.

      Line 374 ff: Since Smo-∆N12 increases sumoylation (Fig. 2I), it is likely that the in vivo defect is due to over-sumoylation in C. elegans. The authors should discuss this possibility and quote appropriate literature e.g.: Rytinki et al., Overexpression of SUMO perturbs the growth and development of Caenorhabditis elegans. Cell Mol Life Sci. 2011 Oct;68(19):3219-32. PMID: 21253676.

      In our study, we employ in vitro SUMOylation as a means to assess the SIM binding capability in an in-solution assay. For this, we use USP25 as a specific substrate known to depend on a SIM for its SUMOylation. We cannot exclude that modification of some specific substrates that depend on this same mechanism may also be upregulated in the Smo-1∆N12 worms. In vivo, however, the majority of SUMO substrates are not subject to SIM-dependent SUMOylation. We now added a control experiment showing that we observe neither significantly increased SUMO levels nor upregulated steady-state levels of SUMOylation in these worms (Supplemental figure 8).

      The phenotypes shown in the paper by Rytinki et al. do not resemble the smo-1∆N12 mutants. Rather, we observed a specific defect in the meiotic germ cells at the pachytene stage, causing increased apoptosis. Moreover, we show by western blot analysis that there is no global over-sumoylation occurring in smo-1∆N12 mutants (Fig. S8). Together, our data point to a germline-specific function of the SMO-1 N-terminus in maintaining genome stability (lines 510ff).

      Reviewer #2 (Recommendations For The Authors):

      Page2 - "Small Ubiquitin-related modifiers of the SUMO family regulate thousands of proteins in eukaryotic cells" - The authors could consider a more precise statement, e.g. that SUMO modifiers have been detected on thousands of proteins and their regulatory effect on many proteins have been demonstrated.

      To be a bit more precise, the sentence now reads: “Ubiquitin-related proteins of the SUMO family are reversibly attached to thousands of proteins”. The summary has a word limit, hence we did not expand further at this place.

      Page 4 - "Both events require SUMO-binding motifs (reviewed, e.g. in 7 ." - The end bracket is missing. Also, isn't it too strong a statement that paralogue specificity always requires a SIM? I don't know all the literature sufficiently well, but the authors could double-check if it is correct to say that paralogue-specific SUMOylation always depends on a SIM.

      Thank you, we added the missing bracket. We agree that it would not be correct to say that paralogue-specificity always depends on a SIM. One alternative example is Dpp9, which shows a clear preference for SUMO1 without owning a SIM. Instead, Dpp9 harbors an alternative SUMO-binding motif, the E67-interacting loop, with a strong paralogue-preference (Pilla et al., 2012). We never intended to imply that a SIM is required for paralogue preference and we also rather generically wrote “SUMO binding motif” instead of “SIM”. However, in the subsequent paragraph about SUMO binding motifs we only go into details of SIMs as one of three classes of SUMO binding motifs not even mentioning the alternative classes. To make this more obvious, we now list the two other known classes of SUMO binding motifs hoping that it will shed the correct light onto our previous statement about paralogue preference.

      Page 4 - In the nice discussion of different types of SIMs, the authors could consider mentioning also the special case of TDP2, which is used later by them as a model binding protein. This could provide an occasion to explain what the unusual "split SIM", mentioned on page 6, but not discussed, is, and what its relation to a normal SIM is. Also, it can perhaps be mentioned that TDP2 contacts SUMO2 not only through the two hydrophobic elements contiguous in space that mimic a SIM but also through a slightly larger interface around these regions on the surface of a folded domain.

      Thank you for pointing this out. In the introduction, we extended our section on SUMO binding and now also included TDP2’s “split SIM”.

      Page 11-12 - In the section "Interaction between SUMO's disordered N-termini and the SIM binding groove is highly dynamic" (and corresponding figures), it should be stated that the discussed kinetic parameters are derived from molecular dynamics simulations and not experimental measurements. It was not very clear to me. This also applies to this sentence on page 17: "First, we observed a very fast (ns) rate of the binding/unbinding process", which in its current form suggests direct observation rather than simulation.

      We thank the reviewer for pointing this out, and in fact, Rev #1 made the same comment. We specified now clearly that the rates were calculated from MD simulations, in the Results and Discussion sections (on page 11-12 and 18 (previously 17)).

      Page 16 - The authors could briefly mention that this relatively long disordered N-terminal tail is a specific feature of SUMO proteins that distinguishes them from ubiquitin. I guess it is obvious to people from the SUMO field, but I don't think it is explicitly stated anywhere in the text and it could be interesting for readers who are less familiar with SUMO/ubiquitin differences.

      Thank you, we added a short half-sentence pointing out this difference.

      Page 17 - "The N-terminal region remains fully disordered in the bound state and is thus a classic example of intrinsic disorder irrespective of the binding state." - it could be added to this sentence that this is suggested by molecular dynamics simulations and not directly observed.

      We added the information that this finding is based on the MD simulations.

      Page 18 - "(e.g., 41,53 or flanking the SIM binding groove24,42" - the end bracket is missing.

      Thanks, we added it.

      Page 19 - "Our analysis in C. elegans (Fig. 7) suggests that this N-terminal function is particularly important in DNA damage response, a pathway that is strongly dependent on the SUMO system." - this brief description of the in vivo data seems to overgeneralise them a little bit. Perhaps one can describe what was observed with slightly more nuance.

      See changes on p.19, lines 510ff.

    1. Author response

      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their thoughtful comments on our manuscript. We greatly appreciated the suggestions and recommendations that helped us to improve the study. With adaptations, and inclusion of novel data and analyses, we have addressed all points raised, and hope that by these improvements the study further meets the standards for eLife. 

      Reviewer #1 (Recommendations For The Authors):

      Minor text edits should be made.

      (1.1) As a recent study from the Wong lab also showed sebaceous gland regeneration following complete ablation (Veniaminova et al., 2023), this finding should be mentioned in the text, and the abstract ("Most strikingly...") should be toned down.

      We thank the reviewer for the positive feedback, and for highlighting this part of the study from the Wong lab. Although we cited this study in a different context, we had not discussed the sebaceous gland regeneration finding. We have now added this to the discussion section of the manuscript.

      (1.2) Introduction: In lines 31-33 discussing the connection of sebaceous glands with skin disorders, the 5 references cited seem to replicate the citations from a similar sentence in Veniaminova et al., 2019. The authors should vary their citations, as there are likely other publications that can be cited here.

      Additional references have been added.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well written and the data are well presented in the figures.

      We thank the reviewer for the positive feedback.

      (2.1) Here are some points that could be taken into consideration to improve the manuscript:

      - Row 75 "the primary" regulator could be changed to "a crucial".

      We appreciate this suggestion and have made the text edit.

      - Row 86 could be added: ...is the dominant ligand of the Notch signalling.

      We have made the text edit as suggested.

      (2.2) Row 107-109 from the quantification of Figure 1G and Figure 2 it seems that only the aJ2 treatment has an SG phenotype. Why aJ1 doesn't have any effect? (same is true in other figures). If the data on aJ1 are maintained in the manuscript, this should be argued in the discussion section.

      The reviewer is correct in noting that the aJ1 treatment does not cause the phenotype, and this is indeed one of the key findings of the study. This is maintained throughout the manuscript. We have also cited references showing that embryonic and adult deletions of Jag1 do not cause any sebaceous gland defects. All these data argue that Jag1 is not the relevant Notch signaling ligand in sebocyte differentiation. We have further clarified this in the manuscript.

      (2.3) Related to Figure 3G. As the Lrig1 stem cells can go towards both the sebocyte differentiation, or the sebaceous duct differentiation, it would be interesting to evaluate if the differentiation impairment caused by the antibody treatment affects in a similar manner (or not) the sebaceous duct differentiation. This could be tested through immunofluorescence, selecting markers of sebaceous duct.

      We thank the reviewer for this thoughtful question. We are unable to find any unique markers of the sebaceous ducts (that are not expressed in other parts of the sebaceous gland, especially sebocytes) in the literature, thus, any analysis of markers would be confounded by its change of expression due to the loss of sebocytes.

      However, we have evaluated the histology using bursting sebocytes releasing sebum as a proxy of a functional sebaceous duct. We have not found any significant differences between treatments using this metric (Fig. S1).

      (2.4) As the word "therapeutic" is often underlined in the manuscript, maybe a few sentences on the translational aspects of the results could be added to the discussion.

      We thank the reviewer for highlighting this point. We have added this to the discussion.

      (2.5) Figure 3 suggests that Jag2 is produced by basal sebocytes and used by these cells to induce sebocyte differentiation. I'm wondering if in an in vitro cell system (with a mixture of marked Jag2-expressing cells and marked Jag2-negative cells), it would be possible to understand if this mechanism of differentiation is a cell-autonomous mechanism or a mechanism based on cell competition (for instance, it would be possible that the progenitors compete for their niche on the basal layer by pushing neighbouring basal cells to differentiate presenting them Jag2).

      We thank the reviewer for the insightful suggestion. The mechanistic underpinning of how Notch signaling induces sebocyte differentiation is still unclear, and we find the reviewer’s suggestion very interesting. However, establishing an in vitro model that captures the aspects mentioned, would require a lot of optimization and validation. To help rapid dissemination of our findings we elected to keep this out of the manuscript, but we will certainly consider it for future studies.

      Reviewer #3 (Recommendations For The Authors):

      (3.1) The authors focussed on mouse back skin sebaceous glands to analyse the phenotype. Are the effects also reproducible in the sebaceous glands of the mouse ears and tail epidermis? If so, the data should be strengthened by quantifying the phenotype using tail epidermal whole mounts (Braun et al., 2003; Development, PMID: 12954714), ideally by co-staining sebaceous glands for differentiation markers (e.g. FASN, Adipophilin) or lipid deposits (e.g., Oil red O). Also, the authors need to clarify how many sebaceous glands were scored per mouse. If not, please provide a rationale explaining the location restriction.

      We thank the reviewer for pointing this out. Indeed, we have only incorporated data from the telogen dorsal skin of the animals. We have now more accurately reflected this in the revised manuscript. Additionally, we have added the number of sebaceous glands quantified in each figure per the reviewer’s suggestion.

      Since the stage of hair growth cycle can affect the sebaceous glands, we chose the resting (telogen) phase of the hair cycle to reliably study the sebaceous glands. At 8 weeks of age, hair follicles have uniformly entered the telogen phase. As subsequent re-entry into the anagen phase is asynchronous in the adult skin, the color of the dorsal skin of C57BL/6 mice can be used to determine whether the hair follicles are in the telogen phase or not. These reasons led us to choose this location, allowing us to study only telogen phase hair follicles.

      We also point out that previously reported data (Estrach et al., 2006) did not show differences between dorsal and tail skin, so we assume the mechanisms must largely be conserved. However, as the reviewer rightfully points out, we cannot be sure and have, therefore, indicated the dorsal location throughout the manuscript.

      (3.2) The micrographs in Figure 2 suggest that expression of both Jagged2 and Notch1 (intercellular domain) is not restricted to the sebaceous glands, as both molecules appear to be detected also in the isthmus and lower hair follicle. Of note, the online tool provided by the Kasper and Linnarsson labs (http://linnarssonlab.org/epidermis/) shows that both molecules are more widely expressed in mouse back skin. Please provide some analysis of the overall expression of these molecules in mouse skin. In line, is the observed effect of using the antagonising antibodies restricted to the sebaceous glands? Please provide additional data on proliferation and differentiation in the interfollicular epidermis, hair follicle cycling, and other skin compartments. For instance, the data published in the cited paper by Lafkas et al. (2015) suggest a thickening of the dermal adipocyte layer upon Jagged2 inhibition using monoclonal therapeutic antibodies.

      The reviewer is correct in noting that expression of both Jag2 and Notch1 is not restricted to the sebaceous gland. The Notch signaling pathway is a well-known regulator for epidermal differentiation, and members of the pathway are expressed in various locations of the skin, including the interfollicular epidermis and the hair follicle. The expression and function of Notch signaling in these locations has been reviewed in (Hsu et al., 2014; Nowell and Radtke, 2013; Watt et al., 2008). We have also added zoomed out images showing expression of Jag2 and Notch1 in the skin (Figure S2e,f).

      The effect of the antagonizing antibodies is not restricted to sebaceous glands, as we already noted in our discussion section: “While injections of the Notch blocking antibodies are systemic, we only observed a reduction in the number of Notch-active cells in the IFE, but not a complete loss.” The functional impact of the antibodies likely extends beyond the sebaceous gland, as the reviewer points out, but we consider understanding the full effect in other compartments to be beyond the scope of the current study.

      In our previous study (Lafkas et al., 2015), the skin was examined at different animal ages/gender and using different antibody dosing regimens, which is the likely explanation for the differences observed. We have now quantified the width of the adipocyte layer and the IFE and show that there are no significant differences between treatments (Figure S1g-j). This, together with the histology, suggests that there are no significant differences in the differentiation and proliferation of these compartments.

      (3.3) Since Jagged1 is a Wnt/beta-catenin target gene that is essential for (ectopic) hair follicle formation and differentiation (Estrach et al., 2006, Development, PMID: 17035290) and the sebaceous gland is widely considered as an epidermal compartment with absent/low Wnt/beta-catenin pathway activity during normal homeostasis (Lim & Nusse, 2013, Cold Spring Harbor Perspectives in Biology, PMID: 23209129), how is the expression of Notch1 and Jagged2 regulated upstream in sebocyte progenitors? It would be important to bring some more mechanistic insights into the upstream regulation of Notch activity. In line with comment 2, how are the compartment-specific effects molecularly regulated if the effects are not restricted to the sebaceous glands?

      The reviewer is correct in noting that the Wnt pathway does not seem to be a likely candidate for driving sebocyte differentiation through Notch signaling. Indeed, Wnt inhibition is required for sebocyte differentiation (Merrill et al., 2001; Niemann et al., 2002), and the Jag2 promoter region also does not contain TCF binding sites (Katoh and Katoh, 2006).

      We speculate that Myc might regulate Notch signaling in the sebaceous gland. It is expressed in the sebaceous gland basal stem cells and has been reported to positively regulate sebocyte differentiation (Cottle et al., 2013). In addition, studies have shown that Jag2 is a Myc target gene (Fiaschetti et al., 2014; Yustein et al., 2010). However, evaluating which upstream pathway potentially regulates Notch signaling, and resolving the regulatory network of sebocyte differentiation beyond the direct Notch ligands and receptors would require extensive in vivo modeling using KO and transgenic animals, which we consider to be beyond the scope of the current manuscript.

      References

      Cottle DL, Kretzschmar K, Schweiger PJ, Quist SR, Gollnick HP, Natsuga K, Aoyagi S, Watt FM. 2013. c-MYC-Induced Sebaceous Gland Differentiation Is Controlled by an Androgen Receptor/p53 Axis. Cell Rep 3:427–441. doi:10.1016/j.celrep.2013.01.013

      Estrach S, Ambler CA, Celso CLL, Hozumi K, Watt FM. 2006. Jagged 1 is a β-catenin target gene required for ectopic hair follicle formation in adult epidermis. Development 133:4427–4438. doi:10.1242/dev.02644

      Fiaschetti G, Schroeder C, Castelletti D, Arcaro A, Westermann F, Baumgartner M, Shalaby T, Grotzer MA. 2014. NOTCH ligands JAG1 and JAG2 as critical pro-survival factors in childhood medulloblastoma. Acta Neuropathol Commun 2:39. doi:10.1186/2051-5960-2-39

      Hsu Y-C, Li L, Fuchs E. 2014. Emerging interactions between skin stem cells and their niches. Nat Med 20:847–856. doi:10.1038/nm.3643

      Katoh Masuko, Katoh Masaru. 2006. Notch ligand, JAG1, is evolutionarily conserved target of canonical WNT signaling pathway in progenitor cells. Int J Mol Med. doi:10.3892/ijmm.17.4.681

      Lafkas D, Shelton A, Chiu C, Boenig G de L, Chen Y, Stawicki SS, Siltanen C, Reichelt M, Zhou M, Wu X, Eastham-Anderson J, Moore H, Roose-Girma M, Chinn Y, Hang JQ, Warming S, Egen J, Lee WP, Austin C, Wu Y, Payandeh J, Lowe JB, Siebel CW. 2015. Therapeutic antibodies reveal Notch control of transdifferentiation in the adult lung. Nature 528:127–131. doi:10.1038/nature15715

      Merrill BJ, Gat U, DasGupta R, Fuchs E. 2001. Tcf3 and Lef1 regulate lineage differentiation of multipotent stem cells in skin. Genes Dev 15:1688–1705. doi:10.1101/gad.891401

      Niemann C, Owens DM, Hülsken J, Birchmeier W, Watt FM. 2002. Expression of ΔNLef1 in mouse epidermis results in differentiation of hair follicles into squamous epidermal cysts and formation of skin tumours. Development 129:95–109. doi:10.1242/dev.129.1.95

      Nowell C, Radtke F. 2013. Cutaneous Notch Signaling in Health and Disease. Cold Spring Harb Perspect Med 3:a017772. doi:10.1101/cshperspect.a017772

      Watt FM, Estrach S, Ambler CA. 2008. Epidermal Notch signalling: differentiation, cancer and adhesion. Curr Opin Cell Biol 20:171–179. doi:10.1016/j.ceb.2008.01.010

      Yustein JT, Liu Y-C, Gao P, Jie C, Le A, Vuica-Ross M, Chng WJ, Eberhart CG, Bergsagel PL, Dang CV. 2010. Induction of ectopic Myc target gene JAG2 augments hypoxic growth and tumorigenesis in a human B-cell model. Proc Natl Acad Sci 107:3534–3539. doi:10.1073/pnas.0901230107

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important research uses an elegant combination of protein-protein biochemistry, genetics, and microscopy to demonstrate that the novel bacterial protein FipA is required for polar flagella synthesis and binds to FlhF in multiple bacterial species. This manuscript is convincing, providing evidence for the early stages of flagellar synthesis at a cell pole; however, the protein biochemistry is incomplete and would benefit from additional rigorous experiments. This paper could be of significant interest to microbiologists studying bacterial motility, appendages, and cellular biology.

      We are very grateful for the very positive and helpful evaluation.

      Joint Public Review:

      Bacteria exhibit species-specific numbers and localization patterns of flagella. How specificity in number and pattern is achieved in Gamma-proteobacteria needs to be better understood but often depends on a soluble GTPase called FlhF. Here, the authors take an unbiased protein-pulldown approach with FlhF, resulting in identifying the protein FipA in V. parahaemolyticus. They convincingly demonstrate that FipA interacts genetically and biochemically with previously known spatial regulators HubP and FlhF. FipA is a membrane protein with a cytoplasmic DUF2802; it co-localizes to the flagellated pole with HubP and FlhF. The DUF2802 mediates the interaction between FipA and FlhF, and this interaction is required for FipA function. Altogether, the authors show that FipA likely facilitates the recruitment of FlhF to the membrane at the cell pole together with the known recruitment factor HubP. This finding is crucial in understanding the mechanism of polar localization. The authors show that FipA co-occurs with FlhF in the genomes of bacteria with polarly-localized flagella and study the role of FipA in three of these organisms: V. parahaemolyticus, S. putrefaciens, and P. putida. In each case, they show that FipA contributes to FlhF polar localization, flagellar assembly, flagellar patterning, and motility, though the details differ among the species. By comparing the role of FipA in polar flagellum assembly in three different species, they discover that, while FipA is required in all three systems, evolution has brought different nuances that open avenues for further discoveries.

      Strengths:

      The discovery of a novel factor for polar flagellum development. The solid nature and flow of the experimental work.

      The authors perform a comprehensive analysis of FipA, including phenotyping of mutants, protein localization, localization dependence, and domains of FipA necessary for each. Moreover, they perform a time-series analysis indicating that FipA localizes to the cell pole likely before, or at least coincident with, flagellar assembly. They also show that the role of FipA appears to differ between organisms in detail, but the overarching idea that it is a flagellar assembly/localization factor remains convincing.

      The work is well-executed, relying on bacterial genetics, cell biology, and protein interaction studies. The analysis is deep, beginning with discovering a new and conserved factor, then the molecular dissection of the protein, and finally, probing localization and interaction determinants. Finally, the authors show that these determinants are important for function; they perform these studies in parallel in three model systems.

      Weaknesses:

      The comparative analysis in the different organisms was, on balance, a weakness. Mixing the data for the organisms together made the text difficult to read and took away key points from the results. The individual details crowded out the model in its current form. Indeed, because some of the phenotypes and localization dependencies differ between model systems, the comparison is challenging to the reader. The authors could more clearly state what these differences mean, why they arise, and (in the discussion) how they might relate to the organism's lifestyle.

      More experiments would be needed to fully analyze the effects of interacting proteins on individual protein stability; this absence slightly detracted from the conclusions.

      We have tried our best to improve the manuscript according to the insightful suggestions of the reviewers. Please find our answers to the raised issues below.

      Reviewer #1 (Recommendations For The Authors):

      We are very grateful to this reviewer for the very positive evaluation and the great suggestions to improve the manuscript.

      I think there is value to the comparative analysis but how to present it in such a way that the key similarities and differences stand out is the challenge. Perhaps a table that compares the three datasets is sufficient. Or tell the story of V. parahaemolyticus first to establish the model, followed by comparative analysis of the other two organisms highlighting differences and relegating similarities to supplemental?

      We agree that our previous presentation of the comparative analysis made it very hard to follow the major findings and the general role(s) of FipA, and we are very grateful for the suggestions on how to improve this. We have decided to change the presentation as the reviewer recommended. We used V. parahaemolyticus as a 'lead model' to describe the role of FipA, and we then compared the major findings to the other two species. We hope that the story is now easier to follow.

      This is not something that needs to be addressed in the text but I wanted to bring the protein SwrB to the authors' attention which may further expand FipA relevance. Bacillus subtilis uses FlhFG to somehow pattern flagella in a peritrichous arrangement and there are a number of striking similarities, in my opinion, between FipA and SwrB. The two proteins have very similar domain architecture/topology, both proteins promote flagellar assembly, and the genetic neighborhood/operon organization is uncannily similar. There are other more minor similarities dependent on the organism in this paper.

      Phillips, Kearns. 2021. Molecular and cell biological analysis of SwrB in Bacillus subtilis. J Bacteriol 203:e0022721

      Phillips, Kearns. 2015. Functional activation of the flagellar type III secretion export apparatus. PLoS Genet 11:e1005443.

      We thank this reviewer for pointing out these intriguing similarities. For this study we have decided to exclusively concentrate on polarly flagellated bacteria. FlhF and FlhG are also present in B. subtilis where they play a role in organizing flagellation, but we feel that this would be out of scope for this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      We would like to thank this reviewer for the very positive evaluation and for pointing out several issues to strengthen the story.

      Figure 3A data are problematic since everything is too small to visualize. Since these are functional GFP fusions (or mCherry for 2E data), why are they not presented in color?

      Again - why are color figures not used to help the reader in Fig 4A and 5F & 5G to confirm what is asserted?

      Again, it is difficult to see the images presented. It is asserted that FipA is recruited to the cell pole after cell division and before flagellum assembly, but one has to take their word for it.

      We fully agree that in some cases the localization pattern is hard to see on the micrographs presented. We have, therefore, provided enlarged micrographs in the supplemental part, in which the fluorescent foci within the cells are easier to see. With respect to presentation in color: we found that this did not improve the visibility of localizations and therefore have decided to use the grayscale images.

      Here, what is missing are turnover assays. Do FipA, FlhF, and HubP all co-localize as complex or is the absence of one leading to the protein turnover of other partners? I think this needs to be sorted out before final conclusions can be made.

      Thanks for raising this important point. We have now provided western blot analysis, which demonstrates that FipA and FlhF are produced and stable in the absence of the other partners (see Supplemental Figure 5). The stability of HubP, a general polar marker that is not only required for flagellation, was not determined.

      Minor comments:

      Line 58: change "around" to "in timing with"

      Line 79: what "signal" is transferred from the C-ring to the MS-ring. Are they not fully connected such that rotation is the entire structure - C-ring-MS-ring-Rod-Hook-Filament. Is it not the change in the relationship to the stator complex where the signal is transferred?

      Line 85: change "counting" to "control of flagellar numbers per cell"

      Line 110: change "is (co-)responsible for recruiting" to "facilitates recruitment of"

      Thanks for pointing this out. We have adjusted the wording according to the reviewer’s suggestions.

      Given that motility phenotypes vary on individual plates (volumes and dryness vary), why in Figure 2C are the motility assays for fipA and flhF mutants of P. putida done on different plates?

      For better visualisation, we have rearranged the spreading halos for the figure. All strain spreading comparisons on soft agar were always conducted on the same plate due to the reasons this reviewer mentioned.

      Reviewer #3 (Recommendations For The Authors):

      We thank this reviewer for the very positive evaluation and the great suggestions.

      One possibility is to describe first all the results relating to FipA in Vibrio and then add the result sections at the end to illustrate the differences between Vibrio and Shewanella, and then Vibrio and Pseudomonas. This may make it easier to follow for the reader.

      We agree that our previous presentation of the comparative analysis made it very hard to follow the major findings and the general role(s) of FipA, and we are very grateful for the suggestions on how to improve this. We have decided to change the presentation as the reviewer recommended. We used V. parahaemolyticus as a 'lead model' to describe the role of FipA, and we then compared the major findings to the other two species. We hope that the story is now easier to follow.

      I would have liked to see some TEM analysis of flagella in fipA/hubP double mutants strains and was also wondering if FipA/FlhF/HubP colocalization had been studied in E. coli when all proteins are expressed together, at least with two bearing fluorescent tags.

      Thanks for these great suggestions. In this study, we have concentrated on the localization of FlhF by FipA and HubP. HubP has multiple functions in the cell and may also affect flagellar synthesis to some extent in a species-specific fashion. Therefore, any findings would have to be discussed very carefully, so we have decided to leave that out for the time being.

      With respect to FipA/HubP/FlhF production in a heterologous host such as E. coli, this has been partly done (without FipA) in a second, parallel story (see the reference to Dornes et al. (2024) in this manuscript). Rebuilding larger parts of the system in a heterologous host is currently being done in an independent study. Therefore, we have decided not to include it here.

      From the Reviewing Editor:

      We are grateful to the Reviewing Editor for handling a fair review process, for the positive evaluation, and for the helpful hints.

      The microscopy was inconsistent (DIC versus phase) for unclear reasons. Did using different microscopes impact the ability to acquire low-intensity fluorescence signals? Please add a sentence in the Methods section to clarify.

      We are sorry for this inconsistency. As the imaging was carried out by different labs (in part before the projects were joined), each lab used its preferred microscopy settings. We have added an explanatory sentence to the Methods section.

      Also, some subcellular fluorescence localizations were not visible in the selected images (e.g., Figures 3 and 5). The reader had to rely on the authors' statements and analyses. The conclusions could be more robust with fluorescence measurements across the cell body for a subset of cells. The authors could provide this data analysis in the Supplemental; this measurement would more clearly show an accumulation of fluorescence at the cell pole, particularly in low-intensity images.

      We fully agree that in some cases the localization pattern is hard to see in the micrographs presented. Unfortunately, the signal is often not strong enough to provide proper demographs. We have therefore provided enlarged micrographs in the supplement, in which the fluorescent foci within the cells are easier to see.

    1. Author response:

      We sincerely thank the reviewers for their thoughtful, critical, and constructive comments, which will help us further explore, in future studies, the mechanisms by which LDH regulates glycolysis, the tricarboxylic acid cycle, and oxidative phosphorylation. The following are our responses to the reviewers' comments.

      Reviewer #1 (Public Review):

      Summary:

      Zeng et al. have investigated the impact of inhibiting lactate dehydrogenase (LDH) on glycolysis and the tricarboxylic acid cycle. LDH is the terminal enzyme of aerobic glycolysis or fermentation that converts pyruvate and NADH to lactate and NAD+ and is essential for the fermentation pathway as it recycles NAD+ needed by upstream glyceraldehyde-3-phosphate dehydrogenase. As the authors point out in the introduction, multiple published reports have shown that inhibition of LDH in cancer cells typically leads to a switch from fermentative ATP production to respiratory ATP production (i.e., glucose uptake and lactate secretion are decreased, and oxygen consumption is increased). The presumed logic of this metabolic rearrangement is that when glycolytic ATP production is inhibited due to LDH inhibition, the cell switches to producing more ATP using respiration. This observation is similar to the well-established Crabtree and Pasteur effects, where cells switch between fermentation and respiration due to the availability of glucose and oxygen. Unexpectedly, the authors observed that inhibition of LDH led to inhibition of respiration and not activation as previously observed. The authors perform rigorous measurements of glycolysis and TCA cycle activity, demonstrating that under their experimental conditions, respiration is indeed inhibited. Given the large body of work reporting the opposite result, it is difficult to reconcile the reasons for the discrepancy. In this reviewer's opinion, a reason for the discrepancy may be that the authors performed their measurements 6 hours after inhibiting LDH. Six hours is a very long time for assessing the direct impact of a perturbation on metabolic pathway activity, which is regulated on a timescale of seconds to minutes. The observed effects are likely the result of a combination of many downstream responses that happen within 6 hours of inhibiting LDH that causes a large decrease in ATP production, inhibition of cell proliferation, and likely a range of stress responses, including gene expression changes.

      Strengths:

      The regulation of metabolic pathways is incompletely understood, and more research is needed, such as the one conducted here. The authors performed an impressive set of measurements of metabolite levels in response to inhibition of LDH using a combination of rigorous approaches.

      Weaknesses:

      Glycolysis, TCA cycle, and respiration are regulated on a timescale of seconds to minutes. The main weakness of this study is the long drug treatment time of 6 hours, which was chosen for all the experiments. In this reviewer's opinion, if the goal was to investigate the direct impact of LDH inhibition on glycolysis and the TCA cycle, most of the experiments should have been performed immediately after or within minutes of LDH inhibition. After 6 hours of inhibiting LDH and ATP production, cells undergo a whole range of responses, and most of the observed effects are likely indirect due to the many downstream effects of LDH and ATP production inhibition, such as decreased cell proliferation, decreased energy demand, activation of stress response pathways, etc.

      We appreciate the reviewer's critical comments. The main question is whether the inhibition of LDH induces a transient perturbation in glycolysis, the TCA cycle, and OXPHOS, or whether it leads to a shift to a new steady state. We argue that this shift represents a transition between two steady states; specifically, GNE-140 treatment drives metabolism from one steady state to another.

      Before conducting the experiment, we performed a time course experiment, measuring glucose consumption and lactate production in cells treated with GNE-140. Both quantities increased linearly with time, indicating that the glycolytic rate remained constant and that glycolysis was therefore at steady state. Given the tight coupling between glycolysis, the TCA cycle, and OXPHOS, we infer that the TCA cycle and OXPHOS were also at steady state. However, this inference requires further confirmation.
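
      As a minimal sketch of the linearity check described above, cumulative glucose consumption can be regressed on time; a constant slope with a high R² is consistent with a constant glycolytic rate. The time points and values below are hypothetical placeholders, not data from the study, and `linear_fit` is an illustrative helper rather than the authors' analysis code.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical time course: hours after adding the inhibitor and
# cumulative glucose consumed (arbitrary units).
hours = [0, 1, 2, 3, 4, 5, 6]
glucose_consumed = [0.0, 1.1, 1.9, 3.0, 4.1, 4.9, 6.0]

slope, intercept, r2 = linear_fit(hours, glucose_consumed)
print(f"rate = {slope:.3f} units/h, R^2 = {r2:.4f}")
```

      A slope that drifts over the time course (for example, a clearly better fit with a quadratic term) would instead point to a rate that is still changing.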

      Multiple published reports have shown that LDH inhibition in cancer cells causes a shift from fermentative ATP production to respiratory ATP production. This notion persists because it is often compared to the well-established Crabtree and Pasteur effects, where cells toggle between fermentation and respiration based on glucose and oxygen availability. However, in the Pasteur or Crabtree effects, the deprivation of oxygen—the terminal electron acceptor—drives the switch, which is fundamentally different from LDH inhibition.

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al. investigated the role of LDH in determining the metabolic fate of pyruvate in HeLa and 4T1 cells. To do this, three broad perturbations were applied: knockout of two LDH isoforms (LDH-A and LDH-B), titration with a non-competitive LDH inhibitor (GNE-140), and exposure to either normoxic (21% O2) or hypoxic (1% O2) conditions. They show that knockout of either LDH isoform alone, though reducing both protein level and enzyme activity, has virtually no effect on either the incorporation of a stable 13C-label from a 13C6-glucose into any glycolytic or TCA cycle intermediate, nor on the measured intracellular concentrations of any glycolytic intermediate (Figure 2). The only apparent exception to this was the NADH/NAD+ ratio, measured as the ratio of F420/F480 emitted from a fluorescent tag (SoNar).

      The addition of a chemical inhibitor, on the other hand, did lead to changes in glycolytic flux, the concentrations of glycolytic intermediates, and in the NADH/NAD+ ratio (Figure 3). Notably, this was most evident in the LDH-B-knockout, in agreement with the increased sensitivity of LDH-A to GNE-140 (Figure 2). In the LDH-B-knockout, increasing concentrations of GNE-140 increased the NADH/NAD+ ratio, reduced glucose uptake, and lactate production, and led to an accumulation of glycolytic intermediates immediately upstream of GAPDH (GA3P, DHAP, and FBP) and a decrease in the product of GAPDH (3PG). They continue to show that this effect is even stronger in cells exposed to hypoxic conditions (Figure 4). They propose that a shift to thermodynamic unfavourability, initiated by an increased NADH/NAD+ ratio inhibiting GAPDH explains the cascade, calculating ΔG values that become progressively more endergonic at increasing inhibitor concentrations.

      Then - in two separate experiments - the authors track the incorporation of 13C into the intermediates of the TCA cycle from a 13C6-glucose and a 13C5-glutamine. They use the proportion of labelled intermediates as a proxy for how much pyruvate enters the TCA cycle (Figure 5). They conclude that the inhibition of LDH decreases fermentation, but also the TCA cycle and OXPHOS flux - and hence the flux of pyruvate to all of those pathways. Finally, they characterise the production of ATP from respiratory or fermentative routes, the concentration of a number of cofactors (ATP, ADP, AMP, NAD(P)H, NAD(P)+, and GSH/GSSG), the cell count, and cell viability under four conditions: with and without the highest inhibitor concentration, and at norm- and hypoxia. From this, they conclude that the inhibition of LDH inhibits the glycolysis, the TCA cycle, and OXPHOS simultaneously (Figure 7).

      Strengths:

      The authors present an impressively detailed set of measurements under a variety of conditions. It is clear that a huge effort was made to characterise the steady-state properties (metabolite concentrations, fluxes) as well as the partitioning of pyruvate between fermentation as opposed to the TCA cycle and OXPHOS.

      A couple of intermediary conclusions are well supported, with the hypothesis underlying the next measurement clearly following. For instance, the authors refer to literature reports that LDH activity is highly redundant in cancer cells (lines 108 - 144). They prove this point convincingly in Figure 1, showing that both the A- and B-isoforms of LDH can be knocked out without any noticeable changes in specific glucose consumption or lactate production flux, or, for that matter, in the rate at which any of the pathway intermediates are produced. Pyruvate incorporation into the TCA cycle and the oxygen consumption rate are also shown to be unaffected.

      They checked the specificity of the inhibitor and found good agreement between the inhibitory capacity of GNE-140 on the two isoforms of LDH and the glycolytic flux (lines 229 - 243). The authors also provide a logical interpretation of the first couple of consequences following LDH inhibition: an increased NADH/NAD+ ratio leading to the inhibition of GAPDH, causing upstream accumulations and downstream metabolite decreases (lines 348 - 355).

      Weaknesses:

      Despite the inarguable comprehensiveness of the data set, a number of conceptual shortcomings afflict the manuscript. First and foremost, reasoning is often not pursued to a logical conclusion. For instance, the accumulation of intermediates upstream of GAPDH is proffered as an explanation for the decreased flux through glycolysis. However, in Figure 3C it is clear that there is no accumulation of the intermediates upstream of PFK. It is unclear, therefore, how this traffic jam is propagated back to a decrease in glucose uptake. A possible explanation might lie with hexokinase and the decrease in ATP (and constant ADP) demonstrated in Figure 6B, but this link is not made.

      We appreciate the reviewer's critical comment. In Figure 3C, there is no accumulation of F6P or G6P, which are upstream of PFK1. This is because the PFK1-catalyzed reaction sets a significant thermodynamic barrier. Even with 30 μM GNE-140 treatment, the Gibbs free energy change of the PFK1-catalyzed reaction (∆G of PFK1) remains -9.455 kJ/mol (Figure 3D), indicating that the reaction is still far from thermodynamic equilibrium, which prevents the accumulation of F6P and G6P.

      We agree with the reviewer that hexokinase inhibition may play a role; this requires further investigation.
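
      The thermodynamic argument above rests on the standard relation ∆G = ∆G°' + RT ln Q, where Q is the ratio of product to substrate concentrations. The sketch below evaluates this relation for a PFK1-like reaction (F6P + ATP → FBP + ADP). The standard free energy and all concentrations are illustrative placeholders, not values measured in the study, and `reaction_delta_g` is a hypothetical helper.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def reaction_delta_g(delta_g0_prime, products, substrates, temperature_k=310.15):
    """Return dG = dG0' + R*T*ln(Q) in kJ/mol.

    Q is the product of product concentrations divided by the product of
    substrate concentrations (all in mol/L).
    """
    q = math.prod(products) / math.prod(substrates)
    return delta_g0_prime + R_KJ * temperature_k * math.log(q)


# Placeholder inputs (assumptions for illustration only).
DG0_PRIME = -14.0            # kJ/mol, assumed standard free energy change
f6p, atp = 0.05e-3, 3.0e-3   # substrates, mol/L
fbp, adp = 0.10e-3, 0.5e-3   # products, mol/L

dg_low = reaction_delta_g(DG0_PRIME, [fbp, adp], [f6p, atp])
# A tenfold rise in FBP makes dG less negative by R*T*ln(10).
dg_high = reaction_delta_g(DG0_PRIME, [10 * fbp, adp], [f6p, atp])

print(f"dG at baseline FBP: {dg_low:.2f} kJ/mol")
print(f"dG at 10x FBP:      {dg_high:.2f} kJ/mol")
```

      As long as the computed ∆G stays clearly negative, the step remains far from equilibrium, which is the condition the authors invoke to explain why metabolites upstream of PFK1 do not accumulate.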

      The obvious link between the NADH/NAD+ ratio and pyruvate dehydrogenase (PDH) is also never addressed, a mechanism that might explain how the pyruvate incorporation into the TCA cycle is impaired by the inhibition of LDH (the observation with which they start their discussion, lines 511 - 514).

      We agree with the reviewer’s comment. In this study, we did not explore how the inhibition of LDH affects pyruvate incorporation into the TCA cycle. As this mechanism was not investigated, we have titled the study: "Elucidating the Kinetic and Thermodynamic Insights into the Regulation of Glycolysis by Lactate Dehydrogenase and Its Impact on the Tricarboxylic Acid Cycle and Oxidative Phosphorylation in Cancer Cells."

      It was furthermore puzzling how the ΔG, calculated with intracellular metabolite concentrations (Figures 3 and 4) could be endergonic (positive) for PGAM at all conditions (also normoxic and without inhibitor). This would mean that under the conditions assayed, glycolysis would never flow completely forward. How any lactate or pyruvate is produced from glucose, is then unexplained.

      This issue also concerned us during the study. However, given the high reproducibility of the data, we consider the result genuine, although it requires explanation.

      The PGAM-catalyzed reaction is tightly linked to both upstream and downstream reactions in the glycolytic pathway. In glycolysis, three key reactions catalyzed by HK2, PFK1, and PK are highly exergonic, providing the driving force for the conversion of glucose to pyruvate. The other reactions, including the one catalyzed by PGAM, operate near thermodynamic equilibrium and primarily serve to equilibrate glycolytic intermediates rather than control the overall direction of glycolysis, as previously described by us (J Biol Chem. 2024 Aug 8;300(9):107648).

      The endergonic nature of the PGAM-catalyzed reaction does not prevent it from proceeding in the forward direction. Instead, the directionality of the pathway is dictated by the exergonic reaction of PFK1 upstream, which pushes the flux forward, and by PK downstream, which pulls the flux through the pathway. The combined effects of PFK1 and PK may account for the observed endergonic state of the PGAM reaction.

      However, if the PGAM-catalyzed reaction were isolated from the glycolytic pathway, it would tend toward equilibrium and never surpass it, as there would be no driving force to move the reaction forward.

      Finally, the interpretation of the label incorporation data is rather unconvincing. The authors observe an increasing labelled fraction of TCA cycle intermediates as a function of increasing inhibitor concentration. Strangely, they conclude that less labelled pyruvate enters the TCA cycle while simultaneously less labelled intermediates exit the TCA cycle pool, leading to increased labelling of this pool. The reasoning that they present for this (decreased m2 fraction as a function of GNE-140 concentration) is by no means a consistent or striking feature of their titration data and comes across as rather unconvincing. Yet they treat this anomaly as resolved in the discussion that follows.

      GNE-140 treatment increased the labeling of TCA cycle intermediates by [13C6]glucose but decreased the OXPHOS rate. We consider these conflicting results an 'anomaly' that warrants further explanation. To address this, we analyzed the labeling pattern of TCA cycle intermediates using both [13C6]glucose and [13C5]glutamine. Tracing the incorporation of glucose- and glutamine-derived carbons into the TCA cycle suggests that LDH inhibition leads to a reduced flux of glucose-derived acetyl-CoA into the TCA cycle, coupled with a decreased flux of glutamine-derived α-KG and a reduction in the efflux of intermediates from the cycle. These results align with theoretical predictions. Under any condition, the reactions that distribute TCA cycle intermediates to other pathways must be balanced by those that replenish them. In the GNE-140 treatment group, the entry of glutamine-derived carbon into the TCA cycle was reduced, implying that glucose-derived carbon (as acetyl-CoA) entering the TCA cycle must also be reduced, or vice versa.

      This step-by-step investigation is detailed under the subheading "The Effect of LDHB KO and GNE-140 on the Contribution of Glucose Carbon to the TCA Cycle and OXPHOS" in the Results section in the manuscript.

      In the Discussion, we emphasize that caution should be exercised when interpreting isotope tracing data. In this study, treatment of cells with GNE-140 led to an increased labeling percentage of TCAC intermediates by [13C6]glucose (Figure 5A-E). However, this does not necessarily imply an increase in glucose carbon flux into TCAC; rather, it indicates a reduction in both the flux of glucose carbon into TCAC and the flux of intermediates leaving TCAC. When interpreting the data, multiple factors must be considered, including the carbon-13 labeling pattern of the intermediates (m1, m2, m3, ...) (Figure 5G-K), replenishment of intermediates by glutamine (Figure 5M-V), and mitochondrial oxygen consumption rate (Figure 5W). All these factors should be taken into account to derive a proper interpretation of the data.
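
      The point that a higher labeled fraction need not mean a higher labeled flux can be illustrated with a toy steady-state model. This is a deliberately simplified sketch, not the authors' analysis: it assumes a single well-mixed pool fed by one fully labeled source (glucose-derived carbon) and one unlabeled source (for example glutamine-derived carbon), with efflux balancing influx. All flux values are hypothetical and in arbitrary units.

```python
def steady_state_labeled_fraction(labeled_influx, unlabeled_influx):
    """Labeled fraction of a well-mixed pool at steady state.

    At steady state total efflux equals total influx, so the labeled
    fraction depends only on the ratio of the two influxes, not on
    their absolute size.
    """
    total_influx = labeled_influx + unlabeled_influx
    if total_influx <= 0:
        raise ValueError("total influx must be positive")
    return labeled_influx / total_influx


# Hypothetical fluxes (arbitrary units, not values from the study).
control = {"glucose_derived": 10.0, "other_sources": 30.0}
treated = {"glucose_derived": 6.0, "other_sources": 9.0}

control_fraction = steady_state_labeled_fraction(
    control["glucose_derived"], control["other_sources"]
)
treated_fraction = steady_state_labeled_fraction(
    treated["glucose_derived"], treated["other_sources"]
)

print(f"control: labeled influx {control['glucose_derived']}, fraction {control_fraction:.2f}")
print(f"treated: labeled influx {treated['glucose_derived']}, fraction {treated_fraction:.2f}")
```

      In this toy case the labeled influx falls from 10 to 6 while the labeled fraction of the pool rises, because the unlabeled influx falls proportionally more. Absolute fluxes therefore cannot be read from fractional labeling alone.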

      Reviewer #3 (Public Review):

      Hu et al in their manuscript attempt to interrogate the interplay between glycolysis, TCA activity, and OXPHOS using LDHA/B knockouts as well as LDH-specific inhibitors. Before I discuss the specifics, I have a few issues with the overall manuscript. First of all, based on numerous previous studies it is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle (studies with PDKs inhibitors) leads to upregulation of TCA cycle activity, and OXPHOS, activation of glutaminolysis, etc (in this work authors claim that lowered glycolysis leads to lower levels of TCA activity/OXPHOS). The authors in the current work completely ignore recent studies that suggest that lactate itself is an important signaling metabolite that can modulate metabolism (actual mechanistic insights were recently presented by at least two groups (Thompson, Chouchani labs)). In addition, extensive effort was dedicated to understanding the crosstalk between glycolysis/TCA cycle/OXPHOS using metabolic models (Titov, Rabinowitz labs). I have several comments on how experiments were performed. In the Methods section, it is stated that both HeLa and 4T1 cells were grown in RPMI-1640 medium with regular serum - but under these conditions, pyruvate is certainly present in the medium - this can easily complicate/invalidate some findings presented in this manuscript. In LDH enzymatic assays as described with cell homogenates controls were not explained or presented (a lot of enzymes in the homogenate can react with NADH!). One of the major issues I have is that glycolytic intermediates were measured in multiple enzyme-coupled assays. Although one might think it is a good approach to have quantitative numbers for each metabolite, the way it was done is that cell homogenates (potentially with still traces of activity of multiple glycolytic enzymes) were incubated with various combinations of the SAME enzymes and substrates they were supposed to measure as a part of the enzyme-based cycling reaction. I would prefer to see a comparison between numbers obtained in enzyme-based assays with GC-MS/LC-MS experiments (using calibration curves for respective metabolites, of course). Correct measurements of these metabolites are crucial especially when thermodynamic parameters for respective reactions are calculated. Concentrations of multiple graphs (Figure 1g etc.) are in "mM", I do not think that this is correct.

      While the roles of lactate as a signaling metabolite and metabolic models are important areas of research, our work focuses on different aspects.

      It is true that cell homogenates contain many enzymes that use NAD as a hydride acceptor or NADH as a hydride donor. However, in our assay system, the substrates are pyruvate and NADH, meaning only enzymes that catalyze the conversion of pyruvate + NADH to NAD + lactate can utilize NADH. Other enzymes do not interfere with this reaction. Although some enzymes may also catalyze this reaction, their catalytic efficiency is markedly lower than that of LDH, ensuring the validity of this assay.

      Similarly, the assays for glycolytic intermediates are validated by the substrate specificity.

      We have developed an LC-MS methodology for some glycolytic intermediates, but the accuracy of quantification remains unsatisfactory due to inherent limitations of this methodology.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin, and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature. Clarification of the effect of 1) ROS depression on ATP levels and 2) ADP vs. ATP on divalent metal chelation would strengthen the paper, as would discussion of points of difference with the existing literature. The authors might also consider removing Figures 9 and 10A-B as they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. Finally, statistics need some attention.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      We sincerely appreciate the reviewer's encouraging feedback.

      Weaknesses:

      The authors have shown that treatments that depress ATP do not necessarily repress ROS, and therefore conclude that ATP is the primary cause of norfloxacin and streptomycin lethality for M. smegmatis. Indeed, this is the most impactful claim of the paper. However, GSH and dipyridyl beautifully rescue viability. Do these and other ROS-repressing treatments impact ATP levels? If not, the authors should consider a more nuanced model and revise the title, abstract, and text accordingly.

      We thank the reviewer for asking this question. In the revised version of the manuscript, we will include data on the impact of the antioxidant GSH on ATP levels.

      Does ADP chelate divalent metal ions to the same extent as ATP? If so, it is difficult to understand how conversion of ADP to ATP by ATP synthase would alter metal sequestration without concomitant burst in ADP levels.

      We sincerely thank the reviewer for raising this insightful question. Indeed, ADP and AMP can also form complexes with divalent metal ions; however, these complexes tend to be less stable. According to the existing literature, ATP-metal ion complexes exhibit a higher formation constant than ADP or AMP complexes. This has been attributed to the polyphosphate chain of ATP, which acts as an active site, forming a highly stable tridentate structure (Khan et al., 1962; Distefano et al., 1953). An antibiotic-induced increase in ATP levels, irrespective of any changes in ADP levels, could still result in the formation of more stable complexes with metal ions, potentially leading to metal ion depletion. Recent studies indicate that antibiotic treatment stimulates purine biosynthesis (Lobritz MA et al., 2022; Yang JH et al., 2019), thereby imposing energy demands and enhancing ATP production, so a corresponding increase in total purine nucleotide levels (ADP+ATP) is also possible, as mentioned in the Discussion section. However, this hypothesis requires further investigation.

      Khan MMT, Martell AE. Metal Chelates of Adenosine Triphosphate. Journal of Physical Chemistry. 1962 Jan 1;66(1):10–5.

      Distefano V, Neuman WF. Calcium complexes of adenosinetriphosphate and adenosinediphosphate and their significance in calcification in vitro. Journal of Biological Chemistry. 1953 Feb 1;200(2):759–63.

      Lobritz MA, Andrews IW, Braff D, Porter CBM, Gutierrez A, Furuta Y, et al. Increased energy demand from anabolic-catabolic processes drives β-lactam antibiotic lethality. Cell Chem Biol. 2022 Feb 17.

      Yang JH, Wright SN, Hamblin M, McCloskey D, Alcantar MA, Schrübbers L, et al. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell. 2019 May 30.
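
      The role of the formation constant in the chelation argument above can be made concrete with a simple 1:1 binding equilibrium, M + L ⇌ ML with K = [ML]/([M][L]). The sketch below solves for the free metal concentration given total metal and total ligand. The two formation constants and all concentrations are hypothetical placeholders chosen only to contrast a weaker and a stronger chelator; they are not literature values for ADP or ATP, and competition between several ligands is ignored.

```python
import math


def free_metal(total_metal, total_ligand, formation_constant):
    """Free metal concentration for 1:1 binding M + L <-> ML.

    Solves K * (Mt - x) * (Lt - x) = x for the complex concentration x
    (the physically meaningful, smaller root) and returns Mt - x.
    Concentrations in mol/L, K in L/mol.
    """
    k = formation_constant
    b = k * (total_metal + total_ligand) + 1.0
    disc = b * b - 4.0 * k * k * total_metal * total_ligand
    complex_conc = (b - math.sqrt(disc)) / (2.0 * k)
    return total_metal - complex_conc


TOTAL_METAL = 1.0e-3     # mol/L, hypothetical
TOTAL_LIGAND = 3.0e-3    # mol/L, hypothetical nucleotide pool
K_WEAKER = 1.0e3         # L/mol, hypothetical weaker chelator
K_STRONGER = 1.0e4       # L/mol, hypothetical stronger chelator

free_weak = free_metal(TOTAL_METAL, TOTAL_LIGAND, K_WEAKER)
free_strong = free_metal(TOTAL_METAL, TOTAL_LIGAND, K_STRONGER)

print(f"free metal with weaker chelator:   {free_weak:.2e} M")
print(f"free metal with stronger chelator: {free_strong:.2e} M")
```

      Under these assumptions, a ligand with a higher formation constant leaves less free metal at the same total concentrations, and raising the concentration of that ligand lowers the free metal further.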

      Some of the results in the paper diverge from what has been previously reported by some of the referenced literature. These discrepancies should be clarified.

      We apologize for any confusion, but we are uncertain which specific discrepancies the reviewer is referring to. In the Discussion section, we have addressed and analysed our results within the broader context of the existing literature, regardless of whether our findings align with or differ from previous studies.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria.

      Strengths:

      This reviewer has not identified any significant strengths of the paper in its current form.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      If the authors had evidence to support the conclusion that ATP burst is the predominant driver of antibiotic lethality in mycobacteria then this paper would be highly significant. However, with the way the paper is written, it is impossible to make this conclusion.

      We have identified this new mechanism of antibiotic action in Mycobacterium smegmatis. We have also noted that whether, and to what extent, this mechanism holds in other organisms needs to be tested, as argued extensively in the Discussion section of the manuscript.

      In all of our experiments, we have drawn our inferences from CFU counts, because OD600nm is not a reliable measure.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalize these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data shows in general a lack of correlation, rather than strictly being uniform. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed. 

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data do not show any decrease in speed with increasing filament length, we stand by the argument that the data support the view that all (or most) cells in a filament are involved in force generation for motility. We will revise the manuscript to make this point, and our reasoning for assuming that multiple or most cells in a filament contribute to motility, clear.
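
      The scaling argument above can be sketched with a simple force balance. The sketch assumes an overdamped filament in which each active cell contributes the same thrust and each cell adds the same drag, so that speed equals total thrust divided by total drag. These are simplifying assumptions for illustration, `filament_speed` is a hypothetical helper, and the parameter values are arbitrary.

```python
def filament_speed(n_cells, n_active, force_per_cell=1.0, drag_per_cell=1.0):
    """Steady speed when total thrust balances total drag.

    Assumes drag grows linearly with the number of cells and every
    active cell generates the same propulsive force.
    """
    if not 0 < n_active <= n_cells:
        raise ValueError("need 0 < n_active <= n_cells")
    return (n_active * force_per_cell) / (n_cells * drag_per_cell)


lengths = [10, 50, 100, 200]  # filament lengths in cells (hypothetical)
all_active = [filament_speed(n, n) for n in lengths]
one_active = [filament_speed(n, 1) for n in lengths]

for n, v_all, v_one in zip(lengths, all_active, one_active):
    print(f"{n:4d} cells: all active v = {v_all:.3f}, one active v = {v_one:.3f}")
```

      With every cell active, the speed is independent of length; with a single active cell, it falls as 1/N. An observed lack of any decrease in speed with length is therefore more consistent with most cells contributing thrust, under the stated assumptions.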

      The computational model misses perhaps the most interesting aspect of the experimental results which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters are explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it is even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model results seem obvious in hindsight, the analysis of the model allows phrasing the question of cell-to-cell coordination, which has not been brought up previously when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, eg by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling, both experimentally and through modelling, would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Note that the buckling we report does not depend on the filament hitting an external object. It is a direct result of filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling.

      The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.
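
      The more minimal alternative described above can be illustrated with a short sketch. It treats each cell's direction of force generation as a spin of +1 or -1 and couples nearest neighbours along a 1D chain. The Metropolis update rule, the unit temperature, and all function and parameter names are assumptions made for illustration; this is not the implementation used in the study.

```python
import math
import random


def simulate_chain(n_cells, coupling, sweeps, seed=0):
    """Metropolis dynamics for an open 1D chain of +/-1 'force direction' spins.

    Energy is assumed to be E = -coupling * sum_i s_i * s_(i+1), at unit temperature.
    """
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_cells)]
    for _ in range(sweeps * n_cells):
        i = rng.randrange(n_cells)
        neighbours = 0
        if i > 0:
            neighbours += spins[i - 1]
        if i < n_cells - 1:
            neighbours += spins[i + 1]
        # Energy change if cell i reverses its force direction.
        delta_e = 2.0 * coupling * spins[i] * neighbours
        if delta_e <= 0 or rng.random() < math.exp(-delta_e):
            spins[i] = -spins[i]
    return spins


def neighbour_alignment(spins):
    """Mean of s_i * s_(i+1): 1 if all cells agree, about 0 if directions are random."""
    pairs = [a * b for a, b in zip(spins, spins[1:])]
    return sum(pairs) / len(pairs)


if __name__ == "__main__":
    for coupling in (0.0, 0.5, 1.0, 3.0):
        chain = simulate_chain(n_cells=200, coupling=coupling, sweeps=200, seed=1)
        print(coupling, round(neighbour_alignment(chain), 3))
```

      In a sketch of this kind, neighbouring cells align only when the coupling is large relative to the noise, and nothing in the update rule by itself produces reversals at the ends of tracks, which is consistent with the limitations noted above.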

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacterium [1], and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present valuable findings on how to determine the genetic architecture of extreme phenotype values by using data on sibling pairs. While the authors' derivations of the method are correct, the scenarios considered are incomplete, making it difficult to have confidence in the interpretation of the results as demonstrating the influence of de-novo or Mendelian (rare, penetrant-variant) architectures. The method nevertheless shows promise and will be of interest to researchers studying complex trait genetics. 

      A.1: We have now expanded our consideration of the scenarios and we have ensured that we do not over-interpret our results as being due to de novo or Mendelian architectures. Instead, we make clear that our statistical tests are powered to identify these architectures but that there are other potential causes of significant results (e.g. measurement error or uncontrolled environmental factors from heavy-tailed distributions), making follow-up validation studies necessary before underlying architectures can be confirmed. We consider this to be typical of observational research, in which significant results may indicate causal effects unless uncontrolled confounding factors explain the observed associations, requiring experimental/trial follow-up for validation. We believe that our tests are useful for providing initial inference, and that in some settings – e.g. prioritising samples for sequencing to identify rare variants – could be useful as an initial screening step to increase the efficacy of a planned analysis or study.

      Additionally, we have now developed “SibArc”, an openly available software tool that takes sibling trait data as input and estimates conditional sibling heritability across the trait distribution. Then, based on the theoretical framework developed and described in the paper, for each tail of the trait distribution it estimates effect sizes, generates P-values corresponding to our de novo and Mendelian tests, and performs a Kolmogorov-Smirnov test to identify general departures from our null model. SibArc also provides additional functionality in preliminary beta form, for example an iterative optimisation routine to infer the approximate relative degrees of polygenic, de novo, and Mendelian architectures prevailing in each trait tail. We have made this software tool, a Quick Start tutorial, and sample data available online on GitHub and are hosting these on a dedicated website: www.sibarc.net.

      Reviewer #1 (Public Review):

      This is a clever and well-done paper that should be published. The authors sought to craft a method, applicable to biobank-scale data but without necessarily using genotyping or sequencing, to detect the presence of de novo mutations and rare variants that stand out from the polygenic background of a given trait. Their method depends essentially on sibling pairs where one sibling is in an extreme tail of the phenotypic distribution and whether the other sibling's regression to the mean shows a systematic deviation from what is expected under a simple polygenic architecture. 

      Their method is successful in that it builds on a compelling intuition, rests on a rigorous derivation, and seems to show reasonable statistical power in the UK Biobank. (More biobanks of this size will probably become available in the near future.)  It is somewhat unsuccessful in that rejection of the null hypothesis does not necessarily point to the favored hypothesis of de novo or rare variants. The authors discuss the alternative possibility of rare environmental events of large effect. Maybe attention should be drawn to this in the abstract or the introduction of the paper. Nevertheless, since either of these possibilities is interesting, the method remains valuable. 

      A.2: We agree with the reviewer that we should have made it clearer that - while our statistical tests are powered to identify de novo and Mendelian architectures – significant findings from our tests could also be explained by rare environmental events of large effect (specifically by uncontrolled environmental factors with heavy-tailed distributions). We have now made this clear throughout the manuscript (see A.1).

      Moreover, we agree with the reviewer that whether deviations from expectations are due to de novo or rare variants, or to environmental factors, either possibility is interesting. For example, in either scenario, our results can highlight inaccuracy in PRS prediction of extreme trait values for certain traits, and also provide a relative measure across different traits of large effects impacting on the trait tails, irrespective of whether these are genetic or environmental. We now place more emphasis on this point throughout the manuscript.

      Reviewer #2 (Public Review):

      Souaiaia et al. attempt to use sibling phenotype data to infer aspects of genetic architecture affecting the extremes of the trait distribution. They do this by considering deviations from the expected joint distribution of siblings' phenotypes under the standard additive genetic model, which forms their null model. They ascribe excess similarity compared to the null as due to rare variants shared between siblings (which they term 'Mendelian') and excess dissimilarity as due to de-novo variants. While this is a nice idea, there can be many explanations for rejection of their null model, which clouds interpretation of Souaiaia et al.'s empirical results.

      A.3: We agree with the reviewer that we should have made clearer that there are other explanations for significant results from our tests, and we have now fully addressed this point (see A.1, A.2, A.4 and A.5 for more detail). In addition, we now elaborate on exactly what our null hypothesis is: not only that the expected joint distribution of siblings’ phenotypes is governed by the standard additive genetic model, but also that environmental effects are either controlled for or else have a combined effect that is approximately Gaussian. Furthermore, by selecting from the UK Biobank only those traits whose raw trait distribution most closely corresponds to a Gaussian distribution, we increase the probability that significant results from our tests are due to rare variants (shared or unshared among siblings).

      The authors present their method as detecting aspects of genetic architecture affecting the extremes of the trait distribution. However, I think it would be better to characterize the method as detecting whether siblings are more or less likely to be aggregated in the extremes of the phenotype distribution than would be predicted under a common variant, additive genetic model.

      A.4: As discussed above, we should have stated more clearly that significant results could be due to non-genetic factors; we have now addressed this.

      However, we do not think that it would be appropriate to characterise our tests as merely corresponding to over and under aggregation of siblings in the tails. Firstly, environmental factors should be controlled for as part of our testing, increasing the probability that significant results are due to genetic, and not environmental factors. Secondly, tests for identifying broad over and under aggregation of siblings in the tails should be designed differently and, accordingly, the tests that we have developed here would not be optimal to detect over/under aggregation of siblings in trait tails. Our test for inference of de novo variants, for example, exploits the fact that de novo alleles of large effect result in one sibling being extreme and all others being drawn from the background distribution, so that the mean of other siblings is relatively low – not merely that other siblings are less likely to be found in the tail. For more discussion on this issue in relation to one of reviewer 1’s points, see A.9.
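
      The distinction drawn here can be made concrete with a minimal sketch of the null expectation. It assumes standardised trait values, an additive polygenic model with random mating and no shared environment, so that the sibling correlation is h2 / 2; the function names are hypothetical and the paper's own equations are not reproduced here.

```python
from statistics import NormalDist


def conditional_sibling(s1, h2):
    """Mean and SD of a sibling's standardised trait value given the index sibling's value s1.

    Assumes the sibling pair is bivariate normal with correlation rho = h2 / 2.
    """
    rho = h2 / 2.0
    mean = rho * s1
    sd = (1.0 - rho ** 2) ** 0.5
    return mean, sd


def prob_sibling_in_upper_tail(s1, h2, threshold):
    """Null probability that the sibling also exceeds `threshold`, given s1."""
    mean, sd = conditional_sibling(s1, h2)
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


if __name__ == "__main__":
    # Index sibling 3 SD above the mean, heritability 0.5, tail defined as the top 1%.
    top_1_percent = NormalDist().inv_cdf(0.99)
    print(conditional_sibling(3.0, 0.5))
    print(prob_sibling_in_upper_tail(3.0, 0.5, top_1_percent))
```

      Under these assumptions, a de novo signature would appear as an observed sibling mean below the conditional mean, whereas a Mendelian signature would appear as more siblings in the tail than the conditional tail probability predicts.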

      Exactly how the rareness and penetrance of a genetic variant influence the conditional sibling phenotype distribution at the extremes is not made clear. The contrast between de-novo and 'Mendelian' architectures is somewhat odd since these are highly related phenomena: a 'Mendelian' architecture could be due to a de-novo variant of the previous generation. The fact that these two phenomena are surmised to give opposing signatures in the authors' statistical tests seems suboptimal to me: would it not be better to specify a parameter that characterizes the degree or sharing between siblings of rare factors of large effect? This could be related to the mixture components in the bimodal distribution displayed in Fig 1. In fact, won't the extremes of all phenotypes be influenced by all three types of variants (common, rare, de-novo) to greater or lesser degree? By framing the problem as a hypothesis testing problem, I think the authors are obscuring the fact that the extremes of real phenotypes likely reflect a mixture of causes: common, de-novo, and rare variants (and shared and non-shared environmental factors). 

      A.5: We absolutely recognise that there will typically be a complex and continuous mix of genetic architectures underlying complex traits in their tails, dictated by the 2-dimensional relationship between allele frequency and effect size. We did consider developing a fully Bayesian statistical framework to model this, but soon realised that doing this properly would require a substantial amount of model development, accounting for multiple factors in ways that would require a great deal of further investigation; for example, performing a range of complex simulations to investigate the effects of different selective pressures over time, different patterns of assortative mating, and effect size generating distributions. We are in the process of applying for funding for a multi-year project that will perform exactly these investigations as a step towards developing more sophisticated models of inference. In the meantime, we do believe that the simpler hypothesis-testing framework that we have developed here does have important value. Assuming that environmental factors are accounted for, or that any that are not accounted for have combined Gaussian effects, then our tests will indeed infer enrichments of de novo and ‘Mendelian’ rare alleles of large effect in the tails of complex traits. Results from these tests can also be compared within and across traits to compare the relative degree of such enrichments among traits. For some traits we observe significant results from both tests, and for other traits we observe highly significant results from one of our tests but not the other. Thus, while our tests do not provide a complete picture about the genetic architecture in the tails of complex traits, they do offer some intriguing initial insights into tail architecture, important given the enrichment of disease in trait tails.

      To better enable interpretation of the results of this method, a more comprehensive set of simulations is needed. Factors that may influence the conditional distribution of siblings' phenotypes beyond those considered include: non-normal distribution, assortative mating, shared environment, interactions between genetic and shared environmental factors, and genetic interactions. 

      A.6: As described above (see A.5) we do agree that a more comprehensive set of simulations is exactly what is needed to further extend this work. However, we believe that the tests that we have developed so far, which make some simplifying assumptions that we think would often hold in practice, are a useful start to what is an entirely novel approach to inferring genetic architecture from family trait-only (non-genetic) data. Our work could already be useful for method developers who may wish to extend our approach in ways that we may not think of. It could also be useful for applied scientists focusing on specific traits who will be able to gain initial, inference-level, insights by applying our tests to their data, while the results of applying our tests may even guide study design of rare variant mapping studies.

      In summary, I think this is a promising method that is revealing something interesting about extreme values of phenotypes. Determining exactly what is being revealed is going to take a lot more work, however. 

      A.7: We thank the reviewer for highlighting the promise in our approach and agree that it is revealing something interesting about complex traits. We also agree that it is going to take a lot more work to reveal exactly what that is for different traits. We plan to work on this ourselves, and hope that this paper will help other interested scientists to follow up on and extend this work as well.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      R.1.1: Why these particular traits (body fat, mean corpuscular haemoglobin, neuroticism, heel bone mineral density, monocyte count, sitting height)? 

      A.8: Traits were initially selected to cover a variety of traits (anthropometric, metabolic, personality, etc.) and to illustrate different examples of tail architecture. However, in response to a point from reviewer 2 (see A.17), we have now overhauled our quality control of traits to ensure that only traits closely matching Gaussian distributions are included. In total, 18 traits were selected, with detailed results presented in Appendix 4 and results corresponding to 6 of the traits presented in the main text (Figure 6) to show examples of different types of tail architecture.

      R.1.2: Why are there separate tests for de novo and Mendelian architectures? It seems that one could use either of the derived tests for both purposes, simply by switching to a two-sided test for each tail. My guess is that the score test of whether alpha is zero would be the more statistically powerful test. 

      A.9: The score test of whether alpha is zero has limited power to detect Mendelian architectures. This is because under Mendelian effects, half the siblings in a family have trait values reflecting the background distribution, such that the mean of sibling trait values is not so different from the polygenic expectation (i.e. alpha close to 0). The Mendelian score test that we developed is substantially more powerful because it evaluates co-occurrence of siblings in the tails, which is far higher under Mendelian architecture in the tail than under polygenic architecture.

      However, in order to test for general departures from our null model, including those due to non-Gaussian environmental factors, we now include results from performing a Kolmogorov-Smirnov test of difference from the expected distribution, and also provide this test as an option in our ‘SibArc’ software tool.
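
      A one-sample Kolmogorov-Smirnov comparison of this kind can be sketched as follows. The sketch standardises each sibling's value against the null conditional distribution (assuming a sibling correlation of h2 / 2 and standardised trait values) and computes the KS distance from the standard normal. The function names are hypothetical, no P-value is computed, and this is not the SibArc implementation.

```python
from statistics import NormalDist


def standardised_sibling_residuals(pairs, h2):
    """Residuals of sibling values after removing the null conditional mean and SD.

    `pairs` is an iterable of (index_sibling, other_sibling) standardised trait values.
    """
    rho = h2 / 2.0
    sd = (1.0 - rho ** 2) ** 0.5
    return [(s2 - rho * s1) / sd for s1, s2 in pairs]


def ks_statistic(sample, cdf):
    """Largest absolute gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare against the empirical CDF just after and just before x.
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance


if __name__ == "__main__":
    pairs = [(2.5, 0.4), (2.8, 1.1), (3.1, -0.2), (2.6, 0.9)]
    residuals = standardised_sibling_residuals(pairs, h2=0.5)
    print(ks_statistic(residuals, NormalDist().cdf))
```

      In practice a library routine such as `scipy.stats.kstest` would also return a P-value for the same statistic.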

      R.1.3: This method assumes that assortative mating is absent. I worry that sitting height might not be a good trait to analyze, since there is some assortative mating (~0.3) for height (e.g., Yengo et al., 2018). Perhaps this trait should not be included among those that are analyzed in this paper. Then again, it is possible that there is less assortative mating for sitting height than total height (i.e., leg length) (Jensen & Sinha, 1993). 

      A.10: It is true that our method assumes random mating. We note that while assortative mating increases sibling similarity relative to expectation, if it is stable across the trait distribution it will also bias heritability estimation upward, which is its likely impact in our framework. However, if assortative mating is more prevalent in the tails of the distribution, it can result in excess kurtosis, an impact that can increase false positive Mendelian tests and false negative de novo tests. Given that the trait distribution for Sitting Height has only moderate excess kurtosis (~0.4, see Fig 9, Appendix 4) and we inferred de novo architecture only for this trait, we feel that including it in the paper is appropriate.

      R.1.4: I wonder if it's possible to discuss the impact of non-additive genetic variance on the method. How does this affect the estimation of heritability, which calibrates the expectation for regression to the mean? Can non-additive genetic deviations explain a rejection of the null hypothesis of simple polygenicity? 

      A.11: Yes, the heritability estimation, which calibrates expectation for regression to the mean, assumes additivity of effects, as do the most popular estimators of heritability from GWAS data in the field: GCTA-GREML, LD Score regression and LDAK. Accordingly, non-additive genetic effects could result in rejection of the null hypothesis. We have highlighted this point in the Discussion. However, we also point out that current evidence suggests that the contribution of non-additive genetic effects to complex trait variation is relatively small (Hivert 2021) and that non-additive genetic effects that have a similar impact across the trait distribution should not be a problem for our approach (only those that have an increasing effect towards the tails would be).

      R.1.5: p.5: Maybe a more realistic way to simulate a genetic architecture is to draw the MAF from the distribution [MAF(1 - MAF)]^{-1} and then an effect of the minor allele from some mound-shaped distribution (e.g., mixture of normals). The absolute or squared effect of the minor allele should increase as the MAF decreases, and there have been some papers trying to estimate this relationship (e.g., Zeng et al., 2021). Maybe make the number of causal SNPs 10,000. I don't rate this as an urgent suggestion because my sense is that the method should be robust, making adequate even a fairly minimal simulation confirming its accuracy. 

      A.11: In separate work, we have performed a comprehensive simulation study using the forward-in-time population genetic simulator SLiM 3 (Haller and Messer, 2019), which generates genetic effects according to Gaussian and Gamma distributions and models different selective pressures on complex traits. We plan to publish this work shortly and also extend the simulations to family data, from which we will be able to test the performance of our methods here under a range of different scenarios of genetic variation generation, including a variety of relationships between allele frequency and effect sizes. We agree with the reviewer that at this point, however, our minimal simulation should be sufficient to confirm our tests’ general robustness and so we will perform further testing once we have extended our more sophisticated simulation study.
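
      For reference, the frequency distribution suggested in R.1.5 can be sampled in a few lines. The sketch below draws minor allele frequencies with density proportional to [MAF(1 - MAF)]^{-1} by inverse-CDF sampling, using the fact that the logit of the MAF is then uniform. The lower bound `min_maf` and the function name are assumptions for illustration, and the sketch is not part of the simulations reported here.

```python
import math
import random


def sample_maf(rng, min_maf=0.01):
    """Draw a MAF on [min_maf, 0.5) with density proportional to 1 / (p * (1 - p)).

    The integral of 1 / (p * (1 - p)) is logit(p), so logit(MAF) is uniform
    between logit(min_maf) and logit(0.5) = 0.
    """
    lower_logit = math.log(min_maf / (1.0 - min_maf))
    x = lower_logit * (1.0 - rng.random())  # uniform on [lower_logit, 0)
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    rng = random.Random(2024)
    draws = [sample_maf(rng) for _ in range(10)]
    print([round(p, 4) for p in draws])
```

      Most of the probability mass falls on low frequencies, so any effect-size model that assigns larger effects to rarer alleles would concentrate large effects among rare variants.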

      R.1.6: p.6: Step D seems to leave out a normalization of G to have unit variance. Also, the last part should say "the square of the correlation between the genetic liability and the trait is equal to the heritability." 

      A.12: Corrected – we thank the reviewer for spotting this.

      R.1.7: Figure 5: The power being adequate if roughly 1 of a 1000 index siblings with an extreme trait value owes their values to de novo mutations makes me think that there should be a discussion of the prior probability. The average person carries about 80 de novo mutations. How many of these are likely to affect, e.g., height? Zeng et al. (2021) gave estimates of mutational targets. Given that a mutation affects height, will its likely effect size be large enough to be detected with the method? Kemper et al. (2012) discussed this point in a perhaps useful way. 

      A.13: We find the work investigating mutational target sizes and generating effect sizes of different mutations (de novo or rare) to be extremely interesting and critical for understanding the causes of observed genetic variation. However, we think that this work is insufficiently progressed at this point to build on directly here for making more nuanced interpretation of our results. We are, however, exploring the impact of mutational target sizes, effect size distributions and selection effects, on the genetic architecture of complex traits via population genetic simulations (see A.11), and so we hope to be able to provide more in-depth interpretation of our results in the future.

      R.1.8: Figure 6: The number in the tables for Mendelian architecture are presumably observed and expected counts. But what about the numbers for de novo architecture? Those don't look like counts. Maybe they are conditional expectations of standardized trait values. Whatever the case may be, the caption should provide an explanation. 

      A.14: The observed and expected values for the de novo statistical test represent the expected and observed mean standardized trait values for siblings of individuals in the bottom and top 1% of the distribution. We have now made this clear in our updated figure.

      R.1.9: p. 16: Element (2,1) in the precision matrix after Equation 15 is missing a negative sign. 

      A.15: Corrected – we thank the reviewer for spotting this.

      R.1.10: p. 20: Shouldn't Equation 20 place an exponent of n on the factor outside of the exponential? 

      A.16: Corrected – we thank the reviewer for spotting this.

      Reviewer #2 (Recommendations For The Authors):

      R.2.1: The first concern that I have is that their statistical tests rely heavily on an assumption of bivariate normal distribution for sibling pair's phenotypes. Real phenotypes do not have such a distribution in general. The authors rely upon an inverse-normal transform when applying their method to real data. While the inverse-normal transform will ensure that the siblings' phenotypes have a marginal normal distribution, such a transform does not ensure that the joint distribution is bivariate normal. The authors should examine their procedure for simulated phenotypes with a non-normal distribution to see if their statistical tests remain properly calibrated. Related to this, I am concerned about applying an inverse normal transform to the neuroticism phenotype that contains only 13 unique values in UKB. How does the transform deal with tied values? Can we sensibly talk about extreme trait values for such a set of observations? 

      A.17: The reviewer is correct that a bivariate normal distribution for sibling pairs’ trait values does not necessarily hold, and only does so if the assumptions of our null model are met (polygenic effects, Gaussian environmental effects, random mating, etc.). We have now more clearly described the assumptions of our null model, and to increase the matching of our selected traits to those assumptions we have expanded our analyses and now present results on traits that are close to Gaussian. As part of this stricter quality control, only traits with more than 50 unique values are included, meaning that neuroticism is excluded in our final analysis. We also now note that performing an inverse normal transformation on the traits only increases the robustness of the tests to some of our modelling assumptions. In future work we plan to investigate how best to model the conditional sibling distribution under a variety of non-Gaussian environmental effects and different non-random patterns of mating.
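
      The issue with tied values raised in R.2.1 can be seen in a minimal sketch of a rank-based inverse normal transformation. The sketch assigns average ranks to ties and maps rank r to the normal quantile at (r - 0.5) / n; the tie-handling rule, the offset of 0.5, and the function name are assumptions for illustration and may differ from the transformation applied in our analyses.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0  # 1-based average rank of the block
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


if __name__ == "__main__":
    # A coarse score with only three distinct values yields only three transformed values.
    print(inverse_normal_transform([0, 0, 1, 1, 1, 2]))
```

      A trait with few unique values therefore remains discrete after transformation, which is one motivation for a minimum number of unique values in quality control.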

      R.2.2: The joint sibling phenotype distribution (Equation 4) can be derived by applying the formula for the conditional distribution of a multivariate Gaussian to the standard additive genetic model. The authors' derivation is unnecessarily complex. Furthermore, many of the formulae have been used in Shai Carmi's work on embryo screening, but this work is not cited. 

      A.18: We now state in the text that the conditional sibling distribution can also be derived from the joint trait distribution of related individuals, which we use in our extension to the 3-sibling scenario, and cite Shai Carmi’s work where this is used. The joint distribution is a more straightforward way to derive the conditional sibling distribution, but our derivation based on considering mid-parents is generalisable to cases where assumptions of random mating, Gaussian population trait distribution and no selection do not hold. We also think that our mid-parent based derivation will be more intuitive to many readers, leading to greater understanding and potential for extension. Therefore, overall we believe that its presentation is worthwhile and we have now elaborated on this in the Methods.

      R.2.3: Equation 8: this probability should be conditional on s1 

      A.19: Corrected – we thank the reviewer for spotting this.

      R.2.4: The empirical application to UKB data is lacking methodological details. Also, the number of siblings used is low compared to the number of available sibling pairs. Around 19k sibling pairs are available in the UKB white British subsample, but only 10k were used for height. Why? Also, why are extreme values excluded? Isn't this removing the signal the authors are looking to explain?

      A.20: We have now provided more methodological details throughout the Methods section, in particular in relation to the samples used and quality control performed. We remove individuals with extreme values because unusually low or high trait values are more likely than typical values to be due to measurement error (e.g. due to an imperfect measuring device, or storage/assaying). While this may result in some loss of power (albeit small, because few individuals have values more than 8 s.d. from the trait mean), we consider it worthwhile for the potential reduction in type I error. In performing our newly expanded analysis (described above), and accounting for the reviewer’s point here about sample size, we did find a bug in our pipeline that meant that we did not include as many sibling pairs as were available. We thank the reviewer for spotting this, since this contributed to our new analysis being substantially more powerful than the original (including up to ~17k sibling pairs depending on completeness of trait data).

      Benjamin C Haller, Phillip W Messer. SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. Molecular Biology and Evolution. 2019. 36(3): 632-637.

      SD Whiteman, SM McHale, A Soli. Theoretical Perspectives on Sibling Relationships. J Fam Theory Rev. 2011 Jun 1;3(2):124-139.

      Nicholas H Barton, Alison M Etheridge, and Amandine Véber. The infinitesimal model: Definition, derivation, and implications. Theoretical population biology, 118:50–73, 2017.

      Valentin Hivert et al. “Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals.” American journal of human genetics vol. 108,5 (2021)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate that it is possible to carry out eQTL experiments for the model eukaryote S. cerevisiae, in "one pot" preparations, by using single-cell sequencing technologies to simultaneously genotype and measure expression. This is a very appealing approach for investigators studying genetic variation in single-celled and other microbial systems, and will likely inspire similar approaches in non-microbial systems where comparable cell mixtures of genetically heterogeneous individuals could be achieved.

      Strengths:

      While eQTL experiments have been done for nearly two decades (the corresponding author's lab are pioneers in this field), this single-cell approach creates the possibility for new insights about cell biology that would be extremely challenging to infer using bulk sequencing approaches. The major motivating application shown here is to discover cell occupancy QTL, i.e. loci where genetic variation contributes to differences in the relative occupancy of different cell cycle stages. The authors dissect and validate one such cell cycle occupancy QTL, involving the gene GPA1, a G-protein subunit that plays a role in regulating the mating response MAPK pathway. They show that variation at GPA1 is associated with proportional differences in the fraction of cells in the G1 stage of the cell cycle. Furthermore, they show that this bias is associated with differences in mating efficiency.

      Weaknesses:

      While the experimental validation of the role of GPA1 variation is well done, the novel cell cycle occupancy QTL aspect of the study is somewhat underexploited. The cell occupancy QTLs that are mentioned all involve loci that the authors have identified in prior studies that involved the same yeast crosses used here. It would be interesting to know what new insights, besides the "usual suspects", the analysis reveals. For example, in Cross B there is another large effect cell occupancy QTL on Chr XI that affects the G1/S stage. What candidate genes and alleles are at this locus? And since cell cycle stages are not biologically independent (a delay in G1 could have a knock-on effect on the frequency of cells with that genotype in G1/S), it would seem important to consider the set of QTLs in concert.

      We thank the reviewer for this suggested clarification. We have modified the text to make it clear that cell cycle occupancy is a compositional phenotype. Like the reviewer, we also noticed the distal trans eQTL hotspot on Chr XI in Cross B, but we were not able to identify compelling candidate gene(s) or variant(s) despite extensive effort.

      Reviewer #2 (Public Review):

      Boocock and colleagues present an approach whereby eQTL analysis can be carried out by scRNA-Seq alone, in a one-pot-shot experiment, due to genotypes being able to be inferred from SNPs identified in RNA-Seq reads. This approach obviates the need to isolate individual spores, genotype them separately by low-coverage sequencing, and then perform RNA-Seq on each spore separately. This is a substantial advance and opens up the possibility to straightforwardly identify eQTLs over many conditions in a cost-efficient manner. Overall, I found the paper to be well-written and well-motivated, and have no issues with either the methodological/analytical approach (though eQTL analysis is not my expertise), or with the manuscript's conclusions.

      I do have several questions/comments.

      393 segregant experiment:

      For the experiment with the 393 previously genotyped segregants, did the authors examine whether averaging the expression by genotype for single cells gave expression profiles similar to the bulk RNA-Seq data generated from those genotypes? Also, is it possible (and maybe not, due to the asynchronous nature of the cell culture) to use the expression data to aid in genotyping for those cells whose genotypes are ambiguous? I presume it might be if one has a sufficient number of cells for each genotype, though, for the subsequent one-pot experiments, this is a moot point.

      As mentioned in our preliminary response, while it is possible to expand the analysis along these lines, this is not relevant for the subsequent one-pot experiments. We have made all the data available so that anyone interested can try these analyses.

      Figure 1B:

      Is UMAP necessary to observe an ellipse/circle - I wouldn't be surprised if a simple PCA would have sufficed, and given the current discussion about whether UMAP is ever appropriate for interpreting scRNA-Seq (or ancestry) data, it seems the PCA would be a preferable approach. I would expect that the periodic elements are contained in 2 of the first 3 principal components. Also, it would be nice if there were a supplementary figure similar to Figure 4 of Macosko et al (PMID 26000488) to indeed show the cell cycle dependent expression.

      We have added two new figures (S2 and S3) that represent alternative visualizations of the cell-cycle that are not dependent on UMAP. Figure S2 shows plots of different pairs of principal components, with each cell colored by its assigned cell-cycle stage. We do not observe a periodic pattern in the first 3 principal components as the reviewer expected, but when we explore the first 6 principal components, we see combinations of components that clearly separate the cell cycle clusters. We emphasize that the clusters were generated using the Louvain algorithm and assigned to cell-cycle stages using marker genes, and that UMAP was used only for visualization.

      We could not create a figure similar to Macosko et al. because of differences between the cell cycle categories we used and those of Spellman et al (PMID 9843569). We instead created Figure S3 to address the reviewer's comment. This figure uses a heatmap in a style similar to that of Macosko et al. to display cell-cycle-dependent expression of the 22 genes we used as cell cycle markers across each of the five cell cycle stages (M/G1, G1, G1/S, S, G2/M).

      We have renumbered the supplementary figures after incorporating these two additional supplementary figures into the manuscript.

      Aging, growth rate, and bet-hedging:

      The mention of bet-hedging reminded me of Levy et al (PMID 22589700), where they saw that Tsl1 expression changed as cells aged and that this impacted a cell's ability to survive heat stress. This bet-hedging strategy meant that the older, slower-growing cells were more likely to survive, so I wondered a couple of things. Is it possible from single-cell data to identify either an aging or a growth rate signature? A number of papers from David Botstein's group culminated in a paper that showed that they could use a gene expression signature to predict instantaneous growth rate (PMID 19119411) and I wondered a) if this is possible from single-cell data, and b) whether in the slower growing cells, they see markers of aging, whether these two signatures might impact the ability to detect eQTLs, and if they are detected, whether they could in some way be accounted for to improve detection.

      As mentioned in our preliminary response, we are not sure how to look for gene expression signatures of aging in yeast scRNA-seq data. We believe that the proposed analyses are beyond the scope of the current paper. As noted above, we have made all the data available so that anyone interested can explore these hypotheses.

      AIL vs. F2 segregants:

      I'm curious if the authors have given thought to the trade-offs of developing advanced intercross lines for scRNA-Seq eQTL analysis. My impression is that AIL provides better mapping resolution, but at the expense of having to generate the lines. It might be useful to see some discussion on that.

      We thank the reviewer for the comments. We believe that a discussion of trade-offs between different approaches for constructing mapping populations, such as AIL and F2 segregants, is beyond the scope of this paper.

      10x vs. SPLiT-Seq:

      10x is a well established, but fairly expensive approach for scRNA-Seq - I wondered how the cost of the 10x approach compares to the previously used approach of genotyping segregants and performing bulk RNA-Seq, and how those costs would change if one used SPLiT-Seq (see PMID 38282330).

      We thank the reviewer for the comments. We believe that a discussion of cost trade-offs between 10x and other approaches is beyond the scope of this paper, especially given the rapidly evolving costs of different technologies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Throughout the results section the authors point to File S1 for additional information. This file is a tarball with about 20 Excel documents in it, each with several sheets embedded. The authors should provide a detailed README describing how to understand the organizations of the files in File S1 and the many embedded sheets in each file. Statements made in the manuscript about File S1 should explicitly direct the reader to a specific spreadsheet and table to refer to.

      We have added an additional README file to the tarball that explains the organization of File S1 and describes the data contained in each sheet. Throughout the text, we now reference specific spreadsheets to assist the reader. In addition, these spreadsheets have been added to a github repository https://github.com/theboocock/finemapping_spreadsheets_single_cell

      Neither of the two GitHub repositories referenced under "Code availability" has adequate documentation that would allow a reader to try and reproduce the analyses presented here. The one entitled https://github.com/joshsbloom/single_cell_eQTL has no functional README, while https://github.com/theboocock/yeast_single_cell_post_analysis is somewhat better but still hard to navigate. Basic information on expected inputs, file formats, file organization, output types, and formats, etc. is required to get any of these pipelines to run and should be provided at a minimum.

      We thank the reviewer for the comment. In response, we have refactored both GitHub repositories and added extensive documentation to improve usability. We also updated the versions of software and packages, and this is reflected in the Methods section.

      S. cerevisiae strains are preferentially diploid in nature and many genes involved in the mating pathway are differentially regulated in diploids vs haploids. Have the authors explored the fitness effects of the GPA1 82R allele in diploids? What is the dominance relationship between 82W and 82R?

      We thank the reviewer for the comment. In diploid yeast, the mating pathway is repressed, and thus we would not expect there to be any fitness consequences due to the presence of different alleles of GPA1.

      The diploid expression profiling (page 5 and Table S9) doesn't implicate GPA1; can the authors comment on this in light of their finding in haploids?

      The mating pathway, including GPA1, is repressed in diploids, and hence the expression of GPA1 cannot be studied in these strains (PMID: 3113739). In addition, allele-specific expression differences only identify cis-regulatory effects. We know that the GPA1 variant results in a protein-coding change, which may or may not influence the levels of mRNA in cis, so that even if GPA1 were expressed in diploids, there would be no expectation of an allele-specific difference in expression.

      With respect to the candidate CYR1 QTL -- note that strains with compromised Cyr1 function also generally show increased sporulation rates and/or sporulation in rich media conditions (cAMP-PKA signaling represses sporulation). Is this the case in diploids with the CBS2888 allele at CYR1? If the CBS2888 allele is a CYR1 defect one might expect reduced cAMP levels. It is possible to estimate adenylate cyclase levels using a fairly straightforward ELISA assay. This would provide more convincing evidence of the causal mechanism of the alleles identified.

      We thank the reviewer for the comment, and we agree that a functional study of the CYR1 alleles would provide more convincing evidence for the causal mechanism of the connection between cell cycle occupancy, cAMP levels, and growth. However, we believe that the proposed experiments are beyond the scope of our current study. The evidence we provide is sufficient to establish that CYR1 is a strong candidate gene for the eQTL hotspot.

      Re: CYR1 candidate QTL -- The authors should reference the work of [Patrick Van Dijck](https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Van+Dijck+P&cauthor_id=20924200) and [Johan M Thevelein](https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Thevelein+JM&cauthor_id=20924200) on CYR1 allelic variation, and other papers besides the Matsumoto/Ishikawa papers, as the effects of cAMP-PKA signaling on stress can be quite variable. cAMP pathway variants, including in CYR1, have popped up in quite a few other yeast QTL mapping and experimental evolution papers. These should be referenced as well.

      We thank the reviewer for these references; we have added a comment about the relationship between stress tolerance and CYR1 variation, and cited the relevant references accordingly.

      Figure S10 - the subfigure showing the frequency of the GPA1 82R allele compared to 82W suggests a fairly large and deleterious fitness effect of this allele, on the order of 7-8% fewer cells per cell cycle stage than the 82W allele. Can the authors reconcile this with the more modest growth rate effect they report on page 8?

      Figure S12C displays the allele frequency of the 82R allele across the cell cycle in the single-cell data from allele-replacement strains. These strains were grown separately and processed using two individual 10x chromium runs. The resulting sequenced library had 11,695 cells with the 82R allele and 14,894 cells with the 82W allele. The 7-8% difference in the number of cells is due to slight differences in the number of captured cells per run, not due to growth differences, because we attempted to pool cells in equal numbers from separate mid-log cultures.

      The proportion of cells in G1 increases by ~3% in strains with the 82R allele relative to the baseline proportion of cells in the experiment, which, to the reviewer's point, is still larger than the ~1% growth difference we observed. Cell cycle occupancy is a compositional phenotype. As shown in Figure S12C, the 82R variant increases the fraction of cells in G1 and slightly decreases the fraction of cells in M/G1. There is no obvious expectation for quantitatively translating a change in cell cycle occupancy to a change in growth rate.

      The authors refer to the Lang et al. 2009 paper w/respect to GPA1 variant S469I but that paper seems to have explored a different GPA1 allele, GPA1-G1406T, with respect to growth rates.

      We thank the reviewer for their comment. The S469I variant is the same as the G1406T variant, one denoting the amino acid change at position 469 in the protein and the other denoting the corresponding nucleotide change at position 1406 in the DNA coding sequence. We have altered the text to make this clear to the reader.
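
      The correspondence between the two names can be checked with simple codon arithmetic. The sketch below assumes 1-based numbering from the first nucleotide of the coding sequence; the function name is hypothetical.

```python
def codon_position(nt_pos):
    """Map a 1-based coding-sequence nucleotide position to
    (codon number, position within codon), both 1-based."""
    codon = (nt_pos - 1) // 3 + 1
    within_codon = (nt_pos - 1) % 3 + 1
    return codon, within_codon


# Nucleotide 1406 falls at the second position of codon 469.
codon, within_codon = codon_position(1406)
```

      A G-to-T change at the second position of a serine codon (AGT or AGC) gives ATT or ATC, both of which encode isoleucine in the standard genetic code, which is consistent with G1406T and S469I naming the same variant.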

      Reviewer #2 (Recommendations For The Authors):

      I make no recommendations as to additional work for the authors. The manuscript is complete. I suggested some things I would like to see in my review, but it's up to them to decide whether they think any of those would further enhance the manuscript.

      However, I do have some pedantic formatting notes:

      - Microliters are variously presented as uL, ul, and µl - it should be µL

      - Similarly, milliliters are presented as ml and ML - it should be mL

      - Also, there should be a space between the number and the unit, e.g. 10 µL

      - Some gene names in the manuscript are not italicized in all instances, e.g., GPA1

      We thank the reviewer for these formatting suggestions; we have made these changes throughout the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Public Reviews:

      We thank the reviewers for their kind comments and have implemented many of their suggestions. Our paper has greatly benefited from their advice. Like Reviewer 1, we acknowledge that while the exact involvement of Ih in allowing smooth transitions is likely not universal across all systems, our demonstration of the ways in which such currents can affect the dynamics of the response of complex rhythmic motor networks provides valuable insight. To address the concerns of Reviewer 2, we included a sentence in the discussion to highlight the fact that cesium neither increased the pyloric frequency nor caused consistent depolarization in intracellular recordings. We also highlighted that these observations both suggest that cesium is not indirectly raising [K+]outside and support the conclusion that the effects of cesium are primarily through blockade of Ih rather than other potassium channels.

      Reviewer 3 raised some important points about modeling. While the lab has models that explore the effects of temperature on artificial triphasic rhythms, these models do not account for all the biophysical nuances of the full biological system. We have limited data about the exact nature of temperature-induced parameter changes and the extent to which these changes are mediated by intrinsic effects of temperature on protein structure versus protein interactions/modification by processes such as phosphorylation. With respect to the A current, Tang et al., 2010 reported that the activation and inactivation rates are differentially temperature sensitive, but we do not have the data to suggest whether or not the time courses of such sensitivities are different. As such, we focus our discussion on the properties we know are modulated by temperature, i.e. activation rates. Within the discussion we now include the suggestion that future, more comprehensive modeling may be appropriate to further elucidate the ways in which reducing Ih may produce the experimentally observed effects reported here.

      Reviewer #1 (Recommendations For The Authors):

      Suggested revisions:

      A figure showing examples of the voltage-clamp traces for the critical measurements of the extent of Ih block by 5 mM CsCl in PD and LP neurons at the temperature extremes in these preparations is not shown, and the authors should consider including such a figure, perhaps as a supplemental figure.

      We have added Supplemental Figure 1 containing voltage-clamp traces demonstrating the extent of Ih block by 5 mM CsCl in PD and LP neurons at 11 and 21°C. Due to technical concerns, different preparations were used in the measurements at 11°C and 21°C, but the point that the H-current is reduced is demonstrated in all cases.

      Reviewer #2 (Recommendations for The Authors):

      Specific (Minor) Comments:

      (1) Line 83: In Cs+ "at 11°C, the pyloric frequency was significantly decreased compared to control conditions (Saline: 1.2± 0.2 Hz; Cs+ 0.9± 0.2 Hz)".

      As above, the authors often report that cesium generally reduces pyloric frequency. Figure 5A demonstrates this action quite nicely. However, cesium's effect on pyloric frequency at 11°C seems less robust in Figure 1C. Why the discrepancy?

      There is variability in the effects of Cs+ on the pyloric frequency. As noted, the standard deviation in frequency in both conditions is 0.2 Hz. As such, there are some cases in which the initial frequency drop in Cs+ compared to control was relatively small. Figure 1C is one such case, but was selected as an example because of its clear reduction in temperature sensitivity.

      (2) I don't understand what the arrows/dashed lines are trying to convey in Figure 3C.

      The arrows/dashed lines represent the criteria used to define a cycle as “decreasing in frequency” (Temperature Increasing) or “increasing in frequency” (Temperature Stable).  We have amended lines 130 and 137 in the text to hopefully clarify this point, as well as the figure legend.

      (3) Lines 118/168. The description of cesium's specific action on the depolarizing portion of PD activity is a bit confusing. In my mind, "depolarization phase" refers to the point at which PD is most depolarized. Perhaps restating the phrase to "elongation of the depolarizing trajectory" is less confusing. The authors may also want to consider labeling this trajectory in Figure 2C.

      We have changed “depolarization phase” to “depolarizing phase” to highlight that this is the period during which the cell is depolarizing, rather than at its most depolarized.  We consider the plateau of the slow wave and spiking (the point at which PD is most depolarized) to be the “bursting phase”.  We have labeled these phases in Figure 2C as suggested.

      (4) Figure 3C legend: a few words seem to be missing. I suggest "the change in mean frequency was more likely TO decrease IN Cs+ than in saline".

      Thank you for catching this typo, it has been corrected.

      (5) Line 165: Awkward phrasing. “In one experiment, the decrease in frequency while temperature increased and subsequent increase in frequency after temperature stabilized was particularly apparent in Cs+ PTX”.

      How about: “One Cs+ PTX experiment wherein elevating the temperature transiently decreased pyloric frequency is shown in Figure 4F.”

      We have amended this sentence to read, “One Cs++PTX experiment in which elevating the temperature produced a particularly pronounced transient decrease in frequency is shown in Figure 4F.”

      (6) Line 186: Awkward phrasing. "LP OFF was also significantly advanced in Cs+, although duty cycle (percent of the period a neuron is firing) was preserved".

      The use of the word "although" seems a bit strange. If both LP onset and LP offset phase advance by the same amount, then isn't an unchanged duty cycle expected?

      “Although” has been changed to “and subsequently”.
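
      The reviewer's point about duty cycle can be made concrete with a short sketch. It assumes that phase is measured as the delay from the start of the cycle divided by the cycle period, and that duty cycle is the OFF phase minus the ON phase; the function names and the numerical values are hypothetical and are not taken from the data.

```python
def phase(event_time, cycle_start, period):
    """Phase of an event as a unitless fraction of the cycle period."""
    return (event_time - cycle_start) / period


def duty_cycle(on_phase, off_phase):
    """Fraction of the period during which the neuron is firing."""
    return off_phase - on_phase


# Hypothetical LP phases: ON and OFF both advance by 0.05.
control = duty_cycle(on_phase=0.40, off_phase=0.70)
advanced = duty_cycle(on_phase=0.35, off_phase=0.65)
# control and advanced are equal up to floating-point rounding.
```

      Under these definitions, an equal advance of ON and OFF phase necessarily leaves duty cycle unchanged, as the reviewer suggests.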

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      (1) I know the Marder lab has detailed models of the pyloric rhythm. I am not saying they have to add modeling to this already extensive and detailed paper, but it would be useful to know how much of these temperature effects have been modeled, for example in the following locations.

      (2) Line 259 - "Mathematically..." - Is there a computational model of H current that has shown this decrease in frequency in pyloric neurons? If you are working on one for the future, you could mention this.

      There is not currently a model in which the reduction of the H-current results in the non-minimum phase dynamics in the frequency response to temperature seen experimentally. It should be noted that our existing models of pyloric activity responses to temperature are not well suited to investigate such dynamics in their current iterations.  Further work is necessary to demonstrate the principles observed experimentally in computational modeling, and we have added a sentence to the paper to reflect this point (Line 268).

      (3) Line 318 - "therefore it remains unclear" - I thought they had models of the circuit rhythmicity. Do these models include temperature effects? Can they comment on whether their models of the circuit show an opposite effect to what they see in the experiment? I'm not saying they have to model these new effects as that is probably an entirely different paper, but it would be interesting to know whether current models show a different effect.

      We have some models of the pyloric response to temperature, but these models were specifically selected to maintain phase across the range of temperature. When Ih was reduced in these models, a variety of effects on phase and duty cycle were seen. These models were selected to have the same key features of behavior as the pyloric rhythm, but do not capture all the biophysical nuances of the complete system, and therefore should not necessarily be expected to reflect the experimental findings in their current iterations. Furthermore, these models are meant to have temperature as a static, rather than dynamic, input, and thus are ill-suited to examine the conditions of our experiments. The models in their current state are not sufficiently relevant to these experimental findings to illuminate the present paper.

      (4) "If deinactivation is more accelerated or altered by temperature than inactivation...While temperature continued to change, the difference in parameters would continue to grow" - This is described as a difference in temperature sensitivity, but it seems like it is also a function of the time course of the response to change in temperature (i.e. the different components could have the same final effect of temperature but show a different time course of the change).

      We know from Tang et al., 2010, that activation and inactivation rates of the A current are differentially temperature sensitive. We have no evidence to suggest that the time courses of the responses to temperature of the various parameters differ. The physical actions of temperature on proteins are likely to be extremely rapid, making a time course difference on the order of tens of seconds unlikely, though not impossible. Modeling of the biophysics might illuminate the relative plausibility of these different mechanisms of action, but we feel that our current suggested explanation is reasonable based on existing information.

      (5) Is it known how temperature is altering these channel kinetics? Is it via an intrinsic rearrangement of the protein structure, or is it a process that involves phosphorylation (that could explain differences in time course?). Some mention of the mechanism of temperature changes would be useful to readers outside this field.

      It is not known exactly how temperature alters channel parameters.  Invariably some, if not all, of it is due to an intrinsic rearrangement of protein structure, and our current models treat all parameter changes as an instantaneous consequence.  However, it is possible that some effects of temperature are due to longer timescale processes such as phosphorylation or cAMP interactions.  Current work in the lab is actively exploring these questions, but there is no definitive answer. Given that this paper focuses on the phenomenon and plausible biomolecular explanations based on existing data, we have not altered the paper to include more exhaustive  coverage of all the possible avenues by which temperature may alter channel properties.

      Specific comments:

      Title: misspelling of "Cancer" ?

      We are unsure how that extra “w” got into the earliest version of the manuscript and have removed it.

      Line 66 "We used 5mM CsCl" - might mention right up front that this was a bath application of the substance.

      We have altered this line to read “used bath application of 5mM CsCl”.  

      Figure 4 - "The only feedback synapse to the pacemaker kernel neurons, LP to PD, and is blocked by picrotoxin" - I think the word "and" should be removed from this phrase in the figure legend.

      Fixed

      Figure 4 legend - "Reds denote temperature...yellows denote..." - I think it should be "Red dots denote temperature...yellow dots denote...".

      Done

      Figure 4B - Why does the change in frequency in cesium look so different in Figure 4B compared to Figure 1C or Figure 3B? In the earlier figures, the increase of frequency is smaller but still present in cesium, whereas, in Figure 4B, cesium seems to completely block the increase in frequency. I'm not sure why this is different, but I guess it's because 3B and 4B are just mean traces from single experiments. Presumably, 4B is showing an experiment in which the cesium was subsequently combined with picrotoxin?

      Figures 1C, 3B, and 4B are indeed all from different single experiments. As acknowledged in our concluding paragraph, there was substantial variability in the exact response of the pyloric rhythm to temperature while in cesium. The most consistent effect was that the difference in frequency between cesium and saline at a particular temperature increased, as demonstrated across 21 preparations in Figure 1D. It may be noted in Figure 1E that the Q10 was not infrequently <1, meaning that there was a net decrease in frequency as temperature increased in some experiments, as seen in the example of Figure 4B. The “fold over” (initial increase in steady-state frequency with temperature, then decrease at higher temperatures) has been observed at higher temperatures (typically around 23-30°C) even under control conditions but has not been highlighted in previous publications. The example in 4B was chosen because it demonstrated both the similarity in jags between Cs+ and Cs++PTX and an overall decrease in temperature sensitivity, even though in this instance the steady-state change in frequency with temperature was not monotonic.
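
      For readers outside the field, the Q10 referred to above is the factor by which a rate changes for a 10°C increase in temperature. The sketch below uses the standard two-point definition; the function name and the example frequencies are hypothetical and are not measurements from this study.

```python
def q10(rate_low, rate_high, temp_low, temp_high):
    """Two-point Q10: fold change in rate per 10 degrees C."""
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


# Hypothetical frequencies (Hz) at 11 and 21 degrees C.
doubling = q10(1.0, 2.0, 11.0, 21.0)   # frequency doubles: Q10 above 1
slowing = q10(1.0, 0.9, 11.0, 21.0)    # frequency falls: Q10 below 1
```

      A Q10 below 1 therefore corresponds to a net decrease in frequency across the temperature range, which is the situation described for some cesium experiments.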

      Figure 6A - "Phase 0 to 1.0" - The y-axis should provide units of phase. Presumably, these are units of radians so 1.0=2*pi radians (or 360 degrees, but probably best to avoid using degrees of phase due to confusion with degrees of temperature).

      Phase, with respect to pyloric rhythm cycles, does not traditionally have units as it is a proportion rather than an angle. As such, we have not changed the figure.

      Line 275 - "the pacemaker neuron can increase" - Does this indicate that the main effects of H current are in the follower neurons (i.e. LP and PY versus the driver neuron PD)?

      Not necessarily. We posit in the next paragraph that the effect of the H current on the temperature sensitivity could be due to its phase advance of LP, but that phase advance of LP is not particularly expected to increase frequency. We favor the possibility that temperature increases Ih in the pacemaker, which in turn advances the PRC of the rhythm, allowing the frequency increase seen under normal conditions. In Cs+, this advance does not occur, resulting in the lower temperature sensitivity. In Cs+ + PTX, the lack of inhibition from LP means compensatory advance of the pacemaker PRC by Ih is unnecessary to allow increased frequency.

      Line 285 - "either increase frequency have no effect" - Is there a missing "or" in this phrase?

      Thank you, we have added the “or”.

    1. Author response:

      eLife assessment

      This potentially valuable study examines the role of IL17-producing Ly6G PMNs as a reservoir for Mycobacterium tuberculosis to evade host killing activated by BCG immunisation. The authors report that IL17-producing polymorphonuclear neutrophils harbour a significant bacterial load in both wild-type and IFNg-/- mice and that targeting IL17 and Cox2 improved disease outcomes whilst enhancing BCG efficacy. Although the authors suggest that targeting these pathways may improve disease outcomes in humans, the evidence as it stands is incomplete and requires additional experimentation for the study to realise its full impact.

      Thank you for evaluating our manuscript. We understand the concern related to the direct role of Ly6G+Gra-derived IL17 in TB pathogenesis. For the revised manuscript, we will provide additional experimental evidence through direct regulation of IL-17 production in Mtb-infected mice and its impact on improving BCG efficacy.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recruitment of neutrophils to the lungs is known to drive susceptibility to infection with M. tuberculosis. In this study, the authors present data in support of the hypothesis that neutrophil production of the cytokine IL-17 underlies the detrimental effect of neutrophils on disease. They claim that neutrophils harbor a large fraction of Mtb during infection, and are a major source of IL-17. To explore the effects of blocking IL-17 signaling during primary infection, they use IL-17 blocking antibodies, SR2211 (an inverse agonist of TH17 differentiation), and celecoxib, which they claim blocks Th17 differentiation, and observe modest improvements in bacterial burdens in both WT and IFN-γ deficient mice using the combination of IL-17 blockade with celecoxib during primary infection. Celecoxib enhances control of infection after BCG vaccination.

      Thank you for the summary.

      Strengths:

      The most novel finding in the paper is that treatment with celecoxib significantly enhances control of infection in BCG-vaccinated mice that have been challenged with Mtb. It was already known that NSAID treatments can improve primary infection with Mtb.

      Thank you.

      Weaknesses:

      The major claim of the manuscript - that neutrophils produce IL-17 that is detrimental to the host - is not strongly supported by the data. Data demonstrating neutrophil production of IL17 lacks rigor. 

      Our response: Neutrophil production of IL-17 is supported by two independent methods/techniques in the current version:

      (1) Through flow cytometry: a large fraction of Ly6G+CD11b+ cells from the lungs of Mtb-infected mice were also positive for IL-17 (Fig. 3C).

      (2) IFA co-staining of Ly6G+ cells with IL-17 in the lung sections from Mtb-infected mice (Fig. 3E-G, Fig. 4H, and Fig. 5I).

      However, to further strengthen this observation, we plan to analyse sorted Ly6G+Gra from the lungs of infected mice using an IL-17 ELISpot assay. This will unequivocally prove the Ly6G+Gra production of IL-17. Several publications support the production of IL-17 by neutrophils (Li et al. 2010; Katayama et al. 2013; Lin et al. 2011). For example, neutrophils have been identified as a source of IL-17 in human psoriatic lesions (Lin et al. 2011), in neuroinflammation induced by traumatic brain injury (Xu et al. 2023) and in several mouse models of infectious and autoimmune inflammation (Ferretti et al. 2003; Hoshino et al. 2008; Li et al. 2010). However, ours is the first study reporting neutrophil IL-17 production during Mtb pathology.

      The experiments examining the effects of inhibitors of IL-17 on the outcome of infection are very difficult to interpret. First, treatment with IL-17 inhibitors alone has no impact on bacterial burdens in the lung, either in WT or IFN-γ KO mice. This suggests that IL-17 does not play a detrimental role during infection. Modest effects are observed using the combination of IL-17 blocking drugs and celecoxib, however, the interpretation of these results mechanistically is complicated. Celecoxib is not a specific inhibitor of Th17. Indeed, it affects levels of PGE2, which is known to have numerous impacts on Mtb infection separate from any effect on IL-17 production, as well as other eicosanoids. 

      The reviewer correctly says that Celecoxib is not a specific inhibitor of Th17. However, COX-2 inhibition does have an effect on IL-17 levels, and numerous reports support this observation (Paulissen et al. 2013; Napolitani et al. 2009; Lemos et al. 2009). We elaborate on the results below for better clarity.

      Firstly, in the WT mice, Celecoxib treatment led to a complete loss of IL-17 production in the lungs of Mtb-infected mice (Fig. 5D). Interestingly, IL-17 production independent of IL-23 is known to require PGE2 (Paulissen et al. 2013; Polese et al. 2021). In the WT or IFNγ KO mice, we rather noted a decline in IL-23 levels post-infection, suggesting a possible role of PGE2 in IL-17 production. However, in the Mtb-infected IFNγ KO mice, Celecoxib had no effect on IL-17 levels in the lung homogenates. Thus, celecoxib controls IL-17 levels only in the Mtb-infected WT mice. Including celecoxib with anti-IL17 in the IFNγ KO mice controls pathology and extends their survival.

      Second, the reviewer’s observation is only partially correct that IL-17 inhibition has a modest effect on the outcome of infection. While IL-17 neutralization and inhibition alone in the IFNγ KO mice and WT mice, respectively, did not bring down the lung CFU burden significantly, in both these cases, there was an improvement in the lung pathology. The reduced pathology coincided with reduced neutrophil recruitment and a reduced Ly6G+Gra-resident Mtb population in the WT mice. IL-17 neutralization alone improved IFNγ KO mice survival by ~10 days (Fig. 4F-G).

      Third, regarding the SR2211 and Celecoxib combination study, we agree with the reviewer that Celecoxib has roles independent of IL-17 regulation. However, in the results presented in this study, there are three key aspects- 1) neutrophil-derived IL-17-dependent neutrophil recruitment, 2) the presence of a large proportion of intracellular Mtb in the neutrophils and 3) dissemination of Mtb to the spleen. Celecoxib treatment alone helps reduce lung Mtb burden in the WT mice. However, SR2211 fails to do so. It is evident that celecoxib is doing more than just inhibiting IL-17 production. The result shows that celecoxib blocks neutrophil recruitment (which could be an IL-17-dependent mechanism) and also controls the intraneutrophil bacterial population. Finally, either SR2211 or celecoxib could block dissemination to the spleen. The role of neutrophils in TB dissemination is only beginning to emerge (Hult et al. 2021). We will revise the description in the results and discussion section for this data to make it easier to understand.

      Finally, we have also done experiments with SR2211 in BCG-vaccinated animals, which shows the direct impact of IL-17 inhibition on the BCG vaccine efficacy. We will add this result in the revised version.

      Finally, the human data simply demonstrates that neutrophils and IL-17 both are higher in patients who experience relapse after treatment for TB, which is expected and does not support their specific hypothesis. 

      We disagree with the above statement. Why would higher IL-17 be expected in patients who show relapse, death or failed treatment outcomes? Classically, IL-17 is believed to be protective against TB, and the reviewer also points to that in the comments below. A very limited set of studies support the non-protective/pathological role of IL-17 in tuberculosis (Cruz et al. 2010). High IL-17 and neutrophilia at the baseline in the human subjects (i.e. at the time of recruitment in the study) highlight severe pathology in those subjects, which could have contributed to the failed treatment outcome. This observation in the human cohort strongly supports the overall theme and central observation in this study.

      The use of genetic ablation of IL-17 production specifically in neutrophils and/or IL-17R in mice would greatly enhance the rigor of this study. 

      The reviewer’s point is well-taken. Having a genetic ablation of IL-17 production, specifically in the neutrophils, would be excellent. At present, however, we lack this resource, and therefore, it is not feasible to do this experiment within a defined timeline. Instead, for the revised manuscript, we will present the data with SR2211, a direct inhibitor of RORgt and, therefore, IL-17, in BCG-vaccinated mice.

      The authors do not address the fact that numerous studies have shown that IL-17 has a protective effect in the mouse model of TB in the context of vaccination.

      Yes, there are a few articles that talk about the protective effect of IL-17 in the mouse model of TB in the context of vaccination (Khader et al. 2007; Desel et al. 2011; Choi et al. 2020). This part was discussed in the original manuscript (in the Introduction section). For the revised manuscript, we will also provide results from the experiment where we blocked IL-17 production by inhibiting RORgt using SR2211 in BCG-vaccinated mice. The results clearly show IL-17 as a negative regulator of BCG-mediated protective immunity. We believe some of the reasons for the observed differences could be 1) in our study, we analysed IL-17 levels in the lung homogenates at late phases of infection, and 2) most published studies rely on ex vivo stimulation of immune cells to measure cytokine production, whereas we actually measured the cytokine levels in the lung homogenates. We will elaborate on these points in the revised version.

      Finally, whether and how many times each animal experiment was repeated is unclear.

      We will provide the details of the number of experiments in the revised version. Briefly, the BCG vaccination experiment (Figure 1) and BCG vaccination with Celecoxib treatment experiment (Figure 6) were performed twice and thrice, respectively. The IL-17 neutralization experiment (Figure 4) and the SR2211 treatment experiment (Figure 5) were done once. We will add data from another SR2211 experiment in the revised version.

      Reviewer #2 (Public review):

      Summary:

      In this study, Sharma et al. demonstrated that Ly6G+ granulocytes (Gra cells) serve as the primary reservoirs for intracellular Mtb in infected wild-type mice and that excessive infiltration of these cells is associated with severe bacteremia in genetically susceptible IFNγ-/- mice. Notably, neutralizing IL-17 or inhibiting COX2 reversed the excessive infiltration of Ly6G+Gra cells, mitigated the associated pathology, and improved survival in these susceptible mice. Additionally, Ly6G+Gra cells were identified as a major source of IL-17 in both wild-type and IFNγ-/- mice. Inhibition of RORγt or COX2 further reduced the intracellular bacterial burden in Ly6G+Gra cells and improved lung pathology.

      Of particular interest, COX2 inhibition in wild-type mice also enhanced the efficacy of the BCG vaccine by targeting the Ly6G+Gra-resident Mtb population.

      Thank you for the summary.

      Strengths:

      The experimental results showing improved BCG-mediated protective immunity through targeting IL-17-producing Ly6G+ cells and COX2 are compelling and will likely generate significant interest in the field. Overall, this study presents important findings, suggesting that the IL-17-COX2 axis could be a critical target for designing innovative vaccination strategies for TB.

      Thank you for highlighting the overall strengths of the study.

      Weaknesses:

      However, I have the following concerns regarding some of the conclusions drawn from the experiments, which require additional experimental evidence to support and strengthen the overall study.

      Major Concerns:

      (1) Ly6G+ Granulocytes as a Source of IL-17: The authors assert that Ly6G+ granulocytes are the major source of IL-17 in wild-type and IFN-γ KO mice based on colocalization studies of Ly6G and IL-17. In Figure 3D, they report approximately 500 Ly6G+ cells expressing IL-17 in the Mtb-infected WT lung. Are these low numbers sufficient to drive inflammatory pathology? Additionally, have the authors evaluated these numbers in IFN-γ KO mice? 

      Thank you for pointing out the numbers in Fig. 3D. It was our oversight to label the axis as No. of IL17+Ly6G+Gra/lung. For this data, only a part of the lung was used. For the revised manuscript, we will provide the number of these cells at the whole lung level from Mtb-infected WT mice. Unfortunately, we did not evaluate these numbers in IFN-γ KO mice through FACS.

      For the assertion that Ly6G+Gra are the major source of IL-17 in TB, we have used two separate strategies- a) IFA and b) FACS. 

      However, as described above in response to the first reviewer, for the revision, we propose to perform an IL-17 ELISpot assay on the sorted Ly6G+Gra from the lungs of Mtb-infected WT mice.

      (2) Role of IL-17-Producing Ly6G Granulocytes in Pathology: The authors suggest that IL17-producing Ly6G granulocytes drive pathology in WT and IFN-γ KO mice. However, the data presented only demonstrate an association between IL-17+ Ly6G cells and disease pathology. To strengthen their conclusion, the authors should deplete neutrophils in these mice to show that IL-17 expression, and consequently the pathology, is reduced.

      Thank you for this suggestion. Others have done neutrophil depletion studies in TB, and so far, the outcomes remain inconclusive. In some studies, neutrophil depletion helps the pathogen (Rankin et al. 2022; Pedrosa et al. 2000; Appelberg et al. 1995), and in others, it helps the host (Lovewell et al. 2021; Mishra et al. 2017). One reason for this variability is the stage of infection when neutrophil depletion was done. However, another crucial factor is the heterogeneity in the neutrophil population. There are reports that suggest neutrophil subtypes with protective versus pathological trajectories (Nwongbouwoh Muefong et al. 2022; Lyadova 2017; Hellebrekers, Vrisekoop, and Koenderman 2018; Leliefeld et al. 2018). Depleting the entire population using anti-Ly6G could impact this heterogeneity and may impact the inferences drawn. A better approach would be to characterise this heterogeneous population, efforts towards which could be part of a separate study.

      For the revised manuscript, we will provide results from the SR2211 experiment in BCG-vaccinated mice and other results to show the role of IL-17-producing Ly6G+Gra in TB pathology.   

      (3) IL-17 Secretion by Mtb-Infected Neutrophils: Do Mtb-infected neutrophils secrete IL-17 into the supernatants? This would serve as confirmation of neutrophil-derived IL-17. Additionally, are Ly6G+ cells producing IL-17 and serving as pathogenic agents exclusively in vivo? The authors should provide comments on this.

      We have not directly measured IL-17 secretion by neutrophils in our experiments. However, Hu et al. have reported IL-17 secretion by Mtb-infected neutrophils in vitro (Hu et al. 2017). Whether a few neutrophil roles are seen exclusively under in vivo conditions is an interesting proposition. We do have some observations that suggest the in vitro phenotype of Mtb-infected neutrophils is different from that in vivo.

      (4) Characterization of IL-17-Producing Ly6G+ Granulocytes: Are the IL-17-producing Ly6G+ granulocytes a mixed population of neutrophils and eosinophils, or are they exclusively neutrophils? Sorting these cells followed by Giemsa or eosin staining could clarify this.

      This is a very important point. While usually eosinophils do not express Ly6G markers in laboratory mice, under specific contexts, including infections, eosinophils can express Ly6G. Since we have not characterized these potential Ly6G+ sub-populations, that is one of the reasons we refer to the cell types as Ly6G+ granulocytes, which do not exclude Ly6G+ eosinophils. A detailed characterization of these subsets could be taken up as a separate study.

      Reviewer #3 (Public review):

      Summary:

      The authors examine how distinct cellular environments differentially control Mtb following BCG vaccination. The key findings are that IL17-producing PMNs harbor a significant Mtb load in both wild-type and IFNg-/- mice. Targeting IL17 and Cox2 improved disease and enhanced BCG efficacy over 12 weeks and neutrophils/IL17 are associated with treatment failure in humans. The authors suggest that targeting these pathways, especially in MSMD patients may improve disease outcomes.

      Thank you.

      Strengths:

      The experimental approach is generally sound and consists of low-dose aerosol infections with distinct readouts including cell sorting followed by CFU, histopathology, and RNA sequencing analysis. By combining genetic approaches and chemical/antibody treatments, the authors can probe these pathways effectively.

      Understanding how distinct inflammatory pathways contribute to control or worsen Mtb disease is important and thus, the results will be of great interest to the Mtb field.

      Thank you.

      Weaknesses:

      A major limitation of the current study is overlooking the role of non-hematopoietic cells in the IFNg/IL17/neutrophil response. Chimera studies from Ernst and colleagues (PMCID: PMC2807991) previously described this IDO-dependent pathway following the loss of IFNg through an increased IL17 response. This study is not cited nor discussed even though it may alter the interpretation of several experiments.

      Thank you for pointing out this earlier study, which we concede we missed discussing. We disagree on the point that results from that study may alter the interpretation of several experiments in our study. On the contrary, the main observation that loss of IFNγ causes elevated IL-17 levels is aligned in both studies.

      IDO1 is known to alter Th cell differentiation towards Tregs and away from Th17 (Baban et al. 2009). It is absolutely feasible for the non-hematopoietic cells to regulate these events. However, that does not rule out the neutrophil production of IL-17 and the downstream pathological effect shown in this study. We will discuss and cite this study in the revised manuscript.

      Several of the key findings in mice have previously been shown (albeit with less sophisticated experimentation) and human disease and neutrophils are well described - thus the real new finding is how intracellular Mtb in neutrophils are more refractory to BCG-mediated control. However, given there are already high levels of Mtb in PMNs compared to other cell types, and there is a decrease in intracellular Mtb in PMNs following BCG immunization the strength of this finding is a bit limited.

      The reviewer’s interpretation of the BCG-refractory Mtb population in the neutrophil is interesting. The reviewer is right that neutrophils had a higher intracellular Mtb burden, which decreased in the BCG-vaccinated animals. Thus, on that account, the reviewer rightly mentions that BCG is able to control Mtb even in neutrophils. However, BCG almost clears intracellular burden from other cell types analysed, and therefore, the remnant pool of intracellular Mtb in the lungs of BCG-vaccinated animals could be mostly those present in the neutrophils. This is a substantial novel development in the field and attracts focus towards innate immune cells for vaccine efficacy. 

      References:

      Appelberg, R., A. G. Castro, S. Gomes, J. Pedrosa, and M. T. Silva. 1995. 'Susceptibility of beige mice to Mycobacterium avium: role of neutrophils', Infect Immun, 63: 3381-7.

      Baban, B., P. R. Chandler, M. D. Sharma, J. Pihkala, P. A. Koni, D. H. Munn, and A. L. Mellor. 2009. 'IDO activates regulatory T cells and blocks their conversion into Th17-like T cells', J Immunol, 183: 2475-83.

      Choi, H. G., K. W. Kwon, S. Choi, Y. W. Back, H. S. Park, S. M. Kang, E. Choi, S. J. Shin, and H. J. Kim. 2020. 'Antigen-Specific IFN-gamma/IL-17-Co-Producing CD4(+) T-Cells Are the Determinants for Protective Efficacy of Tuberculosis Subunit Vaccine', Vaccines (Basel), 8.

      Cruz, A., A. G. Fraga, J. J. Fountain, J. Rangel-Moreno, E. Torrado, M. Saraiva, D. R. Pereira, T. D. Randall, J. Pedrosa, A. M. Cooper, and A. G. Castro. 2010. 'Pathological role of interleukin 17 in mice subjected to repeated BCG vaccination after infection with Mycobacterium tuberculosis', J Exp Med, 207: 1609-16.

      Desel, C., A. Dorhoi, S. Bandermann, L. Grode, B. Eisele, and S. H. Kaufmann. 2011. 'Recombinant BCG DeltaureC hly+ induces superior protection over parental BCG by simulating a balanced combination of type 1 and type 17 cytokine responses', J Infect Dis, 204: 1573-84.

      Ferretti, S., O. Bonneau, G. R. Dubois, C. E. Jones, and A. Trifilieff. 2003. 'IL-17, produced by lymphocytes and neutrophils, is necessary for lipopolysaccharide-induced airway neutrophilia: IL-15 as a possible trigger', J Immunol, 170: 2106-12.

      Hellebrekers, P., N. Vrisekoop, and L. Koenderman. 2018. 'Neutrophil phenotypes in health and disease', Eur J Clin Invest, 48 Suppl 2: e12943.

      Hoshino, A., T. Nagao, N. Nagi-Miura, N. Ohno, M. Yasuhara, K. Yamamoto, T. Nakayama, and K. Suzuki. 2008. 'MPO-ANCA induces IL-17 production by activated neutrophils in vitro via classical complement pathway-dependent manner', J Autoimmun, 31: 79-89.

      Hu, S., W. He, X. Du, J. Yang, Q. Wen, X. P. Zhong, and L. Ma. 2017. 'IL-17 Production of Neutrophils Enhances Antibacteria Ability but Promotes Arthritis Development During Mycobacterium tuberculosis Infection', EBioMedicine, 23: 88-99.

      Hult, C., J. T. Mattila, H. P. Gideon, J. J. Linderman, and D. E. Kirschner. 2021. 'Neutrophil Dynamics Affect Mycobacterium tuberculosis Granuloma Outcomes and Dissemination', Front Immunol, 12: 712457.

      Katayama, M., K. Ohmura, N. Yukawa, C. Terao, M. Hashimoto, H. Yoshifuji, D. Kawabata, T. Fujii, Y. Iwakura, and T. Mimori. 2013. 'Neutrophils are essential as a source of IL-17 in the effector phase of arthritis', PLoS One, 8: e62231.

      Khader, S. A., G. K. Bell, J. E. Pearl, J. J. Fountain, J. Rangel-Moreno, G. E. Cilley, F. Shen, S. M. Eaton, S. L. Gaffen, S. L. Swain, R. M. Locksley, L. Haynes, T. D. Randall, and A. M. Cooper. 2007. 'IL-23 and IL-17 in the establishment of protective pulmonary CD4+ T cell responses after vaccination and during Mycobacterium tuberculosis challenge', Nat Immunol, 8: 369-77.

      Leliefeld, P. H. C., J. Pillay, N. Vrisekoop, M. Heeres, T. Tak, M. Kox, S. H. M. Rooijakkers, T. W. Kuijpers, P. Pickkers, L. P. H. Leenen, and L. Koenderman. 2018. 'Differential antibacterial control by neutrophil subsets', Blood Adv, 2: 1344-55.

      Lemos, H. P., R. Grespan, S. M. Vieira, T. M. Cunha, W. A. Verri, Jr., K. S. Fernandes, F. O. Souto, I. B. McInnes, S. H. Ferreira, F. Y. Liew, and F. Q. Cunha. 2009. 'Prostaglandin mediates IL-23/IL-17-induced neutrophil migration in inflammation by inhibiting IL-12 and IFNgamma production', Proc Natl Acad Sci U S A, 106: 5954-9.

      Li, L., L. Huang, A. L. Vergis, H. Ye, A. Bajwa, V. Narayan, R. M. Strieter, D. L. Rosin, and M. D. Okusa. 2010. 'IL-17 produced by neutrophils regulates IFN-gamma-mediated neutrophil migration in mouse kidney ischemia-reperfusion injury', J Clin Invest, 120: 331-42.

      Lin, A. M., C. J. Rubin, R. Khandpur, J. Y. Wang, M. Riblett, S. Yalavarthi, E. C. Villanueva, P. Shah, M. J. Kaplan, and A. T. Bruce. 2011. 'Mast cells and neutrophils release IL-17 through extracellular trap formation in psoriasis', J Immunol, 187: 490-500.

      Lovewell, R. R., C. E. Baer, B. B. Mishra, C. M. Smith, and C. M. Sassetti. 2021. 'Granulocytes act as a niche for Mycobacterium tuberculosis growth', Mucosal Immunol, 14: 229-41.

      Lyadova, I. V. 2017. 'Neutrophils in Tuberculosis: Heterogeneity Shapes the Way?', Mediators Inflamm, 2017: 8619307.

      Mishra, B. B., R. R. Lovewell, A. J. Olive, G. Zhang, W. Wang, E. Eugenin, C. M. Smith, J. Y. Phuah, J. E. Long, M. L. Dubuke, S. G. Palace, J. D. Goguen, R. E. Baker, S. Nambi, R. Mishra, M. G. Booty, C. E. Baer, S. A. Shaffer, V. Dartois, B. A. McCormick, X. Chen, and C. M. Sassetti. 2017. 'Nitric oxide prevents a pathogen-permissive granulocytic inflammation during tuberculosis', Nat Microbiol, 2: 17072.

      Napolitani, G., E. V. Acosta-Rodriguez, A. Lanzavecchia, and F. Sallusto. 2009. 'Prostaglandin E2 enhances Th17 responses via modulation of IL-17 and IFN-gamma production by memory CD4+ T cells', Eur J Immunol, 39: 1301-12.

      Nwongbouwoh Muefong, C., O. Owolabi, S. Donkor, S. Charalambous, A. Bakuli, A. Rachow, C. Geldmacher, and J. S. Sutherland. 2022. 'Neutrophils Contribute to Severity of Tuberculosis Pathology and Recovery From Lung Damage Pre- and Posttreatment', Clin Infect Dis, 74: 1757-66.

      Paulissen, S. M., J. P. van Hamburg, N. Davelaar, P. S. Asmawidjaja, J. M. Hazes, and E. Lubberts. 2013. 'Synovial fibroblasts directly induce Th17 pathogenicity via the cyclooxygenase/prostaglandin E2 pathway, independent of IL-23', J Immunol, 191: 1364-72.

      Pedrosa, J., B. M. Saunders, R. Appelberg, I. M. Orme, M. T. Silva, and A. M. Cooper. 2000. 'Neutrophils play a protective nonphagocytic role in systemic Mycobacterium tuberculosis infection of mice', Infect Immun, 68: 577-83.

      Polese, B., B. Thurairajah, H. Zhang, C. L. Soo, C. A. McMahon, G. Fontes, S. N. A. Hussain, V. Abadie, and I. L. King. 2021. 'Prostaglandin E(2) amplifies IL-17 production by gamma-delta T cells during barrier inflammation', Cell Rep, 36: 109456.

      Rankin, A. N., S. V. Hendrix, S. K. Naik, and C. L. Stallings. 2022. 'Exploring the Role of Low-Density Neutrophils During Mycobacterium tuberculosis Infection', Front Cell Infect Microbiol, 12: 901590.

      Xu, X. J., Q. Q. Ge, M. S. Yang, Y. Zhuang, B. Zhang, J. Q. Dong, F. Niu, H. Li, and B. Y. Liu. 2023. 'Neutrophil-derived interleukin-17A participates in neuroinflammation induced by traumatic brain injury', Neural Regen Res, 18: 1046-51.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      In this paper, Homan et al. used mouse models of Metabolic Dysfunction-Associated Steatotic Liver Disease and different specific target deletions in cells to rule out the role of Complement 3a Receptor 1 in the pathogenesis of disease. They provided limited evidence and only descriptive results indicating that, although C3aR is relevant in different contexts of inflammation, these tenets did not hold true here.

      Weaknesses: 

      (1) The results are based on readouts showing that C3aR is not involved in the pathogenesis of liver metabolic disease. 

      (2) The description of the mouse models they used to validate their findings is not clear. Lysm-cre mice - which are claimed to delete C3aR in (?) macrophages are not specific for these cells, and the genetic strategy to delete C3aR in Kupffer cells is not clear. 

      (3) Taking this into account, it is very challenging to determine the validity of these data, also considering that they are merely descriptive and correlative. 

      We generated 2 different cohorts of mice using LysM-Cre (Jackson Strain #004781) to drive deletion in all macrophages and Clec4f-Cre (Jackson Strain #033296) to specifically ablate C3ar1 in Kupffer cells. We will ensure that experimental models will be clearly defined in the revised manuscript. The reviewer’s point is well taken that the LysM-Cre transgene can also be active in granulocytes and some dendritic cells. Even so, despite deletion of C3ar1 in macrophages and granulocytes, we do not see a major effect on hepatic steatosis and fibrosis in this GAN diet induced model of MASLD/MASH. This was a somewhat surprising finding. We do not agree that our findings are correlative. We specifically ablated C3aR1 in macrophages or Kupffer cells and found no significant differences in the major readouts of steatosis and fibrosis for MASLD/MASH between control and knockout mice. It is possible that in other models of liver injury that we did not test (e.g., short-term treatment with a hepatotoxin such as carbon tetrachloride), there may be differences in liver injury in mice lacking C3ar1 in macrophages, but the GAN diet model has been shown to better parallel the gene expression changes in human MASLD/MASH.

      Reviewer #2 (Public review):

      Summary:

      Homan et al. examined the effect of macrophage- or Kupffer cell-specific C3aR1 KO on MASLD/MASH-related metabolic or liver phenotypes.

      Strengths:

      Established macrophage- or Kupffer cell-specific C3aR1 KO mice. 

      Weaknesses:

      Lack of in-depth study; flaws in comparisons between KC-specific C3aR1KO and WT in the context of MASLD/MASH, because MASLD/MASH WT mice likely have a low abundance of C3aR1 on KCs. 

      Homan et al. reported a set of observation data from macrophage or Kupffer cell-specific C3aR1KO mice. Several questions and concerns as follows could challenge the conclusions of this study: 

      (1) As C3aR1 is robustly repressed in MASLD or MASH liver, GAN feeding likely reduced C3aR1 abundance in the liver of WT mice. Thus, it is not surprising that there were no significant differences in liver phenotypes between WT vs. C3aR1KO mice after prolonged GAN diet feeding. It would give more significance to the study if restoring C3aR1 abundance in KCs in the context of MASLD/MASH. 

      GAN diet feeding resulted in higher liver C3ar1 compared to regular diet (Figure 1H). This thus became an impetus for studying the effects of C3ar1 deletion in macrophages or Kupffer cells, which are responsible for the majority of liver C3ar1 expression, in MASLD/MASH (Figures 2B and 3H).  

      (2) Would C3aR1KO mice develop liver abnormalities after a short period of GAN diet feeding?  

      We did not assess if short term GAN diet feeding resulted in significant differences in liver abnormalities in the C3ar1 macrophage or Kupffer cell knockout mice. Perhaps the reviewer’s point is that with shorter periods of GAN diet feeding there may be a phenotype in the KO mice. We agree that this is entirely possible, though with shorter feeding timeframes what is typically seen is hepatic steatosis without fibrosis. Nevertheless, the most important element in our opinion for a disease preventing or modifying model lies with the longer-term GAN diet feeding. With long term GAN diet feeding that has been previously shown to model human MASLD/MASH, we did not observe significant differences in liver abnormalities with the KO mice.

      (3) What would be the liver macrophage phenotypes in WT vs C3aR1KO mice after GAN feeding? 

      Similar to the above point, given the lack of a major MASLD/MASH phenotype in hepatic steatosis and fibrosis, we did not further profile the liver macrophage profiles of the macrophage or Kupffer cell C3ar1 KO mice with GAN feeding.  

      (4) In Fig 1D, >25wks GAN feeding had minimal effects on female body weight gain. Do these GAN-fed female mice also develop MASLD/MASH liver abnormalities? 

      We thank the reviewer for this question. In general, female GAN-fed mice develop milder MASLD/MASH abnormalities. We will include additional data in the revised manuscript.

      (5) Would C3aR1KO result in differences in liver phenotypes, including macrophage population/activation, liver inflammation, lipogenesis, in lean mice? 

      Likewise, we will include data further characterizing liver inflammation, lipogenesis and macrophages in macrophage C3ar1 KO mice under lean/regular diet conditions.

      (6) The authors should provide more information regarding the generation of KC-specific C3aR1KO. Which Cre mice were used to breed with C3aR1 flox mice? 

      Clec4f-Cre transgenic mice were used to generate Kupffer cell specific KO of C3ar1. This will be clarified and explicitly stated in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study puts forth the model that under IFN-B stimulation, liquid-phase WTAP coordinates with the transcription factor STAT1 to recruit MTC to the promoter region of interferon-stimulated genes (ISGs), mediating the installation of m6A on newly synthesized ISG mRNAs. This model is supported by strong evidence that the phosphorylation state of WTAP, regulated by PPP4, is regulated by IFN-B stimulation, and that this results in interactions between WTAP, the m6A methyltransferase complex, and STAT1, a transcription factor that mediates activation of ISGs. This was demonstrated via a combination of microscopy, immunoprecipitations, m6A sequencing, and ChIP. These experiments converge on a set of experiments that nicely demonstrate that IFN-B stimulation increases the interaction between WTAP, METTL3, and STAT1, that this interaction is lost with the knockdown of WTAP (even in the presence of IFN-B), and that this IFN-B stimulation also induces METTL3-ISG interactions.

      Strengths:

      The evidence for the IFN-B stimulated interaction between METTL3 and STAT1, mediated by WTAP, is quite strong. Removal of WTAP in this system seems to be sufficient to reduce these interactions and the concomitant m6A methylation of ISGs. The conclusion that the phosphorylation state of WTAP is important in this process is also quite well supported.

      Weaknesses:

      The evidence that the above mechanism is fundamentally driven by different phase-separated pools of WTAP (regulated by its phosphorylation state) is weaker. These experiments rely relatively heavily on the treatment of cells with 1,6-hexanediol, which has been shown to have some off-target effects on phosphatases and kinases (PMID 33814344).

      Given that the model invoked in this study depends on the phosphorylation (or lack thereof) of WTAP, this is a particularly relevant concern.

      Related to this point, it is also interesting (and potentially concerning for the proposed model) that the initial region of WTAP that was predicted to be disordered is in fact not the region that the authors demonstrate is important for the different phase-separated states. Taking all the data together, it is also not clear to me that one has to invoke phase separation in the proposed mechanism.

      We are grateful for the Reviewer’s positive comments and constructive feedback. In this article, we propose a novel and important mechanism in which a dephosphorylation-driven solid-to-liquid phase transition of WTAP mediates co-transcriptional m6A modification. We first observed that WTAP underwent a phase transition during virus infection and IFN-β stimulation, and confirmed the driving force of this phase transition through multiple experiments. Besides 1,6‐hexanediol (1,6-hex) treatment, we also introduced S/T-to-D/A mutations to mimic phosphorylated and dephosphorylated WTAP in vitro and in cells, identifying the 5ST-D mutant as an SLPS mutant and the 5ST-A mutant as an LLPS mutant. We then performed 1,6-hex experiments to confirm the importance of phase separation for WTAP function, and found that the 5ST-D SLPS mutant and the 5ST-A LLPS mutant had different effects on the WTAP-promoter region interaction and on co-transcriptional m6A modification. Following the reviewer’s suggestion, we need to further clarify the role of phosphorylation in WTAP phase separation. We plan to repeat the experiments using the potent PP4 inhibitor fostriecin, and to perform further experiments exploring the effect of the WTAP IDR, which is reported to play a critical role in its phase separation.

      1,6-hex was initially regarded as an inhibitor of hydrophobic interactions, which are involved in many kinds of protein-protein interaction, so some off-target effects of 1,6-hex are inevitable. It has been reported that 1,6-hex impairs RNA pol II CTD-specific phosphatase and kinase activity at 5% concentration (3). However, 1,6-hex is still widely used in LLPS-associated functional studies despite its off-target effects. Relevant to this article, 10% 1,6-hex was reported to dissolve WTAP phase separation droplets (2). Besides WTAP, 1,6-hex (5%-10% w/v) has also been used to explore the phase separation characteristics and function of phosphorylated proteins or even kinases, including p‐tau441, TAZ, HSF1, and others (4-6). 10% 1,6-hex inhibited the crucial role of phosphorylation-driven HSF1 LLPS in chromatin binding and transcription, as shown by an RNA-seq dataset (6), indicating that the effect of 1,6-hex on kinases or phosphatases might not be global. To avoid 1,6-hex-mediated kinase/phosphatase impairment in this project, we introduced the WTAP SLPS and LLPS mutations in addition to 1,6-hex treatment to explore the m6A modification function of the WTAP phase transition. We plan to repeat the experiments with a lower 1,6-hex concentration, check the WTAP phosphorylation status after 1,6-hex treatment, and address these points in the Discussion.

      A considerable number of proteins undergo phase separation via interactions between intrinsically disordered regions (IDRs). IDRs are enriched in charged and polar amino acids, which provide multiple weakly interacting elements, and are depleted of hydrophobic amino acids, which gives them flexible conformations (7). In our article, we used the PLAAC website (http://plaac.wi.mit.edu/) to predict the IDR of WTAP, and a fragment (amino acids 234-249) was predicted to be a prion-like domain. However, deletion of this fragment failed to abolish the phase separation properties of WTAP, which may be the main source of confusion for the reviewers. To address this issue, we checked the WTAP structure (as part of the MTC) in the Protein Data Bank (https://www.rcsb.org/structure/7VF2) and found that the IDR prediction has changed with an updated algorithm. The predicted IDR of WTAP has expanded to amino acids 245-396, containing the whole CTD region. According to our results, loss of the CTD inhibited WTAP liquid-liquid phase separation both in vitro and in cells, while the phosphorylation status of the CTD had a dramatic impact on the WTAP phase transition, consistent with the LLPS-regulating function of an IDR. Therefore, we will revise our description of the WTAP IDR and perform further experiments to test its function.

      Taken together, given the strong association between WTAP phosphorylation, its phase separation status, and its function during IFN-β stimulation, we consider it necessary to include WTAP phase separation in our mechanism. We will perform further experiments to provide more convincing evidence and strengthen our study.

      Reviewer #2 (Public review):

      In this study, Cai and colleagues investigate how one component of the m6A methyltransferase complex, the WTAP protein, responds to IFNb stimulation. They find that viral infection or IFNb stimulation induces the transition of WTAP from aggregates to liquid droplets through dephosphorylation by PPP4. This process affects the m6A modification levels of ISG mRNAs and modulates their stability. In addition, the WTAP droplets interact with the transcription factor STAT1 to recruit the methyltransferase complex to ISG promoters and enhance m6A modification during transcription. The investigation dives into a previously unexplored area of how viral infection or IFNb stimulation affects m6A modification on ISGs. The observation that WTAP undergoes a phase transition is significant in our understanding of the mechanisms underlying m6A's function in immunity. However, there are still key gaps that should be addressed to fully accept the model presented.

      Major points:

      (1) More detailed analyses on the effects of WTAP sgRNA on the m6A modification of ISGs:

      a. A comprehensive summary of the ISGs, including the percentage of ISGs that are m6A-modified.

      b. The distribution of m6A modification across the ISGs.

      c. A comparison of the m6A modification distribution in ISGs with non-ISGs.

      In addition, since the authors propose a novel mechanism where the interaction between phosphorylated STAT1 and WTAP directs the MTC to the promoter regions of ISGs to facilitate co-transcriptional m6A modification, it is critical to analyze whether the m6A modification distribution holds true in the data.

      We appreciate the reviewer’s summary of our manuscript and the constructive assessment. We plan to perform the suggested analyses to characterize m6A modification of ISGs in our model. 

      (2) Since a key part of the model includes the cytosol-localized STAT1 protein undergoing phosphorylation to translocate to the nucleus to mediate gene expression, the authors should focus on the interaction between phosphorylated STAT1 and WTAP in Figure 4, rather than the unphosphorylated STAT1. Only phosphorylated STAT1 localizes to the nucleus, so the presence of pSTAT1 in the immunoprecipitate is critical for establishing a functional link between STAT1 activation and its interaction with WTAP.

      We plan to repeat the immunoprecipitation experiments to clarify the function of pSTAT1 in WTAP interaction and m6A modification as the reviewer suggested.

      (3) The authors should include pSTAT1 ChIP-seq and WTAP ChIP-seq on IFNb-treated samples in Figure 5 to allow for a comprehensive and unbiased genomic analysis for comparing the overlaps of peaks from both ChIP-seq datasets. These results should further support their hypothesis that WTAP interacts with pSTAT1 to enhance m6A modifications on ISGs.

      We first performed MeRIP-seq and RNA-seq and established the critical role of WTAP in ISG m6A modification and expression. Using immunoprecipitation and immunofluorescence experiments, we found that the phase transition of WTAP enhanced its interaction with pSTAT1. These results indicate that WTAP mediates ISG m6A modification and expression through its enhanced interaction with pSTAT1 during virus infection and IFN-β stimulation. However, we were still not sure how WTAP-mediated m6A modification relates to pSTAT1-mediated transcription. By analyzing METTL3 ChIP-seq or caPAR-CLIP-seq data, several studies have revealed the recruitment of the m6A methyltransferase complex (MTC) to the transcription start sites (TSS) of coding genes and to R-loop structures through interaction with the transcription factor STAT5B or the DNA helicase DDX21, indicating that the MTC mediates m6A modification of nascent transcripts at the very beginning of transcription (8-10). Thus, we proposed that phase-transitioned WTAP could be recruited to ISG promoter regions by pSTAT1, and verified this hypothesis by pSTAT1/WTAP ChIP-qPCR. We believe that ChIP-seq is a good way to explore the mechanism in depth, but the results currently in this article are sufficient to support our mechanism. We will continue to investigate the genome-wide chromatin distribution of WTAP and explore further functional effects of the transcription factor-dependent WTAP-promoter region interaction in t.

      Minor points:

      (1) Since IFNb is primarily known for modulating biological processes through gene transcription, it would be informative if the authors discussed the mechanism of how IFNb would induce the interaction between WTAP and PPP4.

      (2) The authors should include mCherry alone controls in Figure 1D to demonstrate that mCherry does not contribute to the phase separation of WTAP. Does mCherry have or lack a PLD?

      (3) The authors should clarify the immunoprecipitation assays in the methods. For example, the labeling in Figure 2A suggests that antibodies against WTAP and pan-p were used for two immunoprecipitations. Is that accurate?

      (4) The authors should include overall m6A modification levels quantified of GFPsgRNA and WTAPsgRNA cells, either by mass spectrometry (preferably) or dot blot.

      We thank the reviewer for these useful suggestions. We will perform the related experiments and revise the manuscript carefully as the reviewer suggested.

      Reviewer #3 (Public review):

      Summary:

      This study presents a valuable finding on the mechanism used by WTAP to modulate the IFN-β stimulation. It describes the phase transition of WTAP driven by IFN-β-induced dephosphorylation. The evidence supporting the claims of the authors is solid, although major analysis and controls would strengthen the impact of the findings. Additionally, more attention to the figure design and to the text would help the reader to understand the major findings.

      Strength:

      The key finding is the revelation that WTAP undergoes phase separation during virus infection or IFN-β treatment. The authors conducted a series of precise experiments to uncover the mechanism behind WTAP phase separation and identified the regulatory role of 5 phosphorylation sites. They also succeeded in pinpointing the phosphatase involved.

      Weaknesses:

      However, as the authors acknowledge, it is already widely known in the field that IFN and viral infection regulate m6A mRNAs and ISGs. Therefore, a more detailed discussion could help the reader interpret the obtained findings in light of previous research.

      It is well-known that protein concentration drives phase separation events. Similarly, previous studies and some of the figures presented by the authors show an increase in WTAP expression upon IFN treatment. The authors do not discuss the contribution of WTAP expression levels to the phase separation event observed upon IFN treatment. Similarly, METTL3 and METTL14, as well as other proteins of the MTC are upregulated upon IFN treatment. How does the MTC protein concentration contribute to the observed phase separation event?

      How is PP4 related to the IFN signaling cascade?

      In general, it is very confusing to talk about WTAP KO as WTAPgRNA.

      We are grateful for the reviewer’s positive comments and unbiased advice. To place our findings in the context of previous research, we will revise the manuscript carefully and provide a more detailed discussion of ISG m6A modification during virus infection or IFN stimulation. As previously reported, the WTAP protein level is induced by long-term IFN-β stimulation or LPS stimulation, and LPS-induced WTAP expression promotes its phase separation ability (2, 11). Although there was no significant upregulation of WTAP expression in our short-term treatment, we hypothesize that WTAP phase separation is promoted by the higher protein concentration after long-term IFN stimulation, enhancing m6A deposition on ISG mRNAs and revealing a feedback loop between WTAP phase separation and m6A modification during specific stimulation. To address the effect of MTC protein concentration in our proposed model, we will perform immunoblotting of MTC proteins and examine phase separation at different WTAP concentrations.

      Protein phosphatase 4 (PP4) is a multi-subunit Ser/Thr phosphatase complex that participates in diverse cellular pathways including DDR, cell cycle progression, and apoptosis (12). Protein phosphatase 4 catalytic subunit 4C (PPP4C) is one of the components of the PP4 complex. Previous research showed that knockout of PPP4C enhanced IFN-β downstream signaling and gene expression, which is consistent with our finding that knockdown of PPP4C impaired WTAP-mediated m6A modification and enhanced ISG expression. Since there was no significant increase in PPP4C expression during IFN-β stimulation in our results, we will consider exploring post-translational modifications, such as ubiquitination, that may influence the protein-protein interaction.

      In this project, all the WTAP-deficient THP-1 cells were bulk cells treated with WTAPsgRNA rather than monoclonal knockout cells. We confirmed that WTAP expression was efficiently knocked down in WTAPsgRNA THP-1 cells and that the m6A modification level was impaired, avoiding compensatory effects on m6A modification by other proteins. Thus, we prefer to call them WTAPsgRNA THP-1 cells rather than WTAP KO THP-1 cells.  

      References

      (1) Raja, R., Wu, C., Bassoy, E.Y., Rubino, T.E., Jr., Utagawa, E.C., Magtibay, P.M., Butler, K.A., and Curtis, M. (2022). PP4 inhibition sensitizes ovarian cancer to NK cell-mediated cytotoxicity via STAT1 activation and inflammatory signaling. J Immunother Cancer 10. 10.1136/jitc-2022-005026.

      (2) Ge, Y., Chen, R., Ling, T., Liu, B., Huang, J., Cheng, Y., Lin, Y., Chen, H., Xie, X., Xia, G., et al. (2024). Elevated WTAP promotes hyperinflammation by increasing m6A modification in inflammatory disease models. J Clin Invest 134. 10.1172/JCI177932.

      (3) Duster, R., Kaltheuner, I.H., Schmitz, M., and Geyer, M. (2021). 1,6-Hexanediol, commonly used to dissolve liquid-liquid phase separated condensates, directly impairs kinase and phosphatase activities. J Biol Chem 296, 100260. 10.1016/j.jbc.2021.100260.

      (4) Wegmann, S., Eftekharzadeh, B., Tepper, K., Zoltowska, K.M., Bennett, R.E., Dujardin, S., Laskowski, P.R., MacKenzie, D., Kamath, T., Commins, C., et al. (2018). Tau protein liquid-liquid phase separation can initiate tau aggregation. The EMBO journal 37. 10.15252/embj.201798049.

      (5) Lu, Y., Wu, T., Gutman, O., Lu, H., Zhou, Q., Henis, Y.I., and Luo, K. (2020). Phase separation of TAZ compartmentalizes the transcription machinery to promote gene expression. Nat Cell Biol 22, 453-464. 10.1038/s41556-020-0485-0.

      (6) Zhang, H., Shao, S., Zeng, Y., Wang, X., Qin, Y., Ren, Q., Xiang, S., Wang, Y., Xiao, J., and Sun, Y. (2022). Reversible phase separation of HSF1 is required for an acute transcriptional response during heat shock. Nat Cell Biol 24, 340-352. 10.1038/s41556-022-00846-7.

      (7) Hou, S., Hu, J., Yu, Z., Li, D., Liu, C., and Zhang, Y. (2024). Machine learning predictor PSPire screens for phase-separating proteins lacking intrinsically disordered regions. Nat Commun 15, 2147. 10.1038/s41467-024-46445-y.

      (8) Hao, J.D., Liu, Q.L., Liu, M.X., Yang, X., Wang, L.M., Su, S.Y., Xiao, W., Zhang, M.Q., Zhang, Y.C., Zhang, L., et al. (2024). DDX21 mediates co-transcriptional RNA m(6)A modification to promote transcription termination and genome stability. Mol Cell 84, 1711-1726 e1711. 10.1016/j.molcel.2024.03.006.

      (9) Barbieri, I., Tzelepis, K., Pandolfini, L., Shi, J., Millan-Zambrano, G., Robson, S.C., Aspris, D., Migliori, V., Bannister, A.J., Han, N., et al. (2017). Promoter-bound METTL3 maintains myeloid leukaemia by m(6)A-dependent translation control. Nature 552, 126-131. 10.1038/nature24678.

      (10) Bhattarai, P.Y., Kim, G., Lim, S.C., and Choi, H.S. (2024). METTL3-STAT5B interaction facilitates the co-transcriptional m(6)A modification of mRNA to promote breast tumorigenesis. Cancer Lett 603, 217215. 10.1016/j.canlet.2024.217215.

      (11) Ge, Y., Ling, T., Wang, Y., Jia, X., Xie, X., Chen, R., Chen, S., Yuan, S., and Xu, A. (2021). Degradation of WTAP blocks antiviral responses by reducing the m(6) A levels of IRF3 and IFNAR1 mRNA. EMBO Rep 22, e52101. 10.15252/embr.202052101.

      (12) Dong, M.Z., Ouyang, Y.C., Gao, S.C., Ma, X.S., Hou, Y., Schatten, H., Wang, Z.B., and Sun, Q.Y. (2022). PPP4C facilitates homologous recombination DNA repair by dephosphorylating PLK1 during early embryo development. Development 149. 10.1242/dev.200351.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Although this study provides a comprehensive outlook on the ETC function in various tissues, the main caveat is that it's too technical and descriptive. The authors didn't invest much effort in putting their findings in the context of the biological function of the tissue analyzed, i.e., some tissues might be more glycolytic than others and have low ETC activity.

      To better contextualize our results, we have added a substantial amount of new information to the Discussion Section.

      Also, it is unclear what slight changes in the activity of one or the other ETC complex mean in terms of mitochondrial ATP production.

      Unfortunately, the method we used can only determine oxygen consumption rate through complex I (CI), CII, or CIV. It cannot tell us about ATP production. This method only measures maximal uncoupled respiration.

      Likely, these small changes reported do not affect the mitochondrial respiration.

      We are indeed looking at mitochondrial respiration. Some changes are dramatic while others are much more modest. We are looking at the normal aging process across tissues (focusing on mitochondrial respiration) and not pathological states. As such, we expect many of the changes in mitochondrial respiration across tissues to be mild or relatively modest. After all, aging is slow and progressive. In fact, the variations we observed in mitochondrial respiration across tissues are consistent with the known heterogeneous rate of aging across tissues.

      With such a detailed dataset, the study falls short of deriving more functionally relevant conclusions about the heterogeneity of mitochondrial function in various tissues. In the current format, the readers get lost in the large amount of data presented in a technical manner.

      We agree that the paper contains a large amount of information. In the revised manuscript, we did our best to contextualize our results by substantially expanding the Discussion Section.

      Also, it is highly recommended that all the raw data and the values be made available as an Excel sheet (or other user-friendly formats) as a resource to the community.

      We included all the data in two Excel sheets (Figure 1 – data source 1; Figure 1 – data source 2). We presented them in such a way that it will be easy for other investigators to follow and reuse our dataset in their own studies for comparison.

      Major concerns

      (1) In this study, the authors used the method developed by Acin-Perez and colleagues (EMBO J, 2020) to analyze ETC complex activities in mitochondria derived from the snap-frozen tissue samples. However, the preservation of cellular/mitochondrial integrity in different types of tissues after being snap-frozen was not validated.

      All the samples are actually maximally preserved due to being snap frozen. Freezing the samples disrupts the mitochondria to produce membrane fragments. Subsequent thawing, mincing, and homogenization in a non-detergent based buffer (mannose-sucrose) ensures that all tissue samples are maximally disrupted into fragments which contain ETC units in various combinations. This allows the assay to give an accurate representation of maximal respiratory capacity given the ETC units present in a tissue sample.

      Since aging has been identified as the most important effector in this study, it is essential to validate how aging affects respiration in various fresh frozen tissues. Such analysis will ensure that the results presented are not due to the differential preservation of the mitochondrial respiration in the frozen tissue. In addition, such validations will further strengthen the conclusions and promote the broad usability of this "new" method.

      The reason we adopted this method is because it has been rigorously validated in the original publication (PMID: 32432379) and a subsequent methods paper (PMID: 33320426). The authors in the original paper benchmarked their frozen tissue method with freshly isolated mitochondria from the same set of tissues. Their work showed highly comparable mitochondrial respiration from frozen tissues and isolated mitochondria. For this reason, we did not repeat those validation studies.

      (2) In this study, the authors sampled the maximal activity of ETC complex I, II, and IV, but throughout the manuscript, they discussed the data in the context of mitochondrial function.

      We apologize that we did not make this clearer in our manuscript. We have corrected this in our revised manuscript (the Discussion Section). Our method measures respiration starting at Complex I (CI; via NADH), starting at CII (via succinate), or starting at CIV (using TMPD and ascorbate). Regardless of whether electrons (donated by the substrate) enter the respiratory chain through CI, CII, or CIV, oxygen (as the final electron acceptor) is only consumed at CIV. Therefore, the method measures mitochondrial respiration and function through CI, CII, or CIV. This high-resolution respirometry method differs from the classic enzymatic method of assessing CI, CII, or CIV activity individually; the enzymatic method does not actually measure oxygen consumption due to electrons flowing through the respiratory complexes.

      However, it is unclear how the changes in CI, CII, and CIV activity affect overall mitochondrial function (if at all) and how small changes seen in the maximal activity of one or more complexes affect the efficiency and efficacy of ATP production (OxPhos).

      Please see our response to the previous question. The method measures mitochondrial respiration through CI, CII, or CIV. Its limitation is that it measures maximal uncoupled respiration; namely, mitochondrial respiration is not coupled to ATP synthesis because the measurements are not performed on intact mitochondria. As such, we cannot say anything about the efficiency and efficacy of ATP production. Tissue-level variation in mitochondrial OXPHOS will be an interesting subject for future studies.

      The authors report huge variability between the activity of different complexes - in some tissues all three complexes (CI, CII, and CIV) and often in others, just one complex was affected. For example, as presented in Figure 4, there is no difference in CI activity in the hippocampus and cerebellum, but there is a slight change in CII and CIV activity. In contrast, in heart atria, there is a change in the activity of CI but not in CII and CIV. However, the authors still suggest that there is a significant difference in mitochondrial activity (e.g., "Old males showed a striking increase in mitochondrial activity via CI in the heart atria....reduced mitochondrial respiration in the brain cortex..." - Lines 5-7, Page 9). Until and unless a clear justification is provided, the authors should not make these broad claims on mitochondrial respiration based on small changes in the activity of one or more complexes (CI/CII/CIV). With such a data-heavy and descriptive study, it is confusing to track what is relevant and what is not for the functioning of mitochondria.

      We have attempted to address these issues in the revised Discussion section.

      (3) What do differences in the ETC complex CI, CII, and CIV activity in the same tissue mean? What role does the differential activity of these complexes (CI, CII, and CIV) play in mitochondrial function? What do changes in Oxphos mean for different tissues? Does that mean the tissue (cells involved) shift more towards glycolysis to derive their energy? In the best world, a few experiments related to the glycolytic state of the cells would have been ideal to solidify their finding further. The authors could have easily used ECAR measurements for some tissues to support their key conclusions.

      We have attempted to address these issues in the revised Discussion section. The frozen tissue method does not involve intact mitochondria. As such, the method cannot measure ECAR, which requires the presence of intact mitochondria.

      (4) The authors further analyzed parameters that significantly changed across their study (Figure 7, 98 data points analyzed). The main caveat of such analysis is that some tissue types would be represented three or even more times (due to changes in the activity of all three complexes - CI, CII, and CIV, and across different ages and sexes), and some just once. Such a method of analysis will skew the interpretation towards a few over-represented organ/tissue systems. Perhaps the authors should separately analyze tissue where all three complexes are affected from those with just one affected complex.

      Figure 7 summarizes the differences between male vs female, and between young vs old. All the tissue-by-tissue comparisons (data separated by CI-linked respiration, CII-linked respiration, and CIV-linked respiration) can be found in earlier figures (Figures 1-6).

      The focus of Figure 7 is to help us better appreciate all the changes seen in the preceding Figures 1-6:

      Panel A and B indicate all changes that are considered significant

      Panel C indicates total tissues with at least one significantly affected respiration

      Panel D indicates total magnitude of change (i.e., which tissue has the highest OCR) offering a non-relative view

      Panel E indicates whole body separations

      Panel F indicates whole body separations and age vs sex clustering

      (5) The current protocol does not provide cell-type-specific resolution and will be unable to identify the cellular source of mitochondrial respiration. This becomes important, especially for those organ systems with tremendous cellular heterogeneity, such as the brain. The authors should discuss whether the observed changes result from an altered mitochondria respiratory capacity or if changes in proportions of cell types in the different conditions studied (young vs. aged) might also contribute to differential mitochondrial respiration.

      We agree with the reviewer that this is a limitation of the method. We have addressed this issue in the revised Discussion section.

      (6) Another critical concern of this study is that the same datasets were repeatedly analyzed and reanalyzed throughout the study with almost the same conclusion - namely, aging affects mitochondrial function, and sex-specific differences are limited to very few organs. Although this study has considerable potential, the authors missed the chance to add new insights into the distinct characteristics of mitochondrial activity in various tissue and organ systems. The author should invest significant efforts in putting their data in the context of mitochondrial function.

      We have attempted to address these issues in the revised Discussion section.

      Reviewer #2 (Public Review):

      Summary:

      The authors utilize a new technique to measure mitochondrial respiration from frozen tissue extracts, which goes around the historical problem of purifying mitochondria prior to analysis, a process that requires a fair amount of time and cannot be easily scaled up.

      Strengths:

      A comprehensive analysis of mitochondrial respiration across tissues, sexes, and two different ages provides foundational knowledge needed in the field.

      Weaknesses:

      While many of the findings are mostly descriptive, this paper provides a large amount of data for the community and can be used as a reference for further studies. As the authors suggest, this is a new atlas of mitochondrial function in mouse. The inclusion of a middle aged time point and a slightly older young point (3-6 months) would be beneficial to the study.

      We agree with the reviewer that including additional time points (e.g., 3-6 months) would further strengthen the study. However, the cost, labor, and time associated with another set of samples (660 tissue samples from male and female mice and 1980 respirometry assays) are too high for our lab, which has a limited budget and manpower. Regrettably, we will not be able to carry out the extra work requested by the reviewer.

      Reviewer #3 (Public Review):

      The aim of the study was to map a) whether different tissues exhibit different metabolic profiles (this is known already), b) what differences are found between female and male mice, and c) how the profiles change with age. In particular, the study recorded the activity of respirasomes, i.e. the concerted activity of mitochondrial respiratory complex chains consisting of CI+CIII2+CIV, CII+CIII2+CIV or CIV alone.

      The strength is certainly the atlas of oxidative metabolism in the whole mouse body, the inclusion of the two different sexes and the comparison between young and old mice. The measurement was performed on frozen tissue, which is possible as already shown (Acin-Perez et al, EMBO J, 2020).

      Weakness:

      The assay reveals the maximum capacity of enzyme activity, which is an artificial situation and may differ from in vivo respiration, as the authors themselves discuss. The material used was a very crude preparation of cells containing mitochondria and other cytosolic compounds and organelles. Thus, the conditions are not well defined and the respiratory chain activity was certainly uncoupled from ATP synthesis. Preparation of more pure mitochondria and testing for coupling would allow evaluation of additional parameters: P/O ratios, feedback mechanism, basal respiration, and ATP-coupled respiration, which reflect in vivo conditions much better. The discussion is rather descriptive and cautious and could lead to some speculations about what could cause the differences in respiration and also what consequences these could have, or what certain changes imply.

      Nevertheless, this study is an important step towards this kind of analysis.

      We have attempted to address some of these issues in the revised Discussion Section. The frozen tissue method can only measure maximal uncoupled respiration. Because we are not measuring mitochondrial respiration using intact mitochondria, several of the functional parameters the reviewer alluded to (e.g., P/O ratios, feedback mechanism, basal respiration, and ATP-coupled respiration) simply cannot be obtained with the current set of samples. Nevertheless, we agree that all the additional data (if obtained) would be very informative.

      Reviewer #1 (Recommendations For The Authors):

      (1) For most of the comparative analysis, the authors normalized OCR/min to MitoTracker Deep RedFM (MTDR) fluorescence intensity. Why was the data normalized to the total protein content not used for comparative analysis? Is there a correlation between MTDR fluorescence and the protein content across different tissues?

      Given that we used the crude extract method, total protein content does not equal total mitochondrial protein content. This is why the MTDR method was used, as this represents a high throughput method of assessing mitochondrial mass in this volume of samples. In general, the total protein concentration is used to ensure the respiration intensity was approximately the same across all samples loaded into the Seahorse machine.
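
      The normalization described above can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and are not taken from the study; the sketch only assumes that each sample has one OCR reading and one MTDR fluorescence reading.

```python
def normalize_ocr(ocr_pmol_per_min: float, mtdr_fluorescence: float) -> float:
    """Return OCR per unit of MTDR signal (arbitrary units).

    MTDR fluorescence is used here as a proxy for mitochondrial mass,
    so the ratio approximates respiration per unit of mitochondria.
    """
    if mtdr_fluorescence <= 0:
        raise ValueError("MTDR fluorescence must be positive")
    return ocr_pmol_per_min / mtdr_fluorescence


# Hypothetical readings for two crude lysates (illustrative values only).
heart = normalize_ocr(ocr_pmol_per_min=400.0, mtdr_fluorescence=8000.0)
liver = normalize_ocr(ocr_pmol_per_min=150.0, mtdr_fluorescence=2000.0)

print(f"heart OCR/MTDR: {heart:.3f}")
print(f"liver OCR/MTDR: {liver:.3f}")
```

      In this made-up case the heart lysate has the higher raw OCR, but the liver lysate has the higher OCR per unit of MTDR signal, which is the kind of difference that normalizing to total protein in a crude extract could hide.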

      (2) To test the mitochondrial isolation yield, the authors should run immunoblot against canonical mitochondrial proteins in both homogenates and mitochondrial-containing supernatants and show that the protocol followed effectively enriched mitochondria in the supernatant fraction. This would also strengthen the notion that the "µg protein" value used to normalize the total mitochondrial content comes from isolated mitochondria and not other extra-mitochondrial proteins.

      Because we are using crude tissue lysate (from frozen tissue), the total µg protein content does not come from isolated mitochondria; for this reason, we normalized to MTDR rather than to total protein. Total mitochondrial protein content is subject to change depending on tissue for non-mitochondrial reasons. This method does not use isolated mitochondria; we only use tissue lysates enriched for mitochondrial proteins. This method has been rigorously validated in the original study (PMID: 32432379) and a subsequent methods paper (PMID: 33320426). In those studies, the authors performed the requisite quality checks the reviewer has asked for (e.g., immunoblot against canonical mitochondrial proteins in both homogenates and mitochondrial-containing supernatants to show effective enrichment of mitochondrial proteins). For this reason, we did not repeat them.

      (3) MitoTracker loads into mitochondria in a membrane potential-dependent manner. The authors should rule out the possibility that samples from different ages and sexes might have different mitochondrial membrane potentials and exhibit a differential MitoTracker loading capacity. This becomes relevant for data normalization based on MTDR (MTDR/µg protein) since it was assumed that loading capacity is the same for mitochondria across different tissue and age groups.

      MitoTracker Deep Red is not membrane potential dependent and can be effectively used to quantify mitochondrial mass even when mitochondrial membrane potential is lost. This is highlighted in the original study (PMID: 32432379).

      (4) Page 11, line 3 typo - across, not cross.

      Response: We have fixed the typo.

      Reviewer #2 (Recommendations For The Authors):

      If possible, I would include a middle aged time point between 12 and 14 months of age.

      We agree with the reviewer that including additional time points (e.g., 3-6 months) would further strengthen the study. However, the cost, labor, and time associated with another set of samples (660 tissue samples from male and female mice and 1980 respirometry assays) are too high for our lab, which has a limited budget and manpower. Regrettably, we will not be able to carry out the extra work requested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the work is well done and the data are well processed making them easy to understand. Some minor adjustments would improve the manuscript further:

      - Significance OCR in Figure 2, maybe add error bars?

      We have added the error bars and statistical significance to revised Figure 2.

      - Tissue comparison A-C, right panel: graphs are cropped

      We are not sure what the reviewer meant here. We have double checked all our revised figures to make sure nothing is accidentally cropped.

      - Heart ventricle: Old males and females have higher CI- and CII-dependent respiration than young males and females? Only CIV respiration is lower?

      Comparing old to young male or female heart ventricle respiration via CI or CII shows an increase in maximal capacity with age. CIV-linked respiration also trends upward when comparing old to young, although the difference is not significant. When comparing the respiration values among themselves within a mouse, i.e. old male CI- or CII-linked respiration compared to old male CIV-linked respiration, we can see that the old male CIV-linked respiration is very similar. When making the same comparison in the old female mouse, there appears to be something special about electrons entering through CI as compared to CII or CIV, as CI-linked respiration appears to be elevated compared to both CII and CIV. Although we do not know if this is significantly different, the trend in the data is clear. We do not know the exact reason why this occurred in the heart ventricles. To differing degrees, the relationship among CI-, CII-, and CIV-linked respiration appears generally similar in most skeletal muscles and in the old male heart atria. Again, the root of this discrepancy is unknown; it potentially indicates an interesting physiologic trait of certain types of muscle and merits further exploration.

      - What is plotted in Fig.3: The mean of all OCR of all tissues? A,B,C: Plot with break in x-axis to expand the violin, add mean/median values as numbers to the graph (same for Fig4)

      The leftmost side of Figure 3 A, B, and C shows the average OCR/MTDR value across all tissues in a group. Each tissue assayed is represented in the violin plot as an open circle.

      - Fig. 3D: add YM/YF to graph for better understanding, same in following figures

      This is in the scale bar next to all heat maps presented in the figures. We have also added these labels to the revised figure to improve clarity.

      - Additional figures: x-axis title (time) is missing in OCR graphs

      Time has been added to the x axis of all additional figures for clarity.

      - Also a more general question is: were the concentrations of substrates and inhibitors optimized before starting the series of experiments?

      All assay optimization was carried out in the original study (PMID: 32432379) and the subsequent methods paper (PMID: 33320426). Because we had to survey 33 different tissues, we tested and optimized the protein concentrations to load for each; the primary goal was to obtain enough respiration signal, without exceeding the Seahorse machine’s detection range, across all of the diverse tissue types analyzed. Through our optimization of the very high-respiring tissues, such as heart and kidney, we were also able to show that all substrates and inhibitors were at saturating concentrations: respiration went higher when more sample was added, and all signal could be abolished in these samples with the same amount of inhibitors.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 2:

      In addition, it is still unacceptable for me that the number of ovulated oocytes in mice at 6 months of age is only one third of that in young mice (10 vs 30; Fig. S1E). Most of the published literature shows that mice at 12 months of age still have ~10 ovulated oocytes.

      We disagree with the reviewer’s comment, and the concerns raised were not shared by the other reviewers.  We have reported our data with full transparency (each data point is plotted). In the current study, we observed an intermediate phenotype in gamete number (assessed by both ovarian follicle counts and ovulated eggs) when comparing 6 month old mice to 6 week or 10 month old mice; this is as expected. It is well accepted that follicle counts are highly mouse strain dependent.  Although the reviewer mentions that mice at 12 months have ~10 ovulated oocytes, no actual references are provided nor are the mouse strain or other relevant experimental details mentioned.  Therefore, we do not know how these quoted metrics relate to the female FVB mice used in our current study.   As clearly explained and justified in our manuscript, we used mice at 6 months and 10 months to represent a physiologic aging continuum. 

      Moreover, based on the follicle counting method used in the present study (Fig. S1D), there are no antral follicles observed in mice at 6 months and 10 months of age, which is not reasonable.

      This statement is incorrect. Antral follicles were present at 6 and 10 months of age, but due to the scale of the y-axis and the normalization of follicle number/area in Fig. S1D, the values are small.  The absolute number of antral follicles per ovary (counted in every 5th section) was 31.3 ± 3.8 follicles for 6-week old mice, 9.3 ± 2.3 follicles for 6-month old mice, and 5.3 ± 1.8 follicles for 10-month old mice.  Moreover, it is important to note that these ovaries were not collected in a specific stage of the estrous cycle, so the number of antral follicles may not be maximal.  In addition, as described in the Materials and Methods, antral follicles were only counted when the oocyte nucleus was present in a section to avoid double counting.  Therefore, this approach (which was applied consistently across samples) could potentially underestimate the total number.
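
      As a quick arithmetic check on the absolute counts quoted above, the following minimal sketch computes the decline in mean antral follicle number relative to the 6-week group. Only the three reported means are used; the variable names are illustrative, and the ± values are ignored.

```python
# Mean antral follicles per ovary (counted in every 5th section),
# as quoted in the response above.
antral_follicles = {"6 weeks": 31.3, "6 months": 9.3, "10 months": 5.3}

baseline = antral_follicles["6 weeks"]

# Percent decline of each group's mean relative to the 6-week mean.
percent_decline = {
    age: round(100 * (1 - count / baseline), 1)
    for age, count in antral_follicles.items()
}

for age, decline in percent_decline.items():
    print(f"{age}: {decline}% lower than 6 weeks")
```

      On these means, antral follicle numbers are roughly 70% lower at 6 months and roughly 83% lower at 10 months than at 6 weeks, so the counts are reduced but clearly nonzero at both later ages.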


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Bomba-Warczak describes a comprehensive evaluation of long-lived proteins in the ovary using transgenerational radioactive labelled 15N pulse-chase in mice. The transgenerational labeling of proteins (and nucleic acids) with 15N allowed the authors to identify regions enriched in long-lived macromolecules at the 6 and 10-month chase time points. The authors also identify the retained proteins in the ovary and oocyte using MS. Key findings include the relative enrichment in long-lived macromolecules in oocytes, pregranulosa cells, CL, stroma, and surprisingly OSE. Gene ontology analysis of these proteins revealed enrichment for nucleosome, myosin complex, mitochondria, and other matrix-type protein functions. Interestingly, compared to other post-mitotic tissues where such analyses have been previously performed such as the brain and heart, they find a higher fractional abundance of labeled proteins related to the mitochondria and myosin respectively.

      Response: We thank the reviewer for this thoughtful summary of our work.  We want to clarify that our pulse-chase strategy relied on a two-generation stable isotope-based metabolic labelling of mice using 15N from spirulina algae (for reference, please see (Fornasiero & Savas, 2023; Hark & Savas, 2021; Savas et al., 2012; Toyama et al., 2013)).  We did not utilize any radioactive isotopes.

      Strengths:

      A major strength of the study is the combined spatial analyses of LLPs using histological sections with MS analysis to identify retained proteins.

      Another major strength is the use of two chase time points allowing assessment of temporal changes in LLPs associated with aging.

      The major claims such as an enrichment of LLPs in pregranulosa cells, GCs of primary follicles, CL, stroma, and OSE are soundly supported by the analyses, and the caveat that nucleic acids might differentially contribute to this signal is well presented.

      The claims that nucleosomes, myosin complex, and mitochondrial proteins are enriched for LLPs are well supported by GO enrichment analysis and well described within the known body of evidence that these proteins are generally long-lived in other tissues.

      Weaknesses:

      Comment 1: One small potential weakness is the lack of a mechanistic explanation of if/why turnover may be accelerating at the 6-10 month interval compared to 1-6.

      Response 1: At the 6-month time point, we detected more long-lived proteins than at the 10-month time point in both the ovary and the oocyte.  We anticipated this because proteins are degraded over time, and substantially more time has elapsed at the later time point.  Moreover, at the 6–10-month time point, age-related tissue dysfunction is already evident in the ovary.  For example, in 6-9 month old mice, there is already a deterioration of chromosome cohesion in the egg which results in increased interkinetochore distances (Chiang et al., 2010), and by 10 months, there are multinucleated giant cells present in the ovarian stroma which is consistent with chronic inflammation (Briley et al., 2016).  Thus, the observed changes in protein dynamics may be another early feature of aging progression in the ovary.

      Comment 2: A mild weakness is the open-ended explanation of OSE label retention. This is a very interesting finding, and the claims in the paper are nuanced and perfectly reflect the current understanding of OSE repair. However, if the sections are available and one could look at the spatial distribution of OSE signal across the ovarian surface, it would be interesting to note if label retention varied by regions such as the CLs or hilum where more/less OSE division may be expected.

      Response 2: We agree that the enrichment of long-lived molecules in the OSE is interesting. To make interpretable conclusions about the dynamics of long-lived molecules in the OSE, we would need to generate a series of samples at precise stages of the estrous cycle or ideally across a timecourse of ovulation to capture follicular rupture and repair.  These samples do not currently exist and are beyond the scope of this study. However, this idea is an important future direction and it has been added to the discussion (lines 221-223). Furthermore, from a practical standpoint, MIMS imaging is resource and time intensive. Thus, we are not able to readily image entire ovarian sections.  Instead, we focused on structures within the ovary and took select images of follicles, stroma, and OSE.  We, therefore, do not have a comprehensive series of images of the OSE from the entire ovarian section for each mouse analyzed.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bomba-Warczak et al. applied multi-isotope imaging mass spectrometry (MIMS) analysis to identify the long-lived proteins in mouse ovaries during reproductive aging, and found some proteins related to cytoskeletal and mitochondrial dynamics persisting for 10 months.

      Response: We thank the reviewer for their summary and feedback.

      Strengths:

      The manuscript provides a useful dataset about protein turnover during ovarian aging in mice.

      Weaknesses:

      Comment 1: The study is pretty descriptive and short of further new findings based on the dataset. In addition, some results such as the numbers of follicles and ovulated oocytes in aged mice are not consistent with the published literature, and the method for follicle counting is not accurate. The conclusions are not fully supported by the presented evidence.

      Response 1: We agree with the reviewer that this study is descriptive. Our goal, as stated, was to use a discovery-based approach to define the long-lived proteome of the ovary and oocyte across a reproductive aging continuum.  As the prominent aging researcher, Dr. James Kirkland, stated: “although ‘descriptive’ is sometimes used as a pejorative term…descriptive or discovery research leading to hypothesis generation has become highly sophisticated and of great relevance to the aging field (Kirkland, 2013).”  We respectfully disagree with the reviewer that our study is short of new findings. In fact, this is the first time that a two-generation stable isotope-based metabolic labelling of mice in combination with two different state-of-the-art mass spectrometry methods has been used to identify and localize long-lived molecules in the ovary and oocyte along this particular reproductive aging continuum in an unbiased manner.  We have identified protein groups that were previously not known to be long-lived in the ovary and oocyte.  Our hope is that this long-lived proteome will become an important hypothesis-generating resource for the field of reproductive aging.

      The age-dependent decline in number of follicles and eggs ovulated in mice has been well established by our group as well as others (Duncan et al., 2017; Mara et al., 2020).  Thus, we are unclear about the reviewer’s comments that our results are not consistent with the published literature.  The absolute numbers of follicles and eggs ovulated as well as the rate of decline with age are highly strain dependent.  Moreover, mice can have a very small ovarian reserve and still maintain fertility (Kerr et al., 2012).  In our study, we saw a consistent age-dependent decrease in the ovarian reserve (Figure 1 – figure supplement 1 D), the number of oocytes collected from large antral follicles following hyperstimulation with PMSG (used for LC-MS/MS), and the number of eggs collected from the oviduct following hyperstimulation and superovulation with PMSG and hCG (Figure 1 – figure supplement 1 E and F).  In all cases, the decline was greater in 10 month old compared to 6 month old mice demonstrating a relative reproductive aging continuum even at these time points.

      Our research team has significant expertise in follicle classification and counting as evidenced by our publication record (Duncan et al., 2017; Kimler et al., 2018; Perrone et al., 2023; Quan et al., 2020).  We used our established methods, which we have further clarified in the manuscript text (lines 395-397).  Follicle counts were performed on every 5th tissue section of serially sectioned ovaries, and one ovary from each of three mice per timepoint was counted. Therefore, follicle counts were performed on an average of 48-62 total sections per ovary. The number of follicles was then normalized per total area (mm²) of the tissue section, and the counts were averaged. Figure 1 – figure supplement 1 C and D represent data averaged from all ovarian sections counted per mouse.   It is important to note that the same criteria were applied consistently to all ovaries across the study, and thus regardless of the technique used, the relative number of follicles or oocytes across ages can be compared.
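
      The counting scheme described above (sample every 5th section, normalize each count by section area, then average) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name, the choice to start sampling at the first section, and the example counts and areas are all hypothetical.

```python
from statistics import mean


def follicle_density(sections, step=5):
    """Average follicles per mm^2 over every `step`-th serial section.

    `sections` is an ordered list of (follicle_count, area_mm2) tuples,
    one per serial section. Sampling is assumed to start at the first
    section; sections with a non-positive area are skipped.
    """
    sampled = sections[::step]
    densities = [count / area for count, area in sampled if area > 0]
    if not densities:
        raise ValueError("no usable sections to average")
    return mean(densities)


# Hypothetical ovary with 10 serial sections; sections 1 and 6 are sampled.
sections = (
    [(4, 2.0)] + [(0, 1.0)] * 4 +   # section 1 counted, sections 2-5 skipped
    [(6, 2.0)] + [(0, 1.0)] * 4     # section 6 counted, sections 7-10 skipped
)

print(follicle_density(sections))
```

      Because each count is divided by the section area before averaging, the resulting values are densities rather than absolute follicle numbers, which is why they can look small on a plot even when follicles are present.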

      Reviewer #3 (Public Review):

      Summary:

      In this study, Bomba-Warczak et al focused on reproductive aging, and they presented a map for long-lived proteins that were stable during reproductive lifespan. The authors used MIMS to examine and show distinct molecules in different cell types in the ovary and tissue regions in a 6 month mice group, and they also used proteomic analysis to present different LLPs in ovaries between these two timepoints in 6-month and 10-month mice. The authors also examined the LLPs in oocytes in the 6-months mice group and indicated that these were nuclear, cytoskeleton, and mitochondria proteins.

      Response: We thank the reviewer for their summary and feedback.

      Strengths:

      Overall, this study provided basic information or a 'map' of the pattern of long-lived proteins during aging, which will contribute to the understanding of the defects caused by reproductive aging.

      Weaknesses:

      Comment 1: The 6-month mice were used as an aged model; no validation experiments were performed with proteomics analysis only.  

      Response 1:  We did not select the 6-month time point to be representative of the “aged model” but rather one of two timepoints on the reproductive aging continuum – 6 and 10 months.  In the manuscript (Figure 1 – figure supplement 1) we have demonstrated the relevance of the two timepoints by illustrating a decrease in follicle counts, number of fully grown oocytes collected, and number of eggs ovulated as well as a tendency towards increased stromal fibrosis (highlighted in the main text lines 78-85).  Inclusion of the 6-month timepoint ultimately turned out to be informative and essential as many long-lived proteins were absent by the 10 month timepoint. These results suggest that important shifts in the proteome occur during mid to advanced reproductive age.  The relevance of these timepoints is mentioned in the discussion (lines 247-270).

      Two independent mass spectrometry approaches (MIMS and LC-MS/MS) were used to validate the presence of long-lived macromolecules in the ovary and oocyte. Studies focused on the role of specific long-lived proteins in oocyte and ovarian biology as well as how they change with age in terms of function, turnover, and modification are beyond the scope of the current study but are ongoing.  We have acknowledged these important next steps in the manuscript text (lines 286-288, 311-312).

      It is important to note that oocytes are biomass-limited cells, and their numbers decrease with age.  Thus, we had to select ages where we could still collect enough oocytes from the mice available to perform LC-MS/MS.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: The writing and figures are beautiful - it would be hard to improve this manuscript.

      Response 1: We greatly appreciate this enthusiastic evaluation of our work.

      Comment 2: In Fig S1E/F it would help to list the N number here. Why are there 2 groups at 6-12 wk?

      Response 2:  We did not have 6 month and 10-month-old mice available at the same time to be able to run the hyperstimulation and superovulation experiment in parallel.  Therefore, we performed independent experiments comparing the number of eggs collected from either 6-month-old or 10 month old mice relative to 6-12 week old controls.  In each trial, eggs were collected from pooled oviducts from between 3-4 mice per age group, and the average total number of eggs per mouse was reported.  Each point on the graph corresponds to the data from an individual trial, and two trials were performed.  This has been clarified in the figure legend (lines 395-397).  Of note, while addressing this reviewer’s comments, we noticed that we were missing Materials and Methods regarding the collection of eggs from the oviduct following hyperstimulation and superovulation with PMSG and hCG.  This information has now been added in Methods Section, lines 477-481.

      Comment 3: The manuscript would benefit from an explanation of why the pups were kept on a 1-month N15 diet after birth, since the oocytes are already labeled before birth, and granulosa at most by day 3-4. Would ZP3 have not been identified otherwise?

      Response 3:   The pups used in this study were obtained from fully labeled female dams that were maintained on a 15N diet.  These pups had to be kept with their mothers through weaning.  To limit the pulse period only through birth, the pups would have had to be transferred to unlabeled foster mothers.  However, this would have risked pup loss, which would have significantly impacted our ability to conduct the studies given that we only had 19 labeled female pups from three breeding pairs.  We have clarified this in the manuscript text in lines 78-80.  It is hard to know, without doing the experiment, whether we would have detected ZP3 if we only labeled through birth.  The expression of ZP3 in primordial follicles, albeit in human, would suggest that this protein is expressed quite early in development.

      Comment 4: What is happening to the mitochondria at 6-10 months? Does their number change in the oocyte? Is there a change in the rate of fission? Any chance to take a stab at it with these or other age-matched slides?

      Response 4:  The reviewer raises an excellent point.  As mentioned previously in the Discussion (lines 290-301), there are well documented changes in mitochondrial structure and function in the oocyte in mice of advanced reproductive age.  However, there is a paucity of data on the changes that may happen at earlier mid-reproductive age time points.  From the oocyte mitochondrial proteome perspective, our data demonstrate a prominent decline in the persistence of long-lived proteins between 6 and 10 months, and this occurs in the absence of a change in the total pool of mitochondrial proteins (both long and short lived populations) as assessed by spectral counts or protein IDs (figure below).  These data, which we have added into Figure 3 – figure supplement 1 and in the manuscript text (lines 164-170) are suggestive of similar numbers of mitochondria at these two timepoints. It would be informative to do a detailed characterization of oocyte mitochondrial structure and function within this window to see if there is a correlation with this shift in long lived mitochondrial proteins.  Although this analysis is beyond the scope of the current manuscript, it is an important next line of inquiry which we have highlighted in the manuscript text (lines 255-257 and 311-312).

      Reviewer #2 (Recommendations For The Authors):

      Several concerns are raised as shown below.

      Comment 1: In Fig. 2F, it is surprising that ZP3 disappeared in the ovary from mice at the age of 10 months by MIMS analysis, because quite a few oocytes with intact zona pellucida can still be obtained from mice at this age. Notably, ZP would not be renewed once formed.

      Response 1: To clarify, Figure 2F shows LC-MS/MS data and not MIMS data.  As mentioned in the Discussion, the detection of long-lived pools of ZP3 at 6 months cannot be derived from newly synthesized zona pellucidae in growing follicles because they would not have been present during the pulse period.  The only way we could detect ZP3 at 6 months is if it forms a primitive zona scaffold in the primordial follicle or if ZPs from atretic follicles of the first couple of waves of folliculogenesis incorporate into the extracellular matrix of the ovary.  The lack of persistence of ZP3 at 10 months could be due to protein degradation. Should ZP3 indeed form a primitive zona, its loss at 10 months would be predicted to result in poor formation of a bona fide zona pellucida upon follicle growth.  Interestingly, aging has been associated with alterations in zona pellucida structure and function.   These data open novel hypotheses regarding the zona pellucida (e.g. a primitive zona scaffold and part of the extracellular matrix) and will require significant further investigation to test. These points are highlighted in the Discussion lines 227-245.

      Comment 2: To determine whether those proteins that can not be identified by MIMS at the time point of 10 months are degraded or renewed, the authors should randomly select some of them to examine their protein expression levels in the ovary by immunoblotting analysis.

      Response 2: To clarify, proteins were identified by LC-MS/MS and not MIMS, which was used to visualize long-lived macromolecules.   Each protein will be composed of old pools (15N containing) and newly synthesized pools (14N containing).  Degradation of the old pool of protein does not mean that there will be a loss of total protein.  Moreover, immunoblotting cannot distinguish old and newly synthesized pools of protein. Overall peptide counts are listed for each protein identified at both time points in the table provided with the manuscript.  As peptides derive from proteins, this table reflects what immunoblotting would, but on a larger and more precise scale.
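
      The distinction between the old pool and total protein can be made concrete with a minimal sketch. The function name and the peptide counts below are hypothetical and are not taken from the manuscript table; the actual quantification of 15N-containing peptides may differ from this simple ratio.

```python
def old_pool_fraction(n15_count: int, n14_count: int) -> float:
    """Fraction of a protein's signal that comes from the old (15N) pool."""
    total = n15_count + n14_count
    if total == 0:
        raise ValueError("protein was not detected")
    return n15_count / total


# Hypothetical peptide counts for one protein at the two chase time points.
six_months = {"n15": 20, "n14": 180}
ten_months = {"n15": 2, "n14": 198}

for label, counts in (("6 months", six_months), ("10 months", ten_months)):
    total = counts["n15"] + counts["n14"]
    fraction = old_pool_fraction(counts["n15"], counts["n14"])
    print(f"{label}: total = {total}, old-pool fraction = {fraction:.2f}")
```

      In this made-up example the total count is identical at both time points while the 15N fraction falls tenfold, which is the situation an immunoblot for total protein would not detect.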

      Comment 3: I think those proteins that can be identified by MIMS at the time point of 6 months but not 10 months deserve more analyses as they might be the key molecules that drive ovarian aging.

      Response 3:  This comment conflicts with comment 2 from Reviewer #3 (Recommendations For The Authors).  This underscores that different researchers will prioritize the value and follow up of such rich datasets differently.  We agree that the LLP identified at 6 months are of particular interest to reproductive aging, and we are planning to follow up on these in future studies.

      Comment 4:  Figure 1 – figure supplement 1 C-F, compared with the published literature, the numbers of follicles at different developmental stages and ovulated oocytes at both ages of 6 months and 10 months were dramatically low in this study. For 6-month-old female mice, the reproductive aging just begins, thus these numbers should not be expected to decrease too much. In addition, follicle counting was carried out only in an area of a single section, which is an inaccurate way, because the numbers and types of follicles in various sections differ greatly. Also, the data from a single section could not represent the changes in total follicle counts.

      Response 4: We have addressed these points in response to Comment 1 in the Reviewer #2 Public Review, and corresponding changes in the text have been noted.    

      Comment 5:  The study lacks follow-up verification experiments to validate their MIMS data.

      Response 5: Two independent mass spectrometry approaches (MIMS and LC-MS/MS) were used to validate the presence of long-lived macromolecules in the ovary and oocyte. Studies focused on the role of specific long-lived proteins in oocyte and ovarian biology as well as how they change with age in terms of function, turnover, and modification are beyond the scope of the current study but ongoing.  We have acknowledged these important next steps in the manuscript text (lines 286-288 and 311-312).

      Reviewer #3 (Recommendations For The Authors):

      Comment 1: The authors used the 6-month mice group to represent the aged model, and examined the LLPs from 1 month to 6 months. Indeed, 6-month-old mice start to show age-related changes; however, for the reproductive aging model, the most widely accepted model is that 10-month-old age mice start to show reproductive-related changes and 12-month-old mice (corresponding to 35-40 year-old women) exhibit the representative reproductive aging phenotypes. Therefore, the data may not present the typical situation of LLPs during reproductive aging.

      Response 1: As described in the response to Comment 1 in the Reviewer #3 Public Review, there were clear logistical and technical feasibility reasons why the 6 month and 10-month timepoints were selected for this study.  Importantly, however, these timepoints do represent a reproductive aging continuum as evidenced by age-related changes in multiple parameters.  Furthermore, there were ultimately very few LLPs that remained at 10 months in both the oocyte and ovary, so inclusion of the 6-month time point was an important intermediate.  Whether the LLPs at the 6-month timepoint serve as a protective mechanism in maintaining gamete quality or whether they contribute to decreased quality associated with reproductive aging is an intriguing dichotomy which will require further investigation.  This has been added to the discussion (lines 247-257).

      Comment 2:  Following the point above, the authors examined the ovaries in 6 months and 10 months mice by proteomics, and found that 6 months LLPs were not identical compared with 10 months, while there were Tubb5, Tubb4a/b, Tubb2a/b, Hist2h2 were both expressed at these two time points (Fig 2B), why the authors did not explore these proteins since they expressed from 1 month to 10 months, which are more interesting.

      Response 2: The objective of this study was to profile the long-lived proteome in the ovary and oocyte as a resource for the field rather than delving into specific LLPs at a mechanistic level. That being said, we wholeheartedly agree with the reviewer that the proteins that were identified at both 6 months and 10 months are the most robust and long-lived and are worth prioritizing for further study. Interestingly, Tubb5 and Tubb4a have high homology to primate-specific Tubb8, and Tubb8 mutations in women are associated with meiosis I arrest in oocytes and infertility (Dong et al., 2023; Feng et al., 2016). Thus, perturbation of these specific proteins by virtue of their long-lived nature may be associated with impaired function and poor reproductive outcomes. We have highlighted the importance of these LLPs, which are present at both timepoints and persist to at least 10 months, in the manuscript text (lines 259-270).

      Comment 3:  The authors also need to provide a hypothesis or explanation as to why LLDs from 6 months LLPs were not identical compared with 10 months.

      Response 3: We agree that LLPs identified at 10 months should also be identified as long-lived at 6 months. This is a common limitation of mass spectrometry-based proteomics, where each sample is prepared and run individually, which introduces variability between biological replicates, especially for low-abundance proteins. It is key to note that just because we do not identify a protein, it does not mean the protein is not there – it merely means that we were not able to detect it in this particular experiment, but low levels of the protein may still be there. To compensate for this known and inherent variability, we have applied stringent filtering criteria where we required long-lived peptides to be identified in an independent MS scan (the alternative is to identify a peptide in either the heavy or the light scan and use modeling to infer the FA value based on the m/z shift), which gave us peptides of highest confidence. Ideally, these experiments would be done using a TMT (tandem mass tag) approach. However, TMT-based experiments typically require a substantial amount of input (80-100 µg per sample), which unfortunately is not feasible with oocytes obtained from a limited number of pulse-chased animals. We have added this explanation to the discussion (lines 265-270).

      Comment 4:  The reviewer thinks that LLPs from 6 months to 10 months may more closely represent the long-lived proteins during reproductive aging.

      Response 4: We fully agree that understanding the identity of LLPs between the 6-month and 10-month timepoints will be quite informative, given that this is a dynamic period when many of the LLPs are degraded and thus might be key to the decline observed with reproductive aging. This is a very important point that we hope to explore in future follow-up studies.

      Comment 5: The authors used proteomics for the detection of ovaries and oocytes, however, there are no validation experiments at all. Since proteomics is mainly for screening and prediction, the authors should examine at least some typical proteins to confirm the validity of proteomics. For example, the authors specifically emphasized the finding of ZP3, a protein that is critical for fertilization.

      Response 5: Thank you, we agree that closer examination of proteins relevant and critical for fertilization is of importance. However, a detailed analysis of specific proteins fell outside the scope of this study, which aimed at unbiased identification of long-lived macromolecules in ovaries and oocytes. We hope to continue this important work in the near future.

      Comment 6: For the oocytes, the authors indicated that cytoskeleton, mitochondria-related proteins were the main LLPs, however, previous studies reported the changes of the expression of many cytoskeleton and mitochondria-related proteins during oocyte aging. How do the authors explain this contrary finding?   

      Response 6: Our findings are not contrary to the studies reporting changes in protein expression levels during oocyte aging – the two concepts are not mutually exclusive. The average FA value at the 6-month chase for oocyte proteins is 41.3%, which means that while 41.3% of the long-lived protein pool persisted for 6 months, the other 58.7% has in fact been renewed. With the exception of a few mitochondrial proteins (Cmkt2 and Apt5l) and myosins (Myl2 and Myh7), which had FA values close to 100% (no turnover), most of the LLPs had a portion of their protein pools that was indeed turned over. Moreover, we included new data analysis illustrating that we identify comparable numbers of mitochondrial proteins between the two time points, indicating that while the long-lived pools are changing over time, the total content remains stable (Figure 3 – figure supplement 1E-G).
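
      The arithmetic behind the persisted and renewed fractions can be sketched as follows. This is only an illustrative calculation, assuming that the FA value is read as the percentage of a protein pool that persisted through the chase, as described in the response above; the function name is hypothetical and is not part of the authors' analysis pipeline.

```python
def renewed_percent(fa_percent: float) -> float:
    """Return the percentage of a protein pool that was renewed,
    given the percentage that persisted through the chase (the FA value).

    Assumption: persisted + renewed = 100%, as stated in the response.
    """
    if not 0.0 <= fa_percent <= 100.0:
        raise ValueError("FA must be between 0 and 100 percent")
    return 100.0 - fa_percent


# Average FA for oocyte proteins at the 6-month chase, quoted in the response.
mean_fa_oocyte_6_month = 41.3
print(round(renewed_percent(mean_fa_oocyte_6_month), 1))  # 58.7
```

      An FA value close to 100% therefore corresponds to essentially no turnover, which is the case described for the few mitochondrial proteins and myosins mentioned above.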

      Comment 7:  The authors also should provide in-depth discussion about the findings of the current study for long-lived proteins. In this study, the authors reported the relationship between these "long-lived" proteins with aging, a process with multiple "changes". Do long-lived proteins (which are related to the cytoskeleton and mitochondria) contribute to the aging defects of reproduction? or protect against aging?

      Response 7: This is a very important comment and one that needs further exploration. The fact is – we do not know at this moment whether these proteins are protective or deleterious, and such a statement would be speculative at this stage of research into LLPs in ovaries and oocytes. Future work is needed to address this question in detail.

      Briley, S. M., Jasti, S., McCracken, J. M., Hornick, J. E., Fegley, B., Pritchard, M. T., & Duncan, F. E. (2016). Reproductive age-associated fibrosis in the stroma of the mammalian ovary. Reproduction, 152(3), 245-260. https://doi.org/10.1530/REP-16-0129

      Chiang, T., Duncan, F. E., Schindler, K., Schultz, R. M., & Lampson, M. A. (2010). Evidence that Weakened Centromere Cohesion Is a Leading Cause of Age-Related Aneuploidy in Oocytes. Current Biology, 20(17), 1522-1528. https://doi.org/10.1016/j.cub.2010.06.069

      Dong, J., Jin, L., Bao, S., Chen, B., Zeng, Y., Luo, Y., Du, X., Sang, Q., Wu, T., & Wang, L. (2023). Ectopic expression of human TUBB8 leads to increased aneuploidy in mouse oocytes. Cell Discov, 9(1), 105. https://doi.org/10.1038/s41421-023-00599-z

      Duncan, F. E., Jasti, S., Paulson, A., Kelsh, J. M., Fegley, B., & Gerton, J. L. (2017). Age-associated dysregulation of protein metabolism in the mammalian oocyte. Aging Cell, 16(6), 1381-1393. https://doi.org/10.1111/acel.12676

      Feng, R., Sang, Q., Kuang, Y., Sun, X., Yan, Z., Zhang, S., Shi, J., Tian, G., Luchniak, A., Fukuda, Y., Li, B., Yu, M., Chen, J., Xu, Y., Guo, L., Qu, R., Wang, X., Sun, Z., Liu, M., . . . Wang, L. (2016). Mutations in TUBB8 and Human Oocyte Meiotic Arrest. N Engl J Med, 374(3), 223-232. https://doi.org/10.1056/NEJMoa1510791

      Fornasiero, E. F., & Savas, J. N. (2023). Determining and interpreting protein lifetimes in mammalian tissues. Trends Biochem Sci, 48(2), 106-118. https://doi.org/10.1016/j.tibs.2022.08.011

      Hark, T. J., & Savas, J. N. (2021). Using stable isotope labeling to advance our understanding of Alzheimer's disease etiology and pathology. J Neurochem, 159(2), 318-329. https://doi.org/10.1111/jnc.15298

      Kerr, J. B., Hutt, K. J., Michalak, E. M., Cook, M., Vandenberg, C. J., Liew, S. H., Bouillet, P., Mills, A., Scott, C. L., Findlay, J. K., & Strasser, A. (2012). DNA damage-induced primordial follicle oocyte apoptosis and loss of fertility require TAp63-mediated induction of Puma and Noxa. Mol Cell, 48(3), 343-352. https://doi.org/10.1016/j.molcel.2012.08.017

      Kimler, B. F., Briley, S. M., Johnson, B. W., Armstrong, A. G., Jasti, S., & Duncan, F. E. (2018). Radiation-induced ovarian follicle loss occurs without overt stromal changes. Reproduction, 155(6), 553-562. https://doi.org/10.1530/REP-18-0089

      Kirkland, J. L. (2013). Translating advances from the basic biology of aging into clinical application. Exp Gerontol, 48(1), 1-5. https://doi.org/10.1016/j.exger.2012.11.014

      Mara, J. N., Zhou, L. T., Larmore, M., Johnson, B., Ayiku, R., Amargant, F., Pritchard, M. T., & Duncan, F. E. (2020). Ovulation and ovarian wound healing are impaired with advanced reproductive age. Aging (Albany NY), 12(10), 9686-9713. https://doi.org/10.18632/aging.103237

      Perrone, R., Ashok Kumaar, P. V., Haky, L., Hahn, C., Riley, R., Balough, J., Zaza, G., Soygur, B., Hung, K., Prado, L., Kasler, H. G., Tiwari, R., Matsui, H., Hormazabal, G. V., Heckenbach, I., Scheibye-Knudsen, M., Duncan, F. E., & Verdin, E. (2023). CD38 regulates ovarian function and fecundity via NAD(+) metabolism. iScience, 26(10), 107949. https://doi.org/10.1016/j.isci.2023.107949

      Quan, N., Harris, L. R., Halder, R., Trinidad, C. V., Johnson, B. W., Horton, S., Kimler, B. F., Pritchard, M. T., & Duncan, F. E. (2020). Differential sensitivity of inbred mouse strains to ovarian damage in response to low-dose total body irradiation. Biol Reprod, 102(1), 133-144. https://doi.org/10.1093/biolre/ioz164

      Savas, J. N., Toyama, B. H., Xu, T., Yates, J. R., 3rd, & Hetzer, M. W. (2012). Extremely long-lived nuclear pore proteins in the rat brain. Science, 335(6071), 942. https://doi.org/10.1126/science.1217421

      Toyama, B. H., Savas, J. N., Park, S. K., Harris, M. S., Ingolia, N. T., Yates, J. R., 3rd, & Hetzer, M. W. (2013). Identification of long-lived proteins reveals exceptional stability of essential cellular structures. Cell, 154(5), 971-982. https://doi.org/10.1016/j.cell.2013.07.037

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This is a fine paper that serves the purpose to show that the use of light sheet imaging may be used to provide whole brain imaging of axonal projections. The data provided suggest that at this point the technique provides lower resolution than with other techniques. Nonetheless, the technique does provide useful, if not novel, information about particular brain systems. 

      Strengths: 

      The manuscript is well written. In the introduction, a clear description of the functional organization of the barrel cortex provides the context for using specific Cre-driver lines to map the projections of the main cortical projection types with whole-brain neuroanatomical tracing techniques. The results are also well written, with sufficient detail describing the specifics of how techniques were used to obtain relevant data. Appropriate controls were done, including the identification of whisker fields for viral injections and determination of the laminar pattern of Cre expression. The mapping of the data provides a good way to visualize low-resolution patterns of projections. 

      Weaknesses: 

      (1) The results provided are, as stated in the discussion, "largely in agreement with previously reported studies of the major projection targets". However, it must be stated that the study does not "extend current knowledge through the high sensitivity for detecting sparse axons, the high specificity of labeling of genetically defined classes of neurons and the brain wide analysis for assigning axons to detailed brain regions", all of which have been published in numerous other studies (the Allen connectivity project and related papers, along with others). If anything, the labeling of axons obtained with light sheet imaging in this study does not provide as detailed a mapping as that obtained with other techniques. Some detail is provided of how the raw images are processed to resolve labeled axons, but the images shown in the figures do not demonstrate how well individual axons may be resolved. Of particular interest would be to see labeling in terminal areas such as other cortical areas, striatum and thalamus. As presented, the light sheet imaging appears to be rather low resolution compared to the many studies that have used viral tracing to look at cortical projections from genetically identified cortical neurons. 

      We agree with the reviewer that the resolution of imaging should be further improved in future studies, as also mentioned in the original manuscript. On P. 17 of the revised manuscript we write “Probably most important for future studies is the need to increase the light-sheet imaging resolution perhaps combined with the use of expansion microscopy to provide brain-wide micron-resolution data (Glaser et al., 2023; Wassie et al., 2019).” However, even at somewhat lower resolution, through bright sparse labelling, individual axonal segments can nonetheless be traced through machine learning to define axonal skeletons, whose length can be quantified as we do in this study. This methodology highlights sparse wS1 and wS2 innervation of a large number of brain areas, some of which are not typically considered, and our anatomical results might therefore help the neuronal circuit analysis underlying various aspects of whisker sensorimotor processing. Despite impressive large-scale projection mapping projects such as the Allen connectivity atlas, there remain relatively sparse cell type-specific projection map data for the representations of the large posterior whiskers in wS1 and wS2, and our data in this study thus add to a growing body of cell type-specific projection mapping with a specific focus on the output connectivity of these whisker-related neocortical regions of sensory cortex.

      In the revised manuscript, we now provide an additional supplementary figure (Figure 1 – figure supplement 2) showing examples of the axonal segmentation from further additional image planes including branching axons in the key innervation regions mentioned by the reviewer, namely “other cortical areas, striatum and thalamus”.

      (2) Amongst the limitations of this study is the inability to resolve axons of passage and terminal fields. This has been done in other studies with viral constructs labeling synaptophysin. This should be mentioned. 

      The reviewer brings up another important point for future methodological improvements to enhance connectivity mapping. Indeed, we already mentioned this in our original submission near the end of the first paragraph under the Limitations and future perspectives section. In the revised manuscript on P. 17, we write “Future studies should also aim to identify neurotransmitter release sites along the axon, which could be achieved by fluorescent labeling of prominent synaptic components, such as synaptophysin-GFP (Li et al., 2010).”

      (3) There is no quantitative analysis of differences between the genetically defined neurons projecting to the striatum, what is the relative area innervated by, density of terminals, other measures. 

      The reviewer raises an interesting question, and in the revised manuscript, we now present a more detailed analysis of cell class-specific axonal projections focusing specifically on the striatum. Following the reviewer’s suggestion, in a new supplementary figure (Figure 7 – figure supplement 1), we now report spatial axonal density maps in the striatum from SSp-bfd and SSs, finding potentially interesting differences comparing the projections of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons. On P. 12 of the revised manuscript, we now write “We also investigated the spatial innervation pattern of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons in the striatum (Figure 7 – figure supplement 1), where we found that axonal density from Rasgrf2-L2/3 neurons in both SSp-bfd and SSs was concentrated in a posterior dorsolateral part of the ipsilateral striatum, whereas Tlx3-L5IT neurons had extensive axonal density across a much larger region of the striatum, including bilateral innervation by SSp-bfd neurons. Striatal innervation by Scnn1a-L4 neurons was intermediate between Rasgrf2-L2/3 and Tlx3-L5IT neurons.” We think the reviewer’s comment has helped reveal further interesting aspects of our data set, and we thank the reviewer.

      (4) Figure 5 is an example of the type of large sets of data that can be generated with whole brain mapping and registration to the Allen CCF that provides information of questionable value. Ordering the 50 plus structures by the density of labeling does not provide much in terms of relative input to different types of areas. There are multiple subregions for different functional types (i.e., different visual areas and different motor subregions are scattered, not grouped together), which makes it difficult to understand any organizing principles.

      We agree with the reviewer, and fully support the importance of considering subregions within the relatively coarse compartmentalization of the current Allen CCF. In order to provide some further information about connectivity that may help give the reader further insights into the data, we have now added further quantification of cortex-specific axonal density ranked according to functional subregions in a new supplementary figure (Figure 5 – figure supplement 2). 

      (5) The GENSAT Cre driver lines used must have the specific line name used, not just the gene name, as the GENSAT BAC-Cre lines had multiple lines for each gene, often with very different expression patterns: Rbp4_KL100, Tlx3_PL56, Sim1_KJ18, Ntsr1_GN220. 

      In the revised manuscript, we now write out a fuller description of the mouse lines the first time they are mentioned in the Results section on P. 7. The full mouse line names, accession numbers and references were of course already described in the methods section, which remains the case in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      This study takes advantage of multiple methodological advances to perform layer-specific staining of cortical neurons and tracking of their axons to identify the pattern of their projections. This publication offers a mesoscale view of the projection patterns of neurons in the whisker primary and secondary somatosensory cortex. The authors report that, consistent with the literature, the pattern of projection is highly different across cortical layers and subtype, with targets being located around the whole brain. This was tested across 6 different mouse types that expressed a marker in layer 2/3, layer 4, layer 5 (3 sub-types) and layer 6.  Looking more closely at the projections from primary somatosensory cortex into the primary motor cortex, they found that there was a significant spatial clustering of projections from topographically separated neurons across the primary somatosensory cortex. This was true for neurons with cell bodies located across all tested layers/types. 

      Strengths: 

      This study successfully looks at the relevant scale to study projection patterns, which is the whole brain. This is achieved thanks to an ambitious combination of mouse lines, immunohistochemistry, imaging and image processing, which results in a standardized histological pipeline that processes the whole-brain projection patterns of layer-selected neurons of the primary and secondary somatosensory cortex. 

      This standardization means that comparisons between cell-types projection patterns are possible and that both the large-scale structure of the pattern and the minute details of the intra-areas pattern are available. 

      This reference dataset and the corresponding analysis code are made available to the research community. 

      Weaknesses: 

      One major question raised by this dataset is the risk of missing axons during the postprocessing step. Indeed, it appears that the control and training efforts have focused on the risk of false positives (see Figure 1 supplementary panels). And indeed, the risk of overlooking existing axons in the raw fluorescence data is discussed in the article. 

      Based on the data reported in the article, this is more than a risk. In particular, Figure 2 shows an example Rbp4-L5 mouse where axonal spread seems massive in Hippocampus, while there is no mention of this area in the processed projection data for this mouse line. 

      In Figure 2, we show the expression of tdTomato in double-transgenic mice in which the Cre-driver lines were crossed with a Cre-dependent reporter mouse expressing cytosolic tdTomato. In addition to the specific labelling of L5PT neurons in the somatosensory cortex, Rbp4-Cre mice also express Cre-recombinase in other brain regions including the hippocampus. In the reporter mice crossed with Rbp4-Cre mice, tdTomato is expressed in neurons with cell bodies in the hippocampus which is clearly visualized in Figure 2. Because our axonal labelling is based on localized viral vector expression of tdTomato in SSp-bfd and SSs, the expression of Cre in hippocampus does not affect our analysis. In order to clarify to the reader, in the legend to Figure 2D, we now specifically write “As for panel A, but for Rbp4-L5 neurons. Note strong expression of Cre in neurons with cell bodies located in the hippocampus, which does not affect our analysis of axonal density based on virus injected locally into the neocortex.” Consistent with this observation, the Allen Institute’s ISH data support expression of Rbp4 in neurons of the hippocampus, e.g. https://mouse.brainmap.org/gene/show/19425 and https://mouse.brainmap.org/experiment/show/68632655.

      Similarly, the Ntsr1-L6CT example shows a striking level of fluorescence in Striatum that is not reflected in the number of axons detected by the algorithms in the next figures. These apparent discrepancies may be due to non-axon-specific fluorescence in the samples. In any case, further analysis of such anatomical areas would be useful to consolidate the valuable dataset provided by the article. 

      As pointed out above, Figure 2 shows cytosolic tdTomato fluorescence in transgenic crosses of the Cre-driver mice with Cre-dependent tdTomato reporter mice. For the Ntsr1-Cre x LSL-tdTomato mice, all corticothalamic L6CT neurons from across the entire cortex drive tdTomato expression. The axon of each neuron must traverse the striatum giving rise to fluorescence in the striatum. As discussed above, labelling of synaptic specialisations will be important in future studies to separate travelling axon from innervating axon. However, the overall impact of the axons traversing the striatum is again mitigated in our study by considering the axonal projections from local sparse infections in SSp-bfd and SSs rather than from cortex-wide tdTomato expression.

      Reviewer #3 (Public Review): 

      Summary: 

      The paper offers a systematic and rigorous description of the layer- and sublayer-specific outputs of the somatosensory cortex using a modern toolbox for the analysis of brain connectivity which combines: 1) layer-specific genetic drivers for conditional viral tracing; 2) whole brain analyses of axon tracts using tissue clearing and imaging; 3) segmentation and quantification of axons with normalization to the number of transduced neurons; 4) registration of connectivity to a widely used anatomical reference atlas; 5) functional validation of the connectivity using optogenetic approaches in vivo. 

      Strengths: 

      Although the connectivity of the somatosensory cortex is already known, precise data are dispersed in different accounts (papers, online resources,) using different methods. So the present account has the merit of condensing this information in one very precisely documented report. It also brings new insights on the connectivity, such as the precise comparison of layer specific outputs, and of the primary and secondary somatosensory areas. It also shows a topographic organization of the circuits linking the somatosensory and motor cortices. The paper also offers a clear description of the methodology and of a rigorous approach to quantitative anatomy. 

      Weaknesses: 

      The weakness relates to the intrinsic limitations of the in toto approaches, that currently lack the precision and resolution allowing to identify single axons, axon branching or synaptic connectivity. These limitations are identified and discussed by the authors. 

      We agree with the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      No additional comment 

      OK

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 8, we don't get to see much raw data, while the diversity of functional responses pattern to the primary and supplementary S1 activations is highly intriguing (and this diversity exists as suggested by the results in Figure 8E, LRPT). 

      Can Figure 8C be less blurred? Maybe give more space to individual examples, such as an overlay of the delineations of the activated area across the tested mice? 

      Also, can we have a view on the time dynamics of the functional activation and integration window? 

      Raw data - We have now added a new supplementary figure (Figure 8 – figure supplement 1) to show data from individual mice, as well as plotting the time-course of the evoked jRGECO fluorescence signals in the frontal cortex hotspot. 

      Image blur - Each pixel represents 62.5 × 62.5 µm on the cortical surface. The images in Figure 8B&C were averaged across mice, which causes some additional spatial blurring. However, the most likely explanation for the ‘blurred’ impression is the overall large horizontal extent of the axonal innervation as well as the likely rapid lateral spread of excitation both at the stimulation area and in the target region, as also indicated, for example, in rapid voltage-sensitive imaging experiments (Ferezou et al., 2007).  
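
      To give a sense of the spatial scale implied by this pixel size, the conversion from pixels to distance on the cortical surface can be sketched as below. Only the 62.5 µm pixel edge comes from the response above; the function names and the 16-pixel example are illustrative assumptions.

```python
# Edge length of one image pixel on the cortical surface, as stated in the response.
PIXEL_SIZE_UM = 62.5


def pixels_to_mm(n_pixels: float) -> float:
    """Convert a distance measured in pixels to millimetres."""
    return n_pixels * PIXEL_SIZE_UM / 1000.0


def pixel_area_um2() -> float:
    """Area of cortical surface covered by a single square pixel, in square micrometres."""
    return PIXEL_SIZE_UM ** 2


print(pixels_to_mm(16))   # 1.0  (16 pixels span 1 mm)
print(pixel_area_um2())   # 3906.25
```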

      Reviewer #3 (Recommendations For The Authors): 

      At present, the abstract is really centred on the methodology, which is no longer very novel as it has already been described previously by other groups. In my view the paper would gain visibility, and be a useful tool for the community, if amended to better point out the significant results of the study, for instance, i) the layer and sub-layer specificity of the outputs, using the listed genetic drivers; ii) the comparison of primary and secondary somatosensory areas; iii) the functional validation. The layer specificity of each Cre-line should be indicated in the abstract. 

      We have tried to improve the writing of the abstract along the lines suggested by the reviewer. Specifically, we have now added layer and projection class of the various Cre-lines, and we now also highlight the most obvious differences in the innervation patterns.

      There is some degree of redundancy in the description in the result section. One suggestion, for an easier flow of reading, would be to join the paragraphs " Laminar characterization of the Cre-lines.." and: "Axonal projections...". Start for each Cre-line with a description of the laminar localisation of recombination in the somatosensory cortices, followed therefrom by the description of outputs from SSp-bfd and SSs; Then the general description/overview of the outputs can be summarized as a legend to Figure 5-supplementary 2, which could appear as a main figure. 

      Although we agree with the reviewer that there is some level of redundancy in the text, the results of the characterization of the Cre-line (Figure 2) is quite a different experiment compared to the viral injections described in other figures, and we therefore prefer to keep these sections separate.

      Other minor points: 

      In the text; Indicate the genetic background of the transgenic mouse lines. 

      On P. 18, we now indicate that all mice were “back-crossed with C57BL/6 mice”.

      Keep consistency in the designation of the areas, S1 appears sometimes as SSp-bfd or as SSp 

      We thank the reviewer for pointing out the inconsistent nomenclature, which we have now corrected in the revised manuscript. ‘SSp’ remains used on P. 9 and P. 16 of the revised manuscript to indicate a region including SSp-bfd but also extending beyond.

      Figure 1 supplement 2 is not really necessary to show (as the viral tools have previously been validated) can just be stated in the text. Conversely one would like to see a higher resolution image of the injection sites that allowed to do the cell counts used for normalization, as this can be pretty tricky. 

      In response to the reviewer’s suggestion, we have now added a new supplemental figure to show an example of how cells in the injection site were counted (Figure 1 – figure supplement 3).

      Figure 2: the most important here is the higher magnification to show the precise laminar localisation of the recombination, rather than the atlas landmarks that is already shown in Figure 1. This would allow more space for clearer higher magnification panels comparing SSs and SSp. The present image hints to some real differences, but difficult to appreciate with the current resolution. The legend should also comment on the labelling seen in layer 1, in the Tlx2 and Rbp4 lines. Could be dendritic labelling, but this needs a word of clarification.

      We think both the overview images and the high-resolution images are of value to the reader. Following the reviewer’s comment, in the legends to Figure 2C&D, we have now added text suggesting that the layer 1 fluorescence is likely axonal or dendritic in origin: “Labelling in layer 1 is likely of axonal or dendritic origin, and no cell bodies were labelled in this layer.” In addition, we have added a new supplemental figure which shows the cortical labelling in SSp and SSs in a more magnified view (Figure 2 – figure supplement 1).

      Figure 3: the comparison of the 3 transgenic lines labelling layer 5 and showing sublaminar identities is really interesting in showing the heterogeneity of this layer and possible regional differences. However, the cases shown for illustration for Rbp4 and Tlx3 seem pretty massive in comparison with the other drivers. Maybe cases with smaller injections could be chosen for illustration. 

      Figure 3 shows grand average axonal density maps across different mice normalized to the number of neurons in the injection site. The large amount of axon per neuron observed in Rbp4 and Tlx3 mice therefore shows their long, wide-ranging axons compared to other neuronal classes.

      Figure 6A could be a supplementary figure in my view; 6B is clearer. 

      We think both representations are useful, and we think different readers might better appreciate either of the two analyses.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work is potentially useful because it has generated a mineable yield of new candidate immune inhibitory receptors, which can serve both as drug targets and as subjects for further biological investigation. It is noted however that the argument of the work is rather incomplete, in that it does very little to validate the putative new receptors, and merely makes a study of their putative distribution across cell types. Experimental follow-up to demonstrate the claimed properties for the proteins identified, or mining existing experimental data sources on gene expression across tissues to at least show that the pipeline correctly identified genes likely to be specific to immune cells (or something along these lines), would make this work more complete and compelling. 

      We thank the editors for their critical reading and assessment of our manuscript. We acknowledge that the present study is limited by a lack of experimental follow-up. However, we purposely chose to make this pipeline of putative novel inhibitory receptors public at this early stage for our work to be a starting point for further functional investigation of these targets by the scientific community.   

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript proposes a new bioinformatics approach identifying several hundreds of previously unknown inhibitory immunoreceptors. When expressed in immune cells (such as neutrophils, monocytes, CD8+, CD4+, and T-cells), such receptors inhibit the functional activity of these cells. Blocking inhibitory receptors represents a promising therapeutic strategy for cancer treatment.

      As such, this is a high-quality and important bioinformatics study. One general concern is the absence of direct experimental validation of the results. In addition to the fact that the authors bioinformatically identified 51 known receptors, providing such experimental evaluation (of at least one, or better few identified receptors) would, in my opinion, significantly strengthen the presented evidence.

      I will now briefly summarize the results and give my comments.

      First, using sequence comparison analysis, the authors identify a large set of putative receptors based on the presence of immunoreceptor tyrosine-based inhibitory motifs (ITIMs), or immunoreceptor tyrosine-based switch motifs (ITSMs). They further filter the identified set of receptors for the presence of the ITIMs or ITSMs in an intracellular domain of the protein. Second, using AlphaFold structure modeling, the authors select only receptors containing ITIMs/ITSMs in structurally disordered regions. Third, the evaluation of gene expression profiles of known and putative receptors in several immune cell types was performed. Fourth, the authors classified putative receptors into functional categories, such as negative feedback receptors, threshold receptors, threshold disinhibition, and threshold-negative feedback. The latter classification was based on the available data from Nat Rev Immunol 2020. Fifth, using publicly available single-cell RNA sequencing data of tumor-infiltrating CD4+ and CD8+ cells from nearly twenty types of cancer, the authors demonstrate that a significant fraction of putative receptors are indeed expressed in these datasets.

      In summary, in my opinion, this is an interesting, important, high-quality bioinformatics work. The manuscript is clearly written and all technical details are carefully explained.

      One comment/suggestion regarding the methodology of evaluating gene expression profiles of putative receptors: perhaps it might be important to look at clusters of genes that are co-expressed with putative inhibitory receptors. 

      We thank the reviewer for their comments and suggestions. We acknowledge that looking at co-expressed genes and subsequently at gene ontology enrichment could be an interesting approach to prioritize the inhibitory receptors. However, since there are many ways to approach the results of the gene co-expression networks, which also depend on the cell type and activation status of interest, we have chosen to discuss the implications of these networks in the discussion with the following paragraph, rather than reporting all these different approaches in the paper:

      “To further prioritize inhibitory receptors in immune cell subsets or diseases of interest, gene co-expression networks of putative inhibitory receptors could be assessed. On the one hand, the co-occurrence of putative inhibitory receptors with known inhibitory receptors within a module could be one approach, while on the other hand the presence of putative inhibitory receptors in a different module could suggest novel regulation of different biological functions than the known receptors. The location of the putative inhibitory receptors in the network could also change depending on the cell type and the activation status of the cell. Additionally, one could look at the co-expression of candidates with other genes within a gene module to look at potential biological function, and at co-expression with signalling molecules known to interact with inhibitory receptors, such as Csk, SHP-1, SHP-2 and SHIP1, although their regulation might be more post-translationally regulated rather than at mRNA level.”

      Reviewer #2 (Public Review):

      Summary:

      The authors developed a bioinformatic pipeline to aid the screening and identification of inhibitory receptors suitable as drug targets. The challenge lies in the large search space and lack of tools for assessing the likelihood of their inhibitory function. To make progress, the authors used a consensus protein membrane topology and sequence motif prediction tool (TOPCONS) combined with both a statistical measure assessing their likelihood function and a machine learning protein structural prediction model (AlphaFold) to greatly cut down the search space. After obtaining a manageable set of 398 high-confidence known and putative inhibitory receptors through this pipeline, the authors then mapped these receptors to different functional categories across different cell types based on their expression both in the resting and activated state. Additionally, by using publicly available pan-cancer scRNA-seq for tumor-infiltrating T-cell data, they showed that these receptors are expressed across various cellular subsets.

      Strengths:

      The authors presented sound arguments motivating the need to efficiently screen inhibitory receptors and to identify those that are functional. Key components of the algorithm were presented along with solid justification for why they addressed challenges faced by existing approaches. To name a few:

      • The TOPCONS algorithm was selected to optimize the prediction of membrane topology.

      • A statistical measure was used to remove potential false positives.

      • AlphaFold is used to filter out putative receptors that are low confidence (and likely intrinsically disordered).

      To examine receptors screened through this pipeline through a functional lens, the authors proposed to look at their expression of various immune cell subsets to assign functional categories. This is a reasonable and appropriate first step for interpreting and understanding how potential drug targets are differentially expressed in some disease contexts.

      Weaknesses:

      The paper has strength in the pipeline they presented, but the weakness, in my opinion, lies in the lack of concrete demonstration on how this pipeline can be used to at least "rediscover" known targets in a disease-specific manner. For example, the result that both known and putative immune inhibitory receptors are expressed across a wide variety of tumor-infiltrating T-cell subsets is reassuring, but this would have been more informative and illustrative if the authors could demonstrate using a disease with known targets, as opposed to a pan-cancer context. Additionally, a discussion that contrasts the known and putative receptors in the context above would help readers better identify use cases suitable for their research using this pipeline. Particularly,

      • For known receptors, does the pipeline and the expression analysis above rediscover the known target in the disease of interest?

      • For putative receptors, what do the functional category mapping and the differential expression across various tumor-infiltrating T-cell subsets imply on a potential therapeutic target?

      We thank the reviewer for their assessment and comments. The primary purpose of the bioinformatics pipeline was to identify putative inhibitory receptors in a disease-agnostic manner and allow the scientific community to further explore targets in their specific diseases of interest. We performed our pan-cancer expression analysis as a preliminary proof of concept and agree that exploring targets in specific diseases, cancer or otherwise, could be more informative. To validate that we rediscovered known immunotherapeutic targets, we analyzed the expression of known inhibitory receptors on tumor-infiltrating T cells of melanoma patients using the same dataset as figure 3. We find high expression of known therapeutic targets, such as PD-1, in addition to other known inhibitory receptors that are being targeted in clinical trials, one of which is TIGIT. We have added this information to the results section and added the corresponding graph as supplementary figure 5.

      For the putative inhibitory receptors, we believe the functional categorization can assist in selecting targets that are more likely to be successful in a therapeutic context. As we previously proposed in our perspective on functional categorization of inhibitory receptors (Rumpret et al., Nat Rev Immunol, 2020), it might be beneficial to target inhibitory receptors of different functional categories in cancer immunotherapy. Targeting a threshold receptor to lower the threshold for activation and a negative feedback receptor to lengthen and strengthen the cellular response might therefore be more effective than targeting two receptors of a single functional category. Even though we realize RNA sequencing data of in vitro stimulated immune cells is not identical to data from TILs, we have tried to characterize the functional categories expressed by TILs by extrapolating the defined functional categorization per gene from figure 2, and added the corresponding graphs as supplementary figure 4. This shows that mainly threshold receptors and some (threshold-)negative feedback receptors are expressed by the different T cell subsets, which would open the possibility of using the proposed therapeutic strategy of targeting different functional categories. However, we acknowledge that this will require further validation of expression patterns in vivo in different cancers and immune cell subsets.

      Reviewer #1 (Recommendations For The Authors):

      One comment/suggestion regarding the methodology of evaluating gene expression profiles of putative receptors: perhaps it might be important to look at clusters of genes that are co-expressed with putative inhibitory receptors.

      See our reply to the suggestion above.

      Reviewer #2 (Recommendations For The Authors):

      Results section

      (a) "Putative ITIM/ITSM-bearing immune inhibitory receptors can be found in the human genome"

      i. Figure 1 could benefit from additional labeling. For example, in B, the grey line indicates 5%, etc. Additionally, in panel B&C, I assume by "predicted" the author meant using TOPCONS?

      ii. Figure 1B doesn't seem to be consistent with this sentence "However, for 10 out of 51, we observed ITIM/ITSM sequences in the permutated sequence up to ~25% of the time" [page 2, line 1-3], as all 51 data points in Figure 1B (under "Known" panel) are below the 0.25 horizontal line?

      i. We have adjusted the figure legend to better indicate the information provided in the figures. The predicted genes are all unknown transmembrane candidates that contain an ITIM or ITSM in their intracellular domain, as determined using TOPCONS.

      ii. Due to the nature of permutation testing, there is some variation in the individual likelihood values for each protein sequence. However, as they were generally below 0.25 in any given iteration, we decided to define this value as a threshold for inclusion. 
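      The permutation idea discussed in this exchange can be illustrated with a minimal sketch. It is not the authors' code: the regular expressions are simplified consensus patterns for ITIM and ITSM, and the function names, number of permutations, and seeding are all assumptions made for illustration. It shuffles a sequence many times and reports the fraction of shuffles in which a motif appears by chance, which also shows why the value varies somewhat from run to run.

```python
import random
import re

# Simplified consensus patterns (an assumption; the manuscript's exact
# motif definitions may differ).
ITIM = re.compile(r"[IVLS].Y..[LVI]")
ITSM = re.compile(r"T.Y..[VI]")


def has_motif(seq):
    """Return True if the sequence contains an ITIM-like or ITSM-like motif."""
    return bool(ITIM.search(seq) or ITSM.search(seq))


def motif_likelihood(seq, n_perm=1000, seed=0):
    """Fraction of shuffled copies of `seq` that contain a motif by chance.

    Hypothetical helper: shuffling keeps the amino-acid composition fixed
    and destroys the residue order.
    """
    rng = random.Random(seed)
    residues = list(seq)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(residues)
        if has_motif("".join(residues)):
            hits += 1
    return hits / n_perm
```

      Calling `motif_likelihood` with different `seed` values on the same sequence gives slightly different fractions, so a value near a cutoff such as 0.25 can fall on either side of it in a given iteration.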

      (b) "AlphaFold structure predictions can assist in identifying likely functional ITIM/ITSMs"

      i. Readability would increase if the author indicate how pLDDT score is computed and in what range is it (between 0 and 100.)

      ii. Third paragraph. Can the author comment on why 80 pLDDT is chosen as the cutoff? The first sentence of this paragraph states "We found that 99 out of 101 ITIM/ITSMs of the 51 known receptors had low confidence score, i.e., less than 80 pLDDT, with an average confidence score of 49.3 pLDDT..." However, it was later stated in the Discussion, page 10, starting Line 11 "We determined a threshold of 80 pLDDT based on the average prediction scores of the ITIM/ITSMs in known inhibitory receptors....". If 99 out of 101 ITIM/ITSMs had pLDDT<80, then it seems strange that the average of the 101 is at 80pLDDT, even in the extreme where the remaining 101-99=2 ITIM/ITSMs attain the maximum pLDDT score at 100, unless the distribution of those 99 is narrowly centered around 80? A distribution of the pLDDT would help clarify.

      i. The pLDDT scores are computed by AlphaFold as a way to determine how well a specific residue and/or region is expected to be modelled in three-dimensional space. We now refer to the corresponding AlphaFold publications and references therein to clarify this (10.1093/nar/gkab1061, 10.1038/s41586-021-03819-2, 10.1093/bioinformatics/btt473). We also have now included the range (i.e., 0-100) in the text.

      ii. The threshold of 80 pLDDT was chosen as this still encompasses all known inhibitory receptors and was not calculated based on an average of the prediction scores. In this way, we still included ITIM/ITSMs with a relatively high pLDDT, such as those observed in PD-1 and LAIR-1. The previous text ‘average prediction scores of the ITIM/ITSMs in known inhibitory receptors’ referred to the averaging of the confidence score for each of the six amino acids encompassing the ITIM/ITSM into one overall score per ITIM/ITSM. We have adjusted the text to better reflect this.
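      The per-motif score described in this reply (one value obtained by averaging the per-residue pLDDT over the six amino acids of an ITIM/ITSM, then compared with the cutoff of 80) can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the authors' implementation; the function names and the example numbers are hypothetical.

```python
def motif_plddt(per_residue_plddt, motif_start, motif_len=6):
    """Mean pLDDT (0-100) over the residues of one motif.

    `per_residue_plddt` is a list of per-residue scores for the protein;
    `motif_start` is the 0-based index of the first motif residue.
    """
    window = per_residue_plddt[motif_start:motif_start + motif_len]
    if len(window) != motif_len:
        raise ValueError("motif window runs past the end of the sequence")
    return sum(window) / motif_len


def is_low_confidence(score, threshold=80.0):
    """True if the motif score is below the cutoff, i.e. the motif is retained."""
    return score < threshold


# Made-up per-residue scores for a short stretch of a protein.
example_scores = [90, 40, 50, 60, 50, 40, 50, 95]
example_motif_score = motif_plddt(example_scores, motif_start=1)
```

      Because the cutoff is applied to each motif's own six-residue mean, it does not need to equal the average across all 101 motifs, which is the distinction the reply draws.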

      (c) "Putative inhibitory receptors are expressed across immune cell subsets"

      Figure S2, the last sentence in the caption (relevant for panel C) states "Cell subsets without uniquely expressed putative inhibitory receptors i.e., B cells and T cell, are excluded from the panel for clarity", but B cells and T cells are present in panel C?

      Indeed, but they are only included for the cases where the cell subsets share receptor expression with other immune cell subsets. The B and T cells do not express any unique putative multi-spanning receptors; all receptors are shared with at least one other immune cell subset.

      (d) "Known and putative inhibitory receptors are expressed on tumour infiltrating T cells"

      i. Missing panel C label in Figure 3 and S3.

      ii. By comparing Figure 3 and S3, it looks to me that there's not a big difference between single-spanning and multi-spanning inhibitory receptors. I wonder if the authors can comment or speculate on this similarity in addition to differences of expression among T-cell subsets. Would the similarities and differences above be explained by cancer type?

      i. Figure 3 and S3 do not contain a panel C, but panel B consists of a lower (CD8+) and an upper (CD4+) subpanel; we have indicated this more clearly in the figure legend in the revised manuscript.

      ii. While some T cell subsets, such as exhausted CD8+ T cells and CD4+ regulatory T cells, appear to not differ much in their expression of either single- or multi-spanning receptors, we do observe that, for example, effector memory CD4+ T cells or EMRA CD8+ T cells express single-spanning inhibitory receptors to a higher extent than multi-spanning inhibitory receptors. It is possible that these differences and similarities reflect some of the roles multi-spanning inhibitory receptors could play in regulating immune cells, for example in response to chemokines, as many chemokine receptors are multi-spanning proteins. 

      Data and Code availability

      Although the Methods section provides some context for the computational analysis and citations for relevant data, software availability and a data availability statement are lacking.

      We have included a data availability statement to the data files and code in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Jang et al. describes the application of new methods to measure the localization of GTP-binding signaling proteins (G proteins) on different membrane structures in a model mammalian cell line (HEK293). G proteins mediate signaling by receptors found at the cell surface (GPCRs), with evidence from the last 15 years suggesting that GPCRs can induce G-protein mediated signaling from different membrane structures within the cell, with variation in signal localization leading to different cellular outcomes. While it has been clearly shown that different GPCRs efficiently traffic to various intracellular compartments, it is less clear whether G proteins traffic in the same manner, and whether GPCR trafficking facilitates "passenger" G protein trafficking. This question was a blind spot in the burgeoning field of GPCR localized signaling in need of careful study, and the results obtained will serve as an important guidepost for further work in this field. The extent to which G proteins localize to different membranes within the cell is the main experimental question tested in this manuscript. This question is pursued through two distinct methods, both relying on genetic modification of the G-beta subunit with a tag. In one method, G-beta is modified with a small fragment of the fluorescent protein mNG, which combines with the larger mNG fragment to form a fully functional fluorescent protein to facilitate protein trafficking by fluorescent microscopy. This approach was combined with the expression of fluorescent proteins directed to various intracellular compartments (different types of endosomes, lysosome, endoplasmic reticulum, Golgi, mitochondria) to look for colocalization of G-beta with these markers. These experiments showed compelling evidence that G-beta co-localizes with markers at the plasma membrane and the lysosome, with weak or absent co-localization for other markers. 
      A second method for measuring localization relied on fusing G-beta with a small fragment from a miniature luciferase (HiBit) that combines with a larger luciferase fragment (LgBit) to form an active luciferase enzyme. Localization of G-beta (and luciferase signal) was measured using a method known as bystander BRET, which relies on the expression of a fluorescent protein BRET acceptor in different cellular compartments. Results using bystander BRET supported findings from fluorescence microscopy experiments. These methods for tracking G protein localization were also used to probe other questions. The activation of GPCRs from different classes had virtually no impact on the localization of G-beta, suggesting that GPCR activation does not result in the shuttling of G proteins through the endosomal pathway with activated receptors.

      Strengths:

      The question probed in this study is quite important and, in my opinion, understudied by the pharmacology community. The results presented here are an important call to be cognizant of the localization of GPCR coupling partners in different cellular compartments. Abundant reports of endosomal GPCR signaling need to consider how the impact of lower G protein abundance on endosomal membranes will affect the signaling responses under study.

      The work presented is carefully executed, with seemingly high levels of technical rigor. These studies benefit from probing the experimental questions at hand using two different methods of measurement (fluorescent microscopy and bystander BRET). The observation that both methods arrive at the same (or a very similar) answer inspires confidence about the validity of these findings.

      Weaknesses:

      The rationale for fusing G-beta with either mNG2(11) or SmBit could benefit from some expansion. I understand the speculation that using the smallest tag possible may have the smallest impact on protein performance and localization, but plenty of researchers have fused proteins with whole fluorescent proteins to provide conclusions that have been confirmed by other methods. Many studies even use G proteins fused with fluorescent proteins or luciferases. Is there an important advantage to tagging G-beta with small tags? Is there evidence that G proteins with full-size protein tags behave aberrantly? If the studies presented here would not have been possible without these CRISPR-based tagging approaches, it would be helpful to provide more context to make this clearer. Perhaps one factor would be interference from newly synthesized G proteins-fluorescent protein fusions en route to the plasma membrane (in the ER and Golgi).

      There are several advantages to using small peptide tags that we did not fully explain. From a practical standpoint the most important advantage of using the HiBit tag instead of full-length Nanoluc is that it allows us to restrict luminescence output to cells transiently transfected with LgBit. In this way untransfected cells contribute no background signal. Although we did not take advantage of it here, this also applies to fluorescent protein complementation, and will be useful for visualizing proteins in individual cells within tissues. The HiBit tag also allows PAGE analysis by probing membranes with LgBit (as in Fig. 1). We are not aware of evidence that tagging Gβ or Gγ subunits on the N terminus results in aberrant behavior, while there is some evidence that Gα subunits tagged with full-size protein tags (in some positions) have altered functional properties (PMID: 16371464). We do think that editing endogenous genes is critical, as studies using transient overexpression (usually driven by strong promoters) have sometimes reported accumulation of tagged G proteins in the biosynthetic pathway (e.g., PMID: 17576765), as the reviewer suggests. Gα and Gβγ appear to be mutually dependent on each other for appropriate trafficking to the plasma membrane (reviewed in PMID: 23161140); therefore the native (presumably matched) stoichiometry is likely to be critical.

      To clarify this context the revised manuscript includes the following:

      “For bioluminescence experiments we added the HiBit tag (Schwinn et al., 2018) and isolated clonal “HiBit-β1” cell lines. An advantage of this approach over adding a full-length Nanoluc luciferase is that it requires coexpression of LgBit to produce a complemented luciferase. This limits luminescence to cotransfected cells and thus eliminates background from untransfected cells.”

      “Some studies using overexpressed G protein subunits have suggested that a large pool of G proteins is located on intracellular membranes, including the Golgi apparatus (Chisari et al., 2007; Saini et al., 2007; Tsutsumi et al., 2009), whereas others have indicated a distribution that is dominated by the plasma membrane (Crouthamel et al., 2008; Evanko, Thiyagarajan, & Wedegaertner, 2000; Marrari et al., 2007; Takida & Wedegaertner, 2003). A likely factor contributing to these discrepant results is the stoichiometry of overexpressed subunits, as neither Gα nor Gβγ traffic appropriately to the plasma membrane as free subunits (Wedegaertner, 2012). Our gene-editing approach presumably maintains the native subunit stoichiometry, providing a more accurate representation of native G protein distribution.”

      As noted by the authors, they do not demonstrate that the tagged G-beta is predominantly found within heterotrimeric G protein complexes. If there is substantial free G-beta, then many of the conclusions need to be reconsidered. Perhaps a comparison of immunoprecipitated tagged G beta vs immunoprecipitated supernatant, with blotting for other G protein subunits would be informative.

      We do think that HiBit-β1 exists predominantly within heterotrimeric complexes, for several reasons. First, overexpression studies have shown that Gβγ requires association with Gα to traffic to the plasma membrane, and that by itself Gβγ is retained on the endoplasmic reticulum (PMID: 12609996; PMID: 12221133). We find almost no endogenous Gβ1 on the endoplasmic reticulum, and a high density on the plasma membrane. Second, we are able to detect large increases in free HiBit-Gβγ after G protein activation using free Gβγ sensors (e.g. Fig. 1). Third, many proteins that bind to free Gβγ are found entirely in the cytosol of HEK 293 cells (e.g. PMID: 10066824), suggesting there is not a large population of free Gβγ. We have added discussion of these points to the revised manuscript as follows:

      “Endogenous Gα and Gβ subunits are expressed at approximately a 1:1 ratio, and Gβ subunits are tightly associated with Gγ and inactive Gα subunits (Cho et al., 2022; Gilman, 1987; Krumins & Gilman, 2006). Moreover, proteins that bind to free Gβγ dimers are found in the cytosol of unstimulated HEK 293 cells, suggesting at most only a small population of free Gβγ in these cells. Therefore, we assume that the large majority of mNG-β1 and HiBit-β1 subunits in unstimulated cells are part of heterotrimers.”

      “Notably, when Gβγ dimers are expressed alone they accumulate on the endoplasmic reticulum (Michaelson et al., 2002; Takida & Wedegaertner, 2003). That we detect almost no endogenous Gβγ on the endoplasmic reticulum supports our conclusion that the large majority of Gβγ in unstimulated HEK 293 cells is associated with Gα, although we cannot rule out a small population of free Gβγ.”

      We do not entirely understand the suggested experiment, as free Gβγ will still be largely associated with the membrane fraction. Notably, we find almost no HiBit-β1 in the supernatant after lysis in hypotonic buffer and preparation of membrane fractions, and the small amount that we do find does not change if Gα is overexpressed.

      Additional context and questions:

      (1) There exists some evidence that certain GPCRs can form enduring complexes with G-betagamma (PubMed: 23297229, 27499021). That would seem to offer a mechanism that would enable receptor-mediated transport of G protein subunits. It would be helpful for the authors to place the findings of this manuscript in the context of these previous findings since they seem somewhat contradictory.

      We agree. In our original submission we noted “It is possible that other receptors will influence G protein distribution using mechanisms not shared by the receptors we studied.” In the revised manuscript we have added:

      “For example, a few receptors are thought to form relatively stable complexes with Gβγ, which could provide a mechanism of trafficking to endosomes (Thomsen et al., 2016; Wehbi et al., 2013).”

      (2) There is some evidence that GaS undergoes measurable dissociation from the plasma membrane upon activation (see the mechanism of the assay in PubMed: 35302493). It seems possible that G-alpha (and in particular GaS) might behave differently than the G-beta subunit studied here. This is not entirely clear from the discussion as it now stands.

      Indeed, there is abundant evidence that some Gαs translocates away from the plasma membrane upon activation. We referred to translocation of “some Gα subunits” in the introduction, although we did not specify that Gαs is by far the most studied example. In a previous study (PMID: 27528603) we found that overexpressed Gαs samples many intracellular membranes upon activation and returns to the plasma membrane when activation ceases. This is similar to activation-dependent translocation of free Gβγ dimers. Because these translocation mechanisms depend on activation and are reversible, they are unlikely to be a major source of inactive heterotrimers for intracellular membranes.

      We did a poor job of making it clear that we intentionally avoided translocation mechanisms that operate only during receptor and G protein stimulation. In the revised manuscript we have added new data showing reversible activation-dependent translocation of endogenous HiBit-Gβ1.

      (3) The authors say "The presence of mNG-b1 on late endosomes suggested that some G proteins may be degraded by lysosomes". The mechanism of lysosomal degradation by proteins on the outside of the lysosome is not clear. It would be helpful for the authors to clarify.

      We agree we didn’t connect the dots here. Our initial idea was that G proteins on the surface of late endosomes might reach the interior of late endosomes and then lysosomes by involution into multivesicular bodies. However, the reviewer correctly points out that much of the G protein associated with lysosomes still appears to be on the cytosolic surface, where it would not be subject to degradation. In fact, since lysosomes can fuse with the plasma membrane under certain circumstances, this could even represent a pathway for recycling G proteins to the plasma membrane.

      We have revised the text to avoid giving the impression that lysosomes degrade G proteins, since we have scant evidence that this occurs. In the revised discussion we point out that we do not know the fate of G proteins located on the surface of lysosomes and speculate that these could be returned to the plasma membrane:

      “We do not know the fate of G proteins located on the surface of lysosomes. Since lysosomes may fuse with the plasma membrane under certain circumstances (Xu & Ren, 2015), it is possible that this represents a route of G protein recycling to the plasma membrane.”

      (4) Although the authors do a good job of assessing G protein dilution in endosomal membranes, it is unclear how this behavior compares to the measurement of other lipid-anchored proteins using the same approach. Is the dilution of G proteins what we would expect for any lipid-anchored protein at the inner leaflet of the plasma membrane?

      This is a great question. To begin to address it we have studied a model lipid-anchored protein consisting of mNeongreen2 anchored to the plasma membrane by the C terminus of HRas, which is palmitoylated and prenylated. We find that this protein is also diluted on endocytic vesicles, although to a lesser degree than heterotrimeric G proteins. We have added a section to the results and a new figure supplement describing these results:

      “To test if other peripheral membrane proteins are similarly depleted from endocytic vesicles, we performed analogous experiments by overexpressing mNG bearing the C-terminal membrane anchor of HRas (mNG-HRas ct). We found that mNG-HRas ct was also less abundant on FM4-64-positive endocytic vesicles than expected based on plasma membrane abundance, although not to the same extent as mNG-b1 (Figure 4 - figure supplement 2); mNG-HRas ct density on FM4-64-positive vesicles was 64 ± 17% (mean ± 95% CI; n=78) of the nearby plasma membrane.”

      Reviewer #2 (Public Review):

      This is an interesting method that addresses the important problem of assessing G protein localization at endogenous levels. The data are generally convincing.

      Specific comments

      Methods:

      The description of the gene editing method is unclear. There are two different CRISPR cell lines made in two different cell backgrounds. The methods should clearly state which CRISPR guides were used on which cell line. It is also not clear why HiBit is included in the mNG-β1 construct. Presumably, this is not critical but it would be helpful to explicitly note. In general, the Methods could be more complete.

      We have added the following to the methods to clarify that the same gRNA was used to produce both cell lines:

      “The human GNB1 gene was targeted at a site corresponding to the N-terminus of the Gb1 protein; the sequence 5’-TGAGTGAGCTTGACCAGTTA-3’ was incorporated into the crRNA, and the same gRNA was used to produce both HiBit-b1 and mNG-b1 cell lines.”

      We have added the following to the methods to clarify why HiBit is included in the mNG-b1 construct:

      “HiBit was included in the repair template for producing mNG-b1 cells to enable screening for edited clones using luminescence.”

      Results:

      The explanation of validation experiments in Figures 1 C and D is incomplete and difficult to follow. The rationale and explanation of the experiments could be expanded. In addition, because this is an interesting method, it would be helpful to know if the endogenous editing affects normal GPCR signaling. For example, the authors could include data showing an Iso-induced cAMP response. This is not critical to the present interpretation but is relevant as a general point regarding the method. Also, it may be relevant to the interpretation of receptor effects on G protein localization.

      We have expanded the rationale and explanation of experiments in Figures 1C and D by adding:

      “For example, we observed agonist-induced BRET between the D2 dopamine receptor and mNG-b1, an interaction that requires association with endogenous Ga subunits (Figure 1C). Similarly, we observed BRET between HiBit-b1 and the free Gbg sensor memGRKct-Venus after activation of receptors that couple Gi/o, Gs, and Gq heterotrimers, indicating that HiBit-b1 associated with endogenous Ga subunits from these three families (Figure 1D).”

      We have done the suggested cAMP experiment and provide the data in a new figure supplement:

      “We also found that cyclic AMP accumulation in response to stimulation of endogenous b adrenergic receptors was similar in edited cell lines and their unedited parent lines (Figure 1 - figure supplement 1).”

      Discussion:

      The conclusion that beta-gamma subunits do not redistribute after GPCR activation seems new and different from previous reports. Is this correct? Can the authors elaborate on how the results compare to previous literature?

      Many previous studies have indeed shown that free Gbg dimers can redistribute after GPCR activation and sample intracellular membranes. Our initial focus was on possible changes in heterotrimer distribution after GPCR activation, but in retrospect we should have directly addressed free Gbg translocation and made the distinction clear. 

      In the revised manuscript we show that during stimulation we observe changes consistent with modest translocation of endogenous Gbg from the plasma membrane and sampling of intracellular compartments. To our knowledge this is the first demonstration of endogenous Gbg translocation.

      We have added:

      “With overexpressed G proteins free Gbg dimers translocate from the plasma membrane and sample intracellular membrane compartments after activation-induced dissociation from Ga subunits. Consistent with this, we observed small decreases in bystander BRET at the plasma membrane and small increases in bystander BRET at intracellular compartments during activation of GPCRs, suggesting that endogenous Gbg subunits undergo similar translocation (Figure 5- figure supplement 1). Notably, these changes occurred at room temperature, suggesting that endocytosis was not involved, and developed over the course of minutes. The latter observation and the small magnitude of agonist-induced changes are both consistent with expression of primarily slowly-translocating endogenous Gg subtypes in HEK 293 cells. Moreover, as shown previously for overexpressed Gbg, the changes we observed with endogenous Gbg were readily reversible (Figure 5- figure supplement 1), suggesting that most heterotrimers reassemble at the plasma membrane after activation ceases.”

      Can the authors note that OpenCell has endogenously tagged Gβ1 and reports more obvious internal localization? Can the authors comment on this point?

      OpenCell has tagged GNB1 and the Leonetti group kindly provided a parent cell line we used to add a slightly different tag. Although their study did not identify any specific intracellular compartments, our impression is that most of the internal structures visible in their images are likely to be lysosomes, as they are large, round and often have a clear lumen. Overall their images and ours are comfortingly similar. We have added:

      “Unsurprisingly, our images are quite similar to those made as part of a previous study that labeled Gb1 subunits with mNG2 (Cho et al., 2022).”

      Notably, the Leonetti group has recently reported the subcellular distribution of many untagged proteins using a proteomic approach. They find that Gb1 is enriched on the plasma membrane and lysosomes but is not enriched on endosomes, the Golgi apparatus, endoplasmic reticulum or mitochondria (https://www.biorxiv.org/content/10.1101/2023.12.18.572249v1). We have cited this work in the revised manuscript.

      Is this the first use of CRISPR / HiBit for BRET assay? It would be helpful to know this or cite previous work if not. Also, as this is submitted as a tools piece, the authors might say a little more about the potential application to other questions.

      The only previous study we are aware of utilizing a similar combination of methods is a 2020 report from the group of Dr. Stephen Hill, in which the authors studied binding of fluorescent ligands to HiBit-tagged GPCRs. This work is now cited.

      We have also added the following to our previous brief statement about potential applications:

      “In addition, it may also be possible to use these cells in combination with targeted sensors to study endogenous G protein activation in different subcellular compartments. More broadly, our results show that subcellular localization of endogenous membrane proteins can be studied in living cells by adding a HiBit tag and performing bystander BRET mapping. Applied at large scale this approach would have some advantages over fluorescent protein complementation, most notably the ability to localize endogenous membrane proteins that are expressed at levels that are too low to permit fluorescence microscopy.”

      Reviewer #3 (Public Review):

      Summary:

      This article addresses an important and interesting question concerning intracellular localization and dynamics of endogenous G proteins. The fate and trafficking of G protein-coupled receptors (GPCRs) have been extensively studied but so far little is known about the trafficking routes of their partner G proteins that are known to dissociate from their respective receptors upon activation of the signaling pathway. The authors utilize modern cell biology tools including genome editing and bystander bioluminescence resonance energy transfer (BRET) to probe intracellular localization of G proteins in various membrane compartments in steady state and also upon receptor activation. Data presented in this manuscript shows that while G proteins are mostly present on the plasma membrane, they can be also detected in endosomal compartments, especially in late endosomes and lysosomes. This distribution, according to data presented in this study, seems not to be affected by receptor activation. These findings will have implications in further studies addressing GPCR signaling mechanisms from intracellular compartments.

      Strengths:

      The methods used in this study are adequate for the question asked. Especially, the use of genome-edited cells (for the addition of the tag on one of the G proteins) is a great choice to prevent the effects of overexpression. Moreover, the use of bystander BRET allowed authors to probe the intracellular localization of G proteins in a very high-throughput fashion. By combining imaging and BRET authors convincingly show that G proteins are very low abundant on early endosomes (also ER, mitochondria, and medial Golgi), however seem to accumulate on membranes of late endosomal compartments.

      Weaknesses:

      While the authors provide a novel dataset, many questions regarding G protein trafficking remain open. For example, it is not entirely clear which pathway is utilized to traffic G proteins from the plasma membrane to intracellular compartments. Additionally, future studies should also address the dynamics of G protein trafficking, for example by tracking them over multiple time points.

      We agree, there is much more to do.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On page 7 the text says "the difference did reach significance (Figure 5D)". It looks like the difference did not reach significance. Please check on this.

      Thank you, this was an unfortunately significant typo.

      Reviewer #3 (Recommendations For The Authors):

      This article addresses an important and interesting question concerning intracellular localization and dynamics of endogenous G proteins. While the posed question is indeed a grand one and the methods used by the authors are novel, I believe that the data presented in this manuscript are still insufficient to support all claims posed by the authors. Below I list my major concerns:

      (1) The authors claim that they provide a "detailed subcellular map of endogenous G protein distribution", however, the map is in my opinion not sufficiently detailed (e.g. trans-Golgi network is not included) and not quantitative enough (e.g. % of proteins present on one compartment vs. the other as authors claim that BRET signals "cannot be directly compared between different compartments"). To strengthen this statement, except for providing more extensive and quantitative data, it would be beneficial to provide such a "map" as an illustration based on the findings presented in this article.

      “Detailed” is certainly a subjective term. While we maintain that our description of endogenous G protein distribution is far more detailed than any previous study, we now simply claim to provide a “subcellular map”. We have added images of TGNP (TGN46; TGOLN2), showing that endogenous G proteins are readily detectable on the structures labeled by this marker. These data are now provided in Figure 3 – figure supplement 7.

      We did not claim that our study was quantitative; we did not try to count G proteins. However, if we use published estimates of total G proteins and surface area for HEK 293 cells, we estimate that there are roughly 2,500 G proteins µm⁻² on the plasma membrane and 500 G proteins µm⁻² on endocytic vesicles. For other intracellular compartments, relative density can be approximated by inspecting images, but a truly quantitative estimate would require a surface area standard analogous to FM4-64 for each compartment. The percentage of the total G protein pool on a given compartment is, in our opinion, less important than the density of G proteins on that compartment, as the latter is more likely to affect the efficiency of local signal transduction. Since we do not claim to have accurate G protein density estimates for many intracellular compartments, we prefer to provide several raw images for each compartment rather than a schematized map.

      Bystander BRET values cannot be compared directly across compartments due to differences in expression and energy transfer efficiency of different markers and compartment surface area. This method is well suited for following changes in distribution as a function of time or after perturbations and for sensitive detection of weak colocalization but can only provide approximate “maps” of absolute distribution.

      (2) Probing of the intracellular distribution of these proteins, especially after GPCR activation, includes a single chosen timepoint. I believe that the manuscript would greatly benefit from including some dynamic data on internalization and intracellular trafficking kinetics. What is the turnover of tested G proteins? What is the fraction that is going to recycling compartments and/or lysosomes? Authors could perhaps turn to other methods to be able to dynamically track proteins over time e.g. via photoconversion techniques.

      Because G protein trafficking appears to be largely constitutive there is no easy way for us to assess how long it takes G proteins to transit various intracellular compartments, although we agree this would be interesting. As the reviewer suggests, dynamic data on constitutive trafficking would require methods (such as photoconversion) not currently available to us for endogenous G proteins. Accordingly, we have made no claims regarding the kinetics of G protein trafficking. As for possible redistribution after GPCR activation, in the revised manuscript we have added 5- and 15-minute timepoints after agonist stimulation for our bystander BRET mapping (Figure 5- figure supplement 2). These timepoints were chosen to correspond to persistent signaling mediated by internalized receptors. 

      (3) Exemplary images with cells showing significant colocalization with lysosomal compartments seem to contain more intracellular vesicles visible in the mNG channel than in the case of the other compartment. Is it an effect of the treatment to stain lysosomes? It would be beneficial to compare it with some endogenous marker e.g. LAMP1 without additional treatments.

      The visibility of intracellular vesicles in our lysosome images likely reflects our selection of cells and regions with visible and abundant lysosomes, specifically peripheral regions directly adhered to the coverslip, rather than treatment with lysosomal stains (LV 633 and dextran). As suggested, we now include images of cells expressing LAMP1 as an alternative lysosome marker (Figure 3 - figure supplement 6).

      (4) The authors probe an abundance of G proteins along the constitutive endocytic pathway. However, to prove that G proteins are not de-palmitoylated rather than endocytosed authors should perform control experiments where endocytosis is blocked e.g. pharmacologically or via a knockdown approach. Additionally, various endocytic pathways can be probed.

      We did not claim that depalmitoylation plays no role in delivery of G proteins to internal compartments. In fact, we pointed out that we cannot at present rule out other pathways and delivery mechanisms. Importantly, if some of the G proteins that we detect along the endocytic pathway do arrive there by trafficking through the cytosol this would only strengthen our major conclusion that endocytosis is inefficient.

      Having said this, we have now conducted extensive experiments investigating the role of palmitate cycling in the trafficking of heterotrimeric G proteins and the small G protein H-Ras. Our results suggest that a depalmitoylation-repalmitoylation cycle is not important for the distribution of heterotrimers, but these findings will be the subject of a separate publication focused on this specific question for both large and small G proteins.

      We agree that it will be interesting to probe different endocytic pathways using a genetic approach, as suggested. Our main interest here was in endocytic membranes that were defined functionally (with FM4-64 or internalized receptors) rather than biochemically.

      Minor comments:

      (5) "Imaging" paragraph in the Methods section refers to a non-existent figure called "SI Appendix S9".

      Thank you.

      (6) It is not clear what was used as a "control" in Figure 5E.

      “Control” refers to DPBS vehicle alone. This information is now added to the legend for Figure 5E.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1 and 2: “The pipeline relies on a large number of hard-coded conditions: size of Gaussian blur (Gaussian should be written in uppercase), values of contrast, size of filters, levels of intensity, etc. Presumably, the authors followed a heuristic approach and tried values of these and concluded that the ones proposed were optimal. A proper sensitivity analysis should be performed. That is, select a range of values of the variables and measure the effect on the output.”

      “Linked to the previous comments. Other researchers that want to follow the pipeline would have either to have exactly the same acquisition conditions as the manuscript or start playing with values and try to compensate for any difference in their data (cell diameter, fluorescent intensity, etc.) to see if they can match the results of the manuscript.”

      We thank the Reviewer for his insightful comments. We have modified the “Usage” section of the GitHub page (https://github.com/ieoresearch/cellcycle-image-analysis) to include, for each step of the image processing, a paragraph explaining the significance of the operation and a paragraph named “Suggested Values Range” where tips for optimal parameter settings are given and examples with different parameter settings are shown. We believe that these new paragraphs help researchers easily customize the pipeline to their own data.
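
      As an illustration of the kind of sensitivity analysis requested in Comment 1, the following is a minimal sketch and not the pipeline from the manuscript or its GitHub repository. It assumes a toy one-dimensional intensity profile, a moving-average blur as a stand-in for the Gaussian blur, and a simple threshold-and-count step as a stand-in for segmentation. All function names and parameter values are hypothetical.

```python
import itertools
import math


def synthetic_profile(centers, width=4.0, length=120):
    """Build a toy 1D intensity profile with one Gaussian bump per 'cell'."""
    return [
        sum(math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers)
        for x in range(length)
    ]


def box_blur(signal, size):
    """Moving-average blur; a simple stand-in for a Gaussian blur of a given size."""
    half = size // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def count_objects(signal, threshold):
    """Count contiguous runs of samples that lie above the threshold."""
    count, inside = 0, False
    for value in signal:
        above = value > threshold
        if above and not inside:
            count += 1
        inside = above
    return count


def sensitivity_sweep(signal, blur_sizes, thresholds):
    """Return {(blur_size, threshold): object_count} for every parameter pair."""
    return {
        (size, thr): count_objects(box_blur(signal, size), thr)
        for size, thr in itertools.product(blur_sizes, thresholds)
    }


# Three well-separated toy "cells"; the sweep reports how the count responds
# to each combination of blur size and threshold.
profile = synthetic_profile(centers=[20, 60, 100])
results = sensitivity_sweep(
    profile, blur_sizes=[1, 5, 9, 41], thresholds=[0.2, 0.5, 0.8]
)
for (size, thr), n in sorted(results.items()):
    print(f"blur={size:2d} threshold={thr:.1f} -> {n} objects")
```

      In this toy setting the object count is stable across moderate blur sizes and changes only when the blur window becomes much wider than the objects. A range of parameter values over which the output stays constant is the sort of information a "Suggested Values Range" paragraph can convey for real data.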

      Reviewer 2:

      Comment 1: “It would be useful to include frames from the movie showing a G1/S cell in Figures 1 and S1 with some indication of how long that cell is present. From Figure S4 it looks like it is substantially less than an hour.

      It would definitely be nice to validate this observation. A brief pulse of EdU together with the FUCCI colors could allow you to do that in a culture of cycling cells. It appears that the green color as cells enter S-phase develops slowly (and maybe gets brighter continuously) as does the red color as cells progress through G1. It would be nice to validate what the color the cells are when they actually initiate DNA replication.”  

      We thank the Reviewer for the opportunity to further investigate our results and clarify points that were unclear in the first version of the manuscript. As suggested, we have included all acquired frames depicting the G1 to S transition/early S phase of three cells: the Kasumi-1 untreated cell and the PF-06873600 treated NB4 cell shown in Fig. 1A, and the MDA-MB-231 cell shown in Fig. S1; they are shown in panels D of Fig. 4 and S5, respectively.

      For the Kasumi-1 and NB4 cells, the G1 to S transition/early S phase, defined in the pipeline refinement step as a yellow phase appearing before the S phase, is visible at the 12-hour frame. Conversely, the MDA-MB-231 cell shown in Fig. S5D does not exhibit the yellow G1 to S transition/early S phase; it transitions abruptly from red to green within our acquisition timeframe (30 min in this case), producing a green early S phase. This observation supports the Reviewer's suggestion that the yellow G1 to S transition is often shorter than one hour and is not identifiable in all cells.

      To further investigate this point, we also conducted the EdU experiments kindly suggested by the Reviewer. Kasumi-1 and MDA-MB-231 cells expressing the FUCCI(CA)2 probes were exposed to a pulse of EdU, and subsequently analyzed using flow cytometry and confocal microscopy. A new paragraph titled “The workflow allows the identification of the G1 to S phase transition” has been added to the Results section, with the corresponding data presented in Fig. 4 and Fig. S5 for Kasumi-1 and MDA-MB-231 cells, respectively. The Methods section has also been updated to describe the new experiments.

      Additionally, in BOX1 under the 'Cell phase assignment' paragraph, point (III), we have removed point 'a. Re-assign the G2/M frames to G1'. Although theoretically possible according to the pipeline, this reassignment is incorrect in practice because mVenus fluorescence indicates that the cells are starting or have already initiated DNA replication.

      All the modifications we made in the text and Figure captions are highlighted in red. We would be thankful if the co-first authorship of Kourosh Hayatigolkhatmi, Chiara Soriani and Emanuel Soda is acknowledged in the final published version of the article.

      We believe that the revisions have strengthened our manuscript, and we hope that it now meets the reviewers' suggestions for greater clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In the present study, Rincon-Torroella et al. developed ME3BP-7, a microencapsulated formulation of 3BP, as an agent to target MCT1 overexpressing PDACs. They provided evidence showing the specific killing of PDAC cells with MCT1 overexpressing in vitro, along with demonstrating the safety and anti-tumor efficacy of ME3BP-7 in PDAC orthotopic mouse models.

      Strengths:

      * Developed a novel agent.

      * Well-designed experiments and an organized presentation of data that support the conclusions drawn.

      Weaknesses:

      There are some minor issues that could enhance the clarity and completeness of the study:

      (1) Statistical results should be visually presented in Figure 4 and Figure S1.

      (2) Given the tumor heterogeneity and the identification of focal high expression of MCT1 in Figure 7 and Figure S5B, it is suggested that the authors include the results of immunohistochemical (IHC) analysis of MCT1 expression in both control and ME3BP-7 treated tumor tissues. This addition may offer insight into whether the remaining tumors are composed of PDAC cells with negative MCT1 expression, while the cells with relatively high levels of MCT1 expression were eliminated by ME3BP-7 treatment.

      (3) The authors are encouraged to discuss the future directions for improving the efficacy of this study. For example, exploring the combination of ME3BP-7 with a glutaminase-1 inhibitor (PMID 37891897) could be a valuable avenue for further research.

      We thank the reviewer for pointing these out. We have addressed these individually in detail in the next section.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript by Rincon-Torroella et al, the authors evaluated the therapeutic potential of ME3BP-7, a microencapsulated formulation of 3BP which specifically targets MCT-1 high tumor cells, in pancreatic cancer models. The authors showed that, compared to 3BP, ME3BP-7 exhibited much-enhanced stability in serum. In addition, the authors confirmed the specificity of ME3BP-7 toward MCT-1 high tumor cells and demonstrated the in vivo anti-tumor effect of ME3BP-7 in orthotopic xenograft of human PDAC cell line and PDAC PDX model.

      Strengths:

      (1) The study convincingly demonstrated the superior stability of ME3BP-7 in serum.

      (2) The specificity of ME3BP-7 and 3BP toward MCT-1 high PDAC cells was clearly demonstrated with CRISPR-mediated knockout experiments.

      Weaknesses:

      The advantage of ME3BP-7 over 3BP under an in vivo situation was not fully established.

      This is a helpful observation. We have attempted to address it in the revised manuscript and have clarified the details in the following section.

      Reviewer #1 (Recommendations For The Authors):

      There are some minor issues that could enhance the clarity and completeness of the study:

      We appreciate these comments and have addressed them to the best of our abilities in the revised manuscript.

      (1) Statistical results should be visually presented in Figure 4 and Figure S1.

      Figure 4 and S1 have been updated to include visual representation of statistical results.

      (2) Given the tumor heterogeneity and the identification of focal high expression of MCT1 in Figure 7 and Figure S5B, it is suggested that the authors include the results of immunohistochemical (IHC) analysis of MCT1 expression in both control and ME3BP-7 treated tumor tissues. This addition may offer insight into whether the remaining tumors are composed of PDAC cells with negative MCT1 expression, while the cells with relatively high levels of MCT1 expression were eliminated by ME3BP-7 treatment.

      This is an excellent suggestion, but unfortunately, we were unable to implement it. We identified a single antibody that showed specificity in our MCT1 knockout isogenic panel after testing 6 different commercial anti-MCT1 antibodies. While the chosen antibody (sc-365501) worked well on fixed human pancreatic cancer samples, it exhibited significant cross-reactivity against background mouse tissue, making it difficult to visualize the orthotopically implanted PDX samples effectively.

      (3) The authors are encouraged to discuss the future directions for improving the efficacy of this study. For example, exploring the combination of ME3BP-7 with a glutaminase-1 inhibitor (PMID 37891897) could be a valuable avenue for further research.

      We have included potentially useful combinations of ME3BP-7 in the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      The overall study is straightforward with translational significance. However, additional clarification is needed to determine the novelty of the study. As cited by the authors, the same group previously published a paper in Clinical Cancer Research, demonstrating the anti-tumor effect of beta-CD-3BP which is also a microencapsulated form of 3BP prepared with succinyl-beta-cyclodextrin. Please clarify what is the major difference between the ME3BP-7 and beta-CD-3BP.

      We designed the first generation of beta-CD-3BP and presented the preliminary results in the Clinical Cancer Research paper. Over the last several years, we sought to optimize the formulation so that it would be a robust clinical candidate. The current manuscript describes our in-depth exploration.

      We used a combination of SEC HPLC analyses (representative chromatogram in Fig. 3A) along with a newly developed assay to assess serum stability (representative data in Fig. 3B) of a panel of ME-3BP complexes. The panel was created by varying the molar ratios of three different beta-CDs (succinyl beta-CD, native beta-CD and hydroxypropyl beta-CD) to 3BP. We discovered that an excess of succinyl-beta-CD (1.2:1) resulted in the most stable agent with no noticeable batch effects, and this formulation was dubbed ME3BP-7.

      The study clearly demonstrated the superior stability of ME3BP-7 in serum compared to 3BP. To further support the advantage of ME3BP-7, it will be important to include the same dose of 3BP as a control in the in vivo treatment experiment to evaluate the difference in both toxicity and anti-tumor effect.

      We wanted to include a control arm in our study wherein the same dose of 3BP was used. However, in toxicity studies on three different species of mice, we found that infusion of 3BP at the identical dose was highly toxic, killing the animals within a few days.  We have highlighted this toxicity of the non-microencapsulated 3BP in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Kim et al. describes a role for axonal transport of Wnd (a dual leucine zipper kinase) for its normal degradation by the Hiw ubiquitin ligase pathway. In Hiw mutants, the Wnd protein accumulates dramatically in nerve terminals compared to the cell body of neurons. In the absence of axonal transport, Wnd levels rise and lead to excessive JNK signaling that makes neurons unhappy.

      Strengths:

      Using GFP-tagged Wnd transgenes and structure-function approaches, the authors show that palmitoylation of the protein at C130 plays a role in this process by promoting Golgi trafficking and axonal localization of the protein. In the absence of this transport, Wnd is not degraded by Hiw. The authors also identify a role for Rab11 in the transport of Wnd, and provide some evidence that Rab11 loss-of-function neuronal degenerative phenotypes are due to excessive Wnd signaling. Overall, the paper provides convincing evidence for a preferential site of action for Wnd degradation by the Hiw pathway within axonal and/or synaptic compartments of the neuron. In the absence of Wnd transport and degradation, the JNK pathway becomes hyperactivated. As such, the manuscript provides important new insights into compartmental roles for Hiw-mediated Wnd degradation and JNK signaling control.

      Weaknesses:

      It is unclear if the requirement for Wnd degradation at axonal terminals is due to restricted localization of HIW there, but it seems other data in the field argues against that model. The mechanistic link between Hiw degradation and compartmentalization is unknown. 

      We thank the Reviewer for valuable comments. In our revised manuscript, we have addressed the reviewer’s comments and clarified points of confusion. We did not intend to imply that Rab11 directly mediates anterograde Wnd protein transport towards axon terminals. We have reworded the related text throughout our manuscript to avoid confusion. Additionally, to strengthen the link between Rab11 and Wnd, we have added data showing that a heterozygous mutation of wnd could rescue the eye degeneration phenotypes caused by Rab11 loss-of-function (new Figure 7C).

      It is unclear if the requirement for Wnd degradation at axonal terminals is due to restricted localization of HIW there, but it seems other data in the field argues against that model. The mechanistic link between Hiw degradation and compartmentalization is unknown.

      We believe that a mechanistic understanding of how Wnd protein turnover is restricted to axons/axon terminals is beyond the scope of the current manuscript. We are actively investigating this interesting research question – please see our point-by-point response for details.

      Reviewer #2 (Public Review):

      Summary:

      Utilizing transgene expression of Wnd in sensory neurons in Drosophila, the authors found that Wnd is enriched in axonal terminals. This enrichment could be blocked by preventing palmitoylation or inhibiting Rab1 or Rab11 activity. Indeed, subsequent experiments showed that inhibiting Wnd can prevent toxicity by Rab11 loss of function.

      Strengths:

      This paper evaluates in detail Wnd location in sensory neurons, and identifies a novel genetic interaction between Rab11 and Wnd that affects Wnd cellular distribution.

      Weaknesses:

      The authors report low endogenous expression of wnd, and expressing mutant hiw or overexpressing wnd is necessary to see axonal terminal enrichment. It is unclear if this overexpression model (which is known to promote synaptic overgrowth) would be relevant to normal physiology.

      We agree that most of our subcellular localization studies were conducted using transgenes, which may not accurately reflect endogenous protein localization. Despite this technical limitation, our work addresses an important mechanistic link between DLK’s axonal localization and protein turnover in neuronal stress signaling and neurodegeneration.

      Additionally, most of our experiments were done using a kinase-dead form of Wnd or with DLKi treatment (DLK kinase inhibitor). Neurons do not display synaptic overgrowth phenotypes under these experimental conditions. Thus, the changes in Wnd axonal localization are likely independent of synaptic overgrowth phenotypes.

      Palmitoylation of the Wnd orthologue DLK in sensory neurons has previously been identified as important for DLK trafficking in a cell culture model.

      Palmitoylation of DLK has been studied in previous works including Holland et al. 2015. These are important works. However, there are significant differences from our findings. First, inhibiting DLK palmitoylation caused cytoplasmic localization of DLK. It has been reported that expression levels of wild-type and the palmitoylation-defective DLK (DLK-CS) in axons are not different in cultured sensory neurons (Holland 2015, Figure 2A and 2B). This could be simply because DLK-CS is entirely cytoplasmic and can readily diffuse into axons – which led to the conclusion that DLK palmitoylation is essential for DLK localization on motile axonal puncta. Second, because of this cytoplasmic localization, DLK-CS failed to induce downstream signaling (Holland 2015).

      However, the behavior of Wnd-CS in our study is entirely different. Wnd-CS does not show diffuse cytoplasmic localization but rather shows discrete localization in neuronal cell bodies (Figure 2E, Figure 2-supplement 1). Furthermore, Wnd-CS is able to induce downstream signaling (Figure 4 – supplement 1 and 2). Thus, our manuscript is not an extension of previously published work. Rather, our manuscript took advantage of this unique behavior of Wnd-CS and elucidated the biological function of the axonal localization of Wnd.

      The authors find genetic interaction between Wnd and Rab11, but these studies are incomplete and they do not support the authors' mechanistic interpretation.

      Our model proposes that Wnd is constantly transported to axon terminals for protein degradation (protein turnover), and that this process is essential to keep Wnd activity at low levels to prevent unwanted neuronal stress signaling. Based on this model, a failure in Wnd transport to axon terminals – as seen in Wnd-C130S or by Rab11 loss-of-function – would compromise protein degradation of Wnd and hence result in an excessive abundance of Wnd protein. This was clearly demonstrated for Wnd-C130S (Figure 3) and for Rab11 mutants (Figure 6E), which supports our model.

      To strengthen the link between Rab11 and Wnd, we have added data to our revised manuscript showing that a heterozygous mutation of wnd significantly rescued the eye degeneration phenotypes caused by Rab11 loss-of-function (new Figure 7C).

      We did not intend to imply that Rab11 directly mediates anterograde Wnd protein transport towards axon terminals. We re-worded the related text throughout our manuscript to avoid confusion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be interesting to overexpress Hiw in C4da neurons to see if this can degrade the C130S Wnd protein and reduce ERK signaling, or overexpress Hiw in the Rab11 mutant background to see if this can reduce the accumulation of Wnd or total Wnd levels. This could address the question of whether the reduction in Wnd turnover is due to Hiw's inaccessibility to Wnd.

      Thank you for your comment. We believe this question warrants an independent line of study. Although this is beyond the scope of the current work, we would like to share our findings here. We have found that overexpressing Hiw did not suppress the transgenic expression of Wnd-KD in C4da neurons regardless of cellular location. Interestingly, however, the same Hiw overexpression suppressed the increase in Wnd-KD expression caused by hiw mutations in C4da neuron axon terminals. Thus, it seems that endogenous levels of Hiw in the wild-type background are sufficient to suppress transgenic expression of Wnd-KD, and that excessive Hiw expression does not further enhance this effect. Currently, we do not know the mechanisms underlying these observations. One possibility is that Hiw functions exclusively in the context of an E3 ubiquitin ligase complex. Wu et al. (2007) found that DFsn is synaptically enriched and acts as an F-box protein of the Hiw E3 ligase complex. It is possible that DFsn or some other components of the Hiw E3 ligase complex determine the subcellular specificity of Hiw function. We are actively pursuing this research question.

      (2) The authors claim that Rab11 transports Wnd to the axon terminals. However, they do not see reliable colocalization of Rab11 and Wnd at axon terminals. Can the authors see Rab11-enriched vesicles with Wnd in nerve bundles, or is the role only to sort Wnd onto a post-recycling endosome compartment that moves to axonal terminals without Rab11?

      We apologize for the confusion. We did not intend to claim that Rab11 directly transports Wnd along axons. We suggested that Rab11 is necessary for axonal localization of Wnd by acting at the somatic recycling endosomes since Rab11 and Wnd extensively colocalize in the cell body but not in the axon terminals (Figure 6 and Figure 6 supplement 1). In our new “Figure 6 supplement 1”, we have now added Rab11 and Wnd colocalization in axons (segmental nerves). We also revised the text (lines 294-298): “On the other hand, we did not detect any meaningful colocalization between YFP::Rab11 and Wnd-KD::mRFP in C4da axon terminals or in axons (Manders’ coefficient 0.34 ± 0.14 and 0.41 ± 0.10 respectively) (Figure 6 – supplement 1). These suggest that Rab11 is involved in Wnd protein sorting at the somatic REs rather than transporting Wnd directly.” We also revised the Discussion (lines 396-398): “These further suggest that Rab11 is not directly involved in the anterograde long-distance transport of Wnd proteins, rather is responsible for sorting Wnd into the axonal anterograde transporting vesicles.”
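
      As an illustration of the colocalization measure quoted above, the following is a minimal Python sketch of Manders' coefficients. It assumes that pixel intensities have already been background-subtracted and flattened into two equal-length lists. The function name, the threshold parameters, and the toy intensities are hypothetical and are not taken from the study's analysis pipeline.

```python
def manders_coefficients(ch1, ch2, thr1=0.0, thr2=0.0):
    """Return (M1, M2) for two equal-length lists of pixel intensities.

    M1: fraction of the total channel-1 intensity found in pixels where
        channel 2 exceeds its threshold.
    M2: the same quantity with the channels swapped.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    total1 = sum(ch1)
    total2 = sum(ch2)
    if total1 == 0 or total2 == 0:
        raise ValueError("each channel needs a non-zero total intensity")
    # Channel-1 signal that sits in pixels occupied by channel 2.
    coloc1 = sum(a for a, b in zip(ch1, ch2) if b > thr2)
    # Channel-2 signal that sits in pixels occupied by channel 1.
    coloc2 = sum(b for a, b in zip(ch1, ch2) if a > thr1)
    return coloc1 / total1, coloc2 / total2


# Toy intensities (hypothetical values, not data from the study).
rab11_pixels = [10, 0, 5, 0, 5]
wnd_pixels = [0, 8, 4, 0, 0]
m1, m2 = manders_coefficients(rab11_pixels, wnd_pixels)
print(f"M1 = {m1:.2f}, M2 = {m2:.2f}")
```

      A coefficient close to 1 means that most of one channel's signal lies in pixels occupied by the other channel. The thresholds here act only as a simple background cut-off; published implementations differ in how thresholds are chosen.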

      (3) The authors mis-cite the Tortosa et al 2022 study which shows the exact opposite of what the authors state. Tortosa et al show DLK recruitment to vesicles through phosphorylation and palmitoylation is essential for its signaling, not the opposite, so the authors should reword that or remove the citation.

      We believe the citation is correct. Tortosa et al (2022) “Stress‐induced vesicular assemblies of dual leucine zipper kinase are signaling hubs involved in kinase activation and neurodegeneration” shows that membrane association of DLK, rather than palmitoylation itself, is sufficient for DLK signaling activation. For mammalian DLK, membrane association is achieved by palmitoylation. However, when artificially targeted to cellular membranes, palmitoylation-defective DLK (mammalian DLK-CS in their study) was able to induce DLK signaling. Specifically, in their Figure 2 (K-N), when targeted to the intracellular membranes of ER and mitochondria, DLK-CS (palmitoylation-defective DLK) elicited DLK signaling as shown by c-Jun phosphorylation.

      Reviewer #2 (Recommendations For The Authors):

      Major Concerns:

      (1) A concern is the overinterpretation of results. The authors find the accumulation of Wnd in axon terminals when they express hiw null or when they overexpress Wnd, but extrapolate that this occurs in "normal conditions" without evidence. Could the increase of Wnd in the axonal terminal be in the setting of known synaptic overgrowth associated with transgene expression?

      Most of our work was conducted using a kinase-dead version of Wnd (Wnd-KD) in a wild-type background (Figure 1C and Figure 1 supplement 1). Moreover, Wnd kinase activity does not affect Wnd axonal localization in our experimental settings (Figure 1 supplement 1).

      When using the hiw mutant background, the larvae were treated with Wnd kinase inhibitor, which prevented excessive axonal growth (Figure 1E, bottom right image – note that there is no axonal overgrowth in this condition). Additionally, Wnd-C130S is expressed at lower levels in axon terminals than Wnd (Figure 3B) while exhibiting similar axon overgrowth (Figure 4 supplement 1B). Taken together, axonal overgrowth is unlikely to affect axonal protein localization of Wnd.

      (2) The interpretation of these results is based on a supposition that Rab11 anterogradely transports Wnd along axons without evidence for this. Indeed, it has been shown that Rab11 is excluded from axons in mature neurons, but can be mislocalized when overexpressed. This should be addressed in their discussion.

      We apologize for the confusion. We did not intend to suggest that Rab11 directly transports Wnd along axons. We suggested that Rab11 is necessary for axonal localization of Wnd by acting at the somatic recycling endosomes since Rab11 and Wnd extensively colocalize in the cell body but not in the axon terminals (Figure 6 and Figure 6 supplement 1). In our new “Figure 6 supplement 1”, we have now added Rab11 and Wnd colocalization in axons (segmental nerves). We also revised the text (lines 296-298): “On the other hand, we did not detect any meaningful colocalization between YFP::Rab11 and Wnd-KD::mRFP in C4da axon terminals or in axons (Manders’ coefficient 0.34 ± 0.14 and 0.41 ± 0.10 respectively) (Figure 6 – supplement 1). These suggest that Rab11 is involved in Wnd protein sorting at the somatic REs rather than transporting Wnd directly.” We also revised the Discussion (lines 396-398): “These further suggest that Rab11 is not directly involved in the anterograde long-distance transport of Wnd proteins, rather is responsible for sorting Wnd into the axonal anterograde transporting vesicles.”

      (3) In Figure 1, the authors should also show images of Wnd-GFSTF in wild-type (non-hiw mutations) to show endogenous Wnd levels in the axon terminal.

      We have now added images of Wnd-GFSTF in the wild-type background (new Figure 1A). To show comparable fluorescence intensities, we also repeated the hiw mutant experiment and replaced the old images.

      (4) For Figure 1- Supplement, the authors state that the kinase-dead version of Wnd exhibited similar axonal enrichment in comparison to Wnd::GFP in the presence and absence of DLKi. This statement would be better supported with images specifically showing this (for example Wnd-KD::GFP compared to Wnd:GFP with DLKi and Wnd:GFP without DLKi).

      We did not show the images from Wnd::GFP (DLKi) in this supplement figure because it would be redundant with Figure 1C. Rather, we presented the axonal enrichment index for Wnd::GFP (DLKi), Wnd-KD::GFP, Wnd-KD::GFP (DLKi), and Wnd-KD::GFP (DMSO) in Figure 1 supplement 1B.

      Overexpressing catalytically active Wnd dramatically lowers ppk-GAL4 activity in C4da neurons, which prevents us from performing an experiment with Wnd::GFP without DLKi. In this condition, Wnd::GFP expression is barely detectable in C4da neurons.

      (5) In Figure 2 - Supplement 3 the authors state that their data suggests that Wnd protein palmitoylation is catalyzed by HIP14 due to colocalization in the somatic Golgi and mutating HIP14 leads to less Wnd in the axon terminal. This statement would be better supported by evaluating Wnd's palmitoylation via immunoprecipitation in response to dHIP14 enzyme activity.

      We appreciate the reviewer’s comment. Although the exact identity of the Wnd palmitoyltransferase might be of high interest, our study instead concerns the biological role of Wnd axonal localization. Moreover, the DLK palmitoyltransferase has been identified in mammalian cell culture and worm studies (Niu et al. 2020 “Coupled Control of Distal Axon Integrity and Somal Responses to Axonal Damage by the Palmitoyl Acyltransferase ZDHHC17”). ZDHHC17 is another name for HIP14. Our data, together with these published works, strongly suggest that Wnd, the Drosophila DLK, might also be targeted by Drosophila HIP14 (dHIP14).

      (6) The authors argue that palmitoylation of Wnd is essential for axonal localization of Wnd. If dHIP14 indeed palmitoylates Wnd as the authors claim, shouldn't there be a decrease in Wnd's palmitoylation within dHIP14 mutants, consequently resulting in its accumulation in the cell body rather than localization in the axonal terminal? However, Wnd is reduced at the axon terminal in dHip14 mutants, but it does not appear to increase in the cell body (Figure 2S3.C). This observation contradicts the results showing increased Wnd in the cell body presented in Figure 2. B and E. This discrepancy should be addressed.

      Thank you for your comment. Our study concerns the biological role of Wnd axonal localization. In an ideal model, dHIP14 mutations should prevent Wnd palmitoylation and cause subsequent cell body accumulation. However, it is highly likely that dHIP14 mutations affect the palmitoylation of a large number of proteins, not just Wnd, which likely changes many aspects of cell function. We envision that Wnd protein expression might be indirectly affected by these changes. In this context, mutating C130 in Wnd can be considered a more targeted approach, and our data clearly show that such Wnd mutations cause Wnd accumulation in cell bodies.

      (7) Figure 3 - the authors show increased Wnd protein by Western blot in WndC130S:GFP compared to Wnd::GFP. qPCR experiments to show similar mRNA expression of these two transgenes would be an important control, if it's thought that the increase of protein is due to reduction of protein degradation.

      Thank you for your comment. WndC130S::GFP and Wnd::GFP were expressed using the GAL4-UAS system, not through the endogenous wnd promoter. Thus, we do not expect different mRNA abundance of WndC130S::GFP and Wnd::GFP. However, your concern is valid for Rab11 mutants. We measured wnd mRNA abundance by RT-qPCR and found that Rab11 mutations did not increase wnd mRNA levels (Figure 6 - Supplement 2). Rather, we observed a consistent reduction in wnd mRNA levels in Rab11 mutants. Please note that total Wnd protein levels were significantly increased by Rab11 mutations. We currently do not have a clear explanation. We envision that the dramatic increase in Wnd signaling (i.e., JNK signaling, Figure 7A) induces negative feedback that reduces wnd mRNA levels (lines 313-317).

      (8) Figure 4 Supplement - the authors report that Wnd::GFP causes robust induction of Puc-LacZ. A control without Wnd::GFP expression would be necessary to support that there was an induction.

      We have added control data of UAS-Wnd-KD::GFP (new Figure 4 supplement 1A). Since this required a new side-by-side comparison of fluorescence intensities, we repeated the full set of experiments and replaced our old data sets. The results confirmed that both Wnd::GFP and Wnd-C130S::GFP induce puc-lacZ expression.

      (9) Previously it was shown that inhibiting palmitoylation of DLK prevented activation of JNK signaling (Holland et al 2015), but the authors show in Figure 4A instead an increase of JNK signaling. This discrepancy should be addressed.

      The use of the Wnd palmitoylation-defective mutant in our study was only possible because Wnd-C130S behaves differently from palmitoylation-defective DLK. Unlike the diffuse cytoplasmic localization of palmitoylation-defective DLK in mammalian cells or in C. elegans neurons, Wnd-C130S exhibited clear punctate localization in neuronal cell bodies, which extensively co-localizes with the somatic Golgi complex (Figure 2E and Figure 2 supplement 1). Tortosa et al (2022) showed that palmitoylation-defective DLK (DLK-CS) can trigger DLK signaling when artificially targeted to intracellular membranous organelles (Tortosa 2022, Figure 2 (K-N)). Thus, we reasoned that, unlike palmitoylation-defective DLK from mammals and worms, the Drosophila DLK, Wnd, might be catalytically active when mutated on Cysteine 130 because of its punctate localization.

      (10) Figure 6 Supplement - the Rab11 staining is not in a pattern that would be expected with endosomes. A control of just YFP would be useful to determine if this fluorescence signal is specific to Rab11. Can endogenous Rab11 be detected in axons or in the axonal terminal?

      In our model system, endogenously tagged Rab11 (TI-Rab11) does not show clear punctate patterns in segmental nerves (axons) and neuropils (axon terminals), nor does it colocalize with Wnd-KD. This is indeed related to the reviewer’s comment #2, which suggests that Rab11 does not form endosomes in distal axons or axon terminals in mature neurons. Expressed Rab11 transgenes exhibited some punctate structures in axons (segmental nerves) (new Figure 6 supplement 1). However, they did not show meaningful colocalization with Wnd-KD. These observations are consistent with our model that Rab11 acts in neuronal cell bodies for Wnd axonal transport, likely via a sorting process.

      (11) There is growing evidence that palmitoylation is important for cargo sorting in the Golgi, and Rab11 is also located at the Golgi and important for trafficking from the Golgi. A mechanism that could be considered from your data is that blocking palmitoylation impairs sorting at the Golgi and trafficking from the Golgi, as opposed to impairing fast axonal transport. Indeed, Rab11 has been shown to be blocked from axons in mature neurons, making Rab11 unlikely to be responsible for the fast axonal transport of Wnd. Direct evidence of Rab11 transporting Wnd in axons would be necessary for the claim that Rab11 constantly transports DLK to terminals.

      We apologize for the confusion. We did not intend to suggest that Rab11 directly transports Wnd along the axons. We suggested that Rab11 is necessary for axonal localization of Wnd by acting at the somatic recycling endosomes since Rab11 and Wnd extensively colocalize in the cell body but not in the axon terminals (Figure 6 and Figure 6 supplement 1). In our new “Figure 6 supplement 1”, we have now added Rab11 and Wnd colocalization in axons (segmental nerves). We also revised the text (lines 296-298): “On the other hand, we did not detect any meaningful colocalization between YFP::Rab11 and Wnd-KD::mRFP in C4da axon terminals or in axons (Manders’ coefficient 0.34 ± 0.14 and 0.41 ± 0.10 respectively) (Figure 6 – supplement 1). These suggest that Rab11 is involved in Wnd protein sorting at the somatic REs rather than transporting Wnd directly.” We also revised the Discussion (lines 394-398): “These further suggest that Rab11 is not directly involved in the anterograde long-distance transport of Wnd proteins, rather is responsible for sorting Wnd into the axonal anterograde transporting vesicles.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work used a comprehensive dataset to compare the effects of species diversity and genetic diversity within each trophic level and across three trophic levels. The results showed that species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects. These effects were observed only within each trophic level and not across the three trophic levels studied. Although the effects of biodiversity, especially genetic diversity across multi-trophic levels, have been shown to be important, there are still very few empirical studies on this topic due to the complex relationships and difficulty in obtaining data. This study collected an excellent dataset to address this question, enhancing our understanding of genetic diversity effects in aquatic ecosystems.

      Strengths:

      The study collected an extensive dataset that includes species diversity of primary producers (riparian trees), primary consumers (macroinvertebrate shredders), and secondary consumers (fish). It also includes the genetic diversity of the dominant species at each trophic level, biomass production, decomposition rates, and environmental data.

      The conclusions of this paper are mostly well supported by the data and the writing is logical and easy to follow.

      Weaknesses:

      While the dataset is impressive, the authors conducted analyses more akin to a "meta-analysis," leaving out important basic information about the raw data in the manuscript. Given the complexity of the relationships between different trophic levels and ecosystem functions, it would be beneficial for the authors to show the results of each SEM (structural equation model).

      We understand the point raised by the reviewer. Our objective was to focus the Results section on the main hypotheses, and for this reason we left out the raw statistics. We can certainly show the seven individual SEMs, highlighting the major links, which may help to understand some processes. This will be done in the next version of the manuscript.

      The main results presented in the manuscript are derived from a "metadata" analysis of effect sizes. However, the methods used to obtain these effect sizes are not sufficiently clarified. By analyzing the effect sizes of species diversity and genetic diversity on these ecosystem functions, the results showed that species diversity had negative effects, while genetic diversity had positive effects on ecosystem functions. The negative effects of species diversity contradict many studies conducted in biodiversity experiments. The authors argue that their study is more relevant because it is based on a natural system, which is closer to reality, but they also acknowledge that natural systems make it harder to detect underlying mechanisms. Providing more results based on the raw data and offering more explanations of the possible mechanisms in the introduction and discussion might help readers understand why and in what context species diversity could have negative effects.

      We hope you are right. As stated above, we will explore this possibility.

      Environmental variation was included in the analyses to test if the environment would modulate the effects of biodiversity on ecosystem functions. However, the main results and conclusions did not sufficiently address this aspect.

      This will be addressed by the more in-depth analysis of the individual SEMs, and we will discuss this further.

      Reviewer #2 (Public review):

      Summary:

      Fargeot et al. investigated the relative importance of genetic and species diversity on ecosystem function and examined whether this relationship varies within or between trophic-level responses. To do so, they conducted a well-designed field survey measuring species diversity at 3 trophic levels (primary producers [trees], primary consumers [macroinvertebrate shredders], and secondary consumers [fishes]), genetic diversity in a dominant species within each of these 3 trophic levels and 7 ecosystem functions across 52 riverine sites in southern France. They show that the effect of genetic and species diversity on ecosystem functions are similar in magnitude, but when examining within-trophic level responses, operate in different directions: genetic diversity having a positive effect and species diversity a negative one. This data adds to growing evidence from manipulated experiments that both species and genetic diversity can impact ecosystem function and builds upon this by showing these effects can be observed in nature.

      Strengths:

      The study design has resulted in a robust dataset to ask questions about the relative importance of genetic and species diversity of ecosystem function across and within trophic levels.

      Overall, their data supports their conclusions - at least within the system that they are studying - but as mentioned below, it is unclear from this study how general these conclusions would be.

      Weaknesses:

      (1) While a robust dataset, the authors only show the data output from the SEM (i.e., effect size for each individual diversity type per trophic level (6) on each ecosystem function (7)), instead of showing much of the individual data. Although the summary SEM results are interesting and informative, I find that a weakness of this approach is that it is unclear how environmental factors (which were included but not discussed in the results) nor levels of diversity were correlated across sites. As species and genetic diversity are often correlated but also can have reciprocal feedbacks on each other (e.g., Vellend 2005), there may be constraints that underpin why the authors observed positive effects of one type of diversity (genetic) when negative effects of the other (species). It may have also been informative to run SEM with links between levels of diversity. By focusing only on the summary of SEM data, the authors may be reducing the strength of their field dataset and ability to draw inferences from multiple questions and understand specific study-system responses.

      We will address this issue by performing a more in-depth analysis of each individual SEM and by providing these raw data directly. Regarding the comment on species-genomic diversity correlations (SGDCs), we would like to point out that this has already been addressed in a previous paper (Fargeot et al. Oikos, 2023). There are actually no correlations between genomic and species diversity in this dataset, which is simply explained by the selection of the sampling sites. The relationships between species diversity, genomic diversity and environmental factors are also detailed in Fargeot et al. (2023). We deliberately published this paper first to focus here “only” on BEFs. But we realize we need to provide further information and discuss these issues further. This will be done in the next version of the manuscript.

      (2) My understanding of SEM is it gives outputs of the strength/significance of each pathway/relationship and if so, it isn't clear why this wasn't used and instead, confidence intervals of Z scores to determine which individual BEFs were significant. In addition, an inclusion of the 7 SEM pathway outputs would have been useful to include in an appendix.

      Yes, we can provide p-values. P-values will provide the same information as 95% CIs; both yield very similar (if not exactly the same) results and conclusions. We will provide the 7 SEMs in the Appendices.
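
      The equivalence between p-values and 95% CIs mentioned here can be sketched in a few lines of Python. This is a minimal illustration, assuming each effect size is approximately normally distributed with a known standard error (a Wald-type interval); the actual procedure used for the SEM-derived effect sizes is not described in this response, and the function names and numbers below are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wald_summary(effect, se, z_crit=1.959963984540054):
    """Return (lower, upper, p) for a normally distributed effect size.

    lower/upper: bounds of the 95% confidence interval.
    p: two-sided p-value for the null hypothesis effect = 0.
    """
    lower = effect - z_crit * se
    upper = effect + z_crit * se
    p = 2.0 * (1.0 - normal_cdf(abs(effect / se)))
    return lower, upper, p


def same_conclusion(effect, se):
    """True if 'CI excludes zero' and 'p < 0.05' agree for this effect."""
    lower, upper, p = wald_summary(effect, se)
    ci_excludes_zero = lower > 0 or upper < 0
    return ci_excludes_zero == (p < 0.05)


# Hypothetical effect sizes and standard errors, for illustration only.
for effect, se in [(0.30, 0.10), (0.10, 0.10), (-0.25, 0.08)]:
    lower, upper, p = wald_summary(effect, se)
    print(f"effect={effect:+.2f}  CI=[{lower:+.3f}, {upper:+.3f}]  p={p:.4f}")
```

      Under this normal approximation, the 95% interval excludes zero exactly when the two-sided p-value falls below 0.05, which is why the two summaries are expected to lead to the same conclusions.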

      (3) I don't fully agree with the authors calling this a meta-analysis, as this is a single study of multiple sites within a single region and a specific time point, and not a collection of multiple studies or ecosystems conducted by multiple authors. Moreso, the authors are using meta-analysis summary metrics to evaluate their data. The authors tend to focus on these patterns as general trends, but as the data is all from this riverine system this study could have benefited from focusing on what was going on in this system to underpin these patterns. I'd argue more data is needed to know whether across sites and ecosystems, species diversity and genetic diversity have opposite effects on ecosystem function within trophic levels.

      We agree. “Meta-regression” would perhaps be more adequate than “meta-analysis”. As stated above, more details will be provided in the next version of the manuscript.

      Reviewer #3 (Public review):

      The manuscript by Fargeot and colleagues assesses the relative effects of species and genetic diversity on ecosystem functioning. This study is very well written and examines the interesting question of whether within-species or among-species diversity correlates with ecosystem functioning, and whether these effects are consistent across trophic levels. The main findings are that genetic diversity appears to have a stronger positive effect on function than species diversity (which appears negative). These results are interesting and have value.

      However, I do have some concerns that could influence the interpretation.

      (1) Scale: the different measures of diversity and function for the different trophic levels are measured over very different spatial scales, for example, trees along 200 m transects and 15 cm traps. It is not clear whether trees 200 m away are having an effect on small-scale function.

      Tree identification and invertebrate (and fish) sampling were done at the same scale. Trees are spread along the river so that their leaves fall directly into the river. Traps were installed all along the same transect in various micro-habitats. Diversity was measured at exactly the same scale for all organisms. We will try to be more precise.

      (2) Size of diversity gradients: More information is needed on the actual diversity gradients. One of the issues with surveys of natural systems is that they are of species that have already gone through selection filters from a regional pool, and theoretically, if the environments are similar, you should get similar sets of species, without monocultures. So, if the species diversity gradients range from say, 6 to 8 species, but genetic diversity gradients span an order of magnitude more, you can explain much more variance with genetic diversity. Related to this, species diversity effects on function are often asymptotic at high diversity and so if you are only sampling at the high diversity range, we should expect a strong effect.

      We will provide more information. The range of diversity also varies according to the trophic level; there are more invertebrate species than fish species. But overall the range of species number is large.

      (3) Ecosystem functions: The functions are largely biomass estimates (except decomposition), and I fail to see how the biomass of a single species can be construed as an ecosystem function. Aren't you just estimating a selection effect in this case?

      The biomass estimated for a certain area represents an estimate of productivity, whatever the number of species being considered. Obviously, the productivity of a species can be due to environmental constraints; the biomass is expected to be lower at the niche margin (selection effect). But if these environmental effects are taken into account (which is the case in the SEMs), then the residual variation can be explained by biodiversity effects. We will try to make this clearer.

      Note that the article claims to be one of the only studies to look at function across trophic levels, but there are several others out there, for example:

      Thanks, we will cite some of these studies (and make our claim less strong).

      Li, F., Altermatt, F., Yang, J., An, S., Li, A., & Zhang, X. (2020). Human activities' fingerprint on multitrophic biodiversity and ecosystem functions across a major river catchment in China. Global change biology, 26(12), 6867-6879.

      Luo, Y. H., Cadotte, M. W., Liu, J., Burgess, K. S., Tan, S. L., Ye, L. J., ... & Gao, L. M. (2022). Multitrophic diversity and biotic associations influence subalpine forest ecosystem multifunctionality. Ecology, 103(9), e3745.

      Moi, D. A., Romero, G. Q., Antiqueira, P. A., Mormul, R. P., Teixeira de Mello, F., & Bonecker, C. C. (2021). Multitrophic richness enhances ecosystem multifunctionality of tropical shallow lakes. Functional Ecology, 35(4), 942-954.

      Wan, B., Liu, T., Gong, X., Zhang, Y., Li, C., Chen, X., ... & Liu, M. (2022). Energy flux across multitrophic levels drives ecosystem multifunctionality: Evidence from nematode food webs. Soil Biology and Biochemistry, 169, 108656.

      And the case was made strongly by:

      Seibold, S., Cadotte, M. W., MacIvor, J. S., Thorn, S., & Müller, J. (2018). The necessity of multitrophic approaches in community ecology. Trends in ecology & evolution, 33(10), 754-764.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Line 127. Provide a few more words describing the voltage protocol. To the uninitiated, panels A and B will be difficult to understand. "The large negative step is used to first close all channels, then probe the activation function with a series of depolarizing steps to re-open them and obtain the max conductance from the peak tail current at -36 mV. "

      We have revised the text as suggested (revision lines 127 to Line 131): “From a holding potential within the gK,L activation range (here –74 mV), the cell is hyperpolarized to –124 mV, negative to EK and the activation range, producing a large inward current through open gK,L channels that rapidly decays as the channels deactivate. We use the large transient inward current as a hallmark of gK,L. The hyperpolarization closes all channels, and then the activation function is probed with a series of depolarizing steps, obtaining the max conductance from the peak tail current at –44 mV (Fig. 1A).”

      Incidentally, why does the peak tail current decay? 

      We added this text to the figure legend to explain this: “For steps positive to the midpoint voltage, tail currents are very large. As a result, K+ accumulation in the calyceal cleft reduces driving force on K+, causing currents to decay rapidly, as seen in A (Lim et al., 2011).”

      The decay of the peak tail current is a feature of gK,L (large K+ conductance) and the large enclosed synaptic cleft (which concentrates K+ that effluxes from the HC). See Govindaraju et al. (2023) and Lim et al. (2011) for modeling and experiments around this phenomenon.

      Line 217-218. For some reason, I stumbled over this wording. Perhaps rearrange as "In type II HCs absence of Kv1.8 significantly increased Rin and tauRC. There was no effect on Vrest because the conductances to which Kv1.8 contributes, gA and gDR activate positive to the resting potential. (so which K conductances establish Vrest???). 

      We kept our original wording because we wanted to discuss the baseline (Vrest) before describing responses to current injection.

      ->Vrest is presumably maintained by ATP-dependent Na/K exchangers (ATP1a1), HCN, Kir, and mechanotransduction currents. Repolarization is achieved by delayed rectifier and A-type K+ conductances in type II HCs.

      Figure 4, panel C - provides absolute membrane potential for voltage responses. Presumably, these were the most 'ringy' responses. Were they obtained at similar Vm in all cells (i.e., comparisons of Q values in lines 229-230). 

      We added the absolute membrane potential scale. Type II HC protocols all started with 0 pA current injection at baseline, so they were at their natural Vrest, which did not differ by genotype or zone. Consistent with Q depending on expression of conductances that activate positive to Vrest, Q did not co-vary with Vrest (Pearson’s correlation coefficient = 0.08, p = 0.47, n= 85).
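
      As a rough consistency check on the reported correlation, the sketch below converts a Pearson coefficient and a sample size into an approximate two-sided p-value. It uses the Fisher z-transform with a normal approximation, which is an assumption made here to keep the example free of dependencies; the response does not state which test the authors used, and the function name is hypothetical.

```python
import math

def pearson_p_approx(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson r via the Fisher z-transform."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    # Fisher z-transform: atanh(r) is roughly normal with standard error 1/sqrt(n - 3)
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

# Values quoted in the response: r = 0.08, n = 85
p = pearson_p_approx(0.08, 85)
print(f"approximate p = {p:.2f}")
```

      With the quoted r and n, the approximation lands close to the reported p = 0.47, which is consistent with Q not co-varying with Vrest.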

      Lines 254. Staining is non-specific? Rather than non-selective? 

      Yes, thanks - Corrected (Line 264).

      Figure 6. Do you have a negative control image for Kv1.4 immuno? Is it surprising that this label is all over the cell, but Kv1.8 is restricted to the synaptic pole? 

      We don’t have a null-animal control because this immunoreactivity was done in rat. While the cuticular plate staining was most likely nonspecific because we see that with many different antibodies, it’s harder to judge the background staining in the hair cell body layer. After feedback from the reviewers, we decided to pull the KV1.4 immunostaining from the paper because of the lack of null control, high background, and inability to reproduce these results in mouse tissue. In our hands, in mouse tissue, both mouse and rabbit anti-KV1.4 antibodies failed to localize to the hair cell membrane. Further optimization or another method could improve that, but for now the single-cell expression data (McInturff et al., 2018) remain the strongest evidence for KV1.4 expression in murine type II hair cells.

      Lines 400-404. Whew, this is pretty cryptic. Expand a bit? 

      We simplified this paragraph (revision lines 411-413): “We speculate that gA and gDR(KV1.8) have different subunit composition: gA may include heteromers of KV1.8 with other subunits that confer rapid inactivation, while gDR(KV1.8) may comprise homomeric KV1.8 channels, given that they do not have N-type inactivation.”

      Line 428. 'importantly different ion channels'. I think I understand what is meant but perhaps say a bit more. 

      Revised (Line 438): “biophysically distinct and functionally different ion channels”.

      Random thought. In addition to impacting Rin and TauRC, do you think the more negative Vrest might also provide a selective advantage by increasing the driving force on K entry from endolymph? 

      When the calyx is perfectly intact, gK,L is predicted to make Vrest less negative than the values we report in our paper, where we have disturbed the calyx to access the hair cell (–80, Govindaraju et al., 2023, vs. –87 mV, here). By enhancing K+ accumulation in the calyceal cleft, the intact calyx shifts EK—and Vrest—positively (Lim et al., 2011), so the effect on driving force may not be as drastic as what you are thinking.

      Reviewer #2 (Recommendations For The Authors): 

      (1) Introduction: wouldn't the small initial paragraph stating the main conclusion of the study fit better at the end of the background section, instead of at the beginning? 

      Thank you for this idea. We tried that, but settled on this direct approach to let people know in advance what the goals of the paper are.

      (2) Pg.4: The following sentence is rather confusing "Between P5 and P10, we detected no evidence of a non-gK,L KV1.8-dependent.....". Also, Suppl. Fig 1A seems to show that between P5 and P10 hair cells can display a potassium current having either a hyperpolarised or depolarised Vhalf. Thus, I am not sure I understand the above statement. 

      Thank you for pointing out unclear wording. We used the more common “delayed rectifier” term in our revision (Lines 144-147): “Between P5 and P10, some type I HCs have not yet acquired the physiologically defined conductance, gK,L. No effects of KV1.8 deletion were seen in the delayed rectifier currents of immature type I HCs (Suppl. Fig. 1B), showing that they are not immature forms of the Kv1.8-dependent gK,L channels.”

      (3) For the reduced Cm of hair cells from Kv1.8 knockout mice, could another reason be simply the immature state of the hair cells (i.e. lack of normal growth), rather than less channels in the membrane? 

      There were no other signs to suggest immaturity or abnormal growth in KV1.8–/– hair cells or mice. Importantly, type II HCs did not show the same Cm effect.

      We further discussed the capacitance effect in lines 160-167: “Cm scales with surface area, but soma sizes were unchanged by deletion of KV1.8 (Suppl. Table 2). Instead, Cm may be higher in KV1.8+/+ cells because of gK,L for two reasons. First, highly expressed trans-membrane proteins (see discussion of gK,L channel density in Chen and Eatock, 2000) can affect membrane thickness (Mitra et al., 2004), which is inversely proportional to specific Cm. Second, gK,L could contaminate estimations of capacitive current, which is calculated from the decay time constant of transient current evoked by small voltage steps outside the operating range of any ion channels. gK,L has such a negative operating range that, even for Vm negative to –90 mV, some gK,L channels are voltage-sensitive and could add to capacitive current.”

      (4) Methods: The electrophysiological part states that "For most recordings, we used .....". However, it is not clear what has been used for the other recordings.

      Thanks for catching this error, a holdover from an earlier ms. version.  We have deleted “For most recordings” (revision line 466).

      Also, please provide the sign for the calculated 4 mV liquid junction potential. 

      Done (revision line 476).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Some of the data in panels in Fig. 1 are hard to match up. The voltage protocols shown in A and B show steps from hyperpolarized values to -71mV (A) and -32 mV (B). However, the value from A doesn't seem to correspond with the activation curve in C.

      Thank you for catching this.  We accidentally showed the control I-X curve from a different cell than that in A. We now show the G-V relation for the cell in A.

      Also the Vhalf in D for -/- animals is ~-38 mV, which is similar to the most positive step shown in the protocol.

      The most positive step in Figure 1B is actually –25 mV. The uneven tick labels might have been confusing, so we re-labeled them to be more conventional.

      Were type I cells stepped to more positive potentials to test for the presence of voltage-activated currents at greater depolarizations? This is needed to support the statement on lines 147-148. 

      We added “no additional K+ conductance activated up to +40 mV” (revision line 149-150).  Our standard voltage-clamp protocol iterates up to ~+40 mV in KV1.8–/– hair cells, but in Figure 1 we only showed steps up to –25 mV because K+ accumulation in the synaptic cleft with the calyx distorts the current waveform even for the small residual conductances of the knockouts. KV1.8–/– hair cells have a main KV conductance with a Vhalf of ~–38 mV, as shown in Figure 1, and we did not see an additional KV conductance that activated with a more positive Vhalf up to +40 mV.
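
      For readers less familiar with activation curves, the following minimal sketch shows how a first-order Boltzmann function relates conductance to voltage around a midpoint (Vhalf). Only the Vhalf of about –38 mV comes from the response; the normalized maximum conductance and the slope factor are placeholder assumptions, and the function name is hypothetical. It is an illustration, not the authors' fitting procedure.

```python
import math

def boltzmann_g(v_mv: float, g_max: float, v_half_mv: float, slope_mv: float) -> float:
    """First-order Boltzmann activation: conductance as a function of voltage."""
    return g_max / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))

# V_half of about -38 mV is taken from the response; the rest are placeholders.
V_HALF_MV = -38.0
G_MAX = 1.0      # normalized conductance (assumed)
SLOPE_MV = 5.0   # hypothetical slope factor, not a reported value

for v in (-124.0, -38.0, -25.0, 40.0):
    g = boltzmann_g(v, G_MAX, V_HALF_MV, SLOPE_MV)
    print(f"{v:7.1f} mV -> G/Gmax = {g:.3f}")
```

      By construction the conductance is half-maximal at Vhalf, so a second conductance with a more positive Vhalf would appear as an additional rise in the curve at depolarized steps.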

      (2) Line 151 states "While the cells of Kv1.8-/- appeared healthy..." how were epithelia assessed for health? Hair cells arise from support cells and it would be interesting to know if Kv1.8 absence influences supporting cells or neurons. 

      We added our criteria for cell health to lines 477-479: “KV1.8–/– hair cells appeared healthy in that cells had resting potentials negative to –50 mV, cells lasted a long time (20-30 minutes) in ruptured patch recordings, membranes were not fragile, and extensive blebbing was not seen.”

      Supporting cells were not routinely investigated. We characterized calyx electrical activity (passive membrane properties, voltage-gated currents, firing pattern) and didn’t detect differences between +/+, +/–, and –/– recordings (data not shown). KV1.8 was not detected in neural tissue (Lee et al., 2013). 

      (3) Several different K+ channel subtypes were found to contribute to inner hair cell K+ conductances (Dierich et al. 2020) but few additional K+ channel subtypes are considered here in vestibular hair cells. Further comments on calcium-activated conductances (lines 310-317) would be helpful since apamin-sensitive SK conductances are reported in type II hair cells (Poppi et al. 2018) and large iberiotoxin-sensitive BK conductances in type I hair cells (Contini et al. 2020). Were iberiotoxin effects studied at a range of voltages and might calcium-dependent conductances contribute to the enhanced resonance responses shown in Fig. 4? 

      We refer you to lines 310-317 in the original ms (lines 322-329 in the revised ms), where we explain possible reasons for not observing IK(Ca) in this study.

      (4) Similar to GK,L erg (Kv11) channels show significant Cs+-permeability. Were experiments using Cs+ and/or Kv11 antagonists performed to test for Kv11? 

      No. Hurley et al. (2006) used Kv11 antagonists to reveal Kv11 currents in rat utricular type I hair cells with perforated patch, which were also detected in rats with single-cell RT-PCR (Hurley et al. 2006) and in mice with single-cell RNAseq (McInturff et al., 2018).  They likely contribute to hair cell currents, alongside Kv7, Kv1.8, HCN1, and Kir. 

      (5) Mechanosensitive ("MET") channels in hair cells are mentioned on lines 234 and 472 (towards the end of the Discussion), but a sentence or two describing the sensory function of hair cells in terms of MET channels and K+ fluxes would help in the Introduction too. 

      Following this suggestion we have expanded the introduction with the following lines  (78-87): “Hair cells are known for their large outwardly rectifying K+ conductances, which repolarize membrane voltage following a mechanically evoked perturbation and in some cases contribute to sharp electrical tuning of the hair cell membrane.  Because gK,L is unusually large and unusually negatively activated, it strongly attenuates and speeds up the receptor potentials of type I HCs (Correia et al., 1996; Rüsch and Eatock, 1996b). In addition, gK,L augments a novel non-quantal transmission from type I hair cell to afferent calyx by providing open channels for K+ flow into the synaptic cleft (Contini et al., 2012, 2017, 2020; Govindaraju et al., 2023), increasing the speed and linearity of the transmitted signal (Songer and Eatock, 2013).”

      (6) Lines 258-260 state that GKL does not inactivate, but previous literature has documented a slow type of inactivation in mouse crista and utricle type I hair cells (Lim et al. 2011, Rusch and Eatock 1996) which should be considered. 

      Lim et al. (2011) concluded that K+ accumulation in the synaptic cleft can explain much of the apparent inactivation of gK,L. In our paper, we were referring to fast, N-type inactivation. We changed that line to be more specific; new revision lines 269-271: “KV1.8, like most KV1 subunits, does not show fast inactivation as a heterologously expressed homomer (Lang et al., 2000; Ranjan et al., 2019; Dierich et al., 2020), nor do the KV1.8-dependent channels in type I HCs, as we show, and in cochlear inner hair cells (Dierich et al., 2020).”

      (7) Lines 320-321 Zonal differences in inward rectifier conductances were reported previously in bird hair cells (Masetto and Correia 1997) and should be referenced here.

      Zonal differences were reported by Masetto and Correia for type II but not type I avian hair cells, which is why we emphasize that we found a zonal difference in I-H in type I hair cells. We added two citations to direct readers to type II hair cell results (lines 333-334): “The gK,L knockout allowed identification of zonal differences in IH and IKir in type I HCs, previously examined in type II HCs (Masetto and Correia, 1997; Levin and Holt, 2012).”

      Also, Horwitz et al. (2011) showed HCN channels in utricles are needed for normal balance function, so please include this reference (see line 171). 

      Done (line 184).

      (8) Fig 6A. Shows Kv1.4 staining in rat utricle but procedures for rat experiments are not described. These should be added. Also, indicate striola or extrastriola regions (if known). 

      We removed KV1.4 immunostaining from the paper, see above.

      (9) Table 6, ZD7288 is listed -was this reagent used in experiments to block Gh? If not please omit. 

      ZD7288 was used to block gH to produce a clean h-infinity curve in Figure 6, which is described in the legend.

      (10) In supplementary Fig. 5A make clear if the currents are from XE991 subtraction. Also, is the G-V data for single cell or multiple cells in B? It appears to be from 1 cell but ages P11-505 are given in legend. 

      The G-V curve in B is from XE991 subtraction, and average parameters in the figure caption are for all the KV1.8–/–  striolar type I hair cells where we observed this double Boltzmann tail G-V curve. I added detail to the figure caption to explain this better.

      (11) Supplementary Fig. 6A claims a fast activation of inward rectifier K+ channels in type II but not type I cells-not clear what exactly is measured here.

      We use “fast inward rectifier” to indicate the inward current that increases within the first 20 ms after hyperpolarization from rest (IKir, characterized in Levin & Holt, 2012) in contrast to HCN channels, which open over ~100 ms. We added panel C to show that the activation of IKir is visible in type II hair cells but not in the knockout type I hair cells that lack gK,L. IKir was a reliable cue to distinguish type I and type II hair cells in the knockout.

      For our actual measurements in Fig 6B, we quantified the current flowing after 250 ms at –124 mV because we did not pharmacologically separate IKir and IH.

      Could the XE991-sensitive current be activated and contributing?

      The XE991-sensitive current could decay (rapidly) at the onset of the hyperpolarizing step, but was not contributing to our measurement of IKir and IH, made after 250 ms at –124 mV, at which point any low-voltage-activated (LVA) outward rectifiers have deactivated. Additionally, the LVA XE991-sensitive currents were rare (only detected in some striolar type I hair cells) and when present did not compete with fast IKir, which is only found in type II hair cells.

      Also, did the inward rectifier conductances sustain any outward conductance at more depolarized voltage steps? 

      For the KV1.8-null mice specifically, we cannot answer the question because we did not use specific blocking agents for inward rectifiers.  However, we expect that there would only be sustained outward IR currents at voltages between EK and ~-60 mV: the foot of IKir’s I-V relation according to published data from mouse utricular hair cells – e.g., Holt and Eatock 1995, Rusch and Eatock 1996, Rusch et al. 1998, Horwitz et al., 2011, etc.  Thus, any such current would be unlikely to contaminate the residual outward rectifiers in Kv1.8-null animals, which activate positive to ~-60 mV. 

      (I-HCN is also not a problem, because it could only be outward positive to its reversal potential at ~-40 mV, which is significantly positive to its voltage activation range.)

    1. Author response:

      The following is the authors’ response to the original reviews.

      We edited the manuscript for clarity, added information described in new figure panels (below) and corrected typos.

      In figure 1 we corrected a typo.

      In figure 2, panel 2H, and Figure S2E, we included a new statistical analysis (mixed effect linear regression) to compare mutational burden in controls and AD patients.

      In figure 3, and Figure S4B, we revised the western blot panels in panels 3E and 3F to improve presentation of controls and quantification.

      We corrected typos.

      In figure 5 we removed a panel (former 5D) which did not add useful information.

      In Figure S1A we included information about sex and age for the controls and patients analyzed. In Figure S2B, we added an analysis of the mutational burden in controls, distinguishing controls with and without cancer.

      We modified Table S1 for completeness of information for all samples analyzed.

      Reviewer #1:

      Weaknesses: 

      Even though the study is overall very convincing, several points could help to connect the seen somatic variants in microglia more with a potential role in disease progression. The connection of P-SNVs in the genes chosen from neurological disorders was not further highlighted by the authors. 

      All P-SNVs are reported in Table S3.

      We observed only two P-SNVs within genes associated with neurological disorders (brain panel in Table S2):

      - SQSTM1 (p.P392L) was identified in blood but not in brain from patient AD48A.

      - OPTN (p.Q467P) was identified in PU.1 from control 25.

      To highlight this point, we modified the first paragraph of the discussion as follows:

      “We report here that microglia from a cohort of 45 AD patients with intermediate-onset sporadic AD (mean age 65 y.o) is enriched for clones carrying pathogenic/oncogenic variants in genes associated with clonal proliferative disorders (Supplementary Table 2) in comparison to 44 controls. Of note we did not observe microglia P-SNVs within genes reported to be associated with neurological disorders (Supplementary Table 2) in patients, and one such variant was identified in a control (Supplementary Table 3).”

      The authors show in snRNA-seq data that a disease-associated microglia state seems to be enriched in patients with somatic variants in the CBL ring domain, however, this analysis could be deepened. For example, how this knowledge may translate to patient benefits when the relevant cell populations appear concentrated in a single patient sample (Figure 5; AD52) is unclear; increasing the analyzed patient pool for Figure 5 and showcasing the presence of this microglia state of interest in a few more patients with driving mutations for CBL or other MAPK pathway associated mutations would lend their hypotheses further credibility. 

      We acknowledge this limitation, but we respectfully submit that the analysis was performed in 2 patients. AD53 also shows a MAPK-associated inflammatory signature in the microglia clusters associated with mutations.

      We performed the analysis on all FACS-purified PU.1+ nuclei samples that passed QC for single nuclei RNAseq. It should be noted that this analysis is extremely difficult with current technologies because microglia nuclei need to be fixed for PU.1 staining and FACS purification and the clones are small (~1% of microglia).

      A potential connection between P-SNVs in microglia and disease pathology and symptoms was not further explored by the authors. 

      At the population level, Braak/CERAD scores, the presence of Lewy bodies, amyloid angiopathy, tauopathy, or alpha synucleinopathy were not different between AD patients with or without pathogenic microglial clones (Figure S3 and Table S1). Of note, we studied here a homogenous population of AD patients.

      At the tissue level, the role of mutant microglia in plaques, for example, is being investigated, but we do not have results to present at this time.

      A recent preprint (Huang et al., 2024) connected the occurrence of somatic variants in genes associated with clonal hematopoiesis in microglia in a large cohort of AD patients, this study is not further discussed or compared to the data in this manuscript. 

      This preprint supports the high frequency of detection of oncogenic variants associated with clonal proliferative disorders. The authors hypothesize that the mutations may be associated with microglia, but they only check a few mutations in purified microglia; most of the study is performed in whole brain tissue. It does not bring substantially new information compared to the other studies we cite in the introduction (and to our manuscript).

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions for improved or additional experiments, data, or analyses: 

      The authors can demonstrate that identified pathological SNVs from their AD cohort also lead to the activation of human microglia-like cells in vitro, but do not provide any data from histological examination of the patient cohort (e.g. accumulation at the plaque site, microglia distribution, and cell number). The study could be further supported by providing a histological examination of patients with and without P-SNVs to identify if microglia response to pathology, microglia accumulation, or phagocytic capacity are altered in these patients. 

      We performed IBA1 staining in brain samples from controls and from AD patients with or without microglial clones, and microglia density was not different between patients with and without mutations. In addition, histological reports from the brain bank (Braak/CERAD scores, Lewy bodies, amyloid angiopathy, tauopathy, or alpha synucleinopathy) did not suggest differences between patients with and without mutations (Figure S3). These results are preliminary and further investigations are ongoing.

      It would have been interesting to see if for example, transgenic AD mice with an introduced somatic mutation in microglia show an altered disease progression with alterations in amyloid pathology or cognition. 

      We agree with the reviewer. We performed an in vivo study with mice expressing a  5xFAD transgene, an inducible microglia Cx3cr1CreERt2 BrafLSL-V600E transgene, or both, and performed survival, behavioral (Y-Maze and Novel Object Recognition), and histological analyses for β-Amyloid, p-Tau and Iba1 staining.

      Microgliosis was increased in the group with the 2 transgenes; however, the phenotype associated with the expression of a BrafV600E allele in microglia (Mass et al., Nature 2017) was strongly dominant over the phenotype of 5xFAD mice, which did not allow us to draw conclusions from the survival and behavioral analyses.

      Other studies with different transgenes are in progress but we have no results yet to include in this revised manuscript.

      To connect the somatic mutations in microglia better to a potential contribution in neurodegeneration or neurotoxicity, the authors could provide further details on how to demonstrate if human microglia-like cells respond differentially to amyloid or induce neurotoxicity in a co-culture or slice culture model. 

      These studies are under way in the laboratory, but unfortunately, we have no results as yet to include in this revised manuscript.

      The number of samples analyzed for hippocampi, especially in the age-matched controls might be underpowered. 

      Unfortunately, despite our best efforts, we were not able to analyze more hippocampi from control individuals. To control for bias in sampling as well as other potential biases in our analysis, we examined the statistical analysis of the cohorts for inclusion of age as a criterion (age-matched controls), inclusion of a random-effect structure, and possible confounding factors such as sex, brain bank site, and the samples’ anatomical location (see revised Methods and revised Fig. 2C, F, and H, and S2B).

      We first tested whether the inclusion of age is appropriate in a fixed-effects linear regression using a generalized linear model (GLM) with Gaussian distribution. Compared to the baseline model, the model with age had a significantly lower AIC (from -66.6 to -71.9, P = 0.0067 by chi-square test). Therefore, the inclusion of age as a fixed effect is appropriate. We next tested multiple structures of mixed-effects linear modeling. We used donors as random effects, while utilizing age, disease status (neurotypical control vs. AD), or both as fixed effects. Fitting was performed using the lme function implemented in the nlme package with the maximum likelihood (ML) method. The incorporation of age and disease status significantly improved overall model fitting. Both age and AD are associated with a significant increase in SNV burden in this model (P<1x10^-4 and P=1x10^-4, respectively, by likelihood ratio test). The model's total explanatory power is substantial (conditional R^2=0.48). We also asked if the addition of potential confounding factors to the model is justified. Three factors were tested via the two above-mentioned methods: sex, brain bank site, and the anatomical location of the samples. In all cases, the AIC increased, and the P values by likelihood ratio tests were higher than 0.99. Therefore, from a statistical standpoint, the inclusion of these potential confounding factors does not seem to improve overall model fitting.
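
      The link between the quoted AIC change and the chi-square P value can be sketched as follows. The sketch assumes that age enters the model as a single additional parameter and that the chi-square test is a likelihood ratio test with one degree of freedom; both are assumptions of this illustration rather than statements from the authors, and the function names are hypothetical.

```python
import math

def lr_stat_from_aic(aic_base: float, aic_full: float, extra_params: int) -> float:
    """Likelihood ratio statistic implied by the AIC values of two nested models.

    AIC = 2k - 2*logL, so
    2*(logL_full - logL_base) = (AIC_base - AIC_full) + 2*extra_params.
    """
    return (aic_base - aic_full) + 2 * extra_params

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# AIC values quoted in the response; one extra parameter (age) is assumed.
lr = lr_stat_from_aic(-66.6, -71.9, extra_params=1)
print(f"LR statistic = {lr:.1f}, p = {chi2_sf_df1(lr):.4f}")
```

      With the rounded AIC values this gives a P value of roughly 0.007, in line with the reported P = 0.0067. The same arithmetic shows why an AIC that increases after adding a factor corresponds to a likelihood gain smaller than the penalty of 2 per parameter.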

      Minor corrections to the text and figures: 

      The authors made a great effort to analyze various samples from one individual donor. One can get a bit confused by the sentence that "an average of 2.5 brains samples were analyzed for each donor". Maybe the authors could highlight more in the first paragraph of the results section and in Figure 1A, that there are multiple samples ("technical replicates") from one individual patient across different brain regions used. 

      We removed the ‘2.5’ sentence and rewrote the paragraph for clarity. Sample information is now displayed in Table S1.

      In the method section is a part included "Expression of target genes in microglia", it was very hard to allocate where these data from public data sets were actually used and for which analysis. Maybe the authors could clarify this again. 

      AU response: We apologize and have corrected the paragraph in the Methods (page 6) as follows: “Expression of target genes in microglia. To evaluate the expression levels of the genes identified in this study as targets of somatic variants, we consulted a publicly available database (https://www.proteinatlas.org/), and also plotted their expression as determined by RNAseq in 2 studies (Galatro et al. GSE99074 33, and Gosselin et al. 34) (Table S3 and Figure S2). For data from Galatro et al. (GSE99074) 33, normalized gene expression data and associated clinical information of isolated human microglia (N = 39) and whole brain (N = 16) from healthy controls were downloaded from GEO. For data from Gosselin et al. 34, raw gene expression data and associated clinical information of isolated microglia (N = 3) and whole brain (N = 1) from healthy controls were extracted from the original dataset. Raw counts were normalized using the DESeq2 package in R 35.”

      Table S3 is very informative, but also very complex. The reader could maybe benefit a lot from this table if it can be structured a bit easier especially when it comes to identifying P-SNVs and in which tissue sample they were found and if this was the same patient. The sorting function on top of the columns helps, but the color coding is a bit unclear. 

      Despite our best efforts, we agree that the table, which contains all sequencing data for all samples, is complex. The color coding (red) only highlights the presence of pathogenic mutations.

      Reviewer #3 (Recommendations For The Authors): 

      This is a well-done study of an important problem. I present the following minor critiques: 

      At the bottom of Page 4 and into the top of Page 5, the authors state that 66 of the 826 variants identified in their panel sequencing experiment were found in multiple donors. Then the authors proceed to analyze the remaining 760 variants. It seems that the authors concluded that these multi-donor mosaics were artifacts, which is why they were excluded from further analysis. I think this is a reasonable assumption, but it should be stated explicitly so it is clear to the reader. Complicating this assumption, however, the authors later state that one of their CBL variants was found in two donors, and it is treated as a true mosaic. The authors should make it clear whether recurrent variants were filtered out of any given analysis. It remains possible that all recurrent variants are true mosaics that occurred in multiple donors. The authors should do a bit more to characterize these recurrent variants. Are they observed in the human population using a database like gnomAD, which, together with their recurrence, would strongly suggest they are germline variants? Are they in MAPK genes, or otherwise relevant to the study?

      We apologize for the confusion. Our original intent for the ddPCR validation of variants (Figure 1E) was to count only 1 ‘unique’ variant for variants found for example in 1 brain sample and in the blood from the same patient, or in 2 brain regions from one patient, in order to avoid the criticism of overinflating our validation rate. This was notably the case for TET2 and DNMT3 variants. For example, validation of a TET2 variant found in 2 different brain areas and blood of the same donor is counted as 1 and not 3. We did not eliminate these variants from the analysis as they passed the criteria for somatic variants as presented in Methods.

      In contrast, when a specific variant was found and validated in two different donors, we counted it as 2.

      The characterization of variants included multiple parameters and databases, including for example AF and gnomAD, as indicated in Methods and reported in Table S3.

      All ddPCR results can be found at the end of Table S3.

      Figure 2B labels age-matched controls as "C", but Figure 2C labels age-matched controls as AM-C. Labels should be consistent throughout the manuscript. 

      We corrected this in the revised version.

      It is not clear if the "p:0.02" label in Figure 2F is referring to AM-C Cx vs. AD-Cx or AM-C vs. AD. Please clarify. 

      We apologize for the confusion, and we corrected the legend. The calculated p value is for the comparison between Cortex from Controls (age-matched) and the Cortex from AD.

      On Page 7, the authors state, "The allelic frequencies at which MAPK activating variants are detected in brain samples from AD patients range from ~1-6% of microglia (Fig. 3G), which correspond to clones representing 2 to 12% of mutant microglia in these samples, assuming heterozygosity." I understand what the authors mean here but I think it's a bit confusingly stated. I suggest something like "The allelic frequencies at which MAPK activating variants are detected in brain samples from AD patients range from ~1-6% in microglia (Figure 3G), which correspond to mutant clones representing 2 to 12% of all microglia in these samples, assuming heterozygosity." 

      We thank the reviewer for this suggestion and re-wrote that sentence.
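
      The arithmetic behind the quoted sentence can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and its parameters are hypothetical, and it assumes a diploid locus at which every mutant cell carries a fixed number of mutant alleles (one, in the heterozygous case).

```python
def clone_fraction(vaf: float, mutant_copies: int = 1, total_copies: int = 2) -> float:
    """Estimate the fraction of cells carrying a variant from its allelic frequency.

    Assumption (hypothetical helper): each mutant cell carries `mutant_copies`
    mutant alleles out of `total_copies` alleles at the locus. The default
    (1 of 2) is the heterozygous diploid case.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    fraction = vaf * total_copies / mutant_copies
    # A cell fraction cannot exceed 1; cap it in case the assumption is violated.
    return min(fraction, 1.0)


# Allelic frequencies of about 1% and 6%, as quoted in the reviewer's sentence.
low = clone_fraction(0.01)
high = clone_fraction(0.06)
```

      Under heterozygosity the cell fraction is twice the allelic frequency, which is how allelic frequencies of ~1-6% map to clones of 2 to 12% of microglia.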

      Is there any evidence that the transcriptional regulators mutated in AD microglia (MED12, SETD2, MLL3, DNMT3A, ASXL1, etc.) are involved in regulating MAPK genes? This would tie these mutations into the broader conclusions of the paper. 

      This is a very interesting question, and indeed published studies indicate that some of the transcriptional /epigenetic regulators regulate expression of MAPK genes. However, in the absence of experimental evidence in microglia and patients, the argument may be too speculative to be included.

      Do the authors have any thoughts as to whether germline variants in CBL are linked to AD? If not, why do they think germline mutations in CBL are not relevant to AD? 

      This is also a very interesting question. As indicated in our manuscript, germline mutations in CBL (and other members of the classical MAPK genes, see Figure 3C) cause early-onset (pediatric) and severe developmental diseases known as RASopathies, characterized by multiple developmental defects, and associated with frequent neurological and cognitive deficits.

      It is possible that some other (and more frequent?) germline variants may be associated with a late-onset, brain-restricted phenotype, but we did not find germline pSNV in our patients. GWAS studies may be more appropriate to test this hypothesis.

      Do any donors show multiple variants? I don't think this is addressed in the text. 

      We do find donors with multiple variants (see Figure 3D and Figure S3), however at this stage, we did not perform single nuclei genotyping to investigate whether they are part of the same clone.

      Figure S3 appears to be upside down. 

      This was corrected.

      Figure 5C should have some kind of label telling the reader what gene set is being depicted. 

      We added this information above the panel (it was in the corresponding legend).

      At the top of Page 12, Lewy bodies are written as Lewis bodies. 

      This was corrected.

      Many control donors died of cancer (Table S1). Is there any information on which, if any, chemotherapeutics or radiation these patients received? Might this impact the somatic mutation burden? The authors should compare controls with and without cancer or with and without cancer treatments to rule this out. 

      As suggested by the reviewer, we analyzed the mutational load of age-matched controls with and without cancer (revised Figure S2B). As expected, we saw an increase in the mutational load in controls with cancer, particularly in their blood. This information was added to the Results section.

      This is most likely associated with the treatments received as well as possible cancer clones.

      The formatting for Table S3 is odd. Multiple different fonts are used (this is also seen in Table S5). Column Q has no column ID. The word "panel" is spelled "pannel." The word "expressed" is spelled "expressd" in one of the worksheet labels. Columns BG-BN in the ALL-SNV worksheet are blank but seemingly part of the table. 

      We fixed this error in Table S3.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive reviews.  Taken together, the comments and suggestions from reviewers made it clear that we needed to focus on improving the clarity of the methods and results.  We have revised the manuscript with that in mind.  In particular, we have restructured the results to make the logic of the manuscript clearer and we have added details to the methods section.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The work of Muller and colleagues concerns the question of where we place our feet when passing uneven terrain, in particular how we trade-off path length against the steepness of each single step. The authors find that paths are chosen that are consistently less steep and deviate from the straight line more than an average random path, suggesting that participants indeed trade-off steepness for path length. They show that this might be related to biomechanical properties, specifically the leg length of the walkers. In addition, they show using a neural network model that participants could choose the footholds based on their sensory (visual) information about depth. 

      Strengths: 

      The work is a natural continuation of some of the researchers' earlier work that related the immediately following steps to gaze [17]. Methodologically, the work is very impressive and presents a further step forward towards understanding real-world locomotion and its interaction with sampling visual information. While some of the results may seem somewhat trivial in hindsight (as always in this kind of study), I still think this is a very important approach to understanding locomotion in the wild better. 

      Weaknesses: 

      The manuscript as it stands has several issues with the reporting of the results and the statistics. In particular, it is hard to assess the inter-individual variability, as some of the data are aggregated across individuals, while in other cases only central tendencies (means or medians) are reported without providing measures of variability; this is critical, in particular as N=9 is a rather small sample size. It would also be helpful to see the actual data for some of the information merely described in the text (e.g., the dependence of \Delta H on path length). When reporting statistical analyses, test statistics and degrees of freedom should be given (or other variants that unambiguously describe the analysis).

      There is only one figure (Figure 6) that shows data pooled over subjects, and this is simply to illustrate how the random paths were calculated. The actual paths generated used individual subject data. We don’t draw our conclusions from these histograms – they are instead used to generate bounds for the simulated paths. We have made clear both in the text and in the figure legends when we have plotted an example subject. Other plots show the individual subject data. We have given the range of subject medians as well as the standard deviation for data illustrated in Figure (random vs chosen), and we have also given the details of the statistical test comparing the flatness of the chosen paths versus the randomly generated paths. We have added two supplemental figures to show individual walker data more directly: (Fig. 14) the per-subject histograms of step parameters, and (Fig. 18) the individual subject distributions for straight path slopes and tortuosity.

      The CNN analysis chosen to link the step data to visual sampling (gaze and depth features) should be motivated more clearly, and it should describe how training and test sets were generated and separated for this analysis.

      We have motivated the CNN analysis and moved it earlier in the manuscript to help clarify the logic of the manuscript. Details of the training and test sets are now provided, and the data have been replotted. The values are a little different from the original plot after making a correction in the code, but the conclusions drawn from this analysis are unchanged. This analysis simply shows that there is information in the depth images from the subject’s perspective that a network can use to learn likely footholds. This motivates the subsequent analysis of path flatness.

      There are also some parts of figures, where it is unclear what is shown or where units are missing. The details are listed in the private review section, as I believe that all of these issues can be fixed in principle without additional experiments. 

      Several of the Figures have been replotted to fix these issues.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript examines how humans walk over uneven terrain using vision to decide where to step. There is a huge lack of evidence about this because the vast majority of locomotion studies have focused on steady, well-controlled conditions, and not on decisions made in the real world. The author team has already made great advances in this topic, but there has been no practical way to map 3D terrain features in naturalistic environments. They have now developed a way to integrate such measurements along with gaze and step tracking, which allows quantitative evaluation of the proposed trade-offs between stepping vertically onto vs. stepping around obstacles, along with how far people look to decide where to step. 

      Strengths: 

      (1) I am impressed by the overarching outlook of the researchers. They seek to understand human decision-making in real-world locomotion tasks, a topic of obvious relevance to the human condition but not often examined in research. The field has been biased toward well-controlled studies, which have scientific advantages but also serious limitations. A well-controlled study may eliminate human decisions and favor steady or periodic motions in laboratory conditions that facilitate reliable and repeatable data collection. The present study discards all of these usually-favorable factors for rather uncontrolled conditions, yet still finds a way to explore real-world behaviors in a quantitative manner. It is an ambitious and forward-thinking approach, used to tackle an ecologically relevant question. 

      (2) There are serious technical challenges to a study of this kind. It is true that there are existing solutions for motion tracking, eye tracking, and most recently, 3D terrain mapping. However most of the solutions do not have turn-key simplicity and require significant technical expertise. To integrate multiple such solutions together is even more challenging. The authors are to be commended on the technical integration here.

      (3) In the absence of prior studies on this issue, it was necessary to invent new analysis methods to go with the new experimental measures. This is non-trivial and places an added burden on the authors to communicate the new methods. It's harder to be at the forefront in the choice of topic, technical experimental techniques, and analysis methods all at once. 

      Weaknesses: 

      (1) I am predisposed to agree with all of the major conclusions, which seem reasonable and likely to be correct. Ignoring that bias, I was confused by much of the analysis. There is an argument that the chosen paths were not random, based on a comparison of probability distributions that I could not understand. There are plots described as "turn probability vs. X" where the axes are unlabeled and the data range above 1. I hope the authors can provide a clearer description to support the findings. This manuscript stands to be cited well as THE evidence for looking ahead to plan steps, but that is only meaningful if others can understand (and ultimately replicate) the evidence. 

      We have rewritten the manuscript with the goal of clarifying the analyses, and we have re-labelled the offending figure.

      (2) I wish a bit more and simpler data could be provided. It is great that step parameter distributions are shown, but I am left wondering how this compares to level walking.  The distributions also seem to use absolute values for slope and direction, for understandable reasons, but that also probably skews the actual distribution. Presumably, there should be (and is) a peak at zero slope and zero direction, but absolute values mean that non-zero steps may appear approximately doubled in frequency, compared to separate positive and negative. I would hope to see actual distributions, which moreover are likely not independent and probably have a covariance structure. The covariance might help with the argument that steps are not random, and might even be an easy way to suggest the trade-off between turning and stepping vertically. This is not to disregard the present use of absolute values but to suggest some basic summary of the data before taking that step. 

      We have replotted the step parameter distributions without absolute values. Unfortunately, the covariation of step parameters (step direction and step slope) is unlikely to help establish this tradeoff. Note that the primary conclusion of the manuscript is that walkers make turns to keep step slope low (when possible). Thus, any correlation that might exist between goal direction and step slope would be difficult to interpret without a direct comparison to possible alternative paths (as we have done in this paper). As such, we do not draw our conclusions from these distributions. We use them primarily to generate plausible random paths for comparison with the chosen paths. We have added two supplementary figures including distributions (Fig 15) and covariation of all the step parameters discussed in the methods (Fig 16).

      (3) Along these same lines, the manuscript could do more to enable others to digest and go further with the approach, and to facilitate interpretability of results. I like the use of a neural network to demonstrate the predictiveness of stepping, but aside from above-chance probability, what else can inform us about what visual data drives that?

      The CNN analysis simply shows that the information is there in the image from the subject’s viewpoint and is used to motivate the subsequent analysis.  As noted above, we have generally tried to improve the clarity of the methods.

      Similarly, the step distributions and height-turn trade-off curves are somewhat opaque and do not make it easy to envision further efforts by others, for example, people who want to model locomotion. For that, clearer (and perhaps) simpler measures would be helpful. 

      We have clarified the description of these plots in the main text and in the methods.  We have also tried to clarify why we made the choices that we did in measuring the height-turn trade-off and why it is necessary in order to make a fair comparison.

      I am absolutely in support of this manuscript and expect it to have a high impact. I do feel that it could benefit from clarification of the analysis and how it supports the conclusions. 

      Reviewer #3 (Public Review): 

      Summary: 

      The systematic way in which path selection is parametrically investigated is the main contribution. 

      Strengths: 

      The authors have developed an impressive workflow to study gait and gaze in natural terrain. 

      Weaknesses: 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just e.g. specific rock arrangements. If the network is overfitting the "features" it uses could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. 

      The CNN analysis has now been moved earlier in the manuscript to help clarify its significance and we have expanded the description of the methods. Briefly, it simply indicates that there is information in the depth structure of the terrain that can be learned by a network. This helps justify the subsequent analyses.  Importantly, the network training and testing sets were separated by terrain to ensure that the model was being tested on “unseen” terrain and avoid the model learning specific arrangements.  This is now clarified in the text.
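
      The idea of separating training and testing sets by terrain can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the terrain identifiers, and the image labels are all hypothetical placeholders.

```python
def split_by_terrain(samples, test_terrains):
    """Split (terrain_id, item) pairs so that no terrain appears in both sets.

    Hypothetical helper: every sample recorded on a held-out terrain goes to
    the test set, so the model is only evaluated on terrain it has not seen.
    """
    held_out = set(test_terrains)
    train, test = [], []
    for terrain_id, item in samples:
        if terrain_id in held_out:
            test.append((terrain_id, item))
        else:
            train.append((terrain_id, item))
    return train, test


# Placeholder data: depth images tagged with the terrain they were recorded on.
samples = [
    ("terrain_A", "img0"),
    ("terrain_A", "img1"),
    ("terrain_B", "img2"),
    ("terrain_C", "img3"),
]
train, test = split_by_terrain(samples, test_terrains={"terrain_C"})
```

      Splitting at the level of terrain, rather than at the level of individual images, keeps near-duplicate consecutive frames of the same rocks from landing on both sides of the split.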

      (2) The use of descriptive terminology should be made systematic. 

      Specifically, the following terms are used without giving a single, clear definition for them: path, step, step location, foot plant, foothold, future foothold, foot location, future foot location, foot position. I think some terms are being used interchangeably. I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. 

      We have made the language more systematic and clarified the definition of each term (see Methods). Path refers to the sequence of 5 steps. Foothold is where the foot was placed in the environment. A step is the transition from one foothold to the next.

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent.  The authors discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). That is, it is taken as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by the data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      The abstract has been substantially rewritten.  We have adjusted our language in the introduction/discussion to try to address this concern.

      Recommendations for the authors:

      Reviewing Editor comments 

      You will find a full summary of all 3 reviews below. In addition to these reviews, I'd like to highlight a few points from the discussion among reviewers. 

      All reviewers are in agreement that this study has the potential to be a fundamental study with far-reaching empirical and practical implications. The reviewers also appreciate the technical achievements of this study. 

      At the same time, all reviewers are concerned with the overall lack of clarity in how the results are presented. There are a considerable number of figures that need better labeling, text parts that require clearer definitions, and the description of data collection and analysis (esp. with regard to the CNN) requires more care. Please pay close attention to all comments related to this, as this was the main concern that all reviewers shared. 

      At a more specific level, the reviewers discussed the finding around leg length, and admittedly, found it hard to believe, in short: "extraordinary claims need strong evidence". It would be important to strengthen this analysis by considering possible confounds, and by including a discussion of the degree of conviction. 

      We have weakened the discussion of this finding and provided some additional analyses in a supplemental figure (Figure 17) to help clarify the finding.

      Reviewer #1 (Recommendations For The Authors): 

      First, let me apologize for the long delay with this review. Despite my generally positive evaluation (see public review), I have some concerns about the way the data are presented and questions about methodological details. 

      (1) Representation of results: I find it hard to decipher how much variability arises within an individual and how much across individuals. For example, Figure 7b seems to aggregate across all individuals, while the analysis is (correctly) based on the subject medians.

      Figure 7b showed just one subject. This is now clarified.

      It would be good to see the distribution of all individuals (maybe use violin plots for each observer with the true data on one side and the baseline data on the other, or simple histograms for each). To get a feeling for inter-individual and intra-individual variability is crucial, as obviously (see the leg-length analysis) there are larger inter-individual differences and representations like these would be important to appreciate whether there is just a scaling of more or less the same effect or whether there are qualitative differences (especially in the light of N=9 being not a terribly huge sample size). 

      The medians for the individual subjects are now provided with the standard deviations between subjects to indicate the extent of individual differences. Note that the random paths were chosen from the distribution of actual step slopes for that subject as one of the constraints. This makes the random paths statistically similar to the chosen paths, with the differences only being generated by the particular visual context. Thus, the test for a difference between chosen and random is quite conservative.

      Similarly, seeing \DeltaH plotted as a function of steps in the path as a figure rather than just having the verbal description would also help. 

      To simplify the discussion of our methods/results, we have removed the analyses that examine mean slope as a function of steps. Because of the central limit theorem, the slopes of the chosen paths remain largely unchanged regardless of the choice of path length. The slopes of the simulated paths are always larger irrespective of the choice of path length.

      (2) Reporting the statistical analyses: This is related to my previous issue: I would appreciate it if the test statistics and degrees-of-freedom of the statistical tests were given along with the p-values, instead of only the p-values. This at some points would also clarify how the statistics were computed exactly (e.g., "All subjects showed comparable difference and the difference in medians evaluated across subjects was highly significant (p<<0.0001).", p.10, is ambiguous to me). 

      Details have been added as requested.

      (3) Why is the lower half ("tortuosity less than the median tortuosity") of paths used as "straight" rather than simply the minimum of all viable paths?

      The benchmark for a straight path is somewhat arbitrary. Using the lower half rather than the minimum length path is more conservative.

      (4) For the CNN analysis, I failed to understand what was training and what was test set. I understand that the goal is to predict for all pixels whether they are a potential foothold or not, and the AUC is a measure of how well they can be discriminated based on depth information and then this is done for each image and the median over all images taken. But on which data is the CNN trained, and on which is it tested? Is this leave-n-out within the same participant? If so, how do you deal with dependencies between subsequent images? Or is it leave-1-out across participants? If so, this would be more convincing, but again, the same image might appear in training and test. If the authors just want to ask how well depth features can discriminate footholds from non-footholds, I do not see the benefit of a supervised method, which leaves the details of the feature combinations inside a black box. Rather than defining the "negative set" (i.e., the non-foothold pixels) randomly, the simulated paths could also be used, instead. If performance (AUC) gets lower than for random pixels, this would confirm that the choice of parameters to define a "viable path" is well-chosen. 

      This has been clarified as described above.

      Minor issues: 

      (5) A higher tortuosity would also lead a participant to require more steps in total than a lower tortuosity. Could this partly explain the correlation between the leg length and the slope/tortuosity correlation? (Longer legs need fewer steps in total, thus there might be less tradeoff between \Delta H and keeping the path straight (i.e., saving steps)). To assess this, you could give the total number of steps per (straight) distance covered for leg length and compare this to a flat surface.

      The calculations are done on an individual subject basis and the first and last step locations are chosen from the actual foot placements, then the random paths are generated between those endpoints. The consequence of this is that the number of steps is held constant for the analysis.  We have clarified the methods for this analysis to try to make this more clear.

      (6) As far as I understand, steps happen alternatingly with the two feet. That is, even on a flat surface, one would not reach 0 tortuosity. In other words, does the lateral displacement of the feet play a role (in particular, if paths with even and paths with odd number of steps were to be compared), and if so, is it negligible for the leg-length correlation? 

      All the comparisons here are done for 5 step sequences so this potential issue should not affect the slope of the regression lines or the leg length correlation.

      (7) Is there any way to quantify the quality of the depth estimates? Maybe by taking an actual depth image (e.g., by LIDAR or similar) for a small portion of the terrain and comparing the results to the estimate? If this has been done for similar terrain, can a quantification be given? If errors would be similar to human errors, this would also be interesting for the interpretation of the visual sampling data.

      Unfortunately, we do not have the ground truth depth image from LIDAR.  When these data were originally collected, we had not imagined being able to reconstruct the terrain.  However, we agree with the reviewers that this would be a good analysis to do. We plan to collect LIDAR in future experiments. 

      To provide an assessment of quality for these data in the absence of a ground truth depth image, we have performed an evaluation of the reliability of the terrain reconstruction across repeats of the same terrain both between and within participants.  We have expanded the discussion of these reliability analyses in the results section entitled “Evaluating Terrain Reconstruction”, as well as in the corresponding methods section (see Figure 10).

      (8) The figures are sometimes confusing and a bit sloppy. For example, in Figure 7a, the red, cyan, and green paths are not mentioned in the caption, in Figure 8 units on the axes would be helpful, in Figure 9 it should probably be "tortuosity" where it now states "curviness". 

      These details have been fixed.

      (9) I think the statement "The maximum median AUC of 0.79 indicates that the 0.79 is the median proportion of pixels in the circular..." is not an appropriate characterization of the AUC, as the number of correctly classified pixels will not only depend on the ROC (and thus the AUC), but also on the operating point chosen on the ROC (which is not specified by the AUC alone). I would avoid any complications at this point and just characterize the AUC as a measure of discriminability between footholds and non-footholds based on depth features. 

      This has been fixed.
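
      The reviewer's reading of the AUC as a measure of discriminability can be illustrated with the rank-based definition: the AUC equals the probability that a randomly chosen foothold pixel receives a higher score than a randomly chosen non-foothold pixel, with ties counted as one half. The sketch below is a generic implementation of that definition, not the authors' code; the function name and the example scores are hypothetical.

```python
def auc(foothold_scores, non_foothold_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (foothold, non-foothold) pairs in which the
    foothold pixel has the higher score; ties contribute one half.
    """
    if not foothold_scores or not non_foothold_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for pos in foothold_scores:
        for neg in non_foothold_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(foothold_scores) * len(non_foothold_scores))


# Hypothetical network outputs for two foothold and two non-foothold pixels.
example = auc([0.9, 0.3], [0.5, 0.1])
```

      Because this quantity is computed over all pairs, it does not depend on any classification threshold, which is why it cannot be read as a proportion of correctly classified pixels without also fixing an operating point on the ROC.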

      (10) Ref. [16] is probably the wrong Hart paper (I assume their 2012 Exp. Brain Res. [https://doi.org/10.1007/s00221-012-3254-x] paper is meant at this point). 

      Fixed

      Typos (not checked systematically, just incidental discoveries): 

      (11) "While there substantial overlap" (p.10) 

      (12) "field.." (p.25) 

      (13) "Introduction", "General Discussion" and "Methods" as well as some subheadings are numbered, while the other headings (e.g., Results) are not. 

      Fixed

      Reviewer #2 (Recommendations For The Authors): 

      The major suggestions have been made in the Public Review. The following are either minor comments or go into more detail about the major suggestions. All of these comments are meant to be constructive, not obstructive. 

      Abstract. This is well written, but the main conclusions "Walkers avoid...This trade off is related...5 steps ahead" sound quite qualitative. They could be strengthened by more specificity (NOT p-values), e.g. "positive correlation between the unevenness of the path straight ahead and the probability that people turned off that path." 

      The abstract has been substantially rewritten.

      P. 5 "pinning the head position estimated from the IMU to the Meshroom estimates" sounds like there are two estimates. But it does not sound like both were used. Clarify, e.g. the Meshroom estimate of head position was used in place of IMU? 

      Yes, that’s correct. We have clarified this in the text.

      Figure 5. I was confused by this. First, is a person walking left to right? When the gaze position is shown, where was the eye at the time of that gaze? There are straight lines attached to the blue dots, what do they represent? The caption says gaze is directed further along the path, which made me guess the person is walking right to left, and the line originates at the eye. Except the origins do not lie on or close to the head locations. There's also no scale shown, so maybe I am completely misinterpreting. If the eye locations were connected to gaze locations, it would help to support the finding that people look five steps ahead of where they step. 

      We have updated the figure and clarified the caption to remove these confusions.  There was a mistake in the original figure (where the yellow indicated head locations, we had plotted the center of mass and the choice of projection gave the incorrect impression that the fixations off the path, in blue, were separated from the head).

      The view of the data is now presented so the person is walking left to right and with a projection of the head location (orange), gaze locations (blue or green) and feet (pink).

      Figure 6. As stated in the major comments, the step distributions would be expected to have a covariance structure (in terms of raw data before taking absolute values). It would be helpful to report the covariances (6 numbers). As an example of a simple statistical analysis, a PCA (also based on a data covariance) would show how certain combinations of slope/distance/direction are favored over others. Such information would be a simple way to argue that the data are not completely random, and may even show a height-turn trade-off immediately. (By the way, I am assuming absolute values are used because the slopes and directions are only positive, but it wasn't clear if this was the definition.) A reason why covariances and PCA are helpful is that such data would be helpful to compute a better random walk, generated from dynamics. I believe the argument that steps are not random is not served by showing the different histograms in Figure 7, because I feel the random paths are not fairly produced. A better argument might draw randomly from the same distribution as the data (or drive a dynamical random walk), and compare with actual data. There may be correlations present in the actual data that differ from random. I could be mistaken, because it is difficult or impossible to draw conclusions from distributions of absolute values, or maybe I am only confused. In any case, I suspect other readers will also have difficulty with this section. 

      This has been addressed above in the major comments.

      p. 9, "average step slope" I think I understand the definition, but I suggest a diagram might be helpful to illustrate this.

      There is a diagram of a single step slope in Figure 6 and a diagram of the average step slope for a path segment in Figure 12.

      Incidentally, the "straight path slope" is not clearly defined. I suspect "straight" is the view from above, i.e. ignoring height changes. 

      Clarified

      p. 11 The tortuosity metric could use a clearer definition. Should I interpret "length of the chosen path relative to a straight path" as the numerator and denominator? Here does "length" also refer to the view from above? Why is tortuosity defined differently from step slope? Couldn't there be an analogue to step slope, except summing absolute values of direction changes? Or an analogue to tortuosity, meaning the length as viewed from the side, divided by the length of the straight path? 

      We followed the literature in the definition of tortuosity and have clarified the definition in the methods. Yes, you can interpret "the length of the chosen path relative to a straight path" as numerator and denominator, and length refers to 3D length. We agree that there are many interesting ways to look at the data, but for clarity we have limited the discussion to a single definition of tortuosity in this paper.
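
      As an illustration of the ratio described above, the following is a minimal sketch and not the authors' code. It assumes that a path is given as a list of 3D points and that the "straight path" is the straight line between the first and last point; the function names are hypothetical.

```python
import math


def path_length(points):
    """Sum of 3D Euclidean distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def tortuosity(points):
    """Length of the chosen path (numerator) divided by the length of the
    straight line joining its endpoints (denominator), both measured in 3D.
    A perfectly straight path gives 1.0; detours give values above 1.0."""
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("start and end coincide; tortuosity is undefined")
    return path_length(points) / straight


# Toy example (assumed coordinates, arbitrary units).
straight_path = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
detour_path = [(0, 0, 0), (1, 1, 0), (2, 0, 0)]

print(tortuosity(straight_path))
print(tortuosity(detour_path))
```

      Under this definition the detour path is longer than the straight line between its endpoints, so its tortuosity exceeds 1.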

      Figure 8 could use better labeling. On the left, there is a straight path and a more tortuous path, why not report the metrics for these? On the right, there are nine unlabeled plots. The caption says "turn probability vs. straight path slope" but the vertical axis is clearly not a probability. Perhaps the axis is tortuosity? I presume the horizontal axis is a straight path slope in degrees, but this is not explained. Why are there nine plots, is each one a subject? I would prefer to be informed directly instead of guessing. (As a side note, I like the correlations as a function of leg length, it is interesting, even if slightly unbelievable. I go hiking with people quite a bit shorter and quite a lot taller than me, and anecdotally I don't think they differ so much from each other.) 

      We have fixed Figure 8 which shows the average “mean slope” as a function of tortuosity.  We have added a supplemental figure which shows a scatter plot of the raw data (mean slope vs. tortuosity for each path segment).  

      Note that when walking with friends other factors (e.g. social) will contribute to the cost function. As a very short person my experience is that it is a problem. In any case, the data are the data, whatever the underlying reasons. It does not seem so surprising that people of different heights make different tradeoffs. We know that the preferred gait depends on an individual’s passive dynamics as described in the paper, and the terrain will change what is energetically optimal as described in the Darici and Kuo paper.

      Figure 9 presumably shows one data point per subject, but this isn't clear. 

      The correlations are reported per subject, and this has been clarified. 

      p. 13 CNN. I like this analysis, but only sort of. It is convincing that there is SOME sort of systematic decision-making about footholds, better than chance. What it lacks is insight. I wonder what drives peoples' decisions. As an idle suggestion, the AlexNet (arXiv: Krizhevsky et al.; see also A. Karpathy's ConvNetJS demo with CIFAR-10) showed some convolutional kernels to give an idea of what the layers learned. 

      Further exploration of CNNs would definitely be interesting, but it is outside the scope of the paper. We use it simply to make a modest point, as described above.

      p. 15 What is the definition of stability cost? I understand energy cost, but it is unclear how circuitous paths have a higher stability cost. One possible definition is an energetic cost having to do with going around and turning. But if not an energy cost, what is it? 

      We meant to say that the longer and flatter paths are presumably more stable because of the smaller height changes. You are correct that we can’t say what the stability cost is and we have clarified this in the discussion.

      p. 16 "in other data" is not explained or referenced.

      Deleted 

      p. 10 5 step paths and p. 17 "over the next 5 steps". I feel there is very little information to really support the 5 steps. A p-value only states the significance, not the amount of difference. This could be strengthened by plotting some measures vs. the number of steps ahead. For example, does a CNN looking 1-5 steps ahead predict better than one looking N<5 steps ahead? I am of course inclined to believe the 5 steps, but I do not see/understand strong quantitative evidence here. 

      We have weakened the statements about evidence for planning 5 steps ahead.

      p. 25 CNN. I did not understand the CNN. The list of layers seems incomplete, it only shows four layers. The convolutional-deconvolutional architecture is mentioned as if that is a common term, which I am unfamiliar with but choose to interpret as akin to encoder-decoder. However, the architecture does not seem to have much of a bottleneck (25x25x8 is not greatly smaller than 100x100x4), so what is the driving principle? It's also unclear how the decoder culminates, does it produce some m x m array of probabilities of stepping, where m is some lower dimension than the images? It might be helpful also to illustrate the predictions, for example, show a photo of the terrain view, along with a probability map for that view. I would expect that the reader can immediately say yes, I would likely step THERE but not there. 

      We have clarified the description of the CNN. An illustration is shown in Figure 11.

      Reviewer #3 (Recommendations For The Authors): 

      (This section expands on the points already contained in the Public Review). 

      Major issues 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. A CNN was used on the depth scenes to identify foothold locations in the images. This is the bit of the methods and the results that remains ambiguous, and the authors may need to revisit the methods/results. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just for example specific rock arrangements in the particular place you experimented. Training the network on data from one location and then making it generalize to another location would of course be ideal. Your network probably cannot do this (as far as I can tell this was not tried), and so the meaning of the CNN results cannot really be interpreted. 

      I really like the idea, of getting actual retinotopic depth field approximations. But then the question would be: what features in this information are relevant and useful for visual guidance (of foot placement)? But this question is not answered by your method. 

      "If a CNN can predict these locations above chance using depth information, this would indicate that depth features can be used to explain some variation in foothold selection." But there is no analysis of what features they are. If the network is overfitting they could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. As you say "CNN analysis shows that subject perspective depth features are predictive of foothold locations", well, yes, with 50,000 odd parameters the foothold coordinates can be associated with the 3D pixel maps, but what does this tell us? 

      See previous discussion of these issues.

      It is true that we do not know the precise depth features used. We established that information about height changes was being used, but further work is needed to specify how the visual system does this. This is mentioned in the Discussion.

      You open the introduction with a motivation to understand the visual features guiding path selection, but what features the CNN finds/uses or indeed what features are there is not much discussed. You would need to bolster this, or down-emphasize this aspect in the Introduction if you cannot address it. 

      "These depth image features may or may not overlap with the step slope features shown to be predictive in the previous analysis, although this analysis better approximates how subjects might use such information." I do not think you can say this. It may be better to approximate the kind of (egocentric) environment the subjects have available, but as it is I do not see how you can say anything about how the subject uses it. (The results on the path selection with respect to the terrain features, viewpoint-independent allocentric properties of the previous analyses, are enough in themselves!) 

      We have rewritten the section on the CNN to make clearer what it can and cannot do and its role in the manuscript. See previous discussion.

      (2) The use of descriptive terminology should be made systematic. Overall the rest of the methodology is well explained, and the workflow is impressive. However, to interpret the results the introduction and discussion seem to use terminology somewhat inconsistently. You need to dig into the methods to figure out the exact operationalizations, and even then you cannot be quite sure what a particular term refers to. Specifically, you use the following terms without giving a single, clear definition for them (my interpretation in parentheses): 

      foothold (a possible foot plant location where there is an "affordance"? or a foot plant location you actually observe for this individual? or in the sample?) 

      step (foot trajectory between successive step locations) 

      step location (the location where the feet are placed) 

      path (are they lines projected on the ground, or are they sequences of foot plants? The figure suggests lines but you define a path in terms of five steps.) 

      foot plant (occurs when the foot comes in contact with step location?) 

      future foothold (?) 

      foot location (?) 

      future foot location (?) 

      foot position (?) 

      I think some terms are being used interchangeably here? I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. Also, are "gaze location" and "fixation" the same? I.e. is every gaze-ground intersection a "gaze location" (I take it it is not a "fixation", which you define by event identification by speed and acceleration thresholds in the methods)? 

      We have cleaned up the language. A foothold is the location in the terrain representation (mesh) where the foot was placed. A step is the transition from one foothold to the next. A path is a sequence of 5 steps. The lines simply illustrate the path in the Figures. A gaze location is the location in the terrain representation where the walker is holding gaze still (the act of fixating). See Muller et al (2023) for further explanation.
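
      The relationship between these terms can be sketched in code. This is an illustrative sketch only, not the authors' pipeline. It assumes that footholds are (x, y, z) points with z vertical, that step slope is the angle of a step relative to horizontal, and that paths are taken as sliding windows of five consecutive steps; all function names are hypothetical.

```python
import math


def steps_from_footholds(footholds):
    """A step is the transition from one foothold to the next."""
    return list(zip(footholds, footholds[1:]))


def step_slope_deg(step):
    """Assumed definition: angle of the step relative to horizontal, in degrees."""
    (x0, y0, z0), (x1, y1, z1) = step
    horizontal = math.hypot(x1 - x0, y1 - y0)
    return math.degrees(math.atan2(z1 - z0, horizontal))


def paths_from_footholds(footholds, steps_per_path=5):
    """Each path is a sequence of `steps_per_path` consecutive steps
    (sliding windows are an assumption made for this sketch)."""
    steps = steps_from_footholds(footholds)
    return [steps[i:i + steps_per_path]
            for i in range(len(steps) - steps_per_path + 1)]


def mean_abs_step_slope(path):
    """Average of the absolute step slopes along one path."""
    return sum(abs(step_slope_deg(s)) for s in path) / len(path)


# Toy footholds (assumed coordinates): seven footholds give six steps
# and therefore two overlapping five-step paths.
footholds = [(0, 0, 0), (1, 0, 1), (2, 0, 1), (3, 0, 0),
             (4, 0, 0), (5, 0, 1), (6, 0, 1)]
five_step_paths = paths_from_footholds(footholds)
slopes = [mean_abs_step_slope(p) for p in five_step_paths]
print(len(five_step_paths), slopes)
```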

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent. You discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). Temporal cost (more circuitous route takes longer) and uncertainty (the more step locations you sample the more chance that some of them will not be stable) seem equally reasonable, given the task ecology / the type of environment you are considering. I do not know if there is literature on these in the gait-scene, but even if not then saying you are focusing on just one explanation because that's where there is literature to fall back on would be the thing to do. 

      Also in the abstract and introduction you seem to take some of this "for granted". E.g. you end the abstract saying "are planning routes as well as particular footplants. Such planning ahead allows the minimization of energetic costs. Thus locomotor behavior in natural environments is controlled by decision mechanisms that optimize for multiple factors in the context of well-calibrated sensory and motor internal models". This is too speculative to be in the abstract, in my opinion. That is, you take as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by your data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      We have rewritten the abstract and Discussion with these concerns in mind.

      You should probably also reference: 

      Warren, W. H. (1984). Perceiving affordances: Visual guidance of stair climbing. Journal of Experimental Psychology: Human Perception and Performance, 10(5), 683-703. https://doi.org/10.1037/0096-1523.10.5.683 

      Warren WH Jr, Young DS, Lee DN. Visual control of step length during running over irregular terrain. J Exp Psychol Hum Percept Perform. 1986 Aug;12(3):259-66. doi: 10.1037//0096-1523.12.3.259. PMID: 2943854. 

      We have added these references to the introduction.

      Minor point 

      Related to (2) above, the path selection results are sometimes expressed a bit convolutedly, and the gist can get lost in the technical vocabulary. The generation of alternative "paths" and comparison of their slope and tortuousness parameters show that the participants preferred smaller slope/shorter paths. So, as far as I can tell, what this says is that in rugged terrain people like paths that are as "flat" as possible. This is common sense so hardly surprising. Do not be afraid to say so, and to express the result in plain non-technical terms. That an apple falls from a tree is common sense and hardly surprising. Yet quantifying the phenomenon, and carefully assessing the parameters of the path that the apple takes, turned out to be scientifically valuable - even if the observation itself lacked "novelty". 

      Thanks.  We have tried to clarify the methods/results with this in mind.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for the many positive comments. Moreover, we appreciate the recommendations to improve the manuscript; particularly, the important discussion points raised by reviewer 1 and the comments made by reviewer 2 concerning an extended quantification of how near-spike input conductances vary across individual spikes. We have performed several new detailed analyses to address reviewer 2’s comments. In particular, we now provide for all relevant postsynaptic cells the complete distributions of the excitatory and inhibitory input conductance changes that occur right before and after postsynaptic spiking, and we provide corresponding distributions of non-spiking regions as a reference. We performed these analyses separately for different baseline activity levels. Our new results largely support our previous conclusions but provide a much more nuanced picture of the synaptic basis of spiking. To the best of our knowledge, this is the first time that parallel information on input excitation, inhibition and postsynaptic spiking is provided for individual neurons in a biological network. We would argue that our new results further support the fundamental notion that even a reductionist neuronal culture model can give rise to sophisticated network dynamics with spiking – at least partially – triggered by rapid input fluctuations, as predicted by theory. Moreover, it appears that changes in input inhibition are a key mechanism to regulate spiking during spontaneous recurrent network activity. It will be exciting to test whether this holds true for neural circuits in vivo.

      In the following section, we address the reviewers’ comments individually.

      Reviewer 1:

      In this study the authors develop methods to interrogate cultured neuronal networks to learn about the contributions of multiple simultaneously active input neurons to postsynaptic activity. They then use these methods to ask how excitatory and inhibitory inputs combine to result in postsynaptic neuronal firing in a network context.

      The study uses a compelling combination of high-density multi-electrode array recordings with patch recordings. They make ingenious use of physiology tricks such as shifting the reversal potential of inhibitory inputs, and identifying inhibitory vs. excitatory neurons through their influence on other neurons, to tease apart the key parameters of synaptic connections.

      We thank the reviewer for acknowledging our efforts to develop an approach to investigate the synaptic basis of spiking in biological neurons and for appreciating the technical challenges that needed to be overcome.

      The method doesn't have complete coverage of all neurons in the culture, and it appears to work on rather low-density cultures so the size of the networks in the current study is in the low tens.

      (1) It would be valuable to see the caveats associated with the small size of the networks examined here.

      (2) It would be also helpful if there were a section to discuss how this approach might scale up, and how better network coverage might be achieved.

      These are indeed very important points that we should have discussed in more detail. Maximizing the coverage of neurons is critical to our approach, as it determines the number of potential synaptic connections that can be tested. The number of cells that we seeded onto our HD-MEA chip was chosen to achieve monolayer neuronal cultures. As detailed in ‘Materials and Methods -> Electrode selection and long-term extracellular recording of network spiking’, the entire HD-MEA chip (all 26'400 electrodes) was scanned for activity at the beginning of each experiment, and electrodes that recorded spiking activity were subsequently selected. While it is possible that some individual neurons escape detection, since they were not directly adjacent to an electrode, we estimate that a large majority of the active neurons in the culture was covered by our electrode selection method. New generations of CMOS HD-MEAs developed in our laboratory and other groups feature higher electrode densities, larger recording areas, and larger sets of electrodes that can be simultaneously recorded from (e.g., DOI: 10.1109/JSSC.2017.2686580 & 10.1038/s41467-020-18620-4). These features will substantially improve the coverage of the network and also allow for using larger neuronal networks. As suggested by reviewer 1, we added these points to the Discussion section of the revised manuscript.

      The authors obtain a number of findings on the conditions in which the dynamics of excitatory and inhibitory inputs permit spiking, and the statistics of connectivity that result in this. This is of considerable interest, and clearly one would like to see how these findings map to larger networks, to non-cortical networks, and ideally to networks in-vivo. The suite of approaches discussed here could potentially serve as a basis for such further development.

      (3) It would be useful for the authors to suggest such approaches.

      We are confident that our suite of approaches will open important avenues to study the E & I input basis of postsynaptic spiking in other circuits beyond the in vitro cortical networks studied here. In fact, CMOS HD-MEA probes have been successfully combined with patch clamping in vivo (DOI: 10.1101/370080) and, in principle, the strategies and software tools introduced in our study would be equally applicable in an in vivo context. However, currently available in vitro CMOS HD-MEAs still surpass their in vivo counterparts (e.g., Neuropixels probes) in terms of electrode count. Moreover, using in vitro neural networks enables easy access and better network coverage compared to in vivo conditions. These are the main reasons why we chose an in vitro network for our investigation. We added these points to the Discussion section of the revised manuscript.

      (4) The authors report a range of synaptic conductance waveforms in time. Not surprisingly, E and I look broadly different. Could the authors comment on the implications of differences in time-course of conductance profiles even within E (or I) synapses? Is this functional or is it an outcome of analysis uncertainty?

      We are grateful to the reviewer for raising this interesting point. On the one hand, the onsets of the synaptic conductance waveform estimates were strikingly different between E and I synapses (see Fig. 8D). Furthermore, the rise and decay phases of synaptic currents were distinct for E vs. I inputs (Fig. 4C). We think that these differences are not just due to analysis uncertainty because both these observations are consistent with previously described properties of E and I inputs: synaptic GABAergic I currents are typically slower than glutamatergic E currents with respect to both the rise and decay phases (DOI: 10.1126/science.abj586). Moreover, the relatively small onset latencies for I inputs that we observed are consistent with the well-known local action of inhibition. This finding was also consistent with smaller PRE-POST distances and general differences in neurite characteristics of E compared to I cells (Fig. S2).

      One of the challenges in doing such studies in a dish is that the network is simply ticking away without any neural or sensory context to work on, nor any clear idea of what its outputs might mean. Nevertheless, at a single-neuron level one expects that this system might provide a reasonable subset of the kinds of activity an individual cell might have to work on.

      (5) Could the authors comment on what subsets of network activity is, and is not, likely to be seen in the culture?

      (6) Could they indicate what this would mean for the conclusions about E-I summation, if the in-vivo activity follows different dynamics?

      We agree that there are natural limitations to a reductionist model, such as a dissociated cell culture. One may argue that neuronal cultures bear some similarities with neural networks formed during early brain development, where network formation is primarily driven by intrinsic, self-organizational capabilities. While such a self-organization is likely constrained in a 2D culture, it has been shown that several important circuit mechanisms that are observed in vivo are preserved in 2D dissociated cultures. For example, dissociated neuronal cultures can maintain E-I balance and achieve active decorrelation (DOI: 10.1038/nn.4415). In addition, in terms of activity levels, the sequences of heightened and more quiescent network spiking bear similarities with cortical Up-Down state oscillations observed during slow-wave sleep. To what extent individual circuit connectivity motifs and more nuanced network dynamics, found in vivo, can be recapitulated in vitro, is still not clear. However, combining our and previous work (especially DOI: 10.1038/nn.4415), we believe that there is sufficient evidence to justify work such as ours. On the one hand, identifying in simple cell culture models features of network dynamics and microcircuits known (or predicted) to exist in vivo is a testimony of neuronal self-organizing capabilities. On the other hand, our in vitro results will allow for more directed testing of equivalent mechanisms in vivo.

      Reviewer 2:

      The authors had two aims in this study. First, to develop a tool that lets them quantify the synaptic strength and sign of upstream neurons in a large network of cultured neurons. Second, they aimed at disentangling the contributions of excitatory and inhibitory inputs to spike generation.

      For the quantification of synaptic currents, their method allows them to quantify excitatory and inhibitory currents simultaneously, as the sign of the current is determined by the neuron identity in the high-density extracellular recording. They further made sure that their method works for nonstationary firing rates, and they did a simulation to characterize what kind of connections their analysis does not capture. They did not include the possibility of (dendritic) nonlinearities or gap junctions or any kind of homeostatic processes.

      Thank you for the concise summary of our aims and of the features of our method. Indeed, we did not model nonlinear synaptic interactions, short-term plasticity etc., as there is likely a spectrum of possible interaction rules. Importantly, non-linear synaptic interactions were reduced by performing synaptic measurements in voltage-clamp mode.

      We do not anticipate that this would impact our connectivity inference per se. However, the presence of a significant number of nonlinear events would imply that some deviations between reconstructed and measured patch current traces were to be expected even if all incoming monosynaptic connections were identified. In the future, it will be exciting to add to our current experimental protocol a simultaneous HD-MEA & patch-clamp recording, in which the membrane potential is measured in current-clamp mode. Following application of our synaptic input-mapping procedure, one could, in this way, directly assess input-sequence dependent non-linear synaptic integration during spontaneous network activity.

      I see a clear weakness in the way that they quantify their goodness of fit, as they only report the explained variance, while their data are quite nonstationary. It could help to partition the explained variance into frequency bands, to at least separate the effects of a bias in baseline, the (around 100 Hz) band of synaptic frequencies and whatever high-frequency observation noise there may be. Another weak point is their explanation of unexplained variance by potential activation of extrasynaptic receptors without providing evidence. Given that these cultures are not a tissue and diffusion should be really high, this idea could easily be tested by adding a tiny amount of glutamate to the culture media.

      As suggested by the reviewer, we have now partitioned the current traces into frequency bands and separately assessed the goodness-of-fit. We have updated Fig. 3C accordingly.

      The following sentence was added to the main text:

      “We separately compared slow baseline changes (< 3 Hz), fast synaptic activity (3 - 200 Hz) and putative high-frequency noise (> 200 Hz), yielding a median variance explained of approximately 60% in the 3 - 200 Hz range (Fig. 3C).”

      Importantly, the variance explained in the frequency range of synaptic activity remains high. We would also like to point out that, even if all synaptic input connections were identified, one would expect some deviations between measured and reconstructed current trace. This is because the reconstructed trace is based on average input current waveforms and in the measured trace there may be synaptic transmission failures.
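
      One possible way to compute a band-wise variance explained is sketched below. This is a minimal illustration and not the authors' analysis code. It assumes an ideal (brick-wall) split of the spectrum rather than whatever filters were actually used, and it relies on Parseval's theorem so that residual and total variance can be summed directly over the frequency bins of each band. The band edges follow the quoted sentence (< 3 Hz, 3 - 200 Hz, > 200 Hz); the function names and the toy signals are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for short traces."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def band_variance_explained(measured, reconstructed, fs, bands):
    """Variance explained (1 - SS_res / SS_tot) evaluated per frequency band.
    `bands` maps a label to (f_lo, f_hi) in Hz, with f_lo inclusive."""
    n = len(measured)
    m_spec, r_spec = dft(measured), dft(reconstructed)
    result = {}
    for name, (f_lo, f_hi) in bands.items():
        ss_res = ss_tot = 0.0
        for k in range(n):
            freq = min(k, n - k) * fs / n  # frequency of bin k in Hz
            if f_lo <= freq < f_hi:
                ss_res += abs(m_spec[k] - r_spec[k]) ** 2
                if k != 0:  # the mean (DC bin) carries no variance
                    ss_tot += abs(m_spec[k]) ** 2
        result[name] = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return result


# Toy traces (assumed): slow drift + synaptic-band activity + fast noise.
fs, n = 1000, 500
t = [i / fs for i in range(n)]
measured = [math.sin(2 * math.pi * 2 * ti)
            + math.sin(2 * math.pi * 50 * ti)
            + 0.5 * math.sin(2 * math.pi * 300 * ti) for ti in t]
# The reconstruction captures only the synaptic band, slightly underestimated.
reconstructed = [0.9 * math.sin(2 * math.pi * 50 * ti) for ti in t]

bands = {"baseline": (0, 3), "synaptic": (3, 200), "noise": (200, float("inf"))}
ve = band_variance_explained(measured, reconstructed, fs, bands)
print(ve)
```

      In this toy case the reconstruction explains almost all variance in the synaptic band and none in the baseline or noise bands, which is the kind of separation the band-wise report is meant to expose.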

      We agree that the offered explanation for unexplained variance by activation of extrasynaptic receptors is fairly speculative. As it was not a crucial discussion point, we have therefore removed the statement.

      For the contributions of excitation and inhibition to neuronal spiking, the authors found a clear reduction of inhibitory inputs and increase of excitation associated with spiking when averaging across many spikes. And interestingly, the inhibition shows a reversal right after a spike and the timescale is faster during higher network activity. While these findings are great and provide further support that their method is working, they stop at this exciting point where I would really have liked to see more detail.

      Thank you for acknowledging our main results concerning the synaptic basis of spiking. We attempted to integrate in one manuscript a suite of new approaches, in addition to the respective applications. We, therefore, tried to strike an appropriate balance in the level of detail when presenting our findings. With regard to our analyses of which synaptic input events regulate postsynaptic spiking, we agree with reviewer 2’s assessment that more detail concerning the variability across individual spikes would be helpful. In the following parts, we detail multiple new analyses that we have included in the revised manuscript to address reviewer 2’s comments.

      A concern, of course, is that the network bursts in cultures are quite stereotypical, and that might cause averages across many bursts to show strange behaviour. So what I am missing here is a reference or baseline or null hypothesis. How does it look when using inputs from neurons that are not connected? And then, it looks like the E/(E+I) curve has lots of peaks of similar amplitude (that could be quantified...), so why does the neuron spike where it does? If I would compare to the peak (of similar amplitude) right before or right after (as a reference) are there some systematic changes? Is maybe the inhibition merely defining some general scaffold where spikes can happen and the excitation causes the spike as spiking is more irregular?

      The averaged trace reveals a different timescale for high and low activity states. But does that reflect a superposition of EPSCs in a single trial or rather a different jittering of a single EPSC across trials? For answering this question, it would be good to know the variance (and whether/ how much it changes over time). Maybe not all spikes are preceded by a decrease in inhibition. Could you quantify (correlate, scatterplot?) how exactly excitation and inhibition contributions relate for single postsynaptic spikes (or single postsynaptic non-spikes)? After all, this would be the kind of detail that requires the large amount of data that this study provides.

      First of all, we are very grateful for the reviewer’s thorough assessment of our work and for the many valuable suggestions to improve it. We are convinced that our new analyses and the updated manuscript address all issues raised here. One of the main findings from our original manuscript was that a rapid and brief change in input conductance (and particularly a reduction in inhibition) is an important spike trigger/regulator. We followed the reviewer’s suggestion and now provide scatter plots and distributions of the pre- (and post-spike) changes in input excitation and inhibition for individual postsynaptic spikes. A quantification of the peaks in the noisy E/(E+I) traces was not always trivial, which is why we reasoned that an assessment of the respective E and I changes is better suited. Moreover, as an unbiased reference, we generated separately for each postsynaptic cell a corresponding distribution of changes in input conductance in non-spiking periods (using random time points). We included our new results and updated figures in our responses to the specific reviewer comments below.
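
      The idea of a random-time-point reference can be sketched as follows. This is a hypothetical illustration and not the authors' implementation. It assumes that a conductance trace is a list of samples, that the pre-event change is the difference between the value at the event and the value `window` samples earlier, and that reference time points must lie more than `exclusion` samples away from any spike; the names, window lengths and toy trace are all assumptions.

```python
import random


def pre_event_change(trace, index, window):
    """Change in conductance over the `window` samples preceding `index`."""
    return trace[index] - trace[index - window]


def reference_indices(n_samples, spike_indices, window, exclusion, count, seed=0):
    """Random time points in non-spiking periods: every returned index is more
    than `exclusion` samples away from all spikes."""
    rng = random.Random(seed)
    candidates = [i for i in range(window, n_samples)
                  if all(abs(i - s) > exclusion for s in spike_indices)]
    return rng.sample(candidates, min(count, len(candidates)))


# Toy inhibitory conductance (assumed): flat, with a brief dip before each spike.
n_samples, window, exclusion = 200, 5, 10
spikes = [50, 150]
inhibition = [1.0] * n_samples
for s in spikes:
    for i in range(s - 2, s + 1):
        inhibition[i] = 0.2

spike_changes = [pre_event_change(inhibition, s, window) for s in spikes]
reference = reference_indices(n_samples, spikes, window, exclusion, count=100)
reference_changes = [pre_event_change(inhibition, i, window) for i in reference]

print(spike_changes)
print(min(reference_changes), max(reference_changes))
```

      Comparing the two lists of changes (for example as histograms or with a statistical test) then indicates whether pre-spike changes differ from those seen at arbitrary non-spiking times.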

      For the first part, the authors achieved their goal in developing a tool to study synaptic inputs driving subthreshold activity at the soma, and characterizing such connections. For the second part, they found an effect of EPSCs on firing, but they barely did any quantification of its relevance due to the lack of a reference.

      With the availability of Neuropixels probes, there is certainly use for their tool in in vivo applications, and their statistical analysis provides a reference for future studies.

      The relevance of excitatory and inhibitory currents on spiking remains to be seen in an updated version of the manuscript.

      Thank you. Please see our new analyses below. Our new findings are in agreement with the main conclusions of the original manuscript. We provide evidence that rapid pre-spike changes in input conductance are observed across most individual spikes and that these rapid changes occur significantly more often before measured spikes than in non-spiking periods.

      I feel that specifically Figures 6 and 7 lack relevant detail and a consistent representation that would allow the reader to establish links between the different panels. The analysis shows very detailed examples, but then jumps into analyses that show population averages over averaged responses, losing or ignoring the variability across trials. In addition, while their results themselves pass a statistical test, it is crucial to establish some measure of how relevant these results are. For that, I would really want to know how much spiking would actually be restricted by the constraints that would be posed by these results, i.e. would this be reflected in tiny changes in spiking probabilities, or are there times when spiking probabilities are necessarily high, or do we see times when we would almost certainly get a spike, but neurons can fire during other times as well.

      I would agree that a detailed, quantitative analysis of this question is beyond the scope of this paper, but a qualitative analysis is feasible and should be done.

      Please see our revised Figure 6. We have rearranged some of the original panels and removed one example of mean conductance profiles. Moreover, we removed a panel with analysis results based on mean conductances that is now obsolete, as more detailed analyses are provided (which are in agreement with the original findings). Analyses from panels (A-F) are mostly unchanged. Panels (G-J) show the new results.

      The following paragraphs, which were added to the main text of the revised manuscript, describe our new findings:

      “For a more nuanced picture of which synaptic events are associated with postsynaptic spiking, we next quantified the changes in input excitation and inhibition that preceded individual postsynaptic spikes. In our analysis, we first focused on periods with high synaptic input activity. As previously discussed, cortical neurons in vivo typically receive and integrate barrages of input activation, similar to the high-activity events that we observed here (e.g., the event depicted in Fig. 6A, right). In Fig. 6G/H, individual pre-spike changes in input conductance are shown for two example postsynaptic neurons (plots labeled ‘spiking’, right). To assess how specific these conductance changes were to spiking periods, we also quantified the changes in input conductance that occurred during non-spiking periods as a reference (we used random time points from high-activity events excluding time points adjacent to measured spike times; we upscaled the number of measured spikes by 10x; the respective plots were labeled ‘non-spiking’). Spikes of both example neurons exhibited – compared to non-spiking regions – significantly more often a pre-spike decrease in inhibition, consistent with the mean conductance profiles. Precisely how an increase (top-right quadrants in Fig. 6G/H) or decrease (bottom-left quadrants) in both I and E conductance influenced the neuronal membrane potential is difficult to predict. However, if rapid changes in input conductance had a significant role in triggering spikes, one would expect that fewer spikes would exhibit a hyperpolarizing pre-spike increase in I and decrease in E (top-left quadrant) compared to the non-spiking period. Conversely, a decrease in I and an increase in E (bottom-right quadrants) would likely result in a membrane potential depolarization so that more spikes should feature the corresponding pre-spike conductance changes compared to non-spiking periods.
These relative shifts are precisely what can be observed in the plots of the two example neurons (Fig. 6G/H) and, in fact, across recordings (Fig. 6I). Finally, we compared the distributions of pre-spike changes in input inhibition and excitation of each postsynaptic neuron (Fig. 6J). Further indicating a pivotal role of inhibition in triggering spikes, 6 out of 7 neurons exhibited a clear decrease in the mean values (and medians) of pre-spike changes in inhibition compared to non-spiking periods. Interestingly, the 3 out of 7 neurons with an increase in excitation showed the smallest decrease in inhibition (or even an increase in inhibition in case of neuron #7). This latter observation suggests a matching of E and I inputs and cell-specific relative contributions of E and I conductance changes in triggering spikes.

      Theoretically, neuronal spiking could be driven by a prolonged suprathreshold depolarization (Petersen and Berg 2016; Renart et al. 2007) or, in more favorable subthreshold regimes, by fast synaptic input fluctuations (Ahmadian and Miller 2021; Amit and Brunel 1997; Brunel 2000; Van Vreeswijk and Sompolinsky 1996). In this section, we demonstrated that the majority of investigated neurons featured – during high-activity periods – a significant number of spikes that were associated with rapid pre-spike changes in input conductances. These findings suggest that even simple neuronal cultures can self-organize to form circuits exhibiting sophisticated spiking dynamics.”
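      The quadrant comparison described in the quoted passage can be illustrated with a minimal sketch. It assumes that the pre-spike changes in excitation and inhibition are already available as arrays; the function names, the exclusion window around spikes, and the way reference time points are drawn are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quadrant_fractions(delta_e, delta_i):
    """Fraction of events in each quadrant of the (change in E, change in I) plane.

    Events with an exactly zero change in E or I are not assigned to any quadrant.
    """
    delta_e = np.asarray(delta_e, dtype=float)
    delta_i = np.asarray(delta_i, dtype=float)
    n = delta_e.size
    return {
        "E_up_I_up": float(np.sum((delta_e > 0) & (delta_i > 0)) / n),
        "E_down_I_up": float(np.sum((delta_e < 0) & (delta_i > 0)) / n),   # presumably hyperpolarizing
        "E_up_I_down": float(np.sum((delta_e > 0) & (delta_i < 0)) / n),   # presumably depolarizing
        "E_down_I_down": float(np.sum((delta_e < 0) & (delta_i < 0)) / n),
    }

def sample_reference_times(candidate_times, spike_times, n_per_spike=10,
                           exclusion=5.0, seed=None):
    """Draw random non-spiking reference time points.

    candidate_times: time points inside high-activity events.
    exclusion: hypothetical window (same units as the times) around each spike
               within which no reference point may fall.
    n_per_spike: 10 reference points per spike, following the 10x upscaling
                 mentioned in the text.
    """
    rng = np.random.default_rng(seed)
    candidate_times = np.asarray(candidate_times, dtype=float)
    spike_times = np.asarray(spike_times, dtype=float)
    # Distance from every candidate time point to its nearest spike.
    dist = np.min(np.abs(candidate_times[:, None] - spike_times[None, :]), axis=1)
    allowed = candidate_times[dist > exclusion]
    n = min(allowed.size, n_per_spike * spike_times.size)
    return rng.choice(allowed, size=n, replace=False)
```

      Comparing `quadrant_fractions` for spike-aligned changes against the same fractions at the sampled reference times gives the kind of relative shift between quadrants that the passage describes.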

      Our new analyses detailed in Fig. 6 show that there are also presumably depolarizing events (e.g., decrease in I and increase in E) in non-spiking regions. In future studies, it will be interesting to examine what distinguishes these events from spike-inducing events of similar magnitude – one possibility is a dependency on specific input-activation sequences.

      During the first days and weeks of developing neuronal cultures, spiking activity rapidly shifts from synapse-independent activity patterns to spiking dynamics that do depend on synaptic inputs and are progressively organized in network-wide high-activity events (DOI: 10.1016/j.brainres.2008.06.022). In our study, cultures at days-in-vitro 15-18 were used, and approximately 15% of the spikes occurred during high-activity events with relatively strong E and I input activity. In addition, spikes that occurred during low-activity events were at least partially regulated by synaptic input (see answers below related to Fig. 7).

      In the following, I am detailing what I would consider necessary to be done about these two Figures:

      Figure 6C is indeed great, though I don't see why the authors would characterize synchrony as low. When comparing with Figure 4B, I'd think that some of these values are quite high. And it wouldn't help me to imagine error bars in panel 6D.

      We have removed our characterization as ‘low’ from the text. One important difference between our synchrony measure (STTC) and the quantification of spike-transmission probability (STP) is the ‘lag’ of a few milliseconds for the STP quantification window to account for synaptic delay.

      Figure 6B is useful, but could be done better: The autocovariance of a shotnoise process is a convolution of the autocovariance of underlying point process and the autocovariance of the EPSC kernel. So one would want to separate those to obtain a better temporal resolution. But a shotnoise process has well defined peaks, and the time of these local maxima can be estimated quite precisely. Now if I would do a peak triggered average instead of the full convolution, I would do half of the deconvolution and obtain a temporally asymmetric curve of what is expected to happen around an EPSC. Importantly, one could directly see expected excitation after inhibition or expected inhibition after excitation, and this visualization could be much better and more intuitively compared to panel 6E.

      We appreciate the reviewer’s suggestion to present these results in a more sophisticated way. We would like to propose to stick with the original analysis to have it comparable with related analyses from the literature (e.g., DOI: 10.1038/nn.2105). Therefore, we hope the reviewer finds it acceptable that we leave the presentation of the data in its original form and potentially follow up in future work with the analysis strategy proposed by the reviewer.
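      For readers unfamiliar with the reviewer's proposal, a peak-triggered average can be sketched as follows. This is only an illustration of the general technique under simple assumptions (uniformly sampled traces, a hypothetical amplitude threshold for peak detection); it is not an analysis performed in the manuscript.

```python
import numpy as np

def local_maxima(trace, min_height):
    """Indices of samples that exceed the left neighbor, are not below the
    right neighbor, and reach at least min_height."""
    trace = np.asarray(trace, dtype=float)
    mid = trace[1:-1]
    is_peak = (mid > trace[:-2]) & (mid >= trace[2:]) & (mid >= min_height)
    return np.flatnonzero(is_peak) + 1

def peak_triggered_average(trigger_trace, target_trace, half_window, min_height):
    """Average target_trace in a window centered on peaks of trigger_trace.

    Using excitation as trigger and inhibition as target shows the expected
    inhibition before and after an excitatory event (and vice versa).
    """
    target = np.asarray(target_trace, dtype=float)
    peaks = local_maxima(trigger_trace, min_height)
    # Keep only peaks whose full window lies inside the recording.
    peaks = peaks[(peaks >= half_window) & (peaks < target.size - half_window)]
    if peaks.size == 0:
        return np.full(2 * half_window + 1, np.nan)
    segments = np.stack([target[p - half_window:p + half_window + 1] for p in peaks])
    return segments.mean(axis=0)
```

      Unlike a cross-covariance, the resulting curve is not forced to be smoothed by the trigger's own kernel on both sides, so a temporal asymmetry around the trigger event is directly visible.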

      Panel D needs some variability estimate (i.e. standard deviation or interquartile range or even a probability density) for those traces.

      Figure 6E: Please use more visible colors. A sensitivity analysis to see traces for 2E/(2E+I) and E/(E+2I) would be great.

      Figure 6F: with an updated panel B, we should be able to have a slope for average inhibition after excitation for each of these cells. A second panel / third column showing those slopes would be of interest. It would serve as a reference for what could be expected from E-I interactions alone.

      With regard to the variability estimate in D, we now provide multiple panels characterizing the variability. For one, Fig. 6H contains a scatter plot of the pre-spike changes in input conductance across all individual postsynaptic spikes from the example cell shown in D. Moreover, in Fig. 7A, we show from the same example cell the standard deviations associated with the mean conductance traces separately for spikes that occurred during low- and high-activity states. For better visibility and because the separation according to activity states is more informative, we kept the original presentation of panel D (however, removing one example cell). In addition, we show the same mean traces from panel D with the respective standard deviations (across all spikes) in Supplementary Figure S3.

      Colors in Fig. 6E are adjusted, as requested.

      We have removed panel Fig. 6F as we now provide more detailed analyses at single-spike level (see Fig. 6G-J).

      Figure 6G: Could the authors provide an interquartile range here?

      With regard to the aligned input-output data from original panel Fig. 6G, now in panel Fig. 6F in the updated figure version, we show all individual traces that were averaged: the E/I traces from panel Fig. 6E and the three action potential waveforms from Supplementary Figure S5. Therefore, we chose to present the means only for better visibility.

      Figure 7A: it may be hard to squeeze in variability estimates here, but the information on whether and how much variance might be explained is essential. Maybe add another panel to provide a variability estimate? The variability estimate in panel 7B and 7D only reflect variability across connections, and it would be useful to add panels for the time courses of the variability of g (or E/(E+I) respectively).

      We now include the standard deviations across the input conductance traces in the updated Fig. 7A, as requested. We have also simplified Fig. 7 and performed the analysis using the 6 out of 7 neurons that, based on our new analysis (Fig. 6J) displayed a clear reduction in pre-spike inhibition, relative to the reference distribution. For a complete overview of the state-dependent changes in input conductance that are associated with individual postsynaptic spikes, we have included a new supplementary figure (Fig. S6). Fig. S6 also includes a characterization of the changes in input inhibition that occur right after postsynaptic spiking. In addition, Fig. S6D shows the standard deviations corresponding to the mean input conductance traces of all cells – separately for high- and low-activity periods.

      We added the following paragraph to the main text of the revised manuscript:

      “How can these deviations in the mean conductance profiles be explained? To answer this question, we further quantified – separately for low and high g states – the changes in input inhibition that occurred right before and after individual postsynaptic spikes (Fig. S6). This single-spike analysis suggested that, during high g states, most spikes experienced a post-spike increase and pre-spike decrease in inhibition (see also Fig. 6J). On the other hand, low g states were characterized by sparse synaptic input (e.g., see reconstruction in Fig. 6A). Therefore, many of the spikes that occurred during low g states were associated with little change in input conductance (note medians of approximately zero in Fig. S6A/C). Nevertheless, a considerable fraction of spikes (often > 25%) from low g states were also associated with a post-spike increase and pre-spike drop in inhibition. It, therefore, appears that even the sparse inhibitory inputs of low g states could influence spike timing. Moreover, the post-spike increases in input inhibition during low g states suggest that there were strong regulatory inhibitory circuits in place. However, limited activity levels during low g states presumably introduced an increased jitter of these spike-associated changes in input inhibition.

      In summary, the input inhibition of high-conductance states provides reliable and narrow windows-of-spiking opportunity. In addition, even during periods of sparse activity, there are rudimentary synaptic mechanisms in place to regulate spike timing.”

      As a suggestion for further analysis, though I am well aware that this is likely beyond the scope of this manuscript, I'd suggest the following analysis:

      I would split the data into the high and low activity states. Then I would compute the average of E/(E+I) values for spikes. Assuming that spikes tend to happen for local maxima of E/(E+I) I would find local maxima for periods without spike such that their average is equal to the value for actual spikes. Finally, I would test for a systematic difference in either excitation or inhibition.

      If there is no difference, you can make the claim that synaptic input does not guarantee a spike, and compare to a global average of E/(E+I).

      We are grateful for the fantastic suggestions for future analysis. We look forward to conducting these analyses in a more detailed follow-up characterization.

      In addition to the major alterations detailed above, we performed smaller corrections (e.g., spelling mistakes, inaccuracies) in some parts of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with an appropriate and well-structured dataset.

      The study's descriptive analyses and figures are useful and will be of interest to the neuroscience community. However, with regard to the statistical comparisons and regression models, I believe that there are methodological flaws that may limit the validity of the presented results. These issues primarily affect the uncertainty of estimates and the statistical inference made on comparisons and model estimates - the fundamental direction and magnitude of the results are unlikely to change in most cases. I have included detailed statistical comments below for reference.

      Conceptually, I think this study will be very effective at providing context and empirical evidence for a broader conversation around self-citation. And while I believe that there is room for a deeper quantitative dive into some finer-grained questions, this paper will be a valuable catalyst for new areas of inquiry around citation behavior - e.g., do authors change self-citation behavior when they move to more or less prestigious institutions? do self-citations in neuroscience benefit downstream citation accumulation? do journals' reference list policies increase or decrease self-citation? - that I hope that the authors (or others) consider exploring in future work.

      Thank you for your suggestions and your generally positive view of our work. As described below, we have made the statistical improvements that you suggested.

      Statistical comments:

      (1) Throughout the paper, the nested nature of the data does not seem to be appropriately handled in the bootstrapping, permutation inference, and regression models. This is likely to lead to inappropriately narrow confidence bands and overly generous statistical inference.

      We apologize for this error. We have now included nested bootstrapping and permutation tests. We defined an “exchangeability block” as a co-authorship group. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.

      We first describe this in the results (page 3, line 110):

      “Importantly, we accounted for the nested structure of the data in bootstrapping and permutation tests by forming co-authorship exchangeability blocks.”

      We also describe this in 4.8 Confidence Intervals (page 21, line 725):

      “Confidence intervals were computed with 1000 iterations of bootstrap resampling at the article level. For example, of the 100,347 articles in the dataset, we resampled articles with replacement and recomputed all results. The 95% confidence interval was reported as the 2.5 and 97.5 percentiles of the bootstrapped values.

      We grouped data into exchangeability blocks to avoid overly narrow confidence intervals or overly optimistic statistical inference. Each exchangeability block comprised any authors who published together as a First Author / Last Author pairing in our dataset. We only considered shared First/Last Author publications because we believe that these authors primarily control self-citations, and otherwise exchangeability blocks would grow too large due to the highly collaborative nature of the field. Furthermore, the exchangeability blocks do not account for co-authorship in other journals or prior to 2000. A distribution of the sizes of exchangeability blocks is presented in Figure S15.”
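      One way to picture the exchangeability blocks is as connected components of a graph whose edges are First Author / Last Author pairings. The sketch below builds such blocks with a union-find structure and then computes a percentile confidence interval by resampling whole blocks. Both the block-level resampling and the choice of the mean as the statistic are assumptions made for illustration; the quoted Methods do not specify these details.

```python
import numpy as np

def exchangeability_blocks(first_authors, last_authors):
    """Return one block label per article.

    Articles end up in the same block when their First/Last Authors are
    linked, directly or through a chain of shared First/Last pairings.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for first, last in zip(first_authors, last_authors):
        root_first, root_last = find(first), find(last)
        if root_first != root_last:
            parent[root_last] = root_first

    roots = [find(first) for first in first_authors]
    labels = {root: i for i, root in enumerate(dict.fromkeys(roots))}
    return np.array([labels[root] for root in roots])

def block_bootstrap_ci(values, blocks, n_boot=1000, seed=0):
    """95% percentile interval of the mean, resampling blocks with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    blocks = np.asarray(blocks)
    block_ids = np.unique(blocks)
    members = [np.flatnonzero(blocks == b) for b in block_ids]
    stats = np.empty(n_boot)
    for k in range(n_boot):
        drawn = rng.integers(0, len(block_ids), size=len(block_ids))
        idx = np.concatenate([members[j] for j in drawn])
        stats[k] = values[idx].mean()
    return np.percentile(stats, [2.5, 97.5])
```

      Here `values` would hold one number per article (for example, a hypothetical per-article self-citation rate), and the 2.5 and 97.5 percentiles match the interval definition given in the quoted section.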

      In describing permutation tests, we also write (page 21, line 739):

      “4.9 P values

      P values were computed with permutation testing using 10,000 permutations, with the exception of regression P values and P values from model coefficients. For comparing different fields (e.g., Neuroscience and Psychiatry) and comparing self-citation rates of men and women, the labels were randomly permuted by exchangeability block to obtain null distributions. For comparing self-citation rates between First and Last Authors, the first and last authorship was swapped in 50% of exchangeability blocks.”
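      The First versus Last Author comparison described above, where authorship is swapped in half of the exchangeability blocks, can be sketched as a block-wise sign-flip test on paired differences. The per-article pairing of rates, the two-sided test, and the add-one correction in the P value are assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np

def blockwise_swap_test(first_rates, last_rates, blocks, n_perm=10_000, seed=0):
    """Two-sided permutation test for the mean First-minus-Last difference.

    In every permutation, each block is swapped (sign of its differences
    flipped) with probability 0.5, so articles within a block move together.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(first_rates, dtype=float) - np.asarray(last_rates, dtype=float)
    _, block_index = np.unique(np.asarray(blocks), return_inverse=True)
    n_blocks = block_index.max() + 1
    observed = diff.mean()
    null = np.empty(n_perm)
    for k in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=n_blocks)
        null[k] = (diff * signs[block_index]).mean()
    # Add-one correction keeps the P value strictly above zero.
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value
```

      Flipping whole blocks instead of individual articles preserves the dependence among articles from the same co-authorship group under the null distribution.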

      For modeling, we considered doing a mixed effects model but found difficulties due to computational power. For example, with our previous model, there were hundreds of thousands of levels for the paper random effect, and tens of thousands of levels for the author random effect. Even when subsampling or using packages designed for large datasets (e.g., mgcv’s bam function: https://www.rdocumentation.org/packages/mgcv/versions/1.9-1/topics/bam), we found computational difficulties.

      As a result, we switched to modeling results at the paper level (e.g., self-citation count or rate). We found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We updated our description of our models in the Methods section (page 21, line 754):

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 49. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 50 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 49. The p parameter ranges from 1-2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 51. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 51. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 49. Across all smooth predictors and models, we ultimately selected between 10-20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
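      The subsampling step described in the quoted Methods (one randomly selected paper per author, repeated 100 times) can be sketched as below. The input layout, a list of `(paper_id, author_id)` pairs, and the function names are hypothetical; the actual models were fitted in R with mgcv, which this Python sketch does not reproduce.

```python
import random
from collections import defaultdict

def one_paper_per_author(papers, seed=0):
    """Pick one paper at random for every author.

    papers: iterable of (paper_id, author_id) pairs, e.g. all papers on which
            the person is First Author.
    Returns a dict mapping author_id to the selected paper_id.
    """
    rng = random.Random(seed)
    by_author = defaultdict(list)
    for paper_id, author_id in papers:
        by_author[author_id].append(paper_id)
    # Sorting makes the result reproducible for a given seed.
    return {author: rng.choice(ids) for author, ids in sorted(by_author.items())}

def repeated_subsamples(papers, n_repeats=100, seed=0):
    """Repeat the subsampling with different seeds for a sensitivity analysis."""
    return [one_paper_per_author(papers, seed=seed + r) for r in range(n_repeats)]
```

      Each subsample contains every author exactly once, so a model fitted to it has no repeated measurements per author; refitting across the repeated subsamples indicates how much the estimates depend on which paper was drawn.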

      The direction of our results primarily stayed the same, with the exception of gender results. Men tended to self-cite slightly less (or at equal rates) after accounting for numerous covariates. As such, we also modeled the number of previous papers to explain the discrepancy between our raw data and the modeled gender results. Please find the updated results text below (page 11, line 316):

      “2.9 Exploring effects of covariates with generalized additive models

      Investigating the raw trends and group differences in self-citation rates is important, but several confounding factors may explain some of the differences reported in previous sections. For instance, gender differences in self-citation were previously attributed to men having a greater number of prior papers available to self-cite 7,20,21. As such, covarying for various author- and article-level characteristics can improve the interpretability of self-citation rate trends. To allow for inclusion of author-level characteristics, we only consider First Author and Last Author self-citation in these models.

      We used generalized additive models (GAMs) to model the number and rate of self-citations for First Authors and Last Authors separately. The data were randomly subsampled so that each author only appeared in one paper. The terms of the model included several article characteristics (article year, average time lag between article and all cited articles, document type, number of references, field, journal impact factor, and number of authors), as well as author characteristics (academic age, number of previous papers, gender, and whether their affiliated institution is in a low- and middle-income country). Model performance (adjusted R2) and coefficients for parametric predictors are shown in Table 2. Plots of smooth predictors are presented in Figure 6.

      First, we considered several career and temporal variables. Consistent with prior works 20,21, self-citation rates and counts were higher for authors with a greater number of previous papers. Self-citation counts and rates increased rapidly among the first 25 published papers but then more gradually increased. Early in the career, increasing academic age was related to greater self-citation. There was a small peak at about five years, followed by a small decrease and a plateau. We found an inverted U-shaped trend for average time lag and self-citations, with self-citations peaking approximately three years after initial publication. In addition, self-citations have generally been decreasing since 2000. The smooth predictors showed larger decreases in the First Author model relative to the Last Author model (Figure 6).

      Then, we considered whether authors were affiliated with an institution in a low- and middle-income country (LMIC). LMIC status was determined by the Organisation for Economic Co-operation and Development. We opted to use LMIC instead of affiliation country or continent to reduce the number of model terms. We found that papers from LMIC institutions had significantly lower self-citation counts (-0.138 for First Authors, -0.184 for Last Authors) and rates (-12.7% for First Authors, -23.7% for Last Authors) compared to non-LMIC institutions. Additional results with affiliation continent are presented in Table S5. Relative to the reference level of Asia, higher self-citations were associated with Africa (only three of four models), the Americas, Europe, and Oceania.

      Among paper characteristics, a greater number of references was associated with higher self-citation counts and lower self-citation rates (Figure 6). Interestingly, self-citations were greater for a small number of authors, though the effect diminished after about five authors. Review articles were associated with lower self-citation counts and rates. No clear trend emerged between self-citations and journal impact factor. In an analysis by field, despite the raw results suggesting that self-citation rates were lower in Neuroscience, GAM-derived self-citations were greater in Neuroscience than in Psychiatry or Neurology.

      Finally, our results aligned with previous findings of nearly equivalent self-citation rates for men and women after including covariates, even showing slightly higher self-citation rates in women. Since raw data showed evidence of a gender difference in self-citation that emerges early in the career but dissipates with seniority, we incorporated two interaction terms: one between gender and academic age and a second between gender and the number of previous papers. Results remained largely unchanged with the interaction terms (Table S6).

      2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (2) The discussion of the data structure used in the regression models is somewhat opaque, both in the main text and the supplement. From what I gather, these models likely have each citation included in the model at least once (perhaps twice, once for first-author status and one for last-author status), with citations nested within citing papers, cited papers, and authors. Without inclusion of random effects, the interpretation and inference of the estimates may be misleading.

      Please see our response to point (1) to address random effects. We have also switched to GAMs (see point #3 below) and provided more detail in the methods. Notably, we decided against using author-level effects due to poor model stability, as there can be as few as one author per group. Instead, we subsampled the dataset such that only one paper appeared from each author.

      (3) I am concerned that the use of the inverse hyperbolic sine transform is a bit too prescriptive, and may be producing poor fits to the true predictor-outcome relationships. For example, in a figure like Fig S8, it is hard to know to what extent the sharp drop and sign reversal are true reflections of the data, and to what extent they are artifacts of the transformed fit.

      Thank you for raising this point. We have now switched to using generalized additive models (GAMs). GAMs provide a flexible approach to modeling that does not require transformations. We described this in detail in point (1) above and in Methods 4.10 Exploring effects of covariates with generalized additive models (page 21, line 754).

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 48. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 49 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 48. The p parameter ranges from 1-2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 50. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 50. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 48. Across all smooth predictors and models, we ultimately selected between 10 and 20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
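
      As a reading aid, the model family described in the quoted Methods text can be written out as below. This is only a sketch: the symbols $y_i$ (outcome for paper $i$), $f_j$ (smooth terms), $z_{ik}$ (categorical terms entered linearly), and $\phi$ (dispersion) are notation assumed here for illustration and do not come from the manuscript.

```latex
\begin{align}
\log \mu_i &= \beta_0 + \sum_{j} f_j(x_{ij}) + \sum_{k} \beta_k z_{ik},
  \qquad \mu_i = \mathbb{E}[y_i], \\
\operatorname{Var}(y_i) &= \phi \, \mu_i^{\,p}, \qquad p = 1.2 .
\end{align}
```

      With p=1 the variance grows linearly with the mean (Poisson-like), and with p=2 it grows with the squared mean (gamma-like), so p=1.2 sits near the Poisson end of that range.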

      (4) It seems there are several points in the analysis where papers may have been dropped for missing data (e.g., missing author IDs and/or initials, missing affiliations, low-confidence gender assessment). It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for comparisons across countries it would be important for the authors to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for raising this important point. In the methods section, we describe how the data are missing (page 18, line 623):

      “4.3 Data exclusions and missingness

      Data were excluded across several criteria: missing covariates, missing citation data, out-of-range values at the citation pair level, and out-of-range values at the article level (Table 3). After downloading the data, our dataset included 157,287 articles and 8,438,733 citations. We excluded any articles with missing covariates (document type, field, year, number of authors, number of references, academic age, number of previous papers, affiliation country, gender, and journal). Of the remaining articles, we dropped any for missing citation data (e.g., cannot identify whether a self-citation is present due to lack of data). Then, we removed citations with unrealistic or extreme values. These included an academic age of less than zero or above 38/44 for First/Last Authors (99th percentile); greater than 266/522 papers for First/Last Authors (99th percentile); and a cited year before 1500 or after 2023. Subsequently, we dropped articles with extreme values that could contribute to poor model stability. These included greater than 30 authors; fewer than 10 references or greater than 250 references; and a time lag of greater than 17 years. These values were selected to ensure that GAMs were stable and not influenced by a small number of extreme values.

      In addition, we evaluated whether the data were not missing at random (Table S8). Data were more likely to be missing for reviews relative to articles, for Neurology relative to Neuroscience or Psychiatry, in works from Africa relative to the other continents, and for men relative to women. Scopus ID coverage contributed in part to differential missingness. However, our exclusion criteria also contribute. For example, Last Authors with more than 522 papers were excluded to help stabilize our GAMs. More men fit this exclusion criterion than women.”
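
      The range-based exclusions in Section 4.3 can be sketched as two simple filters, one at the citation level and one at the article level. The thresholds are the ones quoted above; the function names, argument names, and the `"first"`/`"last"` role labels are hypothetical and chosen only for this illustration, not taken from the released code.

```python
# Thresholds quoted in Section 4.3 (99th percentiles for First/Last Authors).
# All names below are hypothetical, for illustration only.
MAX_ACADEMIC_AGE = {"first": 38, "last": 44}
MAX_PREVIOUS_PAPERS = {"first": 266, "last": 522}


def keep_citation(role, academic_age, previous_papers, cited_year):
    """Citation-level filter: drop unrealistic or extreme values."""
    return (
        0 <= academic_age <= MAX_ACADEMIC_AGE[role]
        and previous_papers <= MAX_PREVIOUS_PAPERS[role]
        and 1500 <= cited_year <= 2023
    )


def keep_article(n_authors, n_references, time_lag_years):
    """Article-level filter: drop values that could destabilize the GAMs."""
    return (
        n_authors <= 30
        and 10 <= n_references <= 250
        and time_lag_years <= 17
    )
```

      Because the upper bounds differ by author role, the same record can be retained for the Last Author analysis and dropped for the First Author analysis, which is one route by which exclusions can become differential across groups.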

      Due to differential missingness, we wrote in the limitations (page 16, line 529):

      “Ninth, data were differentially missing (Table S8) due to Scopus coverage and gender estimation. Differential missingness could bias certain results in the paper, but we hope that the dataset is large enough to reduce any potential biases.”

      Reviewer #2 (Public Review):

      The authors provide a comprehensive investigation of self-citation rates in the field of Neuroscience, filling a significant gap in existing research. They analyze a large dataset of over 150,000 articles and eight million citations from 63 journals published between 2000 and 2020. The study reveals several findings. First, they state that there is an increasing trend of self-citation rates among first authors compared to last authors, indicating potential strategic manipulation of citation metrics. Second, they find that the Americas show higher odds of self-citation rates compared to other continents, suggesting regional variations in citation practices. Third, they show that there are gender differences in early-career self-citation rates, with men exhibiting higher rates than women. Lastly, they find that self-citation rates vary across different subfields of Neuroscience, highlighting the influence of research specialization. They believe that these findings have implications for the perception of author influence, research focus, and career trajectories in Neuroscience.

      Overall, this paper is well written, and the breadth of analysis conducted by authors, with various interactions between variables (e.g., gender vs. seniority), shows that the authors have spent a lot of time thinking about different angles. The discussion section is also quite thorough. The authors should also be commended for their efforts in the provision of code for the public to evaluate their own self-citations. That said, here are some concerns and comments that, if addressed, could potentially enhance the paper:

      Thank you for your review and your generally positive view of our work.

      (1) There are concerns regarding the data used in this study, specifically its bias towards top journals in Neuroscience, which limits the generalizability of the findings to the broader field. More specifically, the top 63 journals in neuroscience are based on impact factor (IF), which raises a potential issue of selection bias. While the paper acknowledges this as a limitation, it lacks a clear justification for why authors made this choice. It is also unclear how the "top" journals were identified: was the cutoff the top 5% in terms of impact factor, the top 10%, or some other metric? The authors also do not provide the (computed) impact factors of the journals in the supplementary.

      We apologize for the lack of clarity about our selection of journals. We agree that there are limitations to selecting higher impact journals. However, we needed to apply some form of selection in order to make the analysis manageable. For instance, even these 63 journals include over five million citations. We better describe our rationale behind the approach as follows (page 17, line 578):

      “We collected data from the 25 journals with the highest impact factors, based on Web of Science impact factors, in each of Neurology, Neuroscience, and Psychiatry. Some journals appeared in the top 25 list of multiple fields (e.g., both Neurology and Neuroscience), so 63 journals were ultimately included in our analysis. We recognize that limiting the journals to the top 25 in each field also limits the generalizability of the results. However, there are tradeoffs between breadth of journals and depth of information. For example, by limiting the journals to these 63, we were able to look at 21 years of data (2000-2020). In addition, the definition of fields is somewhat arbitrary. By restricting the journals to a set of 63 well-known journals, we ensured that the journals belonged to Neurology, Neuroscience, or Psychiatry research. It is also important to note that the impact factor of these journals has not necessarily always been high. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. To further recognize the effects of impact factor, we decided to include an impact factor term in our models.”

      In addition, we have now provided the 2020 impact factors in Table S1.

      By exclusively focusing on high impact journals, your analysis may not be representative of the broader landscape of self-citation patterns across the neuroscience literature, which is what the title of the article claims to do.

      We agree that this article is not indicative of all neuroscience literature, but rather the top journals. Thus, we have changed the title to: “Trends in Self-citation Rates in High-impact Neurology, Neuroscience, and Psychiatry Journals”. We would also like to note that compared to previous bibliometrics works in neuroscience (Bertolero et al. 2020; Dworkin et al. 2020; Fulvio et al. 2021), this article includes a wider range of data.

      (2) One other concern pertains to the possibility that a significant number of authors involved in the paper may not be neuroscientists. It is plausible that the paper is a product of interdisciplinary collaboration involving scientists from diverse disciplines. Neuroscientists amongst the authors should be identified.

      In our opinion, neuroscience is a broad, interdisciplinary field. Individuals performing neuroscience research may have a neuroscience background. Yet, they may come from many backgrounds, such as physics, mathematics, biology, chemistry, or engineering. As such, we do not believe that it is feasible to characterize whether each author considers themselves a neuroscientist or not. We have added the following to the limitations section (page 16, line 528):

      “Eighth, authors included in this work may not be neurologists, neuroscientists, or psychiatrists. However, they still publish in journals from these fields.”

      (3) When calculating self-citation rate, it is important to consider the number of papers the authors have published to date. One plausible explanation for the lower self-citation rates among first authors could be attributed to their relatively junior status and short publication record. As such, it would also be beneficial to assess self-citation rate as a percentage relative to the author's publication history. This number would be more accurate if we look at it as a percentage of their publication history. My suspicion is that first authors (who are more junior) might be more likely to self-cite than their senior counterparts. My suspicion was further raised by looking at Figures 2a and 3. Considering the nature of the self-citation metric employed in the study, it is expected that authors with a higher level of seniority would have a greater number of publications. Consequently, these senior authors' papers are more likely to be included in the pool of references cited within the paper, hence the higher rate.

      While the authors acknowledge the importance of the number of past publications in their gender analysis, it is just as important to include the interplay of seniority in (1) their first and last author self-citation rates and (2) their geographic analysis.

      Thank you for this thoughtful comment. We agree that seniority and prior publication history play an important role in self-citation rates.

      For comparing First/Last Author self-citation rates, we have now included a plot similar to Figure 2a, where self-citation as a percentage of prior publication history is plotted.

      (page 4, line 161): “Analyzing self-citations as a fraction of publication history exhibited a similar trend (Figure S3). Notably, First Authors were more likely than Last Authors to self-cite when normalized by prior publication history.”
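
      To illustrate why the ordering of First and Last Authors can flip under normalization, here is a minimal sketch. It assumes one plausible reading of "as a fraction of publication history", namely self-citations divided by the number of previous papers; the exact normalization behind Figure S3 is not spelled out in this response. The function names and all numbers are made up for illustration.

```python
def self_citation_rate(n_self_citations, n_references):
    """Share of a paper's reference list that cites the author's own work."""
    return n_self_citations / n_references


def self_citations_per_prior_paper(n_self_citations, n_previous_papers):
    """Self-citations scaled by how many papers the author could have cited
    (an assumed normalization, not necessarily the manuscript's)."""
    return n_self_citations / n_previous_papers


# Hypothetical authors of the same 40-reference paper.
first_author = {"self": 2, "refs": 40, "prior": 10}
last_author = {"self": 4, "refs": 40, "prior": 100}

raw_first = self_citation_rate(first_author["self"], first_author["refs"])
raw_last = self_citation_rate(last_author["self"], last_author["refs"])
norm_first = self_citations_per_prior_paper(first_author["self"], first_author["prior"])
norm_last = self_citations_per_prior_paper(last_author["self"], last_author["prior"])
```

      In this toy case the Last Author has the higher raw rate (4 of 40 references versus 2 of 40), yet the First Author has the higher normalized value (2 per 10 prior papers versus 4 per 100), mirroring the direction of the reported trend.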

      For the geographic analysis, we made two new maps: 1) that of the number of previous papers, and 2) that of the journal impact factor (see response to point #4 below).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not First Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”
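
      The map-to-map comparison above is a rank correlation across countries. The following is a minimal, dependency-free sketch of Spearman's r (Pearson correlation of average ranks); the country labels and values are invented for illustration, and the manuscript's own analysis presumably used a statistics library rather than this hand-rolled version.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-country values, aligned by country.
countries = ["A", "B", "C", "D"]
self_citation_rate = [0.04, 0.06, 0.05, 0.08]
mean_previous_papers = [20, 35, 50, 60]
r = spearman_r(self_citation_rate, mean_previous_papers)
```

      Because only ranks enter the calculation, the result is insensitive to the scale of either map, which suits comparing a rate against a paper count.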

      Finally, we included a model term for the number of previous papers (Table 2). We analyzed this both for self-citation counts and self-citation rates and found a strong relationship between publication history and self-citations. We also included the following section where we modeled the number of previous papers for each author (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (4) Because your analysis is limited to high impact journals, it would be beneficial to see the distribution of the impact factors across the different countries. Otherwise, your analysis on geographic differences in self-citation rates is hard to interpret. Are these differences really differences in self-citation rates, or differences in journal impact factor? It would be useful to look at the representation of authors from different countries for different impact factors.

      We made a map of this in Figure S4 (see our response to point #3 above).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not First Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”

      We also included impact factor as a term in our model. The results suggest that there are still geographic differences (Table 2, Table S5).

      (5) The presence of self-citations is not inherently problematic, and I appreciate the fact that authors omit any explicit judgment on this matter. That said, without appropriate context, self-citations are also not the best scholarly practice. In the analysis on gender differences in self-citations, it appears that authors imply an expectation of women's self-citation rates to align with those of men. While this is not explicitly stated, use of the word "disparity", and also presentation of self-citation as an example of self-promotion in discussion suggest such a perspective. Without knowing the context in which the self-citation was made, it is hard to ascertain whether women are less inclined to self-promote or that men are more inclined to engage in strategic self-citation practices.

      We agree that on the level of an individual self-citation, our study is not useful for determining how related the papers are. Yet, understanding overall trends in self-citation may help to identify differences. Context is important, but large datasets allow us to investigate broad trends. We added the following text to the limitations section (page 16, line 524):

      “In addition, these models do not account for whether a specific citation is appropriate, as some situations may necessitate higher self-citation rates.”

      Reviewer #3 (Public Review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. There are some minor methodological clarifications needed, but more importantly, the interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated, and more importantly, the extent to which self-citations are "problematic" remains unclear.

      Thank you for your review. We attempted to improve the interpretation of results, as described in the following responses.

      When are self-citations problematic? As the authors themselves also clarify, "self-citations may often be appropriate". Researchers cite their own previous work for perfectly good reasons, similar to reasons of why they would cite work by others. The "problem", in a sense, is that researchers cite their own work, just to increase the citation count, or to promote their own work and make it more visible. This self-promotional behaviour might be incentivised by certain research evaluation procedures (e.g. hiring, promoting) that overly emphasise citation performance. However, the true problem then might not be (self-)citation practices, but instead, the flawed research evaluation procedures that emphasise citation performance too much. So instead of problematising self-citation behaviour, and trying to address it, we might do better to address flawed research evaluation procedures. Of course, we should expect references to be relevant, and we should avoid self-promotional references, but addressing self-citations may just have minimal effects, and would not solve the more fundamental issue.

      We agree that this dataset is not designed to investigate the downstream effects of self-citations. However, self-citation practices are more likely to be problematic when they differ across specific groups. This work can potentially spark more interest in future longitudinal designs to investigate whether differences in self-citation practices lead to differences in career outcomes, for example. We added the following text to clarify (page 17, line 565):

      “Yet, self-citation practices become problematic when they are different across groups or are used to “game the system.” Future work should investigate the downstream effects of self-citation differences to see whether they impact the career trajectories of certain groups. We hope that this work will help to raise awareness about factors influencing self-citation practices to better inform authors, editors, funding agencies, and institutions in Neurology, Neuroscience, and Psychiatry.”

      Some other challenges arise when taking a statistical perspective. For any given paper, we could browse through the references, and determine whether a particular reference would be warranted or not. For instance, we could note that there might be a reference included that is not at all relevant to the paper. Taking a broader perspective, the irrelevant reference might point to work by others, included just for reasons of prestige, so-called perfunctory citations. But it could of course also include self-citations. When we simply start counting all self-citations, we do not see what fraction of those self-citations would be warranted as references. The question then emerges, what level of self-citations should be counted as "high"? How should we determine that? If we observe differences in self-citation rates, what does it tell us?

      Our focus is when the self-citation practices differ across groups. We agree that, on a case-by-case basis, there is no exact number for a self-citation rate that is “high.” With a dataset of the current size, evaluating whether each individual self-citation is appropriate is not feasible. If we observe differences in self-citation rate, this may tell us about broad (not individual-level) trends and differences in self-citing practice. If one group is self-citing much more highly compared to another group–even after covarying relevant variables such as prior publication history–then the self-citation differences can likely be attributed to differences in self-citation practices/behaviors.

      For example, the authors find that the (any author) self-citation rate in Neuroscience is 10.7% versus 15.9% in Psychiatry. What does this difference mean? Are psychiatrists citing themselves more often than neuroscientists? First author men showed a self-citation rate of 5.12% versus a self-citation rate of 3.34% for women first authors. Do men engage in more problematic citation behaviour? Junior researchers (10-year career) show a self-citation rate of about 5% compared to a self-citation rate of about 10% for senior researchers (30-year career). Are senior researchers therefore engaging in more problematic citation behaviour? The answer is (most likely) "no", because senior authors have simply published more, and will therefore have more opportunities to refer to their own work. To be clear: the authors are aware of this, and also take this into account. In fact, these various "raw" self-citation rates may, as the authors themselves say, "give the illusion" of self-citation rates, but these are somehow "hidden" by, for instance, career seniority.

      We included numerous covariates in our model. In addition, to address the difference between “raw” and “modeled” self-citation rates, we added the following section (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      Again, the authors do consider this, and "control" for career length and number of publications, et cetera, in their regression model. Some of the previous observations then change in the regression model. Neuroscience doesn't seem to be self-citing more, there just seem to be more junior researchers in that field compared to Psychiatry. Similarly, men and women don't seem to show an overall different self-citation behaviour (although the authors find an early-career difference), the men included in the study simply have longer careers and more publications.

      But here's the key issue: what does it then mean to "control" for some variables? This doesn't make any sense, except in the light of causality. That is, we should control for some variable, such as seniority, because we are interested in some causal effect. The field may not "cause" the observed differences in self-citation behaviour, this is mediated by seniority. Or is it confounded by seniority? Are the overall gender differences also mediated by seniority? How would the selection of high-impact journals "bias" estimates of causal effects on self-citation? Can we interpret the coefficients as causal effects of that variable on self-citations? If so, would we try to interpret this as total causal effects, or direct causal effects? If they do not represent causal effects, how should they be interpreted then? In particular, how should it "inform authors, editors, funding agencies and institutions", as the authors say? What should they be informed about?

      We apologize for our misuse of language. We will be more clear, as in most previous self-citation papers, that our analysis is NOT causal. Causal datasets do have some benefits in citation research, but a limitation is that they may not cover as wide of a range of authors. Furthermore, non-causal correlational studies can still be useful in informing authors, editors, funding agencies, and institutions. Association studies are widely used across various fields to draw non-causal conclusions. We made numerous changes to reduce our causal language.

      Before: “We then developed a probability model of self-citation that controls for numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      After (page 3, line 113): “We then developed a probability model of self-citation that includes numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      Before: “As such, controlling for various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      After (page 11, line 321): “As such, covarying various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      Before: “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after controlling for various confounds, the self-citation rates are higher in Neuroscience.”

      After (page 15, line 468): “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after considering several covariates, the self-citation rates are higher in Neuroscience.”

      We also added the following text to the limitations section (page 16, line 526):

      “Seventh, the analysis presented in this work is not causal. Association studies are advantageous for increasing sample size, but future work could investigate causality in curated datasets.”

      The authors also "encourage authors to explore their trends in self-citation rates". It is laudable to be self-critical and review one's own practices. But how should authors interpret their self-citation rate? How useful is it to know whether it is 5%, 10% or 15%? What would be the "reasonable" self-citation rate? How should we go about constructing such a benchmark rate? Again, this would necessitate some causal answer. Instead of looking at the self-citation rate, it would presumably be much more informative to simply ask authors to check whether references are appropriate and relevant to the topic at hand.

      We believe that our tool is valuable for authors to contextualize their own self-citation rates. For instance, if an author has published hundreds of articles, it is not practical to count the number of self-citations in each. We have added two portions of text to the limitations section:

      (page 16, line 524): “In addition, these models do not account for whether a specific citation is appropriate, though some situations may necessitate higher self-citation rates.”

      (page 16, line 535): “Despite these limitations, we found significant differences in self-citation rates for various groups, and thus we encourage authors to explore their trends in self-citation rates. Self-citation rates that are higher than average are not necessarily wrong, but suggest that authors should further reflect on their current self-citation practices.”

      In conclusion, the study shows some interesting and relevant differences in self-citation rates. As such, it is a welcome contribution to ongoing discussions of (self) citations. However, without a clear causal framework, it is challenging to interpret the observed differences.

      We agree that causal studies provide many benefits. Yet, association studies also provide many benefits. For example, an association study allowed us to analyze a wider range of articles than a causal study would have.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Statistical suggestions:

      (1) To improve statistical inference, nesting should be accounted for in all of the analyses. For example, the logistic regression model using citing/cited pairs should include random effects for article, author, and perhaps subfield, in order for independence of observations to be plausible. Similarly, bootstrapping and permutation would ideally occur at the author level rather than (or in addition to) the paper level.

      Detailed updates addressing these points are in the public review. In short, we found computational challenges with many levels of the random effects (>100,000) and millions of observations at the citation pairs level. As such, we decided to model citation rates and counts by paper. In this case, we found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We repeated the random resampling 100 times (Figure S12). We updated our description of our models in the Methods section (page 21, line 754).
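
      The one-paper-per-author resampling described above can be sketched as follows. This is an illustrative sketch, not the released analysis code: the function names, the `(paper_id, author_id)` input layout, and the fixed seed are assumptions, and `author_id` stands for whichever author role (First or Last) the model at hand concerns.

```python
import random
from collections import defaultdict


def sample_one_paper_per_author(papers, rng):
    """papers: list of (paper_id, author_id) pairs.
    Returns {author_id: paper_id} with one randomly chosen paper per author."""
    by_author = defaultdict(list)
    for paper_id, author_id in papers:
        by_author[author_id].append(paper_id)
    return {author: rng.choice(ids) for author, ids in by_author.items()}


def repeated_resamples(papers, n_repeats=100, seed=0):
    """Repeat the draw n_repeats times (100 in the sensitivity analysis)."""
    rng = random.Random(seed)
    return [sample_one_paper_per_author(papers, rng) for _ in range(n_repeats)]


# Hypothetical data: Author A has five papers, Author B has one.
papers = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p4", "A"), ("p5", "A"), ("p6", "B")]
draws = repeated_resamples(papers)
```

      Each draw is a dataset in which no author contributes more than one row, so a model fitted to it does not need an author-level random effect; refitting across the repeated draws then shows how sensitive the estimates are to which paper happened to be selected.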

      For permutation tests and bootstrapping, we now define an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.
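
      One way to realize such exchangeability blocks is to treat each First Author / Last Author pairing as an edge and take connected components, then resample whole blocks. The sketch below assumes that reading; the text does not state how transitive links were handled, so the connected-component construction, the function names, and the input layout are all assumptions made for illustration.

```python
import random


def build_blocks(papers):
    """papers: list of (paper_id, first_author, last_author).
    Returns a sorted list of blocks, each a sorted list of paper ids whose
    First/Last Authors are linked through shared pairings."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link the First and Last Author of every paper.
    for _, first, last in papers:
        union(first, last)

    # Group papers by the component their First Author belongs to.
    blocks = {}
    for paper_id, first, _ in papers:
        blocks.setdefault(find(first), []).append(paper_id)
    return sorted(sorted(block) for block in blocks.values())


def bootstrap_blocks(blocks, rng):
    """Resample whole blocks with replacement; return the papers drawn."""
    drawn = [rng.choice(blocks) for _ in blocks]
    return [paper_id for block in drawn for paper_id in block]


# Hypothetical papers: A and B share Last Author X; C publishes with Y and Z.
papers = [
    ("p1", "A", "X"),
    ("p2", "B", "X"),
    ("p3", "C", "Y"),
    ("p4", "C", "Z"),
    ("p5", "D", "W"),
]
blocks = build_blocks(papers)
sample = bootstrap_blocks(blocks, random.Random(0))
```

      Resampling at the block level keeps papers from the same co-authorship group together, so dependence within a group does not artificially narrow the resulting confidence intervals.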

      (2) In general, I am having trouble understanding the structure of the regression models. My current belief is that rows are composed of individual citations from papers' reference lists, with the outcome representing their status as a self-citation or not, and with various citing article and citing author characteristics as predictors. However, the fact that author type is included in the model as a predictor (rather than having a model for FA self-citations and another for LA self-citations) suggests to me that each citation is entered as two separate rows - once noting whether it was a FA self-citation and once noting whether it was an LA self-citation - and then it is run as a single model.

      (2a) If I am correct, the model is unlikely to be producing valid inference. I would recommend breaking this analysis up into two separate models, and including article-, author-, and subfield-level random effects. You could theoretically include a citation-level random effect and keep it as one model, but each 'group' would only have two observations and the model would be fairly unstable as a result.

      (2b) If I am misunderstanding (and even if not), I would encourage you to provide a more detailed description of the dataset structure and the model - perhaps with a table or diagram.

      We split the data into two models and decided to model on the level of a paper (self-citation rate and self-citation count). In addition, we subsampled the dataset such that each author only appears once to avoid misestimation of confidence intervals (see point (1) above). As described in the public review, we included much more detail in our methods section now to improve the clarity of our models.

      (3) I would suggest removing the inverse hyperbolic sine transform and replacing it with a more flexible approach to estimating the relationships' shape, like generalized additive models or other spline-based methods to ensure that the chosen method is appropriate - or at the very least checking that it is producing a realistic fit that reflects the underlying shape of the relationships.

      More details are available in the public review, but we now use GAMs throughout the manuscript.

      (4) For the "highly self-citing" analysis, it is unclear why papers in the 15-25% range were dropped rather than including them as their own category in an ordinal model. I might suggest doing the latter, or explaining the decision more fully

      We previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      (5) It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for your team to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for this suggestion. We added more detailed missingness data to 4.3 Data exclusions and missingness. We did find differential missingness and added it to the limitations section. However, certain aspects of this cannot be corrected because the data are just not available (e.g., Scopus coverage issues). Further details are available in the public review.

      Conceptual thoughts:

      (1) I agree with your decision to focus on the second definition of self-citation (self-cites relative to my citations to others' work) rather than the first (self-cites relative to others' citations to my work). But it does seem that the first definition is relevant in the context of gaming citation metrics. For example, someone who writes one paper per year with a reference list of 30% self-citations will have much less of an impact on their H-index than someone who writes 10 papers per year with 10% self-citations. It could be interesting to see how these definitions interact, and whether people who are high on one measure tend to be high on the other.

      We agree this would be interesting to investigate in the future. Unfortunately, our dataset is organized at the level of the paper and thus does not contain information regarding how many times the authors cite a particular work. We hope that we can explore this interaction in the future.

      (2) This is entirely speculative, but I wonder whether the increasing rate of LA self-citation relative to FA self-citation is partly due to PIs over-citing their own lab to build up their trainees' citation records and help them succeed in an increasingly competitive job market. This sounds more innocuous than doing it to benefit their own reputation, but it would provide another mechanism through which students from large and well-funded labs get a leg-up in the job market. Might be interesting to explore, though I'm not exactly sure how :)

      This is a very interesting point. We do not have any means to investigate this with the current dataset, but we added it to the discussion (page 14, line 421):

      “A third, more optimistic explanation is that principal investigators (typically Last Authors) are increasingly self-citing their lab’s papers to build up their trainee’s citation records for an increasingly competitive job market.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In regards to point 1 in the public review: In the spirit of transparency, the authors would benefit from providing a rationale for their choice of top journals, and the methodology used to identify them. It would also be valuable to include the impact factor of each journal in the S1 table alongside their names.

      Given the availability and executability of code, it would be useful to see how and if the self-citation trends vary amongst the "low impact" journals (as measured by the IF). This could go in any of the three directions:

      a. If it is found that self-citations are not as prevalent in low impact journals, this could be a great starting point for a conversation around the evaluation of journals based on impact factor, and the role of self-citations in it.

      b. If it is found that self-citations are as prevalent in low impact journals as high impact journals, that just strengthens your results further.

      c. If it is found that self-citations are more prevalent in low impact journals, this would mean your current statistics are a lower bound to the actual problem. This is also intuitive in the sense that high impact journals get more external citations (and more exposure) than low impact journals, as such authors (and journals) may be less likely to self-cite.

      Expanding the dataset to include many more journals was not feasible. Instead, we included an impact factor term in our models, as detailed in the public review. We found no strong trends in the association between impact factor and self-citation rate/count. Another important note is that these journals were considered “high impact” in 2020, but many had lower impact factors in earlier years. Thus, our modeling allows us to estimate how impact factor is related to self-citations across a wide range of impact factors.

      It is crucial to consider utilizing such a comprehensive database as Scopus, which provides a more thorough list of all journals in Neuroscience, to obtain a more representative sample. Alternatively, other datasets like Microsoft Academic Graph, and OpenAlex offer information on the field of science associated with each paper, enabling a more comprehensive analysis.

      We agree that certain datasets may offer a wider view of the entire field. However, we included a large number of papers and journals relative to previous studies. In addition, Scopus provides a lot of detailed and valuable author-level information. We had to limit our calls to the Scopus API, so we restricted the journals by 2020 impact factor.

      (2) In regards to point 2 in the public review: To enhance the accuracy and specificity of the analysis, it would be beneficial to distinguish neuroscientists among the co-authors. This could be accomplished by examining their publication history leading up to the time of publication of the paper, and identify each author's level of engagement and specialization within the field of neuroscience.

      Since the field of neuroscience is largely based on collaborations, we find that it might be impossible to determine who is a neuroscientist. For example, a researcher with a publication history in physics may now be focusing on computational neuroscience research. As such, we feel that our current work, which ensures that the papers belong to neuroscience, is representative of what one may expect in terms of neuroscience research and collaboration.

      (3) In regards to point 3 in the public review: I highly recommend plotting self-citation rate as the number of papers in the reference list over the number of total publications to date of paper publication.

      As described in the public review, we have now done this (Figure S3).

      (4) In regards to point 5 in the public review: It would be useful to consider the "quality" of citations to further the discussion on self-citations. For instance, differentiating between self-citations that are perfunctory and superficial from those that are essential for showing developmental work, would be a valuable contribution.

      Other databases may have access to this information, but ours unfortunately does not. We agree that this is an interesting area of work.

      (5) The authors are to be commended for their logistic regression models, as they control for many confounders that were lacking in their earlier descriptive statistics. However, it would be beneficial to rerun the same analysis but on a linear model whereby the outcome variable would be the number of self-citations per author. This would possibly resolve many of the comments mentioned above.

      Thank you for your suggestion. As detailed in the public review, we now model the number of self-citations. This is modeled on the paper level, not the author level, because our dataset was downloaded by paper, not by author.

      Minor suggestions:

      (1) Abstract says one of your findings is: "increasing self-citation rates of First Authors relative to Last Authors". Your results actually show the opposite (see Figure 1b).

      Thank you for catching this error. We corrected it to match the results and discussion in the paper:

      “…increasing self-citation rates of Last Authors relative to First Authors.”

      (2) It might be interesting to compute an average academic age for each paper, and look at self-citation vs average academic age plot.

      We agree that this would be an interesting analysis. However, to limit calls to the API, we collected academic age data only on First and Last Authors.

      (3) It may be interesting to look at the distribution of women in different subfields within neuroscience, and the interaction of those in the context of self-citations.

      Thank you for this interesting suggestion. We added the following analysis (page 9, line 305):

      “Furthermore, we explored topic-by-gender interactions (Figure S10). In short, men and women were relatively equally represented as First Authors, but more men were Last Authors across all topics. Self-citation rates were higher for men across all topics.”

      Reviewer #3 (Recommendations For The Authors):

      - In the abstract, "flaws in citation practices" seems worded rather strongly.

      We respectfully disagree, as previous works have shown significant bias in citation practices. For example, Dworkin et al. (Dworkin et al. 2020) found that neuroscience reference lists tended to under-cite women, even after including various covariates.

      - The links in the references point to (non-accessible) Paperpile references; you would probably want to update this.

      We apologize for the inconvenience and have now removed these links.

      - p 2, l 24: The explanation of ref. (5) seems to be a bit strangely formulated. The point of that article is that citations to work that reinforce a particular belief are more likely to be cited, which *creates* unfounded authority. The unfounded authority itself is hence no part of the citation practices

      Thank you for catching our misinterpretation. We have now removed this part of the sentence.

      - p 3, l 16: "h indices" or "citations" instead of "h-index".

      We now say “h-indices”.

      - p 5, l 5: how was the manual scoring done?

      We added the following to the caption of Figure S1.

      “Figure S1. Comparison between manual scoring of self-citation rates and self-citation rates estimated from Python scripts in 5 Psychiatry journals: American Journal of Psychiatry, Biological Psychiatry, JAMA Psychiatry, Lancet Psychiatry, and Molecular Psychiatry. 906 articles in total were manually evaluated (10 articles per journal per year from 2000-2020, four articles excluded for very large author list lengths and thus high difficulty of manual scoring). For manual scoring, we downloaded information about all references for a given article and searched for matching author names.”

      - p 5, l 23: Why this specific p-value upper bound of 4e-3? From later in the article, I understand that this stems from the 10000 bootstrap sample, with then taking a Bonferroni correction? Perhaps good to clarify this briefly somewhere.

      Thank you for this suggestion. We now perform Benjamini/Hochberg false discovery rate (FDR) correction, but we added a description of the minimum P value from permutations (page 21, line 748):

      “All P values described in the main text were corrected with the Benjamini/Hochberg false discovery rate (FDR) correction. With 10,000 permutations, the lowest P value after applying FDR correction is P=2.9e-4, which indicates that the true point would be the most extreme in the simulated null distribution.”
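
      For readers unfamiliar with the procedure, the following is a minimal pure-Python sketch of the standard Benjamini/Hochberg step-up adjustment. It is not the code used in the study, and the example P values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini/Hochberg FDR-adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_raw = [0.01, 0.04, 0.03, 0.005]   # hypothetical raw P values
p_fdr = benjamini_hochberg(p_raw)
```

      The smallest adjusted value depends on both the smallest raw P value and the number of tests, which is why a floor set by the number of permutations translates into a larger floor after correction.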

      - Fig. 1, caption: The (a) and (b) labelling here is a bit confusing, because the first sentence suggests both figures portray the same, but do so for different time periods. Perhaps rewrite, so that (a) and (b) are both described in a single sentence, instead of having two different references to (a) and (b).

      Thank you for pointing this out. We fixed the labeling of this caption:

      “Figure 1. Visualizing recent self-citation rates and temporal trends. a) Kernel density estimate of the distribution of First Author, Last Author, and Any Author self-citation rates in the last five years. b) Average self-citation rates over every year since 2000, with 95% confidence intervals calculated by bootstrap resampling.”
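
      The bootstrap confidence intervals mentioned in the caption can be sketched as follows. This is a generic percentile bootstrap under assumed settings (2,000 resamples, made-up per-paper rates), not the exact procedure or data used for Figure 1.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-paper self-citation rates for a single year.
rates = [0.02, 0.05, 0.08, 0.10, 0.03, 0.12, 0.07, 0.00, 0.15, 0.06]
low, high = bootstrap_ci(rates)
```

      Repeating this for each publication year would give one interval per point on a yearly trend line.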

      - p7, l 9: Regarding "academic age", note that there might be a difference between "age" effects and "cohort" effects. That is, there might be difference between people with a certain career age who started in 1990 and people with the same career age, but who started in 2000, which would be a "cohort" effect.

      We agree that this is a possible effect and have added it to the limitations (page 16, line 532):

      “Tenth, while we considered academic age, we did not consider cohort effects. Cohort effects would depend on the year in which the individual started their career.”

      - p 7, l 15: "jumps" suggests some sort of sudden or discontinuous transition, I would just say "increases".

      We now say “increases.”

      - Fig. 2: Perhaps it should be made more explicit that this includes only academics with at least 50 papers. Could the authors please clarify whether the same limitation of at least 50 papers also features in other parts of the analysis where academic age is used? This selection could affect the outcomes of the analysis, so its consequences should be carefully considered. One possibility for instance is that it selects people with a short career length who have been exceptionally productive, namely those that have had 50 papers, but only started publishing in 2015 or so. Such exceptionally productive people will feature more highly in the early career part, because they need to be so productive in order to make the cut. For people with a longer career, the 50 papers would be less of a hurdle, and so would select more and less productive people more equally.

      We apologize for the lack of clarity. We did not use this requirement where academic age was used. We mainly applied this requirement when aggregating by country, as we did not want to calculate self-citation rate in a country based on only several papers. We have clarified various data exclusions in our new section 4.3 Data exclusions and missingness.

      - p 8, l 11: The affiliated institution of an author is not static, but rather changes throughout time. Did the authors consider this? If not, please clarify that this refers to only the most recent affiliation (presumably). Authors also often have multiple affiliations. How did the authors deal with this?

      The institution information is at the time of publication for each paper. We added more detail to our description of this on page 19, line 656:

      “For both First and Last Authors, we found the country of their institutional affiliation listed on the publication. In the case of multiple affiliations, the first one listed in Scopus was used.”

      - p 10, l 6: How were these self-citation rates calculated? This is averaged per author (i.e. only considering papers assigned to a particular topic) and then averaged across authors? (Note that in this way, the average of an author with many papers will weigh equally with the average of an author with few papers, which might skew some of the results).

      We calculate it across the entire topic (i.e., do NOT calculate by author first). We updated the description as follows (page 7, line 211):

      “We then computed self-citation rates for each of these topics (Figure 4) as the total number of self-citations in each topic divided by the total number of references in each topic…”
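
      The difference between the pooled definition quoted above and the per-author averaging that the reviewer asked about can be seen in a small sketch. The tuples below are made-up numbers, and the per-author variant is shown only for contrast.

```python
# Each tuple is (author, self_citations, total_references); values are hypothetical.
papers = [
    ("A", 5, 50),
    ("A", 5, 50),
    ("B", 1, 2),
]


def pooled_rate(papers):
    """Total self-citations in the topic divided by total references."""
    total_self = sum(s for _, s, _ in papers)
    total_refs = sum(r for _, _, r in papers)
    return total_self / total_refs


def mean_of_author_rates(papers):
    """Compute a rate per author first, then average across authors."""
    totals = {}
    for author, s, r in papers:
        entry = totals.setdefault(author, [0, 0])
        entry[0] += s
        entry[1] += r
    rates = [s / r for s, r in totals.values()]
    return sum(rates) / len(rates)


pooled = pooled_rate(papers)                 # 11 / 102
per_author = mean_of_author_rates(papers)    # (0.1 + 0.5) / 2
```

      In this toy case the pooled rate is about 0.11 while the author-averaged rate is 0.30, because the author with a single short reference list carries as much weight as the author with two long ones.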

      - p 13, l 18: Is the academic age analysis here again limited to authors having at least 50 papers?

      This is not limited to at least 50 papers. To clarify, the previous analysis was not limited to authors with 50 papers. It was instead limited to academic ages in our dataset that had at least 50 data points. For example, if an academic age of 70 had only 20 data points in our dataset, it would have been excluded.
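
      A minimal sketch of this kind of filter, assuming the data are simply a list of academic ages with one entry per data point (the variable names and values are hypothetical):

```python
from collections import Counter


def ages_with_enough_data(ages, min_count=50):
    """Return the set of academic ages that have at least `min_count` data points."""
    counts = Counter(ages)
    return {age for age, n in counts.items() if n >= min_count}


# Hypothetical data: 60 data points at academic age 5, 20 at academic age 70.
ages = [5] * 60 + [70] * 20
kept_ages = ages_with_enough_data(ages)
```

      Here academic age 70 is dropped because it has only 20 data points, regardless of how many papers any individual author has published.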

      - Fig. 5: Here, comparing Fig. 5(d) and 5(f) suggests that partly, the self-citation rate differences between men and women, might be the result of the differences in number of papers. That is, the somewhat higher self-citation rate at a given academic age, might be the result of the higher number of papers at that academic age. It seems that this is not directly described in this part of the analysis (although this seems to be the case from the later regression analysis).

      We agree with this idea and have added a new section as follows (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      - Section 2.10. Perhaps the authors could clarify that this analysis takes individual articles as the unit of analysis, not citations.

      We updated all our models to take individual articles and have clarified this with more detailed tables.

      - p 18, l 10: "Articles with between 15-25% self-citation rates were discarded" Why?

      We agree that these should not be discarded. However, we previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      - p 20, l 5: "Thus, early-career researchers may be less incentivized to self-promote (e.g., self-cite) for academic gains compared to 20 years ago." How about the possibility that there was less collaboration, so that first authors would be more likely to cite their own paper, whereas with more collaboration, they will more often not feature as first author?

      This is an interesting point. We feel that more collaboration would generally lead to even more self-citations, if anything. If an author collaborates more, they are more likely to be on some of the references as a middle author (which by our definition counts toward self-citation rates).

      - p 20, l 15: Here the authors call authors to avoid excessive self-citations. Of course, there's nothing wrong with calling for that, but earlier the authors were more careful to not label something directly as excessive self-citations. Here, by stating it like this, the authors suggest that they have looked at excessive self-citations.

      We rephrased this as follows:

      Before: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid excessive self-citations.”

      After: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid unnecessary self-citations.”

      - p 22, l 11: Here again, the same critique as p 20, l15 applies.

      We switched “excessively” to “unnecessarily.”

      - p 23, l 12: The authors here critique ref. (21) of ascertainment bias, namely that they are "including only highly-achieving researchers in the life sciences". But do the authors not do exactly the same thing? That is, they also only focus on the top high-impact journals.

      We included 63 high-impact journals with tens of thousands of authors. In addition, some of these journals were not high-impact at the time of publication. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. This still is a limitation of our work, but we do cover a much broader range of works than the listed reference (though their analysis also has many benefits since it included more detailed information).

      - p 26, l 22-26: It seems that the matching is done quite broadly (matching last names + initials at worst) for self-citations, while later (in section 4.9, p 31, l 9), the authors switch to only matching exact Scopus Author IDs. Why not use the same approach throughout? Or compare the two definitions (narrow / broad).

      Thank you for catching this mistake. We now use the approach of matching Scopus Author IDs throughout.
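
      To make the ID-based definition concrete, here is a minimal sketch of counting self-citations by exact author-ID match. The IDs are made up, the data layout (one set of author IDs per cited paper) is an assumption, and this is not the released analysis code. A reference counts when the citing author appears anywhere in the cited paper's author list, including as a middle author.

```python
def count_self_citations(citing_author_id, references):
    """Count references whose author-ID set contains the citing author's ID.

    `references` is a list of sets of author IDs, one set per cited paper.
    """
    return sum(1 for ref_authors in references if citing_author_id in ref_authors)


def self_citation_rate(citing_author_id, references):
    """Self-citations divided by the length of the reference list."""
    if not references:
        return 0.0
    return count_self_citations(citing_author_id, references) / len(references)


# Hypothetical reference list of one paper: three cited works.
references = [{"111", "222"}, {"333"}, {"222", "444"}]

first_author_rate = self_citation_rate("222", references)   # 2 of 3 references
last_author_rate = self_citation_rate("333", references)    # 1 of 3 references
```

      Matching on IDs rather than on last name plus initials avoids counting a different researcher with the same name as a self-citation.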

      - S8: it might be nice to explore open alternatives, such as OpenAlex or OpenAIRE, instead of the closed Scopus database, which requires paid access (which not all institutions have, perhaps that could also be corrected in the description in GitHub).

      Thank you for this suggestion. Unfortunately, switching databases would require starting our analysis from the beginning. On our GitHub page, we state: “Please email matthew.rosenblatt@yale.edu if you have trouble running this or do not have institutional access. We can help you run the code and/or run it for you and share your self-citation trends.” We feel that this will allow us to help researchers who may not have institutional access. In addition, we released our aggregated, de-identified (title and paper information removed) data on GitHub for other researchers to use.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Based on these results, the authors support a model whereby kinetic regimes are encoded in the cis-regulatory sequences of a gene instead of imposed by an evolving trans-regulatory environment.

      The question asked in this manuscript is important and the eve locus represents an ideal paradigm to address it in a quantitative manner. Most of the results are correctly interpreted and well-presented. However, the main conclusion pointing towards a potential "unified theory" of burst regulation during Drosophila embryogenesis should be nuanced or cross-validated.

      Our results and those of others suggest that different developmental genes follow unified—yet different—transcriptional control strategies whereby different combinations of bursting parameters are regulated to modulate gene expression: burst frequency and amplitude for eve (Berrocal et al., 2020), and burst frequency and duration for gap genes (Zoller et al., 2018). In light of the aforementioned works, we can only claim that our results suggest a unified strategy for eve, our case of study, as we observe that eve regulatory strategies are robust to disruption of enhancers and binding sites. In the Discussion section of our revised manuscript, we will emphasize that the bursting control strategy we uncovered for eve does not necessarily apply to other genes, and speculate in more detail that genes that employ the same strategy of transcriptional bursting may be grouped in families that share a common molecular mechanism of transcription.

      Manuscript updates:

      We have emphasized in the Discussion section that our claim of unified strategies pertains exclusively to the bursting behavior of the gene even-skipped, and does not necessarily extend to other genes. To clarify this point, we referenced the findings of (Zoller, Little, and Gregor 2018) and (Chen et al. 2023), who observed that the bursting control strategy of Drosophila gap genes relies on the modulation of burst frequency and duration. Additionally, we cited the findings of (Syed, Duan, and Lim 2023), who reported a decrease in bursting amplitude and duration upon disruption of Dorsal binding sites on the snail minimal distal enhancer. Both examples describe bursting control strategies that differ from the modulation of burst frequency and amplitude observed for even-skipped.

      In addition to the lack of novelty (some results concerning the fact that koff does not change along the A/P axis/the idea of a 'unified regime' were already obtained in Berrocal et al 2020),...

      Unfortunately, we believe there is a misunderstanding in terms of what we construe as novelty in our work. In our previous work (Berrocal et al., 2020), we observed that the seven stripes of even-skipped (eve) expression modulate transcriptional bursting through the same strategy—bursting frequency and amplitude are controlled to yield various levels of mRNA synthesis, while burst duration remains constant. We reproduce that result in our paper, and do not claim any novelty. However, what was unclear is whether the observed eve bursting control strategy would only exist in the wild-type stripes, whose expression—we reasoned—is under strong selection due to the dramatic phenotypic consequences of eve transcription, or if eve transcriptional bursting would follow the same strategy under trans-regulatory environments that are not under selection to deliver specific spatiotemporal dynamics of eve expression. Our results—and here lies the novelty of our work—support the second scenario, and point to a model where eve bursting strategies do not result from adaptation of eve activity to specific trans-regulatory environments. Instead, we speculate that a molecular mechanism constrains eve bursting strategy whenever and wherever the gene is active. This is something that we could not have known from our first study in (Berrocal et al., 2020) and constitutes the main novelty of our paper. To put this in other words, the novelty of our work does not rest on the fact that both burst frequency and amplitude are modulated in the endogenous eve pattern, but that this modulation remains quantitatively indistinguishable when we focus on ectopic areas of expression. We will make this point clearer in the Introduction and Discussion section of our revised manuscript.

      Manuscript updates:

      We have clarified this point in both the Introduction and Discussion sections. In the updated Introduction, we state that while our previous work (Berrocal et al. 2020) examined bursting strategies in endogenous expression regions that are, in principle, subject to selection, the present study induced the formation of ectopic expression patterns to probe bursting strategies in regions presumably devoid of evolutionary pressures. In the Discussion section, we highlight that the novelty of our work lies in the insights derived from the comparative analysis between ectopic and endogenous regions of even-skipped expression, an aspect not addressed in our previous work.

      … note i) the limited manipulation of TF environment;...

      We acknowledge that additional genetic manipulations would make it possible to further test the model. However, we hope that the reviewer will agree with us that the manipulations that we did perform are sufficient to provide evidence for common bursting strategies under the diverse trans-regulatory environments present in wild-type and ectopic regions of gene expression. In the Discussion section of our revised manuscript, we will elaborate further on the kind of genetic manipulations (e.g., probing transcriptional strategies that result from swapping promoters in the context of eve-MS2 BAC; or quantifying the impact on eve transcriptional control after performing optogenetic perturbations of transcription factors and/or chromatin remodelers) that could shed further light on the currently undefined molecular mechanism that constrains eve bursting strategies, as a means to motivate future work.

      Manuscript updates:

      In our Discussion section, we elaborated on proposed manipulations of the transcription factor environment to elucidate the molecular mechanisms behind even-skipped bursting control strategies. We began by listing studies linking transcription factor concentration to bursting control strategies, such as (Hoppe et al. 2020), who observed that the natural BMP (Bone Morphogenetic Protein) gradient shapes the bursting frequency of target genes in Drosophila embryos, and (Zhao et al. 2023), who used the LEXY optogenetic system to modulate Knirps nuclear concentration and observed that this repressor acts on the eve stripe 4+6 enhancer by gradually decreasing bursting frequency until the locus adopts a reversible quiescent state. Then, we proposed performing systematic LEXY-mediated modulation of critical transcription factors (Bicoid, Hunchback, Giant, Kruppel, Zelda) to understand the extent of their contribution to the unified even-skipped bursting strategies.

      To better frame the hypothesis that the even-skipped promoter defines strategies of bursting control, we added a reference to the work of (Tunnacliffe, Corrigan, and Chubb 2018). This study surveyed 17 actin genes with identical sequences but distinct promoters in the amoeba Dictyostelium discoideum, and found that all genes display different bursting strategies. Their findings, together with the previously cited work by (Pimmett et al. 2021) and (Yokoshi et al. 2022), suggest a critical role of gene promoters in constraining the bursting strategies of eukaryotic genes.

      … ii) the simplicity with which bursting is analyzed (only a two-state model is considered, and not cross-validated with an alternative approach than cpHMM) and…

      Based on our previous work (Lammers et al., 2020), and as described in the SI Section of the current manuscript: Inference of Bursting Parameters, we selected a three-state model (OFF, ON1, ON2) under the following rationale: transcription of even-skipped in pre-gastrulating embryos occurs after DNA replication, and promoters on both sister chromatids remain paired. Most of the time these paired loci cannot be resolved independently using conventional microscopy. As a result, when we image an MS2 spot, we are actually measuring the transcriptional dynamics of two promoters. Thus, each MS2-fluorescent spot may result from none (OFF), one (ON1) or two (ON2) sister promoters being in the active state. Following our previous work, we analyzed our data assuming the three-state model (OFF, ON1, ON2), and then, for ease of presentation, aggregated ON1 and ON2 into an effective single ON state. As for the lack of an alternative model, we chose the simplest model compatible with our data and our current understanding of transcription at the eve locus. With this in mind, we do not rule out the possibility that more complex processes—that are not captured by our model—shape MS2 fluorescence signals. For example, promoters may display more than two states of activity. However, as shown in (Lammers et al., 2020 - SI Section: G. cpHMM inference sensitivities), model selection schemes and cross-validation do not give consistent results on which model is more favorable; and for the time being, there is not a readily available alternative to HMM for inference of promoter states from MS2 signal. For example, orthogonal approaches to quantify transcriptional bursting, such as smFISH, are largely blind to temporal dynamics. As a result, we choose to entertain the simplest two-state model for each sister promoter. 
We appreciate these observations, as they point out the need to devote a section in the supplemental material of our revised manuscript to clarifying the motivations behind model selection.

      Manuscript updates:

      We have devoted the new Supplemental Material section “Selection of a three-state model of promoter activity and a compound Hidden Markov Model for inference of promoter states from MS2 fluorescent signal” to clarify the rationale behind our selection of a three-state promoter activity model. Since transcription in pre-gastrulating Drosophila embryos occurs after DNA replication, each MS2-active locus contains two unresolvable sister promoters that can either be inactive (OFF), one active (ON1), or both active (ON2).

      Next, we elaborated on the conversion of a three-state model into an effective two-state model for ease of presentation and described how we calculated the effective two-state model parameters: k_on (burst frequency), k_off^-1 (burst duration), and r (burst amplitude).
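
      To make the aggregation of ON1 and ON2 concrete, the following is a minimal sketch of the steady-state occupancy of a paired locus. It assumes two identical, independent sister promoters, each switching between OFF and ON with rates k_on and k_off. That independence assumption, the function names, and the example rates are illustrative only and are not taken from the manuscript, whose parameters come from cpHMM inference.

```python
# Sketch only: assumes two identical, independent sister promoters, each a
# two-state (OFF/ON) process with rates k_on and k_off. All names and rate
# values here are hypothetical and are not the manuscript's inference code.

def three_state_occupancy(k_on, k_off):
    """Return stationary probabilities of (OFF, ON1, ON2) for the paired locus."""
    # Detailed balance along the chain OFF <-> ON1 <-> ON2:
    #   OFF -> ON1 at 2*k_on (either promoter may switch on), ON1 -> OFF at k_off
    #   ON1 -> ON2 at k_on,   ON2 -> ON1 at 2*k_off (either promoter may switch off)
    ratio = k_on / k_off
    weights = (1.0, 2.0 * ratio, ratio ** 2)
    total = sum(weights)
    return tuple(w / total for w in weights)


def effective_on_fraction(k_on, k_off):
    """Return the probability of the aggregated ON state (ON1 or ON2)."""
    _, p_on1, p_on2 = three_state_occupancy(k_on, k_off)
    return p_on1 + p_on2


if __name__ == "__main__":
    print(three_state_occupancy(k_on=1.0, k_off=1.0))   # (0.25, 0.5, 0.25)
    print(effective_on_fraction(k_on=1.0, k_off=1.0))   # 0.75
```

      With equal rates, each promoter is active half of the time, so the locus is OFF a quarter of the time and in the aggregated ON state for the remaining three quarters.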

      Additionally, we acknowledged that while the three-state model of promoter activity is the simplest model compatible with our current understanding of transcription in the even-skipped locus, we do not rule out the possibility that even-skipped transcription may be described by more complex models that include multiple states beyond ON and OFF. Finally, we referenced (Lammers et al. 2020), who asserted that while all inferences of promoter states computed from confocal microscopy of MS2/PP7 fluorescence data rely on Hidden Markov models, cross-comparisons between one-, two-, or multiple-state Hidden Markov models do not yield consistent results regarding which is more accurate. We close the new section by proposing that state-of-the-art microscopy and deconvolution algorithms that improve the signal-to-noise ratio may offer alternatives for the inference of promoter states.

      … iii) the lack of comparisons with published work.

      We thank the reviewer for pointing this out. In the current discussion of our manuscript, we compare our findings to recent articles that have addressed the question of the origin of bursting control strategies in Drosophila embryos (Pimmett et al., 2021; Yokoshi et al., 2022; Zoller et al., 2018). Nevertheless, we acknowledge that we failed to include references that are relevant to our study. Thus, our revised Discussion section must include recent results by (Syed et al., 2023), which showed that the disruption of Dorsal binding sites on the snail minimal distal enhancer results in decreased amplitude and duration of transcription bursts in fruit fly embryos. Additionally, we have to incorporate the study by (Hoppe et al., 2020), which reported that the Drosophila bone morphogenetic protein (BMP) gradient modulates the bursting frequency of BMP target genes. References to thorough studies of bursting control in other organisms, like Dictyostelium discoideum (Tunnacliffe et al., 2018), are due as well.

      Manuscript updates:

      As mentioned in the updates above, our revised manuscript now includes long-overdue references to studies by (Syed, Duan, and Lim 2023), (Hoppe et al. 2020), (Tunnacliffe, Corrigan, and Chubb 2018), and (Chen et al. 2023), all of which are relevant to our current work.

      Reviewer #2 (Public Review):

      The manuscript by Berrocal et al. asks if shared bursting kinetics, as observed for various developmental genes in animals, hint towards a shared molecular mechanism or result from natural selection favoring such a strategy. Transcription happens in bursts. While transcriptional output can be modulated by altering various properties of bursting, certain strategies are observed more widely. As the authors noted, recent experimental studies have found that even-skipped enhancers control transcriptional output by changing burst frequency and amplitude while burst duration remains largely constant. The authors compared the kinetics of transcriptional bursting between endogenous and ectopic gene expression patterns. It is argued that since enhancers act under different regulatory inputs in ectopically expressed genes, adaptation would lead to diverse bursting strategies as compared to endogenous gene expression patterns. To achieve this goal, the authors generated ectopic even-skipped transcription patterns in fruit fly embryos. The key finding is that bursting strategies are similar in endogenous and ectopic even-skipped expression. According to the authors, the findings favor the presence of a unified molecular mechanism shaping even-skipped bursting strategies. This is an important piece of work. Everything has been carried out in a systematic fashion. However, the key argument of the paper is not entirely convincing.

      We thank the reviewer, as these comments will enable us to improve the Discussion section and overall logic of our revised manuscript. We agree that the evidence provided in this work, while systematic and carefully analyzed, cannot conclusively rule out either of the two proposed models, but only provides evidence supporting the hypothesis that a specific molecular mechanism constrains eve bursting strategies. Our experimental evidence points to valuable insights about the mechanism of eve bursting control. For instance, had we observed quantitative differences in bursting strategies between ectopic and endogenous eve domains, we would have rejected the hypothesis that a common molecular mechanism constrains eve transcriptional bursting to the observed bursting control strategy of frequency and amplitude modulation. Thus, we consider that our proposition of a common molecular mechanism underlying unified eve bursting strategies despite changing trans-regulatory environments is better supported. On the other hand, while our model suggests that this undefined bursting control strategy is not subject to selection acting on specific trans-regulatory environments, it is not trivial to completely discard selection for specific bursting control strategies given our current lack of understanding of the molecular mechanisms that shape these strategies. Indeed, we cannot rule out the hypothesis that the observed strategies were selected as optimal for the expression of endogenous eve stripes, and that these control strategies persist in ectopic regions as an evolutionarily neutral “passenger phenotype” that does not impact fitness. We recognize the need to acknowledge this last hypothesis in the updated Introduction and Discussion sections of our manuscript. Further studies will be needed to determine the mechanistic and molecular basis of eve bursting strategies.

      Manuscript updates:

      In this work, we compared strategies of bursting control between endogenous and ectopic regions of even-skipped expression. Different strategies between both regions would suggest that selective pressure maintains defined bursting strategies in endogenous regions. Conversely, similar strategies in both ectopic and endogenous regions would imply that a shared molecular mechanism constrains bursting parameters despite changing trans-regulatory environments.

      In our updated Discussion section, we acknowledge that while our work provides evidence supporting the second hypothesis, we cannot conclusively rule out the possibility that the observed strategies were selected as optimal for endogenous even-skipped expression regions and that ectopic regions retain these bursting strategies as an evolutionarily neutral “passenger phenotype”.

      Reviewer #3 (Public Review):

      In this manuscript by Berrocal and coworkers, the authors do a deep dive into the transcriptional regulation of the eve gene in both an endogenous and ectopic background. The idea is that by looking at eve expression under non-native conditions, one might infer how enhancers control transcriptional bursting. The main conclusion is that eve enhancers have not evolved to have specific behaviors in the eve stripes, but rather the same rates in the telegraph model are utilized as control rates even under ectopic or 'de novo' conditions. For example, they achieve ectopic expression (outside of the canonical eve stripes) through a BAC construct where the binding sites for the TF Giant are disrupted along with one of the eve enhancers. Perhaps the most general conclusion is that burst duration is largely constant throughout at ~ 1 - 2 min. This conclusion is consistent with work in human cell lines that enhancers mostly control frequency and that burst duration is largely conserved across genes, pointing to an underlying mechanistic basis that has yet to be determined.

      We thank the reviewer for the assessment of our work. Indeed, evidence from different groups (Berrocal et al., 2020; Fukaya et al., 2016; Hoppe et al., 2020; Pimmett et al., 2021; Senecal et al., 2014; Syed et al., 2023; Tunnacliffe et al., 2018; Yokoshi et al., 2022; Zoller et al., 2018) is coming together to uncover commonalities, discrepancies, and rules that constrain transcriptional bursting in Drosophila and other organisms.

      Additional updates to the manuscript

      (1) In our current study, we observed the appearance of a mutant stripe of even-skipped expression beyond the anterior edge of eve stripe 1, which we refer to as eve stripe 0. This stripe appeared in embryos with a disrupted eve stripe 1 enhancer. In a previous study, (Small, Blair, and Levine 1992) reported a “head patch” of even-skipped expression while assaying the regulation of reporter constructs carrying the minimal regulatory element of eve stripe 2 enhancer alone. In our updated manuscript, we state that it is tempting to identify our eve stripe 0 with the previously reported head patch. (Small, Blair, and Levine 1992) speculated that this head patch of even-skipped expression appeared as a result of regulatory sequences present in the P-transposon system they used for genomic insertions. However, P-transposon sequences are not present in our experimental design. Thus, the appearance of eve stripe 0 indicates a repressive role of the eve stripe 1 enhancer at the anterior end of the embryo and may imply that the minimal regulatory element of the eve stripe 2 enhancer, as probed by (Small, Blair, and Levine 1992), can drive the expression of the head patch/eve stripe 0 when the eve stripe 1 enhancer is not present.

      (2)  In our current analysis, we observed that the disruption of Gt-binding sites on the eve stripe 2 enhancer synergizes with the deletion of the eve stripe 1 enhancer, as double mutant embryos display more ectopic expression in their anterior regions than embryos with only disrupted Gt-binding sites. While this may indicate that the repressive activity of eve stripe 1 enhancer synergizes with the repression exerted by Giant, other unidentified transcription factors may be involved in this repressive synergy. In the updated manuscript we clarified that unidentified transcription factors may bind in the vicinity of Gt-binding sites. The hypothesis that Gt-binding sites recognize other transcription factors was proposed by (Small, Blair, and Levine 1992), as they observed that the anterior expansion of eve stripe 2 resulting from Gt-binding site deletions was “somewhat more severe” than expansion observed in embryos carrying null-Giant alleles.

    1. Author response:

      We will address all the textual suggestions, including rectifying any typos and incorporating the most recent literature.

      We will conduct longitudinal studies to determine whether the phenotype worsens or improves over time in liver-specific SMN-depleted mice. In this regard, we will present data from P60 animals, such as histological analyses for the characterization of the liver and pancreas.

    1. Author response:

      We thank the reviewers for their productive comments on our work. While we have chosen to not revise the manuscript further, we reply to the public reviewer comments here so as to provide clarification on certain points.

      Reviewer #1 (Public Review):

      Summary:

      The aim of the study described in this paper was to test whether visual stimuli that pulse synchronously with the systole phase of the cardiac cycle are suppressed compared with stimuli that pulse in the diastole phase. To this end, the authors employed a binocular rivalry task and used the duration of the perceived image as the metric of interest. The authors predicted that if there was global suppression of the visual stimulus during systole then the durations of the stimulus that were pulsing synchronously with systole should be of shorter duration than those pulsing in diastole. However, the results observed were the opposite of those predicted. The authors speculate on what this facilitation effect might mean for the baroreceptor suppression hypothesis.

      Strengths:

      This is an interesting and timely study that uses a clever paradigm to test the baroreceptor suppression hypothesis in vision. This is a refreshingly focussed paper with interesting and seemingly counterintuitive results.

      Weaknesses:

      The paper could benefit from a clearer explanation of the predicted results. For those not experts in binocular rivalry, it would be useful to explain the predicted results. Does pulsing stimuli in this way change durations in such a task? If there is global suppression of visual stimuli why would this lead to shorter/longer durations in the systole compared to the diastole conditions? In addition, the duration lengths in both conditions seem to be longer than one cardiac cycle. If the cardiac cycle modulates duration it would be interesting to discuss why this occurs on some cycles but not on others. If there is a facilitation effect why does it only occur on some cycles?

      In general, pulsing stimuli (i.e. moving gratings) show longer dominance durations when in competition with non-pulsing stimuli; in other words, pulses increase the “stimulus strength” of a visual grating (Wade, De Weert & Swanston, 1984). The Baroreceptor Hypothesis predicts global suppression of visual cortex during systole (and not during diastole), so the stimulus strength boost yielded by a pulse should be attenuated during systole. Thus, the stimulus that only pulses during systole would have lower stimulus strength (and thus shorter dominance durations) than that which pulses during diastole; however, we observe the opposite pattern in our data, seemingly contradicting the Baroreceptor Hypothesis.

      In typical binocular rivalry paradigms, dominance durations are biased by stimulus strength, but perception remains bistable such that the stronger stimulus is not necessarily dominant at a given time. We see no reason, then, why switching would have to occur every cycle. The dominance durations we see are quite typical of binocular rivalry paradigms, whereas durations shorter than a cardiac cycle would be rather unusual (Carmel et al., 2010).
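
      The two points above (stimulus strength biases dominance durations without making the stronger stimulus always dominant, and typical durations exceed one cardiac cycle) can be illustrated with a small simulation. This is only a sketch: it draws dominance durations from gamma distributions, which are commonly used to describe rivalry durations, and the shape, mean durations, and 1 s nominal cardiac cycle are assumed values, not estimates from our data.

```python
import random

# Illustrative sketch only: the gamma shape, the mean durations, and the 1 s
# cardiac cycle are assumed values, not parameters estimated in the study.


def simulate_dominance(mean_seconds, shape=3.5, n=5000, seed=0):
    """Draw n dominance durations (seconds) from a gamma distribution."""
    rng = random.Random(seed)
    scale = mean_seconds / shape  # gamma mean equals shape * scale
    return [rng.gammavariate(shape, scale) for _ in range(n)]


# Hypothetical "stronger" and "weaker" stimuli with different mean durations.
strong = simulate_dominance(mean_seconds=3.0, seed=1)
weak = simulate_dominance(mean_seconds=2.5, seed=2)

mean_strong = sum(strong) / len(strong)
mean_weak = sum(weak) / len(weak)

# Fraction of paired draws in which the weaker stimulus nonetheless
# dominated for longer than the stronger one.
weak_longer = sum(w > s for s, w in zip(strong, weak)) / len(strong)

# Fraction of all dominance periods longer than the nominal cardiac cycle.
all_durations = strong + weak
longer_than_cycle = sum(d > 1.0 for d in all_durations) / len(all_durations)

if __name__ == "__main__":
    print(f"mean dominance: strong {mean_strong:.2f} s, weak {mean_weak:.2f} s")
    print(f"weaker stimulus lasted longer in {weak_longer:.0%} of pairs")
    print(f"{longer_than_cycle:.0%} of periods exceeded one cardiac cycle")
```

      Under these assumptions the stronger stimulus dominates longer on average, yet the weaker one still outlasts it in a sizeable share of draws, and most dominance periods span more than one heartbeat.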

      Reviewer #2 (Public Review):

      Summary:

      This is a binocular rivalry study that uses electrocardiogram events to modulate visual stimuli in real-time, relative to participants' heartbeats. The main finding is that modulations during the period around when the heart has contracted (systole) increase rivalry dominance durations. This is a really neat result, that demonstrates the link between interoception and vision. I thought the Bayesian mixture modelling was a really smart way to identify cardiac non-perceivers, and the finding that the main result is preserved in this group is compelling. Overall, the study has been conducted to a high standard, is appropriately powered, and reported clearly. I have one suggestion about interpretation, which concerns the explanation of increased dominance durations with reference to contemporary models of binocular rivalry, and a few minor queries. However, I think this paper is a worthwhile addition to the literature.

      The point Reviewer 2 makes with respect to contemporary models of binocular rivalry is important – perhaps more so than its brief statement in this public review suggests. As we already expand upon in our Discussion, the effects of global (neural) inhibition depend on the preexisting role that inhibition plays in a given neural circuit. The original framing of the Baroreceptor Hypothesis describes baroreceptor activity as uniformly impeding sensory processing (Lacey, 1967; Lacey & Lacey, 1978, American Psychologist), which is contradicted by our present results. This account is often interpreted as implying that the effects of baroreceptor activation are inhibitory in terms of neural mechanism (e.g. Rau et al., 1993, Psychophysiology; Edwards et al., 2009, Psychophysiology). Some researchers argue this serves a parallel function to the inhibitory projections from motor to sensory areas during volitional movement, “cancelling” the sensory effects of heartbeats (Van Elk, et al., 2014, Biological Psychology).

      However, baroreceptor activity has also been described as introducing noise into sensory processing rather than inhibiting it directly (e.g. Allen et al., 2022, PLoS Computational Biology). Lacey and Lacey’s own account actually seemed to point toward attention as a mediating mechanism (Hahn, 1973, Psychological Bulletin), with the disproportionate focus on cortical inhibition emerging in the literature over time. All this is to say that, while our results seem to falsify the behavioral predictions of the original Baroreceptor Hypothesis, subsequent versions of that hypothesis that describe an inhibitory neural mechanism, rather than an inhibition of perception per se, could potentially still be compatible with our results. This is a topic we plan to explore in future work.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript addresses a question inspired by the Baroceptor Hypothesis and its links to visual awareness and interoception. Specifically, the reported study aimed to determine if the effects of cardiac contraction (systole) on binocular rivalry (BR) are facilitatory or suppressive. The main experiment - relying on a technically challenging procedure of presenting stimuli synchronised with the heartbeats of participants - has been conducted with great care, and numerous manipulation checks the authors report convincingly show that the methods they used work as intended. Moreover, the control experiment allows for excluding alternative explanations related to participants being aware of their heartbeats. Therefore, the study convincingly shows the effect of cardiac activity on BR - and this is an important finding. The results, however, do not allow for unambiguously determining if this effect is facilitatory or suppressive (see details below), which renders the study not as informative as it could be.

      While the authors strongly focus on interoception and awareness, this study will be of interest to researchers studying BR as such. Moreover, the code and the data the authors share can facilitate the adoption of their methods in other labs.

      Strengths:

      (1) The study required a complex technical setup and the manuscript both describes it well and demonstrates that it was free from potential technical issues (e.g. in section 3.3. Manipulation check).

      (2) The sophisticated statistical methods the authors used, at least for a non-statistician like me, appear to be well-suited for their purpose. For example, they take into account the characteristics of BR (gamma distributions of dominance durations). Moreover, the authors demonstrate that at least in one case their approach is more conservative than a more basic one (Binomial test) would be.

      (3) Finally, the control experiment, and the analysis it enabled, allow for excluding a multitude of alternative explanations of the main results.

      (4) The authors share all their data and materials, even the code for the experiment.

      (5) The manuscript is well-written. In particular, it introduces the problem and methods in a way that should be easy to understand for readers coming from different research fields.

      Weaknesses:

      (1) The interpretation of the main result in the context of the Baroceptor hypothesis is not clear. The manuscript states: The Baroreceptor Hypothesis would predict that the stimulus entrained to systole would spend more time suppressed and, conversely, less time dominant, as cortical activity would be suppressed each time that stimulus pulses. The manuscript does not specify why this should be the case, and the term 'entrained' is not too helpful here (does it refer to neural entrainment? or to 'being in phase with'?). The answer to this question is provided by the manuscript only implicitly, and, to explain my concern, I try to spell it out here in a slightly simplified form.

      During systole (cardiac contraction), the visual system is less sensitive to external information, so it 'ignores' periods when the systole-synchronised stimulus is at the peak of its pulse. Conversely, the system is more sensitive during diastole, so the stimulus that is at the peak of its pulse then should dominate for longer, because its peaks are synchronised with the periods of the highest sensitivity of the visual system when the information used to resolve the rivalry is sampled from the environment. This idea, while indeed being a clever test of the hypothesis in question, rests on one critical assumption: that the peak of the stimulus pulse (as defined in the manuscript) is the time when the stimulus is the strongest for the visual system. The notion of 'stimulus strength' is widely used in the BR literature (see Brascamp et al., 2015 for a review). It refers to the stimulus property that, simply speaking, determines its tendency to dominate in the BR. The strength of a stimulus is underpinned by its low-level visual properties, such as contrast and spatial frequency content. Coming back to the manuscript, the pulsing of the stimuli affected at least spatial frequency (and likely other low-level properties), and it is unknown if it was in phase with the pulsing of the stimulus strength, or not. If my understanding of the premise of the study is correct, the conclusions drawn by the authors stand only if it was.

      In other words, most likely the strength of one of the stimuli was pulsating in sync with the systole, but it is not clear which stimulus it was. It is possible that, for the visual system, the stimulus meant to pulse in sync with the systole was pulsing strength-wise in phase with the diastole (and the one intended to pulse in sync with the diastole strength-wise pulsed with the systole). If this is the case, the predictions of the Baroceptor Hypothesis hold, which would change the conclusion of the manuscript.

      We agree with Reviewer 3’s argumentation here. If the pulses decreased, rather than increased, effective stimulus strength, then the present results would indeed be consistent with the Baroreceptor Hypothesis. However, Wade et al. (1984) demonstrated that grating stimuli which pulse in the same manner (i.e. by dynamically varying the spatial frequency of the grating) as in our experiment indeed show increased stimulus strength relative to static stimuli, even if the dynamic stimuli have lower spatial frequency on average (https://doi.org/10.3758/BF03203891).

      We admit our results would be stronger had we included a replication of Wade et al. (1984) in our study, but in light of this previous work, our interpretation is indeed supported.

      (2) Using anaglyph goggles necessitates presenting stimuli of a different colour to each eye. The way in which different colours are presented can impact stimulus strength (e.g. consider that different anaglyph foils can attenuate the light they let through to different degrees). To deal with such effects, at least some studies on BR employed procedures of adjusting the colours for each participant individually (see Papathomas et al., 2004; Patel et al., 2015 and works cited there). While I think that counterbalancing applied in the study excludes the possibility that colour-related effects influenced the results, the effects of interest still could be stronger for one of the coloured foils.

      It is the case that, when we split the data up by eye (and thus by color), we only see statistically significant results for one eye – though the nominal direction of the effect is consistent across both eyes. So it is indeed possible that the effect could be stronger for one of the colored foils, but the present experiment was not designed to be powered to test that cardiac phase-by-color interaction.

      We concur with the Reviewer, however, that our use of counterbalancing excludes color-related effects as an explanation for our main findings.

      (3) Several aspects of the methods (e.g. the stimuli), are not described at the level of detail some readers might be accustomed to. The most important issue here is the task the participants performed. The manuscript says that they pressed a button whenever they experienced a switch in perception, but it is only implied that there were different buttons for each stimulus.

      There were indeed different buttons for each stimulus (i.e. a button to indicate their perception had switched to the red stimulus and another to indicate it had switched to blue). Our full, unmodified experiment code has been made available and is permanently archived (https://doi.org/10.5281/zenodo.10367327), so the full procedure is well documented and can be replicated exactly.

      Brascamp, J. W., Klink, P. C., & Levelt, W. J. M. (2015). The 'laws' of binocular rivalry: 50 years of Levelt's propositions. Vision Research, 109, 20-37. https://doi.org/10.1016/j.visres.2015.02.019

      Papathomas, T. V., Kovács, I., & Conway, T. (2004). Interocular grouping in binocular rivalry: Basic attributes and combinations. In D. Alais & R. Blake (Eds.), Binocular Rivalry (pp. 155-168). MIT Press

      Patel, V., Stuit, S., & Blake, R. (2015). Individual differences in the temporal dynamics of binocular rivalry and stimulus rivalry. Psychonomic Bulletin and Review, 22(2), 476-482. https://doi.org/10.3758/s13423-014-0695-1

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zheng and colleagues assessed the real-world efficacy of SARS-CoV-2 vaccination against re-infection following the large omicron wave in Shanghai in April 2022. The study was performed among previously vaccinated individuals. The study successfully documents a small but real added protective benefit of re-vaccination, though this diminishes in previously boosted individuals. Unsurprisingly, vaccine preventative efficacy was higher if the vaccine was given in the month before the 2nd large wave in Shanghai. The re-infection rate of 24% suggests that long-term anti-COVID immunity is very difficult to achieve. The conclusions are largely supported by the analyses. These results may be useful for planning the timing of subsequent vaccine rollouts.

      Strengths:

      The strengths of the study are a very large and unique cohort based on synchronously timed single infection among individuals with well-documented vaccine histories. Statistical analyses seem appropriate. As with any cohort study, there are potential confounders and the possibility of misclassification and the authors outline limitations nicely in the discussion.

      Weaknesses:

      (1) Partially and fully vaccinated are never defined and it is difficult to understand how this differs from single, and double, booster vaccines. The figures including all of these groups are a bit confusing for this reason.

      We agree with the reviewer that the distinction between these groups could have been made clearer. To address this comment, we modified the legend of the figure that presents hazard ratios based on these two categorisations (here, and throughout this document, changes in the text are underlined):

      “Figure 3. Effect of post-infection vaccination on SARS-CoV-2 reinfection stratified by pre-infection vaccination. Error bars (95% CIs) and circles represent aHR for SARS-CoV-2 reinfection estimated using Cox proportional hazards models. V-I-V, 1V-I-V, 2V-I-V, 3V-I-V correspond to any pre-infection vaccination, 1, 2 and 3 vaccine doses before infection, then vaccination, respectively; they were compared to V-I, 1V-I, 2V-I, 3V-I, respectively. Partial V-I-V, Full V-I-V and Booster V-I-V represent partial vaccination, full vaccination and booster vaccination before infection, followed by post-infection vaccination, respectively. The number of doses received by individuals with partial versus full (and full with booster) vaccination depends on the type of SARS-CoV-2 vaccine received; in Table S3 we present a cross-classification of participants in the analytic population by these vaccination-related categorical variables.”

      Further, to facilitate visualisation of Figure 3, and emphasize that estimates are presented based on two different ways of categorising vaccination history, we have now included a horizontal line between estimates based on each category.

      Table S3 has been included in the Supplementary Appendix.

      (2) Figure 3 is a bit challenging to interpret because it is a bit atypical to compare each group to a different baseline (ie 2V-I-V vs 2V-I). I would label the y-axis 2V-I-V vs 2V-I (change all of the labels) to make this easier to understand.

      We agree that having the y-axis tick labels describing both groups being compared, rather than only describing the post-infection vaccination group, will help readers to understand this figure. In our response to the previous comment, we presented an updated version of this figure, where this change was also incorporated (see above).

      (3) A 15% reduction in infection is quite low. It would be helpful to discuss if any quantitative or qualitative signals suggest at least a reduction in severe outcomes such as death, hospitalization, ER visits, or long COVID. I am not sure that a 15% reduction in cases supports extra vaccination without some other evidence of added benefit.

      Unfortunately, data on the clinical severity of diagnosed SARS-CoV-2 infections were not available. Some previous studies on COVID-19 vaccines observed that effectiveness against severe outcomes was similar to or higher than that for outcomes that do not imply severe disease (e.g. infection). For example, in a study in Israel comparing four versus three vaccine doses, Magen and colleagues observed that the effectiveness of a fourth dose, relative to three doses, was 52% against infection, 61% against symptomatic COVID-19, and 76% against COVID-19 related death (Magen et al. Fourth Dose of BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Setting. NEJM 2022; see also, for example, Nasreen et al. Effectiveness of COVID-19 vaccines against symptomatic SARS-CoV-2 infection and severe outcomes with variants of concern in Ontario. Nature Microbiology 2022, or Sacco et al. Effectiveness of BNT162b2 vaccine against SARS-CoV-2 infection and severe COVID-19 in children aged 5–11 years in Italy: a retrospective analysis of January–April, 2022. Lancet 2022). However, this pattern of increasing effectiveness with increasing outcome severity was not consistently reported in all studies or settings. We agree that public health officials who will use our results to guide future vaccination policy in China and abroad need to interpret the results in the context of these other outcomes that were not assessed and of those previous studies that, although performed in different epidemiological settings, suggest that our analysis does not capture all benefits of post-infection vaccine doses.

      We have now included the following statements in the Discussion section:

      “Finally, data on the severity of infections during the second wave were not available, which prevented analyses of clinical outcomes other than infections (e.g. COVID-19-related hospitalization or death). Although some previous studies (Magen et al. Fourth Dose of BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Setting. NEJM 2022; Nasreen et al. Effectiveness of COVID-19 vaccines against symptomatic SARS-CoV-2 infection and severe outcomes with variants of concern in Ontario. Nature Microbiology 2022) estimated similar or higher vaccine effectiveness against severe outcomes compared to outcomes that presumably include both milder and severe presentations, this pattern was not observed in all studies. Epidemiologists and public health officials who will use our results to define vaccination policy should thus take into account the fact that our analysis does not capture all benefits of post-infection vaccinations.”

      (4) Why exclude the 74,962 unvaccinated individuals from the analysis? It would be interesting to see whether getting vaccinated post-infection provides benefits to this group.

      We focused on individuals who had been vaccinated before their first infection for two reasons: (i) in most settings, including those with SARS-CoV-2 epidemiologic history similar to that of Shanghai, a high percentage of the population has received vaccine doses; and (ii) in settings with high vaccination coverage, the group of individuals who remain unvaccinated despite widespread availability of vaccines likely differs from those who have been vaccinated – for example, with regard to behavioural factors and comorbidity profile. That said, we agree that reporting analyses for the group of individuals who had not been vaccinated before first infection might be informative. We have thus included in the Supplementary Appendix a short section that reports results for this group of patients; Table S4 also presents these estimates.

      “Effect of post-infection vaccination in individuals with no history of vaccination before infection

      In this supplementary section, we present findings for individuals who were unvaccinated before infection during the first Omicron variant wave in Shanghai. For this group of individuals, post-infection vaccination did not confer significant protection against reinfection (adjusted hazard ratio [aHR] 1.06, 95% CI 0.97, 1.16). The analysis indicates that the effect of post-infection vaccine doses was not significant in either female (aHR 0.97 [0.84, 1.11]) or male individuals (aHR 1.12 [0.99, 1.26]), nor in participants aged 60 years or older (aHR 0.92 [0.82, 1.04]) or younger adults (20-60 years) (aHR 1.12 [0.92, 1.37]). These results suggest that, in the context of the two Omicron variant waves in Shanghai, a first vaccine dose administered after infection did not provide a clear benefit in terms of reducing risk of subsequent infections for those not previously vaccinated.”

      We refer to this new analysis in the Results section:

      “For individuals who had received at least one vaccine dose before infection during the first Omicron variant wave, post-infection vaccination was protective against reinfection (adjusted hazard ratio [aHR] 0.82, 95% CI 0.79, 0.85). As shown in Figure 3, this protective effect was observed in subgroups defined by the number of pre-infection vaccine doses: aHR of 0.84 (95% CI, 0.76, 0.93) and 0.87 (95% CI, 0.83, 0.90) for one and two pre-infection doses respectively; and for patients with three vaccine doses prior to infection, the association was not statistically significant (aHR: 0.96 [0.74, 1.23]). When analyses are stratified by partial and full vaccination status before the first infection, an additional vaccine dose was protective (aHR 0.76 [0.68, 0.84], and 0.93 [0.89, 0.97], respectively); and among individuals who had received booster vaccination before the spread of the first Omicron variant wave in Shanghai, the hazard ratio estimate was consistent with a more limited effect (aHR: 0.95 [0.75, 1.22]). For comparison, results for individuals who had not been vaccinated before their first infection are shown in the Supplementary Appendix (supplementary section “Effect of post-infection vaccination in individuals with no history of vaccination before infection” and Table S4)”
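      For readers less familiar with hazard ratios, the following Python sketch (an illustration, not part of the authors' analysis) shows one common way to read the estimates quoted above. It expresses an adjusted hazard ratio as an approximate percent reduction in hazard, (1 - aHR) x 100, and treats an estimate as statistically significant when its 95% CI excludes 1. The function name `summarize_hr` and the dictionary labels are hypothetical; the numbers are the aHR values reported in the quoted text.

```python
def summarize_hr(hr, ci_low, ci_high):
    """Return (approximate % reduction in hazard, whether the 95% CI excludes 1)."""
    reduction_pct = round((1.0 - hr) * 100.0, 1)
    excludes_null = ci_high < 1.0 or ci_low > 1.0
    return reduction_pct, excludes_null


# aHR point estimates and 95% CIs as reported in the quoted passages.
estimates = {
    "vaccinated before first infection": (0.82, 0.79, 0.85),
    "unvaccinated before first infection": (1.06, 0.97, 1.16),
}

for label, (hr, low, high) in estimates.items():
    reduction, significant = summarize_hr(hr, low, high)
    print(f"{label}: {reduction}% change in hazard, CI excludes 1: {significant}")
```

      A negative value here simply means the point estimate lies above 1; with a CI that spans 1, it should not be read as evidence of harm.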

      (5) Pudong should be defined for those who do not live in China.

      We have now included a sentence defining Pudong in the Methods section:

      “This study included individuals diagnosed with their first SARS-CoV-2 infection between April 1 and May 31, 2022 in the Pudong District, which is a large and densely populated district of Shanghai spanning an area of 1,210 square kilometers with a permanent resident population of 5.57 million, served by more than 30 hospitals and 60 community health centers;… ”

      (6) The discussion about healthcare utilization bias is welcomed and well done. It would be great to speculate on whether this bias might favor the null or alternative hypothesis.

      We believe the reviewer is referring to the following statement:

      “Differences in healthcare-seeking behavior could also bias case ascertainment between post-infection vaccinated and unvaccinated individuals, although, as we restricted the study population to individuals who had received at least one pre-infection dose, this potential bias might be more limited than in other vaccine studies.”

      Bias linked to healthcare seeking behaviour could affect the association between vaccination and infection in two different ways: individuals who are more health conscious are more likely to get vaccinated and also to seek medical care when infected, and this would bias results toward null; however, if the same individuals are also more likely to avoid exposure to potentially infectious individuals, their behaviour could also bias results in the opposite direction – that is, it would appear to increase vaccine effectiveness. As mentioned in the Discussion section, we expected this bias to be limited. We have now modified the paragraph:

      “Differences in healthcare-seeking behavior could also bias case ascertainment between post-infection vaccinated and unvaccinated individuals. Although we restricted the study population to individuals who had received at least one pre-infection vaccination, which suggests a higher degree of homogeneity in healthcare-seeking behaviour compared to that in the total population, it is possible that this bias might have affected our estimates. For example: individuals who were more health conscious might have been more likely to receive post-infection vaccination and also more likely to seek medical care or testing when reinfected, and this would have biased results toward the null; it is, however, also conceivable that these individuals were more likely to avoid contact with potentially infectious persons, which could have biased results in the opposite direction.”

      Reviewer #2 (Public Review):

      Summary:

      This paper evaluates the effect of COVID-19 booster vaccination on reinfection in Shanghai, China among individuals who received primary COVID-19 vaccination followed by initial infection, during an Omicron wave.

      Strengths:

      A large database is collated from electronic vaccination and infection records. Nearly 200,000 individuals are included in the analysis and 24% became reinfected.

      Weaknesses:

      The article is difficult to follow in terms of the objectives and individuals included in various analyses. There appear to be important gaps in the analysis. The electronic data are limited in their ability to draw causal conclusions.

      More detailed comments:

      In multiple places (abstract, introduction), the authors frame the work in terms of understanding the benefit of booster vaccination among individuals with hybrid immunity (vaccination + infection). However, their analysis population does not completely align with this framing. As best as I can tell, only individuals who first received COVID-19 vaccination, and subsequently experienced infection, were included. Why the analysis does not also consider individuals who were infected and then vaccinated is not clear.

      The focus of our analysis is on the most frequent scenario in many countries: settings where a high proportion of the population has been vaccinated. As mentioned in our response to a comment from Reviewer #1, those individuals who remain unvaccinated after the first years of this pandemic are likely to be different, with respect to many factors, from individuals with history of SARS-CoV-2 vaccination. Further, differences between unvaccinated and vaccinated individuals are likely setting-specific, linked to local availability of and access to vaccination, cultural differences in healthcare seeking behaviour, and possible differences in the frequencies of medical conditions that might influence (promote or prevent) vaccine uptake. We prefer to keep the focus of this work on individuals who had been vaccinated before their first infection; however, we have now included in the Supplementary Appendix a section, presented in a response to Reviewer #1, that reports results for this group of individuals.

      In vaccine effectiveness analyses, why was time since initial infection not examined as a modifier of the booster effect? Time since the onset of the Omicron wave is only loosely tied to the immune status of the individual.

      We agree with the reviewer that assessing effect modification by the time since initial infection would be important. However, in Shanghai, most initial infections occurred during a narrow time window relative to the time window between the first and second Omicron variant waves. Indeed, as mentioned in the Results section, most first infections (243,906, 88.8%) occurred in April; for 306 (0.1%) individuals, information on the date of first infection was not available. Given this narrow time window and in order to limit the number of comparisons in our study, we preferred not to investigate this aspect of hybrid immunity. In settings where multiple SARS-CoV-2 waves occurred over a longer period of time, which would imply sufficient variation in the variable “time since initial infection”, we believe that it would be essential to account for this.

      The effect of booster vaccination on preventing symptomatic vs. asymptomatic reinfection does not appear to have been evaluated; this is a key gap in the analysis and it would seem the data would support it.

      Not having clinical presentation data is a limitation of our study, and one shared by many real-world vaccine effectiveness analyses based on large medical and administrative datasets. We have now explicitly mentioned this in the Discussion section:

      “Finally, data on the severity of infections during the second wave were not available, which prevented analyses of clinical outcomes other than infections (e.g. COVID-19-related hospitalization or death). Although some previous studies (Magen et al. Fourth Dose of BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Setting. NEJM 2022; Nasreen et al. Effectiveness of COVID-19 vaccines against symptomatic SARS-CoV-2 infection and severe outcomes with variants of concern in Ontario. Nature Microbiology 2022) estimated similar or higher vaccine effectiveness against severe outcomes compared to outcomes that presumably include both milder and severe presentations, this pattern was not observed in all studies. Epidemiologists and public health officials who will use our results to define vaccination policy should thus take into account the fact that our analysis does not capture all benefits of post-infection vaccinations.”

      In lines 105-108, the demographic description of the analysis population is incomplete. Is sex or gender identity being described? Are any individuals non-binary? What is the age distribution? (Only the proportions 20-39 and under 6 are stated.)

      We have now clarified in the manuscript that only information on sex at birth was provided by the Center for Disease Control and Prevention in Shanghai. We made the following change in the Methods section:

      “Information on infection history as well as data on demographic variables (sex at birth, and age) were provided by the Center for Disease Control and Prevention in Shanghai, China.”

      We have also modified the legend of Table 1:

      “Table 1. Characteristics of the study population and reinfection rate by post-infection vaccination status. Here, reinfection rate refers to the percentage of the relevant study subpopulation with evidence of reinfection between December 1, 2022 and January 3, 2023. Note that for the variables on region, occupation, and clinical severity, data are missing for large fractions of the study population. Note also that information was only available on sex at birth, but not on gender.”

      Regarding the reviewer’s comment on the age distribution, this information is presented for the following categories in Table 1: 0-6 years, 7-19 years, 20-39 years, 40-59 years, and 60+ years. However, we had not referred to Table 1 in section 3.1 of the manuscript. We have now corrected that:

      “To assess the effect of an additional vaccine dose given after infection, the analytic sample consisted of 199,312 individuals (Figure 1). 85,804 were women (43.1%); 836 (0.4%) had gender information missing. 38.1% of the study participants were aged 20 to 39 years and only 0.9% were aged 0 to 6 years (see Table 1 for additional information).”

      Figure 1 consort diagram is confusing. In the last row, are the two boxes independent or overlapping sets of individuals? Are all included in secondary analyses?

      We agree that additional information should have been provided in the legend. The boxes represent overlapping sets of individuals – that is, some individuals were included in both secondary analyses in the box on the left and in the box on the right. These analyses involved different ways of categorizing individuals. Below is the updated figure legend:

      “Figure 1. Flow chart describing the selection of participants for the analysis. The number of individuals in this figure is not the same as some of the numbers in Table 1 because of missing data in key variables. Note that in the bottom part of the chart, related to secondary analyses, the boxes represent overlapping sets of study participants; in other words, some individuals included in the secondary analyses that correspond to the left box were also included in analyses corresponding to the box on the right.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor comment: the terms "vaccination"/"vaccinated" are used both to refer to the primary vaccination (pre-initial infection) and to the booster vaccination (post-initial infection), and this causes confusion.

      Thank you. We have now revised the manuscript (Methods, Results and Discussion sections) to use the terms “post-infection vaccination” and “post-infection vaccinated” to reduce ambiguity. We also included the following statement in the Background section:

      “In December 2022, an important change in the COVID-19 policy in China, namely the end of most social distancing measures and of mass screening activities, was associated with a second surge in SARS-CoV-2 infections in Shanghai. The current circulation of the virus in the Shanghainese population and reports of vaccine fatigue mean that it is important to estimate the protective effect of vaccination against reinfection in this population. In this study, we aimed to quantify the effect of vaccine doses given after a first infection on the risk of subsequent infection. For that, we used data collected during the first Omicron variant wave, when hundreds of thousands of individuals tested real-time polymerase chain reaction (RT-PCR)-positive for SARS-CoV-2 infection8 in Shanghai, of whom 275,896 were in Pudong. The fact that the population in Shanghai was mostly SARS-CoV-2 infection naïve before the spread of the Omicron variant provides a unique opportunity to estimate the real-world benefit of post-infection vaccine doses in a population that was first exposed to infection during a relatively short and well-defined time window. We further investigated whether the number of pre-infection vaccination doses modified the protective effect of the post-infection dose against Omicron BA.5 sublineage. To avoid ambiguity in the text, in the following sections, we often refer to vaccine doses given after the initial infection as “post-infection vaccination” or “post-infection vaccine doses”.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the potential of short-movie viewing fMRI protocol to explore the functional and topographical organization of the visual system in awake infants and toddlers. Although the data are compelling given the difficulty of studying this population, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims. This study will be of interest to cognitive neuroscientists and developmental psychologists, especially those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited fMRI tolerance.

      We are grateful for the thorough and thoughtful reviews. We have provided point-by-point responses to the reviewers’ comments, but first, we summarize the major revisions here. We believe these revisions have substantially improved the clarity of the writing and impact of the results.

      Regarding the framing of the paper, we have made the following major changes in response to the reviews:

      (1) We have clarified that our goal in this paper was to show that movie data contains topographic, fine-grained details of the infant visual cortex. In the revision, we now state clearly that our results should not be taken as evidence that movies could replace retinotopy and have reworded parts of the manuscript that could mislead the reader in this regard.

      (2) We have added extensive details to the (admittedly) complex methods to make them more approachable. An example of this change is that we have reorganized the figure explaining the Shared Response Modelling methods to divide the analytic steps more clearly.

      (3) We have clarified the intermediate products contributing to the results by adding 6 supplementary figures that show the gradients for each IC or SRM movie and each infant participant.

      In response to the reviews, we have conducted several major analyses to support our findings further:

      (1) To verify that our analyses can identify fine-grained organization, we have manually traced and labeled adult data, and then performed the same analyses on them. The results from this additional dataset validate that these analyses can recover fine-grained organization of the visual cortex from movie data.

      (2) To further explore how visual maps derived from movies compare to alternative methods, we performed an anatomical alignment control analysis. We show that high-quality maps can be predicted from other participants using anatomical alignment.

      (3) To test the contribution of motion to the homotopy analyses, we regressed out the motion effects in these analyses. We found qualitatively similar results to our main analyses, suggesting motion did not play a substantial role.

      (4) To test the contribution of data quantity to the homotopy analyses, we correlated the amount of movie data collected from each participant with the homotopy results. We did not find a relationship between data quantity and the homotopy results. 
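      The two control analyses in points (3) and (4) can be illustrated with a small sketch. It assumes, purely for illustration, a single motion regressor removed by ordinary least squares before two time series are correlated, and a Pearson correlation between the amount of data and a per-participant homotopy score. The authors' actual nuisance model and statistics may differ, and all function names, variable names, and numbers below are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def residualize(signal, nuisance):
    """Remove the least-squares linear fit of `nuisance` from `signal`."""
    mn, ms = mean(nuisance), mean(signal)
    slope = sum((n - mn) * (s - ms) for n, s in zip(nuisance, signal)) / sum(
        (n - mn) ** 2 for n in nuisance
    )
    return [s - (ms + slope * (n - mn)) for n, s in zip(nuisance, signal)]


# Hypothetical framewise motion and time series from two homotopic regions.
motion = [0.1, 0.4, 0.2, 0.9, 0.3, 0.7, 0.5, 0.2]
left_region = [1.0, 1.6, 0.9, 2.4, 1.2, 1.9, 1.7, 0.8]
right_region = [0.9, 1.5, 1.1, 2.6, 1.0, 2.0, 1.5, 0.9]

# Control (3): correlate the regions after motion has been regressed out.
left_clean = residualize(left_region, motion)
right_clean = residualize(right_region, motion)
homotopy_after_regression = pearson(left_clean, right_clean)

# Control (4): relate data quantity to a per-participant homotopy score.
minutes_of_data = [3.0, 6.0, 4.5, 9.0, 5.0]
homotopy_scores = [0.42, 0.38, 0.47, 0.40, 0.44]
quantity_effect = pearson(minutes_of_data, homotopy_scores)

print(homotopy_after_regression, quantity_effect)
```

      After residualization, the cleaned time series are uncorrelated with the motion regressor by construction, so any remaining correlation between the two regions cannot be attributed to a linear effect of that regressor.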

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ellis et al. investigated the functional and topographical organization of the visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, and elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in the visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses, but further evidence is necessary to support their claims and the study motivation needs refining, in light of prior research.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (Knapen, 2021).

      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      We are grateful to the reviewer for their clear and thoughtful description of the strengths of the paper, as well as their helpful outlining of areas we could improve.

      Weaknesses:

      - The Methods are at times difficult to understand and in some cases seem inappropriate for the conclusions drawn. For example, I believe that the movie-defined ICA components were validated using independent data from the retinotopy task, but this was a point of confusion among reviewers. 

      We acknowledge the complexity of the methods and wish to clarify them as best as possible for the reviewers and the readers. We have extensively revised the methods and results sections to help avoid potential misunderstandings. For instance, we have revamped the figure and caption describing the SRM pipeline (Figure 5).

      To answer the stated confusion directly, the ICA components were derived from the movie data and validated on the (completely independent) retinotopy data. There were no additional tasks. The following text in the paper explains this point:

      “To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps.” Pg. 11

      In either case: more analyses should be done to support the conclusion that the components identified from the movie reproduce retinotopic maps (for example, by comparing the performance of movie-viewing maps to available alternatives (anatomical ROIs, group-defined ROIs)).

      Before addressing this suggestion, we want to restate our conclusions: features of the retinotopic organization of infant visual cortex could be predicted from movie data. We did not conclude that movie data could ‘reproduce’ retinotopic maps in the sense that they would be a replacement. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously23 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses27, here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      As per the reviewer’s suggestion, and as alluded to in the paragraph above, we have created anatomically aligned visual maps, providing a test analogous to between-participant analyses such as SRM. We find that these maps are highly similar to the ground truth. We describe this result in a new section of the results:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
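      The ∆Fisher Z statistics quoted above compare two sets of correlations on the Fisher z scale. The sketch below shows one way such a comparison could be computed: each correlation is transformed with the inverse hyperbolic tangent, the per-participant differences are averaged, and a percentile bootstrap over participants gives a confidence interval and a two-sided p-value. The bootstrap procedure is an assumption made for illustration and may not match the authors' resampling scheme; the function name `fisher_z_difference` and the example correlations are hypothetical.

```python
import math
import random


def fisher_z_difference(r_a, r_b, n_boot=5000, seed=0):
    """Mean paired difference in Fisher z, with a bootstrap 95% CI and p-value."""
    diffs = [math.atanh(a) - math.atanh(b) for a, b in zip(r_a, r_b)]
    n = len(diffs)
    rng = random.Random(seed)
    # Resample participants with replacement and record the mean difference.
    boot_means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    ci_low = boot_means[int(0.025 * n_boot)]
    ci_high = boot_means[int(0.975 * n_boot) - 1]
    # Two-sided p-value: twice the smaller tail of resampled means around zero.
    frac_below = sum(m <= 0 for m in boot_means) / n_boot
    p_value = min(1.0, 2 * min(frac_below, 1 - frac_below))
    return sum(diffs) / n, (ci_low, ci_high), p_value


# Hypothetical per-participant map correlations for the two alignment methods.
anatomical = [0.72, 0.65, 0.80, 0.58, 0.77, 0.69]
functional = [0.35, 0.30, 0.41, 0.22, 0.38, 0.29]

mean_diff, ci, p = fisher_z_difference(anatomical, functional)
print(f"Delta Fisher Z M={mean_diff:.2f}, CI=[{ci[0]:.2f}, {ci[1]:.2f}], p={p:.3f}")
```

      Working on the z scale rather than on raw correlations keeps the differences approximately comparable across the range of r, which is why correlation differences are commonly reported this way.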

      Also, the ROIs used for the homotopy analyses were defined based on the retinotopic task rather than based on movie-viewing data alone - leaving it unclear whether movie-viewing data alone can be used to recover functionally distinct regions within the visual cortex.

      We agree with the reviewer that our approach does not test whether movie-viewing data alone can be used to recover functionally distinct regions. The goal of the homotopy analyses was to identify whether there was functional differentiation of visual areas in the infant brain while they watch movies. This was a novel question that provides positive evidence that these regions are functionally distinct. In subsequent analyses, we show that when these areas are defined anatomically, rather than functionally, they also show differentiated function (e.g., Figure 2). Nonetheless, our intention was not to use the homotopy analyses to define the regions. We have added text to clarify the goal and novelty of this analysis.

      “Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures.” Pg. 6

      Additionally, even if the goal were to define areas based on homotopy, we believe the power of that analysis would be questionable. We would need to use a large amount of the movie data to define the areas, leaving a low-powered dataset to test whether their function is differentiated by these movie-based areas.

      - The authors previously reported on retinotopic organization of the visual cortex in human infants (Ellis et al., 2021) and suggest that the feasibility of using movie-viewing experiments to recover these topographic maps is still in question. They point out that movies may not fully sample the stimulus parameters necessary for revealing topographic maps/areas in the visual cortex, or the time-resolution constraints of fMRI might limit the use of movie stimuli, or the rich, uncontrolled nature of movies might make them inferior to stimuli that are designed for retinotopic mapping, or might lead to variable attention between participants that makes measuring the structure of visual responses across individuals challenging. This motivation doesn't sufficiently highlight the importance or value of testing this question in infants. Further, it's unclear if/how this motivation takes into account prior research using movie-viewing fMRI experiments to reveal retinotopic organization in adults (e.g., Knapen, 2021). Given the evidence for retinotopic organization in infants and evidence for the use of movie-viewing experiments in adults, an alternative framing of the novel contribution of this study is that it tests whether retinotopic organization is measurable using a limited amount of movie-viewing data (i.e., a methodological stress test). The study motivation and discussion could be strengthened by more attention to relevant work with adults and/or more explanation of the importance of testing this question in infants (is the reason to test this question in infants purely methodological - i.e., as a way to negate the need for retinotopic tasks in subsequent research, given the time constraints of scanning human infants?).

      We are grateful to the reviewer for giving us the opportunity to clarify the innovations of this research. We believe that this research contributes to our understanding of how infants process dynamic stimuli, demonstrates the viability and utility of movie experiments in infants, and highlights the potential for new movie-based analyses (e.g., SRM). We have now consolidated these motivations in the introduction to more clearly motivate this work:

      “The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands12, 13, 24 and thus are a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25–27. Movies have been useful in awake infant fMRI for studying event segmentation28, functional alignment29, and brain networks30. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity in infants in anatomically aligned visual areas28, but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between31). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses27, 32–34.” Pg. 3-4

      Furthermore, the introduction culminates in the following statement on what the analyses will tell us about the nature of movie-driven activity in infants:

      “These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.” Pg. 5

      Furthermore, in the discussion we revisit these motivations and elaborate on them further:

      [Regarding homotopy:] “This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres31.” Pg. 19

      [Regarding ICA:] “This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable.” Pg. 19–20

      [Regarding SRM:] “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Additionally, we have expanded our discussion of relevant work that uses similar methods such as the excellent research from Knapen (2021) and others:

      “In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25-27.” Pg. 4

      “We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains25,26,35,42,43.” Pg. 9

      Reviewer #2 (Public Review):

      Summary:

      This manuscript shows evidence from a dataset with awake movie-watching in infants, that the infant brain contains areas with distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. However, substantial new analyses would be required to support the novel claim that movie-watching data in infants can be used to identify retinotopic areas or to capture within-area functional organization.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and a specific retinotopy task. These data position the authors to test their novel claim, that movie-watching data in infants can be used to identify retinotopic areas.

      Weaknesses:

      To claim that movie-watching data can identify retinotopic regions, the authors should provide evidence for two claims:

      - Retinotopic areas defined based only on movie-watching data, predict retinotopic responses in independent retinotopy-task-driven data.

      - Defining retinotopic areas based on the infant's own movie-watching response is more accurate than alternative approaches that don't require any movie-watching data, like anatomical parcellations or shared response activation from independent groups of participants.

      We thank the reviewer for their comments. Before addressing their suggestions, we wish to clarify that we do not claim that movie data can be used to identify retinotopic areas, but instead that movie data captures components of the within and between visual area organization as defined by retinotopic mapping. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously23 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses27, here we find that functional alignment with infants is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      In response to the reviewer’s suggestion, we compare the maps identified by SRM to the averaged, anatomically aligned maps from infants. We find that these maps are highly similar to the task-based ground truth and we describe this result in a new section:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment < functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment < functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment < functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment < functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
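      The leave-one-out logic of this benchmark can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the toy data are hypothetical, the maps are assumed to be already aligned to a common template, and whole-map correlations stand in for the gradient correlations along traced lines that the manuscript describes.

```python
import numpy as np

def fisher_z(r):
    """Fisher r-to-z transform, clipped to avoid infinities at |r| = 1."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def leave_one_out_prediction(maps, test_idx):
    """Average the maps of every participant except `test_idx`.

    `maps` has shape (n_participants, n_vertices) and is assumed to be
    already aligned to a common surface template.
    """
    others = np.delete(maps, test_idx, axis=0)
    return others.mean(axis=0)

def benchmark(maps):
    """Return one Fisher-z correlation per held-out participant."""
    scores = []
    for i in range(maps.shape[0]):
        predicted = leave_one_out_prediction(maps, i)
        r = np.corrcoef(predicted, maps[i])[0, 1]
        scores.append(fisher_z(r))
    return np.array(scores)

# Toy data (hypothetical): 6 participants share one gradient plus noise.
rng = np.random.default_rng(0)
shared_gradient = np.linspace(-1.0, 1.0, 200)
toy_maps = shared_gradient + 0.5 * rng.standard_normal((6, 200))
z_scores = benchmark(toy_maps)
print(z_scores.round(2))
```

      Each held-out participant contributes one Fisher-z score, so scores from two prediction methods could then be differenced per participant before any group-level test.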

      Note that we do not compare the anatomically aligned maps with the ICA maps statistically. This is because these analyses are not comparable: ICA is run within-participant whereas anatomical alignment is necessarily between-participant (using either infants or adults). Nonetheless, an interested reader can refer to Table S4, where we report the results of anatomical alignment, and see that anatomical alignment outperforms ICA in terms of the correlation between the predicted and task-based maps.

      Both of these analyses are possible, using the (valuable!) data that these authors have collected, but these are not the analyses that the authors have done so far. Instead, the authors report the inverse of (1): regions identified by the retinotopy task can be used to predict responses in the movies. The authors report one part of (2), shared responses from other participants can be used to predict individual infants' responses in the movies, but they do not test whether movie data from the same individual infant can be used to make better predictions of the retinotopy task data, than the shared response maps.

      So to be clear, to support the claims of this paper, I recommend that the authors use the retinotopic task responses in each individual infant as the independent "Test" data, and compare the accuracy in predicting those responses, based on:

      -  The same infant's movie-watching data, analysed with MELODIC, when blind experimenters select components for the SF and meridian boundaries with no access to the ground-truth retinotopy data.

      -  Anatomical parcellations in the same infant.

      -  Shared response maps from groups of other infants or adults.

      -  (If possible, ICA of resting state data, in the same infant, or from independent groups of infants).

      Or, possibly, combinations of these techniques.

      If the infant's own movie-watching data leads to improved predictions of the infant's retinotopic task-driven response, relative to these existing alternatives that don't require movie-watching data from the same infant, then the authors' main claim will be supported.

      These are excellent suggestions for additional analyses to test the suitability of movie-based maps to replace task-based maps. We hope it is now clear that it was never our intention to claim that movie-based data could replace task-based methods. We want to emphasize that the discoveries made in this paper (that movies evoke fine-grained organization in infant visual cortex) do not rely on movie-based maps being better than alternative methods for producing maps, such as the newly added anatomical alignment.

      The proposed analysis above solves a critical problem with the analyses presented in the current manuscript: the data used to generate maps is identical to the data used to validate those maps. For the task-evoked maps, the same data are used to draw the lines along gradients and then test for gradient organization. For the component maps, the maps are manually selected to show the clearest gradients among many noisy options, and then the same data are tested for gradient organization. This is a double-dipping error. To fix this problem, the data must be split into independent train and test subsets.

      We appreciate the reviewer’s concern; however, we believe it is a result of a miscommunication in our analytic strategy. We have now provided more details on the analyses to clarify how double-dipping was avoided. 

      To summarize, a retinotopy task produced visual maps that were used to trace both area boundaries and gradients across the areas. These data were then fixed and unchanged, and we make no claims about the nature of these maps in this paper, other than to treat them as the ground truth to be used as a benchmark in our analyses. The movie data, which are collected independently from the same infant in the session, used the boundaries from the retinotopy task (in the case of homotopy) or were compared with the maps from the retinotopy task (in the case of ICA and SRM). In other words, the statement that “the data used to generate maps is identical to the data used to validate those maps” is incorrect because we generated the maps with a retinotopy task and validated the maps with the movie data. This means no double dipping occurred.

      Perhaps a cause of the reviewer’s interpretation is that the gradients used in the analysis are not clearly described. We now provide this additional description:

      “Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.” Pg. 10

      Regarding the SRM analyses, we take great pains to avoid the possibility of data contamination. To emphasize how independent the SRM analysis is, the prediction of the retinotopic map from the test participant does not use their retinotopy data at all; in fact, the predicted maps could be made before that participant’s retinotopy data were ever collected. To make this prediction for a test participant, we need to learn the inversion of the SRM, but this only uses the movie data of the test participant. Hence, there is no double-dipping in the SRM analyses. We have elaborated on this point in the revision, and we remade the figure and its caption to make it clearer.

      We also have updated the description of these results to emphasize how double-dipping was avoided:

      “We then mapped the held-out participant's movie data into the learned shared space without changing the shared space (Figure 5c). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered. This approach has been used and validated in prior SRM studies45.” Pg. 14
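      To make the independence argument concrete, the following is a minimal sketch of mapping a held-out participant into a frozen shared space. It assumes a deterministic SRM-style model in which each participant's data X is approximated by W S with orthonormal columns in W, so that fitting W against a fixed S is an orthogonal Procrustes problem. All names and the toy data are hypothetical, and the authors' released code may differ in detail.

```python
import numpy as np

def fit_heldout_weights(movie_data, shared_response):
    """Fit W for a held-out participant against a frozen shared response.

    Solves min ||X - W S||_F subject to W^T W = I (orthogonal Procrustes).
    movie_data: (n_voxels, n_timepoints) movie data of the held-out participant.
    shared_response: (k, n_timepoints), learned from other participants and
    left unchanged here.
    """
    u, _, vt = np.linalg.svd(movie_data @ shared_response.T, full_matrices=False)
    return u @ vt  # shape (n_voxels, k)

def predict_map(heldout_weights, reference_weights, reference_maps):
    """Predict a held-out participant's map from other participants' maps.

    Each reference map is projected into the shared space with that
    participant's own weights, the projections are averaged, and the average
    is projected out with the held-out weights. The held-out participant's
    retinotopy data are never used.
    """
    shared_maps = [w.T @ m for w, m in zip(reference_weights, reference_maps)]
    return heldout_weights @ np.mean(shared_maps, axis=0)

# Toy check (hypothetical data): noise-free movie data generated from known weights.
rng = np.random.default_rng(0)
n_voxels, k, n_time = 50, 5, 100
true_weights, _ = np.linalg.qr(rng.standard_normal((n_voxels, k)))
shared_response = rng.standard_normal((k, n_time))
movie_data = true_weights @ shared_response
heldout_weights = fit_heldout_weights(movie_data, shared_response)
print(np.allclose(heldout_weights, true_weights))
```

      The only inputs to `fit_heldout_weights` are the held-out participant's movie data and the frozen shared response, which is the property the response relies on when arguing that the SRM predictions are independent of the test participant's retinotopy data.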

      The reviewer suggests that manually choosing components from ICA is double-dipping. Although the reviewer is correct that the manual selection of components in ICA means that the components chosen ought to be good candidates, we are testing whether those choices were good by evaluating those components against the task-based maps that were not used for the ICA. Our statistical analyses evaluate whether the components chosen were better than the components that would have been chosen by random chance. Critically: all decisions about selecting the components happen before the components are compared to the retinotopic maps. Hence there is no double-dipping in the selection of components, as the choice of candidate ICA maps is not informed by the ground-truth retinotopic maps. We now clarify what the goal of this process is in the results:

      “Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.” Pg. 10

      The reviewer also alludes to a concern that the researcher selecting the maps was not blind to the ground-truth retinotopic maps from participants and this could have influenced the results. In such a scenario, the researcher could have selected components that have the gradients of activity in the places that the infant has as ground truth. The researcher who made the selection of components (CTE) is one of the researchers who originally traced the areas in the participants approximately a year prior to the identification of ICs. The researcher selecting the components didn’t use the ground-truth retinotopic maps as reference, nor did they pay attention to the participant IDs when sorting the IC components. Indeed, they weren’t trying to find participant-specific maps per se, but rather aimed to find good candidate retinotopic maps in general. In the case of the newly added adult analyses, the ICs were selected before the retinotopic mapping was reviewed or traced; hence, no knowledge about the participant-specific ground truth could have influenced the selection of ICs. Even with this process from adults, we find results of comparable strength as we found in infants, as shown in Figure S3. Nonetheless, there is a possibility that this researcher’s previous experience of tracing the infant maps could have influenced their choice of components at the participant-specific level. If so, it was a small effect since the components the researcher selected were far from the best possible options (i.e., rankings of the selected components averaged in the 64th percentile for spatial frequency maps and the 68th percentile for meridian maps). We believe all reasonable steps were taken to mitigate bias in the selection of ICs.
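      As a rough illustration of how a selected component can be ranked against all candidate components from the same participant, consider the sketch below. The scoring (one correlation with the task-based map per component), the function names, and the numbers are all hypothetical; the manuscript's own statistics may be computed differently.

```python
import numpy as np

def percentile_rank(all_scores, selected_idx):
    """Percentage of components scoring strictly below the selected one."""
    all_scores = np.asarray(all_scores, dtype=float)
    return 100.0 * np.mean(all_scores < all_scores[selected_idx])

def chance_p_value(all_scores, selected_idx):
    """Proportion of components scoring at least as high as the selected one.

    This is the probability that a component drawn uniformly at random would
    match or beat the selected component.
    """
    all_scores = np.asarray(all_scores, dtype=float)
    return float(np.mean(all_scores >= all_scores[selected_idx]))

# Hypothetical correlations between five components and a task-based map.
scores = [0.05, 0.10, 0.20, 0.40, 0.80]
print(percentile_rank(scores, 3), chance_p_value(scores, 3))
```

      A selected component that sits well below the top of this ranking, as in the percentiles reported above, indicates that the selection was not tuned to the ground truth, while a chance-level comparison asks whether the selection still beats a random draw.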

      Reviewer #3 (Public Review):

      The manuscript reports data collected in awake toddlers recording BOLD while watching videos. The authors analyse the BOLD time series using two different statistical approaches, both very complex but do not require any a priori determination of the movie features or contents to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to adults, given that an SRM model derived from adult BOLD is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Clearly, the data are important, and the author has achieved important and original results. However, the manuscript is totally unclear and very difficult to follow; the figures are not informative; the reader needs to trust the authors because no data to verify the output of the statistical analysis are presented (localization maps with proper statistics) nor so any validation of the statistical analysis provided. Indeed what I think that manuscript means, or better what I understood, may be very far from what the authors want to present, given how obscure the methods and the result presentation are.

      In the present form, this reviewer considers that the manuscript needs to be totally rewritten, the results presented each technique with appropriate validation or comparison that the reader can evaluate.

      We are grateful to the reviewer for the chance to improve the paper. We have broken their review into three parts: clarification of the methods, validation of the analyses, and enhancing the visualization.

      Clarification of the methods

      We acknowledge that the methods we employed are complex and uncommon in many fields of neuroimaging. That said, numerous papers have conducted these analyses on adults (Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Lu et al., 2017) and non-human primates (Arcaro & Livingstone, 2017; Moeller et al., 2009). We have redoubled our efforts in the revision to make the methods as clear as possible, expanding on the original text and providing intuitions where possible. These changes have been added throughout and are too vast in number to repeat here, especially without context, but we hope that readers will have an easier time following the analyses now.

      Additionally, we updated Figures 3 and 5 in which the main ICA and SRM analyses are described. For instance, in Figure 3’s caption we now add details about how the gradient analyses were performed on the components: 

      “We used the same lines that were manually traced on the task-evoked map to assess the change in the component’s response. We found a monotonic trend within area from medial to lateral, just like we see in the ground truth.” Pg. 11

      Regarding Figure 5, we reconsidered the best way to explain the SRM analyses and decided it would be helpful to partition the diagram into steps, reflecting the analytic process. These updates have been added to Figure 5, and the caption has been updated accordingly.

      We hope that these changes have improved the clarity of the methods. For readers interested in learning more, we encourage them to either read the methods-focused papers that debut the analyses (e.g., Chen et al., 2015), read the papers applying the methods (e.g., Guntupalli et al., 2016), or read the annotated code we publicly release which implements these pipelines and can be used to replicate the findings.

      Validation of the analyses

      One of the requests the reviewer makes is to validate our analyses. Our initial approach was to lean on papers that have used these methods in adults or primates (e.g., Arcaro & Livingstone, 2017; Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Moeller et al., 2009) where the underlying organization and neurophysiology is established. However, we have made changes to these methods that differ from their original usage (e.g., we used SRM rather than hyperalignment, we use meridian mapping rather than traveling wave retinotopy, we use movie-watching data rather than rest). Hence, the specifics of our design and pipeline warrant validation.

      To add further validation, we have rerun the main analyses on an adult sample. We collected data from 8 adult participants who completed the same retinotopy task and a large subset of the movies that infants saw. These participants were run under maximally similar conditions to infants (i.e., scanned using the same parameters and without the top of the head-coil) and were preprocessed using the same pipeline. Given that the relationship between adult visual maps and movie-driven (or resting-state) analyses has been shown in many studies (Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Lu et al., 2017), these adult data serve as a validation of our analysis pipeline. These adult participants were included in the original manuscript; however, they were previously only used to support the SRM analyses (i.e., can adults be used to predict infant visual maps). The adult results are described before any results with infants, as a way to engender confidence. Moreover, we have provided new supplementary figures of the adult results that we hope will be integrated with the article when viewing it online, such that it will be easy to compare infant and adult results, as per the reviewer’s request.

      As per the figures and captions below, the analyses were all successful with the adult participants: 1) Homotopic correlations are higher than correlations between comparable areas in other streams or areas that are more distant within stream. 2) A multidimensional scaling depiction of the data shows that areas in the dorsal and ventral stream are dissimilar. 3) Using independent components analysis on the movie data, we identified components that are highly correlated with the retinotopy task-based spatial frequency and meridian maps. 4) Using shared response modeling on the movie data, we predicted maps that are highly correlated with the retinotopy task-based spatial frequency and meridian maps.

      These supplementary analyses are underpowered for between-group comparisons, so we do not statistically compare the results between infants and adults. Nonetheless, the pattern of adult results is comparable overall to the infant results. 

      We believe these adult results provide a useful validation that the infant analyses we performed can recover fine-grained organization.

      The reviewer raises an additional concern about the lack of visualization of the results. We recognize that the plots of the summary statistics do not provide information about the intermediate analyses. Indeed, we think the summary statistics can understate the degree of similarity between the components or predicted visual maps and the ground truth. Hence, we have added 6 new supplementary figures showing the intensity gradients for the following analyses: 1. spatial frequency prediction using ICA, 2. meridian prediction using ICA, 3. spatial frequency prediction using infant SRM, 4. meridian prediction using infant SRM, 5. spatial frequency prediction using adult SRM, and 6. meridian prediction using adult SRM.

      We hope that these visualizations are helpful. It is possible that the reviewer wishes us to also visually present the raw maps from the ICA and SRM, akin to what we show in Figure 3A and 3B. We believe this is out of scope of this paper: of the 1140 components that were identified by ICA, we selected 36 for spatial frequency and 17 for meridian maps. We also created 20 predicted maps for spatial frequency and 20 predicted meridian maps using SRM. This would result in the depiction of 93 subfigures, requiring at least 15 new full-page supplementary figures to display with adequate resolution. Instead, we encourage the reader to access this content themselves: we have made the code to recreate the analyses publicly available, as well as both the raw and preprocessed data for these analyses, including the data for each of these selected maps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned in the public review, the authors should consider incorporating relevant adult fMRI research into the Introduction and explain the importance of testing this question in infants.

      Our public response describes the citations to relevant adult research that we have added, and we have provided further motivation for the project.

      (2) The authors should conduct additional analyses to support their conclusion that movie data alone can generate accurate retinotopic maps (i.e., by comparing this approach to other available alternatives).

      We have clarified in our public response that we did not wish to conclude that movie data alone can generate accurate retinotopic maps, and have made substantial edits to the text to emphasize this. Thus, because this claim is already not supported by our analyses, we do not think it is necessary to test it further.

      (3) The authors should re-do the homotopy analyses using movie-defined ROIs (i.e., by splitting the movie-viewing data into independent folds for functional ROI definition and analyses).

      As stated above, defining ROIs based on the movie content is not the intended goal of this project. Even if that were the general goal, we do not believe that it would be appropriate to run this specific analysis with the data we collected. Firstly, halving the data for ROI definition (e.g., using half the movie data to identify and trace areas, and then running the homotopy analysis with those areas on the other half of the data) would qualitatively change the power of the analyses described here. Secondly, we would be unable to define areas beyond hV4/V3AB with confidence, since our retinotopic mapping only affords specification of early visual cortex. Thus we could not conduct the MDS analyses shown in Figure 2.

      (4) If the authors agree that a primary contribution of this study and paper is to showcase what is possible to do with a limited amount of movie-viewing data, then they should make it clearer, sooner, how much usable movie data they have from infants. They could also consider conducting additional analyses to determine the minimum amount of fMRI data necessary to reveal the same detailed characteristics of functional responses in the visual cortex.

      We agree it would be good to highlight the amount of movie data used. When the infant data is first introduced in the results section, we now state the durations:

      “All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186–1116s).” Pg. 5

      Additionally, we have added a homotopy analysis that describes the contribution of data quantity to the results observed. We compare the amount of data collected with the magnitude of same vs. different stream effect (Figure 1B) and within stream distance effect (Figure 1C). We find no effect of movie duration in the sample we tested, as reported below:

      “We found no evidence that the variability in movie duration per participant correlated with this difference [of same stream vs. different stream] (r=0.08, p=.700).” Pg. 6-7

      “There was no correlation between movie duration and the effect (Same > Adjacent: r=-0.01, p=.965, Adjacent > Distal: r=-0.09, p=.740).” Pg. 7

      (5) If any of the methodological approaches are novel, the authors should make this clear. In particular, has the approach of visually inspecting and categorizing components generated from ICA and movie data been done before, in adults/other contexts?

      The methods we employed are similar to others, as described in the public review.

      However, changes were necessary to apply them to infant samples. For instance, Guntupalli et al. (2016) used hyperalignment to predict the visual maps of adult participants, whereas we use SRM. SRM and hyperalignment have the same goal — find a maximally aligned representation between participants based on brain function — but their implementation is different. The application of functional alignment to infants is novel, as is their use in movie data that is relatively short by comparison to standard adult data. Indeed, this is the most thorough demonstration that SRM — or any functional alignment procedure — can be usefully applied to infant data, awake or sleeping. We have clarified this point in the discussion.

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45, which may prove especially useful for infant fMRI52.” Pg. 21

      (6) The authors found that meridian maps were less identifiable from ICA and movie data and suggest that this may be because these maps are more susceptible to noise or gaze variability. If this is the case, you might predict that these maps are more identifiable in adult data. The authors could consider running additional analyses with their adult participants to better understand this result.

      As described in the manuscript, we hypothesize that meridian maps are more difficult to identify than spatial frequency maps because meridian maps are a less smooth, more fine-grained map than spatial frequency. Indeed, it has previously been reported (Moeller et al., 2009) that similar procedures can result in meridian maps that are constituted by multiple independent components (e.g., a component sensitive to horizontal orientations, and a separate component sensitive to vertical orientations). Nonetheless, we have now conducted the ICA procedure on adult participants and again find it is easier to identify spatial frequency components compared to meridian maps, as reported in the public review.

      Minor corrections:

      (1) Typo: Figure 3 title: "Example retintopic task vs. ICA-based spatial frequency maps.".

      Fixed

      (2) Given the age range of the participants, consider using "infants and toddlers"? (Not to diminish the results at all; on the contrary, I think it is perhaps even more impressive to obtain awake fMRI data from ~1-2-year-olds). Example: Figure 3 legend: "A) Spatial frequency map of a 17.1-month-old infant.".

      We agree with the reviewer that there is disagreement about the age range at which a child starts being considered a toddler. We have changed the terms in places where we refer to a toddler in particular (e.g., the figure caption the reviewer highlights) and added the phrase “infants and toddlers” in places where appropriate. Nonetheless, we have kept “infants” in some places, particularly those where we are comparing the sample to adults. Adding “and toddlers” could imply three samples being compared which would confuse the reader.

      (3) Figure 6 legend: The following text should be omitted as there is no bar plot in this figure: "The bar plot is the average across participants. The error bar is the standard error across participants.".

      Fixed

      (4) Table S1 legend: Missing first single quote: Runs'.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      I request that this paper cite more of the existing literature on the fMRI of human infants and toddlers using task-driven and resting-state data. For example, early studies by (first authors) Biagi, Dehaene-Lambertz, Cusack, and Fransson, and more recent studies by Chen, Cabral, Truzzi, Deen, and Kosakowski.

      We have added several new citations of recent task-based and resting state studies to the second sentence of the main text:

      “Despite the recent growth in infant fMRI1-6, one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks7.”

      Reviewer #3 (Recommendations For The Authors):

      In the following, I report some of my main perplexities, but many more may arise when the material is presented more clearly.

      The age of the children varies from 5 months to about 2 years. While the developmental literature suggests that between 1 and 2 years children have a visual system nearly adult-like, below that age some areas may be very immature. I would split the sample and perhaps attempt to validate the adult SRM model with the youngest children (and those can be called infants).

      We recognize the substantial age variability in our sample, which is why we report participant-specific data in our figures. While splitting up the data into age bins might reveal age effects, we do not think we can perform adequately powered null hypothesis testing of the age trend. In order to investigate the contribution of age, larger samples will be needed. That said, we can see from the data that we have reported that any effect of age is likely small. To elaborate: Figures 4 and 6 report the participant-specific data points and order the participants by age. There are no clear linear trends in these plots, thus there are no strong age effects.

      More broadly, we do not think there is a principled way to divide the participants by age. The reviewer suggests that the visual system is immature before the first year of life and mature afterward; however, such claims are the exact motivation for the type of work we are doing here, and the verdict is still out. Indeed, the conclusion of our earlier work reporting retinotopy in infants (Ellis et al., 2021) suggests that the organization of the early visual cortex in infants as young as 5 months — the youngest infant in our sample — is surprisingly adult-like.

      The title cannot refer to infants given the age span.

      There is disagreement in the field about the age at which it is appropriate to refer to children as infants. In this paper, and in our prior work, we followed the practice of the most attended infant cognition conference and society, the International Congress of Infant Studies (ICIS), which considers infants as those aged between 0-3 years old, for the purposes of their conference. Indeed, we have never received this concern across dozens of prior reviews for previous papers covering a similar age range. That said, we understand the spirit of the reviewer’s comment and now refer to the sample as “infants and toddlers” and to older individuals in our sample as “toddlers” wherever it is appropriate (the younger individuals would fairly be considered “infants” under any definition).

      Figure 1 is clear and an interesting approach. Please also show the average correlation maps on the cortical surface.

      While we would like to create a figure as requested, we are unsure how to depict an area-by-area correlation map on the cortical surface. One option would be to generate a seed-based map in which we take an area and depict the correlation of that seed (e.g., vV1) with all other voxels. This approach would result in 8 maps for just the task-defined areas, and 17 maps for anatomically-defined areas. Hence, we believe this is out of scope of this paper, but an interested reader could easily generate these maps from the data we have released.
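
      The seed-based option described above could be sketched as follows. This is a minimal illustration, not the analysis code released with the paper: the function name `seed_correlation_map`, the (timepoints × voxels) array layout, and the boolean seed mask are all assumptions made for the example.

```python
import numpy as np

def seed_correlation_map(data, seed_mask):
    """Pearson correlation of the mean seed time course with every voxel.

    data: array of shape (n_timepoints, n_voxels) (assumed layout).
    seed_mask: boolean array of shape (n_voxels,) marking the seed area.
    Voxels with zero variance over time will yield NaN.
    """
    seed = data[:, seed_mask].mean(axis=1)
    # z-score over time (population standard deviation)
    seed_z = (seed - seed.mean()) / seed.std()
    data_z = (data - data.mean(axis=0)) / data.std(axis=0)
    # The mean of the product of z-scores is the Pearson correlation
    return data_z.T @ seed_z / data.shape[0]
```

      Running this once per area, for example with a vV1 mask as the seed, gives one map per seed, which is why the number of maps grows with the number of areas.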

      Figure 2 results are not easily interpretable. Ventral and dorsal V1-V3 areas represent upper or lower VF respectively. Higher dorsal and ventral areas represent both upper and lower VF, so we should predict an equal distance between the two streams. Again, how can we verify that it is not a result of some artifacts?

      In adults, visual areas differ in their functional response properties along multiple dimensions, including spatial coding. The dorsal/ventral stream hypothesis is derived from the idea that areas in each stream support different functions, independent of spatial coding. The MDS analysis did not attempt to isolate the specific contribution of spatial representations of each area but instead tested the similarity of function that is evoked in naturalistic viewing. Other covariance-based analyses specifically isolate the contribution of spatial representations (Haak et al., 2013); however, they use a much more constrained analysis than what was implemented here. The fact that we find broad differentiation of dorsal and ventral visual areas in infants is consistent with adults (Haak & Beckmann, 2018) and neonate non-human primates (Arcaro & Livingstone, 2017).

      Nonetheless, we recognize that we did not mention the differences in visual field properties across areas and what that means. If visual field properties alone drove the functional response then we would expect to see a clustering of areas based on the visual field they represent (e.g., hV4 and V3AB should have similar representations). Since we did not see that, and instead saw organization by visual stream, the result is interesting and thus warrants reporting. We now mention this difference in visual fields in the manuscript to highlight the surprising nature of the result.

      “This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults41; however, they are often not the primary driver of function39. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles.” Pg. 8
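
      For readers less familiar with MDS, the sketch below shows how a low-dimensional layout can be obtained from an area-by-area similarity matrix. It uses classical (Torgerson) MDS and converts correlations to distances as 1 − r. Both choices are assumptions for illustration, since the response does not state which MDS variant or distance measure was used.

```python
import numpy as np

def similarity_to_distance(corr):
    """Turn a correlation matrix into a dissimilarity matrix (assumed: 1 - r)."""
    return 1.0 - np.asarray(corr)

def classical_mds(dist, n_components=2):
    """Embed items so that Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives from non-Euclidean input
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale
```

      Areas with similar functional profiles land close together in the resulting coordinates, so a separation between streams would appear as two groups of points.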

      The reviewer raises a concern that the MDS result may be spurious and caused by noise. Below, we present three reasons why we believe these results are not accounted for by artifacts but instead reflect real functional differentiation in the visual cortex. 

      (1) Figure 2 is a visualization of the similarity matrix presented in Figure S1. In Figure S1, we report the significance testing we performed to confirm that the patterns differentiating dorsal and ventral streams — as well as adjacent areas from distal areas — are statistically reliable across participants. If an artifact accounted for the result then it would have to be a kind of systematic noise that is consistent across participants.

      (2) One of the main sources of noise (both systematic and non-systematic) with infant fMRI is motion. Homotopy is a within-participant analysis that could be biased by motion. To assess whether motion accounts for the results, we took a conservative approach of regressing out the framewise motion (i.e., how much movement there is between fMRI volumes) from the comparisons of the functional activity in regions. Although the correlations numerically decreased with this procedure, they were qualitatively similar to the analysis that does not regress out motion:

      “Additionally, if we control for motion in the correlation between areas --- in case motion transients drive consistent activity across areas --- then the effects described here are negligibly different (Figure S5).” Pg. 7

      (3) We recognize that despite these analyses, it would be helpful to see what this pattern looks like in adults where we know more about the visual field properties and the function of dorsal and ventral streams. This has been done previously (e.g., Haak & Beckmann, 2018), but we have now run those analyses on adults in our sample, as described in the public review. As with infants, there are reliable differences in the homotopy between streams (Figure S1). The MDS results show that the adult data was more complex than the infant data, since it was best described by 3 dimensions rather than 2. Nonetheless, there is a rotation of the MDS such that the structure of the ventral and dorsal streams is also dissociable.
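
      The motion control in point (2) amounts to correlating two areas after removing what framewise motion can explain in each. A minimal sketch is given below, assuming one motion value per fMRI volume and a simple linear regression with an intercept. The function names are hypothetical and the authors' exact procedure may differ.

```python
import numpy as np

def residualize(signal, confound):
    """Remove the linear contribution of `confound` (plus an intercept)."""
    design = np.column_stack([np.ones(len(confound)), confound])
    beta, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return signal - design @ beta

def motion_controlled_correlation(area_a, area_b, motion):
    """Correlate two area time courses after regressing out motion."""
    resid_a = residualize(area_a, motion)
    resid_b = residualize(area_b, motion)
    return np.corrcoef(resid_a, resid_b)[0, 1]
```

      If motion transients were the main source of shared activity, the controlled correlation would drop towards zero. A modest numerical decrease with the same qualitative pattern is what the response reports.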

      Figure 3 also raises several alternative interpretations. The spatial frequency component in B has strong activity ONLY at the extreme border of the VF and this is probably the origin of the strong correlation. I understand that it is only one subject, but this brings the need to show all subjects and to report the correlation. Also, it is important to show the putative average ICA for retinotopy and spatial frequencies across subjects and for adults. All methods should be validated on adults where we have clear data for retinotopy and spatial frequency.

      The reviewer notes that the component in Figure 3 shows a strong negative response in the periphery. It is often the case, as reported elsewhere (Moeller et al., 2009), that ICA extracts portions of visual maps. Making a full visual map would require combining components into a composite (e.g., a component that has a high response in the periphery and another component that has a high response in the fovea). If we were to claim that this component, or others like it, could replace the need for retinotopic mapping, then we would want to produce these composite maps; however, our conclusion in this project is that the topographic information of retinotopic maps manifests in individual components of ICA. For this purpose, the analysis we perform adequately assesses this topography.

      Regarding the request to show the results for all subjects, we address this in the public response and repeat it here briefly: we have added 6 new figures to show results akin to Figure 3C and D. It is impractical to show the equivalent of Figure 3A and B for all participants, yet we do release the data necessary to visualize these maps easily.

      Finally, the reviewer suggests that we validate the analyses on adult participants. As shown in Figure S3 and reported in the public response, we now run these analyses on adult participants and observe qualitatively similar results to infants.

      How much was the variation in the presumed spatial frequency map? Is it consistent with the acuity range? 5-month-old infants should have an acuity of around 10c/deg, depending on the mean luminance of the scene.

      The reviewer highlights an important weakness of conducting ICA: we cannot put units on the degree of variation we see in components. We now highlight this weakness in the discussion:

      “Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone.” Pg. 20

      Figure 5 pipeline is totally obscure. I presumed that I understood, but as it is it is useless. All methods should be clearly described, and the intermediate results should be illustrated in figures and appropriately discussed. Using such blind analyses in infants in principle may not be appropriate and this needs to be verified. Overall all these techniques rely on correlation activities that are all biased by head movement, eye movement, and probably the dummy sucking. All those movements need to be estimated and correlated with the variability of the results. It is a strong assumption that the techniques should work in infants, given the presence of movements.

      We recognize that the SRM methods are complex. Given this feedback, we remade Figure 5 with explicit steps for the process and updated the caption (as reported in the public review).

      Regarding the validation of these methods, we have added SRM analyses from adults and find comparable results. This means that using these methods on adults with comparable amounts of data as what we collected from infants can predict maps that are highly similar to the real maps. Even so, it is not a given that these methods are valid in infants. We present two considerations in this regard. 

      First, as part of the SRM analyses reported in the manuscript, we show that control analyses are significantly worse than the real analyses (indicated by the lines on Figure 6). To clarify the control analysis: we break the mapping (i.e., flip the order of the data so that it is backwards) between the test participant and the training participants used to create the SRM. The fact that this control analysis is significantly worse indicates that SRM is learning meaningful representations that matter for retinotopy. 

      Second, we believe that this paper is a validation of SRM for infants. Infant fMRI is a nascent field and SRM has the potential to increase the signal quality in this population. We hope that readers will see these analyses as a proof of concept that SRM can be used in their work with infants. We have stated this contribution in the paper now.

      “Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity -- rather than anatomy -- and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses27,32-34.” Pg. 4

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Regarding the reviewer’s concern that motion may bias the results, we wish to emphasize the nature of the analyses being conducted here: we are using data from a group of participants to predict the neural responses in a held-out participant. For motion to explain consistency between participants, the motion would need to be time-locked across participants. Even if motion were time-locked during movie watching, motion would impair the formation of an adequate model that can contain retinotopic information. Thus, motion should only hurt the ability to find a shared response that can be used for predicting retinotopic maps. Hence, the results we observed are despite motion and other sources of noise.

      What is M??? is it simply the mean value??? If not, how it is estimated?

      M is an abbreviation for mean. We have now expanded the abbreviation the first time we use it.

      Figure 6 should be integrated with map activity where the individual area correlation should be illustrated. Probably fitting SRM adult works well for early cortical areas, but not for more ventral and associative, and the correlation should be evaluated for the different masks.

      With the addition of plots showing the gradients for each participant and each movie (Figures S10–S13) we hope we have addressed this concern. We additionally want to clarify that the regions we tested in the analysis in Figure 6 are only the early visual areas V1, V2, V3, V3A/B, and hV4. The adult validation analyses show that SRM works well for predicting the visual maps in these areas. Nonetheless, it is an interesting question for future research with more extensive retinotopic mapping in infants to see if SRM can predict maps beyond extrastriate cortex.

      Occipital masks have never been described or shown.

      The occipital mask is from the MNI probabilistic structural atlas (Mazziotta et al., 2001), as reported in the original version and is shared with the public data release. We have added the additional detail that the probabilistic atlas is thresholded at 0% in order to be liberally inclusive. 

      “We used the occipital mask from the MNI structural atlas63 in standard space -- defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe -- and used the inverted transform to put it into native functional space.” Pg. 27–28
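
      The liberal thresholding described here can be illustrated with a few lines of array code. This is only a sketch: the toy values stand in for atlas probabilities (assumed here to be stored as percentages), and `liberal_mask` is a hypothetical helper, not part of the released code.

```python
import numpy as np

def liberal_mask(prob_map, threshold=0.0):
    """Return True wherever the atlas probability exceeds `threshold`."""
    return np.asarray(prob_map) > threshold

toy_probabilities = np.array([0, 1, 25, 80])         # hypothetical percentages
inclusive = liberal_mask(toy_probabilities)          # any above-zero probability
conservative = liberal_mask(toy_probabilities, 50)   # stricter cut-off, for contrast
```

      With a threshold of zero, every voxel that has any chance of being occipital is retained, whereas a stricter cut-off would discard the low-probability voxels at the edge of the lobe.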

      Methods lack the main explanation of the procedures and software description.

      We hope that the additions we have made to address this reviewer’s concerns have provided better explanations for our procedures. Additionally, as part of the data and code release, we thoroughly explain all of the software needed to recreate the results we have observed here.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this study, Bu et al examined the dynamics of TRPV4 channel in cell overcrowding in carcinoma conditions. They investigated how cell crowding (or high cell confluence) triggers a mechano-transduction pathway involving TRPV4 channels in high-grade ductal carcinoma in situ (DCIS) cells that leads to large cell volume reduction (or cell volume plasticity) and pro-invasive phenotype.

      In vitro, this pathway is highly selective for highly malignant invasive cell lines derived from a normal breast epithelial cell line (MCF10CA) compared to the parent cell line, but not present in another triple-negative invasive breast epithelial cell line (MDA-MB-231). The authors convincingly showed that enhanced TRPV4 plasma membrane localization correlates with high-grade DCIS cells in patient tissue samples.

      Specifically in invasive MCF10DCIS.com cells, they showed that overcrowding or over-confluence leads to a decrease in cell volume and intracellular calcium levels. This condition also triggers the trafficking of TRPV4 channels from intracellular stores (nucleus and potentially endosomes), to the plasma membrane (PM). When these over-confluent cells are incubated with a TRPV4 activator, there is an acute and substantial influx of calcium, attesting to the fact that there are a high number of TRPV4 channels present on the PM. Long-term incubation of these over-confluent cells with the TRPV4 activator results in the internalization of the PM-localized TRPV4 channels.

      In contrast, cells plated at lower confluence primarily have TRPV4 channels localized in the nucleus and cytosol. Long-term incubation of these cells at lower confluence with a TRPV4 inhibitor leads to the relocation of TRPV4 channels to the plasma membrane from intracellular stores and a subsequent reduction in cell volume. Similarly, incubation of these cells at low confluence with PEG 3000 (a hyperosmotic agent) promotes the trafficking of TRPV4 channels from intracellular stores to the plasma membrane.

      Strengths:

      The study is elegantly designed and the findings are novel. Their findings on this mechano-transduction pathway involving TRPV4 channels, calcium homeostasis, cell volume plasticity, motility, and invasiveness will have a great impact in the cancer field and are potentially applicable to other fields as well. Experiments are well-planned and executed, and the data is convincing. The authors investigated TRPV4 dynamics using multiple different strategies (overcrowding, hyperosmotic stress, and pharmacological means) and showed a good correlation between different phenomena.

      Weaknesses:

      A major emphasis in the study is on pharmacological means to relate TRPV4 channel function to the phenotype. I believe the use of genetic means would greatly enhance the impact and provide compelling proof for the involvement of TRPV4 channels in the associated phenotype. In this regard, I wonder if siRNA-mediated knockdown of TRPV4 in over-confluent cells (or knockout) would lead to an increase in cell volume and normalize the intracellular calcium levels back to normal, thus ultimately leading to a decrease in cell invasiveness.

      We greatly appreciate the positive feedback regarding the design of our study and the novelty of our findings. We also acknowledge the constructive suggestion to complement our pharmacological approaches with genetic manipulation of TRPV4.

      In response to the comment regarding siRNA-mediated knockdown or knockout of TRPV4, we fully agree that this would further substantiate our findings. We will use shRNA targeting TRPV4 approaches to further explore the functional effects of TRPV4 knockdown on cell volume plasticity, intracellular calcium level changes, and invasiveness phenotypes through motility assays at the single cell level under cell crowding or hyperosmotic stress and will include these results in our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Metastasis poses a significant challenge in cancer treatment. During the transition from non-invasive cells to invasive metastatic cells, cancer cells usually experience mechanical stress due to a crowded cellular environment. The molecular mechanisms underlying mechanical signaling during this transition remain largely elusive. In this work, the authors utilize an in vitro cell culture system and advanced imaging techniques to investigate how non-invasive and invasive cells respond to cell crowding, respectively.

      Strengths:

      The results clearly show that pre-malignant cells exhibit a more pronounced reduction in cell volume and are more prone to spreading compared to non-invasive cells. Furthermore, the study identifies that TRPV4, a calcium channel, relocates to the plasma membrane both in vitro and in vivo (patient samples). Activation and inhibition of the TRPV4 channel can modulate the cell volume and cell mobility. These results unveil a novel mechanism of mechanical sensing in cancer cells, potentially offering new avenues for therapeutic intervention targeting cancer metastasis by modulating TRPV4 activity. This is a very comprehensive study, and the data presented in the paper are clear and convincing. The study represents a very important advance in our understanding of the mechanical biology of cancer.

      Weaknesses:

      However, I do think that there are several additional experiments that could strengthen the conclusions of this work. A critical limitation is the absence of genetic ablation of the TRPV4 gene to confirm its essential role in the response to cell crowding.

      We are grateful for the positive assessment of our study and the acknowledgment of the impact of our findings on the understanding of mechanical signaling in cancer progression. We also appreciate the suggestion to include genetic ablation experiments to confirm the role of TRPV4 in cell crowding responses. As noted in our response to reviewer #1, we plan to use shRNA TRPV4 to examine the functional effects of TRPV4 knockdown on cell volume plasticity, changes in intracellular calcium levels, and invasive phenotypes through motility assays at the single-cell level under conditions of cell crowding or hyperosmotic stress. We will include these results in our revised manuscript.

      Once again, we thank the reviewers for their valuable feedback, which will help us further improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript, Ferhat and colleagues describe their study aimed at developing a blood-brain barrier (BBB) penetrant agent that could induce hypothermia and provide neuroprotection from the sequelae of status epilepticus (SE) in mice. Hypothermia is used clinically in an attempt to reduce neurological sequelae of injury and disease. Hypothermia can be effective, but physical means used to reduce core body temperature are associated with untoward effects. Pharmacological means to induce hypothermia could be as effective with fewer untoward complications. Intracerebroventricularly applied neurotensin can cause hypothermia; however, neurotensin applied peripherally is degraded and does not cross the BBB. Here the authors develop and characterize a neurotensin conjugate that can reach the brain, induce hypothermia, and reduce seizures, cognitive changes, and inflammatory changes associated with status epilepticus. 

      Strengths:

      (1) In general, the study is well-reasoned, well-designed, and seemingly well-executed.

      (2) Strong dose-response assessment of multiple neurotensin conjugates in mice. 

      (3) Solid assessment of binding affinity, in vitro stability in blood, and brain uptake of the conjugate. 

      (4) Appropriate inclusion of controls for SE and for drug injections. However, perhaps a vehicle control could have been employed. 

      Sham animals received saline 0.9% which is the vehicle control considering it was used to dilute the water-soluble VH-N412 molecule.

      (5) Multifaceted assessment of neurodegeneration, inflammation, and mossy fiber sprouting in the different groups. 

      (6) Inclusion of behavioral assessments. 

      (7) Evaluates NTSR1 receptor distribution in multiple ways; however, does not evaluate changes in receptor distribution or ping wo/w SE and/or various drugs.

      (8) Demonstrates that this conjugate can induce hypothermia and have positive effects on the sequelae of SE. Could have a great impact on the application of pharmacologically-induced hypothermia as a neuroprotective measure in patients. 

      Weaknesses: 

      (1) The authors make the claim, repeatedly, that the hypothermia caused by the neurotensin conjugate is responsible for the effects they see; however, what they really show is that the conjugate causes hypothermia AND has favorable effects on the sequelae of SE. They need to discuss that they did not administer the conjugate without allowing the pharmacological hypothermia (e.g., by warming the animal, etc.). 

      We agree with Reviewer 1. We indeed hypothesize that it is principally the hypothermia induced by the NT conjugate that is responsible for the effects we observe. However, we do not exclude the possibility that the conjugate itself can have direct effects on the sequelae of SE. We tried to address this question with the in vitro experiments. Our results suggest that indeed, in the absence of hypothermia, the conjugate showed intrinsic neuroprotection of cultured hippocampal neurons challenged with excitotoxic agents such as NMDA or KA. Besides the description of these results in the “Results Section”, end of page 19 of the original manuscript, we had discussed them at the end of the “Discussion Section”, top of page 43 of the original manuscript.

      In order to separate the hypothermia component from the potential direct neuroprotective effects of the NT conjugate, we did consider abolishing hypothermia in animals that were injected with the NT conjugate by warming them up. However, it is particularly difficult to increase the body temperature of mice in a well-controlled manner, especially mice undergoing seizures in a closed temperature-controlled chamber. In response to Reviewer 1's request, we added a few sentences in the “Discussion Section”, page 45 of the revised version.

      (2) In the status epilepticus studies, it is unclear how or whether they monitored animals for the development of spontaneous seizures. Can the authors please describe this?

      The KA model we used was originally described more than 30 years ago and was developed, very well characterized, and mastered in our laboratory by Ben-Ari (Ben-Ari et al., 1979). Most KA-treated mice that developed SE after KA injection went on to develop spontaneous seizures after a latent period of about 1 week, as described in Figure 3A, Results Section page 11 and in the reference we had mentioned in the Materials and Methods Section, page 27 (Schauwecker and Steward, 1997).

      We agree that information regarding the development of spontaneous seizures is missing. We added 2 references (Gröticke et al., 2008; Wu et al., 2021) in the Materials and Methods Section, page 28 of the revised version, that describe the occurrence of spontaneous seizures after KA administration in mice. We also added the following information in the Materials and Methods Section, end of page 29: In order to study mice in the chronic stage of epilepsy with spontaneous seizures, they were observed daily (at least 3 hours per day) for general behavior and occurrence of SRS. These are highly reproducible in the mouse KA model, allowing for visual monitoring and scoring of epileptic activity. After 3 weeks, most animals exhibited SRS, with 2 to 3 seizures per day, similar to previous observations (Wu et al., 2021). The detection of at least one spontaneous seizure per day was used as the criterion indicating that the animals had reached the chronic phase, which can ultimately be confirmed by mossy fiber sprouting (see Figure 7).

      (3) They do not evaluate changes in receptor distribution or ping wo/w SE and/or various drugs. 

      It is not clear to us what changes in receptor distribution need evaluation. We suppose the question concerns the NTSR1 receptor. It would indeed be very interesting to compare NTSR1 in brain regions and different brain cells wo/w SE and/or various drugs, to assess receptor distribution or re-distribution, if any. However, addressing such a question is a project in itself that could not be addressed in the present study. Reviewer 1 also evokes ping wo/w SE and/or various drugs and, if our understanding is correct, alludes to PING, Pyramidal Interneuronal Network γ (Dugladze et al., 2013, see reference below). Although we did not assess PING per se, we used multi-electrode arrays (MEA) on hippocampal brain slices stimulated wo/w KA to assess whether the VH-N412 conjugate could modulate pyramidal neuron activity. In order to respond to Reviewer 1's concern, we added these data as Figure S2 with corresponding modifications in the Materials and Methods Section (pages 34-35), in the Results Section (page 19) and in the Discussion Section, page 43 of the revised version of our manuscript.

      Dugladze T, Maziashvili N, Börgers C, Gurgenidze S, Häussler U, Winkelmann A, Haas CA, Meier JC, Vida I, Kopell NJ, Gloveli T. GABA(B) autoreceptor-mediated cell type-specific reduction of inhibition in epileptic mice. Proc Natl Acad Sci U S A. 2013 Sep 10;110(37):15073-8. doi: 10.1073/pnas.1313505110. Epub 2013 Aug 26. PMID: 23980149; PMCID: PMC3773756.

      (4) It is not clear why several different mouse strains were employed. 

      We used 2 mouse strains in our work as mentioned in the Materials and Methods Section, page 21. The conjugates we developed and the hypothermia evaluation were initially tested on adult Swiss CD-1 males. For the KA model and for behavioral tests, adult male FVB/N mice were used because they are considered a reliable and well-described mouse model of epilepsy, where seizures are associated with cell death (Schauwecker, 2003). This is not the case for a number of mouse strains that demonstrate very heterogeneous behavior in SE and heterogeneous neuronal death, sprouting and neuroinflammation. FVB/N mice are also well suited for behavioral tests.

      In response to Reviewer 1's request, the following sentence has been introduced in the Results Section, page 11 and in the Materials and Methods Section, page 21 of the revised manuscript: We assessed our conjugates in a model of KA-induced seizures using adult male FVB/N mice. This mouse strain was selected as a reliable and well-described mouse model of epilepsy, where seizures are associated with cell death and neuroinflammation (Schauwecker, 2003; Wu et al., 2021).

      Reviewer #2 (Public Review): 

      Summary: 

      The authors generated analogs consisting of modified neurotensin (NT) peptides capable of binding to low-density lipoprotein (LDL) and NT receptors. Their lead analog was further evaluated for additional validation as a novel therapeutic. The putative mechanism of action for NT in its antiseizure activity is hypothermia, and as therapeutic hypothermia has been demonstrated in epilepsy, NT analogs may confer antiseizure activity and avoid the negative effects of induced hypothermia. 

      Strengths: 

      The authors demonstrate an innovative approach, i.e. using LDLR as a means of transport into the brain, that may extend to other compounds. They systematically validate their approach and its potential through binding, brain penetration, in vivo antiseizure efficacy, and neuroprotection studies. 

      Weaknesses: 

      Tolerability studies are warranted, given the mechanism of action and the potential narrow therapeutic index. In vivo studies were used to assess the efficacy of the peptide conjugate analogs in the mouse KA model. However, it would be beneficial to have shown tolerability in naïve animals to better understand the therapeutic potential of this approach. 

      Tolerability studies were performed, but the results were not presented in the first version of the manuscript. To address Reviewer 2's request, we have added the following text in the Results section, page 11 of the revised version to describe our tolerability results.

      Finally, tolerability studies were performed with administration of up to 20 and 40 mg/kg Eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412), with n=3 for each of these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg Eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect, such as a decrease in locomotor activity. We thus report a favorable therapeutic index, since the maximal tolerated dose (MTD) was > 40 mg/kg Eq. NT, while the maximal effect was observed at a 10-fold lower dose of 4 mg/kg Eq. NT, with an ED50 of 0.69 mg/kg, as shown in Figure 1G.
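
      The dose figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and expressing the safety margin as MTD divided by ED50 is an assumed convention, since the response does not state how the therapeutic index was computed. Because the MTD is reported only as "> 40 mg/kg", both ratios are lower bounds.

```python
# Values quoted in the response; variable names are illustrative assumptions.
doses_eq_nt = [20.0, 40.0]       # mg/kg, expressed as NT equivalents
doses_conjugate = [25.8, 51.6]   # mg/kg, expressed as VH-N412 conjugate

# Mass conversion factor between conjugate dose and NT-equivalent dose.
factors = [c / n for c, n in zip(doses_conjugate, doses_eq_nt)]

mtd_lower_bound = 40.0   # MTD reported as > 40 mg/kg Eq. NT
max_effect_dose = 4.0    # mg/kg Eq. NT giving the maximal effect
ed50 = 0.69              # mg/kg, from Figure 1G as quoted

# Lower bounds on the safety margin (assumed convention: MTD / effective dose).
margin_vs_max_effect = mtd_lower_bound / max_effect_dose
margin_vs_ed50 = mtd_lower_bound / ed50

print(f"conversion factors: {[round(f, 2) for f in factors]}")
print(f"MTD / max-effect dose: > {margin_vs_max_effect:.0f}")
print(f"MTD / ED50: > {margin_vs_ed50:.0f}")
```

      Both dose pairs imply the same conjugate-to-NT mass factor, and the margin relative to the ED50 is several times larger than the 10-fold margin relative to the maximal-effect dose.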

      Mice may be particularly sensitive to hypothermia. It would be beneficial to show similar effects in a rat model. 

      We have tested our conjugate in mice, rats, and pigs, and obtained clear dose-response curves in all cases. We added a few words to the Discussion section (page 38) of the revised version to mention that our conjugates elicit hypothermia in these species.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) In Figures 4, 5, 6, 8, and 9, scale bars are needed on all panels. 

      We have looked carefully at the figures. Scale bars are present in all figures, as stated in the figure legends, but not necessarily on every panel when panels share the same magnification.

      (2) The supplemental would seemingly be better moved into the main body of the manuscript. 

      In agreement with Reviewer 1's request, we moved the Supplemental Figures into the main body of the manuscript, except for Figure S1 (previously Figure S3) and the new Figure S2. Tables S1 to S5 remain as Supplemental files.

      Reviewer #2 (Recommendations For The Authors): 

      Activation of LDLRs can have widespread effects in the CNS and peripherally. The authors should further discuss any beneficial or untoward effects of binding to LDL and activating LDLRs. 

      As mentioned in the Introduction and in a number of references where we describe the development of our family of LDLR peptide ligands (see below), we only selected peptide ligands that do not compete with LDL, one of the major ligands of the LDLR. We indeed showed that while LDL binds the ligand-binding domain of the LDLR, the peptide ligands we developed bind to the EGF-precursor homology domain of the receptor (see Malcor et al., 2012 below).

      We have studied our peptide ligands in vitro and in vivo for more than 15 years and we have not observed beneficial or adverse effects. In fact, one of the members of our LDLR peptide family has been validated as a theragnostic agent and is in Phase 1 clinical trials for brain glioblastoma and pancreatic cancer. Hence, to our knowledge, the peptide ligand we describe in the present study shows no beneficial or untoward effects on LDL binding and activation of the LDLR. In response to Reviewer 2's recommendation, we added the following information and references to the Introduction section (page 6) of the revised version of our manuscript: These peptides bind the EGF precursor homology domain of the LDLR and thus do not compete with LDL binding on the ligand-binding domain. To our knowledge, they have no beneficial or untoward effects on LDL binding and LDLR activity (Malcor et al., 2012; Jacquot et al., 2016; David et al., 2018; Varini et al., 2019; Acier et al., 2021; Yang et al., 2023; Broc et al., 2024).

      Broc B, Varini K, Sonnette R, Pecqueux B, Benoist F, Masse M, Mechioukhi Y, Ferracci G, Temsamani J, Khrestchatisky M, Jacquot G, Lécorché P. LDLR-Mediated Targeting and Productive Uptake of siRNA-Peptide Ligand Conjugates In Vitro and In Vivo. Pharmaceutics. 2024 Apr 17;16(4):548. doi: 10.3390/pharmaceutics16040548. PMID: 38675209; PMCID: PMC11054735.

      Yang X, Varini K, Godard M, Gassiot F, Sonnette R, Ferracci G, Pecqueux B, Monnier V, Charles L, Maria S, Hardy M, Ouari O, Khrestchatisky M, Lécorché P, Jacquot G, Bardelang D. Preparation and In Vitro Validation of a Cucurbit[7]uril-Peptide Conjugate Targeting the LDL Receptor. J Med Chem. 2023 Jul 13;66(13):8844-8857. doi: 10.1021/acs.jmedchem.3c00423. Epub 2023 Jun 20. PMID: 37339060. 

      Acier A, Godard M, Gassiot F, Finetti P, Rubis M, Nowak J, Bertucci F, Iovanna JL, Tomasini R, Lécorché P, Jacquot G, Khrestchatisky M, Temsamani J, Malicet C, Vasseur S, Guillaumond F. LDL receptor-peptide conjugate as in vivo tool for specific targeting of pancreatic ductal adenocarcinoma. Commun Biol. 2021 Aug 19;4(1):987. doi: 10.1038/s42003-021-02508-0. PMID: 34413441; PMCID: PMC8377056.

      Varini K, Lécorché P, Sonnette R, Gassiot F, Broc B, Godard M, David M, Faucon A, Abouzid K, Ferracci G, Temsamani J, Khrestchatisky M, Jacquot G. Target engagement and intracellular delivery of mono- and bivalent LDL receptor-binding peptide-cargo conjugates: Implications for the rational design of new targeted drug therapies. J Control Release. 2019 Nov 28;314:141-161. doi: 10.1016/j.jconrel.2019.10.033. Epub 2019 Oct 20. PMID: 31644939.

      David M, Lécorché P, Masse M, Faucon A, Abouzid K, Gaudin N, Varini K, Gassiot F, Ferracci G, Jacquot G, Vlieghe P, Khrestchatisky M. Identification and characterization of highly versatile peptide-vectors that bind non-competitively to the low-density lipoprotein receptor for in vivo targeting and delivery of small molecules and protein cargos. PLoS One. 2018 Feb 27;13(2):e0191052. doi: 10.1371/journal.pone.0191052. PMID: 29485998; PMCID: PMC5828360.

      Molino Y, David M, Varini K, Jabès F, Gaudin N, Fortoul A, Bakloul K, Masse M, Bernard A, Drobecq L, Lécorché P, Temsamani J, Jacquot G, Khrestchatisky M. Use of LDL receptor-targeting peptide vectors for in vitro and in vivo cargo transport across the blood-brain barrier. FASEB J. 2017 May;31(5):1807-1827. doi: 10.1096/fj.201600827R. Epub 2017 Jan 20. PMID: 28108572.

      Jacquot G, Lécorché P, Malcor JD, Laurencin M, Smirnova M, Varini K, Malicet C, Gassiot F, Abouzid K, Faucon A, David M, Gaudin N, Masse M, Ferracci G, Dive V, Cisternino S, Khrestchatisky M. Optimization and in Vivo Validation of Peptide Vectors Targeting the LDL Receptor. Mol Pharm. 2016 Dec 5;13(12):4094-4105. doi: 10.1021/acs.molpharmaceut.6b00687. Epub 2016 Oct 11. PMID: 27656777.

      Malcor JD, Payrot N, David M, Faucon A, Abouzid K, Jacquot G, Floquet N, Debarbieux F, Rougon G, Martinez J, Khrestchatisky M, Vlieghe P, Lisowski V. Chemical optimization of new ligands of the low-density lipoprotein receptor as potential vectors for central nervous system targeting. J Med Chem. 2012 Mar 8;55(5):2227-41. doi: 10.1021/jm2014919. Epub 2012 Feb 14. PMID: 22257077.

      As described above, the authors should also comment on the tolerability of these analogs. 

      Tolerability studies were performed, but the results were not presented in the first version of the manuscript. To address Reviewer 2's request, we have added the following text to the Results section (page 11) of the revised version to describe our tolerability results:

      Finally, tolerability studies were performed with administration of up to 20 and 40 mg/kg Eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412), with n=3 for each of these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg Eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect, such as a decrease in locomotor activity. We thus report a favorable therapeutic index, since the maximal tolerated dose (MTD) was > 40 mg/kg Eq. NT, while the maximal effect was observed at a 10-fold lower dose of 4 mg/kg Eq. NT, with an ED50 of 0.69 mg/kg, as shown in Figure 1G.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are deeply appreciative of the reviewers' insightful comments and constructive feedback on our manuscript. In response, we have implemented substantial revisions to enhance the clarity and impact of our work. Key changes include: 

      Reframing: We have shifted our focus from cognitive control to attention and memory processes, aligning more closely with our experimental design. This reframing is reflected throughout the manuscript, including additional citations highlighting the triple network model's involvement in memory processing. To reflect this change, we have updated the title to "Causal dynamics of salience, default mode, and frontoparietal networks during episodic memory formation and recall: A multi-experiment iEEG replication".

      Control analyses using resting-state epochs: We have conducted new analyses comparing task periods to resting baseline epochs. These results demonstrate enhanced directed information flow from the anterior insula to both the default mode and frontoparietal networks during encoding and recall periods compared to resting state across all four experiments. This finding underscores the anterior insula's critical role in memory and attention processing.

      Control analysis using the inferior frontal gyrus: To address specificity concerns, we performed control analyses using the inferior frontal gyrus as a comparison region. This analysis confirms that the observed directed information flow to the default mode and frontoparietal networks is specific to the anterior insula, rather than a general property of task-engaged brain regions.

      These revisions, combined with our rigorous methodologies and comprehensive analyses, provide compelling support for the central claims of our manuscript. We believe these changes significantly enhance the scientific contribution of our work.

      Our point-by-point responses to the reviewers' comments are provided below.

      Reviewer 1:

      -  The authors present results from an impressively sized iEEG sample. For reader context, this type of invasive human data is difficult and time-consuming to collect and many similar studies in high-level journals include 5-20 participants, typically not all of whom have electrodes in all regions of interest. It is excellent that they have been able to leverage open-source data in this way. 

      -  Preprocessing of iEEG data also seems sensible and appropriate based on field standards. 

      -  The authors tackle the replication issues inherent in much of the literature by replicating findings across task contexts, demonstrating that the principles of network communication evidenced by their results generalize in multiple task memory contexts. Again, the number of iEEG patients who have multiple tasks' worth of data is impressive. 

      We thank the reviewer for the encouraging comments and appreciate the positive feedback.  

      (1.1) The motivation for investigating the tripartite network during memory tasks is not currently well-elaborated. Though the authors mention, for example, that "the formation of episodic memories relies on the intricate interplay between large-scale brain networks (p. 4)", there are no citations provided for this statement, and the reader is unable to evaluate whether the nodes and networks evidenced to support these processes are the same as networks measured here. 

      Recommendation: Detail with citations the motivation for assessing the tripartite network in these tasks. Include work referencing network-level and local effects during encoding and recall.

      We appreciate the reviewer's feedback and suggestions for improving our framing. We have substantially expanded and revised the Introduction to elaborate on the motivation for investigating the tripartite network during memory tasks, supported by relevant citations.

      We now provide a stronger rationale for examining these networks in the context of episodic memory, emphasizing that while the tripartite network has been extensively studied in cognitive control tasks, growing evidence suggests its relevance to episodic memory as a domain-general network. We cite several key studies that demonstrate the involvement of the salience, default mode, and frontoparietal networks in memory processes, including work by Sestieri et al. (2014) and Vatansever et al. (2021), which show the consistent engagement of these networks during memory tasks. We have also included references to studies examining network-level and local effects during encoding and recall, such as the work by Xie et al. (2012) on disrupted intrinsic connectivity in amnestic mild cognitive impairment, and Le Berre et al. (2017) on the role of insula connectivity in memory awareness (pages 4-5).

      Furthermore, we have clarified how our study aims to address gaps in the current understanding by investigating the electrophysiological basis of these network interactions during memory formation and retrieval, which has not been explored in previous research. This expanded framing provides a clearer motivation for our investigation and places our study within the broader context of memory and network neuroscience research (pages 3-6).  

      (1.2) In addition, though the tripartite network has been proposed to support cognitive control processes, and the neural basis of cognitive control is the framed focus of this work, the authors do not demonstrate that they have measured cognitive control in addition to simple memory encoding and retrieval processes. Tasks that have investigated cognitive control over memory (such as those cited on p. 13 - Badre et al., 2005; Badre & Wagner, 2007; Wagner et al., 2001; Wagner et al., 2005) generally do not simply include encoding, delay, and recall (as the tasks used here), but tend to include some manipulation that requires participants to engage control processes over memory retrieval, such as task rules governing what choice should be made at recall (e.g., from Badre et al., 2005 Fig. 1: congruency of match, associative strength, number of choices, semantic similarity). Moreover, though there are task-responsive signatures in the nodes of the tripartite networks, concluding that cognitive control is present because cognitive control networks are active would be a reverse inference.

      Recommendation: If present, highlight components of the tasks that are known to elicit cognitive control processes and cite relevant literature. If the tasks cannot be argued to elicit cognitive control, reframe the motivation to focus on task-related attention or memory processes. If the latter, reframe the motivation for investigating the tripartite network in this context absent control.

      We appreciate the reviewer's insightful comment and recommendation. We acknowledge that our tasks do not include specific manipulations designed to elicit cognitive control processes over memory retrieval. In light of this, we have reframed our motivation and discussion to focus on the role of the tripartite network in attention and memory processes more broadly, rather than cognitive control specifically (pages 3-6).

      As noted in Response 1.1, we have revised the Introduction to emphasize the domain-general nature of these networks and their involvement in various cognitive processes, including memory. We also highlight how the salience, default mode, and frontoparietal networks contribute to different aspects of memory formation and retrieval, drawing on relevant literature.

      Our revised framing examines the salience network's role in detecting behaviorally relevant stimuli and orienting attention during encoding, the default mode network's involvement in internally-driven processes during recall, and the frontoparietal network's contribution to maintaining and manipulating information in working memory. We now present our study as an investigation into how these networks interact during different phases of memory processing, rather than focusing specifically on cognitive control. This approach aligns better with our experimental design and allows us to explore the broader applicability of the tripartite network model to memory processes. 

      This revised reframing provides a more accurate representation of our study's scope and contribution to understanding the role of large-scale brain networks in memory formation and retrieval (pages 3-6). 

      (1.3) It is currently unclear if the directed information flow from AI to DMN and FPN nodes truly arises from task-related processes such as cognitive control or if it is a function of static brain network characteristics constrained by anatomy (such as white matter connection patterns, etc.). This is a concern because the authors did not find that influences of AI on DMN or FPN are increased relative to a resting baseline (collected during the task) or that directed information flow differs in successful compared to unsuccessful retrieval. I doubt that this AI influence is 1) supporting a switch between the DMN and FPN via the SN or 2) relevant for behavior if it doesn't differ from baseline-active task or across accuracy conditions. An additional comparison that may help investigate whether this is reflective of static connectivity characteristics would be a baseline comparison during non-task rest or sleep periods.  

      Recommendation: As described in the text of the concern, analyze the PTE across the same contacts during sleep or task-free rest periods (if present in the dataset).

      We thank the reviewer for this suggestion. We have now carried out additional analyses using resting-state baseline epochs. We found that directed information flow from the AI to both the DMN and FPN was enhanced during the encoding and recall periods compared to resting-state baseline in all four experiments. These new results have now been included in the revised Results (page 12):

      “Enhanced information flow from the AI to the DMN and FPN during episodic memory processing, compared to resting-state baseline  

      We next examined whether directed information flow from the AI to the DMN and FPN nodes during the memory tasks differed from the resting-state baseline. Resting-state baselines were extracted immediately before the start of the task sessions, and the durations of task and rest epochs were matched to ensure that differences in network dynamics could not be explained by differences in epoch duration. Directed information flow from the AI to both the DMN and FPN was higher during both the memory encoding and recall phases across the four experiments, compared to baseline, in all but two cases (Figures S6, S7). These findings provide strong evidence for enhanced directed information flow from the AI to the DMN and FPN during memory processing compared to the resting state.”

      (1.4) Related to the above concern, it is also questionable how directed information flow from AI facilitates switching between FPN and DMN during both encoding and recall if high gamma activity does not significantly differ in AI versus PCC or mPFC during recall as it does during encoding. It seems erroneous to conclude that the network-level communication is happening or happening with the same effect during both task time points when these effects are decoupled in such a way from the power findings.  

      We appreciate the reviewer's insightful observation regarding the apparent discrepancy between our directed information flow findings and the high-gamma activity results. This comment highlights an important distinction in interpreting our results, and we thank the reviewer for the opportunity to address this point.

      Our findings demonstrate that directed information flow from the AI to the DMN and FPN persists during both encoding and recall, despite differences in local high-gamma activity patterns. This dissociation suggests that the network-level communication facilitated by the AI may operate independently of local activation levels in individual nodes. It is important to note that our directed connectivity analysis (using phase transfer entropy) was conducted on broadband signals (0.5-80 Hz), while the power analysis focused specifically on the high-gamma band (80-160 Hz). These different frequency ranges may capture distinct aspects of neural processing. The broadband connectivity might reflect more general, sustained network interactions, while high-gamma activity may be more sensitive to specific task demands or cognitive processes.

      The phase transfer entropy analysis captures directed interactions over extended time periods, while the high-gamma power analysis provides a more temporally precise measure of local neural activity. The persistent directed connectivity from AI during recall, despite changes in local activity, might reflect the AI's ongoing role in coordinating network interactions, even when its local activation is not significantly different from other regions.

      Rather than facilitating "switching" between FPN and DMN, as we may have previously overstated, our results suggest that the AI maintains a consistent pattern of influence on both networks across task phases. This influence might serve different functions during encoding (e.g., orienting attention to external stimuli) and recall (e.g., monitoring and evaluating retrieved information), even if local activation patterns differ.

      It is crucial to note that in the three verbal tasks, our analysis of memory recall is time-locked to word production onset. However, the precise timing of the internal recall process initiation is unknown. This limitation may affect our ability to capture the full dynamics of network interactions during recall, particularly in the early stages of memory retrieval. Interestingly, in the spatial memory task WMSM, the PCC/precuneus exhibited an earlier onset and enhanced activity compared to the AI. This task may provide a clearer window into recall processes: its findings align with the view that DMN nodes may play a crucial role in triggering internal recall processes. However, the precise timing of internal retrieval initiation remains a challenge in the three verbal tasks, potentially limiting our ability to capture the full dynamics of regional activity, and its replicability, during early stages of recall.

      These observations highlight the need for more detailed investigation of the temporal dynamics of network interactions during memory recall. To further elucidate the relationship between directed connectivity and local activity, future studies could employ time-resolved connectivity analyses and investigate coupling between different frequency bands. This could provide a more precise understanding of how network-level communication relates to local neural dynamics across different task phases.

      We have revised the manuscript to more accurately reflect these points and avoid overstating the implications of our findings (pages 15-19). We thank the reviewer for prompting this important clarification, which we believe strengthens the interpretation and discussion of our results.

      (1.5) Missing information about the methods used for time-frequency conversion for power calculation and the power normalization/baseline-correction procedure bars a thorough evaluation of power calculation methods and results. 

      Recommendation: Include more information about how power was calculated. For example, how were time-series data converted to time-frequency (with complex wavelets, filter-hilbert, etc.)? What settings were used (frequency steps, wavelet length)? How were power values checked for outliers and normalized (decibels, Z-transform)? How was baseline correction applied (subtraction, division)?

      We have now included detailed information related to our power calculation and normalization steps, as noted on page 28: “We first filtered the signals in the high-gamma (80–160 Hz) frequency band (Canolty et al., 2006; Helfrich & Knight, 2016; Kai J. Miller, Weaver, & Ojemann, 2009) using sequential band-pass filters in increments of 10 Hz (i.e., 80–90 Hz, 90–100 Hz, etc.), using a fourth order two-way zero phase lag Butterworth filter. We used these narrowband filtering processing steps to correct for the 1/f decay of power. We then calculated the amplitude (envelope) of each narrow band signal by taking the absolute value of the analytic signal obtained from the Hilbert transform (Foster, Rangarajan, Shirer, & Parvizi, 2015). Each narrow band amplitude time series was then normalized to its own mean amplitude, expressed as a percentage of the mean. Finally, we calculated the mean of the normalized narrow band amplitude time series, producing a single amplitude time series. Signals were then smoothed using 0.2s windows with 90% overlap (Kwon et al., 2021) and normalized with respect to 0.2s pre-stimulus periods by subtracting the pre-stimulus baseline from the post-stimulus signal.”

      (1.6) If revisions to the manuscript can address concerns about directed information flow possibly being due to anatomical constraints - such as by indicating that directed information flow is not present during non-task rest or sleep - this work may convey important information about the structure and order of communication between these networks during attention to tasks in general. However, the ability of the findings to address cognitive control-specific communication and the nature of neurophysiological mechanisms of this communication - as opposed to the temporal order and structure of recruited networks - may be limited.

      We appreciate the reviewer's insightful feedback, which has led to significant improvements in our manuscript. In response, we have made the following key changes. We have shifted our focus from cognitive control to the broader roles of the tripartite network in attention and memory processes. This reframing aligns more closely with our experimental design and the nature of our tasks. We have revised the Introduction, Results, and Discussion sections to reflect this perspective, providing a more accurate representation of our study's scope and contribution. Additionally, to strengthen our findings, we have conducted new analyses comparing task periods to resting-state baselines. These analyses revealed that directed information flow from the anterior insula to both the DMN and FPN was significantly enhanced during memory encoding and recall periods compared to resting-state across all four experiments. This finding provides robust evidence for the specific involvement of these network interactions in memory processing. Please also see Response 1.2 above. 

      (1.7) Because phase-transfer entropy is presented as a "causal" analysis in this investigation (PTE), I also believe it is important to highlight for readers recent discussions surrounding the description of "causal mechanisms" in neuroscience (see "Confusion about causation" section from Ross and Bassett, 2024, Nature Neuroscience). A large proportion of neuroscientists (admittedly, myself included) use "causal" only to refer to a mechanism whose modulation or removal (with direct manipulation, such as by lesion or stimulation) is known to change or control a given outcome (such as a successful behavior). As Ross and Bassett highlight, it is debatable whether such mechanistic causality is captured by Granger "causality" (a.k.a. Granger prediction) or the parametric PTE, and the imprecise use of "causation" may be confusing. The authors could consider amending language regarding this analysis if they are concerned about bridging these definitions of causality across a wide audience. 

      We thank the reviewer for this suggestion. We would like to clarify here that we define causality in our manuscript as follows: a brain region has a causal influence on a target if knowing the past history of temporal signals in both regions improves the ability to predict the target's signal in comparison to knowing only the target's past, as defined in earlier studies (Granger, 1969; Lobier, Siebenhühner, Palva, & Matias, 2014). We have now included this clarification in the Introduction section (page 6).  
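
      For readers who want the definition in symbols, the predictive notion of causality described above is commonly written, for phase transfer entropy, as below. This is a sketch of the standard form associated with the cited definition; the symbols $\theta_x$ and $\theta_y$ (instantaneous phases of the source and target signals), the entropy $H$, and the analysis lag $\delta$ are notation we introduce here, and neither the lag nor the entropy estimator is specified in this response.

```latex
\[
\mathrm{PTE}_{X \to Y}
  = H\big(\theta_y(t), \theta_y(t-\delta)\big)
  + H\big(\theta_y(t-\delta), \theta_x(t-\delta)\big)
  - H\big(\theta_y(t-\delta)\big)
  - H\big(\theta_y(t), \theta_y(t-\delta), \theta_x(t-\delta)\big)
\]
```

      This quantity equals the conditional mutual information between the target's present phase and the source's past phase, given the target's own past. It is zero when the source's past adds nothing to the prediction, and it says nothing about what would happen under direct manipulation of the source.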

      We also agree with the reviewer that to more mechanistically establish a causal link between the neural dynamics and behavior, lesion or brain stimulation studies are necessary. We have now acknowledged this in the revised Discussion as we note: “Although our computational methods suggest causal influences, direct causal manipulations, such as targeted brain stimulation during memory tasks, are needed to establish definitive causal relationships between network nodes.” (page 19). 

      Minor additional information that would be helpful to the reader to include: 

      (1.8) How exactly was line noise (p. 24) removed? (For example, if notch filtered, how were slight offsets of the line noise from exactly 60.0Hz and harmonics identified and handled?). 

      To remove line noise and its harmonics, we used band-stop filters at 57-63 Hz, 117-123 Hz, and 177-183 Hz. Each band-stop filter was a fourth-order, two-way, zero-phase-lag Butterworth filter. This information has now been included in the revised Methods (page 26).
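
      As an illustration of this step, the sketch below applies the three stop bands in sequence. It is written under assumptions rather than taken from the authors' code: the function name is hypothetical, SciPy is our choice of library, and "fourth order" is mapped directly onto the `order` argument of `scipy.signal.butter`. The 6 Hz width of each stop band is what absorbs small offsets of the line frequency from exactly 60 Hz.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def remove_line_noise(x, fs, bands=((57, 63), (117, 123), (177, 183)), order=4):
    """Apply zero-phase Butterworth band-stop filters at 60 Hz and two harmonics."""
    y = np.asarray(x, dtype=float)
    for lo, hi in bands:
        sos = butter(order, [lo, hi], btype="bandstop", fs=fs, output="sos")
        y = sosfiltfilt(sos, y)  # forward-backward pass: no phase distortion
    return y
```

      The sampling rate must exceed twice the highest stop-band edge (here 183 Hz) for the filter design to be valid.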

      (1.9) Why were the alpha and beta bands collapsed for narrowband filtering?

      Please note that we did not combine the alpha (8-12 Hz) and beta (12-30 Hz) bands for narrowband filtering; rather, these two frequency bands were analyzed separately. However, we combined the delta (0.5-4 Hz) and theta (4-8 Hz) frequency bands into a single delta-theta (0.5-8 Hz) band for our analysis, since previous human electrophysiology studies have not settled on a specific frequency band (delta or theta) for memory processing. Previous human iEEG (Ekstrom et al., 2005; Ekstrom & Watrous, 2014; Engel & Fries, 2010; Gonzalez et al., 2015; Watrous, Tandon, Conner, Pieters, & Ekstrom, 2013) as well as scalp EEG and MEG studies have shown that both delta and theta oscillations play a prominent role in human memory encoding as well as retrieval (Backus, Schoffelen, Szebényi, Hanslmayr, & Doeller, 2016; Clouter, Shapiro, & Hanslmayr, 2017; Griffiths, Martín-Buro, Staresina, & Hanslmayr, 2021; Guderian & Düzel, 2005; Guderian, Schott, Richardson-Klavehn, & Düzel, 2009).

      Reviewer 2:

      In this study, the authors leverage a large public dataset of intracranial EEG (the University of Pennsylvania RAM repository) to examine electrophysiologic network dynamics involving the participation of salience, frontoparietal, and default mode networks in the completion of several episodic memory tasks. They do this through a focus on the anterior insula (AI; salience network), which they hypothesize may help switch engagement between the DMN and FPN in concert with task demands. By analyzing high-gamma spectral power and phase transfer entropy (PTE; a putative measure of information "flow"), they show that the AI shows higher directed PTE towards nodes of both the DMN and FPN, during encoding and recall, across multiple tasks. They further demonstrate that high-gamma power in the PCC/precuneus is decreased relative to the AI during memory encoding. They interpret these results as evidence of "triple-network" control processes in memory tasks, governed by a key role of the AI. 

      I commend the authors on leveraging this large public dataset to help contextualize network models of brain function with electrophysiological mechanisms - a key problem in much of the fMRI literature. I also appreciate that the authors emphasized replicability across multiple memory tasks, in an effort to demonstrate conserved or fundamental mechanisms that support a diversity of cognitive processes. However, I believe that their strong claims regarding causal influences within circumscribed brain networks cannot be supported by the evidence as presented. In my efforts to clearly communicate these inadequacies, I will suggest several potential analyses for the authors to consider that might better link the data to their central hypotheses.

      We thank the reviewer for the encouraging comments and suggestions for improving the manuscript. Please see our detailed responses and clarifications below. 

      (2.1) As a general principle, the effects that the authors show - both in regards to their high-gamma power analysis and PTE analysis - do not offer sufficient specificity for a reader to understand whether these are general effects that may be repeated throughout the brain, or whether they reflect unique activity to the networks/regions that are laid out in the Introduction's hypothesis. This lack of specificity manifests in several ways, and is best communicated through examples of control analyses. 

      We appreciate the reviewer's insightful comment regarding the specificity of our findings. We agree that additional analyses could provide valuable context for interpreting our results. In response, we have conducted the following additional analyses and made corresponding revisions to the manuscript:

      Following the reviewer's suggestion, we have selected the inferior frontal gyrus (IFG, BA 44) as a control region. The IFG serves as an ideal control region due to its anatomical adjacency to the AI, its involvement in a wide range of cognitive control functions including response inhibition (Cai, Ryali, Chen, Li, & Menon, 2014), and its frequent co-activation with the AI in fMRI studies. Furthermore, the IFG has been associated with controlled retrieval of memory (Badre et al., 2005; Badre & Wagner, 2007; Wagner et al., 2001), making it a compelling region for comparison. We repeated our PTE analysis using the IFG as the source region, comparing its directed influence on the DMN and FPN nodes to that of the AI.  

      Our analysis revealed a striking contrast between the AI and IFG in their patterns of directed information flow. While the AI exhibited strong causal influences on both the DMN and FPN, the IFG showed the opposite pattern. Specifically, both the DMN and FPN demonstrated higher influence on the IFG than the reverse, during both encoding and recall periods, and across all four memory experiments (Figures S4, S5). 

      These findings highlight the unique role of the AI in orchestrating large-scale network dynamics during memory processes. The AI's pattern of directed information flow stands in contrast to that of the IFG, despite their anatomical proximity and shared involvement in cognitive control processes. This dissociation underscores the specificity of the AI's function in coordinating network interactions during memory formation and retrieval. These results have now been included in our revised Results on page 11.  

      (2.2) First, the PTE analysis is focused solely on the AI's interactions with nodes of the DMN and FPN; while it makes sense to focus on this putative "switch" region, the fact that the authors report significant PTE from the AI to nodes of both networks, in encoding and retrieval, across all tasks and (crucially) also at baseline, raises questions about the meaningfulness of this statistic. One way to address this concern would be to select a control region that would be expected to have little/no directed causal influence on these networks and repeat the analysis. Alternatively (or additionally), the authors could examine the time course of PTE as it evolves throughout an encoding/retrieval interval, and relate that to the timing of behavioral events or changes in high-gamma power. This would directly address an important idea raised in their own Discussion, "the AI is well-positioned to dynamically engage and disengage with other brain areas."  

      Please see Response 2.1 above for additional analyses related to control region.  

      We also appreciate the reviewer's suggestion regarding time-resolved PTE analysis. However, our current methodology does not allow for such fine-grained temporal analysis. PTE is an information-theoretic measure that relies on constructing histograms of occurrences of singles, pairs, or triplets of instantaneous phase estimates from the phase time-series (Hillebrand et al., 2016) (Methods), and it requires a sufficient number of cycles in the phase time-series for reliable estimation (Lobier et al., 2014). Because PTE estimates the time-delayed directed influence from one time-series to the other, its estimate is most accurate when a large number of time-points (cycles) are available (Lobier et al., 2014). Since our encoding and recall epochs in the verbal recall tasks were only 1.6 seconds long, which corresponds to only 800 time-points at a 500 Hz sampling rate, we used the entire encoding and recall epochs for the most efficient estimate of PTE, rather than estimating PTE in a time-resolved manner. Please note that this is consistent with the previous literature, which has used ~225,000 time-points (3 minutes of resting-state data at a 1250 Hz sampling rate) for estimating PTE; see, for example, Hillebrand et al. (2016). 
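      The epoch-length argument above can be checked with a few lines of arithmetic. The sketch below only reproduces the numbers quoted in the response (1.6 s at 500 Hz, 3 minutes at 1250 Hz) and, as an added illustration, counts how many cycles of a given frequency fit into a 1.6 s epoch. The function names are hypothetical helpers and are not part of the authors' pipeline.

```python
# Worked arithmetic for the epoch lengths quoted in the response.
# n_samples and n_cycles are illustrative helper names, not the authors' code.

def n_samples(duration_s, sampling_rate_hz):
    """Number of time-points in an epoch of the given duration."""
    return round(duration_s * sampling_rate_hz)

def n_cycles(duration_s, frequency_hz):
    """Number of oscillatory cycles at frequency_hz that fit in the epoch."""
    return duration_s * frequency_hz

task_points = n_samples(1.6, 500)       # verbal-task encoding/recall epoch
rest_points = n_samples(3 * 60, 1250)   # resting-state data, Hillebrand et al. (2016)

print("task epoch time-points:", task_points)
print("resting-state time-points:", rest_points)

# Edges of the delta-theta band (0.5-8 Hz) used in the study.
for freq in (0.5, 4, 8):
    print(f"{freq} Hz: {n_cycles(1.6, freq):.1f} cycles in a 1.6 s epoch")
```

      At the lower edge of the delta-theta band, a single 1.6 s epoch contains less than one full cycle, which is one way to see why splitting the epoch into shorter windows would leave very little phase information per window.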

      This limitation prevents us from examining how directed connectivity evolves throughout the encoding and retrieval intervals on a moment-to-moment basis. Future studies employing longer task epochs or alternative methods for time-resolved connectivity analysis could provide valuable insights into the dynamic engagement and disengagement of the AI with other brain areas based on task demands. Such analyses could potentially reveal task-specific temporal patterns in the AI's influence on DMN and FPN nodes during different phases of memory processing.

      Finally, it is crucial to note that in the three verbal tasks, our analysis of memory recall is time-locked to word production onset. However, the precise timing of the internal retrieval process initiation is unknown. This limitation may affect our ability to capture the full dynamics of network interactions during recall, particularly in the early stages of memory retrieval. Interestingly, in the spatial memory task, where this timing issue is less problematic due to the nature of the task, we observe that the PCC/precuneus shows an earlier onset of activity compared to the AI. This process is aligned with the view that DMN nodes may trigger internal recall processes, the full extent and replication of which across verbal and spatial tasks could not be examined in this study.  

      We have added a discussion of these limitations and future directions to the manuscript to provide a more nuanced interpretation of our findings and to highlight important areas for further investigation (page 19). 

      (2.3) Second, the authors state that high-gamma suppression in the PCC/precuneus relative to the AI is an anatomically specific signature that is not present in the FPN. This claim does not seem to be supported by their own evidence as presented in the Supplemental Data (Figures S2 and S3), which to my eye show clear evidence of relative suppression in the MFG and dPPC (e.g. S2a and S3a, most notably) which are notated as "significant" with green bars. I appreciate that the magnitude of this effect may be greater in the PCC/precuneus, but if this is the claim it should be supported by appropriate statistics and interpretation.  

      We thank the reviewer for raising this point. We have now directly compared the high-gamma power of the PCC/precuneus with the dPPC and MFG nodes of the FPN and we note that the suppression effects of the PCC/precuneus are stronger compared to those of the dPPC and MFG during memory encoding (Figures S8, S9). 

      (2.4) I commend the authors on emphasizing replicability, but I found their Bayes Factor (BF) analysis to be difficult to interpret and qualitatively inconsistent with the results that they show. For example, the authors state that BF analysis demonstrates "high replicability" of the gamma suppression effect in Figure 3a with that of 3c and 3d. While it does appear that significant effects exist across all three tasks, the temporal structure of high gamma signals appears markedly different between the two in ways that may be biologically meaningful. Moreover, it appears that the BF analysis did not support replicability between VFR and CATVFR, which is very surprising; these are essentially the same tasks (merely differing in the presence of word categories) and would be expected to have the highest degree of concordance, not the lowest. I would suggest the authors try to analytically or conceptually reconcile this surprising finding. 

      We appreciate the reviewer's commendation on our emphasis on replicability and thank the reviewer for the opportunity to provide clarification.

      First, we would like to clarify the nature of our BF analysis. Bayes factors are calculated as the ratio of the marginal likelihood of the replication data, given the posterior distribution estimated from the original data, to the marginal likelihood of the replication data under the null hypothesis of no effect (Ly, Etz, Marsman, & Wagenmakers, 2019). Specifically, BFs use the posterior distribution from the first experiment as the prior distribution for the replication test of the second experiment to constitute a joint multivariate distribution (i.e., the additional evidence for the alternative hypothesis given what was already observed in the original study), and this joint distribution depends on the similarity between the two experiments (Ly et al., 2019). This analysis revealed that the PCC/precuneus suppression relative to the AI observed in the VFR task during memory encoding was also detected in two other tasks, PALVCR and WMSM, with high BFs. In the CATVFR task, although there were short time periods of PCC/precuneus suppression (Figure 3), the effects were not as strong as in the three other tasks.  
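      To make the definition above concrete, here is a minimal sketch of a replication Bayes factor in a deliberately simplified setting. It assumes a single effect size with a known observation variance and a normal prior, which is not the linear mixed-effects model used in the manuscript; all function names and numerical values are hypothetical. The sketch computes the replication BF in two ways, directly from the posterior of the original study and through the evidence-updating identity of Ly et al. (2019), in which the BF for the combined data is divided by the BF for the original data.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bf10(ybar, n, sigma2, tau2):
    """BF for H1: effect ~ N(0, tau2) versus H0: effect = 0, given the mean
    ybar of n observations with known variance sigma2 (toy assumption)."""
    se2 = sigma2 / n
    return normal_pdf(ybar, 0.0, tau2 + se2) / normal_pdf(ybar, 0.0, se2)

def replication_bf_direct(y1, n1, y2, n2, sigma2, tau2):
    """Marginal likelihood of the replication mean under the posterior from
    the original study, divided by its likelihood under the null."""
    post_var = 1.0 / (1.0 / tau2 + n1 / sigma2)
    post_mean = post_var * n1 * y1 / sigma2
    se2 = sigma2 / n2
    return normal_pdf(y2, post_mean, post_var + se2) / normal_pdf(y2, 0.0, se2)

def replication_bf_updating(y1, n1, y2, n2, sigma2, tau2):
    """Evidence updating: BF(combined data) / BF(original data)."""
    pooled = (n1 * y1 + n2 * y2) / (n1 + n2)
    return bf10(pooled, n1 + n2, sigma2, tau2) / bf10(y1, n1, sigma2, tau2)

# Hypothetical numbers: a negative (suppression-like) effect in the original
# study, followed by a same-sign and an opposite-sign replication.
same_sign = replication_bf_direct(-0.4, 50, -0.35, 40, 1.0, 1.0)
opposite_sign = replication_bf_direct(-0.4, 50, 0.35, 40, 1.0, 1.0)
via_updating = replication_bf_updating(-0.4, 50, -0.35, 40, 1.0, 1.0)

print("same-sign replication BF:", same_sign)
print("opposite-sign replication BF:", opposite_sign)
print("same-sign BF via evidence updating:", via_updating)
```

      In this toy setting, a replication whose effect matches the original in sign and size yields a BF above 1, whereas an effect of opposite sign yields a BF below 1, which illustrates why the replication BF depends on the similarity between the two experiments.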

      Regarding the high-gamma suppression effect, our BF analysis indeed supports replicability across the VFR, PALVCR, and WMSM tasks. While we agree with the reviewer that the temporal structure of high-gamma signals appears different across tasks, the BF analysis focuses on the overall presence of the suppression effect rather than its precise temporal profile. The high BFs indicate that the core finding - PCC/precuneus suppression relative to the AI during memory encoding - is replicated across these tasks, despite differences in the timing of this suppression. Moreover, at no time point did responses in the PCC/precuneus exceed that of the AI in any of the four memory encoding tasks. 

      The reason for differences in temporal profiles is not clear. While VFR and CATVFR are similar, the addition of categorical structure in CATVFR may have introduced cognitive processes that alter the temporal dynamics of regional responses. Moreover, differences in electrode placements across participants in each experiment may also have contributed to variability in the observed effects. Further studies using within-subjects experimental designs are needed to address this. 

      We have updated our Results and Discussion sections to reflect these points and to provide a more nuanced interpretation of the replicability across tasks.  

      (2.5) To aid in interpretability, it would be extremely helpful for the authors to assess across-task similarity in high-gamma power on a within-subject basis, which they are well-powered to do. For example, could they report the correlation coefficient between HGP timecourses in paired-associates versus free-recall tasks, to better establish whether these effects are consistent on a within-subject basis? This idea could similarly be extended to the PTE analysis. Across-subject correlations would also be a welcome analysis that may provide readers with better-contextualized effect sizes than the output of a Bayes Factor analysis.  

      We thank the reviewer for this suggestion. However, a within-subject analysis was not possible because very few participants participated in multiple memory tasks. 

      For example, for the AI-PCC/Pr analysis, only 1 individual participated in both the VFR and PALVCR tasks (Tables S2a, S2c). Similarly, for AI-mPFC analysis, only 3 subjects participated in both the VFR and PALVCR tasks (Tables S2a, S2c).  

      Due to these small sample sizes, it was not feasible for us to assess replicability across tasks on a within-subject basis in our dataset. Therefore, for all our analyses, we pooled electrode pairs across subjects, assessed significance using a linear mixed effects modeling framework, and then assessed the replicability of these effects using the Bayes factor (BF) framework.    

      Recommendations For The Authors: 

      (2.6) I would emphasize manuscript organization in a potential rewrite; it was very difficult to follow which analyses were attempting to show a contrast between effects versus a similarity between effects. Results were grouped by the underlying experimental conditions (e.g. encoding/recall, network identity, etc.) but may be better grouped according to the actual effects that were found. 

      We thank the reviewer for this suggestion. We considered this possibility, but we feel that the Results section is best organized in the order of the hypotheses we set out to test, starting from analyzing local brain activity using high-gamma power analysis, and then results related to analyzing brain connectivity using PTE. All these results are systematically ordered by presenting results related to encoding first and then the recall periods as they appear sequentially in our task-design, presenting the results related to the VFR task first and then demonstrating replicability of the results in the three other experiments. Results are furthermore arranged by nodes, where we first discuss results related to the DMN nodes, and then the same for the FPN nodes. This is to ensure systematic, unbiased organization of all our results for the readers to clearly follow the hypotheses, statistical analyses, and the brain regions considered. Therefore, for transparency and ethical reasons, we would respectfully like to present our results as they appear in our current manuscript, rather than presenting the results based on effect sizes. 

      However, please note that we indeed have ordered our results in the Discussion section based on actual effects, as suggested by the reviewer.  

      (2.7) The absence of a PTE effect when analyzing through the lens of successful vs. unsuccessful memory is an important limitation of the current study and a significant departure from the wealth of subsequent memory effects reported in the literature (which the authors have already done a good job citing, e.g. Burke et al. 2014 Neuroimage). I'm glad that the authors raised this in their Discussion, but it is important that the results of such an analysis actually be shown in the manuscript. 

      We thank the reviewer for this suggestion. We have now included the results related to PTE dynamics for successful vs. unsuccessful memory trials in the revised Results section as we note on page 12: 

      “Differential information flow from the AI to the DMN and FPN for successfully recalled and forgotten memory trials 

      We examined memory effects by comparing PTE between successfully recalled and forgotten memory trials. However, this analysis did not reveal differences in directed influence from the AI on the DMN and FPN or the reverse, between successfully recalled and forgotten memory trials during the encoding as well as recall periods in any of the memory experiments (all ps>0.05).”

      (2.8) I believe the claims of causality through the use of the PTE are overstated throughout the manuscript and may contribute to further confusion in the literature regarding how causality in the brain can actually be understood. See Mehler and Kording, 2018 arXiv for an excellent discussion on the topic (https://arxiv.org/abs/1812.03363). My recommendation would be to significantly tone down claims that PTE reflects causal interactions in the brain. 

      We thank the reviewer for this suggestion. We would like to clarify here that we define causality in our manuscript as follows: a brain region has a causal influence on a target if knowing the past history of temporal signals in both regions improves the ability to predict the target's signal in comparison to knowing only the target's past, as defined in earlier studies (Granger, 1969; Lobier et al., 2014). We have now included this clarification in the Introduction section (page 6).  
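      The predictive definition above can be illustrated with a toy histogram-based transfer entropy on binned phases, following the entropy decomposition described by Lobier et al. (2014). This is a sketch under stated assumptions and not the authors' implementation: the phases are simulated rather than extracted from iEEG, the source phase is drawn independently at each time-point (so it is not a realistic oscillation), and the bin count, lag, and function names are arbitrary illustrative choices.

```python
import math
import random
from collections import Counter

def entropy_bits(symbols):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def bin_phase(phase, n_bins):
    """Map a phase in radians to one of n_bins equal-width bins."""
    return int(phase / (2 * math.pi) * n_bins) % n_bins

def transfer_entropy(source, target, lag=1, n_bins=4):
    """TE(source -> target) = H(T_now, T_past) + H(T_past, S_past)
                              - H(T_past) - H(T_now, T_past, S_past)."""
    s = [bin_phase(p, n_bins) for p in source]
    t = [bin_phase(p, n_bins) for p in target]
    t_now, t_past, s_past = t[lag:], t[:-lag], s[:-lag]
    return (entropy_bits(list(zip(t_now, t_past)))
            + entropy_bits(list(zip(t_past, s_past)))
            - entropy_bits(t_past)
            - entropy_bits(list(zip(t_now, t_past, s_past))))

# Toy data: y copies x with a one-sample delay plus a little phase noise,
# so the past of x helps predict y, but the past of y does not help predict x.
rng = random.Random(0)
n = 20000
x = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
y = [rng.uniform(0, 2 * math.pi)]
y += [(x[i - 1] + rng.gauss(0, 0.1)) % (2 * math.pi) for i in range(1, n)]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print("TE x -> y (bits):", te_xy)
print("TE y -> x (bits):", te_yx)
```

      In this construction the estimate from x to y is large while the reverse estimate stays near zero, reflecting the asymmetry that the definition is meant to capture. As the reviewer's comment and the response both note, such predictive asymmetry on its own does not establish a mechanistic causal link.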

      We also agree with the reviewer that to more mechanistically establish a causal link between the neural dynamics and behavior, lesion or brain stimulation studies are necessary. We have now acknowledged this in the revised Discussion as we note: “Although our computational methods suggest causal influences, direct causal manipulations, such as targeted brain stimulation during memory tasks, are needed to establish definitive causal relationships between network nodes.” (page 19). 

      Finally, we have now significantly toned down our claims that PTE reflects causal interactions in the brain, in the Introduction, Results, and Discussion sections of our revised manuscript.  

      (2.9) Relatedly, it may be useful for the authors to consider a supplemental analysis that uses classic measures of inter-regional synchronization, e.g. the PLV, and compare to their PTE findings. They cite literature to suggest a metric like the PTE may be useful, but this hardly rules out the potential utility of investigating narrowband phase synchronization. 

      We thank the reviewer for this suggestion. We have now run new analyses based on PLV to examine phase synchronization between the AI and the DMN and FPN. However, we did not find a significant PLV for the interactions between the AI and DMN and FPN nodes for the different task periods compared to the resting baselines, as we note on page 13: 

      “Narrowband phase synchronization between the AI and the DMN and FPN during encoding and recall compared to resting baseline  

      We next directly compared the phase locking values (PLVs) (see Methods for details) between the AI and the PCC/precuneus and mPFC nodes of the DMN and also the dPPC and MFG nodes of the FPN for the encoding and the recall periods compared to resting baseline. However, narrowband PLV values did not significantly differ between the encoding/recall vs. rest periods, in any of the delta-theta (0.5-8 Hz), alpha (8-12 Hz), beta (12-30 Hz), gamma (30-80 Hz), and high-gamma (80-160 Hz) frequency bands. These results indicate that PTE, rather than phase synchronization, more robustly captures the AI dynamic interactions with the DMN and the FPN.” 

      Please note that phase-locking measures such as the PLV or coherence do not probe directed causal influences and cannot address how one region drives another. Instead, our study examined the direction of information flow between the AI and the DMN and FPN using robust estimators of the direction of information flow. PTE assesses the ability of one time-series to predict future values of another time-series, thus estimating the time-delayed causal influences between the two time-series, whereas PLV or coherence can only estimate “instantaneous” phase synchronization and cannot predict the future time-series. 
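      The point that the PLV carries no directional information can be seen from its standard definition, the magnitude of the mean unit phasor of the phase difference. The following minimal sketch uses synthetic phase series; in practice the phases would come from narrowband-filtered recordings, and the signal parameters and function name here are illustrative assumptions only.

```python
import cmath
import math

def plv(phases_a, phases_b):
    """Phase locking value: |mean(exp(i * (phase_a - phase_b)))|."""
    n = len(phases_a)
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / n

fs = 500                                      # sampling rate in Hz
t = [k / fs for k in range(fs)]               # one second of samples
a = [2 * math.pi * 6 * tk for tk in t]        # 6 Hz phase ramp
b = [pa - 0.8 for pa in a]                    # same rhythm, constant phase lag
c = [2 * math.pi * 7 * tk for tk in t]        # 7 Hz phase ramp, not locked to a

print("PLV(a, b):", plv(a, b))
print("PLV(b, a):", plv(b, a))
print("PLV(a, c):", plv(a, c))
```

      Swapping the two arguments leaves the PLV unchanged, so a constant lag between two regions produces the same value regardless of which region leads. A directed measure such as PTE is needed to distinguish those cases.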

      Additionally, please note that the directed information flow from the AI to both the DMN and FPN was enhanced during the encoding and recall periods compared to resting state across all four experiments, in a new set of analyses that we have carried out in our revised manuscript. Specifically, we have now carried out our task versus rest comparison by using resting baseline epochs before the start of the entire session of the task periods, rather than our previously used rest epochs which were in between the task periods. These new results have now been included in the revised Results as we note on page 12:  

      “Enhanced information flow from the AI to the DMN and FPN during episodic memory processing, compared to resting-state baseline

      We next examined whether directed information flow from the AI to the DMN and FPN nodes during the memory tasks differed from the resting-state baseline. Resting-state baselines were extracted immediately before the start of the task sessions and the duration of task and rest epochs were matched to ensure that differences in network dynamics could not be explained by differences in duration of the epochs. Directed information flow from the AI to both the DMN and FPN were higher during both the memory encoding and recall phases and across the four experiments, compared to baseline in all but two cases (Figures S6, S7). These findings provide strong evidence for enhanced role of AI directed information flow to the DMN and FPN during memory processing compared to the resting state.”  

      References 

      Backus, A. R., Schoffelen, J. M., Szebényi, S., Hanslmayr, S., & Doeller, C. F. (2016). Hippocampal-Prefrontal Theta Oscillations Support Memory Integration. Curr Biol, 26(4), 450-457. doi:10.1016/j.cub.2015.12.048

      Bastos, A. M., Vezoli, J., Bosman, C. A., Schoffelen, J. M., Oostenveld, R., Dowdall, J. R., . . . Fries, P. (2015). Visual areas exert feedforward and feedback influences through distinct frequency channels. Neuron, 85(2), 390-401. doi:10.1016/j.neuron.2014.12.018

      Canolty, R. T., Edwards, E., Dalal, S. S., Soltani, M., Nagarajan, S. S., Kirsch, H. E., . . . Knight, R. T. (2006). High gamma power is phase-locked to theta oscillations in human neocortex. Science, 313(5793), 1626-1628. doi:10.1126/science.1128115

      Canolty, R. T., & Knight, R. T. (2010). The functional role of cross-frequency coupling. Trends Cogn Sci, 14(11), 506-515. doi:10.1016/j.tics.2010.09.001

      Chen, L., Wassermann, D., Abrams, D. A., Kochalka, J., Gallardo-Diez, G., & Menon, V. (2019). The visual word form area (VWFA) is part of both language and attention circuitry. Nat Commun, 10(1), 5601. doi:10.1038/s41467-019-13634-z

      Clouter, A., Shapiro, K. L., & Hanslmayr, S. (2017). Theta Phase Synchronization Is the Glue that Binds Human Associative Memory. Curr Biol, 27(20), 3143-3148.e3146. doi:10.1016/j.cub.2017.09.001

      Crone, N. E., Boatman, D., Gordon, B., & Hao, L. (2001). Induced electrocorticographic gamma activity during auditory perception. Brazier Award-winning article, 2001. Clin Neurophysiol, 112(4), 565-582. doi:10.1016/s1388-2457(00)00545-9

      Daitch, A. L., & Parvizi, J. (2018). Spatial and temporal heterogeneity of neural responses in human posteromedial cortex. Proc Natl Acad Sci U S A, 115(18), 4785-4790. doi:10.1073/pnas.1721714115

      Das, A., de Los Angeles, C., & Menon, V. (2022). Electrophysiological foundations of the human default-mode network revealed by intracranial-EEG recordings during resting-state and cognition. Neuroimage, 250, 118927. doi:10.1016/j.neuroimage.2022.118927

      Das, A., & Menon, V. (2020). Spatiotemporal Integrity and Spontaneous Nonlinear Dynamic Properties of the Salience Network Revealed by Human Intracranial Electrophysiology: A Multicohort Replication. Cereb Cortex, 30(10), 5309-5321. doi:10.1093/cercor/bhaa111

      Das, A., & Menon, V. (2021). Asymmetric Frequency-Specific Feedforward and Feedback Information Flow between Hippocampus and Prefrontal Cortex during Verbal Memory Encoding and Recall. J Neurosci, 41(40), 8427-8440. doi:10.1523/jneurosci.0802-21.2021

      Das, A., & Menon, V. (2022). Replicable patterns of causal information flow between hippocampus and prefrontal cortex during spatial navigation and spatial-verbal memory formation. Cereb Cortex, 32(23), 5343-5361. doi:10.1093/cercor/bhac018

      Das, A., & Menon, V. (2023). Concurrent- and after-effects of medial temporal lobe stimulation on directed information flow to and from prefrontal and parietal cortices during memory formation. J Neurosci, 43(17), 3159-3175. doi:10.1523/jneurosci.1728-22.2023

      Edwards, E., Soltani, M., Deouell, L. Y., Berger, M. S., & Knight, R. T. (2005). High gamma activity in response to deviant auditory stimuli recorded directly from human cortex. J Neurophysiol, 94(6), 4269-4280. doi:10.1152/jn.00324.2005

      Ekstrom, A. D., Caplan, J. B., Ho, E., Shattuck, K., Fried, I., & Kahana, M. J. (2005). Human hippocampal theta activity during virtual navigation. Hippocampus, 15(7), 881-889. doi:10.1002/hipo.20109

      Ekstrom, A. D., & Watrous, A. J. (2014). Multifaceted roles for low-frequency oscillations in bottom-up and top-down processing during navigation and memory. Neuroimage, 85 Pt 2, 667-677. doi:10.1016/j.neuroimage.2013.06.049

      Engel, A. K., & Fries, P. (2010). Beta-band oscillations--signalling the status quo? Curr Opin Neurobiol, 20(2), 156-165. doi:10.1016/j.conb.2010.02.015

      Foster, B. L., Rangarajan, V., Shirer, W. R., & Parvizi, J. (2015). Intrinsic and task-dependent coupling of neuronal population activity in human parietal cortex. Neuron, 86(2), 578-590. doi:10.1016/j.neuron.2015.03.018

      Gonzalez, A., Hutchinson, J. B., Uncapher, M. R., Chen, J., LaRocque, K. F., Foster, B. L., . . . Wagner, A. D. (2015). Electrocorticography reveals the temporal dynamics of posterior parietal cortical activity during recognition memory decisions. Proc Natl Acad Sci U S A, 112(35), 11066-11071. doi:10.1073/pnas.1510749112

      Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica, 37(3), 424-438. doi:10.2307/1912791

      Griffiths, B. J., Martín-Buro, M. C., Staresina, B. P., & Hanslmayr, S. (2021). Disentangling neocortical alpha/beta and hippocampal theta/gamma oscillations in human episodic memory formation. Neuroimage, 242, 118454. doi:10.1016/j.neuroimage.2021.118454

      Guderian, S., & Düzel, E. (2005). Induced theta oscillations mediate large-scale synchrony with mediotemporal areas during recollection in humans. Hippocampus, 15(7), 901-912. doi:10.1002/hipo.20125

      Guderian, S., Schott, B. H., Richardson-Klavehn, A., & Düzel, E. (2009). Medial temporal theta state before an event predicts episodic encoding success in humans. Proc Natl Acad Sci U S A, 106(13), 5365-5370. doi:10.1073/pnas.0900289106

      Helfrich, R. F., & Knight, R. T. (2016). Oscillatory Dynamics of Prefrontal Cognitive Control. Trends Cogn Sci, 20(12), 916-930. doi:10.1016/j.tics.2016.09.007

      Hillebrand, A., Tewarie, P., van Dellen, E., Yu, M., Carbo, E. W., Douw, L., . . . Stam, C. J. (2016). Direction of information flow in large-scale resting-state networks is frequency-dependent. Proc Natl Acad Sci U S A, 113(14), 3867-3872. doi:10.1073/pnas.1515657113

      Jeffreys, H. (1998). The Theory of Probability (3rd ed.). Oxford, England: Oxford University Press.

      Kwon, H., Kronemer, S. I., Christison-Lagay, K. L., Khalaf, A., Li, J., Ding, J. Z., . . . Blumenfeld, H. (2021). Early cortical signals in visual stimulus detection. Neuroimage, 244, 118608. doi:10.1016/j.neuroimage.2021.118608

      Lachaux, J. P., George, N., Tallon-Baudry, C., Martinerie, J., Hugueville, L., Minotti, L., . . . Renault, B. (2005). The many faces of the gamma band response to complex visual stimuli. Neuroimage, 25(2), 491-501. doi:10.1016/j.neuroimage.2004.11.052

      Lobier, M., Siebenhühner, F., Palva, S., & Palva, J. M. (2014). Phase transfer entropy: A novel phase-based measure for directed connectivity in networks coupled by oscillatory interactions. Neuroimage, 85, 853-872. doi:10.1016/j.neuroimage.2013.08.056

      Ly, A., Etz, A., Marsman, M., & Wagenmakers, E. J. (2019). Replication Bayes factors from evidence updating. Behav Res Methods, 51(6), 2498-2508. doi:10.3758/s13428-018-1092-x

      Mainy, N., Kahane, P., Minotti, L., Hoffmann, D., Bertrand, O., & Lachaux, J. P. (2007). Neural correlates of consolidation in working memory. Hum Brain Mapp, 28(3), 183-193. doi:10.1002/hbm.20264

      Miller, K. J., Leuthardt, E. C., Schalk, G., Rao, R. P., Anderson, N. R., Moran, D. W., . . . Ojemann, J. G. (2007). Spectral changes in cortical surface potentials during motor movement. J Neurosci, 27(9), 2424-2432. doi:10.1523/jneurosci.3886-06.2007

      Miller, K. J., Weaver, K. E., & Ojemann, J. G. (2009). Direct electrophysiological measurement of human default network areas. Proceedings of the National Academy of Sciences, 106(29), 12174-12177. doi:10.1073/pnas.0902071106

      Nuzzo, R. L. (2017). An Introduction to Bayesian Data Analysis for Correlations. PM R, 9(12), 1278-1282. doi:10.1016/j.pmrj.2017.11.003

      Ray, S., Crone, N. E., Niebur, E., Franaszczuk, P. J., & Hsiao, S. S. (2008). Neural correlates of high-gamma oscillations (60-200 Hz) in macaque local field potentials and their potential implications in electrocorticography. J Neurosci, 28(45), 11526-11536. doi:10.1523/jneurosci.2848-08.2008

      Sederberg, P. B., Schulze-Bonhage, A., Madsen, J. R., Bromfield, E. B., Litt, B., Brandt, A., & Kahana, M. J. (2007). Gamma oscillations distinguish true from false memories. Psychol Sci, 18(11), 927-932. doi:10.1111/j.1467-9280.2007.02003.x

      Tallon-Baudry, C., Bertrand, O., Hénaff, M. A., Isnard, J., & Fischer, C. (2005). Attention modulates gamma-band oscillations differently in the human lateral occipital cortex and fusiform gyrus. Cereb Cortex, 15(5), 654-662. doi:10.1093/cercor/bhh167

      Watrous, A. J., Tandon, N., Conner, C. R., Pieters, T., & Ekstrom, A. D. (2013). Frequency-specific network connectivity increases underlie accurate spatiotemporal memory retrieval. Nature Neuroscience, 16(3), 349-356. doi:10.1038/nn.3315

      Wetzels, R., & Wagenmakers, E. J. (2012). A default Bayesian hypothesis test for correlations and partial correlations. Psychon Bull Rev, 19(6), 1057-1064. doi:10.3758/s13423-012-0295-x

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) The introduction includes the following sentence: "CDH1 interacts with the APC/C during G1 and S-phase. On entering mitosis, CDK and polo kinases phosphorylate the APC/C and CDH1 to effect switching to CDC20." In fact, CDH1 is inactivated from late G1 to mid-mitosis as a result of phosphorylation by G1/S, S phase, and mitotic Cdk-cyclin complexes. Phosphorylation of the APC/C, not inactivation of CDH1, enables the switch to CDC20. 

      Thank you for this. We have corrected the text and now include the Zachariae et al. (1998) reference.

      (2) Supplementary Table 1 provides a long list of APC/C sites phosphorylated in vitro by Cdk2-cyclin A-Cks2 and Plk1 (note that the main text states only Cdk2-cyclin A). It seems likely that the high amount of kinase in these reactions has led to minor levels of phosphorylation at many of these sites. Although I realize that these results are peripheral to the central findings of the paper, it would be helpful to see confidence scores or other evidence of significance for the indicated phosphopeptides. Perhaps the Cdk consensus sites could be marked on the table in some way, and a description of the MS methods could be provided in the Methods section. 

      We have implemented this useful suggestion and highlighted the Cdk consensus sites in the table. Unfortunately, we do not have confidence scores or other measures of significance for the indicated phosphopeptides.

      Reviewer #2 (Recommendations For The Authors): 

      (1) My only real concern is with the phosphorylated APC/C structure. The authors provide a table that lists a bunch of phosphorylation sites detected before and after in vitro phosphorylation by purified kinases, and in the purified protein gels, some mobility shifts that would be consistent with significant phosphorylation are observed for some of the subunits. However, the mass spec data are non-quantitative. It would be more useful to provide estimated stoichiometries for the various phosphorylation sites to help support the expectation that the complex is heavily phosphorylated and that the structure presented actually represents hyperphosphorylated APC/C. No evidence of phosphorylated amino acids is noted in the cryo-EM structures, presumably because resolution is not high enough and/or there is too much flexibility in these areas. Given that hyperphosphorylation does not affect enzymatic activity and has very little impact on the complex structure, it seems important to provide readers with additional confidence that the complex is indeed heavily phosphorylated and that the complex isolated from insect cells is not heavily phosphorylated. Since the complex is purified from a eukaryotic expression system it seems formally possible that key phosphorylation sites could already be present due to the activity of endogenous Cdk or other kinases in insect cells and, indeed, quite a few sites are noted to exist even without in vitro phosphorylation. Providing stoichiometry of these sites might help address the likelihood that key regulatory sites are already occupied upon purification. It might at least be worth addressing this in the text. 

      The suggestion to comment on the level of phosphorylation of the ‘unphosphorylated’ and Cdk-treated ‘phosphorylated’ APC/C is an excellent idea. The text has been modified on page 9 to include such a discussion. Unfortunately we don’t have quantification of the stoichiometry of the phospho-sites.

      (2) On a minor note, in the results text the authors mention that cyclin A-Cdk2 is used for in vitro phosphorylation, but in the methods, it states both cyclin A-Cdk2 and Plk1 are used. This should be edited for consistency. 

      Thank you for noticing this. Now corrected.

      (3) Another minor issue - the authors state in the introduction (third paragraph) that "CDH1 interacts with the APC/C during G1 and S-phase". Actually, Cdh1 becomes phosphorylated and APC/CCdh1 inactivated at S-phase onset, both in S. cerevisiae and humans. In fact, phosphorylation of Cdh1 is an important driver of the irreversible transition from G1 to S-phase. This statement should be corrected. 

      Thank you for noticing this. This error has been corrected, and we now include the Zachariae et al. (1998) reference.

      Reviewer #3 (Recommendations For The Authors):

      To be addressed in a revised manuscript: 

      (1) The authors should cite and discuss Cole Ferguson et al., Mol Cell 2022. This study describes the loss of APC7 in a human disease and provides a detailed structural and biochemical examination of the effects of APC7 loss on human APC/C. Given that much of our understanding of APC7 comes from this work, it should be highlighted in the introduction and discussed in depth in light of the new work on S. cerevisiae APC/C. 

      Thank you for mentioning this interesting paper. We discuss its main findings in the ‘Discussion’. Given the paper shows that deletion of APC7 has no discernible effect on the stability of human APC/C, we have deleted the discussion that APC7 stabilises human APC/C analogous to the stabilisation conferred on S. cerevisiae APC/C by APC9.

      (2) There are multiple cases in the manuscript where the text was referring to the human complex but APC/CCdh1:Hsl1 was written, including labeling of Figure 4b. It would be useful to consider nomenclature considering that Hsl1 is a yeast protein. 

      Thank you for noticing this. We mistakenly wrote ‘Hsl1’ instead of ‘Emi1’. Now corrected.

      (3) The authors should tone down claims regarding their discovery of the absence of APC7 in S. cerevisiae. The absence of APC7 has been known for nearly two decades, and the authors confirm this (Pan et al., 2007, Journal of Cell Science) and then show the structure.

      We agree with this as explained in response to point 1.

      (4) On page 7, the authors are writing about the four helices mediating the APC/C-CDH1 interactions but list only 3. 

      We have revised the sentence to clarify this point.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Thank you for your careful reviews of our manuscript. This revision is mainly aimed at addressing some minor errors in the text, English writing, grammar, etc. The details are as follows:

      (1) We added the information for the sntB-HA strain in Table 1.

      (2) We added the primer information for the construction of the sntB-HA strain in Table 2.

      (3) We corrected some errors in English writing and grammar. Please see the marked-up revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study used root tips from semi-hydroponic tea seedlings. The strategy followed sequential steps to draw partial conclusions.

      Initially, protoplasts obtained from root tips were processed for scRNA-seq using the 10x Genomics platform. The sequencing data underwent pre-filtering at both the cell and gene levels, leading to 10,435 cells. These cells were then classified into eight clusters using t-SNE algorithms. The present study scrutinised cell typification through protein sequence similarity analysis of homologs of cell type marker genes. The analysis was conducted to ensure accuracy using validated genes from previous scRNA-seq studies and the model plant Arabidopsis thaliana. The cluster cell annotation was confirmed using in situ RT-PCR analyses. This methodology provided a comprehensive insight into the cellular differentiation of the sample under study. The identified clusters, spanning 1 to 8, have been accurately classified as xylem, epidermal, stem cell niche, cortex/endodermal, root cap, cambium, phloem, and pericycle cells.

      Then, the authors performed a pseudo-time analysis to validate the cell cluster annotation by examining the differentiation pathways of the root cells. Lastly, they created a differentiation heatmap from the xylem and epidermal cells and identified the biological functions associated with the highly expressed genes.

      Upon thoroughly analysing the scRNA-seq data, the researchers delved into the cell heterogeneity of nitrate and ammonium uptake, transport, and nitrogen assimilation into amino acids. The scRNA-seq data was validated by in situ RT-PCR. It allows the localisation of glutamine and alanine biosynthetic enzymes along the cell clusters and confirms that both constitute the primary amino acid metabolism in the root. Such investigation was deemed necessary due to the paramount importance of these processes in theanine biosynthesis since this molecule is synthesised from glutamine and alanine-derived ethylamine.

      Afterwards, the authors analysed the cell-specific expression patterns of the theanine biosynthesis genes, combining the same molecular tools. They concluded that theanine biosynthesis is more enriched in cluster 8 "pericycle cells" than glutamine biosynthesis (Lines 271-272). However, the statement made in Line 250 states that the highest expression levels of genes responsible for glutamine biosynthesis were observed in Clusters 1, 3, 4, 6, and 8, leading to an unclear conclusion.

      Thank you for your interest in and feedback on the paper. We have made revisions to the manuscript as per your suggestions. We would like to emphasize that the precursors of theanine biosynthesis are alanine-derived ethylamine and glutamate, not glutamine. Furthermore, in terms of the intermediates, only ethylamine is specific to the theanine biosynthetic pathway, as glutamate is the primary product of nitrogen assimilation and serves as a precursor for the biosynthesis of amino acids, proteins, chlorophyll, and many secondary metabolites.

      In this study, we observed a high expression of genes encoding enzymes involved in the glutamate biosynthetic pathway (CsGOGATs and CsGDHs) across all 8 clusters, with particularly strong expression in cluster 1, 3, 4, 6, and 8 (Figure 4D and 5B). However, the gene encoding CsTHSI responsible for catalyzing theanine biosynthesis from glutamate and ethylamine was determined to be more enriched in cluster 8 (Figure 5B and 5C). Therefore, we concluded that theanine biosynthesis was more enriched in cluster 8, whereas glutamate biosynthesis was more broadly active in clusters 1, 3, 4, 6 and 8.
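
      The comparison described above (asking in which cluster a gene shows its highest expression, and whether two genes peak in the same cluster) can be sketched as follows. This is a minimal illustration only, assuming per-cell cluster labels and normalized expression values are available; the function names and the toy numbers are hypothetical and are not taken from the authors' analysis.

```python
from collections import defaultdict
from statistics import mean


def mean_expression_by_cluster(clusters, expression):
    """Return {cluster: mean expression} for one gene.

    clusters   -- cluster label of each cell
    expression -- expression value of the gene in each cell (same order)
    """
    if len(clusters) != len(expression):
        raise ValueError("clusters and expression must have the same length")
    per_cluster = defaultdict(list)
    for label, value in zip(clusters, expression):
        per_cluster[label].append(value)
    return {label: mean(values) for label, values in per_cluster.items()}


def top_cluster(clusters, expression):
    """Return the cluster with the highest mean expression of the gene."""
    means = mean_expression_by_cluster(clusters, expression)
    return max(means, key=means.get)


# Hypothetical toy data: six cells in three clusters, two invented genes.
clusters = [1, 1, 3, 3, 8, 8]
regulator = [0.2, 0.4, 3.0, 2.0, 0.1, 0.3]  # invented values
target = [0.5, 0.5, 0.2, 0.4, 4.0, 6.0]     # invented values

# True only if both genes have their highest mean expression in the same cluster.
same_cluster = top_cluster(clusters, regulator) == top_cluster(clusters, target)
```

      In this toy case the two genes peak in different clusters, which is the kind of pattern the response describes. A real analysis would also need to account for the fraction of expressing cells and for statistical testing, which this sketch omits.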

      The regulation of theanine biosynthesis by the MYB transcription factor family is well-established. In particular, CsMYB6, a transcription factor expressed specifically in roots, has been shown to promote theanine biosynthesis by binding to the promoter of the TSI gene responsible for theanine synthesis. However, their findings indicate that CsMYB6 expression is present in Cluster 3 (SCN), Cluster 6 (cambium cells), and Cluster 1 (xylem cells) but not in Cluster 8 (pericycle cells), which is known for its high expression of CsTSI. Similarly, their scRNA-seq data indicated that CsMYB40 and CsHHO3, which activate and repress CsAlaDC expression, respectively, did not show high expression in Cluster 1 (the cell cluster with high CsAlaDC expression). Based on these findings, the authors hypothesised that transcription factors and target genes are not necessarily always highly expressed in the same cells. Nonetheless, additional evidence is essential to substantiate this presumption.

      Thank you for your advice. We fully agree that additional evidence is essential to support the presumption that transcription factors and target genes are not always highly expressed in the same cells. Therefore, in this study, we identified another transcription factor, CsLBD37, which was characterized to negatively regulate CsAlaDC expression in response to nitrogen levels. Consistent with our presumption, the expression of CsLBD37 was not enriched in cluster 1, where the expression of CsAlaDC was primarily enriched (Figure 5B and 6D; Line 365).

      To further identify supporting evidence, we also analyzed the expression of some transcription factors and their target genes in the model plant Arabidopsis, using published single-cell RNA-seq data (Ryu et al., 2019; Wendrich et al., 2020; Zhang et al., 2019; Denyer et al., 2019; Jean-Baptiste et al., 2019; Shulse et al., 2019; Shahan et al., 2022) and databases (Root Cell Atlas, https://rootcellatlas.org/; BAR, https://bar.utoronto.ca/#GeneExpressionAndProteinTools). Similar to the situation in tea plants, the cell types in which the regulators were highly expressed were not exactly the same as those in which their target genes were highly expressed. For example, AtARF7 and AtARF19 were highly expressed in the cortex and stele, respectively, whereas their target genes AtLBD16 and AtLBD29 were highly expressed in endodermal cells (Okushima et al., 2007; Supplemental figure 8B and 8C; Line 312-325 and Line 525-526); AtPHR1 was highly expressed in root epidermal cells and pericycle cells, but its target gene AtF3’H was highly expressed in the cortex and AtRALF23 was highly expressed in xylem cells (Liu et al., 2022; Tang et al., 2022; Supplemental figure 8B and 8C; Line 322-327 and Line 527-530).

      At the same time, we discussed that we cannot rule out the possibility of transcription factors regulating their target genes in the same cell type and both being highly expressed. One of the reasons is that these theanine-associated genes are promiscuous, having many target genes and regulating multiple biological processes in tea plants. We have only shown that high expression in the same cell type is not a necessary condition (Line 534-554). We strongly agree with the reviewer's opinion that more evidence is needed to illustrate this model in the future.

      Reference:

      Denyer, T. et al. (2019). Spatiotemporal developmental trajectories in the arabidopsis root revealed using high-throughput single-cell RNA sequencing. Dev Cell. 48:840-852.e5.

      Liu, Z. et al. (2022). PHR1 positively regulates phosphate starvation-induced anthocyanin accumulation through direct upregulation of genes F3'H and LDOX in Arabidopsis. Planta. 256:42.

      Okushima, Y. et al. (2007). ARF7 and ARF19 regulate lateral root formation via direct activation of LBD/ASL genes in Arabidopsis. Plant Cell. 19:118-30.

      Ryu, K. H., Huang, L., Kang, H. M. & Schiefelbein, J. (2019). Single-cell RNA sequencing resolves molecular relationships among individual plant cells. Plant Physiol. 179:1444-1456.

      Shahan, R. et al. (2022). A single-cell Arabidopsis root atlas reveals developmental trajectories in wild-type and cell identity mutants. Dev Cell. 57:543-560.e9.

      Shulse, C. et al. (2019). High-throughput single-cell transcriptome profiling of plant cell types. Cell Rep. 27:2241-2247.e4.

      Tang, J. et al. (2022). Plant immunity suppression via PHR1-RALF-FERONIA shapes the root microbiome to alleviate phosphate starvation. EMBO J. 41:e109102.

      Wendrich, J.R., et al. (2020). Vascular transcription factors guide plant epidermal responses to limiting phosphate conditions. Science. 370:eaay4970.

      Zhang, T. et al. (2019). A single-cell RNA sequencing profiles the developmental landscape of arabidopsis root. Mol Plant. 12:648-660.

      Lastly, the authors have discovered a novel transcription factor belonging to the Lateral Organ Boundaries Domain (LBD) family known as CsLBD37 that can co-regulate the synthesis of theanine and the development of lateral roots. The authors observed that CsLBD37 is located within the nucleus and can repress the CsAlaDC promoter's activity. To investigate this mechanism further, the authors conducted experiments to determine whether CsLBD37 can inhibit CsAlaDC expression in vivo. They achieved this by creating transiently CsLBD37-silenced or over-expression tea seedlings through antisense oligonucleotide interference and generation of transgenic hairy roots. Based on their findings, the authors hypothesise that CsLBD37 regulates CsAlaDC expression to modulate the synthesis of ethylamine and theanine.

      Additionally, the available literature suggests that the transcription factors belonging to the Lateral Organ Boundaries Domain (LBD) family play a crucial role in regulating the development of lateral roots and secondary root growth. Considering this, they confirmed that pericycle cells exhibit a higher expression of CsLBD37. A recent experiment revealed that overexpression of CsLBD37 in transgenic Arabidopsis thaliana plants led to fewer lateral roots than the wild type. From this observation, the researchers concluded that CsLBD37 regulates lateral root development in tea plants. I respectfully submit that the current conclusion may require additional research before it can be considered definitive.

      Further efforts should be made to investigate the signalling mechanisms that govern CsLBD37 expression to arrive at a more comprehensive understanding of this process. In the context of Arabidopsis lateral root founder cells, the establishment of asymmetry is regulated by LBD16/ASL18 and other related LBD/ASL proteins, as well as the AUXIN RESPONSE FACTORs (ARF7 and ARF19). This is achieved by activating plant-specific transcriptional regulators such as LBD16/ASL18 (Goh et al., 2012, https://doi.org/10.1242/dev.071928). On the other hand, other downstream homologues of LBD genes regulated by cytokinin signalling play a role in secondary root growth (Ye et al., 2021, https://doi.org/10.1016/j.cub.2021.05.036). It is imperative to shed light on the hormonal regulation of CsLBD37 expression in order to gain a comprehensive understanding of its involvement in the morphogenic process.

      We are very grateful for your valuable suggestions and we fully agree with you. In an earlier study, we also observed a link between theanine metabolism, hormone metabolism and root development (Chen et al., 2022), but there is still insufficient evidence to fully characterize these links. In the current study, the focus was on the cell-specific theanine biosynthesis, transport and regulation, and we identified that CsLBD37 negatively regulates theanine biosynthesis. However, the upstream regulatory mechanism of CsLBD37 has not been addressed in this study. It is a pertinent question for future investigation as to how CsLBD37 is regulated in root development. We have included the following additional discussion in the revised manuscript: “Besides, it has been reported that LBD family TFs were regulated by, or interacted with, regulators of hormone pathways (e.g., ARFs) to regulate the process of root morphogenesis (Goh et al., 2012; Ye et al., 2021). Based on these findings, we speculated that CsLBD37 is likely regulated by, or interacts with, other proteins to form a complex to regulate root development or theanine biosynthesis.” (Line 573-576). At the same time, we revised the text “These results provided support for a model in which CsLBD37 plays a role in regulating lateral root development in tea plants” to “These findings suggested that CsLBD37 may play a role in regulating lateral root development in tea plant roots” (Line 401-402).

      Reference:

      Chen, T. et al. (2022). Theanine, a tea plant specific non-proteinogenic amino acid, is involved in the regulation of lateral root development in response to nitrogen status. Hortic. Res. 10:uhac267.

      Goh, T., Joi, S., Mimura, T. & Fukaki, H. (2012). The establishment of asymmetry in Arabidopsis lateral root founder cells is regulated by LBD16/ASL18 and related LBD/ASL proteins. Development 139:883-893.

      Ye, L. et al. (2021). Cytokinins initiate secondary growth in the Arabidopsis root through a set of LBD genes. Curr. Biol. 31:3365-3373.e3367.

      Strength:

      The manuscript showcases significant dedication and hard work, resulting in valuable insights that serve as a fundamental basis for generating knowledge. The authors skillfully integrated various tools available for this type of study and meticulously presented and illustrated every step involved in the survey. The overall quality of the work is exceptional, and it would be a valuable addition to any academic or professional setting.

      Weaknesses:

      In its current form, the article presents certain weaknesses that need to be addressed to improve its overall quality. Specifically, the authors' conclusions appear to have been drawn in haste without sufficient experimental data and a comprehensive discussion of the entire plant. It is strongly advised that the authors devote additional effort to resolving the abovementioned issues to bolster the article's credibility and dependability. This will ensure that the article is of the highest quality, providing readers with reliable and trustworthy information.

      Thank you for your feedback. We acknowledge that our experiments and data require further improvement. Currently, the genetic transformation of the tea plant remains a challenge, making it difficult to obtain sufficient in vivo evidence. Despite this situation, we have made every effort to obtain support for our conclusions based on the current situation and available technology. Indeed, additional studies will be performed once the impediment associated with genetic transformation of the tea plant has been resolved.

      Reviewer #2 (Public Review):

      Summary:

      In their manuscript, Lin et al. present a comprehensive single-cell analysis of tea plant roots. They measured the transcriptomes of 10,435 cells from tea plant root tips, leading to the identification and annotation of 8 distinct cell clusters using marker genes. Through this dataset, they delved into the cell-type-specific expression profiles of genes crucial for the biosynthesis, transport, and storage of theanine, revealing potential multicellular compartmentalization in theanine biosynthesis pathways. Furthermore, their findings highlight CsLBD37 as a novel transcription factor with dual regulatory roles in both theanine biosynthesis and lateral root development.

      Strengths:

      This manuscript provides the first single-cell dataset analysis of roots of the tea plants. It also enables detailed analysis of the specific expression patterns of the gene involved in theanine biosynthesis. Some of these gene expression patterns in roots were further validated through in-situ RT-PCR. Additionally, a novel TF gene CsLBD37's role in regulating theanine biosynthesis was identified through their analysis.

      Weaknesses:

      Several issues need to be addressed:

      (1) The annotation of single-cell clusters (1-8) in Figure 2 could benefit from further improvement. Currently, the authors utilize several key genes, such as CsAAP1, CsLHW, CsWAT1, CsIRX9, CsWOX5, CsGL3, and CsSCR, to annotate cell types. However, it is notable that some of these genes are expressed in only a limited number of cells within their respective clusters, such as CsAAP1, CsLHW, CsGL3, CsIRX9, and CsWOX5. It would be advisable to utilize other marker genes expressed in a higher percentage of cells or employ a combination of multiple marker genes for more accurate annotation.

      Thank you for your comments. In this study, we first utilized classical marker genes, such as CsWAT1 and CsPP2, to annotate cell types. The expression patterns of these marker genes were confirmed using in situ RT-PCR. Additionally, a combination of multiple marker genes was employed for cell type annotation. We also analyzed the top 10 cluster-enriched genes in each cluster, and the expression of their homologs in Arabidopsis, Populus, etc., to serve as a reference for cluster annotation (Figure 2D; Supplemental Figures 2-6; Supplemental data 3). Subsequently, differentiation trajectories of root cells were analyzed based on pseudo-time analyses, which aligned well with cell type annotation and further supported the reliability of our annotations through these combined methods.

      (2) Figure 3 could enhance clarity by displaying the trajectory of cell differentiation atop the UMAP, similar to the examples demonstrated by Monocle 3.

      Thanks for this advice. We have supplied the trajectory of cell differentiation atop the UMAP in the revised supplemental figure 7 (Line 185).

      (3) The identification of CsLBD37 primarily relies on bulk RNA-seq data. The manuscript could benefit from elaborating on the role of the single-cell dataset in this context.

      Thanks for your comments. In this study, we determined that CsTSI was highly expressed in cluster 8, but its regulator CsMYB6 was highly expressed in cluster 3, cluster 6 and cluster 1 (Line 301-304). Thus, target genes and their regulators seem not to always be highly expressed in the same cell cluster. A similar situation was also observed in terms of CsAlaDC transcriptional regulation (Line 305-311). Based on these findings, we hypothesized that, for the regulation of theanine biosynthesis, it is not necessary for transcription factors and target genes to always be highly expressed in the same cells. Thus, taking the transcriptional regulation of CsAlaDC as an example, we next analyzed the TFs that were co-expressed with CsAlaDC to test this notion. We used scRNA-seq data to screen for genes that were not highly co-expressed with CsAlaDC, such as CsLBD37, to test our hypothesis (Line 338-340 and Line 365).

      (4) The manuscript's conclusions predominantly rely on the expression patterns of key genes. This reliance might stem from the inherent challenges of tea research, which often faces limitations in exploring molecular mechanisms due to the lack of suitable genetic and molecular methods. The authors may consider discussing this point further in the discussion section.

      Thanks for your suggestions and we totally agree. We discussed this point in the discussion section, “In some non-model plants, including tea, transgenic technologies are not currently available and, hence, we used in situ RNA hybridization to establish the location(s) for gene expression. In some studies, isolation of different cell types was combined with q-RT-PCR to detect cell-type marker gene expression (Wang et al., 2022). However, this approach has two limitations in that it cannot display the gene location directly and has only low resolution”, “After numerous trials, we were able to optimize in situ RT-PCR assays (detailed in the Methods), which enabled a cell-specific characterization of gene expression in tea plant root cells, prior to establishing a genetic transformation system for tea…we note the challenge associated with weak calling of homologous marker genes…” (Line 431-444).

      Reviewer #3 (Public Review):

      Summary:

      Lin et al. performed a scRNA-seq-based study of tea roots, as an example, to elucidate the biosynthesis and regulatory processes for theanine, a root-specific secondary metabolite, and established the first map of tea roots comprised of 8 cell clusters. Their findings contribute to deepening our understanding of the regulation of the synthesis of important flavor substances in tea plant roots. They have presented some innovative ideas.

      It is notable that the authors - based on single-cell analysis results - proposed that TFs and target genes are not necessarily always highly expressed in the same cells. Many of the important TFs they previously identified, along with their target genes (CsTSI or CsAlaDC), were not found in the same cell cluster. Therefore, they proposed a model in which the theanine biosynthesis pathway occurs via multicellular compartmentation and does not require high co-expression levels of transcription factors and their target genes within the same cell cluster. Since it is not known whether the theanine content is absolutely high in the cell cluster 1 containing a high CsAlaDC expression level (due to the lack of cell cluster theanine content determination, which may be a current technical challenge), it is difficult to determine whether this non-coexpressing cell cluster 1 is a precise regulatory mechanism for inhibiting theanine content in plants.

      Thank you for your comments. We concur with your assessment that the accumulation level of the spatial distribution of theanine may affect the expression of these genes. However, as you said, due to some technical limitations, we are not currently in a position to verify this distribution of theanine at the root cell spatial level. The spatial distribution of theanine in the roots can be affected by transport processes. So, it is likely that the cell types in which theanine is distributed do not exactly correspond to the cell types in which theanine is being synthesized (Line 491-493). We will make efforts in this direction to characterize the spatial distribution of theanine using techniques such as spatial metabolome and mass spectrometry imaging in the future (Line 582-586).

      In fact, there are a small number of cells where TFs and CsAlaDC are simultaneously highly expressed, but the quantity is insufficient to form a separate cluster. However, these few cells may be sufficient to meet the current demands for theanine synthesis. This possibility may better align with some previous experiments and validation results in this study. Moreover, I feel that under normal conditions, plants may not mobilize a large number of cells to synthesize a particular substance. Perhaps, cell cluster 1 is actually a type of cell that inhibits the synthesis of theanine, aiming to prevent excessive theanine production? I do not oppose the model proposed by the author, but I feel there is a possibility as I mentioned. If it seems reasonable, the author may consider adding it to an appropriate position in the discussion.

      Thanks a lot for your suggestion. We agree that tea plant roots likely have mechanisms aiming to prevent excessive theanine production. We have improved our discussion according to your suggestion.

      Theanine is the most abundant free amino acid in the tea plant, accounting for 1-2% of leaf dry weight (Line 62-63), and can even reach 4-6% in the root, accounting for more than 60%-80% of the total free amino acids (Yang et al., 2020). This means that theanine biosynthesis indeed requires the root cells to consume significant resources and energy. Thus, theanine biosynthesis needs to be controlled by a series of regulatory mechanisms, which would function as a “brake”. In a previous study, we suggested that CsMYB40 and CsHHO3 bound to the CsAlaDC promoter to regulate theanine synthesis, at the transcription level, in “accelerator” or “brake” mode to maintain stable synthesis of theanine (Guo et al., 2022). At a posttranslational level, CsTSI and CsAlaDC are modified by ubiquitination, which is probably involved in the degradation of these proteins in response to N levels (Wang et al., 2021). In the current study, we discovered a novel “brake” in the form of spatial separation. The differential expression of AlaDC and TSI suggests that ethylamine and theanine are synthesized in different cell types, allowing cell compartmentalization of the synthetic precursor and the product to form multicellular compartmentation of metabolites (Line 270-280). On the one hand, compartmentalization may effectively prevent interference between secondary metabolic pathways; on the other hand, it could also serve as a means of metabolic regulation to avoid excessive theanine synthesis or its inhibition (Line 483-488).

      Reference

      Guo, J. et al. (2022). Potential “accelerator” and “brake” regulation of theanine biosynthesis in tea plant (Camellia sinensis). Hortic. Res. 9:uhac169.

      Yang, T. et al. (2020). Transcriptional regulation of amino acid metabolism in response to nitrogen deficiency and nitrogen forms in tea plant root (Camellia sinensis L.). Sci. Rep. 10:6868.

      Wang, Y. et al. (2021). Nitrogen-Regulated Theanine and Flavonoid Biosynthesis in Tea Plant Roots: Protein-Level Regulation Revealed by Multiomics Analyses. J Agric Food Chem. 69:10002-10016.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) The dataset, including the raw sequencing data and processed files (e.g., *.Rdata), should be deposited in a public database for accessibility and reproducibility.

      Thanks for your comments and advice. The raw data and processed files have been submitted to the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE267845 (Line 763-764).

      (2) Providing the code for the primary analysis steps in a publicly accessible location would facilitate others in replicating the analysis.

      Thank you for your comment. Unfortunately, we have been unable to obtain permission to publicly release a portion of the primary analysis code due to its intellectual property belonging to OE Corporation.

      (3) Enhancements in the writing of the manuscript are recommended for improved clarity and coherence.

      Thanks. We revised our writing to improve the manuscript clarity and coherence.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for revisions:

      (1) In the Introduction and Discussion, there are too many paragraphs; in some cases a single sentence forms a paragraph. I suggest that all the sentences in the Introduction be merged into three big paragraphs. For example, lines 30-57 become the first paragraph, lines 58-87 become the second paragraph, and lines 88-106 become the third paragraph; the authors can merge them reasonably according to the content. The Discussion is also suggested to be divided into several paragraphs according to the focus, and perhaps it is more appropriate to give a title to each paragraph.

      Thank you for your comments and suggestions. We have merged several paragraphs and added a title to each paragraph in the Discussion section (“Cell cluster annotation of non-transgenic plants” in line 428; “Nitrogen metabolism and transport of tea plant root at the single cell level” in line 445; “Multicellular compartmentation of theanine metabolism and transport” in line 469; “The regulation of theanine biosynthesis at the single cell level” in line 517; “Cross-talk between theanine metabolism and root development” in line 554).

      (2) Tea is a food, while tea tree is a substance. It should be "tea plant root" instead of "tea root"; it is suggested to revise this throughout the text.

      Thanks. We corrected “tea root” to “tea plant root” in this manuscript.

      (3) Lines 35-43, this sentence is too long, suggest each example should be one sentence.

      Thanks. We revised this sentence into short sentences. We changed this part to “Root-synthesized flavonoids regulate root tip growth through affecting auxin transport and metabolism (Santelia et al., 2008; Wan et al., 2018). Legume roots secrete flavonoids as signaling agents to attract symbiotic bacteria, such as Rhizobium for nitrogen fixation (Hartman et al., 2017). In Abies nordmanniana, volatile organic compounds (e.g., propanal, γ-nonalactone, and dimethyl disulfide) function to recruit certain bacteria or fungi, such as Paenibacillus. Paenibacillus sp. S37 produces high quantities of indole-3-acetic acid that can then promote plant root growth (Garcia-Lemos et al., 2020; Schulz-Bohm et al., 2018).” (Line 35-42)

      (4) Line 510 is missing a reference.

      Thank you - we have added the reference in the revised manuscript (Line 549 and Line 840-842).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript addresses two main issues:

      (i) do MAPKs play an important role in SAC regulation in a single-celled organism such as S. pombe?

      (ii) what is the nature of their involvement and what are their molecular targets?

      The authors have extensively used the cold-sensitive β-tubulin mutant to activate or inactivate SAC employing an arrest-release protocol. Localization of Cdc13 (cyclin B) to the SPBs is used as a readout for the SAC activation or inactivation. The roles of two major MAPK pathways i.e. stress-activated pathway (SAP) and cell integrity pathway (CIP), have been explored in this context (with CIP more extensively than SAP). sty1Δ or pmk1Δ mutants were used to inactivate the SAP or CIP pathways and wis1DD or pek1DD expression was utilized to constitutively activate these pathways, respectively. Lowering of Slp1Cdc20 abundance (by phosphorylation of Slp1-Thr 480) is revealed as the main function of MAPK to augment the robustness of the spindle assembly checkpoint.

      Strengths:

      The experiments are generally well-conducted, and the results support the interpretations in various sections. The experimental data clearly supports some of the key conclusions:

      (1) While inactivation of SAP and CIP compromises SAC-imposed arrest, their constitutive activation delays the release from the SAC-imposed arrest.

      (2) CIP signaling, but not SAP signaling, attenuates Slp1Cdc20 levels.

      (3) Pmk1 and Cdc20 physically interact and Pmk1-docking sequences in Slp1 (PDSS) are identified and confirmed by mutational/substitution experiments.

      (4) Thr480 (and also S76) is identified as the residue phosphorylated by Pmk1. S28 and T31 are identified as Cdk1 phosphorylation sites. These are confirmed by mutational and other related analyses.

      (5) Functional aspects of the phosphorylation sites have been elucidated to some extent: (a) Phosphorylation of Slp1-T480 by Pmk1 reduces its abundance thereby augmenting the SAC-induced arrest; (b) S28, T31 (also S59) are phosphorylated by Cdk1; (c) K472 and K479 residues are involved in ubiquitylation of Slp1.

      Weaknesses:

      (1) Cdc13 localization to SPBs has been used as a readout for SAC activation/inactivation throughout the manuscript. However, the only image showing such localization (Figure 1C) is of poor quality where the Cdc13 localization to SPBs is barely visible. This should be replaced by a better image.

      We have replaced those pictures with a new set of representative images, which show clear presence or absence of SPB-localized Cdc13-GFP.

      (2) The overlapping error bars in Cdc13-localization data in some figures (for instance Figure 3E and 4H) make the effect of various mutations on SAC activation/inactivation rather marginal. In some of these cases, Western-blotting data support the authors' conclusions better.

      We agree that the overlapping error bars may look ambiguous in most figures showing time course curves. This is because the data from a group of strains are best presented in a single graph so that the potential effects can be compared directly. We are fully aware of this drawback, which is why we always presented the data for two major time points (0 and 50 min after release) from all time course analyses in an alternative way, namely as individual histograms for each strain with means of repeats, absolute values, error bars and p values clearly labeled. In particular, the data from the 0 min time point provide important information on SAC activation efficiency. We generally placed those data and graphs in the corresponding supplemental figures, such as: Figure 1-figure supplement 1C, Figure 1-figure supplement 2D, Figure 3-figure supplement 3, Figure 4-figure supplement 6B, Figure 5-figure supplement 1, and Figure 6-figure supplement 2.

      In addition, as you have noticed, almost all time course data were backed up by our Western blotting data.

      (3) This specific point is not really a weakness but rather a loose end:

      One of the conclusions of this study is that MAPK (Pmk1) contributes to the robustness of SAC-induced arrest by lowering the abundance of Slp1Cdc20. The authors have used pmk1Δ or constitutively activating the MAPK pathways (Pek1DD) and documented their effect on SAC activation/inactivation dynamics. It is not clear if SAC activation also leads to activation of MAPK pathways for them to contribute to the SAC robustness. To tie this loose end, the author could have checked if the MAPK pathway is also activated under the conditions when SAC is activated. Unless this is shown, one must assume that the authors are attributing the effect they observe to the basal activity of MAPKs.

      We agree with your concern. We have followed your suggestion and performed further experiments. Please see our more detailed response to your point #ii(a) in your “Recommendations for the authors”.

      (4) This is also a loose end:

      The authors show that activation of stress pathways (by addition of KCl for instance) causes phosphorylation-dependent Slp1Cdc20 downregulation (Figure 6) under the SAC-activating condition. Does activation of the stress pathway cause phosphorylation-dependent Slp1Cdc20 downregulation under the non-SAC-activation condition or does it occur only under the SAC-activating condition?

      We agree with your concern. We have followed your suggestion and performed further experiments. Please see our more detailed response to your point #ii(b) in your “Recommendations for the authors”.

      (5) Although the authors have gone to some length to identify S28 and T31 (also S59) as phosphorylation sites for Cdk1, their functional significance in the context of MAPK involvement is not yet clear. Perhaps it is outside the scope of this study to dig deeper into this aspect more than the authors have.

      Based on our data from mass spectrometry analysis, mutational analysis, and in vitro and in vivo kinase assays using phosphorylation site-specific antibodies, we confirmed that at least S28 and T31 are Cdk1 phosphorylation sites. Our time course analysis of these phosphorylation-deficient mutants suggests that the mechanisms by which Cdk1 and MAPK regulate Slp1 activity or protein abundance are quite different. How these two or even more kinases coordinate to control Slp1 activity during APC/C activation is a very interesting issue to investigate. However, as you have noted, it is indeed beyond the scope of our current study.

      (6) In its current state, the Discussion section is quite disjointed. The first section "Involvement of MAPKs in cell cycle regulation" should be in the Introduction section (very briefly, if at all). It certainly does not belong to the Discussion section. In any case, the Discussion section should be more organized with a better flow of arguments/interpretations.

      We have re-organized our “Discussion” section. Please see our more detailed response to your point #iii in your “Recommendations for the authors”.

      Reviewer #2 (Public Review):

      Summary:

      This study by Sun et al. presents a role for the S. pombe MAP kinase Pmk1 in the activation of the Spindle Assembly Checkpoint (SAC) via controlling the protein levels of APC/C activator Cdc20 (Slp1 in S. pombe). The data presented in the manuscript is thorough and convincing. The authors have shown that Pmk1 binds and phosphorylates Slp1, promoting its ubiquitination and subsequent degradation. Since Cdc20 is an activator of APC/C, which promotes anaphase entry, constitutive Pmk1 activation leads to an increased percentage of metaphase-arrested cells. The authors have used genetic and environmental stress conditions to modulate MAP kinase signalling and demonstrate their effect on APC/C activation. This work provides evidence for the role of MAP kinases in cell cycle regulation in S. pombe and opens avenues for exploration of similar regulation in other eukaryotes.

      Strengths:

      The authors have done a very comprehensive experimental analysis to support their hypothesis. The data is well represented, and including a model in every figure summarizes the data well.

      Weaknesses:

      As mentioned in the comments, the manuscript does not establish that MAP kinase activity leads to genome stability when cells are subjected to genotoxic stressors. That would establish the importance of this pathway for checkpoint activation.

      We understand your concern. We have followed your suggestion and performed further experiments to examine whether the absence of Pmk1 causes chromosome segregation defects. Please see our more detailed response to your point #5 in your “Recommendations for the authors”.

      Recommendations for the authors:

      Reviewing Editor

      Please go through the reviews and recommendations and revise the paper accordingly. I think nearly everything is very straightforward and all issues raised by the two expert referees are fully justified. I look forward to seeing an appropriately revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (i) Cdc13 localization to SPBs has been used as a readout for SAC activation/inactivation throughout the manuscript. However, the only image showing such localization (Figure 1C) is of poor quality where the Cdc13 localization to SPBs is barely visible. This should be replaced by a better image.

      We have replaced those pictures with a new set of representative images, which show clear presence or absence of SPB-localized Cdc13-GFP.

      (ii) I reiterate the loose ends in this manuscript I have mentioned above. If the authors have already conducted these experiments, they should include the results in the manuscript to tighten the story further. (I am not suggesting that the authors must perform these experiments...if they have not).

      (a) One of the conclusions of this study is that MAPK (Pmk1) contributes to the robustness of SAC-induced arrest by lowering the abundance of Slp1Cdc20. The authors have used pmk1Δ or constitutively activating the MAPK pathways (pek1DD) and documented their effect on SAC activation/inactivation dynamics. It is not clear if SAC activation also leads to activation of MAPK pathways for them to contribute to the SAC robustness. To tie this loose end, the authors could have checked if the MAPK pathway is also activated under the conditions when SAC is activated. Unless this is shown, one must assume that the authors are attributing the effect they observe to the basal activity of MAPKs.

      Actually, our data shown in Figure 6B demonstrated that SAC activation per se cannot trigger activation of the CIP MAPK pathway, because we did not observe any elevated Pmk1 phosphorylation (i.e. Pmk1-P detected by anti-phospho p42/44 antibodies) in nda3-arrested cells (please see “control” samples in Figure 6B).

      To corroborate this observation, we further examined the Pmk1 phosphorylation/activation in Mad2-overexpressing cells, and could not detect elevated Pmk1 phosphorylation. This data again lends support to the notion that SAC activation per se cannot trigger activation of CIP signaling.

      We have added our newly obtained result in Figure 6-figure supplement 1 in our revised manuscript.

      (b) The authors show that activation of stress pathways (by addition of KCl, for instance) causes phosphorylation-dependent Slp1Cdc20 downregulation (Figure 6) under the SAC-activating conditions. Does activation of the stress pathway cause phosphorylation-dependent Slp1Cdc20 downregulation under the non-SAC-activation conditions or does it occur only under the SAC-activating condition?

      As you suggested, we have constructed cdc25-22 background strains with pmk1+ deleted or expressing Padh11-pek1DD to remove or constitutively activate CIP signaling, respectively. By immunoblotting, we followed the Slp1Cdc20 levels when cells went through mitosis after being released at 25 °C from G2/M-arrest at high temperature. We found that Slp1Cdc20 levels in pek1DD cells were only marginally reduced compared to wild-type cells, whereas we failed to observe any elevated Slp1Cdc20 levels in pmk1Δ cells. These results suggested that CIP signaling only plays a negligible role in influencing Slp1Cdc20 levels under the non-SAC-activation conditions.

      We have presented our newly obtained result in Figure 2-figure supplement 1 in our revised manuscript.

      (iii) The Discussion section is quite disjointed. The first section "Involvement of MAPKs in cell cycle regulation" should be in the Introduction section (very briefly, if at all). It certainly does not belong to the Discussion section. In any case, the Discussion section should be more organized with a better flow of arguments/interpretations.

      Thank you for your suggestion on the organization and flow of the “Discussion”. We have reorganized our “Discussion” section, moved the previous “Involvement of MAPKs in cell cycle regulation” to the “Introduction” section, and rewritten the corresponding paragraph.

      (iv) A minor point in this context:

      In the cold-sensitive β-tubulin mutant, growth at 18 °C causes loss of kinetochore-microtubule attachments as well as of intra-kinetochore tension. Either perturbation alone can lead to the activation of SAC. This study does not distinguish whether MAPK involvement in SAC dynamics is relevant to one perturbation, the other, or both. It would be pertinent to briefly mention this point in the Discussion section.

      As you suggested, we have added two sentences to briefly mention this point in our “Discussion” section.

      Reviewer #2 (Recommendations For The Authors):

      This study by Sun et al. presents a role for the S. pombe MAP kinase Pmk1 in the activation of the Spindle Assembly Checkpoint (SAC) via controlling the protein levels of APC/C activator Cdc20 (Slp1 in S. pombe). The data presented in the manuscript is thorough and convincing. The authors have shown that Pmk1 binds and phosphorylates Slp1, promoting its ubiquitination and subsequent degradation. Since Cdc20 is an activator of APC/C, which promotes anaphase entry, constitutive Pmk1 activation leads to an increased percentage of metaphase-arrested cells. The authors have used genetic and environmental stress conditions to modulate MAP kinase signalling and demonstrate their effect on APC/C activation. This work provides evidence for the role of MAP kinases in cell cycle regulation in S. pombe and opens avenues for exploration of similar regulation in other eukaryotes.

      Although the data largely supports the conclusions, a major addition will be testing whether cells accumulate chromosomal or inheritance defects when MAPK Pmk1 is absent. It will be interesting to know that this mechanism of SAC activation contributes to genome integrity.

      Some additions that can improve the manuscript are mentioned below:

      (1) In Figure 1, the authors should also test the effect of constitutive activation of Spk1 to rule out the involvement of the PSP pathway.

      To address your request, we have constructed yeast strains expressing constitutively active byr1DD alleles carrying S214D and T218D point mutations under the control of the adh21 or adh11 promoters (Padh21 or Padh11 in short), i.e. Padh21-6HA-byr1DD and Padh11-6HA-byr1DD, respectively. We examined the expression of these byr1DD alleles by Western blotting, tested their TBZ sensitivity, and checked whether they affect the efficiency of SAC activation or inactivation. Our results showed that constitutive activation of Spk1 by overexpressing Byr1DD does not cause yeast cells to be TBZ-sensitive or affect the efficiency of SAC activation or inactivation.

      We have added these new data in Figure 1-figure supplement 2 in our revised manuscript.

      (2) The number of analyzed cells (n) should be mentioned in the figure legends in Figure 1D, and all other figure panels should represent similar data in the consequent figures.

      We have added the information on sample size for all experiments involving time course analyses.

      (3) The authors should also use another arresting mechanism (e.g. nocodazole treatment) and corroborate the result in Figure 1C to rule out any effects due to the mutant.

      Figure 1C in our manuscript actually shows our experimental design and not the result. We assume that you are asking for an alternative strategy to arrest cells at metaphase in order to confirm our results shown in Figure 1D.

      We should mention that nocodazole, a commonly used inhibitor of microtubule polymerization, is very effective in mammalian (including human) cells and in budding yeast cells, but not effective at all in wild-type fission yeast cells. Nocodazole has been found to be active only in fission yeast α- or β-tubulin mutants (please see Umesono, K., et al., J Mol Biol. 168 (2): 271-284 (1983); PMID: 6887245; DOI: 10.1016/s0022-2836(83)80018-7.) or multidrug resistance (MDR) transporter mutants (please see Kawashima, SA, et al., Chemistry & Biology 19, 893–901 (2012); PMID: 22840777; doi: 10.1016/j.chembiol.2012.06.008.). This property has therefore restricted its routine use as a metaphase-arrest or spindle checkpoint-activating drug in fission yeast.

      Instead, to achieve spindle checkpoint activation and metaphase arrest, we took advantage of a metaphase-arresting mechanism involving Mad2 overexpression, which has been described and used previously (please see He, X., et al., Proc Natl Acad Sci USA. 94 (15): 7965-70 (1997); PMID: 9223296; DOI: 10.1073/pnas.94.15.7965, and May, K.M., et al., Current Biology, 27(8):1221-1228 (2017); PMID: 28366744; DOI: 10.1016/j.cub.2017.03.013). With this strategy, we could analyze the metaphase-arresting and SAC-activation efficiency by counting cells with short spindles as judged by GFP-Atb2 signals. We compared the frequencies of cells with short spindles in wild-type, pmk1Δ, sty1-T97A, and spk1Δ backgrounds after Mad2 overexpression had been induced for 18 hours, and found that SAC-activating efficiency was compromised in pmk1Δ and sty1-T97A cells, but not in spk1Δ cells. These data corroborated our result shown in Figure 1D and ruled out possible effects due to the nda3-KM311 mutant.

      We have added this new data in Figure 1-figure supplement 3 in our revised manuscript.

      (4) It would also be helpful to assess SAC or APC/C activation with another cellular readout in addition to Cdc13-GFP accumulation on SPBs, at least for initial experiments.

      Actually, Cdc13-GFP accumulation on SPBs has been routinely used in the field as a reliable cellular readout for SAC or APC/C activation. It was first developed and used by the Kevin Hardwick lab (Vanoosthuyse V and Hardwick KG. Curr Biol. 2009, 19(14):1176-81. PMID: 19592249; doi: 10.1016/j.cub.2009.05.060.). This method was also used in a paper by Meadows JC, et al. (2011) (Dev Cell. 20(6):739-50. PMID: 21664573; doi: 10.1016/j.devcel.2011.05.008.).

      In our previous study, we also employed a different strategy to assess SAC inactivation or APC/C activation, in which degradation of nuclear Cut2-GFP was used as a cellular readout (Please see S4 Fig in Bai S, et al., PLoS Genet 18(9): e1010397 (2022); PMID: 36108046; DOI: 10.1371/journal.pgen.1010397.). Cut2 is the securin homologue in S. pombe and therefore also a target of APC/C at anaphase. Our data in the above paper confirmed that the degradation of both nuclear Cut2-GFP and SPB-localized Cdc13-GFP shows similar dynamics when cells are released from metaphase-arrest.

      As we described in our response to your comment #3, we employed short spindles visualized by GFP-Atb2 signals as an alternative readout for metaphase-arrest and SAC-activation in cells overexpressing Mad2. We confirmed that SAC-activation efficiency was compromised in pmk1Δ and sty1-T97A cells, but not in spk1Δ cells.

      (5) The authors have shown a role for Pmk1 in controlling the activation of APC/C and, hence, cell cycle progression through metaphase to anaphase. One crucial experiment would be to check if pmk1Δ cells show an accumulation of chromosomal aberrations or unequal distribution when subjected to genotoxic stressors. That would implicate a direct importance on Pmk1's role in cell cycle arrest and genome maintenance.

      As you suggested, we have constructed cdc25-22 GFP-atb2+ strains with pmk1+ present or deleted, treated cells with 0.6 M KCl or 2 μg/mL caspofungin to activate MAPKs, and checked whether the absence of pmk1 could cause defective chromosome segregation in anaphase cells. Indeed, we found that stressed pmk1Δ cells displayed a greatly increased frequency of lagging chromosomes and chromosome mis-segregation at mitotic anaphase compared to similarly treated wild-type cells and also untreated pmk1Δ cells. These new data implicate a direct role for Pmk1 in cell cycle arrest and genome maintenance, especially when cells are exposed to adverse environments.

      We have presented this new data as Figure 7 in our revised manuscript.

      Typos:

      (1) In line 406, "docking" is misspelled as "docing".

      Thank you for pointing this out. We have corrected this mistake.

      (2) In Figure 6, panel "F" is not marked in the figure.

      We mistakenly mentioned and labeled “F” in the Figure 6 legend. In our revised manuscript, we have added new Western blotting results showing the protein levels of Pmk1 phosphorylation- and ubiquitylation-deficient Slp1Cdc20 mutants upon SAC activation in Figure 6-figure supplement 3.

      (3) In Figure S1, panel "D" is not marked.

      We apologize for our previous wording in our former Figure S1 legend, which was misleading. Actually, there was no panel “D” in Figure S1 (now Figure 1-figure supplement 1). We have rewritten the legend to avoid ambiguity.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript characterizes a functional peptidergic system in the echinoderm Apostichopus japonicus that is related to the widely conserved family of calcitonin/diuretic hormone 31 (CT/DH31) peptides in bilaterian animals. In vitro analysis of receptor-ligand interactions, using multiple receptor activation assays, identifies three cognate receptors for two CT-like peptides in the sea cucumber, which stimulate cAMP, calcium, and ERK signaling. Only one of these receptors is closely related to the family of calcitonin and calcitonin-like receptors (CTR/CLR) in bilaterian animals, whereas two other receptors cluster with invertebrate pigment dispersing factor receptors (PDFRs). In addition, this study sheds light on the transcript expression and in vivo functions of CT-like peptides in A. japonicus, by quantitative real-time PCR, in situ hybridization, pharmacological experiments on body wall muscle and intestine preparations, and peptide injection and RNAi knockdown experiments. This reveals a conserved function of CT-like peptides as muscle relaxants and hints at a potential role as growth regulators in A. japonicus.

      Strengths:

      This work combines both in vitro and in vivo functional assays to identify a CT-like peptidergic system in an economically relevant echinoderm species, the sea cucumber A. japonicus. A major strength of the study is that it identifies three G protein-coupled receptors for AjCT-like peptides, one related to the CTR/CLR family and two related to the PDFR family. A similar finding was previously reported for the CT-related peptide DH31 in Drosophila melanogaster that activates both CT-type and PDF-type receptors. Here, the authors expand this observation to a deuterostomian animal, which suggests that receptor promiscuity is a more general feature of the CT/DH31 peptide family and that CT/DH31-like peptides may activate both CT-type and PDF-type receptors in other animals as well.

      Besides the identification of receptor-ligand pairs, the downstream signaling pathways of AjCT receptors have been characterized, highlighting broad effects on cAMP, calcium, and ERK signaling. Functional characterization of the CT-related peptide system in heterologous cells is complemented with ex vivo and in vivo experiments. First, peptide injection and RNAi knockdown experiments establish transcriptional regulation of all three identified receptors in response to changing AjCT peptide levels. Second, ex vivo experiments reveal a conserved role for the two CT-like peptides as muscle relaxants, which have differential effects on body wall muscle and intestine preparations. Finally, peptide injection studies suggest a putative role for one of the two CT-like peptides (AjCT2) in growth regulation.

      Weaknesses:

      (1) Analysis of transcript expression is limited to the CT-peptide encoding gene, while no gene expression analysis was attempted for the three identified receptors. Differences in the activation of downstream signaling pathways between the three receptors are also questionable due to unclarities in the statistical analysis and variation in the control and experimental data in heterologous assays. Together, this makes it difficult to propose a mechanism underlying differences in the functions of the two CT-like peptides in muscle control and growth regulation.

      Thank you for the comment. We will add expression analysis for the three identified receptors. We did perform statistical tests for all experiments, but the way significance was marked may have been inconsistent; we apologize for the confusion and will standardize the notation and include this information in both the figure legends and the Methods section. As for the variation between control and experimental data, each control was transfected with a different receptor or used cells from a different batch, which is why the control conditions vary slightly between experiments.

      (2) The authors also suggest a putative orexigenic role for the CT-like peptidergic system in feeding behavior. This effect is not well supported by the experimental data provided, as no detailed analysis of feeding behavior was carried out (only indirect measurements were performed that could be influenced by other peptidergic effects, such as on muscle relaxation) and no statistically significant differences were reported in these assays.

      Thank you for the comment. We did perform statistical tests for all experiments, and the masses of the remaining bait and the excrement were added in Figure 7A-figure supplement 1. We will conduct additional behavioral experiments to explore changes in the feeding behavior of A. japonicus after injection of CT-type neuropeptides, in order to support a role for the CT-like peptidergic system in regulating feeding behavior. As mentioned above, the way significance was marked may have been inconsistent; we apologize for the confusion and will standardize the notation and include this information in both the figure legends and the Methods section. We will also add experiments assessing feeding and growth after knocking down the CTP-encoding genes to further support our claim.

      (3) Overall, details regarding statistical analyses are not (clearly) specified in the manuscript, and there are several instances where statements are not supported by literature evidence.

      As mentioned above, we did perform statistical tests for all experiments, but the way significance was marked may have been inconsistent; we apologize for the confusion and will standardize the notation and include this information in both the figure legends and the Methods section. We will also add further experiments and cite more literature to support our statements.

      Reviewer #2 (Public review):

      Summary:

      The authors show that A. japonicus calcitonins (AjCT1 and AjCT2) activate not only the calcitonin/calcitonin-like receptor but also activate the two PDF receptors, ex vivo. They also explore secondary messenger pathways that are recruited following receptor activation. They determine the source of CT1 and CT2 using qPCR and in situ hybridization and finally test the effects of these peptides on tissue contractions, feeding, and growth. This study provides solid evidence that CT1 and CT2 act as ligands for calcitonin receptors; however, evidence supporting cross-talk between CT peptides and PDF receptors is only based on ex vivo experiments.

      Strengths:

      This is the first study to report the pharmacological characterization of CT receptors in an echinoderm. Multiple lines of evidence in cell culture (receptor internalization and secondary messenger pathways) support this conclusion.

      Weaknesses:

      (1) The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionarily ancient since a similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported for several reasons. The authors perform phylogenetic analysis to show that the two "PDF" receptors form an independent clade. This clade is sister to the clade comprising CT receptors. This phylogenetic analysis suffers from several issues. Firstly, the phylogenies lack bootstrap support. Secondly, the resolution of the phylogeny is poor because representative members from diverse phyla have not been included. For instance, insect or other protostomian PDF receptors have not been included so how can the authors distinguish between "PDF" receptors or another group of CT receptors? Thirdly, no in vivo evidence has been presented to support that CT can activate "PDF" receptors in vivo.

      We agree with the reviewer that the cross-talk between CTs and PDFRs is not firmly established by our current study.

      First, we will redo the phylogenetic analyses as the reviewer suggested and indicate the bootstrap values. We will then add further experiments (such as PDFR knockdown) to confirm that CT can activate “PDF” receptors in vivo.

      (2) The source of CT which mediates the effects on longitudinal muscles and intestine is unclear. Is it autocrine or paracrine signaling by CT from the same tissue or is it long-range hormonal signaling?

      Thank you for the comment. We have in fact performed in situ and immunohistochemical experiments for CTP and CT in different tissues but did not include them in the current manuscript; we will add them in the revised version.

      (3) Pharmacology experiments showing the effects of CT1 and CT2 on ACh-induced contractions were performed. Sample traces have been provided but no traces with ACh alone have been included. How long do ACh-induced contractions persist? These controls are necessary to differentiate between the eventual decay of ACh effects and relaxation induced by CT1 and CT2. The traces also do not reflect the results portrayed in dose-response curves. For instance, in Figure 6B, maximum relaxation is reported for 10⁻⁶ M. Yet, the trace hardly shows any difference before and after the addition of 10⁻⁶ M peptide. The maximum effect in the trace appears to be after the addition of 10⁻⁸ M peptide.

      Thank you for the comment. We will provide the trace of the contraction caused by ACh alone. In Figure 6B, the trace shows successive treatments with the neuropeptide at different concentrations and therefore represents a cumulative effect. Consequently, the corresponding receptors may have become desensitized by the time the high concentration of peptide was finally applied. We did examine the pharmacological effect of CT2 at a concentration of 10⁻⁶ M, which exhibited the maximum relaxation, and we will provide this trace.

      (4) I am unsure how differences in wet mass indicate feeding and growth differences since no justification has been provided. Couldn't wet mass also be influenced by differences in osmotic balance, a key function of calcitonin-like peptides in protostomian invertebrates? The statistical comparisons have not been included in Figure 7B.

      Thank you for the comment. We will analyze the weight gain rate, growth rate and feeding rate of A. japonicus to explain the difference in feeding and growth between the injection group and the control group. We can confirm that wet mass is not influenced by differences in osmotic balance, and we will provide our supporting evidence in the supplementary files of the revised manuscript; we did not find the key function of calcitonin-like peptides that has been observed in protostomian invertebrates. We will also include the statistical comparisons in Figure 7B.

      (5) While the authors succeeded in knocking down CT, the physiological effects of reduced CT signaling were not examined.

      Thank you for the insightful suggestion. We will add experiments examining the physiological effects of CT knockdown.

    1. Author response:

      Response to Reviewer #1:

      We agree with the reviewer that ChIP is much better than ChEC at recovering RNA polymerase II and elongation factors associated with the transcribed regions.  We believe that this is due to cross-linking, which enriches for these interactions.  However, as we highlight in the manuscript, cross-linking may not accurately report on the occupancy of RNA polymerase II and elongation factors over all regions.  Although, by ChEC, we observe elongation factors upstream of the transcribed region, compared with total RNA polymerase II, the relative enrichment of elongation factors or phosphorylated RNA polymerase II is significantly higher over transcribed regions, with a bias to the 3’ end (Figure 4B & C). This is consistent with these proteins and modifications functioning in elongation.  

      Regarding how convincing the results with the gcn4-pd mutant are, we would highlight that the phenotype of this mutant is a quantitative decrease in transcription and this leads to a quantitative decrease, rather than qualitative loss, of RNA polymerase II over the promoter, without impacting the association of RNA polymerase II over the UAS region.  This effect was small but statistically significant (p = 4e-5). Obviously, more mechanistic studies will need to be performed, but this result supports a role for the interaction with the nuclear pore complex in either enhancing the transfer of RNA polymerase II from the enhancer to the promoter or in preventing its dissociation from the promoter.

      Response to Reviewer #2:

      Thank you for your supportive comments and suggestions. We will clarify our use of Nascent RNA in the text. We agree that the stronger enrichment of the transcribed region from Rpb1 ChIP-seq experiments should correlate well with actively transcribing RNA polymerase II observed by PRO-seq; enrichment by PRO-seq reported in a paper from John Lis’ lab strongly favors transcribed regions with a modest peak over the terminator (PMID: 27197211, Figure 2A). ChEC reveals functionally important forms of RNA polymerase II that are not engaged in transcription. This manuscript highlights the potential utility of ChEC-seq2 in measuring these interactions, which were suggested by the recent single-molecule studies from Buratowski and Gelles, on a global scale.

    1. Author response:

      (1) Rationale of the study and key finding

      We respectfully disagree with Reviewer #1’s comments on ‘the fundamental weakness of this paper … about regional identity ...’. We believe that they misunderstood the rationale and key message of the paper.

      The rationale of the study stems from the increasing recognition of the importance of generating ‘regional-specific’ astrocytes from iPSCs for disease modelling, due to astrocyte heterogeneity and their region-specific involvement in disease pathology. Regional astrocytes are typically differentiated from neural progenitors (NPCs) that are ‘patterned’ to the desired fate during iPSC neural induction. While the efficiency is not 100%, it is nevertheless assumed that the initial lineage composition (%) of patterned NPCs is preserved during the course of astrocyte differentiation and hence that the derived astrocytes represent the intended regional fate.

      We questioned this approach using genetic lineage tracing with ventral midbrain-patterned neural progenitors as an example. By monitoring astrocytic induction of purified BFP+ NPCs and unsorted ventral midbrain-patterned NPCs (referred to as BFP- in the paper, line 154 submitted PDF), we demonstrate that despite BFP+ NPCs being the vast majority (>90% LMX1A+ and FOXA2+) at the onset of astrocytic induction, their derivatives were lost in the final astrocyte product unless BFP+ NPCs were purified prior to astrocytic induction and differentiation.

      Our findings demonstrate that iPSC-derived astrocytes may not faithfully represent the antecedent neural progenitor pool in terms of lineage, and that the regionality of PSC-derived astrocytes should not be assumed based on the (dominant) NPC identity. We believe that this finding is important for iPSC disease modelling research, especially where disease pathophysiology concerns astrocytes of specific brain regions.

      Reviewer #1 raised several interesting questions concerning floor plate marker expression during astrocytic induction and astrocyte differentiation in normal development. These are important outstanding questions in developmental neurobiology, but they are outside the scope of this in vitro study. Indeed, the approach taken by published PSC-astrocyte studies - such as assigning regional identity of PSC-derived astrocytes based on the starting NPC fate or validating PSC-astrocyte using regional markers defined in the developing embryo - is partly due to our limited knowledge about the developing and mature astrocytes in different brain regions. This knowledge gap consequently restricts a thorough characterisation of the regional identity of PSC-astrocytes in such cases.

      (2) LMX1A expression in the brain and LMX1A-BFP lineage tracer line

      We thank Reviewer #1 for highlighting the wider expression of LMX1A. We are fully aware of this consideration, which is why we thoroughly examined PSC-derived ventral midbrain-patterned NPCs by immunostaining and single-cell RNA sequencing in this and a previous study (PMID: 38132179). All LMX1A+ cells produced in our protocol exhibit ventral midbrain progenitor gene expression profiles when compared to a dataset obtained from human fetal ventral midbrain.

      Some of the comments give us the impression that there might be some confusion regarding the lineage tracing system used in this study. The LMX1A-Cre/AAVS1-BFP line is not a classic reporter line that marks LMX1A-expressing cells in real time. Instead, it was designed as a tracer line that expresses BFP in the derivatives of LMX1A+ cells as well as in cells expressing LMX1A at the time of analysis.

      (3) Is regional identity fixed?

      We feel that Reviewer #1 misunderstood the paper in their comments ‘The authors are making an assumption that regional identity is fixed when they begin their astrocyte differentiation protocol - not necessarily true…’.  We in fact pointed out in the paper that expression of LMX1A and FOXA2, a signature of midbrain floor plate progenitors, is lost in our BFP+ astrocytes. In this paper, ‘regional identity’ was loosely used to also refer to lineage identity and genetic traits, not just gene expression. We will consider alternative wording during revision to avoid potential confusion.  

      (4) Splice disruption in the COL3A1 gene and potential effect on astrocyte differentiation of Kolf2 iPSCs

      We thank Reviewer #2 for highlighting the variations in KOLF2C1 hiPSCs and the study by Bradley et al. (2019) on differential COL3A1 expression in some ventral astrocytes. We noted that the progenitors produced by Bradley et al. were NKX2.1+ ventral forebrain cells, rather than the LMX1A+ ventral midbrain progenitors investigated in our study. Our scRNAseq data show that all three populations of astrocytes exhibit low levels of COL3A1 expression. While we will continue to examine astrocyte COL3A1 expression in publicly available gene expression datasets, we feel that a selective impairment in astrocyte differentiation of BFP+ cells is unlikely.

      (5) Additional data analysis and validation of potential new markers

      We will carefully consider the reviewers’ suggestions on further analysis of our single-cell RNA sequencing data during revision. Regarding eLife’s assessment of validating differential gene expression in different brain regions, it is worth noting that both BFP+ and BFP- cells mapped to the published midbrain scRNAseq data set (La Manno et al, Cell 2016, PMID: 27716510), supporting their midbrain fate. We agree in principle that all single-cell RNA sequencing findings should be validated by immunostaining. It would be beneficial to experimentally verify that our candidate BFP+ differentially expressed genes indeed mark astrocytes derived from LMX1A+ NPCs in vivo, as opposed to other midbrain NPCs. However, this verification cannot be realistically performed in a human setting, but only in an analogous mouse tracer line.

      The current eLife assessment nicely summarised part of our findings, which is in a sense a secondary output of this work. We would appreciate a revised eLife assessment that includes the message that iPSC-derived astrocytes, in terms of genetic lineage, can deviate greatly from the starting progenitor pool. We would be very happy to provide further information or clarification if it would be helpful. We are committed to doing our best as authors to enhance reader experience and support the continued success of eLife.

    1. Author response:

      Reviewer #1:

      Summary:

      García-Vázquez et al. identify GTSE1 as a novel target of the cyclin D1-CDK4/6 kinases. The authors show that GTSE1 is phosphorylated at four distinct serine residues and that this phosphorylation stabilizes GTSE1 protein levels to promote proliferation.

      Strengths:

      The authors support their findings with several previously published results, including databases. In addition, the authors perform a wide range of experiments to support their findings.

      Weaknesses:

      I feel that important controls and considerations in the context of the cell cycle are missing. Cyclin D1 overexpression, Palbociclib treatment and apparently also AMBRA1 depletion can lead to major changes in cell cycle distribution, which could strongly influence many of the observed effects on the cell cycle protein GTSE1. It is therefore important that the authors assess such changes and normalize their results accordingly.

      We have approached the question of GTSE1 phosphorylation to account for potential cell cycle effects from multiple angles:  

      (i) We conducted in vitro experiments with purified, recombinant proteins and showed that GTSE1 is phosphorylated by cyclin D1-CDK4 in a cell-free system (Figure 2A-C). This experiment provides direct evidence of GTSE1 phosphorylation by cyclin D1-CDK4 without the influence of any other cell cycle effectors.

      (ii) We present data using synchronized AMBRA1 KO cells (Figure 2G and Supplementary Figure 3B). As shown previously (Simoneschi et al., Nature 2021, PMC8875297), AMBRA1 KO cells progress faster through the cell cycle, but they are still synchronized, as shown, for example, by the mitotic phosphorylation of Histone H3. Under these conditions we observed that while phosphorylation of GTSE1 in parental cells peaks at the G2/M transition, AMBRA1 KO cells exhibited sustained phosphorylation of GTSE1 across all cell cycle phases. This is evident when using Phos-tag gels as in the current top panel of Figure 2G. We have now re-run one of the biological triplicates of the synchronized cells using a higher concentration of Zn²⁺-Phos-tag reagent and a lower voltage to allow better separation. Under these conditions, GTSE1 phosphorylation is more apparent. In the new version of the paper, we will either show both blots or substitute the old panel with the new one. This experiment provides evidence that high levels of cyclin D1 in AMBRA1 KO cells affect GTSE1 independently of the specific points in the cell cycle.

      (iii) The relatively short half-life of GTSE1 (<4 hours) makes its levels sensitive to acute treatments such as palbociclib or AMBRA1 depletion. The effects of these treatments on GTSE1 levels are measurable within a time frame too short to affect cell cycle progression in a meaningful way. For example, we used cells with a fusion of endogenous AMBRA1 to a mini-Auxin Inducible Degron (mAID) at the N-terminus. This system allows for rapid and inducible degradation of AMBRA1 upon addition of auxin, thereby minimizing compensatory cellular rewiring. Again, we observed an increase in GTSE1 levels upon acute ablation of AMBRA1 (i.e., in 8 hours) (Figure 3B), when no significant effects on cell cycle distribution are observed (please see Simoneschi et al., Nature 2021, PMC8875297 and Rona et al., Mol. Cell 2024, PMC10997477).

      Altogether, these lines of evidence support our conclusion that GTSE1 is a target of cyclin D1-CDK4, independent of cell cycle effects. As stated in the Discussion section, GTSE1 has been established as a substrate of mitotic cyclins, but we observed that overexpression of cyclin D1-CDK4 induces GTSE1 phosphorylation at any point of the cell cycle. Thus, we propose that GTSE1 is phosphorylated by CDK4 and CDK6 particularly in pathological states, such as cancers displaying overexpression of D-type cyclins beyond the G1 phase. In turn, GTSE1 phosphorylation induces its stabilization, leading to increased levels that, as expected based on the existing literature, contribute to enhanced cell proliferation. In short, the cyclin D1-CDK4/6 kinase-dependent phosphorylation of GTSE1 induces its stabilization independently of the cell cycle.

      Reviewer #2:

      Summary:

      The manuscript by García-Vázquez et al identifies the G2 and S phase-expressed protein 1 (GTSE1) as a substrate of the CycD-CDK4/6 complex. CycD-CDK4/6 is a key regulator of the G1/S cell cycle restriction point, which commits cells to enter a new cell cycle. This kinase is also an important therapeutic cancer target by approved drugs including palbociclib. Identification of substrates of CycD-CDK4/6 can therefore provide insights into cell cycle regulation and the mechanism of action of cancer therapeutics. A previous study identified GTSE1 as a target of CycB-Cdk1 but this appears to be the first study to address the phosphorylation of the protein by Cdk4/6.

      The authors identified GTSE1 by mining an existing proteomic dataset that is elevated in AMBRA1 knockout cells. The AMBRA1 complex normally targets D cyclins for degradation. From this list, they then identified proteins that contain a CDK4/6 consensus phosphorylation site and were responsive to treatment with palbociclib.

      The authors show CycD-CDK4/6 overexpression induces a shift in GTSE1 on Phos-tag gels that can be reversed by palbociclib. In vitro kinase assays also showed phosphorylation by CDK4. The phosphorylation sites were then identified by mutagenizing the predicted sites and using Phos-tag gels to see which mutations eliminated the shift.

      The authors go on to show that phosphorylation of GTSE1 affects the steady state level of the protein. Moreover, they show that expression and phosphorylation of GTSE1 confer a growth advantage on tumor cells and correlate with poor prognosis in patients.

      Strengths:

      The biochemical and mutagenesis evidence presented convincingly show that the GTSE1 protein is indeed a target of the CycD-CDK4 kinase. The follow-up experiments begin to show that the phosphorylation state of the protein affects function and has an impact on patient outcomes. 

      Weaknesses:

      It is not clear at which stage in the cell cycle GTSE1 is being phosphorylated and how this is affecting the cell cycle. Considering that the protein is also phosphorylated during mitosis by CycB-Cdk1, it is unclear which phosphorylation events may be regulating the protein.

      In cells that do not overexpress cyclin D1, GTSE1 is phosphorylated at the G2/M transition, consistent with the known cyclin B1-CDK1-mediated phosphorylation of this protein. However, AMBRA1 KO cells exhibited high levels of cyclin D1 throughout the cell cycle and sustained phosphorylation of GTSE1 across all cell cycle points (Figure 2G and Supplementary Figure 3B). Please see also answer to Reviewer #1.  Moreover, we show that, compared to the amino acids phosphorylated by cyclin D1-CDK4, cyclin B1-CDK1 phosphorylates GTSE1 on either additional residues or different sites (Figure 2H). Finally, we show that expression of a phospho-mimicking GTSE1 mutant leads to accelerated growth and an increase in the cell proliferative index (Figure 4C).  However, we have not evaluated how phosphorylation affects the cell cycle distribution.  We will perform FACS analyses and include them in the new version. 

      Reviewer #3:

      Summary:

      This paper identifies GTSE1 as a potential substrate of cyclin D1-CDK4/6 and shows that GTSE1 correlates with cancer prognosis, probably through an effect on cell proliferation. The main problem is that the phosphorylation analysis relies on the over-expression of cyclin D1. It is unclear if the endogenous cyclin D1 is responsible for any phosphorylation of GTSE1 in vivo, and what, if anything, this moderate amount of GTSE1 phosphorylation does to drive proliferation.

      Strengths: 

      There are few bonafide cyclin D1-Cdk4/6 substrates identified to be important in vivo so GTSE1 represents a potentially important finding for the field. Currently, the only cyclin D1 substrates involved in proliferation are the Rb family proteins.

      Weaknesses:

      The main weakness is that it is unclear if the endogenous cyclin D1 is responsible for phosphorylating GTSE1 in the G1 phase. For example, in Figure 2G there doesn't seem to be a higher band in the phos-tag gel in the early time points for the parental cells. This experiment could be redone with the addition of palbociclib to the parental to see if there is a reduction in GTSE1 phosphorylation and an increase in the amount in the G1 phase as predicted by the authors' model. The experiments involving palbociclib do not disentangle cell cycle effects. Adding Cdk4 inhibitors will progressively arrest more and more cells in the G1 phase and so there will be a reduction not just in Cdk4 activity but also in Cdk2 and Cdk1 activity. More experiments, like the serum starvation/release in Figure 2G, with synchronized populations of cells would be needed to disentangle the cell cycle effects of palbociclib treatment.    

      In normal cells, GTSE1 is phosphorylated at the G2/M transition in a cyclin B1-CDK1-dependent manner. During G1, when the levels of cyclin D1 peak, GTSE1 is not phosphorylated. This could be due to a higher affinity between GTSE1 and mitotic cyclins as compared to G1 cyclins or to a higher concentration of mitotic cyclins compared to G1 cyclins. We show that higher levels of cyclin D1 induce GTSE1 phosphorylation during interphase, but we do not rely only on the overexpression of exogenous cyclin D1. In fact, we observe a similar effect when we deplete endogenous AMBRA1, resulting in the stabilization of endogenous cyclin D1. As mentioned in the Discussion section, we propose that GTSE1 is phosphorylated by CDK4 and CDK6 particularly in pathological states, such as cancers displaying overexpression of D-type cyclins (i.e., the overexpression appears to overcome the lower affinity of the cyclin D1-GTSE1 complex). In sum, our study suggests that overexpression of cyclin D1, which is often observed in cancer cells beyond the G1 phase, induces phosphorylation of GTSE1 at all points in the cell cycle displaying high levels of cyclin D1. Please see also response to Reviewer #1. Concerning the experiments involving palbociclib, we limited confounding effects on the cell cycle by treating cells with palbociclib for only 4-6 hours. Under these conditions, there is simply not enough time for the cells to arrest in G1.

      It is unclear if GTSE1 drives the G1/S transition. Presumably, this is part of the authors' model and should be tested.

      We are not claiming that GTSE1 drives the G1/S transition.  GTSE1 is known to promote cell proliferation, but how it performs this task is not well understood.  Our experiments indicate that, when overexpressed, cyclin D1 promotes GTSE1 phosphorylation and its consequent stabilization.  In agreement with the literature, we show that higher levels of GTSE1 promote cell proliferation.  To measure cell cycle distribution upon expressing various forms of GTSE1, we will now perform FACS analyses and include them in the new version. 

      The proliferation assays need to be more quantitative. Figure 4B should be plotted on a log scale so that the slope can be used to infer the proliferation rate of an exponentially increasing population of cells. Figure 4c should be done with more replicates and error analysis since the effects shown in the lower right-hand panel are modest.

      In Figure 4B, we plotted data on a linear scale as done in the past (Donato et al. Nature Cell Biol. 2017, PMC5376241) to better represent the changes in total cell number over time. The experiments in Figure 4C were performed in triplicate. Error analysis was not included for simplicity, given the complexity of the data. We will include the other two sets of experiments in the revised version. While the effects shown in the lower right-hand panel of Figure 4C are modest, they demonstrate the same trend as those observed in the AMBRA1 KO cells (Figure 4C and Simoneschi et al., Nature 2021, PMC8875297). It's important to note that this effect is achieved through the stable expression of a single phosphomimicking protein, whereas AMBRA1 KO cells exhibit changes in numerous cell cycle regulators.
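
      The reviewer's point about the log scale can be illustrated with a minimal sketch. For an exponentially growing population, ln(cell number) is linear in time, so the slope of a least-squares fit estimates the proliferation rate and ln(2)/slope gives the doubling time. The cell counts and the function names below are hypothetical and are not taken from Figure 4B or from the authors' analysis.

```python
import math


def growth_rate(times_h, counts):
    """Least-squares slope of ln(count) versus time, in units of 1/hour."""
    if len(times_h) != len(counts) or len(times_h) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(c) for c in counts]
    t_mean = sum(times_h) / len(times_h)
    y_mean = sum(logs) / len(logs)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return sxy / sxx


def doubling_time(rate):
    """Doubling time implied by an exponential growth rate."""
    return math.log(2) / rate


# Hypothetical counts for a population that doubles every 24 hours.
times = [0, 24, 48, 72]
counts = [1000, 2000, 4000, 8000]
rate = growth_rate(times, counts)
print(f"growth rate: {rate:.4f} per hour")
print(f"doubling time: {doubling_time(rate):.1f} h")
```

      On a linear axis the same data curve upward and the rate is hard to read off; on a log axis two conditions with different proliferation rates appear as straight lines with different slopes.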

      We appreciate the constructive comments and suggestions made by the reviewers, and we believe that the resulting additions and changes will improve the clarity and message of our study.

    1. Author response:

      We thank the reviewers for their valuable comments and recommendations for improvement. In this provisional response we aim to address a few of the major concerns and briefly outline a plan for revision. We plan to submit a more detailed response along with the revised manuscript.

      The reviewers have suggested additional analyses to strengthen the manuscript. As noted, the primary focus of this paper is on single units, to act as a starting point in the characterization of orofacial sensorimotor cortical activity in relation to tongue direction. Research on the cortical mechanisms that underlie sensorimotor control of tongue movements has lagged research on limb movements. Thus, the goal of our paper was to first characterize the individual neuron’s 3D directional tuning properties to provide a basis for future in-depth analysis of population dynamics. However, as multiple reviewers have pointed out the strengths of further investigating population activity, we will aim to address this through additional analysis and discussion. Our starting point for this will be to try other decoding algorithms and dimensionality reduction techniques.

      Reviewers 1 and 2 suggested we compare a subset of trials from the nerve block dataset that has similar kinematics to the control to eliminate the confounding effect of differing kinematics between the two conditions. We did this for feeding, by sampling an equal number of trials with similar kinematics for both control and nerve block despite the different overall distribution. We will be sure to make this clearer within the text. We will also implement this for drinking by subsampling trials from each category with similar kinematics to see if this can account for the difference in neural activity.
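
      One way such kinematics-matched subsampling could be done is sketched below, assuming each trial is summarized by a single scalar kinematic measure (a hypothetical simplification; the actual variables and matching procedure used in the study are not described here). Trials are binned by that measure, and an equal number of trials is drawn from each condition within every bin that both conditions share.

```python
import math
import random
from collections import defaultdict


def matched_subsample(control, block, bin_width, seed=0):
    """Return (control_idx, block_idx) with equal trial counts per kinematic bin.

    control, block: one scalar kinematic value per trial (hypothetical measure).
    bin_width: width of the bins used to define "similar" kinematics.
    """
    rng = random.Random(seed)

    def bin_trials(values):
        bins = defaultdict(list)
        for i, v in enumerate(values):
            bins[math.floor(v / bin_width)].append(i)
        return bins

    c_bins, b_bins = bin_trials(control), bin_trials(block)
    c_keep, b_keep = [], []
    # Only bins populated in both conditions can contribute matched trials.
    for key in sorted(c_bins.keys() & b_bins.keys()):
        k = min(len(c_bins[key]), len(b_bins[key]))
        c_keep += rng.sample(c_bins[key], k)
        b_keep += rng.sample(b_bins[key], k)
    return sorted(c_keep), sorted(b_keep)


# Made-up values standing in for a per-trial kinematic measure.
control_vals = [1.0, 1.2, 1.4, 2.1, 2.3, 5.0]
block_vals = [1.1, 2.2, 2.4, 2.6, 3.5]
c_idx, b_idx = matched_subsample(control_vals, block_vals, bin_width=1.0)
print(c_idx, b_idx)
```

      Trials falling in bins that only one condition occupies are discarded, so the matched subsets have the same size and the same coarse distribution of the chosen measure.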

      We understand that while using a small number of datasets is typical in non-human primate neuroscience, the inclusion of additional data would greatly reinforce our findings. We are working to process data from other sessions and have completed a few since this submission. We will run these through the analysis and consider adding a comparison to the manuscript.

      Reviewer 3 has raised a valid point that the different movement of the jaw may be a confounding factor in our study of tongue movements. We reported in our recent paper (see Supplementary Fig. 4 in Laurence-Chasen et al., 2023) that “Through iterative sampling of sub-regions of the test trials, we found that correlation of tongue kinematic variables with mandibular motion does not account for decoding accuracy. Even at times where tongue motion was completely un-correlated with the jaw, decoding accuracy could be quite high.” We expect that this also will be true for the analysis of single-unit activity.  

      To address the concern on the robustness of our analytical methods, we plan to show the variability of neural firing rates across trials using the Fano factor and use a bootstrap test for the directional tuning analysis.
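
      As a rough illustration of the first of these measures, the Fano factor is the variance of the spike count across trials divided by its mean, and a percentile bootstrap over trials gives a confidence interval for it. The sketch below uses made-up spike counts and hypothetical function names; the bootstrap test the authors plan for directional tuning is not specified in this response and may differ.

```python
import random
import statistics


def fano_factor(spike_counts):
    """Across-trial variance of the spike count divided by its mean."""
    mean = statistics.fmean(spike_counts)
    if mean == 0:
        raise ValueError("Fano factor is undefined when the mean count is zero")
    return statistics.variance(spike_counts) / mean


def bootstrap_ci(spike_counts, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Fano factor."""
    rng = random.Random(seed)
    n = len(spike_counts)
    estimates = []
    for _ in range(n_boot):
        resample = [spike_counts[rng.randrange(n)] for _ in range(n)]
        if sum(resample) == 0:
            continue  # skip resamples where the statistic is undefined
        estimates.append(fano_factor(resample))
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Made-up spike counts from eight trials of one neuron in one condition.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"Fano factor: {fano_factor(counts):.3f}")
print("95% bootstrap CI:", bootstrap_ci(counts))
```

      A Fano factor near 1 is what a Poisson process would give; values well above 1 indicate more trial-to-trial variability than Poisson firing.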

      As recommended, we will expand the introduction/discussion to further contextualize the results of this paper within the existing literature and attempt to clarify some of the sections that reviewers have identified.

      “Have the authors confirmed or characterized the strength of their inactivation or block, I was unable to find any electrophysiological evidence characterizing the perturbation.”

      The strength of the nerve block is characterized by a decrease in baseline firing rate of SIo neurons. We can include a figure showing this as supplementary material in the revised version.

      “Can the authors explain (or at least speculate) why there was such a large difference in behavioral effect due to nerve block between the two monkeys (Figure 7)?”

      We acknowledge this as a variable inherent to this type of experimentation. Previous studies have found large kinematic variation in the effect of oral nerve block as well as in the following compensatory strategies between subjects. Every animal’s biology and how they respond to perturbation will be different, which is something we are unable to prevent. Indeed, our subjects exhibited different feeding behavior even in the absence of nerve block perturbation (see Figure 2 in Laurence-Chasen et al., 2022). This is why each individual serves as its own control.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) All the figure legends need to expand significantly, so it is clear what is being presented. All experiments showing data quantification need the numbers of independent biological replicates to be added, plus an indication of what the P-values are associated with the asterisks (and the tests used).

      Thank you for your valuable suggestions. We will significantly expand the figure legends to provide a clear and detailed description of the data presented in each figure. Additionally, we will include dot plots in the bar graphs to illustrate the number of independent biological replicates for each experiment. Furthermore, we will specify the statistical tests used for each analysis and include the corresponding P-values associated with the asterisks in the figure legends.

      (2) Related to point 1, the description of the data in the text needs to expand significantly, so the figure panels are interpretable. Examples are given below but this is not an exhaustive list.

      We appreciate your feedback on the clarity of the data description in the text. In response to your suggestion, we will significantly expand the descriptions throughout the manuscript to ensure that each figure panel is fully interpretable. The revised text will provide a more detailed and comprehensive explanation of the data presented.

      (3) The addition of "super-enhancer-driven" to the title is a distraction. This is the starting point but the finding is portrayed by the last part of the title. Moreover, it is not clear why this is a super enhancer rather than just a typical enhancer as only one seems to be relevant and functional. I suggest avoiding this term after initial characterisations.

      Thank you for your thoughtful comment. In this study, the key molecule ZFP36L1 was identified as a target gene through the characterization of the super-enhancer ZFP36L1-SE. The enrichment of H3K27ac at this site meets the threshold defined by the ROSE algorithm, and transcription of ZFP36L1 is regulated by BRD4, making it susceptible to inhibition by the super-enhancer inhibitor JQ1. Although we were unable to directly observe the effects of knocking out the ZFP36L1-SE via Cas9 due to experimental constraints, we believe that the indirect evidence we have gathered is sufficient to demonstrate the super-enhancer's driving role. This approach is consistent with the conventions of previous studies on super-enhancers.

      (4) The descriptions of Figures 1B, C, and D are very poor. How for example do you go from nearly 2000 SE peaks to a couple of hundred target genes? What are the other 90% doing? What is the definition of a target gene? This whole start section needs a complete overhaul to make it understandable and this is important as is what leads us to ZFP36L1 in the first place.

      We appreciate your feedback and apologize for the confusion caused by the initial descriptions. As described in the manuscript, the function of SE peaks depends on their location. Figure 1C shows the distribution of these peaks, where "Over 50% of these peaks were located in the non-coding regions such as exons and introns, and their predicted target genes were transcribed to produce non-coding RNAs; the peaks distributed in transcription start and termination sites activated the promoters and directly drove the transcription of protein-coding genes". Our research focuses on protein-coding genes, and we apologize for any misunderstanding due to the inadequate description. We will provide additional clarification to make this distinction clear.

      (5) It is impossible to work out what Figures 1F, H, and I are from the accompanying text. The same applies to supplementary Figure S1D. Figure 1G is not described in the results.

      Thank you for pointing out these issues. We will make the necessary revisions to provide additional explanations for Figures 1F, H, I, G, and supplementary Figure S1D.

      (6) What is Figure 2A? There is no axis label or description.

      Thank you for bringing this to our attention. We will add the missing axis labels and provide a detailed description for Figure 2A to ensure clarity and accurate interpretation.

      (7) Why is CD274 discussed in the text from Figure 2E but none of the other genes? The rationale needs expanding.

      CD274 (also known as PD-L1) is a key focus of our subsequent research. The other immune checkpoints are not expressed on tumor cells but rather on immune cells. We will provide additional explanation in the text to clarify this distinction.

      (8) Figure 2G needs zooming in more over the putative SE region and the two enhancers labelling. This looks very strange at the moment and does not show typical peak shapes for histone acetylation at enhancers.

      We appreciate your feedback. Our intention with Figure 2G was to present the position of ZFP36L1-SE at a macro level rather than focusing on specific details. This broader view is meant to provide context for the SE region in relation to the surrounding genomic landscape.

      (9) The use of JQ1 does not prove something is a super enhancer, just that it is BRD4 regulated and might be a typical enhancer.

      Thank you for your comment. The role of JQ1 as a super-enhancer inhibitor has been widely reported and recognized in the literature. Its use in experimental studies targeting super-enhancers is a well-established practice. We acknowledge that while JQ1 inhibition indicates BRD4 regulation, it is consistent with the identification of super-enhancers as well.

      (10) An explanation of how the motifs were identified in E1 is needed. Enrichment over what? Were they purposefully looking for multiple motifs per enhancer? Otherwise what it all comes down to later in the figure is a single motif, and how can that be "enriched"?

      Thank you for your feedback. We used the MEME-ChIP online tool for motif identification, which is a widely recognized method in transcription factor research. MEME-ChIP applies established algorithms to identify known motifs within DNA sequences. For detailed information on the tool's working principles and algorithms, please refer to the reference provided and the URL included in the Materials and Methods section of our manuscript. MEME-ChIP: https://meme-suite.org/meme/tools/meme.

      (11) A major missing experiment is to deplete rather than over-express SPI1 for the various assays in Figure 4.

      We apologize for this oversight and acknowledge that the depletion of SPI1, in addition to over-expression, would have provided a more comprehensive analysis. Due to experimental constraints, we are unable to include this depletion experiment in the current study. We appreciate your understanding and will consider this suggestion for future research.

      (12) The authors start jumping around cell lines, sometimes with little justification. Why is MGC803 used in Figure 4I rather than MKN45? This might be due to more endogenous SPI1. However, this does not make sense in Figure 5M, where ZFP36L1 is overexpressed in this line rather than MKN45. If SPI1 is already high in MGC803, then the prediction is that ZFP36L1 should already be high. Is this the case?

      Thank you for your feedback. We want to clarify that we are not arbitrarily jumping between cell lines. Each experiment was validated in two different cell lines. We aimed to present representative results within the constraints of the manuscript, but if more detailed results from additional cell lines are needed, we can provide them upon request. Regarding your concern, results from the MKN45 cell line are consistent with those observed in MGC803, and these findings are not influenced by SPI1 or ZFP36L1 expression levels.

      (13) In Figure 5, HDAC3 should also be depleted to show opposite effects to over-expression (as the latter could be artefactual). Also, direct involvement should be proven by ChIP.

      We appreciate your feedback. We acknowledge that depleting HDAC3, in addition to overexpressing it, would provide a more comprehensive analysis. Unfortunately, due to experimental constraints, we are unable to include this depletion experiment in the current study. We recognize these limitations and appreciate your understanding. We will consider these aspects for future research. Additionally, we would like to clarify that HDAC3 is a histone deacetylase and not a transcription factor, so it does not directly bind to DNA and therefore is not suitable for ChIP analysis.

      (14) Figure 5G and H are not discussed in the text.

      Thank you for pointing this out. We will include a discussion of Figures 5G and H in the revised manuscript. The additional details should provide the necessary context and interpretation for these figures.

      (15) Figure 6C needs explaining. Why are three patients selected here? Are these supposed to be illustrative of the whole cohort? What sub-type of GC are these?

      Thank you for your comment. The three patients with infiltrative GC shown in Figure 6C were selected as representative images based on prior reviewer suggestions.

      (16) Figure 6E onwards, they switch to the MFC cell line. They provide a rationale but the key regulatory axis should be shown to also be operational in these cells to use this as a model system.

      Thank you for your comment. We would like to clarify that we used the MC38 cell line, which is a colon cancer cell line, rather than MFC. Our focus was on demonstrating the role of ZFP36L1 in vivo, rather than specifically discussing the regulatory axis in this context. We chose MC38 cells instead of MFC cells due to practical considerations. Specifically, MFC cells were shown in our experiments to be unable to form tumors in wild-type mice, despite previous reports suggesting their tumorigenicity. We will provide a rationale for this choice in the manuscript. We acknowledge that validating the entire regulatory axis in the MC38 cell line would enhance the study's depth. However, due to experimental constraints, we are unable to complete this additional validation. We appreciate your understanding and will consider this aspect for future research.

      Reviewer #2 (Public Review):

      (17) The difference in H3K27ac over the ZFP36L1 locus and SE between the expanding and infiltrative GC is marginal (Figure 2G). Although the authors establish that ZFP36L1 is upregulated in GC, particularly in the infiltrative subtype, no direct proof is provided that the identified SE is the source of this observation. CRISPR-Cas9 should be employed to delete the identified SE to prove that it is causatively linked to the expression of ZFP36L1.

      Thank you for your thoughtful comment. In this study, the key molecule ZFP36L1 was identified as a target gene through the characterization of the super-enhancer ZFP36L1-SE. The enrichment of H3K27ac at this site meets the threshold defined by the ROSE algorithm, and transcription of ZFP36L1 is regulated by BRD4, making it susceptible to inhibition by the super-enhancer inhibitor JQ1. Although we were unable to directly observe the effects of knocking out the ZFP36L1-SE via Cas9 due to experimental constraints, we believe that the indirect evidence we have gathered is sufficient to demonstrate the super-enhancer's driving role. This approach is consistent with the conventions of previous studies on super-enhancers.
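      To make the "threshold defined by the ROSE algorithm" more concrete, the following is a minimal sketch of a ROSE-style cutoff, not the ROSE code itself and not the authors' pipeline. It assumes the commonly described procedure: enhancers are ranked by H3K27ac signal, rank and signal are each scaled to the range 0 to 1, and the cutoff is placed where the ranked curve has a slope of 1. The function names and the synthetic signal values are hypothetical.

```python
import math


def rose_style_cutoff(signals):
    """Return the signal value at which the scaled ranked curve has slope ~1.

    Assumption: this mimics the tangent-line idea attributed to ROSE; it is
    an illustrative approximation, not a reimplementation of the tool.
    """
    ranked = sorted(signals)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two enhancer signals")
    lo, hi = ranked[0], ranked[-1]
    if hi == lo:
        raise ValueError("all signals are identical; no inflection point")
    # For a convex, increasing curve y(x), the point where the slope equals 1
    # is the point that minimizes y - x (the diagonal is tangent there).
    best = min(range(n), key=lambda i: (ranked[i] - lo) / (hi - lo) - i / (n - 1))
    return ranked[best]


def call_super_enhancers(signals):
    """Flag enhancers whose signal lies above the ROSE-style cutoff."""
    cutoff = rose_style_cutoff(signals)
    return [s > cutoff for s in signals]


# Synthetic, hypothetical signal values with a long right tail.
signals = [math.exp(i / 10) for i in range(100)]
flags = call_super_enhancers(signals)
n_super = sum(flags)
print(f"{n_super} of {len(signals)} enhancers fall above the cutoff")
```

      A region that passes such a cutoff is, by this operational definition, a super-enhancer candidate; the cutoff on its own does not show that the region drives expression of a given gene, which is the point the reviewer raises.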

      (18) In Figure 3C the impact of shZFP36L1 on PD-L1 expression is marginal and it is observed in the context of IFNg stimulation. Moreover, in XGC-1 cell line the shZFP36L1 failed to knock down protein expression thus the small decrease in PD-L1 level is likely independent of ZFP36L1. The same is the case in Figure 3D where forced expression of ZFP36L1 does not upregulate the expression of PDL1 and even in the context of IFNg stimulation the effect is marginal.

      Thank you for your detailed observations. In our study, the regulatory effect of ZFP36L1 on PD-L1 was validated at the mRNA level, protein level, and through flow cytometry, with each experiment being repeated multiple times. The results of the Western blot were quantitatively assessed using densitometry rather than relying solely on visual inspection. It is important to note that interferon-gamma (IFNγ) stimulation significantly enhances PD-L1 expression, which under the same exposure conditions, may make the baseline expression of PD-L1 appear unchanged. This could explain the marginal effect observed under IFNγ stimulation.

      (19) In Figure 4, it is unclear why ELF1 and E2F1 that bind ZFP36L1-SE do not upregulate its expression and only SPI1 does. In Figure 4D the impact of SPI overexpression on ZFP36L1 in MKN45 cells is marginal. Likewise, the forced expression of SPI did not upregulate PD-L1 which contradicts the model. Only in the context of IFNg PD-L1 is expressed suggesting that whatever role, if any, ZFP36L1-SPI1 axis plays is secondary.

      Thank you for your insightful comments. First, ELF1, E2F1, and SPI1 were predicted transcription factors, and experimental validation is crucial. Our results specifically demonstrate that only SPI1 binds to ZFP36L1-SE, while ELF1 and E2F1 do not, confirming the specificity of SPI1. Second, as mentioned in point (18), experimental results, such as those from western blot, should not be evaluated by eye alone. Our findings are quantitatively assessed, and the regulatory relationships have been confirmed through repeated experiments, including mRNA, protein, and flow cytometry analyses. Furthermore, using IFNγ to study the regulation of PD-L1 is a common and widely accepted approach in this field. Many studies adopt this model, and it should not be concluded that the axis is secondary simply because PD-L1 expression is observed primarily under IFNγ stimulation. Similarly, other popular research areas, such as ferroptosis and autophagy, also use specific inducers, but this does not diminish the significance of the pathways being studied.

      (20) The data presented in Figure 6 are not convincing. First, there is no difference in the tumor growth (Figure 6E). IHC in Figure 6I for CD8a is misleading. Can the authors provide insets to point CD8a cells? This figure also needs quantification and review from a pathologist.

      Regarding this observation, we will provide an explanation in the discussion section: "Several studies have proposed that reducing PD-L1 expression enhances the tumor-killing effect of cytotoxic T lymphocytes in vitro and reduces primary tumor foci in vivo. Conversely, findings from this study suggest that PD-L1 expression is associated with immune evasion in metastatic foci." We are unsure why those studies concluded that PD-L1 expression levels would impact the size of the primary tumor. We are more inclined to support the perspective of Klement et al.:

      Klement JD, Redd PS, Lu C, et al. Tumor PD-L1 engages myeloid PD-1 to suppress type I interferon to impair cytotoxic T lymphocyte recruitment. Cancer Cell. 2023;41(3):620-636.e9. doi:10.1016/j.ccell.2023.02.005

      Reviewer #1 (Recommendations For The Authors):

      (21) Supplementary Figure 1 lacks a legend.

      We will add the legend for Supplementary Figure 1.

      (22) Figure 1E, data from "expanding" GC samples is not discussed.

      We will add a discussion of the "expanding" GC samples in the manuscript.

      (23) How are "high" and "low" defined in Figure 2A, right?

      Thank you for your question. In Figure 2A, the "high" and "low" categories on the x-axis are derived from the Friends analysis. This analysis is designed to compare the similarity between different genes or gene sets based on semantic similarity metrics from Gene Ontology (GO). The x-axis represents the semantic similarity score, which reflects how closely related the functions of the genes or gene sets are. This helps in identifying the most significant genes or those related to specific pathways or cell types of interest.

      GOSemSim[2.22.0]

      Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics. 2010;26(7):976-978. doi:10.1093/bioinformatics/btq064
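      As an illustration of the kind of score plotted on such an axis, the sketch below ranks genes by their mean functional similarity to the other genes in a set. It is a simplified stand-in, not the authors' analysis: Jaccard overlap of annotation sets replaces the GO-graph-based measures that GOSemSim provides, and the gene names and annotation terms are placeholders. How the "high" and "low" groups were actually assigned is not stated in the response, so no grouping rule is encoded here.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two annotation sets (stand-in for a GO semantic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def friends_scores(annotations):
    """Mean similarity of each gene to every other gene in the set."""
    genes = list(annotations)
    if len(genes) < 2:
        raise ValueError("need at least two genes to compare")
    totals = {g: 0.0 for g in genes}
    for g1, g2 in combinations(genes, 2):
        s = jaccard(annotations[g1], annotations[g2])
        totals[g1] += s
        totals[g2] += s
    return {g: totals[g] / (len(genes) - 1) for g in genes}


# Placeholder genes and annotation terms (hypothetical, for illustration only).
toy = {
    "geneA": {"term1", "term2", "term3"},
    "geneB": {"term2", "term3", "term4"},
    "geneC": {"term3", "term4"},
    "geneD": {"term9"},
}
scores = friends_scores(toy)
ranked = sorted(scores, key=scores.get, reverse=True)
for gene in ranked:
    print(gene, round(scores[gene], 3))
```

      Genes with the highest mean similarity share the most functional annotation with the rest of the set, which is why this kind of ranking is used to nominate central genes.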

      (24) Font sizes in multiple figures need to increase. For example, Figure 2C (but many other places).

      The font sizes in the figures, including Figure 2C, will be increased as requested.

      (25) Figure 4K assays TE activity, not SE as stated in the text.

      SEs are composed of multiple TEs. ZFP36L1-E1 is a core element of the ZFP36L1-SE. Due to the excessive length of the ZFP36L1-SE sequence, it was not feasible to insert the entire SE into a dual-luciferase reporter plasmid. It is a common practice to validate such experiments by inserting the typical enhancer elements instead.

      (26) In Figure 6I, why is CD8 shown? What is the reason for choosing this?

      CD8α is primarily used to assess immune evasion by tumor cells against T-cell cytotoxicity. CD8α is typically negatively correlated with PD-L1 expression and serves as an indicator of T-cell infiltration.

      (27) The discussion should be more focussed. The majority of this is general stuff about either super enhancers or PD-L1 regulation. This should be curtailed and more pertinent things retained.

      We will revise the discussion to be more focused. The content will be streamlined to emphasize the most pertinent points related to our study.

      Reviewer #2 (Recommendations For The Authors):

      (28) In Figure 1H various immune cell populations differ between the two types of GC. Unclear what is the biological significance in the context of ZFP36L1.

      The results in Figure 1H provide insight into the SE-driven immune escape signatures of infiltrative gastric cancer (GC). These findings help to contextualize the role of ZFP36L1 in modulating the tumor microenvironment, particularly in relation to immune cell infiltration and immune evasion mechanisms.

      (29) A bivalent profile for H3K27ac is also observed in expanding gastric cancer (Figure 1B), not only in infiltrating GC as the authors claim.

      We did not intend to imply that bivalent H3K27ac enrichment is exclusive to infiltrating gastric cancer. In fact, super-enhancers were identified in both expanding and infiltrative GC. Our point was to highlight that the bivalent enrichment profile is more pronounced in infiltrative GC.

      (30) There is a typo in line 81.

      The typo in line 81 will be corrected. Thank you for pointing it out.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public reviews

      This study describes a group of CRH-releasing neurons, located in the paraventricular nucleus of the hypothalamus, which, in mice, affects both the state of sevoflurane anesthesia and a grooming behavior observed after it. PVHCRH neurons showed elevated calcium activity during the post-anesthesia period. Optogenetic activation of these PVHCRH neurons during sevoflurane anesthesia shifts the EEG from burst-suppression to a seemingly activated state (an apparent arousal effect), although without a behavioral correlate. Chemogenetic activation of the PVHCRH neurons delays sevoflurane-induced loss of righting reflex (another apparent arousal effect). On the other hand, chemogenetic inhibition of PVHCRH neurons delays recovery of righting reflex and decreases sevoflurane-induced stress (an apparent decrease in the arousal effect). The authors conclude that PVHCRH neurons "integrate" sevoflurane-induced anesthesia and stress. The authors also claim that their findings show that sevoflurane itself produces a post-anesthesia stress response that is independent of any surgical trauma, such as an incision. In its revised form, the article does not achieve its intended goal and will not have impact on the clinical practice of anesthesiology nor on anesthesiology research.

      Thanks for the reviews. Please see our responses to the following comments.

      Weaknesses:

      The most significant weaknesses remain:

      a) overinterpretation of the significance of their findings

      b) the failure to use another anesthetic as a control,

      c) a failure to compellingly link their post-sevoflurane measures in mice to anything measured in humans, and

      d) limitations in the novelty of the findings. These weaknesses are related to the primary concerns described below:

      Concerns about the primary conclusion that PVHCRH neurons integrate the anesthetic effects and post-anesthesia stress response of sevoflurane GA

      (1) After revision, there remain multiple places where it is claimed that PVHCRH neurons mediate the anesthetic effects of sevoflurane (impact statement: we explain "how sevoflurane-induced general anesthesia works..."; introduction: "the neuronal mechanisms that mediate the anesthetic effects...of sevoflurane GA remain poorly understood" and "PVHCRH neurons may act as a crucial node integrating the anesthetic effect and stress response of sevoflurane"). The manuscript simply does not support these statements. The authors show that a short duration exposure to sevoflurane inhibits PVHCRH neurons, but this is followed by hyperexcitability of these neurons for a short period after anesthesia is terminated. They show that the induction and recovery from sevoflurane anesthesia can be modulated by PVHCRH neuronal activity, most likely through changes in brain state (measured by EEG). They also show that PVHCRH neuronal activity modulates corticosterone levels and grooming behavior observed post-anesthesia (which the authors argue are two stress responses). These two things (effects during anesthesia and effects post-anesthesia) may be mechanistically unrelated to each other. None of these observations relate to the primary mechanism of action for sevoflurane. All claims relating to "anesthetic effects" should be removed. Even the term "integration" seems wrong: it implies the PVH is combining information about the anesthetic effect and post-anesthesia stress responses.

      As requested, we have removed all claims related to ‘anesthetic effects’ or ‘integration’. Please see the revised manuscript.

      (2) It is important to compare the effects of sevoflurane with at least one other inhaled ether anesthetic as one step towards elevating the impact of this paper to the level required for a journal such as eLife. Isoflurane, desflurane, and enflurane are ether anesthetics that are very similar to each other, as well as being similar to sevoflurane. For example, one study cited by the authors (Marana et al. 2013) concludes that there is weak evidence for differences in stress-related hormones between sevoflurane and desflurane, with lower levels of cortisol and ACTH observed during the desflurane intraoperative period. It is important to determine whether desflurane activates PVHCRH neurons in the post-anesthesia period, and whether this is accompanied by excess grooming in the mice, because this will distinguish whether the effects of sevoflurane generalize to other inhaled anesthetics, or, alternatively, relate to unique idiosyncratic properties of this gas that may not be a part of its anesthetic properties.

      Thanks for your insightful comments and suggestions. Regarding your request for additional experiments, we acknowledge the value they could add to our study. However, investigating whether the effects of sevoflurane generalize to other inhaled anesthetics is beyond the scope of our current study. There is evidence indicating the prevalence of anesthetic stress caused by inhaled ether anesthetics (1, 2). The post-anesthesia stress-related behaviors caused by sevoflurane administration are reminiscent of delirium observed clinically. Notably, studies have shown that the use of desflurane for maintenance of anesthesia did not significantly affect the incidence or duration of delirium compared to sevoflurane administration (3). This suggests that our observations likely represent a generalized response to inhaled ether anesthetics rather than being specific to sevoflurane.

      Concerns about the clinical relevance of the experiments

      In anesthesiology practice, perioperative stress observed in patients is more commonly related to the trauma of the surgical intervention, with inadequate levels of antinociception or unconsciousness intraoperatively and/or poor post-operative pain control. The authors seem to be suggesting that the sevoflurane itself is causing stress because their mice receive sevoflurane but no invasive procedures, but there is no evidence of sevoflurane inducing stress in human patients. It is important to know whether sevoflurane effectively produces behavioral stress in the recovery room in patients that could be related to the putative stress response (excess grooming) observed in mice. For example, in surgeries or procedures which required only a brief period of unconsciousness that could be achieved by administering sevoflurane alone (comparable to the 30 min administered to the mice), is there clinical evidence of post-operative stress? It is also important to describe a rationale for using a 30 min sevoflurane exposure. What proportion of human surgeries using sevoflurane use exposure times that are comparable to this?

      It is also the case that there are explicit published findings showing that mild and moderate surgical procedures in children receiving sevoflurane (which might be the closest human proxy to the brief 30 minutes sevoflurane exposure used here) do not have elevated cortisol (Taylor et al, J Clin Endocrinol Metab, 2013). This again raises the question of whether the enhanced grooming or elevated corticosterone observed in the mice here has any relevance to humans.

      Thanks for the comments. Most ear, nose, and throat surgeries in children involve a short period of anesthesia with sevoflurane alone (4-6), which is similar to the 30-minute exposure in our mouse study. In clinical settings, emergence delirium and agitation are common in young children undergoing sevoflurane anesthesia (7), often accompanied by troublesome excitation phenomena during induction and awakening (8). These clinical observations align with the post-operative stress response (e.g., excessive grooming) we identified in our study.

      It is the experience of one of the reviewers that human patients who receive sevoflurane as the primary anesthetic do not wake up more stressed than if they had had one of the other GABAergic anesthetics. If there were signs of stress upon emergence (increased heart rate, blood pressure, thrashing movements) from general anesthesia, this would be treated immediately. The most likely cause of post-operative stress behaviors in humans is probably inadequate anti-nociception during the procedure, which translates into inadequate post-op analgesia and likely delirium. It is the case that children receiving sevoflurane do have a higher likelihood of post-operative delirium. Perhaps the authors' studies address a mechanism for delirium associated with sevoflurane, but this is barely mentioned. Delirium seems likely to be the closest clinical phenomenon to what was studied. As noted by the Besnier et al (2017) article cited by the authors, surgery can elevate postoperative glucocorticoid stress hormones, but it generally correlates with the intensity of the surgical procedure. Besnier et al also note the elevation of glucocorticoids is generally considered to be adaptive. Thus, reducing glucocorticoids during surgery with sevoflurane may hamper recovery, especially as it relates to tissue damage, which was not measured or considered here. This paper only considers glucocorticoid release as a negative factor, which causes "immunosuppression", "proteolysis", and "delays postoperative recovery and leads to increased morbidity".

      Thanks for the comments. We agree that the post-anesthetic stress behaviors mentioned in our manuscript are similar to the clinical phenomenon of delirium, which was defined in Cheng Li’s study as ‘sevoflurane-induced post-operative delirium’ (9). Therefore, we conducted additional behavioral tests for cognitive function, including the Y-maze and novel object recognition test, in mice administered 30-minute sevoflurane anesthesia. The results demonstrate that chemogenetic inhibition of PVHCRH neurons ameliorated the short-term memory impairment in mice exposed to 30-minute sevoflurane GA (Figure 7-figure supplement 9), suggesting PVHCRH neurons may be involved in modulating sevoflurane-induced postoperative delirium.

      Concerns about the novelty of the findings:

      The key finding here is that CRH neurons mediate measures of arousal, and arousal modulates sevoflurane anesthesia induction and recovery. However, CRH is associated with arousal in numerous studies. In fact, the authors' own work, published in eLife in 2021, showed that stimulating the hypothalamic CRH cells led to arousal and their inhibition promoted hypersomnia. In both papers the authors use fos expression in CRH cells during a specific event to implicate the cells, then manipulate them and measure EEG responses. In the previous work, the cells were active during wakefulness; here, they were active in the awake state that follows anesthesia (Figure 1). Thus, the findings in the current work are incremental and not particularly impactful. Claims like "Here, a core hypothalamic ensemble, corticotropin-releasing hormone neurons in the paraventricular nucleus of the hypothalamus, is discovered" are overstated. PVHCRH cell populations were discovered in the 1980s. Suggesting that it is novel to identify that hypothalamic CRH cells regulate post-anesthesia stress is unfounded as well: this PVH population has been shown over four decades to regulate a plethora of different responses to stress. Anesthesia stress is no different. Their role in arousal is not being discovered in this paper. Even their role in grooming is not discovered in this paper.

      Thanks for the comments. As requested, we have revised our manuscript by removing overstated sentences. Please see the revised manuscript. In terms of novelty, our study reveals that PVHCRH neurons are implicated not only in the induction and emergence of sevoflurane general anesthesia but also in sevoflurane-induced post-operative delirium. This finding represents a novel contribution to the field, as it has not been previously reported by other studies.

      The activation of CRH cells in PVH has already been shown to result in grooming by Jaideep Bains (a paper cited by the authors). Thus, the involvement of these cells in this behavior is not surprising. The authors perform elaborate manipulations of CRH cells and numerous analyses of grooming and related behaviors. For example, they compare grooming and paw licking after anesthesia with those after other stressors such as forced swim, spraying mice with water, physical attack and restraint. The authors have identified a behavioral phenomenon in a rodent model that does not have a clear correlation with a behavior state observed in humans during the use of sevoflurane as part of an anesthetic regimen. The grooming behaviors are not a model of the emergence delirium or the cognitive dysfunction observed commonly in patients receiving sevoflurane for general anesthesia. Emergence delirium is commonly seen in children after sevoflurane is used as part of general anesthesia and cognitive dysfunction is commonly observed in adults, particularly the elderly, following general anesthesia. No features of delirium or cognitive dysfunction are measured here.

      As requested, behavioral tests for cognitive function have been conducted and displayed in Figure 7-figure supplement 9.

      Other concerns:

      In Figure 2, cFos was measured in the PVH at different points before, during and after sevoflurane. The greatest cFos expression was seen in Post 2, the latest time point after anesthesia. However, this may simply reflect the fact that there is a delay between activity levels and expression of cFos (as noted by the authors, 2-3 hours). Thus, sacrificing mice 30 minutes after the onset of sevoflurane application would be expected to drive minimal cFos expression, and the cFos observed at 30 minutes would not accurately reflect the activity levels during the sevoflurane. Also, the authors state that the hyperactivity, as measured by cFos, lasted "approximately 1 hours before returning to baseline", but there is no data to support this return to baseline.

      Thanks for the comments. We apologize that the protocol we used for c-fos staining may not accurately reflect the activity levels, so we have removed Figure 2F. The sentence ‘lasted approximately 1 hours before returning to baseline’ refers to the calcium signal but not c-fos level.

      In Figure 7, the number of animals appears to change from panel to panel even though they are supposed to show animals from the same groups. For example, cort was measured in only 3 saline-treated O2 animals (Fig 7E), but cFos and CRH were assessed in 4 (Fig 7C, D). Similarly, grooming time and time spent in open arms was measured in 6 saline-treated O2 controls (Fig 7F, H) but central distance was measured in 8 (Fig 7G). There are other group number discrepancies in this figure: the number of data points in the plots does not match what is reported in the legend for numerous groups. Similarly, Figure 4 has a mismatch between the Ns reported in the legend and the number of points plotted per bar. For example, there were 10 animals in the hM3Di group; all are shown for the LORR and time to emergence plots, but only 8 were used for time to induction. The legends reported N=7 for the mCherry group, yet 9 are shown for the time to emergence panel. No reason for exclusions is cited. These figures (and their statistics) should be corrected.

      Thanks for the comments. We have rechecked and corrected our figures and illustrations in the revised manuscript.

      Recommendations for the authors:

      In Figure 6, the BSR pre-stim data points for panels F and H look exactly identical, even though these data are from two different sets of mice. It seems likely that one of these panels is not depicting the correct pre-stim data points. Please check this.

      Thanks for the comments. We have corrected this mistake.

      General anesthesia is a combination of behavioral and physiological states induced and maintained primarily by pharmacologic agents. The authors do not provide a definition of general anesthesia.

      Thanks for the advice. We have added the definition of general anesthesia in the introduction part.

      The first sentence of the abstract closely resembles the first sentence of the abstract of Brown, Purdon and Van Dort, Annu. Rev. Neurosci. 2011, 34:601-28, yet there is no citation.

      Thanks for the comments. We have revised the first sentence.

      In the Discussion, the authors cite the research on circuitry that is relevant for emergence from general anesthesia. Conspicuously missing from this section of the paper is the large body of work by Solt and colleagues which has demonstrated that dopamine agonists (such as methylphenidate), electrical stimulation of the ventral tegmental area and optogenetic stimulation of the D1 neurons in the ventral tegmental area can hasten emergence from general anesthesia. Also omitted is the work of Kelz and colleagues and a discussion of neural inertia.

      Thanks for the suggestions. We have added these citations as requested.

      As regards the weaknesses of p-values for reporting the results of scientific studies, I offer the following reference to the authors. Ronald L. Wasserstein & Nicole A. Lazar (2016) The ASA Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133, DOI: 10.1080/00031305.2016.1154108

      Thanks for the suggestions. We have revised the manuscript as requested.

      The methods for the CRF antibody are unclear. It was previously suggested that the antibody be validated (for example, show an absence of immunostaining with CRF knockdown) because the concentration of antiserum (1:800) is quite high, suggesting either the antibody is not potent or (more concerning) not specific. The methods also indicated that colchicine was infused ICV prior to perfusion for staining of cFos and CRF, but no surgical methods are described that would enable ICV infusion, and it is not clear why colchicine was used. Please clarify.

      The anti-CRF antibody has been validated by other studies (11, 12). For CRF immunostaining, animals' brains were pre-treated with intraventricular injections of colchicine (20 μg in 500 nL saline) 24 hours before perfusion to inhibit fast axonal transport (13, 14). Additional details regarding these methods have been included in the Methods section of the revised manuscript.

      Editor's note:

      Full statistical reporting including exact p-values alongside summary statistics (test statistic and df) and 95% confidence intervals is lacking.

      Thanks for the suggestions. We have added full statistical reporting in the revised manuscript as requested.

      References

      (1) Marana, E. et al. Desflurane versus sevoflurane: a comparison on stress response. Minerva Anestesiol 79, 7-14 (2013).

      (2) Yang, L., Chen, Z. & Xiang, D. Effects of intravenous anesthesia with sevoflurane combined with propofol on intraoperative hemodynamics, postoperative stress disorder and cognitive function in elderly patients undergoing laparoscopic surgery. Pak J Med Sci 38, 1938-1944, doi:10.12669/pjms.38.7.5763 (2022).

      (3) Driscoll, J. N. et al. Comparing incidence of emergence delirium between sevoflurane and desflurane in children following routine otolaryngology procedures. Minerva Anestesiol 83, 383-391, doi:10.23736/s0375-9393.16.11362-8 (2017).

      (4) Galinkin, J. L. et al. Use of intranasal fentanyl in children undergoing myringotomy and tube placement during halothane and sevoflurane anesthesia. Anesthesiology 93, 1378-1383, doi:10.1097/00000542-200012000-00006 (2000).

      (5) Greenspun, J. C., Hannallah, R. S., Welborn, L. G. & Norden, J. M. Comparison of sevoflurane and halothane anesthesia in children undergoing outpatient ear, nose, and throat surgery. J Clin Anesth 7, 398-402, doi:10.1016/0952-8180(95)00071-o (1995).

      (6) Messieha, Z. Prevention of sevoflurane delirium and agitation with propofol. Anesth Prog 60, 67-71, doi:10.2344/0003-3006-60.3.67 (2013).

      (7) Shi, M. et al. Dexmedetomidine for the prevention of emergence delirium and postoperative behavioral changes in pediatric patients with sevoflurane anesthesia: a double-blind, randomized trial. Drug Des Devel Ther 13, 897-905, doi:10.2147/dddt.S196075 (2019).

      (8) Veyckemans, F. Excitation and delirium during sevoflurane anesthesia in pediatric patients. Minerva Anestesiol 68, 402-405 (2002).

      (9) Xu, Y., Gao, G., Sun, X., Liu, Q. & Li, C. ATPase Inhibitory Factor 1 Is Critical for Regulating Sevoflurane-Induced Microglial Inflammatory Responses and Caspase-3 Activation. Front Cell Neurosci 15, 770666, doi:10.3389/fncel.2021.770666 (2021).

      (10) Friedman, E. B. et al. A conserved behavioral state barrier impedes transitions between anesthetic-induced unconsciousness and wakefulness: evidence for neural inertia. PLoS One 5, e11903, doi:10.1371/journal.pone.0011903 (2010).

      (11) Giardino, W. J. et al. Parallel circuits from the bed nuclei of stria terminalis to the lateral hypothalamus drive opposing emotional states. Nat Neurosci 21, 1084-1095, doi:10.1038/s41593-018-0198-x (2018).

      (12) Yeo, S. H., Kyle, V., Blouet, C., Jones, S. & Colledge, W. H. Mapping neuronal inputs to Kiss1 neurons in the arcuate nucleus of the mouse. PLoS One 14, e0213927, doi:10.1371/journal.pone.0213927 (2019).

      (13) de Goeij, D. C. et al. Repeated stress-induced activation of corticotropin-releasing factor neurons enhances vasopressin stores and colocalization with corticotropin-releasing factor in the median eminence of rats. Neuroendocrinology 53, 150-159, doi:10.1159/000125712 (1991).

      (14) Yuan, Y. et al. Reward Inhibits Paraventricular CRH Neurons to Relieve Stress. Curr Biol 29, 1243-1251.e1244, doi:10.1016/j.cub.2019.02.048 (2019).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Li Zhang et al. characterized two new Gram-negative endolysins identified through an AMP-targeted search in bacterial proteomes. These endolysins exhibit broad lytic activity against both Gram-negative and Gram-positive bacteria and retain significant antimicrobial activity even after prolonged exposure to high temperatures (100 °C for 1 hour). This stability is attributed to a temperature-reversible transition from a dimer to a monomer. The authors suggest several potential applications, such as complementing heat sterilization processes or being used in animal feed premixes that undergo high-temperature pelleting, which I agree with.

      We appreciate the reviewer’s valuable comments and suggestions.

      Strengths: 

      The claims are well-supported by relevant and complementary assays, as well as extensive bioinformatic analyses. 

      We appreciate the reviewer’s valuable comments and suggestions.

      Weaknesses: 

      There are numerous statements in the introduction and discussion sections that I currently do not agree with and consider need to be addressed. Therefore, I recommend major revisions. 

      Based on your valuable comments and suggestions, we have revised relevant introduction and discussion sections (pages 3-4, lines 82-101; page 21, lines 480-483).

      Major comments: 

      Introduction and Discussion: 

      The introduction and the discussion are currently too general and not focused. Furthermore, there are some key concepts that are missing and are important for the reader to have an overview of the current state-of-the-art regarding endolysins that target gram-negatives. Specifically, the concepts of 'Artilysins', 'Innolysins', and 'Lysocins' are not introduced. Besides this, the authors do not mention other high-throughput mining or engineering strategies for endolysins, such as e.g. the VersaTile platform, which was initially developed by Hans Gerstmans et al. for one of the targeted pathogens in this manuscript (i.e., Acinetobacter baumannii). Recent works by Niels Vander Elst et al. have demonstrated that this VersaTile platform can be used to high-throughput screen and hit-to-lead select endolysins in the magnitude tens of thousands. Lastly, Roberto Vázquez et al. have recently demonstrated with bio-informatic analyses that approximately 30% of Gram-negative endolysin entries have AMP-like regions (hydrophobic short sequences), and that these entries are interesting candidates for further wet lab testing due to their outer membrane penetrating capacities. Therefore, I fully disagree with the statement being made in the introduction that endolysin strategies to target Gram-negatives are 'in its infancy' and I urge the authors to provide a new introduction that properly gives an overview of the Gram-negative endolysin field.   

      We thank the reviewer for the valuable suggestions. A new paragraph has been added to the revised manuscript to reflect the concepts and strategies for lysin engineering and discovery against Gram-negative bacteria (pages 3-4, lines 82-101). 

      Results: 

      It should be mentioned that the halo assay is a qualitative assay for activity testing. I personally do not like that the size of the halos is used to discriminate in endolysin activity. In this reviewer's opinion, the size of the halo is highly dependent on (i) the molecular size of the endolysin as smaller proteins can diffuse further in the agar, and (ii) the affinity of the CBD subdomain of the endolysin for the bacterial peptidoglycan. It should also be said that in the halo assay, there is a long contact time between the endolysin and the bacteria that are statically embedded in the agar, which can result in false positive results. How did the authors mitigate this? 

      We quite agree with the reviewer that the halo assay is only a qualitative method for activity testing and may be perturbed by multiple parameters (DOI: 10.3390/antibiotics9090621). In our study, the halo assay was used only as a preliminary method to rapidly distinguish the activities of multiple candidates, and then the candidates with high antibacterial activities were further characterized through a series of in vitro and in vivo assays in this work.

      Testing should have been done at equimolar concentrations. If the authors decided to e.g. test 50 µg/mL for each protein, how was this then compensated for differences in molecular weight? For example, if PHAb10 and PHAb11 have smaller molecular sizes than PHAb7, 8, and 9, there is more protein present in 50 µg/mL for the first two compared to the others, and this would explain the higher decrease in bacterial killing (and possibly the larger halos). 

      We thank the reviewer for his valuable suggestions and concerns. We agree with the reviewer that when we need to know exactly how many times more active one enzyme is than another, we should directly compare the performance of the two enzymes at equimolar concentrations. In our previous work, we followed this rule to distinguish novel chimeric lysins from their parental lysins or their variants (DOI: 10.1128/AAC.00311-20; DOI: 10.1128/AAC.01610-19; DOI: 10.1128/AAC.02043-18). In the present work, the initial goal of our testing was to demonstrate the robustness and efficiency of the screening strategy primed by lysin-derived antimicrobial peptides. With this in mind, we did not spend more effort comparing the activities of these candidates in detail but went on to clarify their host range and thermo-tolerance mechanisms, and then to examine their performance in infection models. Nonetheless, in future work, we will definitely follow your suggestions when it is necessary to quantify the differences between these candidates.
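
      The reviewer's point about equimolar testing can be illustrated with a short calculation. The sketch below is not taken from the manuscript: the function name and both molecular weights are hypothetical, chosen only to show how the same mass concentration (50 µg/mL) corresponds to different molar concentrations for proteins of different size.

```python
def ug_per_ml_to_micromolar(conc_ug_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration (ug/mL) to a molar concentration (uM).

    ug/mL is the same as mg/L; dividing by the molecular weight in g/mol
    gives mmol/L, and multiplying by 1000 gives umol/L.
    """
    if mw_kda <= 0:
        raise ValueError("molecular weight must be positive")
    mw_g_per_mol = mw_kda * 1000.0
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


# Hypothetical molecular weights, for illustration only (not from the manuscript).
example_mw_kda = {"smaller_lysin": 20.0, "larger_lysin": 40.0}

for name, mw_kda in example_mw_kda.items():
    molar = ug_per_ml_to_micromolar(50.0, mw_kda)
    print(f"{name}: 50 ug/mL corresponds to {molar:.2f} uM")
```

      Under these assumed weights, the smaller protein is present at twice the molar concentration of the larger one, which is the confound the reviewer describes.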

      Reviewer #2 (Public Review)

      Summary: 

      The study explores a new strategy of lysin-derived antimicrobial peptide-primed screening to find peptidoglycan hydrolases from bacterial proteomes. Using this strategy authors identified five peptidoglycan hydrolases from A. baumannii. They further tested their antimicrobial activities on various Gram-positive and Gram-negative pathogens.

      We appreciate the reviewer’s valuable comments.

      Strengths: 

      Overall, the study is good and adds new members to the peptidoglycan hydrolase family. The authors also show that these lysins have bactericidal activities against both Gram-positive and Gram-negative bacteria. The crystal structure data is good and reveals differences in the thermostability of the peptidoglycan hydrolases. Structural data also reveals that PHAb10 and PHAb11 form thermostable dimers, and this is corroborated by generating variant proteins defective in supporting intermolecular bond pairs. The mouse bacterial infection model shows promise for the use of these hydrolases as antimicrobial agents.

      We appreciate the reviewer’s valuable comments and suggestions.

      Weaknesses: 

      While the authors have employed various mechanisms to justify their findings, some aspects are still unclear. Only CFU has been used to test bactericidal activity. This should also be corroborated by a live/dead assay. Moreover, SEM or TEM analysis would reveal the effect of these peptidoglycan hydrolases on Gram-negative/Gram-positive cell envelopes. The authors claim that these hydrolases are similar to T4 lysozyme, but they have not correlated their findings with already published findings on T4 lysozyme. T4 lysozyme has a C-terminal amphipathic helix with antimicrobial properties. Moreover, heat-denatured lysozyme also shows enhanced bactericidal activity due to the formation of hydrophobic dimeric forms, which are inserted in the membrane. The authors also observe that heat-denatured PHAb10 and PHAb11 have bactericidal activity but no enzymatic activity. These findings should be corroborated by studying the effect of these holoenzymes/truncated peptides on bacterial cell membranes. Also, a quantitative peptidoglycan cleavage assay should be performed in addition to the halo assay. Including these details would make the work more comprehensive.

      We thank the reviewer for his valuable suggestions and concerns. We agree with the reviewer that employing more methods and techniques such as SEM, TEM, live/dead imaging, and GC-MS will provide a deeper understanding of how these peptidoglycan hydrolases interact with the bacterial envelopes and peptidoglycan bones, which will definitely make our study more comprehensive. The principal idea of this study is, however, to test the robustness and effectiveness of the screening strategy primed by lysin-derived antimicrobial peptides in discovering new peptidoglycan hydrolases. Therefore, we did not put more effort into characterizing the interactions of these peptidoglycan hydrolases with the bacterial envelopes/membranes in multiple assays; instead, we went on to elucidate their host range and thermo-tolerance mechanisms and then to examine their performance in infection models.

      We are also very grateful to the reviewers for their suggestions to correlate our results to published findings on lysozymes. Based on these suggestions, we have included an extensive discussion in the Discussion section of the revised manuscript (page 22, lines 502-514).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Abstract and title. 

      In my opinion, the current title does not fully cover the work that is presented in the manuscript. 

      According to your valuable comment, we have revised the title to “Dimer-monomer transition defines a hyper-thermostable peptidoglycan hydrolase mined from bacterial proteome by lysin-derived antimicrobial peptide-primed screening”.

      Please remove the word 'novel' from the title, as well as elsewhere in the manuscript. As it is true that PHAb10 and PHAb11 are new, they are not novel. There are many reports that have been published on endolysins with activity against Gram-negatives, and sometimes even also Gram-positives. 

      We have changed the description of PHAb10 and PHAb11 to avoid using the word “novel”, but alternatively, using “new” or “active” in the title and throughout the text in the revised manuscript.

      Additional information for the Introduction section in the Public Review: 

      DOI: 10.1128/AAC.00285-16  

      DOI: 10.1038/s41598-020-68983-3 

      DOI: 10.1128/AAC.00342-19  

      DOI: 10.1126/sciadv.aaz1136   

      DOI: 10.1111/1751-7915.14339 

      DOI: 10.1128/JVI.00321-21

      We appreciate the reviewer for these selected references and have cited almost all of them in a new paragraph in the Introduction section of the revised manuscript (pages 3-4, lines 82-101).

      Minor Comments: 

      Line 30. For a lay person it is not clear what is meant by 'unique mechanism of action.' 

      This has been replaced by “direct peptidoglycan degradation activity” in the revised manuscript (page 2, lines 30-31).

      Line 60 & 62. Please merge these sentences into one as they have the same meaning.

      We have deleted one of the sentences based on your suggestion.

      Line 67. Replace 'also' with 'simultaneously'. 

      Revised as suggested (page 3, line 66).

      Line 74. 'Modern clinical practice' should specifically refer to infectious diseases in humans. 

      Revised as suggested (page 3, line 73).

      Line 76 to 105. There is too much information that is not focused. This section should be rewritten so that it is in line with the focus of the presented work. I would remove this section and replace it with a new section as proposed in my major comments. 

      Based on your suggestion, we deleted this section and prepared a new paragraph in the revised manuscript (pages 3-4, lines 82-101).

      Line 113. I strongly disagree with the wording 'in its infancy'. Please see my major comment. 

      We have rewritten the paragraph as “However, compared with the current progress in the clinical translation of lysins against Gram-positive bacteria, the discovery of lysins against Gram-negative bacteria that meet the needs described in the WHO priority pathogen list is still urgently needed.” according to your valuable comments in the revised manuscript (page 4, lines 98-101).

      Line 116. Remove 'on'. 

      Revised as suggested (page 4, line 104).

      Results. 

      Additional information for the Results section in the Public Review: 

      DOI: 10.3390/antibiotics9090621

      We thank the reviewer for this valuable reference, which has been cited in the Results section and Methods sections of the revised manuscript (page 7, line 159; page 25, line 605).

      Minor comments: 

      Line 135. Replace 'would' with 'could'.

      Revised as suggested (page 5, line 124).

      Line 150. Why was this naming decided to go from 11 -> 7, whereas in Figure 1a the clades go from I to V? This way of naming is not clear to me. 

      We thank the reviewer for this question. There are two numbering systems here. Numbers 1 to 11 refer to the peptidoglycan hydrolases mined from different bacterial proteomes by the lysin-derived peptide-primed screening strategy; the candidates characterized from the proteome of A. baumannii are numbered 7 to 11 (candidates 1 to 6 are from other bacterial proteomes). In contrast, the clades in the cladistic analysis of all potential candidates in the A. baumannii proteome are labelled I to V.

      Line 250. Replace 'casts doubt' with 'questions'. 

      Revised as suggested (page 10, line 244).

      Line 252 to 257. I would encourage the authors to mention if there is any homology in between the peptides on the one hand, and in between the lysozyme catalytic domains on the other hand.

      This information has been added to the revised manuscript (page 10, lines 249-251).

      Line 266. The following sentence should be reworded: 'However, rare lytic activity was observed in P11-NP, suggesting that a potential role for it in functions other than bactericidal activity.'

      In the revised manuscript (page 11, lines 261-262), the sentence has been revised as “However, rare lytic activity was observed in P11-NP, suggesting that its function remains to be established”.

      Line 276. Replace 'asked' with 'questioned'.

      Revised as suggested (page 11, line 270).

      From 302 onwards. Why was it chosen to solve the crystal structure of PHAb8, and not PHAb7 and 9? This should be briefly mentioned. 

      Initially we tried to decipher the structures of all five enzymes, but we finally obtained the crystal structures of only three enzymes, PHAb8, PHAb10, and PHAb11, by X-ray crystallography. This reason has been added in the revised manuscript (page 13, lines 300-301).

      Line 437. Replace 'the burn wound model' with 'a burn wound model'. 

      Revised as suggested (page 19, line 433).

      Line 445. Replace 'the mouse abscess model' with 'a mouse abscess model'. 

      Revised as suggested (page 19, line 441).

      Line 449 to 451. Given that the mice received 5 doses of minocycline and no difference was observed with the group that received tris buffer, was it tested if the Acinetobacter baumannii 3437 isolates became resistant against minocycline during the experiment? 

      We appreciate the Reviewer for his valuable concern. In our study, we did not explore in detail the reasons why minocycline was ineffective. But we strongly agree with the reviewer that drug resistance may be one of the reasons.

      Discussion. 

      Minor comments. 

      Line 479. Delete this sentence: 'Policy makers, scientists, enterprisers, and investigators have worked together for decades to exploit the 'trojan horse' globally, but new options for treating antimicrobial resistance in the clinic remain to be seen'. 

      Revised as suggested.

      Line 483. Reformulate as follows: 'unique mechanism of action, potent bactericidal activity, low risks of drug resistance, and ongoing clinical trials targeting Gram-positive bacteria.' To my knowledge, all these clinical trials target S. aureus, but I might be wrong. 

      Revised as suggested (page 21, lines 476-478).

      Line 486. 'However, for Gram-negative bacteria, the effects of phage-derived lysins were often hampered by their outer membranes, which requires more strategies to overcome this barrier.' After this sentence, the concepts of Artilysins, Innolysins, and Lysocins should be mentioned, in addition to the introduction. These are important engineering strategies and the reader should be informed that your strategy is thus not the only existent one. 

      Revised as suggested (page 21, lines 480-482).

      Line 491. Please, again refer to the work of Roberto Vázquez et al., who has done very similar work to your work presented. DOI: 10.1128/JVI.00321-21 

      We have cited this interesting work in the Introduction section and Discussion section of the revised manuscript (page 4, line 106; page 21, lines 482-483).

      Line 499. Reformulate: 'Gram-positive bacteria are primarily killed through the action of the antimicrobial peptides only'. 

      According to your suggestion, it was changed to “while Gram-positive bacteria are killed mainly through the action of the intrinsic antimicrobial peptides” in the revised manuscript (pages 21-22, lines 497-498).

      Line 500. Delete this sentence, as this is already mentioned in the results and too detailed: 'Interestingly, we noted a difference in the killing of Gram-positive bacteria by PHAb10 and PHAb11, which may be due to the fact that P11-CP had one more basic amino acid than P10-CP, so it had stronger bactericidal activity.'

      Revised as suggested.

      Line 503. This statement doesn't make sense because you cannot directly compare ug/mL between endolysins, you must compare equimolar concentrations. Furthermore, testing conditions between studies were different, thus making this claim unjustified. 

      These statements have been deleted in the revised manuscript.

      Line 524. Please delete: 'To our knowledge, this is the first time that an enzyme had been found to adapt to ambient temperature by altering its dimerization state.'

      Revised as suggested.

      Figures. 

      Figure 1a. Please choose a different name for 'dry job' and 'wet job'. 

      Following your suggestion, they have been specified as “In silico analysis” and “Experimental verification” in the revised Figure 1a. 

      Figure 6. I suggest moving Figure 6e to the supplementary materials and reorganizing Figure 6 with only panels a to d. 

      Revised as suggested.

      Materials and Methods, References, and Supplementary Materials.  No comments. 

      Reviewer #2 (Recommendations For The Authors): 

      Most figure labelings are very small and difficult to read. 

      All figures in the revised manuscript have been replaced with high-resolution figures, which hopefully will make these labels easier to follow.

      The authors should include a data availability statement in the manuscript.

      Revised as suggested (page 28, lines 704-706).

    1. Author response:

      Reviewer #1:

      (1) Adding microscopy of the untreated group to compare Figure 2A with would further strengthen the findings here.

      Thank you for your comments on our manuscript. We will carefully revise this part. Actually, we used a time-lapse method to capture images at 0 minute before any drugs were added. We will change '0 min' to 'untreated,' which will further strengthen our findings.

      (2) Quantification of immune infiltration and histological scoring of kidney, liver, and spleen in the various treatment groups would increase the impact of Figure 4.

      Thank you for your comments on our manuscript. To further strengthen Figure 4, we will use quantification of immune infiltration and histological scoring of the kidney, liver, and spleen in different groups. Additionally, we will use ImageJ software for molecular immunohistochemistry and determine the ratio of normal to abnormal cells, providing more comprehensive insights into the effects of the treatments.

      (3) The data in Figure 6 I is not sufficiently convincing as being significant.

      Thank you for your comments on our manuscript. Previous research has shown that antibiotics and other drugs can alter the gut microbiota. Therefore, we set out to study the effects of linalool on the gut microbiota. The results in this part are mostly based on gut microbiota sequencing and correlation analysis. We tried several times to isolate key microbes from the gut, but this is very challenging work and the results were not good. Thus, in this study, we only predicted the effects of linalool on the gut microbiota. In the future, we will continue to investigate how linalool affects the gut microbiota.

      (4) Comparisons of the global transcriptomic analysis of the untreated group to the PC, LP, and LT groups would strengthen the author's claims about the immunological and transcriptomic changes caused by linalool and provide a true baseline.

      Thank you for your comments on our manuscript. We will compare the global transcriptomic analysis of the untreated group with the PC, LP, and LT groups to strengthen the claims about the immunological and transcriptomic changes induced by linalool, thereby providing a true baseline.

      Reviewer #2:

      (1) The authors have taken for granted that the readers already know the experiments/assays used in the manuscript. There was not enough explanation for the figures as well as figure legends.

      Thank you for your comments on our manuscript. We will provide more detailed explanations of the experiments and assays used in the manuscript, as well as enhance the descriptions in the figure legends, to ensure that readers have a clear understanding of the figures and context.

      (2) The authors missed adding the serial numbers to the references.

      Thank you for your comments on our manuscript. We will add serial numbers to the references to ensure proper citation and improve the clarity of our manuscript.

      (3) The introduction section does not provide adequate rationale for their work, rather it is focused more on the assays done.

      Thank you for your comments on our manuscript. We will add a section to the introduction that provides a rationale for our work, specifically focusing on the impact of plant extract on immunoregulation.

      (4) Full forms are missing in many places (both in the text and figure legends), also the resolution of the figures is not good. In some figures, the font size is too small.

      Thank you for your comments on our manuscript. We will ensure that all abbreviations are expanded where necessary, both in the text and figure legends. Additionally, we will improve the resolution of the figures and increase the font size where needed to enhance clarity.

      (5) There is much mislabeling of the figure panels in the main text. A detailed explanation of why and how they did the experiments and how the results were interpreted is missing.

      Thank you for your comments on our manuscript. We will improve the labeling of the figure panels, provide detailed explanations of the experimental methods, including their rationale and interpretation, and clarify the connections between the methods.

      (6) There is not enough experimental data to support their hypothesis on the mechanism of action of linalool. Most of the data comes from pathway analysis, and experimental validation is missing.

      Thank you for your comments on our manuscript. We have tried our best to link the transcriptomic data, pathway analysis, and experimental validation. We carried out many experiments to substantiate the changes inferred from the transcriptomic data, including SEM, TEM, CLSM, molecular docking, RT-qPCR, and histopathological examinations. The details are as follows. (1) As shown in Figure 2, we combined the transcriptomic data related to membranes and organelles with SEM, TEM, and CLSM images. Analyzing these data together indicated that the cell membrane may be a potential target of linalool. (2) As shown in Figure 3, we carried out molecular docking to explore the specific ribosomal proteins that bind linalool; the ribosome had been identified as a potential target of linalool by the transcriptomic data. (3) As shown in Figure 5, the transcriptomic data indicated that linalool enhanced the host complement and coagulation system. To substantiate these changes, we carried out RT-qPCR to measure the expression of important immune-related genes and found that the RT-qPCR results were consistent with the expression trends in the transcriptome analysis. (4) As shown in Figures 4 and 5, the transcriptomic data revealed that linalool promoted wound healing, tissue repair, and phagocytosis (Figure 5E). To confirm this, we carried out histopathological examinations and found that linalool alleviated the tissue damage caused by S. parasitica infection on the dorsal surface of grass carp and enhanced healing capacity (Figure 4G). We recognize, however, that the antimicrobial mechanism of linalool needs further investigation, and we will conduct more experiments to explore it in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this study aim to use an optimization algorithm approach, based on the established Nelder-Mead method, to infer polymer models that best match input bulk Hi-C contact data. The procedure infers the best parameters of a generic polymer model that combines loop-extrusion (LE) dynamics and compartmentalization of chromatin types driven by weak biochemical affinities. Using this and DNA FISH, the authors investigate the chromatin structure of the MYC locus in leukemia cells, showing that loop extrusion alone cannot explain local pathogenic chromatin rearrangements. Finally, they study the locus single-cell heterogeneity and time dynamics.

      Strengths:

      - The optimization method provides a fast computational tool that speeds up the parameter search of complex chromatin polymer models and is a good technical advancement.

      - The method is not restricted to short genomic regions, as in principle it can be applied genome-wide to any input Hi-C dataset, and could be potentially useful for testing predictions on chromatin structure.

      Weaknesses:

      (1) The optimization is based on the iterative comparison of simulated and Hi-C contact matrices using the Spearman correlation. However, the inferred set of the best-fit simulation parameters could sensitively depend on such a specific metric choice, questioning the robustness of the output polymer models. How do results change by using different correlation coefficients?

      This is an important question. We have tested several metrics in the process of building the fitting procedure. We now showcase side-by-side comparisons of the fitting results obtained using these different metrics in supplementary figure 2.
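
      To make the metric-dependence question concrete, the sketch below compares a rank-based score (Spearman) with a linear score (Pearson) on the upper triangles of two toy contact maps. It is an illustration only: the matrices are made-up numbers, the helper names are hypothetical, and it does not reproduce the fitting procedure or the metrics shown in supplementary figure 2.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the entry at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(matrix):
    """Flatten the entries above the main diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy "experimental" contact map (made-up values, not Hi-C data).
hic = [
    [1.00, 0.50, 0.20, 0.10],
    [0.50, 1.00, 0.40, 0.15],
    [0.20, 0.40, 1.00, 0.45],
    [0.10, 0.15, 0.45, 1.00],
]
# Toy "simulated" map: a monotonic but nonlinear distortion of the map above.
sim = [[value ** 2 for value in row] for row in hic]

rho = spearman(upper_triangle(hic), upper_triangle(sim))
r = pearson(upper_triangle(hic), upper_triangle(sim))
print(f"Spearman: {rho:.4f}  Pearson: {r:.4f}")
```

      Because the toy simulated map preserves the ordering of contacts, the rank-based score is insensitive to the distortion while the linear score is not, which is one way the choice of metric could shift the best-fit parameters.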

      (2) The best-fit contact threshold of 420nm seems a quite large value, considering that contact probabilities of pairs of loci at the mega-base scale are defined within 150nm (see, e.g., (Bintu et al. 2018) and  (Takei et al. 2021)).

      This is a good point. Unfortunately, there is no established standard distance cutoff to map distances to Hi-C contact frequency data. Indeed, previous publications have used anywhere from 120 nm to 500 nm (see e.g. (Cardozo Gizzi et al. 2019), (Cattoni et al. 2017), (Mateo et al. 2019), (Hafner et al. 2022), (Murphy and Boettiger 2022), (Takei et al. 2021), (Fudenberg and Imakaev 2017), (Wang et al. 2016), (Su et al. 2020), (Chen et al. 2022), (Finn et al. 2019)).

      We have included a supplementary table in the revised preprint (supplementary table 3) listing these values to demonstrate the lack of consensus. This large variation could reflect different chromatin compaction levels across distinct model systems, and different spatial resolutions in DNA FISH experiments performed by different labs. The variance in the threshold choice is also likely partially explained by Hi-C experimental details, e.g. the enzyme used for digestion, which biases the effective length scale of interactions detected (Akgol Oksuz et al. 2021). Among commonly used restriction enzymes, HindIII has a relatively low cutting frequency which results in a lower sensitivity to short-range interactions; on the other hand, MboI has a higher cutting frequency which results in a higher sensitivity to short-range interactions (Akgol Oksuz et al. 2021). Because the Hi-C data we used for the Myc locus in (Kloetgen et al. 2020) was generated using HindIII, we chose a distance cutoff close to the larger end of published values (420 nm). 
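
      The effect of the cutoff choice can be sketched as follows. This is a minimal illustration, assuming that a contact is scored whenever a measured or simulated pairwise distance falls at or below the cutoff; the distances are made-up values and the function name is hypothetical.

```python
def contact_frequency(distances_nm, cutoff_nm):
    """Fraction of pairwise distances at or below the contact cutoff."""
    if not distances_nm:
        raise ValueError("need at least one distance")
    return sum(d <= cutoff_nm for d in distances_nm) / len(distances_nm)


# Made-up distances between two loci across ten cells or frames (nm).
toy_distances_nm = [90, 140, 200, 260, 330, 410, 480, 550, 700, 900]

# Cutoffs chosen to span the range discussed in the response above.
for cutoff_nm in (150, 420, 500):
    frequency = contact_frequency(toy_distances_nm, cutoff_nm)
    print(f"cutoff {cutoff_nm} nm -> contact frequency {frequency}")
```

      With the same underlying distances, the inferred contact frequency changes substantially with the cutoff, which is why the threshold is treated as a parameter tied to the experimental details rather than a universal constant.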

      (3) In their model, the authors consider the presence of LE anchor sites at Hi-C TAD boundaries. Do they correspond to real, experimentally found CTCF sites located at genomic positions, or they are just assumed? A track of CTCF peaks of the considered chromatin loci would be needed.

      We apologize this was not clear. The LE anchor sites in the simulation model were chosen because they correspond to experimental CTCF sites and ChIP-seq peaks located at the corresponding genomic positions. Representative CTCF ChIP-seq tracks from (Kloetgen et al. 2020) have been added to figure 2A in the revised preprint version to emphasize this point.

      (4) In the model, each TAD is assigned a specific energy affinity value. Do the different domain types (i.e., different colors) have a mutually attractive energy? If so, what is its value and how is it determined? The simulated contact maps (e.g., Figure 2C) seem to allow attractions between different blocks, yet this is unclear.

      Sorry this was not explicit. The attraction energy between a pair of monomers in the simulation is determined using the geometric mean of the affinities of the two monomers. This applies to both monomers within the same domain and in different domains. This detail has been clarified in the Methods section: “To optimize the simulation duration to streamline the parameter search (Supp. Fig. 1 B), we computed the autocorrelation function of the TAD2-TAD4 inter-TAD distance using the initial guess simulation parameters of the MYC locus in CUTLL. The simulation was saved every 5 simulation blocks.”
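
      As an illustration of the geometric-mean rule described above, here is a minimal sketch. The domain names, the affinity values, and the helper functions are hypothetical, and the sketch assumes non-negative affinities; it is not the authors' implementation.

```python
import math

def pair_attraction(affinity_i, affinity_j):
    """Attraction energy for a monomer pair: geometric mean of the two affinities."""
    return math.sqrt(affinity_i * affinity_j)

def attraction_matrix(monomer_domains, domain_affinity):
    """Pairwise attraction energies for every pair of monomers.

    monomer_domains: domain label of each monomer along the chain.
    domain_affinity: mapping from domain label to its (non-negative) affinity.
    """
    affinities = [domain_affinity[d] for d in monomer_domains]
    return [[pair_attraction(a, b) for b in affinities] for a in affinities]

# Hypothetical two-domain chain with two monomers per domain.
domain_affinity = {"TAD_A": 0.04, "TAD_B": 0.16}
monomer_domains = ["TAD_A", "TAD_A", "TAD_B", "TAD_B"]
energies = attraction_matrix(monomer_domains, domain_affinity)

within_a = energies[0][1]   # both monomers in TAD_A
between_ab = energies[0][2]  # one monomer in TAD_A, one in TAD_B
```

      Under this rule, a pair within one domain recovers that domain's own affinity, while a pair spanning two domains receives an intermediate value, so attraction between different blocks is non-zero whenever both affinities are non-zero.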

      (5) To substantiate the claim that the simulations can predict heterogeneity across single cells, the authors should perform additional analyses. For instance, they could plot the histograms (models vs. experiments) of the TAD2-TAD4 distance distributions and check whether the models can recapitulate the FISH-observed variance or standard deviation. They could also add other testable predictions, e.g., on gyration radius distributions, kurtosis, all-against-all comparison of single-molecule distance matrices, etc.

      We agree that heterogeneity prediction is a key advantage of the simulations. We note that the TAD2-TAD4 distance distributions measured by FISH were plotted in Fig. 3C as empirical cumulative probability distributions (as is standard in the field), side by side with the simulation predictions. The simulations indeed recapitulate the variance observed by FISH. We had also emphasized this important point in the main text: “Importantly, not just the average distances, but the shape of the distance distribution across individual cells closely matches the predictions of the simulations in both cell types, further confirming that the simulations can predict heterogeneity across cells.”

      (6) The authors state that loop extrusion is crucial for enhancer function only at large distances. How does that reconcile, e.g., with Mach et al. Nature Gen. (2022) where LE is found to constrain the dynamics of genomically close (150kb) chromatin loci?

      This is an interesting question. In (Mach et al. 2022), the authors tracked the physical distance between two fluorescent labels positioned next to either anchor of a ~150 kb engineered topological domain using live-cell imaging. They found that abrogation of the loop anchors by ablation of the CTCF binding motifs, or knock-down of the cohesin subunit Rad21 resulted in increased physical distance between the loci. HMM Modeling of the distance over time traces suggests that the increased distance resulted from rarer and shorter contacts between the anchors. While this might seem at odds with the results of Fig. 4L, we note a key difference between the loci. While (Mach et al. 2022) observed the dynamics of the distance separating two CTCF loop anchors, in our model only the MYC promoter is proximal to a loop anchor, while the position of the second locus is varied, but remains far from the other anchor. The deletion of the CTCF sites at both anchors in (Mach et al. 2022) indeed results in a lowered sensitivity of the physical distance to Rad21 knock-down, reminiscent of the results of Fig. 4L in our work. This result demonstrates that loop extrusion disruption disproportionately impacts distances between loci close to loop anchors, consistent with Hi-C results (Rao et al. 2017; Nora et al. 2017). We therefore believe that the models in our work and (Mach et al. 2022) are not at odds, but simply reflect that loop extrusion perturbations impact distances between loop anchors the most.  Enhancer-Promoter loops are generally distinct from CTCF-mediated loops (Hsieh et al. 2020, 2022). While (Mach et al. 2022) represents a landmark study in our understanding of the dynamics of genomic folding by loop extrusion, we therefore believe that the locus we chose here - which matches the endogenous MYC architecture - may more accurately represent Enhancer-Promoter dynamics than a synthetic CTCF loop.  
To better articulate the similarities between model predictions and differences between the two loci, we have simulated a synthetic locus matching that of (Mach et al. 2022) in the revised preprint. Our simulation recapitulates the results obtained by Mach et al, including the sensitivity of contact frequency and duration to in silico cohesin knock-down (supplementary figure 6). We have updated the Results section accordingly: “The dependence of contact dynamics on loop extrusion in our simulations of MYC differs from that previously observed for two TAD boundaries (45). To check whether the different results are the product of different simulation models, we simulated contact dynamics across two TAD boundaries matching the locus of (45). Our simulations recapitulate the distance distribution and loop extrusion dependence previously observed (Supp. Fig. 6), establishing that the differences between the two systems are biological. While loop extrusion controls both the frequency and duration of contacts at TAD boundaries, it exerts a more nuanced effect on the frequency of contacts in loci pairs like the MYC locus that might better reflect typical enhancer-promoter pairs.”

      Reviewer #2 (Public Review):

      Summary:

      The authors Fu et al., developed polymer models that combine loop extrusion with attractive interactions to best describe Hi-C population average data. They analyzed Hi-C data of the MYC locus as an example and developed an optimization strategy to extract the parameters that best fit this average Hi-C data.

      Strengths:

      The model has an intuitive nature and the authors masterfully fitted the model to predict relevant biology/Hi-C methodology parameters. This includes loop extrusion parameters, the need for self-interaction with specific energies, and the time and distance parameters expected for Hi-C capture.

      Weaknesses:

      (1) We are no longer in the age in which the community only has access to population average Hi-C. Why was only the population average Hi-C used in this study?

      Can single-cell data: i.e. single-cell Hi-C/Dip-C data or chromatin tracing data (i.e. see Tan et al Science 2018 - for Dip-C, Bintu et al Science 2018, Su et al Cell 2020 for chromatin tracing, etc.) or even 2 color DNA FISH data (used here only as validation) better constrain these models? At the very least the simulations themselves could be used to answer this essential question.

      I am expecting that the single-cell variance and overall distributions of distances between loci might better constrain the models, and the authors should at least comment on it.

      We agree that it is possible to recapitulate single-cell Hi-C or chromatin tracing data with simulations, and that these data modalities have a superior potential to constrain polymer models because they provide an ensemble of single-allele structures rather than population-averaged contact frequencies. However, these data remain out of reach for most labs compared to Hi-C. Our goal with this work was to provide an approachable method that anyone interested could deploy on their locus of choice, and we reasoned that Hi-C currently remains the data modality available to most. We envision that this strategy will help reach labs beyond the small number of groups expert in single-cell chromatin architecture, and thus hopefully broaden the impact of polymer simulations in the chromatin organization field.

      Nevertheless, we do agree that the comparison of single-cell chromatin architectures to simulations is a fertile ground for future studies, and have modified the preprint accordingly (Discussion):

      “Future work extending this framework to single cell readouts of chromatin architecture (e.g. single-cell Hi-C or chromatin tracing) holds promise to further constrain chromatin models.”

      (2) The authors claimed "Our parameter optimization can be adapted to build biophysical models of any locus of interest. Despite the model's simplicity, the best-fit simulations are sufficient to predict the contribution of loop extrusion and domain interactions, as well as single-cell variability from Hi-C data. Modeling dynamics enables testing mechanistic relationships between chromatin dynamics and transcription regulation. As more experimental results emerge to define simulation parameters, updates to the model should further increase its power." The focus on the Myc locus in this study is too narrow for this claim. I am expecting at least one more locus for testing the generality of this model.

      We note that we used two distinct loci in the initial version of our study, the MYC locus in leukemia vs T cells (Figs. 2-3) and a representative locus in experiments comparing WT CTCF with a mutant that leads to loss of a subset of CTCF binding sites (Fig. 1L). To further demonstrate generality, we have added to the revised preprint a demonstration of the simulation fitting to other loci acquired in different cell types (supplementary figure 3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The Methods part of the imaging analysis lacks some quantitative details that could be useful for the readers: what is the frequency of double detections? How "small" is the 3D region around the centroid? How many cells with no spots or more than four spots are excluded?

      We have clarified these important analysis parameters in the revised version of the preprint (Methods), including supplementary Table 2, listing the statistics of excluded cells:

      “We then cropped out a small 3D region (20x20x10 pixels) around each approximate centroid, and subtracted the surrounding background intensity.”

      “Cells with no spots or more than four spots were excluded from the cell cycle analysis (statistics in Supp. Table 2).”

      (2) How is the autocorrelation function of chromatin structures computed?

      We computed the autocorrelation function of the TAD2-TAD4 inter-TAD distance using the initial guess simulation parameters (Eattr, boundary permeabilities) of the MYC locus in CUTLL. All other simulation parameters are the same as other simulations in the preprint. The structure of the locus was saved every 5 simulation blocks. These structures were used to compute the TAD2-TAD4 inter-TAD distance as a function of time, which was used to calculate the autocorrelation function. This has been clarified in the revised version of the preprint (Methods):

      “To optimize the simulation duration to streamline the parameter search (Supp. Fig. 1 B), we computed the autocorrelation function of the TAD2-TAD4 inter-TAD distance using the initial guess simulation parameters of the MYC locus in CUTLL. The simulation was saved every 5 simulation blocks.”
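
      For readers unfamiliar with the calculation, the following is a minimal sketch of a normalized autocorrelation function applied to a distance time series. The estimator (a biased estimator normalized by the lag-0 variance) and the toy distance values are assumptions made for illustration; the response does not specify which estimator was used.

```python
def autocorrelation(series, max_lag):
    """Normalized autocorrelation of a time series for lags 0..max_lag.

    Uses the biased estimator (division by the full series length), which
    keeps every value within [-1, 1] and gives exactly 1 at lag 0.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered) / n
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        acf.append(cov / variance)
    return acf

# Hypothetical inter-TAD distances (arbitrary units), one value per saved frame.
toy_distances = [300.0, 320.0, 310.0, 280.0, 260.0, 270.0, 300.0, 330.0, 340.0, 310.0]
acf = autocorrelation(toy_distances, max_lag=4)
```

      The lag at which such a curve decays toward zero gives a rough estimate of how many saved frames separate statistically independent structures, which is the kind of information needed to choose a simulation duration.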

      (3) How is the monomer length (35nm) chosen to best compare FISH data?

      Because monomer length is difficult to derive from first principles, the standard in the field is to convert the size of a simulated monomer into a physical distance using a reference measurement in the system of choice. Similar to the Hi-C distance threshold, values for monomer size vary throughout the literature, e.g. 53 nm per 3 kbp monomer (Giorgetti et al. 2014), 50 nm per 2.5 kbp monomer (Nuebler et al. 2018), or from 36 to 60 nm per 3 kbp monomer, depending on the cell line or model details (Conte et al. 2022; Conte et al. 2020). 

      Here we used the mean of the median TAD2-TAD4 distances in T Cells and CUTLL as our length reference, and converted simulation distances into nm by matching this value. We obtained 35 nm per 2.5 kbp monomer, a value well within the range of the literature values (see above).

      Using this simple conversion, the simulated distance distributions recapitulate two independent metrics accessible by DNA FISH: the shift in median distances between T cell and CUTLL, and the width of each distribution. This agreement indicates that simulations recapitulate both the differences between the two cell types, and the single cell heterogeneity within each cell type. 
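
      The conversion described above can be sketched as follows. All distance values are invented toy numbers chosen for easy arithmetic, and the function name is hypothetical; only the procedure (matching the mean of the two per-cell-type medians) follows the description in this response.

```python
from statistics import mean, median

def nm_per_sim_unit(fish_by_cell_type, sim_by_cell_type):
    """Scale factor converting simulation distance units into nanometres.

    The length reference is the mean of the per-cell-type median distances;
    the factor is chosen so that this reference matches between the FISH
    measurements (nm) and the simulations (simulation units).
    """
    fish_reference = mean(median(d) for d in fish_by_cell_type.values())
    sim_reference = mean(median(d) for d in sim_by_cell_type.values())
    return fish_reference / sim_reference

# Invented toy distances, not measurements from the study.
fish_nm = {"T_cell": [310.0, 360.0, 410.0], "CUTLL": [250.0, 280.0, 310.0]}
sim_units = {"T_cell": [8.0, 9.0, 10.0], "CUTLL": [6.0, 7.0, 8.0]}

scale = nm_per_sim_unit(fish_nm, sim_units)
sim_t_cell_nm = [d * scale for d in sim_units["T_cell"]]
```

      Because a single scale factor is applied to both cell types, the difference between their medians and the width of each distribution remain independent predictions of the simulation rather than fitted quantities.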

      (4) The main text does not make clear the "known" biophysical parameters that establish the model ground truth.

      In the initial validation of the fitting procedure, by “known biophysical parameters”, we meant that we generated simulated Hi-C maps in which we set the left/right permeabilities at each boundary, and Eattr values within each TAD to known values. We then assessed how well the fitting could recover these known ground truth values by trying to match the simulated representative Hi-C map. The specific values chosen are plotted for each set of simulations in Fig.1 F, H, J. The main text has been made more explicit in the revised preprint version (Results):

      “We first validated the optimization method using ground truth maps built from simulation runs with known values of StallL, StallR, Eattr for each boundary/domain.”

      (5) What are the correlation coefficients between experimental and model contact maps in Figure 1L?

      We apologize for the oversight. The missing coefficient values have been added in the revised version of the manuscript (Results):

      “As expected, the simulation predicted a significant drop of 0.13 in boundary permeability in CTCFmut compared to WT (Fig. 1 L; Spearman Correlation: 0.85±0.02 for CTCFmut, 0.82±0.01 for WT).”
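
      For context, a Spearman correlation between two contact maps can be computed as in the sketch below. The choice to compare only the upper triangle without the diagonal, the helper names, and the toy maps are assumptions for illustration; the response does not state which matrix entries entered the reported coefficients.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average rank of the tied block
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def upper_triangle(matrix):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def spearman_maps(map_a, map_b):
    """Spearman correlation: Pearson correlation of the ranked map entries."""
    return pearson(average_ranks(upper_triangle(map_a)),
                   average_ranks(upper_triangle(map_b)))

# Hypothetical 3x3 maps; map_b is a monotonic transform (square) of map_a.
map_a = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.4], [0.1, 0.4, 1.0]]
map_b = [[v * v for v in row] for row in map_a]
rho = spearman_maps(map_a, map_b)
```

      Because Spearman correlation depends only on ranks, any monotonic rescaling of one map leaves the coefficient unchanged, which makes it a convenient score when experimental and simulated maps differ in overall normalization.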

      (6) Figure 2A, B: Contact matrices look oversaturated. Next, why do model contact maps have negative values?

      We apologize this was not clear. Figure 2 A,B plotted the log value of the contact matrices, thus the negative values. This has been made explicit in the revised version of the preprint (Fig. 2 Legend). 
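
      A short sketch shows why log-transformed contact maps contain negative values. The base of the logarithm (base 10 here) and the toy matrix are assumptions; the response only states that the log value was plotted.

```python
import math

def log_contact_map(contact_map):
    """Element-wise base-10 logarithm of a contact-probability matrix.

    Entries must be strictly positive; zero entries would need a pseudocount,
    which this sketch does not handle.
    """
    return [[math.log10(p) for p in row] for row in contact_map]

# Hypothetical 2x2 map of contact probabilities.
toy_map = [
    [1.0, 0.1],
    [0.1, 1.0],
]
log_map = log_contact_map(toy_map)
```

      Any contact probability below 1 maps to a negative number, so a log-scaled map of probabilities is negative everywhere except where the probability equals 1.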

      (7) For model reproducibility, the authors could report the coordinates of the Hi-C TAD boundaries employed for the model.

      We have included in the revised version of the preprint an explicit mention of all genomic coordinates of the loci simulated in the Methods section:

      “The model used to fit into MYC Hi-C data consists of 1920 monomers representing chr8:126,720,000-131,680,000, with the TAD boundaries located at monomer 456 (chr8: 127,840,000 - 127,880,001), monomer 808 (chr8: 128,720,000 - 128,760,001), monomer 1178 (chr8: 130,160,000 - 130,200,001) and monomer 1592 (chr8: 130,680,000 - 130,720,001).”

      (8) What is the shaded area in Figure 3C?

      The shaded area in Figure 3C is the standard deviation calculated from three independent DNA FISH or simulation replicates for each bin of the histogram. This detail has been clarified in the revised preprint (Figure 3 legend). 
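
      The band described above can be sketched as a per-bin mean and standard deviation across replicates. The use of the sample standard deviation, the grid values, and the toy replicate distances are assumptions made for illustration.

```python
from statistics import mean, stdev

def ecdf_on_grid(distances, grid):
    """Empirical cumulative probability of the distances at each grid value."""
    n = len(distances)
    return [sum(d <= g for d in distances) / n for g in grid]

def ecdf_band(replicates, grid):
    """Per-bin mean and sample standard deviation of the ECDF across replicates."""
    curves = [ecdf_on_grid(rep, grid) for rep in replicates]
    centers = [mean(column) for column in zip(*curves)]
    spreads = [stdev(column) for column in zip(*curves)]
    return centers, spreads

# Three hypothetical replicates of inter-locus distances (nm).
toy_replicates = [
    [50.0, 150.0, 250.0, 350.0],
    [50.0, 150.0, 150.0, 350.0],
    [150.0, 250.0, 250.0, 350.0],
]
grid_nm = [100.0, 200.0, 300.0]
centers, spreads = ecdf_band(toy_replicates, grid_nm)
```

      Plotting `centers` as the line and `centers` plus or minus `spreads` as the shaded region gives a figure of the kind described, with the band narrowing wherever the replicates agree.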

      (9) In the Discussion, I suggest changing as follows: "the time- and distance-gated model proposed here recapitulates several observations" -> "the time- and distance-gated model proposed here could recapitulate several observations", as they are speculations.

      The sentence has been changed accordingly in the revised preprint (Discussion). Thank you for the suggestion. 

      Reviewer #2 (Recommendations For The Authors):

      Suggest analyzing the ability of single-cell data to better constrain dynamical models.

      While we agree that modeling single-cell distributions is a worthwhile endeavor to be explored in future work, we believe that the tool presented here serves a slightly different purpose: enabling labs that only have access to the most widespread technique at present to perform simulations to interrogate the forces that shape the organization of an arbitrary locus in their model of choice. Analyzing single-cell data is in principle very powerful, but would by necessity be limited to the small number of systems where these cutting-edge techniques have been deployed. 

      Suggest selecting another locus other than MYC to demonstrate generality.

      We note that we used two distinct loci in the study, the MYC locus in leukemia vs. T cells (Figs. 2-3) and a representative locus in experiments comparing WT CTCF with a mutant that leads to loss of a subset of CTCF binding sites (Fig. 1L). To further demonstrate generality, we have added to the revised preprint a demonstration of the simulation fitting to other loci acquired in different cell types (supplementary figure 3).

      Akgol Oksuz, Betul, Liyan Yang, Sameer Abraham, Sergey V. Venev, Nils Krietenstein, Krishna Mohan Parsi, Hakan Ozadam, et al. 2021. “Systematic Evaluation of Chromosome Conformation Capture Assays.” Nature Methods 18 (9): 1046–55.

      Bintu, Bogdan, Leslie J. Mateo, Jun-Han Su, Nicholas A. Sinnott-Armstrong, Mirae Parker, Seon Kinrot, Kei Yamaya, Alistair N. Boettiger, and Xiaowei Zhuang. 2018. “Super-Resolution Chromatin Tracing Reveals Domains and Cooperative Interactions in Single Cells.” Science 362 (6413). https://doi.org/10.1126/science.aau1783.

      Cardozo Gizzi, Andrés M., Diego I. Cattoni, Jean-Bernard Fiche, Sergio M. Espinola, Julian Gurgo, Olivier Messina, Christophe Houbron, et al. 2019. “Microscopy-Based Chromosome Conformation Capture Enables Simultaneous Visualization of Genome Organization and Transcription in Intact Organisms.” Molecular Cell 74 (1): 212–22.e5.

      Cattoni, Diego I., Andrés M. Cardozo Gizzi, Mariya Georgieva, Marco Di Stefano, Alessandro Valeri, Delphine Chamousset, Christophe Houbron, et al. 2017. “Single-Cell Absolute Contact Probability Detection Reveals Chromosomes Are Organized by Multiple Low-Frequency yet Specific Interactions.” Nature Communications 8 (1): 1753.

      Chen, Liang-Fu, Hannah Katherine Long, Minhee Park, Tomek Swigut, Alistair Nicol Boettiger, and Joanna Wysocka. 2022. “Structural Elements Facilitate Extreme Long-Range Gene Regulation at a Human Disease Locus.” bioRxiv. https://doi.org/10.1101/2022.10.20.513057.

      Finn, Elizabeth H., Gianluca Pegoraro, Hugo B. Brandão, Anne-Laure Valton, Marlies E. Oomen, Job Dekker, Leonid Mirny, and Tom Misteli. 2019. “Extensive Heterogeneity and Intrinsic Variation in Spatial Genome Organization.” Cell 176 (6): 1502–15.e10.

      Fudenberg, Geoffrey, and Maxim Imakaev. 2017. “FISH-Ing for Captured Contacts: Towards Reconciling FISH and 3C.” Nature Methods 14 (7): 673–78.

      Hafner, Antonina, Minhee Park, Scott E. Berger, Elphège P. Nora, and Alistair N. Boettiger. 2022. “Loop Stacking Organizes Genome Folding from TADs to Chromosomes.” bioRxiv. https://doi.org/10.1101/2022.07.13.499982.

      Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Xavier Darzacq, and Robert Tjian. 2022. “Enhancer-Promoter Interactions and Transcription Are Largely Maintained upon Acute Loss of CTCF, Cohesin, WAPL or YY1.” Nature Genetics 54 (12): 1919–32.

      Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Oliver J. Rando, Robert Tjian, and Xavier Darzacq. 2020. “Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding.” Molecular Cell 78 (3): 539–53.e8.

      Kloetgen, Andreas, Palaniraja Thandapani, Panagiotis Ntziachristos, Yohana Ghebrechristos, Sofia Nomikou, Charalampos Lazaris, Xufeng Chen, et al. 2020. “Three-Dimensional Chromatin Landscapes in T Cell Acute Lymphoblastic Leukemia.” Nature Genetics 52 (4): 388–400.

      Mach, Pia, Pavel I. Kos, Yinxiu Zhan, Julie Cramard, Simon Gaudin, Jana Tünnermann, Edoardo Marchi, et al. 2022. “Cohesin and CTCF Control the Dynamics of Chromosome Folding.” Nature Genetics 54 (12): 1907–18.

      Mateo, Leslie J., Sedona E. Murphy, Antonina Hafner, Isaac S. Cinquini, Carly A. Walker, and Alistair N. Boettiger. 2019. “Visualizing DNA Folding and RNA in Embryos at Single-Cell Resolution.” Nature 568 (7750): 49–54.

      Murphy, Sedona, and Alistair Nicol Boettiger. 2022. “Polycomb Repression of Hox Genes Involves Spatial Feedback but Not Domain Compaction or Demixing.” bioRxiv. https://doi.org/10.1101/2022.10.14.512199.

      Nora, Elphège P., Anton Goloborodko, Anne-Laure Valton, Johan H. Gibcus, Alec Uebersohn, Nezar Abdennur, Job Dekker, Leonid A. Mirny, and Benoit G. Bruneau. 2017. “Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization.” Cell 169 (5): 930–44.e22.

      Nuebler, Johannes, Geoffrey Fudenberg, Maxim Imakaev, Nezar Abdennur, and Leonid A. Mirny. 2018. “Chromatin Organization by an Interplay of Loop Extrusion and Compartmental Segregation.” Proceedings of the National Academy of Sciences of the United States of America 115 (29): E6697–6706.

      Rao, Suhas S. P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M. Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn, et al. 2017. “Cohesin Loss Eliminates All Loop Domains.” Cell 171 (2): 305–20.e24.

      Su, Jun-Han, Pu Zheng, Seon S. Kinrot, Bogdan Bintu, and Xiaowei Zhuang. 2020. “Genome-Scale Imaging of the 3D Organization and Transcriptional Activity of Chromatin.” Cell 182 (6): 1641–59.e26.

      Takei, Yodai, Shiwei Zheng, Jina Yun, Sheel Shah, Nico Pierson, Jonathan White, Simone Schindler, Carsten H. Tischbirek, Guo-Cheng Yuan, and Long Cai. 2021. “Single-Cell Nuclear Architecture across Cell Types in the Mouse Brain.” Science 374 (6567): 586–94.

      Wang, Siyuan, Jun-Han Su, Brian J. Beliveau, Bogdan Bintu, Jeffrey R. Moffitt, Chao-Ting Wu, and Xiaowei Zhuang. 2016. “Spatial Organization of Chromatin Domains and Compartments in Single Chromosomes.” Science 353 (6299): 598–602.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Matsui et al. present an experimental pipeline for visualizing the molecular machinery of synapses in the brain, which includes numerous techniques, starting with generating labeled antibodies and recombinant mice, continuing with HPF and FIB milling, and finishing with tilt series collection and 3D image processing. This pipeline represents a breakthrough in the preparation of brain tissue for high-resolution imaging and can be used in future tomographic research to reconstruct molecular details of synaptic complexes as well as pre- and post-synaptic assemblies. This methodology can also be adapted for a broader range of tissue preparations and signifies the next step towards a better structural understanding of how molecular machineries operate in natural conditions.

      Strengths:

      The manuscript is very well written, contains a detailed description of methodology, provides nice illustrations, and will be an outstanding guide for future research.

      Weaknesses:

      None noted.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a method that allows for the identification and localization of molecular machinery at chemical synapses in unstained, unfixed native brain tissue slices. They believe that this approach will provide a 3D structural basis for understanding different mechanisms of synaptic transmission, plasticity, and development. To achieve this, the group used genetically engineered mouse lines and generated thin brain slices that underwent high-pressure freezing (HPF) and focused ion beam (FIB) milling. Utilizing cryo-electron tomography (cryo-ET) and integrating it with cryo-fluorescence microscopy, they achieved micrometer resolution in identifying the glutamatergic synapses along with nanometer resolution to locate AMPA receptors GluA2-subunits using Fab-AuNP conjugates. The findings are summarized with detailed examples of successfully prepared substrates for cryo-ET, specific morphological identification and localization, and the detailed structural organization of excitatory synapses, including synaptic vesicle clusters close to the postsynaptic density and in the cleft.

      Strengths:

      The study advances previous work that used cultured neurons or synaptosomes. Combining cryo-electron tomography (cryo-ET) with fluorescence-guided targeting and labeling with Fab-AuNP conjugates enabled the study of synapses and molecular structures in their native environment without chemical fixation or staining. This preserves their near-native state, offering high specificity and resolution. The methods developed are generalizable, allowing adaptation for identifying and localizing other key molecules at glutamatergic synapses and potentially useful for studying a variety of synapses and cellular structures beyond the scope of this research.

      Weaknesses:

      The preparation and imaging techniques are complex and require highly specialized equipment and expertise, potentially limiting their accessibility and widespread adoption.

      Additionally, the methods might need further modifications/tweaks to study other types of synapses or molecular structures effectively.

      The reliance on genetically engineered mouse lines may again impact the generalizability of the findings.

      Similarly, the requirement of monoclonal, high-affinity antibodies/Fab fragments to specifically label receptors/proteins would limit the wider employment of these methods.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Matsui et al. present an experimental pipeline for visualizing the molecular machinery of synapses in the brain, which includes numerous techniques, starting with generating labeled antibodies and recombinant mice, continuing with HPF and FIB milling, and finishing with tilt series collection and 3D image processing. This pipeline represents a breakthrough in the preparation of brain tissue for high-resolution imaging and can be used in future tomographic research to reconstruct molecular details of synaptic complexes as well as pre- and post-synaptic assemblies. This methodology can also be adapted for a broader range of tissue preparations and signifies the next step towards a better structural understanding of how molecular machineries operate in natural conditions.

      The manuscript is very well written, contains a detailed description of methodology, provides nice illustrations, and will be an outstanding guide for future research. I only have a few suggestions to further improve this excellent manuscript.

      The labeling experiment in Supplementary Figure 3 may have a limitation in the accessibility of certain "narrow" regions to 15F1Fabs (both JF646 and AuNP labeled). Would that be more correct to refer to the labeling of accessible GluA2-containing AMPARs rather than the majority of these receptors in the tissue (lines 180-183)?

      The text has been modified to reference “accessible GluA2-containing AMPARs”.

      Minor comments:

      (1) Lines 38-39. "natively derived" appears to be unnecessary here and can be deleted.

      Done

      (2) Line 153. Please specify the 20% dextran cryoprotectant.

      Done.

      (3) Lines 155-157. Please label the stratum radiatum and stratum lacunosum-moleculare in Figure 3B.

      Done

      (4) Figures 1C, 2B, 5B, 5D-E. Missing units for Y-axes.

      Done

      (5) Supplemental Figure 1. Please add band annotation.

      Done

      (6) Supplemental Figure 3. Scale bars are missing.

      Done

      (7) Supplemental Video 1 does not play.

      The video file has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      My congratulations to the authors for undertaking this challenging work.

      Major concerns that need to be addressed:

      It's unclear if the anti-GluA2 15F1 Fab-AuNP conjugate would affect the receptor clustering and localization on the synaptic membranes. It binds at the distal end, which is likely to impact its interactions with other synaptic proteins, which may affect the synaptic organization and function.

      Concern addressed in the ‘Discussion’ section.

      The hippocampal slices were treated with the anti-GluA2 15F1 Fab-AuNP conjugate for 1 hour at room temperature. It might be helpful to discuss the potential effects of Fab-AuNP on synaptic function. It has been demonstrated previously that introducing binders of the receptors' ectodomains can affect synaptic function.

      Concern also addressed in the ‘Discussion’ section. 

      Kunimichi Suzuki et al. Science 369, eabb4853 (2020). DOI: 10.1126/science.abb4853; https://patents.google.com/patent/US20230192810A1/en

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors established an in vitro triple co-culture BBB model and demonstrated its advantages compared with the mono or double co-culture BBB model. Further, the authors used their established in vitro BBB model and combined it with other methodologies to investigate the specific mechanism that co-culture with astrocytes but also neurons enhanced the integrity of endothelial cells.

      Strengths:

      The results persuasively showed the established triple co-culture BBB model well mimicked several important characteristics of BBB compared with the mono-culture BBB model, including better barrier function and in vivo/in vitro correlation. The human-derived immortalized cells used made the model construction process faster and more efficient, and have a better in vivo correlation without species differences. This model is expected to be a useful high-throughput evaluation tool in the development of CNS drugs.

      Based on the previous experimental results, detailed studies investigated how co-culture with neurons and astrocytes promoted claudin-5 and VE-cadherin in endothelial cells, and the specific signaling mechanisms were also studied. Interestingly, the authors found that neurons also released GDNF to promote barrier properties of brain endothelial cells, as most current research has focused on the promoting effect of astrocytes-derived GDNF on BBB. Meanwhile, the author also validated the functions of GDNF for BBB integrity in vivo by silencing GDNF in mouse brains. Overall, the experiments and data presented support their claim that, in addition to astrocytes, neurons also have a promoting effect on the barrier function of endothelial cells through GDNF secretion.

      Weaknesses:

      Although the authors demonstrated a highly usable model for predicting BBB permeability, the recorded TEER measurements are still far from those reported for the human BBB in vivo, and the expression of transporters was not promoted by co-culture, which may make the model unsuitable for studying transporter-mediated drug transport across the BBB.

      We thank the reviewer very much for the opportunity to improve our manuscript. The immortalized human cell line hCMEC/D3 has poor barrier properties and differs from the human physiological BBB in TEER and in the expression of some transporters and metabolic enzymes. However, the use of human primary BMECs may be restricted by the acquisition of materials and ethical approval. Isolation and purification of human primary BMECs are time-consuming and laborious. Moreover, culture conditions can alter transcriptional activity (PMID: 37076016). All of these factors limit the establishment of BBB models based on primary human BMECs for high-throughput screening. Thus, hCMEC/D3 is still widely used to study the characteristics of drug transport across the BBB and the effects of certain diseases on the BBB (PMID: 37076016; 38711118; 31163193), as it is easy to culture and can express a large number of transporters and metabolic enzymes in its physiological state. Therefore, hCMEC/D3 cells were selected to develop our in vitro BBB model.

      Reviewer #1 (Recommendations For The Authors):

      Point 1: The authors claim that GDNF is mainly released by human neuroblastoma SH-SY5Y cells in the in vitro BBB model, but there are still some differences between the characteristics of cell lines and neurons. The authors should discuss or provide evidence about the distribution and source of GDNF in the brain to support this conclusion.

      We greatly appreciate your helpful suggestions. According to your advice, we have revised the “Discussion” in the revised manuscript as follows:

      In “Discussion”:

      “GDNF is mainly expressed in astrocytes and neurons (Lonka-Nevalaita et al., 2010; Pochon et al., 1997). In adult animals, GDNF is mainly secreted by striatal neurons rather than astrocytes and microglial cells (Hidalgo-Figueroa et al., 2012). The present study also shows that GDNF mRNA levels in SH-SY5Y cells were significantly higher than that in U251 cells. GDNF was also detected in conditioned medium from SH-SY5Y cells. All these results demonstrate that neurons may secrete GDNF”.

      Point 2: The authors found that co-culture induced the proliferation of endothelial cells (Figure 1H). I suggest the authors discuss whether the proliferation of endothelial cells would affect their permeability.

      Thanks for your suggestion. According to your advice, we have investigated the effect of cell proliferation on the leakage of the cell layer and included the results in Figure 1—figure supplement 1. The present study showed that basic fibroblast growth factor (bFGF) increased proliferation of hCMEC/D3 cells but had little effect on the expression of either claudin-5 or VE-cadherin (Figure 2F). The hCMEC/D3 cells were incubated with different doses of bFGF, and the permeabilities of fluorescein (NaF) and FITC-Dextran 3–5 kDa across the hCMEC/D3 cell monolayer were measured. The results showed that incubation with bFGF increased cell proliferation and reduced the permeabilities of fluorescein and FITC-Dextran across the hCMEC/D3 cell monolayer. However, the permeability reduction was smaller than that produced by double co-culture with U251 cells or by triple co-culture. These results suggest that the contribution of cell proliferation to the barrier function of hCMEC/D3 cells was minor. We have made the following modifications in the “Results” of our manuscript:

      In “Results”:

      “Furthermore, hCMEC/D3 cells were incubated with basic fibroblast growth factor (bFGF), which promotes cell proliferation without affecting either claudin-5 or VE-cadherin expression (Figure 2F). The results showed that incubation with bFGF increased cell proliferation and reduced the permeabilities of fluorescein and FITC-Dex across the hCMEC/D3 cell monolayer. However, the permeability reduction was smaller than that produced by double co-culture with U251 cells or by triple co-culture. These results suggest that the contribution of cell proliferation to the barrier function of hCMEC/D3 cells was minor (Figure 1—figure supplement 1)”.

      Point 3: The authors claimed that GDNF induced the expression of claudin-5 and VE-cadherin separately. However, Andrea Taddei et al. reported that VE-cadherin itself also regulates claudin-5 through the inhibitory activity of FoxO1 (Andrea Taddei et al., 2008). The authors did not consider whether the upregulation of claudin-5 is associated with the increase of VE-cadherin.

      Thank you for your suggestion. We also investigated whether VE-cadherin affected claudin-5 expression in hCMEC/D3 cells transfected with VE-cadherin siRNA. In contrast to the report by Taddei et al., silencing VE-cadherin only slightly decreased the mRNA level of claudin-5, and the difference was not significant. Furthermore, basal and GDNF-induced claudin-5 protein levels were unaltered by silencing VE-cadherin. The discrepancies may stem from the characteristics of the tested cells. Endothelial cells derived from murine embryonic stem cells with a homozygous null mutation were used in Taddei’s study, whereas we transfected immortalized brain microvascular endothelial cells with siRNA. Several reports have demonstrated that different mechanisms regulate the expression of claudin-5 and VE-cadherin. In retinal endothelial cells, hyperglycemia remarkably reduced claudin-5 expression (but not VE-cadherin) (PMID: 24594192). However, in hCMEC/D3 cells, hypoglycemia significantly decreased claudin-5 (not VE-cadherin) expression, whereas hyperglycemia increased VE-cadherin expression (not claudin-5) (PMID: 24708805). Therefore, the role of VE-cadherin in the regulation of claudin-5 in the BBB should be further investigated.

      Following your valuable suggestion, we have modified the “Results”, “Discussion” and “Figure 4—figure supplement 1” in the revised manuscript as follows:

      In “Results”:

      “It was reported that VE-cadherin also upregulates claudin-5 by inhibiting FOXO1 activity (Taddei et al., 2008). The effect of VE-cadherin on claudin-5 was studied in VE-cadherin-silenced hCMEC/D3 cells. In contrast to the report by Taddei et al., silencing VE-cadherin only slightly decreased the mRNA level of claudin-5, and the difference was not significant. Furthermore, basal and GDNF-induced claudin-5 protein levels were unaltered by silencing VE-cadherin (Figure 4—figure supplement 1). Thus, the role of VE-cadherin in the regulation of claudin-5 in the BBB should be further investigated.”

      In “Discussion”:

      “Claudin-5 expression is also regulated by VE-cadherin (Taddei et al., 2008). Differing from the previous reports, silencing VE-cadherin with siRNA only slightly affected basal and GDNF-induced claudin-5 expression. The discrepancies may come from different characteristics of the tested cells. Several reports have supported the above deduction. In retinal endothelial cells, hyperglycemia remarkably reduced claudin-5 expression (but not VE-cadherin) (Saker et al., 2014). However, in hCMEC/D3 cells, hypoglycemia significantly decreased claudin-5 expression but hyperglycemia increased VE-cadherin expression (Sajja et al., 2014)”.

      “Figure 4—figure supplement 1: The contribution of VE-Cadherin on the GDNF-induced claudin-5 expression. Effects of the VE-Cadherin siRNA (siVE-Cad) on mRNA expression of VE-cadherin (A) and claudin-5 (B). Effects of siVE-Cad and GDNF on claudin-5 and VE-cadherin protein expression (C). NC: negative control plasmids. The above data are shown as the mean ± SEM. Four biological replicates per group. Two technical replicates for A and B, and one technical replicate for C. Statistical significance was determined using unpaired Student’s t-test or one-way ANOVA test followed by Fisher’s LSD test.”

      Point 4: The annotation of significance with the p-values in the figures might not be visually concise and clear. It is recommended to provide the p-values in the legends or raw data.

      Thank you for your valuable suggestion. We have revised our figures in our revised manuscript. The specific p-values and statistical methods were summarized in the source data files of each figure.

      Point 5: The authors need to note the material of the Transwell membrane used to increase the reproducibility of experiments, because different materials may cause differences in permeability and TEER (Diane M. Wuest et al., 2013).

      We greatly appreciate your valuable suggestions. According to your advice, we have provided the information on the material of the Transwell membrane in the “Materials and Methods” in the revised manuscript as follows:

      In “Materials and Methods”:

      “U251 cells were seeded at 2 × 10⁴ cells/cm² on the bottom of Transwell inserts (PET, 0.4 µm pore size, SPL Life Sciences, Pocheon, Korea) coated with rat-tail collagen (Corning Inc., Corning, NY, USA)”.

      Point 6: It is not necessary to abbreviate "in vitro/in vivo correlation" in the legend of Figure 7 as it was not mentioned again in the following text.

      Thank you for your valuable suggestion. We have deleted the abbreviation from the legend of Figure 7 in the revised manuscript.

      In “Figure 7”:

      “Figure 7. In vitro/in vivo correlation assay of BBB permeability.”

      Reviewer #2 (Public Review):

      Summary:

      Yang and colleagues developed a new in vitro blood-brain barrier model that is relatively simple yet outperforms previous models. By incorporating a neuroblastoma cell line, they demonstrated increased electrical resistance and decreased permeability to small molecules.

      Strengths:

      The authors initially elucidated the soluble mediator responsible for enhancing endothelial functionality, namely GDNF. Subsequently, they elucidated the mechanisms by which GDNF upregulates the expression of VE-cadherin and Claudin-5. They further validated these findings in vivo, and demonstrated predictive value for molecular permeability as well. The study is meticulously conducted and easily comprehensible. The conclusions are firmly supported by the data, and the objectives are successfully achieved. This research is poised to advance future investigations in BBB permeability, leakage, dysfunction, disease modeling, and drug delivery, particularly in high-throughput experiments. I anticipate an enthusiastic reception from the community interested in this area. While other studies have produced similar results with tri-cultures (PMID: 25630899), this study notably enhances electrical resistance compared to previous attempts.

      Weaknesses:

      (A) Considerable effort has been directed towards developing in vitro models that more closely resemble their in vivo counterparts, utilizing stem cell-derived NVU cells. Although these examples are currently rudimentary, they offer better BBB mimicry than Yang's study.

      Thank you very much for your valuable comments. Indeed, hCMEC/D3 cells have poor barrier properties and low TEER compared to the human physiological BBB. Human pluripotent stem cell-derived BBB models (hPSC-BBB models) can provide a robust and scalable cell source for BBB modeling, although many challenges remain, particularly concerning reproducibility and the recreation of multifaceted phenotypes in vitro with increasing complexity. Moreover, hPSC-derived BBB models depend heavily on the origin of the hPSC-derived BMECs, which is heterogeneous, and cells derived from different protocols are not well validated or standardized in the BBB models. Thus, the hPSC-BBB models are still being developed and their clinical applications are still at an early stage (PMID: 34815809; 35755780). The hCMEC/D3 cell line is still widely used to study characteristics of drug transport across the BBB and the effects of certain diseases on the BBB (PMID: 37076016; 38711118; 31163193), as it is easy to culture and can express a large number of transporters and metabolic enzymes in its physiological state. Therefore, hCMEC/D3 cells were selected to develop our in vitro BBB model.

      (B) Additionally, some instances might benefit from more robust statistical tests; nonetheless, I do not think this would significantly alter the experimental conclusions.

      Thank you for your valuable suggestions on the statistical methods used in our study, which made us realize our lack of rigor in selecting statistical methods. We have modified the statistical methods, and all statistical results shown in the manuscript have been updated accordingly.

      (C) Similar experiments with tri-cultures yielding analogous results have been reported by other authors (PMID: 25630899). TEER values are a bit higher than the aforementioned experiments; however, this study has values at least one order of magnitude lower than physiological levels.

      Thank you for your advice. We also noticed that the TEER values in the present study differed from those in previous reports, which may reflect differences in the types of BMECs, astrocytes, and neurons used.

      Reviewer #2 (Recommendations For The Authors):

      Point 1: If you've already decided to enhance the model by incorporating additional cell types, why not include pericytes as well? As mentioned in the public review, other studies have explored tri-culture models; adding pericytes or other cell types could provide valuable insights.

      We greatly appreciate your helpful suggestions. As you mentioned, the barrier function of our model still needs further improvement, which is also a limitation of our current model. In our future research, we will aim to optimize our model by incorporating other NVU cells. Beyond drug screening, we also hope that our in vitro BBB model can serve as a versatile tool to investigate underlying factors associated with neuropathological disorders. According to your advice, we have modified “Discussion” in the revised manuscript as follows:

      In “Discussion”:

      “However, the study also has some limitations. In addition to neurons and astrocytes, other cells such as microglia, pericytes, and vascular smooth muscle cells, especially pericytes, may also affect BBB function. How pericytes affect BBB function and interaction among neurons, astrocytes, and pericytes needs further investigation.”

      Point 2: The decline in TEER after 6 days is concerning. Have you extended your experiments beyond day 7? If so, what were the outcomes? Did the system degrade, leading to decreased resistance, or did cell death occur?

      We greatly appreciate your helpful recommendation. We also observed that the TEER of our culture system began to decline on day 7. To ensure the reliability of our experiments, they were conducted on day 6 of co-cultivation and did not extend beyond day 7. We speculate that the decrease in TEER values may be due to excessive cell contact, which could inhibit cell proliferation, and that long-term cultivation may lead to cell aging. Similar decreases in the TEER of in vitro BBB models after prolonged culture have been reported in other studies (PMID: 31079318; 8470770). To eliminate misunderstandings, we have made the following modifications to our manuscript:

      In “Results”:

      “TEER values were measured during the co-culture (Figure 1B). TEER values of the four in vitro BBB models gradually increased until day 6. On day 7, the TEER values showed a decreasing trend. Thus, a six-day co-culture period was used for subsequent experiments”.

      In “In vitro BBB permeability study” of “Materials and Methods”:

      “On day 7, the TEER values of BBB models showed a decreasing trend. Therefore, the subsequent experiments were all completed on day 6”.

      Point 3: It is standard practice for figures to be referenced in the order they appear in the manuscript. However, Figures 1A and 1B are not mentioned until the end of the methods section. Adding a brief sentence at the beginning of the main body referencing these figures would improve the clarity of the experimental approach.

      Thank you for your valuable suggestion. We have modified Figure 1, and the details of the cell model establishment process are now included in Figure 9, which is mentioned in the “Materials and Methods” section.

      Point 4: To strengthen the evidence supporting the proliferative effect of GDNF, consider incorporating additional measures beyond cell count alone. While an increase in cell count could be attributed to reduced cell death (given GDNF's pro-survival properties), proliferation effects have also been shown (PMID: 28878618). Demonstrating proliferation with markers or cell cycle analysis would provide more robust evidence.

      Thank you for your helpful suggestion. We used EdU incorporation and CCK-8 assays to further detect the proliferation of hCMEC/D3 cells, and corresponding results were added in the revised Figure 1H and Figure 1I. The description of results is shown as follows:

      In “Results”:

      “Co-culture with SH-SY5Y, U251, and U251 + SH-SY5Y cells also enhanced the proliferation of hCMEC/D3 cells. Moreover, the promoting effect of SH-SY5Y cells was stronger than that of U251 cells (Figure 1G-1I).”

      Point 5: Could you specify the use of technical replicates in your experiments? How many?

      Thank you for your helpful suggestion, and we apologize for the issue you pointed out. We have now specified the technical replicates of the experiments in the legends of the revised manuscript. In general, ELISA and qPCR experiments used two technical replicates, and all other experiments used one. We have also made the following modifications to our manuscript:

      In “Statistical analyses” of “Materials and Methods”:

      “All results are presented as mean ± SEM. The average of technical replicates generated a single independent value that contributes to the n value used for comparative statistical analysis”.
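
      As a minimal sketch of the averaging rule quoted above, the following Python snippet collapses technical replicates into one value per biological replicate before computing the mean and SEM. The function name `summarize` and the example readings are hypothetical and are not taken from the manuscript.

```python
import math
import statistics


def summarize(replicates):
    """Return (n, mean, sem) after averaging technical replicates.

    `replicates` holds one inner list per biological replicate; each inner
    list contains that replicate's technical measurements.
    """
    # Each biological replicate contributes exactly one independent value.
    independent = [statistics.mean(tech) for tech in replicates]
    n = len(independent)
    mean = statistics.mean(independent)
    # SEM = sample standard deviation / sqrt(n); requires n >= 2.
    sem = statistics.stdev(independent) / math.sqrt(n)
    return n, mean, sem


# Hypothetical data: four biological replicates, two technical replicates each.
example = [[1.0, 1.2], [0.9, 1.1], [1.3, 1.5], [1.0, 1.0]]
n, mean, sem = summarize(example)
print(f"n = {n}, mean = {mean:.3f}, SEM = {sem:.3f}")
```

      Averaging first keeps n equal to the number of biological replicates, so the technical replicates do not inflate the sample size used in the comparative tests.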

      Point 6: Given the sample size of 4 in most experiments, it may be insufficient for passing a normality test. Therefore, it's advisable to employ non-parametric tests such as the Kruskal-Wallis test, followed by appropriate post-hoc tests.

      Thank you for your valuable and useful suggestion. We apologize for our initial oversight regarding statistics. Based on your suggestion, we have thoroughly reviewed and revised the statistical methods and statistical results in the manuscript. According to the ‘Statistics Guide’ of GraphPad (H. J. Motulsky, "The power of nonparametric tests", GraphPad Statistics Guide. Accessed 20 June 2024. https://www.graphpad.com/guides/prism/latest/statistics/stat_the_power_of_nonparametric_tes.htm), the Kruskal-Wallis test is more robust when the data do not follow a normal distribution or do not meet the assumption of homogeneity of variance. However, because it relies on ranks, it may have lower sensitivity in detecting small differences. If the total sample size is tiny, the Kruskal-Wallis test will always give a P value greater than 0.05 no matter how much the groups differ. To address this, we first used the Shapiro-Wilk test to assess whether the samples come from Gaussian distributions. For samples meeting this criterion, parametric tests were employed. For samples that do not follow a Gaussian distribution, as per your advice, we used non-parametric tests. We have modified the “Statistical analyses” in the revised manuscript as follows:
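
      The point about tiny samples can be illustrated with a short enumeration. The sketch below is not the procedure used by GraphPad or by the authors; it assumes untied data and computes the exact permutation distribution of the Kruskal-Wallis H statistic, returning the smallest P value that any data set with the given group sizes could produce. The function names and the group sizes are hypothetical choices for illustration.

```python
from itertools import combinations


def kruskal_h(groups):
    """Kruskal-Wallis H for groups given as collections of ranks (no ties)."""
    n_total = sum(len(g) for g in groups)
    weighted = sum(sum(g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def smallest_exact_p(sizes):
    """Smallest exact permutation P value attainable for the given group sizes."""
    ranks = tuple(range(1, sum(sizes) + 1))
    h_values = []

    def assign(remaining, sizes_left, groups):
        # Recursively enumerate every way of splitting the ranks into groups.
        if not sizes_left:
            h_values.append(kruskal_h(groups))
            return
        for combo in combinations(remaining, sizes_left[0]):
            chosen = set(combo)
            rest = tuple(r for r in remaining if r not in chosen)
            assign(rest, sizes_left[1:], groups + [combo])

    assign(ranks, tuple(sizes), [])
    h_max = max(h_values)
    # The most extreme outcome is the one with the largest H.
    extreme = sum(1 for h in h_values if h >= h_max - 1e-9)
    return extreme / len(h_values)


for sizes in [(2, 2, 2), (3, 3, 3)]:
    print(sizes, smallest_exact_p(sizes))
```

      With three groups of two observations, even perfectly separated groups cannot reach P < 0.05 in this exact calculation, whereas three groups of three can.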

      In “Statistical analyses” of “Materials and Methods”:

      “The data were assessed for Gaussian distribution using the Shapiro-Wilk test. The Brown-Forsythe test was employed to evaluate the homogeneity of variance between groups. For comparisons between two groups, statistical significance was determined by unpaired two-tailed t-test. Data with significantly unequal variances were tested using the unpaired t-test with Welch's correction, and non-Gaussian distributed data were tested using the Mann-Whitney test. For multiple group comparisons, one-way ANOVA followed by Fisher’s LSD test was used to determine statistical significance. Data with significantly unequal variances were tested using Welch's ANOVA test, and non-Gaussian distributed data were tested using the Kruskal-Wallis test. P < 0.05 was considered statistically significant. Simple linear regression analysis was used to examine the presence of a linear relationship between two variables. Data were analyzed using GraphPad Prism software version 8.0.2 (GraphPad Software, La Jolla, CA, USA)”.
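
      The selection rules in the paragraph above can be sketched as a small decision function. This is an illustrative reading rather than the authors' actual workflow, which was carried out in GraphPad Prism. The function names are hypothetical, and two details are assumptions: the 0.05 threshold is applied to the Shapiro-Wilk and Brown-Forsythe P values, and non-Gaussian data take precedence over unequal variances when both occur, since the text does not state the order.

```python
ALPHA = 0.05  # assumed threshold for the assumption checks


def flags_from_p(shapiro_p_values, brown_forsythe_p, alpha=ALPHA):
    """Turn assumption-check P values into (gaussian, equal_variance) flags."""
    gaussian = all(p >= alpha for p in shapiro_p_values)
    equal_variance = brown_forsythe_p >= alpha
    return gaussian, equal_variance


def choose_test(n_groups, gaussian, equal_variance):
    """Pick the test named in the quoted 'Statistical analyses' paragraph."""
    if n_groups < 2:
        raise ValueError("at least two groups are required")
    two_groups = n_groups == 2
    if not gaussian:
        # Assumption: non-normality is handled before unequal variances.
        return "Mann-Whitney test" if two_groups else "Kruskal-Wallis test"
    if not equal_variance:
        if two_groups:
            return "unpaired t-test with Welch's correction"
        return "Welch's ANOVA test"
    if two_groups:
        return "unpaired two-tailed t-test"
    return "one-way ANOVA followed by Fisher's LSD test"


# Hypothetical P values for three groups.
gaussian, equal_variance = flags_from_p([0.41, 0.22, 0.68], 0.03)
print(choose_test(3, gaussian, equal_variance))
```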

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In particular, theoretical analysis of the extant evidence and formulation of the hypothesis remains elusive in terms of the potential mechanisms of updating/maintaining balance in obesity.

      We thank the reviewer for their feedback regarding the theoretical analysis and hypothesis formulation in our manuscript. We have attempted to build our hypothesis based on established correlations between dopamine levels and working memory capabilities, as seen in various populations affected by dopamine-related changes (e.g. Parkinson’s disease (Fallon et al., 2017), older individuals (Podell et al., 2012), or more generally, individuals with lower dopamine synthesis capacity (Colzato et al., 2013)). Our hypothesis, that individuals with higher BMI might show impaired updating, is an extrapolation from observed patterns in these conditions. We recognize that the evidence connecting obesity to similar neuropsychological profiles may seem preliminary. We have tried to elaborate more clearly on how we reached our hypotheses in the revised version of the introduction.

      “Based on the above considerations these inconsistencies may be due to prior studies not clearly differentiating between distractor-resistant maintenance and updating in the context of working memory. This distinction may be crucial, however, as indirect evidence hints at potential specific alteration in these two sub-processes in obesity. For instance, obesity has been associated with aberrant dopamine transmission, with there being an abundance of literature linking obesity to changes in D2 receptor availability in the striatum (see e.g. Horstmann et al., 2015). However, results are not consensual, with studies reporting decreased, increased, or unchanged D2 receptor availability in obesity (Ribeiro et al., 2023; Janssen & Horstmann, 2022; see Darcey et al. (2023) for a potential explanation). Additionally, there are reports of differences in dopamine transporter (DAT) availability in both obese humans (Chen et al., 2008; but also see Pak et al., 2023) and rodents (Narayanaswami et al., 2013; Jones et al., 2021; Hamamah et al., 2023). The observed changes in dopamine are often interpreted as being due to chronic dopaminergic overstimulation resulting from overeating (Volkow & Wise, 2005; Volkow et al., 2008) and altered reward sensitivity as a consequence thereof (Blum et al., 1996). Considering that working memory gating is highly dependent on dopamine signaling, such changes could theoretically alter the balance between maintenance and updating processes in obesity. Next to this, obesity has frequently been associated with functional and structural changes in WM gating-related brain areas, implying another pathway through which working memory gating might get affected. 
At the level of the prefrontal cortex (PFC), studies have reported reduced gray matter volume and compromised white matter microstructure in individuals with obesity (Debette et al., 2014; Kullmann et al., 2016; Morys et al., 2024; Lv et al., 2024), and functional changes become evident with frequent reports of decreased activity in the dorsolateral PFC during tasks requiring cognitive control (e.g., Morys et al., 2018; Xu et al., 2017). Notably, Han et al. (2022) observed significantly lower spontaneous dlPFC activity during rest, potentially indicating reduced baseline dlPFC activity in obesity. On the level of the striatum, gray matter volume seems to correlate positively with measures of obesity (Horstmann et al., 2011), and individuals with obesity show greater activation of the dorsal striatum in response to high-calorie food stimuli compared to normal-weight individuals, indicating a stronger dopamine-dependent reward response to food cues (Stice et al., 2008; Small et al., 2003). Additionally, changes in connectivity between and within the striatum and PFC in obesity, both structurally (Li et al., 2023) and functionally (Verdejo-Román et al., 2017a, 2017b; Contreras-Rodríguez et al., 2017) have been reported. Although these studies mostly investigate brain function in relation to food and reward processing, changes in these areas may also impair the ability to adequately engage in working memory gating processes, as activity in affective (reward) and cognitive fronto-striatal loops immensely overlap (Janssen et al., 2019). On the behavioral level, individuals with obesity consistently demonstrate impairments in food-specific (Janssen et al., 2017) but also non-food specific goal-directed behavioral control (Janssen et al., 2020) and reinforcement learning (Weydmann et al., 2023). 
It seems that difficulties with integrating negative feedback may be central to these alterations (Mathar et al., 2017; Kastner et al., 2017), which could explain a potential insensitivity to the negative consequences associated with (over) eating. Crucially, in humans, a substantial contribution to (reward) learning is mediated by working memory processes (Moustafa et al., 2008; Collins & Frank, 2012, 2018; Collins et al., 2014, 2017; Westbrook et al., 2024). The observed difficulties in reward learning in obesity may hence partly be rooted in a failure to update working memory with new reward information, suggesting cognitive issues that extend beyond mere difficulties in valuation processes. However, empirical support for this interpretation is currently lacking. A more nuanced understanding of the effects of obesity on working memory is crucial, however, as it could lead to more targeted intervention options.”

      The result that Taq1A and DARPP-32 moderated the interaction between WM condition and BMI requires intricate post hoc analysis to understand the bearings to update. The authors found that Taq1A or DARPP32 genotype moderated the negative association between BMI and WM exclusively in the update condition (significant two-way interaction effect), suggesting that the BMI-WM associations in other conditions were similar across genotypes. Importantly, visual inspection of the relationship between WM and BMI (Fig 4 & 5) suggests more prevalent positive effects of the putatively advantageous Taq1A-A1 and DARPP-32-AA genotypes to the overall negative relationship between WM and BMI in updating, but not in the other conditions. Given that an overall negative relationship was statistically supported across all conditions (model 1), a plausible interpretation would be that the updating condition stands out in terms of a positive moderation by putative advantageous genotypes, rather than compound negative consequences of BMI and genotype in updating. Critically, this interpretation stands in stark contrast with the interpretation put forth by the authors suggesting a specifically negative association between BMI and WM updating.

      We are grateful for the reviewer’s thorough review and insightful comments. We appreciate the attention to detail and the opportunity to improve our manuscript. We agree that further examination of the relationship between Taq1A, DARPP-32, and BMI, particularly in the update condition, is crucial for a comprehensive understanding of our results. In response to your feedback, we have conducted additional post hoc analyses, which indeed revealed the effects anticipated by the reviewer. Accordingly, we have revisited our discussion and conclusions to ensure that they accurately reflect the complexities of our findings, particularly regarding the positive moderation by putative advantageous genotypes in the update condition. Once again, we appreciate your thoughtful review and are grateful for the opportunity to strengthen the manuscript based on your feedback.

      In the results section we added: 

      “Further post hoc examination of the effects on updating revealed that the association between BMI and performance was significant for A1-carriers (95%CIs: -0.488 to -0.190), with 33.9% lower probability to score correctly per unit change in BMI, but non-significant for non-A1-carriers (95%CIs: -0.153 to 0.129; 1.22% lower probability). Interestingly, compared to all other conditions, in the update condition, the negative association between BMI and task performance was weakest for non-A1-carriers (estimate = -0.012, SE = 0.072), but strongest for A1-carriers (estimate = -0.339, SE = 0.076; see Figure 3 and Table S6), emphasizing that genotype impacts this condition the most. To further check if this difference in slope was statistically significant across conditions, we stratified the sample into Taq1A subgroups (A1+ vs. A1-) and assessed whether BMI affected task performance differently across conditions separately for each subgroup. This analysis revealed no significant difference in the relationship between BMI and task performance across conditions among A1+ individuals (pBMI*condition = 0.219). However, within the A1- subgroup, a significant interaction effect between BMI and condition emerged (pBMI*condition = 0.049). Collectively, these findings suggest that the absence of the A1-allele is linked to improved task performance, particularly in the context of updating, where it seems to mitigate the otherwise negative effects of BMI.”

      “Once more, further examination of the observed DARPP-32, BMI, and condition interaction showed that, in the update condition, the negative association between BMI and task performance was weakest and nonsignificant for A/A (estimate = -0.044, SE = 0.066; 95%CIs: -0.174 to 0.086), but strongest and significant for G-carrying individuals (estimate = -0.324, SE = 0.079; 95%CIs: -0.478 to -0.170). See Table S7 and Figure 5. Splitting the sample into DARPP subgroups (A/A vs. G-carrier) revealed that in both subgroups, there was a significant interaction effect of BMI and condition on task performance (pA/A = 0.034, pG-carrier = 0.003). In the case of DARPP, it hence appears that carrying the disadvantageous G-allele could exacerbate the negative effects of BMI, while the more advantageous allele (A/A) might mitigate them - once again particularly in the context of updating.”
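
      The intervals quoted in the two excerpts above can be checked against the reported estimates and standard errors. The sketch below assumes, without confirmation from the text, that the intervals are Wald intervals (estimate ± 1.96 × SE) and that the estimates are log-odds coefficients from a logistic mixed model, in which case exponentiating them gives odds ratios. The function names are hypothetical; the numbers are the ones reported above.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def wald_ci(estimate, se, z=Z_95):
    """Wald confidence interval: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


def odds_ratio(estimate):
    """Odds ratio per unit of the predictor, assuming a log-odds coefficient."""
    return math.exp(estimate)


# Estimates and standard errors as reported for the update condition.
reported = {
    "Taq1A A1-carriers": (-0.339, 0.076),
    "Taq1A non-A1-carriers": (-0.012, 0.072),
    "DARPP-32 G-carriers": (-0.324, 0.079),
    "DARPP-32 A/A": (-0.044, 0.066),
}

for label, (estimate, se) in reported.items():
    lower, upper = wald_ci(estimate, se)
    print(f"{label}: CI {lower:.3f} to {upper:.3f}, OR {odds_ratio(estimate):.3f}")
```

      Under these assumptions the Taq1A intervals are reproduced to three decimals and the DARPP-32 intervals to within about 0.002, which is consistent with rounding of the printed estimates. The exponentiated coefficients describe changes in odds, which is a different quantity from a percentage change in probability.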

      Following from this, we added the following text snippets to the discussion:

      “Notably, our data revealed that differences in updating appeared to be driven by the non-risk allele groups. Despite increasing BMI, performance remained stable.” 

      “However, as BMI increases, the possession of a greater D2 receptor density seems to become advantageous, as evidenced by the lack of a negative correlation between BMI and updating performance in non-A carriers. We speculate that this phenomenon could potentially be attributed to the compensating effects of this genotype. While individuals with fewer D2 receptors (A1+) may have quicker saturation of receptors regardless of dopamine levels, in those with more D2 receptors (A1-) saturation may be slower. This could contribute to a more finely tuned balance between "go" and "no-go" signaling, despite potential alterations in dopamine tone in obesity (Horstmann et al., 2015; but also see Darcey et al., 2023 or Janssen & Horstmann, 2022). Clearly, the current data cannot provide empirical evidence for these speculations, and further discrete research is needed to establish firm conclusions. 

      Regarding DARPP, we found that carrying the G-allele significantly exacerbated the negative effects of BMI, while the more advantageous allele (A/A) mitigated them, once again particularly in the context of updating.”

      “Collectively, our observations hint at the potential of advantageous genotypes to moderate the adverse impacts of high BMI on cognitive functions.” 

      In conclusion, in its current form the title of the present work is ambivalent in terms of 1) the use of the term "impaired" in the context of cognitively normal individuals, 2) a BMI group difference specifically in the updating condition, and 3) the dopaminergic mechanisms based on observational data.

      Given the results of the additional post hoc analyses, we agree with the reviewer and have refined the title of our work to be less misleading. The title now reads:     

      “Working Memory Gating in Obesity is Moderated by Striatal Dopaminergic Gene Variants” 

      Reviewer #1 (Recommendations for the Authors):

      Beyond the issues raised in the public review, I recommend the authors adjust the use of pathologizing terminology in the context of a clinically healthy population. In particular, terms like "dopaminergic abnormalities" and "working memory deficits/impairment" seem pathologizing in a healthy, non-morbidly obese cohort. To that end, despite a negative continuous association between BMI and WM, there are high and low-performing individuals in all BMI segments, and group differences (high vs low BMI; not reported) do not seem as dramatic as between healthy controls and say Parkinson's disease patients. Furthermore, owing to the observational design of the present study the authors should pay attention to the use of terms suggesting causal relationships, such as "influence" in the context of statistical associations. Also, sentences like "Our study is the first to show such selective effects" seem problematic not only in terms of claims of primacy, but also in terms of the selectivity of the effects (associations). See the public review for an alternative interpretation of selectivity to updating conditions.   

      Of minor importance are the occasional spelling errors, that should be carefully checked by the authors. Also, I would like the authors to double-check the model configurations reported in the main text and the supplementary material. According to the supplement model 1 contains task condition by subject as a random effect (random slope model), whereas the main text states that this model configuration didn't converge and therefore only subject-specific intercepts are included. Hence, there seems to be discordance between the model descriptions in the main text and supplement. To that end, it would seem appropriate to briefly motivate the use of LME and the random effect for subject (within-subject correlation between conditions). Also, the origin of the odds ratios (OR) reported in the results section is not explicitly defined in the methods or results.

      We appreciate the reviewer's thoughtful recommendations and have taken several steps to address the concerns raised:

      (1) We have revised our manuscript to ensure that the language is less pathologizing and avoids suggesting causal relationships where only associations are indicated.  

      For example: 

      In the abstract, we replaced “abnormalities” with “alterations”:   

      “Dopaminergic alterations have emerged as a potential mediator. However, current models suggest these alterations should only shift the balance in working memory tasks, not produce overall deficits”

      In the introduction we replaced “impairments” with “alterations”:               

      “This distinction may be crucial, however, as indirect evidence hints at potential specific alteration in these two sub-processes in obesity.”

      Generally, we took care to replace terms like 'dopaminergic abnormalities' and 'working memory deficits/impairments' with more neutral descriptors suitable for a clinically healthy population in the whole manuscript. 

      (2) We have modified primacy statements to be more nuanced. In the discussion, for example, we now say “This finding is compelling as it demonstrates a rarely observed selective effect.” instead of “This finding is compelling as we are the first to show such selective effects.”

      (3) We have conducted an additional thorough review of our manuscript to correct any spelling errors.

      (4) Upon reevaluation, we corrected the inconsistencies with respect to the random structure of model 1. We therefore have revised the supplementary material to now accurately reflect that the model did not converge when including condition as a random factor, and thus, only subject-specific intercepts are included.

      (5) We have expanded our methods section to better explain the use of linear mixed effects models (LMEs) and the inclusion of random effects for subjects to account for within-subject correlation between conditions. We added the following text:

      “Given the within-subject design of our study, we used generalized linear mixed models (GLM) […]” and

      “The random structure of the model was thus reduced to include the factor ‘subject’ only, thereby accounting for the repeated measures taken from each subject.”

      (6) We have clearly defined the derivation of the odds ratios reported in our results in the methods section of our manuscript. We added the following text to the methods section:

      “Reported odds ratios (OR) are retrieved from exponentiating the log-odds coefficients called with the summary() function.”
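
      The conversion described in this added text can be illustrated with a minimal sketch. It assumes only that the model reports coefficients on the log-odds scale, as a logistic mixed model does; the function names and the optional Wald interval are illustrative and are not taken from the manuscript.

```python
import math


def odds_ratio(log_odds):
    """Convert a log-odds coefficient to an odds ratio by exponentiation."""
    return math.exp(log_odds)


def odds_ratio_ci(log_odds, std_error, z=1.96):
    """Wald confidence interval for an odds ratio (illustrative assumption).

    The interval is built on the log-odds scale and then exponentiated,
    so it is asymmetric around the odds ratio itself.
    """
    lower = math.exp(log_odds - z * std_error)
    upper = math.exp(log_odds + z * std_error)
    return lower, upper
```

      A coefficient of 0 corresponds to an odds ratio of 1 (no association); negative coefficients give odds ratios below 1.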

      Reviewer #2 (Public Review):

      The majority of participants seem to fall within the normal BMI range, whereas the interaction between BMI and genetic variations or amino acid ratio particularly surfaces at higher BMI. As genetic variations are usually associated with small effect sizes, the effective sample size, although large for a behavioral analysis only, might have been too small to detect meaningful effects of risk alleles of COMT and C957T.

      We thank the reviewer for the valuable feedback. We concur that the effective sample size may have posed a limitation in detecting meaningful effects of COMT and C957T, particularly given the skewness of our data towards participants within the normal BMI range. In response to the reviewer’s comments, we have refined the relevant paragraph in the limitations section of our manuscript, emphasizing the importance of recruiting a more balanced sample, including individuals with higher BMI, in future studies.

      “Furthermore, an additional limitation is that our data is slightly skewed towards participants within the normal BMI range. The effective sample size to detect meaningful genotype effects (e.g. for COMT or C957T) might thus have been too small, particularly at higher BMI levels. Future studies may address this limitation by recruiting a more balanced sample, including more individuals with higher BMI.”

      The relationships between genetic variations, BMI, and specific disturbances in dopamine signaling are complex, as compensating mechanisms might be at play to mitigate any detrimental effects. The results would therefore benefit from more direct measures or manipulations of dopaminergic processes.

      We thank the reviewer for this valuable input. We acknowledge the potential benefits of employing a more direct measure, or ideally, a dopaminergic manipulation, to establish a clearer causal link between dopamine processes and working memory gating in the context of obesity. In response to the reviewers' constructive feedback, we have addressed this limitation in the discussion section of our manuscript, emphasizing the need for further research in this area:

      “Additionally, the correlational nature of our findings highlights the need for more direct experimental manipulations of dopaminergic processes in obesity. Previous studies have established a causal link between dopamine and WM gating through drug manipulations (Fallon et al., 2017, 2019). Applying a similar approach to an obese sample could help establish a clearer causal link between dopamine activity and WM gating in the context of obesity.”

      The introduction could benefit from a more elaborate description of the predicted effects: into which direction (better or worse updating) would the authors predict each effect to go and why? This is clearly explained for COMT, but not for e.g. DARPP-32.

      We thank the reviewer for their valuable feedback. We appreciate the suggestion to provide a more detailed description of the predicted effects for each genetic marker in the introduction. We would like to note, however, that the analyses involving markers such as DARPP-32 were inherently exploratory in nature. Consequently, we intentionally refrained from formulating directed hypotheses, as our primary aim was to observe and report any emergent patterns.

      Reviewer #2 (Recommendations for the Authors):              

      To what extent are the polymorphisms or amino acid ratios associated with BMI? For example, when including C957T polymorphism in the analysis, the detrimental effect of BMI on working memory is no longer statistically significant. Could this be due to a relatively strong relationship between C957T polymorphism and BMI? Could the authors provide figures showing how BMI relates to the genetic polymorphisms and amino acid ratio?

      We appreciate the reviewer's insightful comment and have thoroughly investigated the potential relationship between the polymorphism and BMI. Our analysis did not reveal any direct association between C957T and BMI. We have included this analysis in our manuscript. The reviewer’s comment strengthened the comprehensiveness of our study.

      “Because the main effect of BMI dissipated when including C957T in the model, we ran an additional exploratory analysis to check whether this polymorphism directly related to BMI. Linear regression, predicting BMI by genotype, showed no association between the two (p = 0.2432), indicating that BMI effect is probably not masked by the presence of the C957T polymorphism. See Table S8.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This valuable manuscript reports alterations in autophagy present in dopaminergic neurons differentiated from iPSCs in patients with WDR45 mutations. The authors identified compounds that improved the defects present in mutant cells by generating isogenic iPSC without the mutation and performing an automated drug screening. The methodological approaches are solid, but the claims still need to be completed: showing the effects of the identified compounds on iron-related alterations is crucial. The effects of these drugs in vivo would be a great addition to the study. 

      Thank you for this assessment. We agree that further hit validation would be a great addition to the study. At present, we provide this through RNAseq data but not at the protein level. Further validation using in vivo models would also be warranted but is beyond the scope of the current work.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      In the current study, Papandreou et al. developed an iPSC-based midbrain dopaminergic neuronal cell model of Beta-Propeller Protein-Associated Neurodegeneration (BPAN), which is caused by mutations in the WDR45 gene and is known to impair autophagy. They also noted defective autophagy and abnormal BPAN-related gene expression signatures. Further, they performed a drug screening and identified five cardiac glycosides. Treatment with these drugs effectively improved autophagy defects and restored gene expression.

      Strengths: 

      Seeing the autophagy defects and impaired expression of BPAN-related genes adds strength to this study. Importantly, this work shows the value of iPSC-based modeling in studying disease and finding therapeutic strategies for genetic disorders, including BPAN. 

      Weaknesses: 

      It is unclear whether these cells show iron metabolism defects and whether treatment with these drugs can ameliorate the iron metabolism phenotypes. 

      We are pleased to ascertain that the reviewer feels the work is an important step in the field for BPAN. We also absolutely agree that secondary hit validation assays showing cardiac glycoside efficacy in restoring patient-related in vitro phenotypes would be very valuable. 

      We set up assays to investigate iron metabolism phenotypes, including western blotting for Ferritin Heavy Chain 1, Transferrin and Ferroportin 1 (SLC40A1) at day 65 of differentiation, but found no significant difference when comparing patient lines to controls (data not shown).

      We also performed cell viability studies using the Alamar Blue assay on Day 11 ventral midbrain progenitors after 24-hour exposure to a) glucose starvation, b) media with no antioxidants (L-ascorbic acid and B-27 supplement), c) oxidative stressors MPP+ 1 mM and FeCl3 100 μM (MPP+ and FeCl3 as suggested by Seibler et al., Brain 2018, PMID: 30169597). We found no difference in cell viability between patients, age-matched controls and CRISPR lines (data not shown). Additionally, we examined lysosomal function in BPAN Day 11 progenitors (2 age-matched controls, 3 patient lines, 2 isogenic controls; again, using the autophagy flux treatments mentioned above) via LAMP1 high content imaging immunofluorescence. We have seen no difference in LAMP1 puncta production between patient lines and controls and, therefore, have not included this data in our revision.

      Overall, we agree with the reviewer that more validation of the compound hits’ ability to restore robust BPAN-related in vitro and in vivo phenotypes (including studies of iron metabolism/ homeostasis) will be needed in the future – this could be undertaken in more mature 2D culture systems, 3D organoid models and disease-relevant animal models.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors aim to demonstrate that cardiac glycosides restore autophagy flux in an iPSC-derived mDA neuronal model of WDR45 deficiency. They established a patient-derived induced pluripotent stem cell (iPSC)-based midbrain dopaminergic (mDA) neuronal model and performed a medium-throughput drug screen using high-content imaging-based IF analysis. Several compounds were identified to ameliorate disease-specific phenotypes in vitro.

      Strengths: 

      This manuscript engaged in an important topic and yielded some interesting data. 

      Weaknesses: 

      This manuscript failed to provide solid evidence to support the conclusion. 

      We are pleased that the reviewer assesses the work as conceptually important and interesting. We also agree that more work to understand the pathophysiology underpinning BPAN, and the mechanisms through which cardiac glycosides help restore affected intracellular pathways are warranted. More validation of the compound hits’ ability to restore broader disease-specific in vitro and in vivo phenotypes is also needed in future studies. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, this is a nicely executed study. Here are my suggestions:

      (1) Showing the iron phenotypes in these cells and testing if treatment with these drugs rescues iron-related phenotypes will add significant value to this work. 

      We absolutely agree that secondary hit validation assays showing glycoside efficacy in restoring disease-related in vitro phenotypes are warranted. The main issue here is identifying how WDR45 deficiency leads to cellular dysfunction or dyshomeostasis and early death. Unfortunately, the mechanism by which this happens is not yet delineated, and more relevant future work is needed.

      In our lab, we set up such assays. Regarding iron metabolism-related phenotypes, we performed western blotting for Ferritin Heavy Chain 1, Transferrin and Ferroportin 1 (SLC40A1) but found no significant difference when comparing patient lines to controls (data not shown). We also performed cell viability studies using the Alamar Blue assay on Day 11 ventral midbrain progenitors after 24-hour exposure to a) glucose starvation, b) media with no antioxidants (L-ascorbic acid and B-27 supplement), c) oxidative stressors MPP+ 1 mM and FeCl3 100 μM (MPP+ and FeCl3, as suggested by Seibler et al., Brain 2018, PMID: 30169597). We found no difference in cell viability between patients, age-matched controls and CRISPR lines (data not shown). Additionally, we examined lysosomal function in BPAN Day 11 progenitors (2 age-matched controls, 3 patient lines, 2 isogenic controls; again, using the autophagy flux treatments mentioned above) via LAMP1 high content imaging immunofluorescence. We have seen no difference in LAMP1 puncta production between patient lines and controls and, therefore, have not included this data in our revision.

      (2) Assessing the effects of these drugs in an in vivo model will strengthen this study. 

      This is a valid point, and we agree that further validation using in vivo models such as the reported BPAN mouse models, would be warranted in the future.

      Reviewer #2 (Recommendations For The Authors): 

      While this manuscript engaged in an important topic and yielded exciting data, there are still some concerns for the authors to address. 

      (1) The biggest concern is that the characterization of autophagic flux solely with LC3 is not convincing enough. Although ATG2A and ATG2B are required for phagophore formation during autophagy, their interaction with WDR45 seems dispensable for phagophore formation for a mild autophagy defect observed in WDR45 knockout cell models and mouse models. All wdr45/- mice are born normally and survive the postnatal starvation period, unlike mice lacking essential ATG proteins, like ATG5, ATG7, and VMP1. The functional relevance of WDR45 and autophagy remains to be fully established. Overall, this manuscript failed to provide solid evidence to support the conclusion. 

      This is a valid point. We have looked at autophagy flux in fibroblasts and at the Day 11 ventral midbrain stage. For fibroblasts, 1 control line and three patient lines were used; for Day 11 progenitors, 2 control lines, 2 patient lines and one isogenic control were used. Cells from different lines were cultured on the same 96-well plates, at the same plating density, and treated concurrently to minimise fluctuations in flux due to unaccounted factors, e.g., confluence, incubator temperature etc. Treatments consisted of a) DMSO (basal condition), b) Bafilomycin A1 (flux inhibition via autophagosome/ lysosome fusion blockage), c) Torin 1 (mTOR inhibitor, flux inducer) and d) combination of Bafilomycin A1 and Torin 1, for a total of 3 hours. In all these conditions, LC3 puncta production in BPAN lines was reduced when compared to controls. We believe that these results indicate defective autophagy flux in BPAN in different cell types.

      Moreover, we have demonstrated defects in autophagy-related gene (ATG) expression through RNA sequencing; this expression is restored after CRISPR/Cas9-mediated correction of the disease-causing mutation in a patient-derived line, and also after treatments with Torin 1 and digoxin. These results suggest a dysregulated ATG network in WDR45 deficiency.

      (2) WDR45 is linked to BPAN. Do the authors detect any iron accumulation in DA progenitors or mDA neurons? 

      Regarding iron metabolism-related phenotypes, we performed western blotting for Ferritin Heavy Chain 1, Transferrin and Ferroportin 1 (SLC40A1) but found no significant difference when comparing patient lines to controls (data not shown). We agree that more studies into the links between WDR45 deficiency, iron metabolism and neurodegeneration are needed. 

      (3) It is necessary to detect LC3 protein levels by western blot to distinguish LC3I and LC3II and gain a more accurate understanding for the process of LC3 - marked autophagosome. 

      Thank you for this valid point. 

      Due to the very dynamic nature of autophagy, and many factors influencing flux, we have not been able to meaningfully examine autophagy-related markers in an iPSC-derived system that is also inherently prone to variability. Therefore, LC3 and p62 values exhibited high variability, and hence we are unable to adequately interpret them (data not shown). Instead, in this manuscript we have focused on high-content assays with cells cultured and treated simultaneously at Day 11 of differentiation, which have shown autophagy flux defects.

      We have looked at autophagy flux in fibroblasts and at Day 11 ventral midbrain stage. For fibroblasts, 1 control line and three patient lines were used; for Day 11 progenitors, 2 control lines, 2 patient lines and one isogenic control were used. Cells from different lines were cultured on the same 96-well plates, at the same plating density, and treated concurrently to minimise fluctuations in flux due to unaccounted factors, e.g., confluence, incubator temperature etc. Treatments consisted of a) DMSO (basal condition), b) Bafilomycin A1 (flux inhibition via autophagosome/ lysosome fusion blockage), c) Torin 1 (mTOR inhibitor, flux inducer) and d) combination of Bafilomycin A1 and Torin 1, for a total of 3 hours. In all these conditions, LC3 puncta production in BPAN lines was reduced when compared to controls. We believe that these results indicate defective autophagy flux in BPAN in different cell types.

      (4)  Some methodological details need to be included - detailed descriptions of various quantifications for IF staining should be provided. For example, it is unclear how "% cells+ ve for marker combination" (Fig.1B) was quantified, and there are many unconventional units such as "% cells+ ve for marker combination "; please check and correct them. 

      Thank you for pointing this out. We have changed the legends in Figure 1B and Supplementary Figure 2C to ‘percentage of cells positive for marker combination’. Moreover, in our Methods section (Immunocytochemistry sub-section), we have updated the text as follows, to give more clarification on the process of marker quantification (Page 25, Paragraph 2): ‘For quantification, 4 random fields were imaged from each independent experiment. Subsequently, 1200 to 1800 randomly selected nuclei were quantified using ImageJ (National Institutes of Health). Manual counting for nuclear (DAPI) staining and co-staining with the marker of interest was performed, and percentages of cells expressing combinations of markers were calculated as needed.’
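
      The percentage calculation described in this Methods text can be sketched as follows. This is a minimal illustration that assumes counts are pooled across the imaged fields before the percentage is taken; the response does not state whether pooling or per-field averaging was used, and the function name is hypothetical.

```python
def percent_positive(field_counts):
    """Percentage of nuclei positive for a marker combination.

    field_counts: list of (positive_nuclei, total_nuclei) tuples, one per
    imaged field. Counts are pooled across fields (an assumption) so that
    fields with more nuclei contribute proportionally more.
    """
    positive = sum(p for p, _ in field_counts)
    total = sum(t for _, t in field_counts)
    if total == 0:
        raise ValueError("no nuclei were counted")
    return 100.0 * positive / total
```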

      (5) In Figure 3 and Figure 4, the quantifications for IF images were inconsistent with the shown IF image, for example, the representative IF image for detection of LC3 with Tor1 treatment. 

      Due to space restrictions, we have not included representative images from all patient lines, and every treatment condition depicted in the graphs. In Figure 3 (describing the set-up of the LC3 screening assay), only one control line and one patient line is shown in basal (DMSO-treated) conditions. In Supplementary Figure 4D, only one patient line and the corresponding isogenic control line are depicted after Torin 1 treatments.

      Quantification of the LC3 puncta in this assay (20 fields per well, each condition in a technical duplicate, n=8 biological replicates) was automated, using ImageJ and R Studio, with subsequent statistical significance calculation on GraphPad Prism. Hence, the immunofluorescence figures depict a reduction in LC3 puncta per nuclei numbers in patient-derived lines versus controls, but not the exact difference after automated image analysis. We have detailed this in the Methods section (High content imaging-based immunofluorescence subsection) of our manuscript (Page 26, Paragraph 2): ‘For all high content imaging-based experiments, the PerkinElmer Opera Phenix microscope was used for imaging. 20 fields were imaged per well, at 40 x magnification, Numerical Aperture 1.1, Binning 1. Image analysis was performed using ImageJ and R Studio.60 For the drug screen, puncta values were normalised according to positive and negative controls from each plate and Z-scores for each compound screened were generated. Statistical significances were calculated on GraphPad Prism V. 8.1.2. software (GraphPad Software, Inc.; https://www.graphpad.com/scientific-software/prism/).’
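
      The normalisation and Z-score steps named in this Methods text can be sketched as below. The exact formulas are not given in the response, so this sketch assumes a common convention: each value is rescaled so that the plate's mean negative control maps to 0 and its mean positive control maps to 1, and Z-scores are then computed across the screened compounds. All function names are illustrative.

```python
from statistics import mean, stdev


def normalise_to_controls(values, negative_controls, positive_controls):
    """Rescale puncta values relative to one plate's controls.

    Assumed convention: mean negative control -> 0, mean positive control -> 1.
    """
    neg = mean(negative_controls)
    pos = mean(positive_controls)
    span = pos - neg
    if span == 0:
        raise ValueError("positive and negative controls are indistinguishable")
    return [(v - neg) / span for v in values]


def z_scores(values):
    """Z-score of each value relative to all screened compounds.

    Uses the sample standard deviation (n - 1 denominator).
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("values have zero variance")
    return [(v - mu) / sd for v in values]
```

      Normalising per plate before pooling is what makes compounds from different plates comparable on a single Z-score scale.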

      (6) In Figure 4C, LC3 should be co-stained with the DA progenitor marker to indicate the intracellular LC3 level within the progenitors.

      Thank you for raising this point. The images from Figure 4C were obtained during the medium throughput drug screen, where the FOXA2 co-stain was not used. The FOXA2 stain was only used during the initial set-up of the LC3 screening assay, to confirm that the Day 11 cells had ventral midbrain identities. Indeed, most of the Day 11 cells used in the high content imaging-related experiments were FOXA2-positive, as shown in Figure 3 and Supplementary Figure 4.

      (7) Examining P62, one of the most important indicators for autophagic flux, should be parallel with LC3 detection. In Figure 5A, P62 accumulation seems not significant in patient 02 Day 11 ventral midbrain progenitors; how about that in Day 65?

      The reviewer is raising a valid point. We have not examined p62 and LC3 staining in parallel in high content imaging-based experiments but agree that this would be good to examine in future studies. 

      Some other minor points 

      (8) It needs to give a more detailed description of the tested compounds you mentioned in the text. 

      Thank you for this point. We have elaborated on the contents of the Prestwick library used for the screening, as below (Page 9, Paragraph 3): ‘We then utilised this high-content imaging LC3 assay to identify novel compounds of potential therapeutic interest for BPAN by screening the Prestwick Chemical Library containing 1,280 compounds, of which more than 95% FDA/ EMA approved.’

      In the Methods Section, Page 25, Paragraph 5, we also detail the library as follows: ‘For drug screening, the Prestwick Chemical Library (1,280 compounds, 95% FDA/ EMA approved, 10 mM in DMSO, https://www.prestwickchemical.com/screening-libraries/prestwick-chemical-library/) was used; cells were treated with compounds for 24 hours at 10 μM final concentration.’

      (9) Please pay attention to the abbreviation; many gene names only have abbreviations without full names when they first appear in the context. 

      Thank you for this point. We have corrected this in various places throughout the manuscript and especially in the introduction section.

      (10) Almost all figures have the problem of insufficient image resolution, or the font of the indicated words needs to be bigger to be distinguished clearly, like in Fig.1B, 1C, 1E. 

      Thank you for this point, we have ensured that all figures have adequate image resolution as specified by the journal requirements. 

      (11) The sample size or biological repeated times should be given in figure legends. 

      Thank you for this point. We have now indicated numbers of biological replicates where appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Below, I will list the points that should be addressed by the authors:

      (1) Line 139: The authors conclude that the lack of a phenotype induced by knockdown of Polr1F is due to reduced baseline sleep because of the leakiness of the GeneSwitch system. However, it is not clear why the argument of the SybGS being leaky should not apply to all experiments done with this tool. The authors should comment on that aspect. Furthermore, this claim is testable since it should be detectable against genetic controls. An alternative explanation to the proposed scenario is that the Polr1F sleep phenotype observed in the constitutive knockdown experiment is based on developmental defects. The authors should provide additional evidence to explain the discrepancy.

      We appreciate the reviewer’s insightful feedback. We assume the reviewer is referring to Regnase-1 RNAi (and not Polr1F) as Regnase-1 RNAi flies exhibit reduced sleep before dusk, potentially hindering further detection of sleep reduction. The leaky sleep reduction was based upon comparison with genetic controls in that experiment. Nevertheless, to discern whether our observations stem from developmental effects, we conducted adult-specific knockdowns of both Polr1F and Regnase-1 using the TARGET system. We generated the R35B12-Gal4:TubGal80ts line and crossed it with the UAS-Polr1FRNAi and UAS-Regnase-1RNAi lines. We confirmed that Polr1F RNAi promotes sleep when knocked down in adults (Figure 3 - supplemental figure 1). Conversely, Regnase-1 showed no effect on sleep in the adult stage, which is consistent with our nSyb-GS experiments, and suggests, as noted by the reviewer, that the Regnase-1 RNAi sleep effect is likely developmental (Figure 3 – supplemental figure 3).

      (2) Line 170: Regnase1 knockdown affects all memory types, including short-term and long-term memory. The authors conclude that these genes are involved in consolidation. However, besides consolidation, it has been shown that α′β′ KCs are involved in short-term appetitive memory retrieval. Thus, an equally possible explanation is that the knockdown impairs the neuronal function per se, which would lead to a defect in all behaviors related to α′β′ KCs, rather than a specific role for consolidation. The authors have to provide additional evidence to substantiate their claim.

      The exact role of Regnase-1 in the α′β′ KCs remains unclear. We acknowledge the reviewer’s concern and have amended our conclusion to include the alternative explanation suggested by the reviewer.

      (3) Line 87-88: For the protocol used, it was reported that GFPnls cannot be used for FACS sorting. The authors might want to comment/clarify that aspect. https://star-protocols.cell.com/protocols/1669.

      For our RNA-seq experiments, we conducted single cell isolation by FACS sorting cells, instead of nuclei, labeled with GFP.nls. The protocol mentioned that GFP.nls is not effective for single nuclear RNA-seq as it is not specific for nuclei, but for our cell sorting purposes that did not matter.

      (4) Line 131: The authors should report the concentration of RU486.

      Sorry, this is now in methods.

      (5) Line 155: Is that really 42 hours? This might be a typo. If not, it would be good to justify the prolonged re-starvation period.

      Flies fed after training form sleep-dependent memories but did not show robust long-term memory after 30 h of restarvation. As starvation is a requisite for appetitive memory retrieval (Krashes and Waddell 2008), the low memory scores after 30 h could be due to inadequate starvation. Therefore, we starved flies for 42 h, which is similar to the sleep-independent memory paradigm in which flies are starved for 18 h before training and then tested 24 h after training; this protocol resulted in robust long-term memory performance. These flies were fine and able to make choices in a T-maze after 42 h starvation.

      (6) I will be listing mistakes/unclear points in the figures. However, all figures should be checked very carefully for clarity.

      Thanks for these valuable comments. We have gone over the figures carefully and fixed any issues we found.

      (7) Figure 1C: It is not entirely clear to me how this heatmap was created and what the values mean.

      The 59 differentially expressed genes (DEGs) were selected based on DESeq2, as described in the methods. For the heatmap, transcripts per million (TPM) of these 59 DEGs were log-transformed and then scaled row-wise and plotted with IDEP v0.95 (http://bioinformatics.sdstate.edu/idep95/).
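
      The two transformations behind the heatmap values can be sketched as below. The log base and pseudocount are not stated in the response, so log2(TPM + 1) is an assumption, as is the use of a per-gene Z-score for the row-wise scaling; the function names are illustrative and this is not the IDEP implementation.

```python
import math
from statistics import mean, stdev


def log_transform(tpm_rows, pseudocount=1.0):
    """Apply log2(TPM + pseudocount) to every value (assumed transform).

    tpm_rows: one list of TPM values per gene, one value per sample.
    """
    return [[math.log2(v + pseudocount) for v in row] for row in tpm_rows]


def scale_rows(rows):
    """Z-score each gene (row) across samples.

    After scaling, each row has mean 0 and sample standard deviation 1, so
    heatmap colours show relative expression within a gene, not absolute
    abundance. Rows with no variation are set to 0.
    """
    scaled = []
    for row in rows:
        mu = mean(row)
        sd = stdev(row)
        if sd == 0:
            scaled.append([0.0] * len(row))
        else:
            scaled.append([(v - mu) / sd for v in row])
    return scaled
```

      Under these assumptions, a heatmap value of +1 means that a gene's log expression in that sample lies one standard deviation above the gene's own mean across samples.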

      (8) Figures 2A and 2B: The units might be missing. For Supplementary Figure 2, it is not clear what the different groups are without looking at the main figure.

      Fixed.

      (9) Figure 3: The panel arrangement is confusing. Furthermore, the "B)" is cut. The same issue is present in the Supplementary Figure.

      Sorry! We rearranged the panels, and fixed the issue in both figures.

      (10) Figure 5B: It is not clear what the scale bar means.

      Now indicated

      (11) Line 119: The citation "Marygold et al n.d."?

      Fixed

      (12) Line 620: I'm not sure that the rate and localization of nascent peptide synthesis are measured.

      Great point. We used the puromycin assay to estimate significant changes in translation. However, we did not measure the absolute translational rate or the localization of newly synthesized proteins. We rephrased this in the updated manuscript.

      (13) Line 627: The authors should give the NA of the objective. Further, the authors should double-check the information they provide on the resolution.

      Fixed, it was 20X.

      (14) Line 629 "Fuji" is unclear, it might refer to the Fiji software, and in that case, it should be listed in the used software. Further, the authors have to check on the information they provide on the intensity, e.g. is that GFP fluorescence?

      Yes, it was Fiji and GFP. The manuscript has been updated accordingly.

      (15) Line 634: It is stated that two concentrations of CX-5461 are used; however, as far as I can see, only data for 0.2 mM are shown.

      We apologize for the confusion. Data are indeed only shown for 0.2 mM. We also tested 0.4 mM and 0.6 mM under fed conditions once and 0.1 mM under starved conditions twice. Since all effects were not significant, we only presented the complete 0.2 mM results in the supplementary figure.

      (16) Line 352 "Marygold et al nd" is probably a glitch in the citation?

      It’s a citation tool issue and has been fixed.

      (17) The authors use apostrophe rather than a prime in describing the α "prime" β "prime" KCs

      We have corrected this.

      Reviewer #2 (Recommendations For The Authors):

      The authors have generated an interesting study that promises to advance the understanding of how context-dependent changes in sleep and memory are executed at the molecular level. The manuscript is well-written and the statistical analyses appear robust. Major and minor comments are detailed below.

      Overall, I would suggest that the authors try to obtain additional evidence that Polr1F modulates sleep and test the effect of acute adult-stage knockdown of Polr1F and Regnase-1 specifically in ap α'β' MBNs rather than pan-neuronally.

      Major comments

      (1) In Figures 2 and 3 and associated supplemental figures, the authors first test for a role for Polr1F and Regnase-1 specifically in ap α'β' MBNs (Fig. 2), then test for an acute role for these proteins via pan-neuronal drug inducible expression (Fig. 3). Because the former manipulation is cell-specific and the latter is pan-neuronal, it is hard for the reader to draw conclusions pertaining to ap α'β' MBNs from the second dataset. Perhaps Regnase-1 indeed acutely regulates sleep in ap α'β' MBNs, but that effect is masked by counteracting roles in other neurons? Conversely, it remains possible that Polr1F and Regnase-1 act during development in ap α'β' MBNs to modulate sleep. Indeed, since silencing the output of ap α'β' MBNs using temperature-sensitive shibire does not alter baseline sleep (Chouhan et al., (2021) Nature), the notion that Regnase-1 could act acutely in ap α'β' MBNs to reduce baseline sleep is somewhat surprising.

      The authors could address this by using a method such as TARGET (temperature-sensitive GAL80) to acutely reduce Polr1F and Regnase-1 expression specifically in ap α'β' MBNs and test how this impacts sleep.

      Thanks for the very helpful suggestions. We have done the suggested experiments and discuss them above in our response to Reviewer 1. They are included in the manuscript as Figure 3 – supplemental figure 1 and Figure 3 – supplemental figure 3.

      (2) Figure 4 presents data examining whether Polr1F and Regnase-1 knockdown suppresses training-induced increases in sleep. For the untrained flies, based on the data in Fig. 2C, E I expected that Polr1F knockdown flies would exhibit more sleep than their respective controls (Fig. 4E), but this was not the case. These data suggest that more evidence may be warranted to strengthen the link between Polr1F (and potentially Regnase-1) knockdown and sleep. Could the authors use independent RNAi constructs or cell-specific CRISPR (all available from current stock centres) to validate their current results? Related to this, it would be useful to know whether the authors outcrossed any of their transgenic reagents into a defined genetic background.

      The untrained flies in figure 4E are not equivalent to flies tested for Polr1F effects on sleep in figure 2C. In Figure 4E, flies were starved for 18 h and then exposed to sucrose without an odor at ZT6. Following sucrose exposure, flies were moved to sucrose locomotor tubes, and sleep was assessed only in the ZT8-12 interval. Sleep was not significantly different between untrained R35B12>Polr1FRNAi and Polr1FRNAi/+ flies, and while it was higher in R35B12>Polr1FRNAi than in R35B12/+ untrained flies, the data overall indicate that Polr1F downregulation has no impact on sleep under these conditions and at this time. Similarly, in fully satiated settings (Figure 2C), we found no difference in sleep during the ZT8-12 period between R35B12>Polr1FRNAi flies and genetic controls. We did not outcross our transgenic lines but have now tested another available Polr1F RNAi (VDRC: v103392) (Figure 3 – supplemental figure 1). As shown in the figure, adult-specific knockdown of Polr1F by this RNAi line promoted sleep, as did the initial RNAi line.

      (3) Could the authors provide additional evidence that Polr1F knockdown in ap α'β' MBNs does not enhance sleep by reducing movement? A separate assay such as climbing would be beneficial. Alternatively, examining peak activity levels at dawn/dusk from the 12L: 12D DAM data.

      We checked the peak activity per minute per day for adult-specific knockdown of Polr1F and Regnase-1 (data shown in Figure 3 – supplemental figure 4). The results show that Polr1F knockdown in ap α'β' MBNs does not enhance sleep by reducing movement.

      (4) In terms of validating their proposed model, over-expressing of Polr1F during appetitive training might be predicted to suppress training-induced sleep increases and potentially long-term memory. Do the authors have any evidence for this?

      We were unable to find any Polr1F overexpression line. However, we obtained the Regnase-1 overexpression (OE) line from Dr. Ryuya Fukunaga’s lab and found that Regnase-1 OE does not affect sleep (Figure 4 – supplemental figure 1).

      Minor comments

      (1) Abstract: can the authors please define 'ap' as anterior posterior?

      Fixed.

      (2) Figure 2 Supplemental 1: can the authors please denote the genotypes each color refers to in?

      Fixed.

      (3) In Figure 3 Supplemental 1, the authors state that acute Regnase-1 knockdown did not reduce sleep, but sleep during the night period does appear to be reduced (panel A). Was this quantified?

      We quantified this, and it was not significant.

      (4) Discussion, line 234: the heading of this section is 'Polr1F regulates ribosome RNA synthesis and memory' but the data presented in Figure 4 suggests that Polr1F does not affect memory. Can the authors clarify this?

      We made an adjustment to the title and acknowledge that at the present time we cannot say Polr1F affects memory.

      (5) Methods, Key Resource Table: can the authors please identify which fly lines were used for Polr1F and Regnase-1 knockdown experiments?

      Fixed. Fly line BDSC64553 was used for Polr1F RNAi except in Figure 3 – supplemental figure 1 and 4, where VDRC 103392 was used. VDRC 27330 was used for Regnase-1 knockdown experiments.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 1B: This plot is currently labelled as PCA of DEGs, which I believe is inaccurate, as such a plot is a quality control that examines the overall clustering of samples by using all read counts (not just the DEGs). In addition, the color key value of this Figure 1B is not provided.

      Thank you for the insightful suggestion. The reviewer’s comment here that typically PCA plots are used for overall clustering of RNA-seq samples is indeed valid. We've acknowledged that our samples, due to their high similarity in cell populations and mild treatments, do not exhibit clear separation when we use all genes. However, we show a PathwayPCA plot of all DEGs. We aim to highlight that RNA processing pathways enriched among the DEGs account for much of the separation of the groups.

      (2) A reviewer token is not provided to examine the sequencing data set.

      The RNA-seq data has been submitted to the Sequence Read Archive (SRA) with NCBI BioProject accession number PRJNA1132369. The reviewer token is https://dataview.ncbi.nlm.nih.gov/object/PRJNA1132369?reviewer=cvqkddp8rjuebsjefk0f19556r.

      (3) In the discussion, the author pointed out that many of the 59 DEGs have implicated functions in RNA processing. To strengthen the statement, it would be beneficial to conduct the Gene Ontology analysis to test whether the DEGs are enriched for RNA processing-related GO terms.

      We have included the GO analysis results in Figure 1 and another GO analysis of all DEGs in Figure 1 – supplemental figure 1.

      (4) Figure 4E presents an intriguing finding because it shows that the untrained R35B12>Polr1FRNAi flies exhibit reduced sleep (instead of increased sleep) when compared to untrained Polr1/+ control flies.

      Please see our response above to Reviewer #2, major comment 2.

      (5) For the memory assay method, the identity of odor A and odor B is not provided.

      We used 4-methylcyclohexanol and 3-octanol; this information has been added into the methods section.

      (6) Female flies were used for the sleep assay. However, it is not clear whether only female flies were used for the memory assay.

      Mixed sexes were used for memory assays because a large number of flies is needed for these experiments. We added this information to the methods.

      (7) It is important to provide olfactory acuity data on control and experimental animals to rule out that the learning/memory phenotype is caused by defects in sensing the odor used for training and testing.

      Since Polr1F RNAi flies perform well, odor acuity is not an issue. Regnase1RNAi affects both short-term and long-term memories, but this seems to be a developmental issue, so we did not do the odor acuity experiments here.

      (8) Line 20: "ap alpha'/beta'" neurons should be spelled as "anterior posterior (ap) alpha'/beta' neurons", as this is the first time that this anatomical name appears in this manuscript.

      Fixed.

      (9) Figure 2C and 2D labelling: R35B12>control; UAS control should be changed to R35B12/+ control; UAS-RNAi/+ control.

      Fixed.

      (10) Line 155: it is unclear why the flies were re-starved for 42hr before testing. Is this a different protocol from the 30hr re-starvation that was used by Chouhan et al., 2021?

      We have explained the rationale above. The starvation period was increased to get better memory scores.

      (11) Line 160: it is stated that knocking down Polr1F did not affect memory, which is consistent with Polr1f levels typically decreasing during memory consolidation. Is there a reference demonstrating that Polr1f levels typically decrease during memory consolidation?

      This comes from our RNA-seq dataset in Figure 1C. The level of Polr1F decreased in fed trained flies compared with the other control flies.

      (12) Genotype labeling in Figure 4F is inconsistent with the rest of the manuscript.

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a very nice study of Belidae weevils using anchored phylogenomics that presents a new backbone for the family and explores, despite a limited taxon sampling, several evolutionary aspects of the group. The phylogeny is useful to understand the relationships between major lineages in this group and preliminary estimation of ancestral traits reveals interesting patterns linked to host-plant diet and geographic range evolution. I find that the methodology is appropriate, and all analytical steps are well presented. The paper is well-written and presents interesting aspects of Belidae systematics and evolution. The major weakness of the study is the very limited taxon sampling which has deep implications for the discussion of ancestral estimations.

      Thank you for these comments.

      The taxon sampling only appears limited if counting the number of species. However, 70 % of belid species diversity belongs to just two genera. Moreover, patterns of host plant and host organ usage and distribution are highly conserved within genera and even tribes. Therefore, generic-level sampling is a reasonable measure of completeness. Although 60 % of the generic diversity was sampled in our study, we acknowledge that our discussion of ancestral estimations would be stronger if at least one genus of Afrocorynina and the South American genus of Pachyurini could be included.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a combination of anchored hybrid enrichment and Sanger sequencing to construct a phylogenomic data set for the weevil family Belidae. Using evidence from fossils and previous studies they can estimate a phylogenetic tree with a range of dates for each node - a time tree. They use this to reconstruct the history of the belids' geographic distributions and associations with their host plants. They infer that the belids' association with conifers pre-dates the rise of the angiosperms. They offer an interpretation of belid history in terms of the breakup of Gondwanaland but acknowledge that they cannot rule out alternative interpretations that invoke dispersal.

      Strengths:

      The strength of any molecular-phylogenetic study hinges on four things: the extent of the sampling of taxa; the extent of the sampling of loci (DNA sequences) per genome; the quality of the analysis; and - most subjectively - the importance and interest of the evolutionary questions the study allows the authors to address. The first two of these, sampling of taxa and loci, impose a tradeoff: with finite resources, do you add more taxa or more loci? The authors follow a reasonable compromise here, obtaining a solid anchored-enrichment phylogenomic data set (423 genes, >97 kbp) for 33 taxa, but also doing additional analyses that included 13 additional taxa from which only Sanger sequencing data from 4 genes were available. The taxon sampling was pretty solid, including all 7 tribes and a majority of genera in the group. The analyses also seemed to be solid - exemplary, even, given the data available.

      This leaves the subjective question of how interesting the results are. The very scale of the task that faces systematists in general, and beetle systematists in particular, presents a daunting challenge to the reader's attention: there are so many taxa, and even a sophisticated reader may never have heard of any of them. Thus it's often the case that such studies are ignored by virtually everyone outside a tiny cadre of fellow specialists. The authors of the present study make an unusually strong case for the broader interest and importance of their investigation and its focal taxon, the belid weevils.

      The belids are of special interest because - in a world churning with change and upheaval, geologically and evolutionarily - relatively little seems to have been going on with them, at least with some of them, for the last hundred million years or so. The authors make a good case that the Araucaria-feeding belid lineages found in present-day Australasia and South America have been feeding on Araucaria continuously since the days when it was a dominant tree taxon nearly worldwide before it was largely replaced by angiosperms. Thus these lineages plausibly offer a modern glimpse of an ancient ecological community.

      Weaknesses:

      I didn't find the biogeographical analysis particularly compelling. The promise of vicariance biogeography for understanding Gondwanan taxa seems to have peaked about 3 or 4 decades ago, and since then almost every classic case has been falsified by improved phylogenetic and fossil evidence. I was hopeful, early in my reading of this article, that it would be a counterexample, showing that yes, vicariance really does explain the history of *something*. But the authors don't make a particularly strong claim for their preferred minimum-dispersal scenario; also they don't deal with the fact that the range of Araucaria was vastly greater in the past and included places like North America. Were there belids in what is now Arizona's petrified forest? It seems likely. Ignoring all of that is methodologically reasonable but doesn't yield anything particularly persuasive.

      Thank you for these comments.

      The criticism that the biogeographical analysis is “not very compelling” is true to a degree, but it is only a small part of the discussion and, as stated by the reviewer, cannot be made more “persuasive”, in part because of limitations in taxon sampling but also because of uncertainties of host associations (e.g. with ferns). We tried to draw persuasive conclusions while not being too speculative at the same time. Elaborating on our short section here would only make it much more speculative — and dispersal scenarios more so than vicariance ones (at least in Belinae).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments relative to this last point of a more general nature:

      - I think it would be informative in Figure 1 to present family names for the outgroups.

      Family names for outgroups have been added to Figure 1.

      - There is a summary of matrix composition in the results but I think a table would be better listing all necessary information for each dataset (number of taxa, number of taxa with only Sanger data, parsimony informative sites, GC content, missing data, etc...).

      We added Table S4 with detailed information about the matrices.

      - Perhaps I missed it, but I didn't find how fossil calibrations were implemented in BEAST (which prior distribution was chosen and with which parameters).

      We used uniform priors, this has been added to the Methods section.

      - I am worried that the taxon sampling (ca. 10% of the family) is too low to conduct meaningful ancestral estimations, without mentioning the moderately supported relationships among genera and large time credibility intervals. This should be better acknowledged in the paper and perhaps should weigh more into the discussion.

      Belidae in general are a rare group of weevils, and it has been a huge effort and a global collaboration to sample all tribes and over 60 % of the generic diversity in the present study. A high degree of conservation of host plant associations, host plant organ usage and distribution are observed within genera and even tribes. Therefore, we feel strongly that the resulting ancestral states are meaningful.

      Moreover, 70 % of the belid species diversity belongs to only two genera, Rhinotia and Proterhinus. Our species sampling is about 36 % if we disregard the 255 species of these two genera.

      We acknowledge that our results could be improved by sampling more genera of Afrocorynina and Pachyurini, but these taxa are very hard to collect. We have acknowledged the limitations of our taxon sampling, branch supports and timetree credibility intervals in the discussion to minimize speculation in our conclusions.

      - It might be nice to have a more detailed discussion of flanking regions. In my experience and from the literature there seems to be increasing concern about the use of these regions in phylogenomic inferences for multiple solid reasons especially the more you go back in time (complex homology assessment, overall gappyness, difficulty to partition the data, etc...)

      We tested the impact of flanking regions on the results of our analyses and showed that these data did not have a detrimental impact. We added more details about this to the results section of the paper, including information about the cutoffs we used to trim the flanking regions.

      Reviewer #2 (Recommendations For The Authors):

      Line 42, change "recent temporal origins" to "recent origins".

      Modified in the text.

      Line 97-98, "phylogenetic hypotheses have been proposed for all genera" This is ambiguous. The syntax makes it sound like these were separate hypotheses for each genus - the relationships of the species within them, maybe. However, the context implies that the hypotheses relate to the relationships between the genera. Clarify. "A phylogenetic hypothesis is available for generic relationships in each subfamily. . . " or something.

      Modified in the text.

      Line 162, ". . . all three subtribes (Agnesiotinidi, Belini. . . " Something's wrong here. Change "subtribes" to "tribes"?

      Modified in the text.

      Line 219, the comma after "unequivocally" needs to be a semicolon.

      Modified in the text.

      Line 327 and elsewhere, the abbreviation "AHE" is used but never spelled out; spell out what it stands for at first use. Or why not spell it out every single time? You hardly ever use it and scientists' habit of using lots of obscure abbreviations is a bad one that's worth resisting, especially now that it no longer requires extra ink and paper to spell things out.

      Modified in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Minor

      (MN1) The segregants should be referred to as F2 segregants as they are derived from an F1 cross.

      We thank the reviewer for pointing out this important oversight. We indeed analyzed segregants of an F1 cross and have corrected this in the text.

      (MN2) The connections to eQTLs in other organisms should be addressed in the introduction and conclusion. For example, in humans, there has been little evidence for trans eQTLs in contrast to what has been found in yeast.

      We thank the reviewer for pointing this out and improved our introduction and conclusion with such connections.

      (M3) The authors state that an advantage of scRNAseq over bulk is that it captures rare cell populations (line 79), but this advantage is not exploited in this study.

      While we did not explicitly demonstrate the effect of using scRNA-seq on capturing variation in rare cell populations, the referenced literature (21, 40) provides evidence that pooled scRNA-seq captures important expression heterogeneity (which implicitly contains potentially rare expression states). In our study, this is leveraged on F2 segregants to assess expression variation within the same lineage (genotype). This impacts the partitioning of expression variance from genotype.

      Thus, we mentioned this point to further support the choice of using scRNA-seq for this analysis and showed that even a few single cells enable the reconstruction of the genome and expression profile of rare cell types.

      (MN4) The authors use ~5% of the lineages from the original study. There is no rationale for why this is an appropriate sample size. Is there an argument for using more cells in eQTL mapping or conversely could the authors ask if fewer cells would provide similar conclusions by downsampling?

      Although scRNA-seq is highly scalable, it has limitations in terms of throughput. Indeed, a single library with 10x Genomics generates data on the order of 10^4 well-covered cells. With these limitations, our choice of ~5% of the lineages of the original study stems from the need to recover the same lineage multiple times within these 10^4 cells (in our study, each lineage is recovered on average 4 times).

      While it is possible to run multiple libraries and sequencing lanes, budget limitations prevent us from running more libraries, especially since we expect power to scale with the square root of the number of lineages (there are diminishing returns).
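      To make the diminishing-returns argument concrete, here is a small sketch that assumes, as stated above, that the quantity of interest grows in proportion to the square root of the number of lineages. The function names and the library counts are hypothetical and are not taken from the study.

```python
import math

def relative_gain(n_base, n_new):
    """Fold-change in a sqrt(n)-scaling quantity when going from n_base to n_new lineages."""
    return math.sqrt(n_new / n_base)

def marginal_gain(n_libraries):
    """Extra gain (in units of a single library) from adding one more library."""
    return math.sqrt(n_libraries + 1) - math.sqrt(n_libraries)

# Quadrupling the number of lineages only doubles the sqrt(n)-scaling quantity.
fold = relative_gain(1, 4)

# Each additional library contributes less than the previous one.
gains = [marginal_gain(k) for k in range(1, 5)]
```

      Under this assumption, the cost of each library stays constant while its contribution shrinks, which is the trade-off described in the response.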

      (MN5) I do not agree that the use of UMIs overcomes the challenges of low sequencing depth. UMIs mitigate the possible technical artifacts due to massive PCR amplification.

      We thank the reviewer for this comment and will clarify this in the manuscript. Indeed, we intended to refer to the breadth of coverage (instead of the depth), which would usually manifest with massive PCR amplification of few transcripts.

      (MN6) There is an inadequate reference to prior work on scRNAseq in yeast that established the methods used by the authors and eQTL mapping in human cells using scRNAseq.

      We thank the reviewer for this and have added more context on benchmarks of scRNA-seq methods in yeast (Drop-seq, etc.) and on single-cell eQTL mapping in humans. Additionally, we have cited Jariani et al. (2020) in eLife, where similar techniques were employed for scRNA-seq in yeast.

      (MN7) The use of empty quotes in Figure 4A is confusing and an alternative presentation method should be used.

      We will remove these empty quote characters and replace them with a more meaningful label such as “none”.

      (MN8) The authors speculate about the use of predicted fitness instead of observed fitness, but this is something they could explicitly address in their current study.

      We thank the reviewer for this comment but have decided not to perform a whole new bulk-segregant analysis experiment (X-QTL) to identify QTL that way. However, we do agree that we could in principle use the QTL that were identified in our previous study (Nguyen Ba et al, 2022). Despite this, we do not see the need for this because the predicted fitness is the overlap between genotype and phenotype (within the variance partitioning framework, it is the ‘narrow-sense heritability’ if one ignores epistasis). Thus, the use of predicted fitness when partitioning for expression variation would be constrained to that overlap (as opposed to the real observed fitness). This means that within the variance partitioning framework, the overlap of genotype, expression, and fitness is fully recapitulated by using predicted fitness instead (given that this predicted fitness is accurate to the narrow-sense heritability). In our previous study, we found that the QTL essentially predict all of the narrow-sense heritability. We believe it is therefore evident that the use of predicted fitness would be sufficient if and only if the expression variation independent of genotype is not associated with observed fitness.

      We note that our study cannot generalize whether the overlap between genotype and expression fully captures fitness variation explained by expression. Indeed, we believe this is not generalizable to many other contexts (for example, in development). Thus, at present, the use of predicted fitness remains a speculation.

      Major:

      (MJ1) There is insufficient information provided about the nature of data. At a minimum, the following information should be provided to enable assessment of the study: What is the total library size, how many genes are identified per cell, how many UMIs are found per cell, what is the doublet rate, and how are doublets identified (e.g. on the basis of heterozygous calls at polymorphic loci?), how many times is each genotype observed, and how many polymorphic sites are identified per cell that are the basis of genotype inferences?

      We understand that these metrics help the reader gauge the power of our approach, and we have integrated them into the manuscript as Table 1.

      (MJ2) The prior study analyzed 18 different conditions, whereas this study only assays expression in a single condition. However, the power of the authors' approach is that its efficiency enables testing eQTLs in multiple conditions. The study would be greatly strengthened through analysis of at least one more condition, and ideally several more conditions. The previous fitness study would be a useful guide for choosing additional conditions as identifying those conditions that result in the greatest contrasts in fitness QTL would be best suited to testing the generalizations that can be drawn from the study.

      We agree that a major strength of our approach is that it rapidly allows eQTL mapping in several conditions. While the experiments presented here are likely less expensive than the classical eQTL mapping experiments, the cost of 10x genomics and sequencing is still an important consideration. The pleiotropy analysis of the prior study was substantially difficult to interpret and put in context, and thus we decided to focus on a proof of concept and leave room for a more thorough analysis of multiple environments for a future study. We acknowledge that this is a main weakness of our manuscript.

      (MJ3) Alternatively, the authors could demonstrate the power of their approach by applying it to a cross between two other yeast strains. As the cross between BY and RM has been exhaustively studied, applying this approach to a different cross would increase the likelihood of making novel biological discoveries.

      We thank the reviewer for this suggestion, and it is indeed something that our lab is considering. Currently, one of the main points of our manuscript still relies on growth measurements of segregants (the fitness), which we cannot obtain from segregants and scRNA-seq alone.

      Unfortunately, in this experimental design, it is difficult to obtain the fitness of cells and the genotype simultaneously because the barcode of the segregant is not expressed and not frequently read during genotyping. Thus, we still need to perform a whole QTL panel for a new cross without substantial re-engineering. 

      That being said, we are working on this but feel that including a new panel in this study is beyond the scope of our manuscript. 

      (MJ4) Figure 1 is misleading as A presents the original study from 2022 without important details such as how genotypes were identified. It is unclear what the barcode is in this study and how it is used in the analysis. Is the barcode for each lineage transcribed so that it is identified in the scRNA-seq data? Or, does the barcode in B refer to the cell index barcode? A clearer presentation and explanation of terms are needed to understand the method.

      Because F2 segregant lineage barcodes are not expressed, the barcode indicated in Figure 1B refers to cell barcodes from 10x Genomics. Our present study does not make use of the lineage barcode. We have revised the figure to state that panel A refers to the original study from 2022 and to explicitly mention ‘cell barcodes’.

      (MJ5) The rationale for the analysis reported in Figure 2B is unclear. The fitness data are from the previous study and the goal is to estimate the heritability using the genotyping data from the scRNA-Seq data. What is the explanation for why the data don't agree for only one condition, i.e. 37C? And, what are we to understand from the overall result?

      The rationale of Figure 2A/B is to show that cell lineage genotyping with scRNA-seq yields results consistent with previous genotype-phenotype analyses of the same cross. While Figure 2A shows that the single-cell imputed genotypes resemble the reference panel (sequenced in the Nguyen Ba 2022 study), Figure 2B shows that the variance partitioning to associate genotype to phenotype can be performed using the single-cell genotypes themselves (bypassing the reference panel). We believe this is an interesting result given that the reads obtained by scRNA-seq are constrained to a subset of SNPs. However, we note that if the imputed single-cell genotypes perfectly matched the reference panel, it would not be surprising that one could do genotype-phenotype mapping from the single-cell genotypes.

      In Figure 2B, we tested whether the similarity of the single-cell imputed genotypes to the reference panel was enough to estimate heritabilities (another summary statistic). 

      In the remaining paragraphs of that result section, we further discuss that the single-cell lineage genotypes can be used for QTL mapping as well, recapitulating many of the QTL identified in the reference panel (provided that one controls for power). This result did not make it as a main Figure but is included in Figure S4.

      That being said, we decided to update the figure by comparing the estimates in subsamples of batch1 scRNA-seq to subsamples of batch 1 reference panel and subsamples of the full reference panel. Subsamples were performed to control for power in the variance partitioning. We also noticed that the fitness of several F2 segregants is missing for the phenotypes 33C, 35C and 37C in the original study so we decided to exclude these environments.

      (MJ6) Figure 3 presents an analysis of variance partitioning as a Venn diagram. This summarized result is very hard to understand in the absence of any examples of what the underlying raw data look like. For example, what does trait variation look like if only genotype explains the variance or if only gene expression explains the variance? The presented highly summarized data is not intuitive and its presentation is poor - the result that is currently provided would be easier to read in a table format, but the reader needs more information to be able to interpret and understand the result.

      The Venn diagram is widely used in the context of variance partitioning (see Cohen, Jacob, and Patricia Cohen. 1975. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences.), but we realize that it has not often been used to display heritability estimates. To this end, we have added explanatory labels for the biological meaning of the areas or components of the diagram in the Figure and in the text.
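
      As an illustration of how the areas of such a diagram can be read, the following is a minimal sketch of a two-set variance partition in the style of Cohen and Cohen. It assumes ordinary least-squares R² as the measure of explained variance, which is a simplification and not necessarily the heritability model used in the manuscript. The names `r_squared`, `partition_variance`, `G` (genotype matrix), `E` (expression matrix) and `y` (fitness) are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Fraction of the variance of y explained by an OLS fit on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def partition_variance(G, E, y):
    """Split the explained variance of y into the three areas of a two-set Venn diagram.

    G and E are (samples x features) arrays; y is a vector of length samples.
    Illustrative only: assumes OLS, not a mixed-model heritability estimate.
    """
    r2_g = r_squared(G, y)                          # genotype alone
    r2_e = r_squared(E, y)                          # expression alone
    r2_ge = r_squared(np.column_stack([G, E]), y)   # both together
    return {
        "genotype_only": r2_ge - r2_e,    # explained by genotype but not by expression
        "expression_only": r2_ge - r2_g,  # explained by expression but not by genotype
        "shared": r2_g + r2_e - r2_ge,    # overlap of the two circles
        "total": r2_ge,
    }
```

      By construction the three areas sum to the total explained variance. The shared area can be negative when one predictor set acts as a suppressor for the other, which is one reason a labeled diagram or a table may be easier to interpret than the raw numbers.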

      (MJ7) I am concerned about the conclusions that can be drawn about expression heritability. The authors claim that expression heritability is correlated with expression levels. It seems likely that this reflects differing statistical power. How can this possibility be excluded?

      We thank the reviewer for highlighting this. We now explicitly acknowledge this potential confounding factor in the manuscript.

      (MJ8) Conversely, the authors claim that the genes with the lowest heritability are genes involved in the cell cycle. However, uniquely in scRNA-seq, cell cycle regulated genes appear to have the highest variance in the data as they are only expressed in a subset of cells. Without incorporating this fact one would erroneously conclude that the variation is not heritable. To test the heritability of cell cycle regulation genes the authors should partition the cells into each cell cycle stage based on expression.

      The reviewer is right to say that the low heritability of cell cycle control genes could be explained by the fact that these genes are only expressed in a subset of the dataset. Indeed, a high transcriptomic variance does not necessarily imply a low expression heritability: the cell cycle could be the residual of the expression heritability model, i.e. it explains expression variance with low association to genetic mutation.

      That being said, our result is consistent with results obtained from yeast bulk RNA-seq (Albert et al. 2018), in which cell cycle is averaged out. 

      In our study, we also average out the cell cycle, as we use the consensus expression and the consensus genome to estimate the heritability.

      (MJ9) I do not understand Figure S5 and how eQTL sites are assigned to these specific classes given that the authors say that causative variation cannot be resolved because of linkage disequilibrium.

      The rationale for Figure S5 is to show that the QTL model obtained from single-cell data is consistent with the reference panel QTL mapping experiment. Although there is uncertainty around the exact position of the QTL, we relied on the loci with the highest likelihood and showed that the datasets have consistent features. This is enabled by the fact that the QTL identified using the scRNA-seq genotypes are the ones with the largest effect sizes in the reference panel, and are thus more likely to be mapped accurately.

      (MJ10) The paragraph starting at line 305 is very confusing. In particular, the authors state that they identify a hotspot of regulation at the mating type locus. It is not obvious why this would be the case. Moreover, they claim that they find evidence for both MATa and MATalpha gene expression. Information is not provided about how segregants were isolated, but assuming that the authors did not dissect 25,000 tetrads to obtain 100,000 segregants I would infer that random spore using SGA was used. In that case, all cells should be MATa. The authors should clarify and explain this observation.

      Although most of the cells have the MATa mating type (as selected by random spore using SGA), it is well known, and discussed in the Nguyen Ba et al. paper, that there are a few lineages with other mating types or diploids (they are leakers in the selection process).

      Indeed, we verified that we can detect a small number of MATalpha cells or diploids within this pool.

      (MJ11) Ultimately, it is not clear what new biological findings the authors have made. There are no novel findings with respect to causative variation underlying eQTLs and I would encourage the authors to make clearer statements in their abstract, introduction, and conclusion about the key discoveries. E.g. What are the "new associations between phenotypic and transcriptomic variations" mentioned in the abstract?

      This paper focuses more on the proof of concept that scRNA-seq can help integrate expression data in GPM analysis to reveal broad-scale associations between fitness and expression. Novel findings include new hotspots of expression regulation in the RM/BY genetic background and the observation that trans-regulation of expression has more impact on fitness than cis-regulation. We also evaluate the strength of the association between the genome, the transcriptome and fitness (in one environment). Additionally, the analysis reveals biological questions that cannot be answered even by increasing the experimental scale of eQTL mapping experiments. For example, we find that most of the missing heritability is not explained by expression. These key points will be clarified in the abstract, introduction and conclusion as suggested by the editors.

      Reviewer #2:

      (MJ1) Most of the figures center on methods development and validation for the authors' single-cell RNA-seq in the yeast cross […] One potential novelty of the study is the methods per se: that is, showing that scRNA-seq works for concomitant genotyping and gene expression profiling in the natural variation context. The authors' rigor and effort notwithstanding: in my view, this can be described as modest in terms of principles. That is, the authors did a good job putting the scRNA-seq idea into practice, but their success is perhaps not surprising or highly relevant for work outside of yeast (as the discussion says).

      Although the scope of the method is limited, we think that it can apply to any large-scale dataset in which transcription variance and genetic diversity are not small. This can help address the frequent lack of association between trait heritability and expression regulation, which arises because these two parameters are often not measured within the same dataset.

      We can, however, think of some other settings where a similar experiment may be interesting. This includes, for example, pooling cells from different human individuals (with enough genetic diversity) and applying the same scRNA-seq method to back-identify the individuals and match them to a particular phenotype. We believe our proof of concept is therefore an important contribution, as these other experiments might have broad implications.

      (MJ2) The more substantive claim by the authors for the impact of the study is that they make new observations about the role of expression in phenotype (lines 333-335). The major display item of the manuscript on this theme is Figure 4A, reporting which loci that control growth phenotype (from an earlier paper) also control expression. This is solid but I regret to say that the results strike me as modest.

      This paper focuses more on the proof of concept that scRNA-seq can help integrate expression data in GPM analysis to reveal broad-scale associations between fitness and expression. Novel findings include new hotspots of expression regulation in the RM/BY genetic background and the observation that trans-regulation of expression has more impact on fitness than cis-regulation. We also evaluate the strength of the association between the genome, the transcriptome and fitness (in one environment). Additionally, the analysis reveals biological questions that cannot be answered even by increasing the experimental scale of eQTL mapping experiments. For example, we find that most of the missing heritability is not explained by expression. These key points will be clarified in the abstract, introduction and conclusion as suggested by the editors.

      (MJ3) The discussion makes some perhaps fairly big claims that the work has helped "bridge understanding of how genetic variation influences transcriptomic variation" and ultimately cellular phenotype. But with the data as they stand, the authors have missed an opportunity to crystallize exactly how a given variant affects expression (perhaps in waves of regulators affecting targets that affect more regulators) and then phenotype, except for the speculations in the text on lines 305-319. The field started down this road years ago with Bayesian causality inference methods applied to eQTL and phenotype mapping (via e.g. the work of Eric Schadt). The authors could now try Mendelian randomization-type fine-grained detailed models for more firepower toward the same end, and/or experimental tests of the genotype-to-expression-to-phenotype relationship. I would see these directions, motivated by fundamental questions that are relevant to the field at large, as leading to a major advance for this very crowded field. As it stands, I felt their absence in this manuscript especially if the authors are selling principles about linking expression and phenotype as their take-home.

      We thank the reviewer for this suggestion and agree that the analysis of the genotype-to-expression-to-phenotype relationship would benefit from a more fine-grained model. While we are interested in exploring this, we decided to limit the scope of this manuscript to the proof of concept that scRNA-seq can help gain insights about the genotype-phenotype map at a broader scale.

      (MN1) I also wonder whether the co-mapping of expression and growth traits in Figure 4A would have been possible with e.g. the bulk RNA-seq from Albert et al., 2018, and I recommend that the authors repeat the Figure 4A-type analyses with the latter to justify their statement that their massive scRNA data set would actually be necessary for them to bear fruit (lines 386-388).

      By repeating our eQTL hotspot analysis with the Albert et al. (2018) data, we observed a non-significant association between eQTL hotspots and QTL (χ2 p = 0.50). That being said, there are some differences in the Albert et al. experiment that preclude us from saying conclusively whether the bulk RNA-seq experiments by Albert et al. would not bear fruit. Indeed, that experiment is only 4 times smaller in scale, so we would not expect dramatic differences. To highlight power differences, the Albert et al. paper identified about 6 eQTL per gene, while our study identified about 21, which is consistent with the power differences.

      This highlights that this scRNA-seq experiment is scalable, so the technique may be useful for further studies. In addition, this pooled scRNA-seq strategy enables analysis of the association of transcription with phenotype.
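
      For readers unfamiliar with the hotspot/QTL association test mentioned in this response, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table. It assumes that loci are cross-classified as eQTL hotspot or not and as fitness QTL or not; this layout, the absence of a continuity correction, and the function name `chi2_2x2` are assumptions for illustration and are not taken from the manuscript.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction) on the table [[a, b], [c, d]].

    Hypothetical layout: rows = eQTL hotspot yes/no, columns = fitness QTL yes/no.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one count")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

      A p-value near 0.5, as reported for the bulk data, means that the observed counts are close to what independence between hotspots and QTL would predict. With small expected counts an exact test would be a safer choice than this approximation.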

      (MN2) I also read the discussion of the manuscript as bringing to the fore some of the challenges a reader has in judging the current state of the results to be of actionable impact. The discussion, and the manuscript, will be improved if the authors can put the work in context, posing concrete questions from the field and stating how they are addressed here and what's left to do.

      We agree with the reviewer and have summarized our answers to some of the questions in the field in the discussion section.

      All that being said, we acknowledge the limitations of our study.